A matter of style

Introduction

The open source community does a lot of things right, and caring about the internals of a program is one of them. The people who write the code do so because they are proud of what they do and want the respect of other people in the community. The beauty of the code is a very important factor in gaining that acceptance.

The same isn’t necessarily true in the commercial world. Time-scales are much more important than how the guts of a computer system look, and it’s generally not good to be seen spending time making your code look pretty.

This column is not going to say that style is more important than substance, but that taking an interest in style can bring a large increase in productivity.

Indentation

I know people that hate me for this, but something that really bugs me about some code is the indentation style. I can handle a poorly thought-out, hacked algorithm if there’s a good reason (time usually), but I can never see a good excuse for badly formatted code.

Initially I thought I was being anal, just getting very hung-up on a not-very-important detail, but then I started noticing things. Usually the people that produced the worst formatted code also introduced the largest number of defects and, although they initially appeared to write code more quickly than me, finished late much more frequently.

Why does that matter?

There are two fundamental reasons why the ‘look’ of the code matters:

  1. Comprehension. How quickly can people understand the code?
  2. Structure. Is it obvious how the program is split into units? Is it obvious what the next statement to be executed is?

I guess they are quite similar, but I feel that it’s important to separate them out. Hopefully you’ll see why in a minute.

Let’s talk about comprehension first. Here’s a snippet of code:

if (x < foo (y, z))
  haha = bar[4] + 5;
else
  {
    while (z)
      {
        haha += foo (z, z);
        z--;
      }
    return ++x + bar ();
  }

If you've read the GNU coding standards, you might recognise this as an example of good formatting. It isn't.

But there are some redeeming factors. Firstly they've used two spaces for each level of indentation and not a tab. Studies have shown that while people prefer (aesthetically speaking) tabs in programs, those same people actually understand the code more effectively with between two and four spaces. Also, the formatting of the mathematical expressions is clear, with good use of whitespace.

Now the bad. My main problem is with the use of the braces. The lesser crime is immediately after the if condition where they haven't used braces at all. That's bad because a novice programmer might add an extra line below the 'haha' assignment. It probably won't be a valid program any more, but it looks okay at first glance. Since maintenance is the largest part of the software life-cycle, anything that makes updating code more difficult is bad news.
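In the snippet above, the trailing else means the extra line would stop the program compiling. Without an else, the result compiles and quietly does the wrong thing. Here is a minimal sketch of that case; the function names and the penalty values are hypothetical, made up purely for illustration:

```c
/* Hypothetical example: the names and numbers are illustrative only. */

/* Brace-less if, after a maintainer added a second line beneath it. */
int penalty_without_braces (int speed, int limit)
{
  int penalty = 0;

  if (speed > limit)
    penalty += 10;   /* original line: controlled by the if */
    penalty += 5;    /* added later: looks conditional, always runs */

  return penalty;
}

/* The same change made to a version that had braces from the start. */
int penalty_with_braces (int speed, int limit)
{
  int penalty = 0;

  if (speed > limit) {
    penalty += 10;
    penalty += 5;    /* added later: clearly inside the block */
  }

  return penalty;
}
```

Called with a speed below the limit, the first version still returns 5 while the second returns 0. The two only agree when the condition is true, which is exactly the case a hurried test is most likely to cover.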

Structure

The indentation of the second half is dreadful, though. Why have two levels of indentation to indicate one block of code? Again, there are studies that show that formatting like this actually reduces the readability of the code.

My preferred method of formatting the same code would be:

if (x < foo (y, z)) {
  haha = bar[4] + 5;
}
else {
  while (z) {
    haha += foo (z, z);
    z--;
  }
  x++;
  return x + bar ();
}

The main difference is the formatting, but I've also altered the 'return' statement at the end. Side-effects (like using '++x' in an expression) seriously reduce readability as it's much more complex to figure out what it's trying to evaluate. And the solution only takes an extra line...
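To show that splitting the increment out changes nothing but the readability, here is a small sketch of the two forms side by side. The bar () below is a hypothetical stand-in that just returns a constant, and the two function names are mine, not part of the original snippet:

```c
/* Hypothetical stand-in for bar (): returns a fixed value. */
int bar (void)
{
  return 10;
}

/* Increment buried inside the return expression. */
int compact_form (int x)
{
  return ++x + bar ();
}

/* Increment as its own statement, then a plain expression. */
int explicit_form (int x)
{
  x++;
  return x + bar ();
}
```

With this bar (), both functions return the same value for the same argument. That would not be guaranteed if bar () read or changed a shared x itself, because C does not specify which operand of '+' is evaluated first. The explicit form removes that question entirely.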

But why is this a better way to format the code? Simple: it shows the structure of the code more effectively. Since there are only three levels of indentation, your brain doesn't have to work as hard to figure out where, for example, the end of the "else" block is. (It's not so easy to see the advantage with ten lines of code. Try to imagine a page full of code with a large number of indentation levels.)

Some languages appear to be more tricky than the C example here, too. Where does the exception block in Java go? Should the procedure definitions in a package header be indented in PL/SQL?

If you don't really understand block-structure in modern programming languages and are just trying to follow the example of fellow developers, these are probably difficult questions. They're not supposed to be.

A computer will understand any syntactically valid program, but not necessarily in the same way that you or I would. This makes it vitally important that everyone involved has a common understanding of how the program is supposed to work. The first steps to doing this are structuring the code sensibly and making it easy for others to comprehend.

The last important comment is about timing: none of this takes any longer than building your code with poor formatting and structure. Why? Well, if you understand how it's structured it'll be more likely to work the first time. And if it doesn't, you should be able to find the bug more quickly because your code is easier to comprehend.

Summary

To summarise, the neatest, best formatted code takes no longer to write and is much easier to maintain than code that has bad 'style' (although it's much less likely to need fixing).

So the next time that you come across some dreadful, untidy code, try to make the person that wrote it start again. If they don't understand how their own code is structured, neither will anyone else.

You'll note that I keep mentioning studies, but don't reference the source. That's because I've read all about it in Code Complete (follow the link to buy it from Amazon.com) and not from the horse's mouth. I recommend you also read it for the full story.