Indentation: A Rationale

Updated 2016-11-11

This article sketches a rationale on source code indentation.

Indentation is a never-ending holy war between programmers. The resulting divergence has led to the development of many complex tools (GNU indent and Uncrustify, to name a few) and has pushed every decent text editor to implement a complex indentation engine.

Teams have to reach an agreement on indentation. If you happen to be part of several teams, chances are high that you have to work with different conventions.

This pushes editors to implement yet another complex and unreliable feature: indentation guessing.

Versioning and diff tools bring yet more trouble. Simply opening and saving a file with a different editor than the original one can produce a huge diff.

Discussions on how to solve this problem inevitably reach a dead end. The following sketch of a rationale does not claim to solve it either. However, I would like to point out how simple measures taken right from the inception of a programming language can help reduce all the hassle the indentation question creates.

Rationale

Main arguments:

Lesser arguments:

The Go example

Let us consider the Go language (first released publicly in 2009). The official implementation ships with an indentation/formatting tool called gofmt. From the official website:

Gofmt’d code is:

This basically brings an end to any of the aforementioned conflicts on indentation.

gofmt is a tool for both indentation and style. It is important to note, however, that it will not:

However, it will:

Let us analyze these choices.

Now let us come back to our rationale.

A solution

The choices made by gofmt look reasonable with regard to our rationale. But the quality of the style is not worth discussing in the face of the invaluable advantages the tool provides. Its ultimate goal, as stated in the above quote, is not to format code; it is to relieve the programmer from the burden of style and indentation, leaving the focus on content.

Being part of the standard distribution helps a great deal to bring an end to the style and indentation wars.

The solution to the style and indentation problem does not lie in a hopeless attempt to find the best rules; it lies in providing, for a given language, a standard formatting tool that follows some rational criteria.

If a language has been out for years without a formatting tool, as is the case for C, it is probably already too late. Current formatting tools for C are fragmented. They work in a significantly different manner from gofmt: instead of enforcing one common style, they are made configurable to adapt to everyone's style. This makes sense: since none of these tools is standard, they cannot claim to impose a style.

Some other languages took a different approach by only publishing a rationale on style. While this is a respectable decision, it is not as helpful as providing a standard formatting tool: programmers still have to set up their editor properly in accordance with the rationale, and some are simply not willing to make the effort.

In the next sections, I will discuss specific indentation choices in more detail.

Tabulations

The use of tabs versus spaces is a source of heated debate. However, there seems to be a (not so overwhelming) majority of programmers using tabs only, or tabs for indentation plus spaces for alignment.

Pros:

Cons:

Back to the rationale:

Alignment

Alignment is mostly seen in tables and in function calls:

foo = { bar,
        baz }

foo ( bar,
	  baz )

Pro:

Cons:

Alignment can bring some clarity. As such, it fares well against the first point of our rationale:

Readability matters. Code is read more than it is written.

However, it should be noted that unaligned but indented code is readable nonetheless:

foo = {
	bar,
	baz
}

And the other parts of the rationale are not as positive:

Code changes should be versioning-friendly.

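This is never the case with alignment. Renaming foo to foobar forces the continuation line to be re-aligned, while the unaligned form is left with a single changed line: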

foo = { bar,
	    baz
}


foobar = { bar,
	       baz
}

foo = {
	bar,
	baz
}
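The difference can be measured. The following Python sketch counts the lines a diff reports when foo is renamed to foobar in each layout. It relies on the standard difflib module; the helper name changed_lines is made up for this illustration.

```python
import difflib


def changed_lines(before, after):
    """Count the lines that a line-based diff marks as removed or added."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


# Aligned layout: the continuation line has to follow the opening brace.
aligned_before = "foo = { bar,\n" + " " * 8 + "baz\n}"
aligned_after = "foobar = { bar,\n" + " " * 11 + "baz\n}"

# Unaligned layout: one tab per level, nothing depends on the name length.
plain_before = "foo = {\n\tbar,\n\tbaz\n}"
plain_after = "foobar = {\n\tbar,\n\tbaz\n}"

print(changed_lines(aligned_before, aligned_after))  # 4: two removed, two added
print(changed_lines(plain_before, plain_after))      # 2: one removed, one added
```

The aligned layout doubles the size of the diff for a change that only concerns one name.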

Freedom of form for clarity where that matters. Tools should not constrain you.

Alignment constrains you whenever your editor is not customizable enough to give you fine-grained control over the alignment rules.

The fewer bytes the better.

Aligned code always uses more bytes than unaligned code.

The indentation engine should be short and simple.

Alignment is obviously harder to implement than no alignment at all; moreover, it can be quite tricky to implement a generic alignment engine that is customizable enough so as not to impede freedom of form.

In practice, I believe that the fanciness alignment provides is not worth its downsides.

Other considerations

Inner alignment

map {
  key       = value1,
  longerkey = value2,
}

Inner alignment is independent of indentation and should use spaces only. It often increases clarity in long and complex structures. The downside is that as soon as you add an entry that does not fit the alignment, you have to re-align everything. Even the most advanced editors make this a bit tricky to automate, and the others cannot do it at all.
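To illustrate the re-alignment cost, here is a minimal Python sketch. The helper align_entries is hypothetical and only handles simple key/value pairs; it shows that adding one longer key rewrites every entry that was already there.

```python
def align_entries(entries):
    """Pad every key to the width of the longest one, using spaces only."""
    width = max(len(key) for key, _ in entries)
    return [f"{key.ljust(width)} = {value}," for key, value in entries]


before = align_entries([("key", "value1"), ("longerkey", "value2")])
after = align_entries(
    [("key", "value1"), ("longerkey", "value2"), ("evenlongerkey", "value3")]
)

# Lines of the old version that survive unchanged in the new one.
untouched = [line for line in before if line in after]

print(before)     # ['key       = value1,', 'longerkey = value2,']
print(untouched)  # []
```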

End-of-line alignment

This is usually restricted to comments. One should never spread end-of-line comments over different indentation levels. In the following example with a tab width of 2,

if (foo) { // Long comment
  bar;     // spread over
}          // Multiple line

switching to a tab width of 4 will break the alignment.

if (foo) { // Long comment
    bar;     // spread over
}          // Multiple line
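The breakage can be reproduced with a few lines of Python, using the standard str.expandtabs method to simulate the editor's tab width. The sketch assumes that the indentation is a tab and that the comments are aligned with spaces; the name comment_columns is made up for this illustration.

```python
# Indentation uses a tab; the comments are aligned with spaces.
SOURCE = [
    "if (foo) { // Long comment",
    "\tbar;" + " " * 5 + "// spread over",
    "}" + " " * 10 + "// Multiple line",
]


def comment_columns(lines, tab_width):
    """Return the column at which each end-of-line comment starts."""
    return [line.expandtabs(tab_width).index("//") for line in lines]


print(comment_columns(SOURCE, 2))  # [11, 11, 11]
print(comment_columns(SOURCE, 4))  # [11, 13, 11]
```

Only the comment on the indented line moves, which is why comments spread over a single indentation level stay aligned.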

Tools

The only help you can find for incurable languages like C lies in formatting tools. They help a team keep its code base consistently formatted.

GNU indent

It uses the GNU style by default, and every option is meant to change its behaviour starting from there. This is hardly transparent or convenient.

GNU indent can split or join lines. If line splitting is enabled and the maximum line width is set to, say, 70 characters, it will transform

if (long_condition1 && long_condition2 && long_condition3 && long_condition4) {

to

if (long_condition1
    && long_condition2
    && long_condition3
    && long_condition4) {

If line splitting is off, it will force lines to be joined! That is, it will transform

if (long_condition1
  && long_condition2
  && long_condition3
  && long_condition4) {

to

if (long_condition1 && long_condition2 && long_condition3 && long_condition4) {

and does not leave the formatting to the user. This can be very annoying for long lines.

Astyle

Like GNU indent, Astyle comes with a default formatting (this is not a good thing). Besides, it cannot turn alignment off.

Its options have some previously unseen intricacies:

Using the k&r option may cause problems because of the &. This can be resolved by enclosing the k&r in quotes (e.g. --style="k&r") or by using one of the alternates --style=kr or --style=k/r.

The documentation is of debatable quality:

Also known as Kernel Normal Form (KNF) style, this is the style used in the Linux kernel.

KNF is used for the *BSD kernels. (See the external links.)

“One True Brace Style” formatting/indenting uses linux brackets and adds brackets to unbracketed one line conditional statements.

The Linux kernel style already enforces the use of brackets on one-line statements. Note that it is not only restricted to conditional statements. See the Linux kernel coding style.

To top it all off, it is maintained with Subversion.

Uncrustify

It is much more complete than its competitors and comes with an option for virtually everything. It will not let you run it without specifying an option file. As such it is totally transparent. Alignment is customizable.

Algorithm

One of our points in the rationale is dedicated to the simplicity of the algorithm. We will investigate a few implementation challenges to help us understand why additional features such as alignment can impede simplicity.

Terms

Edge cases

Before going any further, let us review a few generic edge cases.

Stacked openers and closers

What if several openers appear on the same line? Shall we indent once or stack the indent values? Indenting once looks fine here:

if cond1 { if cond2 {
	// code
}}

but not right there:

if cond1 { if cond2 {
	// code
}
}

Besides, it breaks the rule of versioning-friendliness from the rationale, since moving the second opener to its own line forces the body to be re-indented:

if cond1 {
	 if cond2 {
		// code
}}

This leads us to the closers: should all closers appearing on the same line unindent once? The choice is between this:

if cond1 { if cond2 { if cond3 {
			// code
		}}}

versus that:

if cond1 { if cond2 { if cond3 {
			// code
}}}

It is hard to see after 2 indentation levels which statement the closers are actually closing. Lisp hackers will probably not mind.
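Both policies for openers are easy to express. The Python sketch below, restricted to curly braces for brevity, computes how many levels the next line gains after a given line. The function name opener_delta and its stacked flag are assumptions made for this illustration, and braces inside strings or comments are not handled.

```python
def opener_delta(line, stacked):
    """Return how many levels the next line gains because of this line.

    Openers closed on the same line do not count. With stacked=True every
    remaining opener adds a level; otherwise the line adds at most one.
    """
    unmatched = 0
    for ch in line:
        if ch == "{":
            unmatched += 1
        elif ch == "}" and unmatched > 0:
            unmatched -= 1
    return unmatched if stacked else min(unmatched, 1)


print(opener_delta("if cond1 { if cond2 {", stacked=True))   # 2
print(opener_delta("if cond1 { if cond2 {", stacked=False))  # 1
print(opener_delta("foo { bar }", stacked=True))             # 0
```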

However, it should be noted that closers should unindent their own line only if they appear first. It is important for the indentation to be meaningful in regard to block nesting. In the following it is clearer that foo() is called inside the condition on cond2:

if cond1 then
	 if cond2
	 then foo() end end

than here:

if cond1 then
	 if cond2
then foo() end end

or there in C:

if (cond1) {
	 if (cond2)
{ foo() } }

Continuation

Line continuation is one of the trickiest parts of indentation. It is considered good practice to keep line width within a reasonable range, typically around 80 columns. Some lines will eventually end up being too long to fit within the desired width and will have to be split.

Determining if a line is continuing or not involves some analysis. A naive approach would be:

Continuing lines should typically be indented one level deeper.

foo = bar +
	baz
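A naive check of this kind could look as follows in Python. The set of trailing tokens is an assumption chosen for the illustration, not an exhaustive list, and comments or string literals are ignored.

```python
# Assumption: a line continues when it ends with one of these tokens.
CONTINUATION_ENDINGS = ("+", "-", "*", "/", "=", "&&", "||")


def is_continued(line):
    """Tell whether the statement on this line carries over to the next one."""
    return line.rstrip().endswith(CONTINUATION_ENDINGS)


print(is_continued("foo = bar +"))  # True
print(is_continued("\tbaz"))        # False
```

Such a check looks at one line only, which is what makes it naive: it knows nothing about the openers and closers surrounding the line.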

For statement conditions, both the opener and the continuation will increase the indentation level:

if foo &&
		bar
then ...

The double indentation is not really desired. The only reason for considering if as an opener is for cases like this:

if
	foo &&
		bar
then ...

We can easily solve this issue by considering then as an opener and if as a continuation token.

In Go, the curly braces are obvious openers and closers. The if and for keywords should be continuation tokens. In C, the parenthesis is an opener as well and adds to the indentation, so the problem is still not solved:

if (foo &&
		bar) {
}

Workaround suggestions:

Another tricky problem:

a = foo +
	bar(baz,
	barbaz
) +
	foobar

After the function call, the line is not seen as continuing anymore, so indentation gets decreased by 1 level, while being increased by 1 level because of the opener (so no change overall).

It gets even worse when nested:

a = foo +
	bar(baz +
			bazbaz
	) +
	foobar

b = foo +
	bar(baz(
			bazbaz
		)
	) +
	foobar

The solution to this problem involves a more complex continuation algorithm. We need to use a stack to remember the level of indentation of the nested continuations.
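Here is a minimal sketch of such a stack in Python. It is one possible reading of the idea, not a reference implementation: only parentheses are treated as openers and closers, a line is assumed to continue when it ends with one of a few operators, and the name indent_levels is made up for this illustration.

```python
# Assumption: a line continues when it ends with one of these tokens.
OPERATORS = ("+", "-", "*", "/", "=", "&&", "||")


def indent_levels(lines):
    """Return the indentation level of each line, tracking nested continuations."""
    # Each frame is [base level, continuing flag]; the first one is the top level.
    stack = [[0, False]]
    levels = []
    for raw in lines:
        text = raw.strip()
        # Leading closers end their frames before the line is placed.
        body = text
        while body.startswith(")") and len(stack) > 1:
            stack.pop()
            body = body[1:].lstrip()
        base, continuing = stack[-1]
        level = base + (1 if continuing else 0)
        levels.append(level)
        # The remaining openers and closers push and pop frames.
        depth = level
        for ch in body:
            if ch == "(":
                depth += 1
                stack.append([depth, False])
            elif ch == ")" and len(stack) > 1:
                stack.pop()
                depth -= 1
        # Only the innermost frame learns whether the next line continues.
        stack[-1][1] = text.endswith(OPERATORS)
    return levels


first = ["a = foo +", "bar(baz +", "bazbaz", ") +", "foobar"]
second = ["b = foo +", "bar(baz(", "bazbaz", ")", ") +", "foobar"]
print(indent_levels(first))   # [0, 1, 3, 1, 1]
print(indent_levels(second))  # [0, 1, 3, 2, 1, 1]
```

Because the outer continuation is remembered in its own frame, it is still active when the closing parenthesis brings us back to it.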

The cases not matched by the previous conditions do not change the indentation level.

An example algorithm

The following algorithm yields an indentation engine close in practice to the one found in gofmt. The indentation is computed by looking at the current and previous lines only, which makes it efficient.

What if previous line indentation is wrong? It does not matter, as indenting the whole file at once will make sure every line is indented properly.

Unmatched means not matched on the same line. Openers match their respective closer forward and vice-versa.

First pass:

On current line:
-1 on every closer until an unmatched opener is met, if first token is a closer.
-1 if first token is a middler.

On previous line:
+ current indentation.
+1 on every unmatched opener.
+1 if first token is a middler.
-1 on every unmatched closer if first token is not a closer.

For this last case, it is worth noting that if the first token is a middler, then the first unmatched closer compensates for the indentation change.

Second pass:

+ continuation indentation.

For the continuation indentation, we use the algorithm from the previous section.
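The first pass can be sketched in Python as follows. This is an interpretation of the rules above under several assumptions: tokens are split with a crude regular expression, strings and comments are ignored, the middlers are the Go-like labels case and default, and the names scan and first_pass are made up for this illustration. The second pass is left out.

```python
import re

OPENERS = {"{", "(", "["}
CLOSERS = {"}", ")", "]"}
MIDDLERS = {"case", "default"}  # assumption: Go-like switch labels


def scan(line):
    """Return (first token, unmatched closers, unmatched openers) of a line."""
    tokens = re.findall(r"\w+|\S", line)
    closers = openers = 0
    for tok in tokens:
        if tok in OPENERS:
            openers += 1
        elif tok in CLOSERS:
            if openers:
                openers -= 1  # matched on the same line
            else:
                closers += 1
    return (tokens[0] if tokens else ""), closers, openers


def first_pass(lines):
    """Compute the level of every line from the line that precedes it."""
    levels = []
    carried = 0  # level handed over by the previous line
    for line in lines:
        first, closers, openers = scan(line)
        # Rules applied to the current line.
        level = carried
        if first in CLOSERS:
            level -= closers
        if first in MIDDLERS:
            level -= 1
        level = max(level, 0)
        levels.append(level)
        # Rules applied to this line when it becomes the previous one.
        carried = level + openers
        if first in MIDDLERS:
            carried += 1
        if first not in CLOSERS:
            carried -= closers
    return levels


source = [
    "func f(x int) {", "switch x {", "case 1:", "foo()", "default:",
    "bar(", "x,", ")", "}", "}",
]
levels = first_pass(source)
print(levels)  # [0, 1, 1, 2, 1, 2, 3, 2, 1, 0]
print("\n".join("\t" * n + text for n, text in zip(levels, source)))
```

Since each level only depends on the level just computed for the previous line, the whole file is indented in a single sweep, whatever its original indentation was.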

References