Clifford on Programming Style

Some discussions with kyrah and Rusty Russell's keynote at Linuxkongress 2004, as well as reading too much ugly code (others' and my own), have inspired me to create this page. These are my personal opinions and thoughts, as well as others' opinions and thoughts that I can agree with.

Some rules can be bent, others can be broken.

This is about programming style and programming languages. It's not about algorithms themselves. Good programming style can't fix a broken algorithm - but the best algorithm doesn't help you at all if it is implemented in bad style.

In fact, this document is more about philosophy than about actual guidelines for good coding style. Read "The Elements of Programming Style" (by Kernighan and Plauger) if you are looking for guidelines to follow.

/* You are not expected to understand this */

Comments are, IMO, the most misunderstood coding-style issue. Most people think "the more comments, the better". In fact, I find most so-called "well-commented" source code much harder to read than uncommented source code.

Some code explains itself, and any additional comment would just make it uglier and harder to read. Comments should always be one abstraction layer above the code. Let's have a look at the following C99 code snippet:

	// pi is a float, start with 0
	float pi = 0;
	// i is an integer, go from 0 to 10000
	for (int i=0; i<10000; i++) {
		// initialize the floats x and y with random numbers
		float x = (float)rand() / RAND_MAX;
		float y = (float)rand() / RAND_MAX;
		// add .0004 to pi if x*x + y*y is smaller than 1
		if ( x*x + y*y < 1 ) pi += .0004;
	}

The comments above are absolutely useless! Now let's have a 2nd look at the same code fragment with different comments:

	// estimate pi by using a Monte Carlo algorithm:
	// get 10000 random points in the [0..1] x/y range and use Pythagoras to check
	// if they are inside the quarter circle. Add 4/10000 for each point within the circle.
	float pi = 0;
	for (int i=0; i<10000; i++) {
		float x = (float)rand() / RAND_MAX;
		float y = (float)rand() / RAND_MAX;
		if ( x*x + y*y < 1 ) pi += .0004;
	}

This version is actually one line shorter than the first one, and this time the comments really help. So here are the first rules:

Of course, that only works if your code actually explains itself...

	if ( (!!strcmp(input, "no")) != 0 ) printf("You did not type 'no'.\n");

... I will come back to that point soon.

What if you implement your own language or API? It would be stupid to expect other people to know it already. But languages and APIs are implemented once and then used often. Re-documenting them every time they are used would be stupid, too.

If Python is executable pseudocode, then Perl is executable line noise.

How do you write code that explains itself well? Well, you could do it the hard way:

You think I'm kidding? I'm not! Sure, the idea isn't to rewrite your code all the time. The idea is to write it right in the first place, and "right" in this context means the code that the procedure described above would produce. If comments are needed to understand the program, something has gone wrong. Once your program is self-explanatory, you can add comments for the big picture.

OK, I think I'm done with the comments now.

If it still doesn't work, re-write it in assembler. This won't fix the bug,
but it will make sure no one else finds it and makes you look bad.

The good thing about high level languages is that they allow you to choose random names for almost everything. The bad thing about high level languages is that they allow you to choose random names for almost everything.

Calling a loop iterator variable "loop_counter_int", when the name "i" can't be misunderstood either, is as bad as naming all variables x0 to x999.

If you can't find the right names, your functions are probably too complex, or you are using one variable for several different purposes.

Inlining code and allocating variable storage is the compiler's job. Don't try to be smarter - usually you are not.

When evaluating larger expressions, the compiler adds unnamed temporary variables (one for each node in the expression DAG; read "the Dragon Book" if you want to know more about that). Let's have a look at this code from a self-modifying hashing algorithm:

	hash = (hash << ((hash % 7) + 9)) ^ (hash >> (32 - ((hash % 7) + 9))) ^ data[i];

Why is this expression hard to understand? Because of all the temporary values that have no names. If we split the code into pieces with dedicated variable names, it becomes much easier to read:

	unsigned int shifting_level = (hash % 7) + 9;
	unsigned int cross_shifted_hash = (hash << shifting_level) ^ (hash >> (32 - shifting_level));
	hash = cross_shifted_hash ^ data[i];

Now the temporary values have names, and the algorithm is much easier to understand. A modern optimizing C compiler will typically generate the same machine code for both variants.

Some languages are designed to solve a problem. Others are designed to prove a point.

An issue that is IMO underestimated in almost all programming style publications is the process of choosing the right language.

	ht pu setxy -115 -200 pd

	to koch :length :depth
	if :depth = 1 [ fd :length stop ]
	koch :length / 3 :depth - 1 lt 60
	koch :length / 3 :depth - 1 rt 120
	koch :length / 3 :depth - 1 lt 60
	koch :length / 3 :depth - 1
	end

	repeat 3 [ koch 400 6 rt 120 ]

This LOGO program draws a Koch snowflake (a very simple fractal). It is small, clean and (at least for LOGO programmers) very easy to read. While I believe that LOGO is almost the perfect language for this program, I also believe that Oracle SQL*Forms would be a pretty stupid choice.

Here comes the most important of all programming style rules:

Profanity is the one language all programmers know best.

One final word about API design: APIs don't need to be easy to use! Complex things sometimes need complex APIs. But APIs must not be easy to use wrongly!

And the two most important rules to reach this goal are IMO:

E.g. if it is not obvious that a function can fail, add _try to the end of the function name; don't expect void pointers to be int-aligned; etc.

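E.g. when a C function generates a string, the string should always be null-terminated. strncpy() and readlink() are good examples of bad APIs: strncpy() leaves the destination unterminated when the source doesn't fit, and readlink() never appends a null byte at all.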

That's it for now. Maybe I will add more later.

Now get yourself a copy of "The Elements of Programming Style" (by Kernighan and Plauger; McGraw-Hill 1974, 1978, ISBN 0-07-034207-5) and read it carefully. If you are programming in C (or any other language that looks like C), also read the Linux kernel Coding Style document (linux/Documentation/CodingStyle).