Monday, 10 June 2013

Code complexity

A comment from Dennis on my last blog post brought up the subject of refactoring to reduce complexity.

It's not until you make a concerted effort to reduce code complexity that you begin to get a real feel for just how much bad coding practices contribute to poor code. It's pretty normal when starting out to write code with no thought for maintainability or complexity. In fact, as a noob you'll have no idea what complexity is. And like all bad habits that we get into when starting out, it takes a concerted effort to change.

I rely on cyclomatic complexity to get a feel for how complex my code is becoming, either using the cccc command line tool, or metrics provided in plugins for IDEs such as Eclipse.

Cyclomatic complexity measures the number of independent paths code can take through a single function. For example, a simple if statement can either execute or not, so there are two paths and the complexity number of the function is two.
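
As a minimal sketch of that two-path case in C (the function name clamp_to_zero is made up for illustration), a function with a single if looks like this:

```c
/* Hypothetical example: one decision point, so two paths through the function. */
int clamp_to_zero(int value)
{
    if (value < 0) {    /* decision point: +1 */
        value = 0;
    }
    return value;       /* base path: 1, giving a complexity of 2 */
}
```

Each extra if, loop or case added to the function would normally push the number up by one.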

The general rule of thumb for cyclomatic complexity is that if a function has a value over 10 then it should be refactored. Personally I find this too low, particularly when you have code that verifies its input arguments: you can end up exceeding the threshold even before your function proper has started. I prefer a threshold of 15. As a rule, even with argument checking, a value of 20 is a good sign that you're causing trouble for yourself later when it comes to maintainability. Should you get to a value of 40 or more, your code is morbidly unmaintainable.
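
To give a rough idea of how argument checking eats into the budget, here is a sketch of a hypothetical C function (copy_field and its parameters are invented for this example). Counting one for the function plus one per if, it is already at 5 before it does any real work. Exact numbers vary between tools, since some also count boolean operators.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function: returns 0 on success, -1 on bad arguments. */
int copy_field(char *dst, size_t dst_len, const char *src, size_t src_len)
{
    if (dst == NULL) return -1;          /* +1 */
    if (src == NULL) return -1;          /* +1 */
    if (dst_len == 0) return -1;         /* +1 */
    if (src_len >= dst_len) return -1;   /* +1, leaves room for the terminator */

    /* The function proper starts here, with the complexity already at 5. */
    memcpy(dst, src, src_len);
    dst[src_len] = '\0';
    return 0;
}
```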

In one of the greatest articles of all time, "OpenSSL is written by monkeys", Marco Peereboom charts his gradual meltdown as he comes face to face with the awfulness that is the OpenSSL library source code. OpenSSL is a very important library and its contribution to the Internet as a whole cannot be overstated, but that does not mean it's above criticism. Its code uses a style that assaults the eyes and leaves me needing to leave the room for a while to calm down. But on top of that it is dense. Really, really dense. I ran it against a complexity analysis tool and discovered functions that had complexity values over 100, with the largest reaching 142. For any code that would be awful; for a project that is relied upon by thousands of products and millions of users worldwide, it's unforgivable.

Two of the things you discover when you refactor to reduce complexity are that the number of lines of code per function goes down and that the need for comments in the code is also reduced. The first point is self-evident: with code moved out to smaller, maintainable functions, the number of lines of code per function will obviously decrease. As for comments, this comes down to sensible function names. A call to the new function is as good as a comment, assuming you have given your function a sensible name, but you're as interested in code readability as in code complexity, so of course you've done this.
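
Here is a small sketch of what that can look like in C. All the names are hypothetical, and the config-file scenario is only an assumption for the sake of the example. The checks that would otherwise sit inline with a comment above each one are moved into helpers whose names do the explaining:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers: each name says what a comment would have said. */
static bool is_blank_line(const char *line)
{
    for (; *line != '\0'; line++) {
        if (!isspace((unsigned char)*line)) {
            return false;
        }
    }
    return true;
}

static bool is_comment_line(const char *line)
{
    while (isspace((unsigned char)*line)) {
        line++;
    }
    return *line == '#';
}

/* Counts the lines that carry an actual setting. */
size_t count_setting_lines(const char *const *lines, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_blank_line(lines[i]) || is_comment_line(lines[i])) {
            continue;
        }
        count++;
    }
    return count;
}
```

The loop in count_setting_lines reads almost like a sentence, and each function stays short with a low complexity number of its own.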

I find I rarely need to comment Objective-C code, because the expressive function names mean I don't have to, but with complexity reduction and sensible naming even boring old C code can be written so that it doesn't need much commenting.

Cyclomatic complexity isn't perfect. It can result in large numbers if you have a switch statement with lots of possible cases, but these instances can be reviewed and ignored. Overall it is a very powerful metric and one that will bring your code up a level in terms of professionalism.
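
For instance, a flat lookup like the hypothetical error_message function below (the codes and messages are invented) is trivial to read, yet a tool that counts each case label as a decision would score it at around 7, and the number keeps growing with every case added:

```c
/* Hypothetical mapping from an error code to a message. */
const char *error_message(int code)
{
    switch (code) {
    case 0:  return "ok";
    case 1:  return "not found";
    case 2:  return "permission denied";
    case 3:  return "timed out";
    case 4:  return "out of memory";
    case 5:  return "invalid argument";
    default: return "unknown error";
    }
}
```

This is the kind of function I'd look over once and then leave alone, whatever the metric says.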
