Monday, July 21, 2014

OMG! It's OGF! How to Gauge Code Quality

A little while ago we were having trouble figuring out how to determine code quality. Sure, you could use metrics produced by tools like Checkstyle or unit test code coverage tools, but I've never found these metrics to tell the whole story. Technical debt and code quality are multifaceted problems that require the skills and experience of senior engineers. It's unlikely that any computer program will ever be devised that can give an accurate picture of code quality.

It's very easy to create a computer program that can find monstrously awful code. Some might call it the compiler. It's more difficult to create a computer program that can find merely mediocre code. If you're at an organization that's worried about code quality, odds are you don't have monstrously awful code to worry about.

In QA they use a metric called overall good feeling (or OGF for the acronym obsessed). The concept behind this is very simple: you just give your overall feeling as to the quality of the product as a number from 1 to 5. Five is very high confidence and one is no confidence. The reason we use this system is that we had trouble determining the quality of our products using metrics alone. You could use bug counts, regression counts and similar things to try to create an objective measure of the quality of a product, but this will never give you the full picture. Why not just ask? OGF is a great way of polling the intuition of the people who are responsible for testing the product. Why not use the same technique for measuring code quality?

Let's say we wanted to figure out the quality of the code that makes up some module; let's call it module A. First we gather the relevant developers together in a room. Next we ask them all to come up with a number between one and five (where five is excellent code quality and one is terrible code quality) that best encapsulates the code quality of the overall module. All the developers then produce their number at the same time. The best way to do this is to use a system similar to planning poker, where each developer has five playing cards numbered one through five. Why not use playing cards? The developers first select the card that corresponds to their number and put it face down on the table. When everyone has chosen, the cards are turned over at the same time. The point of doing it this way is that you want all developers to poll their intuitions without being affected (infected?) by the views of their peers.
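If the team is remote and a deck of cards isn't handy, the same face-down-then-reveal idea can be sketched in a few lines of Python. This is only an illustration: the `BlindPoll` class and its method names are hypothetical, not part of any tool mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class BlindPoll:
    """Collects one card per developer and reveals them all at once.

    Hypothetical helper: the 1-5 range mirrors the cards described above.
    """
    low: int = 1
    high: int = 5
    _cards: dict = field(default_factory=dict)

    def play_card(self, developer: str, value: int) -> None:
        """Place a card face down; a developer may swap it before the reveal."""
        if not self.low <= value <= self.high:
            raise ValueError(f"card must be between {self.low} and {self.high}")
        self._cards[developer] = value

    def reveal(self, expected: set) -> dict:
        """Turn every card over, but only once all expected developers have played."""
        missing = expected - set(self._cards)
        if missing:
            raise RuntimeError(f"still waiting on: {sorted(missing)}")
        return dict(self._cards)
```

Nothing is readable until `reveal` succeeds, which is the whole point: nobody gets to see a neighbor's card before committing to their own.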

Of course, this number doesn't tell the whole story. It's also important to know how familiar each developer is with a piece of code. This familiarity quotient can give us insight into the developer's choice. After giving their code quality number (aka CQN), the developers can rate their own familiarity with the code (cards aren't needed for this step).

Let's assume that we have a group of developers who have given code quality and familiarity numbers for our module A. We now plot each developer's point on a graph like the one below:

If a developer is very familiar with the code and rates the code quality very highly we would get a point on the graph like this:

If the developer is not very familiar with the code and thinks the code quality is terrible we would get a point on the graph like this:
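For anyone who wants to try this without a whiteboard, here is a minimal sketch that draws the points as a text grid. It makes two assumptions of its own: familiarity runs along the horizontal axis with quality on the vertical axis, and familiarity uses the same 1 to 5 scale as the CQN. The function name `text_scatter` is hypothetical.

```python
def text_scatter(points, size=5):
    """Render (familiarity, quality) pairs on a size x size character grid.

    Assumptions: familiarity on the horizontal axis, quality on the
    vertical axis, both whole numbers from 1 to `size`.
    Each cell shows how many developers landed there, or "." if none did.
    """
    counts = {}
    for familiarity, quality in points:
        key = (familiarity, quality)
        counts[key] = counts.get(key, 0) + 1

    rows = []
    for quality in range(size, 0, -1):  # top row is the highest quality
        cells = [str(counts.get((fam, quality), ".")) for fam in range(1, size + 1)]
        rows.append(f"{quality} | " + " ".join(cells))
    # Bottom line labels the familiarity axis.
    rows.append("    " + " ".join(str(fam) for fam in range(1, size + 1)))
    return "\n".join(rows)
```

With this layout, a developer who knows the code well and rates it highly lands in the top right corner, and one who barely knows it and thinks it's terrible lands in the bottom left.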

Once all of the developers' points are graphed, we can see patterns very easily. For instance, this is good code:

This, on the other hand, is bad code:

However, I expect other patterns as well. These "other" patterns indicate a lack of convergence. But why?

Code with a steep learning curve might look like this:

Graphs like the above might also indicate that the cluster of developers who wrote the code like it but no one else can make heads or tails of it. It could be that the developers have written it in an idiosyncratic style (I know! I'll parse this using Perl and a banana!) or it might simply be intrinsically complex. Either way it's going to cause problems because new developers will have a tremendously difficult time learning how to interact with the code (What? I need a banana?).

If a module is simple and easy to understand but doesn't address edge cases in the design space we might get a graph that looks like this:

This code is more dangerous because developers just coming to the code feel that it should be easy to change and modify. However, anyone who's spent time with the code will realize that it's a pain to make any changes work.

Controversial code would look like this:

There are many potential reasons why this could happen. All of them are worth investigating.

The following is code that everyone is afraid to touch. I call it "Haunted House Code":

...because no one goes in there. Most likely all the developers who wrote this code have moved on. Graphs like this imply estimates will be random numbers and that much work will be done before the real difficulty of the task emerges.
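The patterns above are meant to be spotted by eye, but a rough heuristic can flag them too. The sketch below goes only by the prose descriptions: every threshold (`mid`, `gap_limit`, `spread_limit`) is an assumption picked for illustration, the "deceptively simple" label is my own shorthand for simple code that misses edge cases, and treating "controversial" as a wide spread of quality ratings is one possible reading among several.

```python
from statistics import mean, pstdev


def describe_pattern(points, mid=3.0, gap_limit=1.5, spread_limit=1.0):
    """Guess which pattern a set of (familiarity, quality) pairs resembles.

    All thresholds are illustrative assumptions, not calibrated values.
    """
    if not points:
        raise ValueError("need at least one developer's ratings")

    # Nobody knows the code well: no one goes in there.
    if max(fam for fam, _ in points) < mid:
        return "haunted house"

    newcomers = [qual for fam, qual in points if fam < mid]
    veterans = [qual for fam, qual in points if fam >= mid]
    if newcomers and veterans:
        gap = mean(veterans) - mean(newcomers)
        if gap >= gap_limit:    # those who know it like it, nobody else does
            return "steep learning curve"
        if gap <= -gap_limit:   # looks easy until you have lived with it
            return "deceptively simple"

    quality = [qual for _, qual in points]
    if pstdev(quality) > spread_limit:  # no agreement at any familiarity level
        return "controversial"
    return "good" if mean(quality) >= mid else "bad"
```

For example, `describe_pattern([(1, 1), (2, 2), (5, 5), (4, 4)])` returns `"steep learning curve"`, because the two developers who know the code rate it far higher than the two who don't. Treat the label as a prompt for a conversation, not a verdict.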

So, in conclusion, while code metrics are very useful, I don't believe they can tell a completely accurate story. I think that simply asking the developers what they think of the code quality is a valid metric. They are using it every day, after all. They are the most qualified people to give an assessment. It's important to know where the crap is buried because it is these pieces of code that will give you problems when you try to add new features. Software development is a minefield; think of these graphs as a mine detector.
