Wednesday, December 31, 2008

The 99.99999% done problem

I don't know exactly what causes it, but some things are hard to finish.

Consider programming projects. Have you ever been on a project where... Actually, maybe I should stop right there. Have you ever been on a programming project before? Oh. Well, in that case, I'm going to pick another analogy.

Have you ever written a story, a book, or something similar? Well, as you've probably observed, getting the first draft completed, while nice, doesn't mean you're finished. Depending on how much you care about the final version, you'll go through many more drafts. The text will be written and rewritten, getting better each time. It may take more time to go from the first draft to the final copy than it took to write the first draft in the first place.

From the point of view of someone who has never written, this may seem weird. How can it take more time to go from the first draft to the final version than it took to get the first draft? Well, for professional writing it can. It depends on how good you want the end product to be.

This sort of thing happens on software projects too. In fact, it almost always happens on software projects, and the bigger the project, the more likely it is. There is a big difference between having all the features in a project working and having them working well enough that you can sell the result as a product.

Here are the usual things that need work before you can release a product to the general public:

1) Error Handling.

Error handling is one of those things that doesn't add any obvious features, that no one wants to plan or think about, but that people yell about when it's missing.

When you're working on the code, you'll notice that some function or other can fail. Any network call or disk I/O function can fail, for instance. What do you do if there's an error? The most common response is to put up a dialog box that says "I can't do that because something bad happened," but this is rarely the right thing to do. The right thing to do is to try to recover.

For example, if you talk to the central server and there's an error, can you try another server? Can you simply reconnect to the server and restart the communication? This sort of thing is hard to plan out and implement, but users love it: the software recovers from errors automatically, without the user having to do anything.
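To make the "recover instead of complaining" idea concrete, here is a minimal Python sketch, assuming a client that can talk to a list of interchangeable servers. The names `call_with_failover`, `send`, and `AllServersFailed` are hypothetical and not taken from any real product or library. It reconnects to each server a few times with exponential backoff, fails over to the next server, and only gives up (the point where the "something bad happened" dialog would finally be appropriate) once every server has failed.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllServersFailed(Exception):
    """Raised when every attempt on every server has failed."""


def call_with_failover(
    servers: Sequence[str],
    send: Callable[[str], T],
    retries_per_server: int = 2,
    backoff_seconds: float = 0.5,
) -> T:
    """Send a request, reconnecting and then failing over before giving up.

    `send` is a hypothetical callable that talks to one server and raises
    OSError (ConnectionError and TimeoutError are subclasses) on failure.
    """
    errors: list[tuple[str, OSError]] = []
    for server in servers:
        # First attempt plus `retries_per_server` reconnect attempts.
        for attempt in range(retries_per_server + 1):
            try:
                return send(server)
            except OSError as exc:
                errors.append((server, exc))
                if attempt < retries_per_server:
                    # Exponential backoff: wait 1x, 2x, 4x, ... before reconnecting.
                    time.sleep(backoff_seconds * (2 ** attempt))
        # This server kept failing; move on to the next one.
    # Only now is it time to tell the user something bad happened.
    raise AllServersFailed(f"{len(errors)} attempts failed: {errors!r}")


if __name__ == "__main__":
    def flaky_send(server: str) -> str:
        # Simulated network: the primary is down, the backup works.
        if server == "primary.example":
            raise ConnectionError("connection reset")
        return f"ok from {server}"

    print(call_with_failover(["primary.example", "backup.example"],
                             flaky_send, backoff_seconds=0.0))
    # prints: ok from backup.example
```

Only `OSError` is treated as recoverable here; any other exception (say, a bug in building the request) propagates immediately, because retrying it would not help. Blind retries are also only safe when the request is idempotent, so a real client would need to decide per operation whether reconnecting and resending is allowed.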

2) "Minor" UI tweaking.

When finishing up an application, "minor" UI tweaking often falls by the wayside. The reason is that once you have a bare-bones UI to access a feature, everything else just adds time to the schedule. Making matters worse, either nobody cares about the UI or everybody gets to add their two cents. Either way, the UI ends up sucking.

3) Bugs!

The difference in the number of bugs you find with one, ten, one hundred, one thousand, or one million users is impressive. We've seen this happen with our product at Intelerad: the more places we deploy it, the more bugs we get back. What's impressive is that this product has been on the market and in use for over five years, and the bugs we get back are ones that were introduced five years ago. These bugs have lurked in the code all that time. We are only finding them now because the larger the user base, the more people there are using the product in different ways.

People don't report bugs as a rule. When they do, it's either because they're the kind of person who reports bugs or because the bug is blocking their work. The more people use your product, the more likely it is that someone will find a bug that is blocking their work, and the more likely it is that you'll stumble upon someone who just happens to report bugs. People who report bugs for the fun of it are extremely rare; I would say there are far fewer than one in a thousand. Even so, as the number of people using your product grows past a hundred thousand, you keep picking up more of them.
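The arithmetic behind this is simple. As an illustration only, let $p$ be the fraction of users who report bugs unprompted and $N$ the number of users; the value $p = 10^{-4}$ used below is an assumed figure consistent with "far fewer than one in a thousand", not a measurement.

$$
\mathbb{E}[\text{reporters}] = pN, \qquad P(\text{at least one reporter}) = 1 - (1 - p)^{N}
$$

With $p = 10^{-4}$: at $N = 1{,}000$ you expect $pN = 0.1$ reporters, so $P \approx 0.095$; at $N = 100{,}000$ you expect $pN = 10$, so $P \approx 0.99995$; at $N = 1{,}000{,}000$ you expect $pN = 100$.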

A product you could release when only a thousand people were using it can be buggier than one that a million people will use.

A product that is good enough for demos can have many more hard-to-find bugs than a product that has to be released to the general public.

4) Scalability.

It is far easier to build a web site/service for use by a tiny number of people than for a large number of people.

Intelerad knows how hard scalability is; it's one of our specialties.

Jeff Atwood does too.

...so I'm not going to dwell on it.

If you want to cut corners, not making your site or application scalable is a great way to do it. Unless you need scalability, in which case you're doomed.
