Thursday, June 11, 2009

The speed, size and dependability of a reddit traffic spike

Hmm... It looks like my blog posting on progress bars has taken off on reddit. I actually posted the article to reddit after a friend's blog posting had hit the big time on that site as well. Although it didn't work... yeah...

He had just written a blog posting comparing the speed, size and dependability of almost all programming languages using a novel graphing technique. He is a computer languages guy and ever since reading Tufte he's been into using graphs to represent data in new and enlightening ways. I thought the blog posting was impressive. Apparently somebody else did too and posted it to reddit. From there it took on a life of its own and found its way to multiple sites including the venerable Slashdot.

I've always loved how that happens. I used to write a peer-to-peer application called Myster. When introducing Myster to the world, we didn't do much advertising. The only thing we did was to post it to one or two Macintosh-oriented websites. One of which was the venerable MacInTouch. Oh wow. I woke up the morning that it was posted to a set of crashed computers. Apparently my p2p network wasn't as scalable as I thought.

After frantically fixing the accidental thread bomb (and about a dozen scalability issues for which the application's newfound fame turned out to be very useful) I started to track where Myster was being mentioned. The answer, it turned out, was everywhere. It was showing up on news sites I'd never heard of. It was in application repositories I'd never added it to. Myster had been available for about a year and had never really gotten anywhere. Then, one morning we posted it to the right place and it exploded.

I quickly realized that in order to have any control over how my application was going to be presented I needed to move quickly or else somebody would do it for me. This, being an important task, was delegated to a neighbor of mine who turned out to be very good at it. :-) He is now a bike courier. Completely wasted talent if you ask me. He did the website for Myster as well as all the graphics. Just look at the logo. It's gorgeous! I mean argh! irk! egh! Anyway, enough of that.



The speed that news travels on the Internet is impressive. It's scary how fast you can go from "Oh, someone has posted my article on reddit" to "Oh crap, how much is all this bandwidth going to cost me?".

My friend had gotten his page to the financial-panic level of Internet popularity and I was curious to know if I could do the same with my article on progress bars. So, I dutifully signed up for a brand, spanking new account on reddit and tried submitting my article. It didn't get much love.

Actually, it didn't get any love. It got exactly zero votes. This was not the reception I was hoping for but I figured it was probably just the content. I mean, who would want to read a 5000-ish word article on the intricacies of progress bars anyway?

My friend said I was completely wrong.

In his opinion my reddit submission title completely sucked. He was sure he could do better. No, that doesn't accurately reflect his scorn for my reddit abilities. He was sure he could do better with one hand tied behind his back and a family of monkeys beating his head with old IBM clickety keyboards. But first he had to wait a few days because reddit doesn't like duplicate submissions...

Well, today he took a stab at it and last time I checked it was top of the heap.. so I guess he's right. I have my doubts as to whether the monkeys were using real, genuine, old-school IBM clickety keyboards but I must admit it's still impressive.

Just for the record, I'm also still glad I delegated Myster's PR fanfare to that neighbor-friend of mine. Perhaps he should start a blog? I'd read it and I promise not to submit it to reddit. :-)

Sunday, May 31, 2009

How to work with progress bars, part 2

On the last episode of "As Andrew Rambles About Progress Bars" (or whatever I'm calling this series) I raised the following questions:

What if you have a task for which it is completely impossible to gauge how long the task will take?

What if you have a task for which you have a rough estimate for how long it will take but you have no way to increment the progress bar because the library/process/whatever doesn't provide any feedback? For example, maybe you're running a third-party process and there's no way for you to get any feedback about what that process is doing. Now what would you do?

What if there's a bomb on a bus. Once the bus goes 50 miles an hour, the bomb is armed. If it drops below 50, it blows up. What do you do? What do you do?

I have answers to these questions. Well, all questions except the last one. But first: task estimation.

In the comments of my last article, a friend of mine pointed out that I never actually mentioned how you come up with the numbers you pass to compositeProgressBar.buildSubProgressBar(). Typically, all you need to do is run your program a few times and time how long each sub task takes. You can then figure out how long the task is spending in each one of the subtasks. You then use that to figure out what to send to compositeProgressBar.buildSubProgressBar().

For example:


sub task | time in seconds | value to send to compositeProgressBar.buildSubProgressBar()
1 | 10 | 10 / 45 = 0.222
2 | 20 | 20 / 45 = 0.444
3 | 15 | 15 / 45 = 0.333

Total time: 45 seconds


And there you go. I would suggest doing more than one timing with many sets of data.
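
The arithmetic in the table can be sketched in a few lines of Java. This is only an illustration of the division involved: SubTaskWeights and weightsFromTimings are names made up for this example and are not part of the source code that goes with this article.

```java
import java.util.Arrays;

/** Hypothetical helper (not from the article's source): turns measured timings into weights. */
final class SubTaskWeights {
    /**
     * @param secondsPerSubTask measured time of each sub task, in seconds
     * @return the fraction of the total time spent in each sub task; the
     *         fractions add up to 1, give or take floating point rounding
     */
    static double[] weightsFromTimings(double... secondsPerSubTask) {
        double total = 0;
        for (double seconds : secondsPerSubTask) {
            if (seconds < 0) {
                throw new IllegalArgumentException("negative timing: " + seconds);
            }
            total += seconds;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("need at least one positive timing");
        }
        double[] weights = new double[secondsPerSubTask.length];
        for (int i = 0; i < weights.length; i++) {
            weights[i] = secondsPerSubTask[i] / total;
        }
        return weights;
    }

    public static void main(String[] args) {
        // Timings from the table above: 10, 20 and 15 seconds.
        System.out.println(Arrays.toString(weightsFromTimings(10, 20, 15)));
    }
}
```

Each element of the result is what you would pass to compositeProgressBar.buildSubProgressBar() for the matching sub task. If you time several runs, averaging the timings per sub task before calling this is one reasonable option.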

Sometimes the ratios don't stay fixed and instead vary depending on what sort of machine you're running on or what set of data you have.

The programmers I've talked to are always concerned about giving inaccurate information in a progress bar. They've all experienced the frustration of an inaccurate progress bar. It's important to not fall into the trap of thinking that the progress bar needs to be super accurate. At the same time, you need to be careful you don't make the classic error of having a progress bar that jumps to 90% done right away and stays there forever. It can be a delicate balancing act but in my experience, programmers often underestimate the value and accuracy of the information they have.

Consider the following: as a programmer, you almost always know approximately how long the task will take to complete. Put yourself in the user's shoes. The user has absolutely no clue whatsoever how long the task will take to complete. As far as the user is concerned it can take a second, an hour, a day or a million years. If you know the task will take, at most, one minute, then you already know far more than your user. Communicate this very valuable information to them.

Before we continue remember that there is no solution to determining how long a task will take in the general case. To solve that you would need to solve the halting problem. That's not going to happen. Instead I can show you a few tricks I've learned about displaying a progress bar in circumstances where you wouldn't think it would be possible.

First question: what is the probability that the task will finish at any given time?



If the probability graph looks like this:



That's good. It's an easy candidate for progress bar because the progress bar tends to finish in one range of values of time.

If you have a task made up of subtasks, it also makes things easy if the time taken by each sub task stays in a fixed ratio. That is, if one sub task takes 1 second then you know that the next sub task will take approximately 1.5 seconds and the sub task after that will always be a little shorter. In this case, let's say half a second.

If each of the subtasks doesn't stay in a fixed ratio then that's a bummer. When estimating the time a task will take, it's good to consider whether it's likely your task is network bound, disk bound, CPU bound or other. This is important because if you have a large task that consists of, say, a network bound task and a CPU bound task then it's going to be difficult to give an accurate (at least with respect to time) progress bar. On one computer, you may have a very fast CPU with a very slow network connection and on another computer the situation may be reversed. This will cause the progress bar to zip through one section of the meta-task and then crawl through the other section.



The situation above is fairly rare. Most tasks tend to use some combination of the resources of a machine and as a result the differences in the speed of various components tend to cancel out so that the ratios between the different subtasks tend to remain constant (to some reasonable approximation).

If you have run your progress bar a few times and are not happy with its accuracy you have two choices. 1) Run a calibration loop or otherwise figure out how to compensate for these things on that particular machine or 2) live with the fact your progress bar isn't going to be super accurate.

There's something of a black art to writing calibration loops. You have to be careful of things like caches and you have to make sure that you add the right fudge factors to make the progress bar behave correctly. It's surprisingly time-consuming to do this right and most times, in my experience anyway, it's not worth the effort. If you find yourself having to do one of these then it is going to take you a while.

A more common problem is that a program might have no way of getting any feedback as to the progress of one of its subtasks. This can happen if you have to hand off a task to a library.

If you actually do know, approximately, how long that task will take to complete in real time then you still have a chance. You may know that it will take approximately 2 seconds for every item that it's processing. Or you may be able to guess at how long the task will take given the time it took to complete some previous processing step. For example, if it took 5 seconds to complete the first step, it should only take about 10 seconds to complete the second step.

In these situations I like to use a technique that I call Zeno's progress bar. It's a way of providing a progress bar when you have no idea what the real progress is. It's terrifyingly convincing.

Before I continue I would just like to say that I am serious when I say it's terrifyingly convincing. I've actually been duped by my own Zeno's progress bar at least once.

While using the product I work on (Inteleviewer) one day, I noticed that it was providing a progress bar. However, I was under the impression that this was impossible because the task was using a library that had no way of providing any feedback. I was curious to see if someone on the team had fixed the library or had found some other clever trick to get the library to provide feedback. Someone had found a trick. It was me! It was my own Zeno's progress bar! I had made this error knowing that there were these sorts of progress bars lurking in the code. It was then I knew I had to share this idea with the world.

I came up with the idea by watching how people give time estimates for tasks when they aren't being told about the progress but have some idea of how long the task should take.

An example would be an online merchant that knows that something is supposed to ship within two weeks but, since it's being shipped by the manufacturer of the product, really has no good idea of the order's real status. They will tell you it will ship in two weeks and if it doesn't then they will tell you it will ship next week and if it doesn't then they will tell you it will ship in the next few days. If it still hasn't shipped they will keep repeating that it will ship in the next few days. Only at this point do you start to realize that it may never ship at all.

If you think about it you can probably come up with other examples of this sort.

The technique works really well when your estimate is off slightly. It works even better when your estimate is dead on. This is what it looks like in a progress bar:



Let's say you have a task that could take 10 seconds. You start off with the progress bar moving at a speed that is consistent with the task taking 10 seconds. At the 5 second mark you slow the speed of the progress bar by half. So that by the 10 second mark the progress bar will only be at 75% complete. If all goes well, and your estimate of 10 seconds is correct, then the progress bar will just jump to 100% at this point. If it doesn't and your task is going to take longer then you slow the speed of the progress bar by half again. By the 15 second mark the progress bar will now be at 87.5% complete. This goes on until your task completes.
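
To make those numbers concrete, here is the curve described above written as a small stand-alone function. It's a sketch for checking the arithmetic only, not the timer-driven object; ZenoCurve and progressAt are made-up names, and the halving is hard coded.

```java
/** Hypothetical sketch of the Zeno curve as a pure function; names are invented for illustration. */
final class ZenoCurve {
    /**
     * @param elapsedMillis   time since the task started
     * @param estimatedMillis rough estimate of the task's total time (at least 2)
     * @return the progress to display, between 0 and 1, creeping toward 1
     */
    static double progressAt(long elapsedMillis, long estimatedMillis) {
        if (elapsedMillis < 0 || estimatedMillis < 2) {
            throw new IllegalArgumentException("bad times: " + elapsedMillis + ", " + estimatedMillis);
        }
        long unitMillis = estimatedMillis / 2;      // one "unit" is half the estimate
        long unit = elapsedMillis / unitMillis;     // whole units completed so far
        double fraction = (elapsedMillis % unitMillis) / (double) unitMillis;

        // After `unit` whole units the bar has covered 1 - 0.5^unit of its length.
        double done = 1 - Math.pow(0.5, unit);
        // The current unit covers half of whatever is left.
        return done + fraction * Math.pow(0.5, unit + 1);
    }

    public static void main(String[] args) {
        System.out.println(progressAt(5000, 10000));  // 0.5
        System.out.println(progressAt(10000, 10000)); // 0.75
        System.out.println(progressAt(15000, 10000)); // 0.875
    }
}
```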

Obviously, you should try to avoid giving overly optimistic estimates. Being completely off results in one of those annoying 99% complete progress bars that we all love to hate. Being off by only a factor of two, however, actually still gives a half decent effect.

Okay, so how do you code this puppy?

Well, the first step is to create an object to update the progress bar without any help from the task itself. You see, usually the task would tell the progress bar to update from its own thread. In this case it won't so we'll have to reproduce the effect using a timer. It looks like this:



import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

import javax.swing.Timer;

/**
 * Makes a progress bar that can show progress if all you know is approximately
 * how long the task will take.
 */
public class MagicProgressBar {
    private static final double BASE = 0.5;

    private final Progress progress;
    private final int estimatedTime; // in milliseconds
    private final Timer timer;

    private long startTime;

    public MagicProgressBar(Progress progressBar, int estimatedTimeOfTask) {
        this.progress = progressBar;
        this.estimatedTime = estimatedTimeOfTask;
        // Fires roughly every 30 ms.
        timer = new Timer(30, new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                //progress.advance( magic??? );
            }
        });
    }

    public void start() {
        startTime = System.currentTimeMillis();
        timer.start();
    }

    public void stop() {
        timer.stop();
    }
}


You would use it like this:


private static void doMagic(Progress masterProgress) {
    MagicProgressBar magicBar = new MagicProgressBar(masterProgress, TIME_ESTIMATE);

    magicBar.start();
    doStuff(); // doesn't provide progress
    magicBar.stop();

    masterProgress.advance(1);
}


Where "doStuff()" is the task we're waiting to complete. Note that TIME_ESTIMATE is not usually a constant. In many cases it would be calculated by using a weighting number, like we sent to compositeProgressBar.buildSubProgressBar(), and the time it's taken to do some earlier step in the task.
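
One way to read that last sentence, sketched under my own assumptions: if you have weights of the kind sent to compositeProgressBar.buildSubProgressBar() for two steps, and you've timed the earlier one, scale the measured time by the ratio of the weights. TimeEstimates and estimateFromEarlierStep are hypothetical names, not something from the article's source.

```java
/** Hypothetical helper: guesses a step's duration from an earlier, already timed step. */
final class TimeEstimates {
    /**
     * @param earlierStepMillis how long the earlier step actually took
     * @param earlierStepWeight weight of the earlier step (must be greater than 0)
     * @param thisStepWeight    weight of the step we are about to start
     * @return estimated duration of the upcoming step in milliseconds
     */
    static int estimateFromEarlierStep(long earlierStepMillis,
                                       double earlierStepWeight,
                                       double thisStepWeight) {
        if (earlierStepMillis < 0 || earlierStepWeight <= 0 || thisStepWeight < 0) {
            throw new IllegalArgumentException("times and weights must not be negative");
        }
        double estimate = earlierStepMillis * (thisStepWeight / earlierStepWeight);
        // The estimate is returned as an int because MagicProgressBar takes one.
        return (int) Math.min((long) Integer.MAX_VALUE, Math.round(estimate));
    }

    public static void main(String[] args) {
        // The first step took 5 seconds; the second step is weighted twice as heavily.
        System.out.println(estimateFromEarlierStep(5000, 0.2, 0.4));
    }
}
```

The result is what you would hand to the MagicProgressBar constructor in place of TIME_ESTIMATE.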

Well, we still need to add the implementation of the timer in the MagicProgressBar class. Let's do that now.


timer = new Timer(30, new ActionListener() {
    public void actionPerformed(ActionEvent e) {
        long timeSinceStart = System.currentTimeMillis() - startTime;
        int unitLengthInMillis = estimatedTime / 2;
        int units = (int) (timeSinceStart / unitLengthInMillis);

        double percent = (timeSinceStart % unitLengthInMillis) / (double) unitLengthInMillis;
        progress.advance(findSum(units) + percent * Math.pow(BASE, units + 1));
    }
});

private static double findSum(int n) {
    return (Math.pow(BASE, n + 1) - BASE) / (BASE - 1);
}


"findSum()" calculates how far along the progress bar should be at the start of a given time unit index. In our example, where we expect the task to take 10 seconds, unit index 0 would be the first 5 seconds, unit 1 would be the next 5 seconds, unit 2 would be the 5 seconds after that, etc. A unit's length is equal to one half of the estimated time of the task.

The rest of the code is either trying to figure out which time unit we're at or how far through the current time unit we are.
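
For the curious, findSum() is the closed form of a geometric series. Writing b for BASE (0.5 here), it works out like this:

```latex
\mathrm{findSum}(n) \;=\; \sum_{i=1}^{n} b^{i} \;=\; \frac{b^{\,n+1} - b}{b - 1},
\qquad
b = \tfrac{1}{2} \;\Rightarrow\; \mathrm{findSum}(n) = 1 - \left(\tfrac{1}{2}\right)^{n}
```

So findSum(0) = 0, findSum(1) = 0.5, findSum(2) = 0.75 and findSum(3) = 0.875, which matches the 50%, 75% and 87.5% marks in the 10 second example.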

Let's see what the object looks like when we put it all together:


import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

import javax.swing.Timer;

/**
 * Makes a progress bar that can show progress if all you know is approximately
 * how long the task will take.
 */
public class MagicProgressBar {
    private static final double BASE = 0.5;

    private final Progress progress;
    private final int estimatedTime; // in milliseconds
    private final Timer timer;

    private long startTime;

    public MagicProgressBar(Progress progressBar, int estimatedTimeOfTask) {
        this.progress = progressBar;
        this.estimatedTime = estimatedTimeOfTask;
        timer = new Timer(30, new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                long timeSinceStart = System.currentTimeMillis() - startTime;
                int unitLengthInMillis = estimatedTime / 2;
                int units = (int) (timeSinceStart / unitLengthInMillis);

                double percent = (timeSinceStart % unitLengthInMillis) / (double) unitLengthInMillis;
                progress.advance(findSum(units) + percent * Math.pow(BASE, units + 1));
            }
        });
    }

    public void start() {
        startTime = System.currentTimeMillis();
        timer.start();
    }

    public void stop() {
        timer.stop();
    }

    private static double findSum(int n) {
        return (Math.pow(BASE, n + 1) - BASE) / (BASE - 1);
    }
}


Now you have the power of Zeno's progress bar. Use it wisely, young grasshopper. Zeno's progress bar has no mind. Isn't it wiser to seek proper progress reporting than to desire the swift completion of the programming exercise? Oops, I'm channeling David Carradine again. All I'm trying to say is don't use it for the hell of it. That's all.

Okay, on to the next part which is all about indeterminate progress bars.

This is what an indeterminate progress bar looks like:



The one on the left is used on the Macintosh. The one on the right is used by Windows.

Indeterminate progress bars are used to signal that we have absolutely no idea how long a task is going to take. Both the Macintosh and Windows version of this type of progress bar use animation to signal that, yes, there really is something happening. The computer has not fallen asleep. It has not started daydreaming. It hasn't quit and joined a hippie commune where it's not asked to do difficult-to-estimate tasks all day.

If you really, really don't know how long the task is going to take then this is the progress bar to use.

The decision to use an indeterminate progress bar is a hard one. Most of the time you'll want to avoid using it. Users don't like being in the dark about how long their computer might be occupied. Going back to my online merchant example above, this would be the equivalent of an indeterminate progress bar:

Customer: How long is it going to take to ship?
Merchant: I don't know.
Customer: Is it going to take about two weeks? Or two days? Or two years?
Merchant: I don't know.
Customer: Are you saying it may take two years for me to get my package?
Merchant: It could. I don't think that's very likely though. Well, I don't know, it could take two years. Yes. Maybe.

Despite the fact that indeterminate progress bars are annoying, I can think of two common uses for them.

The first use is for the relatively short-lived progress bar that shows up because the computer is taking a little longer than it should to do something and the programmer thought that he should provide some obvious feedback that the computer hasn't crashed. These indeterminate progress bars are often short-lived. Most of the time the programmers didn't even think it would be shown at all. I haven't seen one of these types of progress bars in a while. In this case the indeterminate progress bar means "hang on a second".

Another common use for the indeterminate progress bar is at the beginning of a long task, when the computer is still calculating how long the task will take. For example, if you're running a calibration loop at the beginning of your process, you might show an indeterminate progress bar. Another example would be the OS showing an indeterminate progress bar when it's calculating the number of files involved in a file copy operation. It's also often used at the beginning of a task when trying to connect to a remote server. Connecting to a remote server is one of those things that will either be instantaneous or take a while due to the server being down or a network error.

In these examples you go from an indeterminate progress bar to a regular progress bar. This sort of transition is okay. You're essentially saying to the user "I have no idea how long this is going to take" because you're trying to figure out how long the task is going to take. So long as this step is relatively short it's ok.
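
In Swing, that transition might look something like the sketch below. JProgressBar's setIndeterminate(), setMaximum() and setValue() are real Swing methods; the CopyProgress class and its method names are invented for this example, and I'm assuming the calls are made on the event dispatch thread in a real application.

```java
import javax.swing.JProgressBar;

/** Hypothetical wrapper: an indeterminate bar that turns into a regular one. */
final class CopyProgress {
    private final JProgressBar bar = new JProgressBar();

    /** Still counting files, so we don't yet know how long the copy will take. */
    void startCounting() {
        bar.setIndeterminate(true);
    }

    /** Counting is done: switch to a regular bar that runs from 0 to fileCount. */
    void startCopying(int fileCount) {
        bar.setIndeterminate(false);
        bar.setMinimum(0);
        bar.setMaximum(fileCount);
        bar.setValue(0);
    }

    /** Call once per copied file. */
    void fileCopied() {
        bar.setValue(bar.getValue() + 1);
    }

    JProgressBar getBar() {
        return bar;
    }
}
```

The point of the design is that the bar only ever moves in one direction: unknown first, then known. Nothing here ever switches it back to indeterminate.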

Going from a regular progress bar to an indeterminate progress bar is less ok. It's like saying you know how long a task is going to take then saying you have no idea then claiming you know again.



I have seen this behavior when it's used to mean "hang on a second, I've got an unexpected delay". A task might display this when it tries to connect to a server that's taking a while to respond. I've also seen it during a compression task that would occasionally start an optimization step. If the optimization step was taking too long the progress bar would become indeterminate for a while.

Progress bars that go from determinate to indeterminate and back are very rare. It's probably not a pattern you'll ever need to use.

Well, that's all I have to say. If you want to read more about progress bars (and who doesn't.. pfff..) then I'd suggest visiting Jeff Atwood's blog post on the subject. It's a great starting off point. He also goes into how to make your task look like it ran faster without actually making the task run any faster.

You can also check out the miscellaneous human interface guidelines documents by Microsoft and Apple. Perhaps you want to expand your horizons into other kinds of progress indicators.

Oh, and here's the source code for all the examples I've given.

Until next time, bye.

Thursday, May 28, 2009

Cross Canada Bike Trip

Well, my parents have officially left for their cross-Canada bike tour. The tour is to raise money for the JDRF. They are making a blog of their trip. They will be posting as they find wi-fi spots. I wish them luck.

Monday, May 18, 2009

Canadian Unemployment Rate

Hello everyone. I thought I'd give an update to what's been happening to the unemployment rate since my last post.

The unemployment rate for this month is 8%. That's pretty high but not completely ridiculous. The last time the unemployment rate was at 8% was January 2002. The highest unemployment rate on my graph is 8.2% during April 1999.

It's nice to know that the unemployment rate is not increasing anymore. If you look at the graph you can see how the rise in the unemployment rate is unprecedented over the last ten years or so.

It's a pity I don't have the unemployment rate graph going back longer than 1999. It would be nice to see how this latest recession compares historically to others.

Monday, April 20, 2009

CRTC feedback

Recently a local Internet service provider called TekSavvy e-mailed its clients about the fact that Bell was asking the CRTC to allow it to put a cap of 60 GB per month on its wholesale DSL.

Bell sells DSL services to Internet service providers. Essentially, it sells the link between your house and Internet service provider. The Internet service provider is responsible for linking you to the rest of the Internet. Bell is already throttling everyone claiming that it's trying to control congestion on the network. I am mighty suspicious of this claim.

This is what I sent to the CRTC:


It's important to foster competition between ISPs in order to make sure that consumers can have a wide selection of billing plans / possibilities available so they can make the best decision for them. Bell should not be allowed to set monthly caps per user. The decision to do this should be up to the internet service provider. Mandating this destroys all sorts of potential pricing structures.


Pricing for wholesale DSL rate should be based on the scarcity of the resource in question. Capping bandwidth per user is an attempt to deal with bandwidth issues. If it's the number of megs per second causing the slowdown then it makes sense to bill wholesale DSL service based on that. It doesn't make sense to bill on that *and* some other thing, however.


If we are to have DSL service wholesaling then it must be considered to be part of the spec that the DSL network be data neutral (no preferential throttling speedup or slowdown based on the data flying over the network; pretend it's encrypted) and user neutral (rate doesn't change based on which account I use or how much any account has used). This configuration sends the right supply/demand signals back to the ISP.


Consider that I don't use Sympatico or any other DSL ISP because they violate network neutrality by throttling certain specific protocols. If Sympatico continued to throttle certain protocols but other DSL ISPs did not I would have an alternative ISP to go to. This despite the fact that that other, non-throttling ISP may have a bandwidth cap in place at, say 30 or 40 gigs per month. The third party ISP in this example chose a different approach to dealing with their bandwidth issue.... The argument works for many other potential plans. Currently I'm on a plan that limits maximum bandwidth speed to a relatively modest level but also provides a large download cap for a relatively high price. This sort of trade off is possible with the simple system mentioned above.


I would've kept going but there was a 2000 character limit and so I had to stop there. Here is some of the stuff that was cut out.

Consider this:


By putting all of its throttling on the *DSL* half of the link it *guarantees* that ISPs won't be able to find innovative solutions to the bandwidth crunch. Given that there are many independent ISPs and only one Bell ISP (Sympatico) it's likely that if an innovative (and counterintuitive) approach to managing the bandwidth crunch were to be developed, it would be developed by a third party DSL ISP. This would essentially mean that Sympatico would be vulnerable.



What bandwidth shortage?

How to work with progress bars, part 1



Have you ever seen a progress bar that goes from empty to full then starts over at the beginning again? I really hate those.

I had always assumed that progress bars were pretty uncontroversial. It wasn't until I worked at Intelerad for about a year that I found out that many people don't actually understand what progress bars are for, how they should work or how to write code to implement them.

When you present a progress bar to the user you're telling them:

1) That the computer is doing something that'll take a while.
2) That the computer hasn't frozen or otherwise become unresponsive.
3) Approximately how long the computer will be busy.


It's important to remember that a progress bar is not there to display the progress of some arbitrary chunk of code. If you think of a progress bar as showing the progress through some function or chunk of work, you're likely to show multiple progress bars in a row, where each filling of the progress bar represents the progress through a particular task. Don't do this.

What the user actually wants to know is how long the computer will remain busy doing its thing. Alternatively, how long it will be until the computer returns a result. When you've filled up a progress bar and restarted it, you're messing with the user's head.


(Don't mess with the user's head)


A common worry when creating code that has a progress bar is that if you show a progress bar for the entire time the computer is busy, and not for individual subtasks, then that code will not be reusable. It won't be reusable because you'll have to hard code the values you're setting the progress bar to in that function. As a result you won't be able to use that code with, say, another progress bar because the values that you need to set the progress bar to in that context will be different. 80% done for this progress bar might be only 40% done in a different context.

It's actually very easy to create little subroutines that only worry about the progress through their portion of the task and then wrap those progress meters into meta-progress meters to show the progress through any larger task. Here's how you do it.

Progress bar-fu:
(n.: the ancient Japanese art of making progress bars that don't jerk the end-user around)




Think of the progress through the overall task as being the sum of the progress through each individual task. Looking at the problem this way it should become apparent that each sub task can look at its own progress as going from 0 to 100% with this value being a smaller proportion of the progress of the overall task.

Let's look at an example. Let's say we have an overall task composed of three subtasks. The first sub task is 40% of the overall task. The second sub task is 50% of the overall task. The last task is 10% of the overall task. Drawing this out, you get a little chart like the following:



Here we can see that the first subtask's progress, as a value that goes from 0% to 100%, is equal to the overall task progress going from 0% to 40%. All we need to do to convert the sub task progress to the progress through the overall task is to multiply it by 0.4 (or 40%).

This strategy composes nicely.

Let's say that our first sub task is composed of two subtasks. The first sub sub task goes from 0 to 30%.



We can calculate the value of the subtask's progress by multiplying the sub sub task's progress by 0.3. We can then see the sub sub task's contribution to the overall task by multiplying it by 0.3 then 0.4 (or 0.4 * 0.3 = 0.12). So when the sub sub task gets to 100% we will have completed 12% of the overall task.
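
As a quick sanity check on that arithmetic, here is the multiplication on its own. NestedProgress.overall is a made-up helper, and it assumes, as in the example, that each task in question is the first one inside its parent, so no offset has to be added.

```java
/** Hypothetical helper: scales a nested task's progress up to the overall task. */
final class NestedProgress {
    /**
     * @param progress progress through the innermost task, between 0 and 1
     * @param weights  the share each nesting level takes of its parent, innermost first
     * @return the contribution to the overall task, between 0 and 1
     */
    static double overall(double progress, double... weights) {
        double result = progress;
        for (double weight : weights) {
            result *= weight;
        }
        return result;
    }

    public static void main(String[] args) {
        // Sub sub task finished: 30% of the sub task, which is 40% of the overall task.
        System.out.println(overall(1.0, 0.3, 0.4)); // about 0.12
    }
}
```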

Here's how you do this in code. I'm going to use Java because lots of people use it and understand it, and it's what I know.

First you need an interface like this:

public interface Progress {
    /**
     * @param progress number between 0 and 1 that signifies the progress through a task.
     */
    void advance(double progress);
}

You pass an object of this type to any function you want to track progress for. Like this:

/**
 * Tells a progress bar to go from 0 to 1 (complete) in steps.
 *
 * @param progress
 *            to set
 * @param steps
 *            to take. More takes longer.
 */
private static void pretendToDoSomething(Progress progress, int steps) {
    progress.advance(0);
    for (int i = 0; i < steps; i++) {
        sleepAWhile(); // stand-in for real work
        progress.advance((double) i / steps);
    }
    progress.advance(1);
}

Of course, with real code you'd be doing something useful, but you get the idea.
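
If you want something to pass in while experimenting, a throwaway implementation that draws a text bar is enough. This is a sketch of my own for illustration: ConsoleProgress, render and latest are invented names, and the Progress interface is repeated only so the example compiles on its own.

```java
/** Same shape as the Progress interface above, repeated so this sketch stands alone. */
interface Progress {
    void advance(double progress);
}

/** Hypothetical Progress that prints a text bar; not part of the article's source code. */
final class ConsoleProgress implements Progress {
    private static final int WIDTH = 20;

    private double latest = 0;

    @Override
    public void advance(double progress) {
        // Clamp to the 0..1 range so a sloppy caller can't overflow the bar.
        latest = Math.max(0, Math.min(1, progress));
        System.out.println(render(latest));
    }

    /** @return the most recent (clamped) value passed to advance() */
    double latest() {
        return latest;
    }

    /** Draws a bar of WIDTH characters followed by a whole-number percentage. */
    static String render(double progress) {
        int filled = (int) Math.round(progress * WIDTH);
        StringBuilder bar = new StringBuilder("[");
        for (int i = 0; i < WIDTH; i++) {
            bar.append(i < filled ? '#' : '.');
        }
        return bar.append("] ").append(Math.round(progress * 100)).append('%').toString();
    }
}
```

In a real application the same interface would more likely wrap whatever progress widget your toolkit provides.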

Next you'll want an object to represent the overall progress. Here's what the interface would look like:

/**
 * Allows for a progress bar made up of multiple sub progress bars.
 */
public class CompositeProgressBar {
    public CompositeProgressBar(Progress progressBar) {
        // body shown further down
    }

    /**
     * Builds a new sub progress bar. A sub progress bar is a progress bar whose
     * full length maps to the param subProgressBarSize of the
     * {@link CompositeProgressBar#masterProgress}.
     *
     * This method will also advance the previous sub progress bar to 100%.
     *
     * @param subProgressBarSize
     * @return a {@link Progress} that represents the new sub progress bar
     */
    public Progress buildSubProgressBar(final double subProgressBarSize) {
        // body shown further down
    }
}

The idea is you'd pass the Progress object representing the overall progress and you'd call buildSubProgressBar to, well, build the sub-progress Progress objects.

Here's a typical example:

/**
 * Demonstrates how the progress bar can be used recursively.
 *
 * @param progress
 *            - could be any progress - goes from 0 to 1
 * @param sayable
 *            for text output
 */
private static void moreSubTasks(Progress progress, Sayable sayable) {
    CompositeProgressBar compositeProgressBar = new CompositeProgressBar(progress);
    sayable.say("part 1");
    pretendToDoSomething(compositeProgressBar.buildSubProgressBar(0.2), 100);

    sayable.say("part 2");
    pretendToDoSomething(compositeProgressBar.buildSubProgressBar(0.2), 102);

    sayable.say("part 3");
    pretendToDoSomething(compositeProgressBar.buildSubProgressBar(0.6), 103);
}

Here's the implementation of CompositeProgressBar.buildSubProgressBar(double subProgressBarSize):

/**
* Builds a new sub progress bar. A sub progress bar is a progress bar whose
* full length maps to the param subProgressBarSize of the
* {@link CompositeProgressBar#masterProgress}.
*
* This method will also advance the previous sub progress bar to 100%.
*
* @param subProgressBarSize
* @return a {@link Progress} that represents param
*/
public Progress buildSubProgressBar(final double subProgressBarSize) {
progresSoFar += currentSubProgressBarSize;
currentSubProgressBarSize = subProgressBarSize;
return new Progress() {
public void advance(double progress) {
if (progress < 0 || progress > 1)
throw new IllegalArgumentException("\"progress\" should be "
+ "between 0 and 1 but was: " + progress);
masterProgress.advance(subProgressBarSize * progress + progresSoFar);
}
};
}
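To make the arithmetic in advance() concrete, here is a minimal sketch of just the mapping step, pulled out into a pure function. The class name SubProgressMapping and the method toMaster are hypothetical names used only for illustration; they are not part of the code in this post. It uses the same 0.2 / 0.2 / 0.6 split as the example above.

```java
public class SubProgressMapping {
    /**
     * Maps a sub-task's progress (0..1) onto the master bar (0..1).
     * Hypothetical helper mirroring the expression inside advance():
     * subProgressBarSize * progress + progresSoFar.
     */
    static double toMaster(double progressSoFar, double subSize, double subProgress) {
        return progressSoFar + subSize * subProgress;
    }

    public static void main(String[] args) {
        double[] sizes = {0.2, 0.2, 0.6}; // the three parts from the example
        double progressSoFar = 0;         // total size of the parts already finished
        for (int part = 0; part < sizes.length; part++) {
            // Sample each sub-task at its start, midpoint and end.
            for (double sub : new double[] {0.0, 0.5, 1.0}) {
                System.out.printf("part %d at %.1f -> master %.2f%n",
                        part + 1, sub, toMaster(progressSoFar, sizes[part], sub));
            }
            progressSoFar += sizes[part];
        }
    }
}
```

So halfway through part 2 the master bar sits at 0.2 + 0.2 × 0.5 = 0.3, and the end of one part lines up exactly with the start of the next.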

For context, here's what the entire object looks like:

/**
* Allows for a progress bar made up of multiple sub progress bars.
*/
public class CompositeProgressBar {
/** The master progress bar we're splitting into sub progress bars. */
private final Progress masterProgress;

/**
* Amount of progress we've gone through so far, not counting the amount in
* the latest sub progress bar
*/
private double progresSoFar = 0;

/**
* The length of the current sub progress bar.
*/
private double currentSubProgressBarSize = 0;

public CompositeProgressBar(Progress progressBar) {
this.masterProgress = progressBar;
}

/**
* Builds a new sub progress bar. A sub progress bar is a progress bar whose
* full length maps to the param subProgressBarSize of the
* {@link CompositeProgressBar#masterProgress}.
*
* This method will also advance the previous sub progress bar to 100%.
*
* @param subProgressBarSize
* @return a {@link Progress} that represents param
*/
public Progress buildSubProgressBar(final double subProgressBarSize) {
progresSoFar += currentSubProgressBarSize;
currentSubProgressBarSize = subProgressBarSize;
return new Progress() {
public void advance(double progress) {
if (progress < 0 || progress > 1)
throw new IllegalArgumentException("\"progress\" should be "
+ "between 0 and 1 but was: " + progress);
masterProgress.advance(subProgressBarSize * progress + progresSoFar);
}
};
}
}

.. and, as I've just mentioned, you use the object like this:


/**
* Demonstrates how the progress bar can be used recursively.
*
* @param progress
* - could be any progress - goes from 0 to 1
* @param sayable
* for text output
*/
private static void moreSubTasks(Progress progress, Sayable sayable) {
CompositeProgressBar compositeProgressBar = new CompositeProgressBar(progress);
sayable.say("part 1");
pretendToDoSomething(compositeProgressBar.buildSubProgressBar(0.2), 100);

sayable.say("part 2");
pretendToDoSomething(compositeProgressBar.buildSubProgressBar(0.2), 102);

sayable.say("part 3");
pretendToDoSomething(compositeProgressBar.buildSubProgressBar(0.6), 103);
}



All we have to do now is hook up this composite progress bar to some sort of GUI component. To do this we have to create a Progress object that wraps a JProgressBar instance. Here's how you do this:


private static Progress convertToProgress(final JProgressBar progressBar) {
return new Progress() {
public void advance(double progress) {
progressBar.setValue((int) (PROGRESS_MAX * progress));
progressBar.setString((int) (100 * progress) + "%");
progressBar.repaint();
}
};
}


You can then place this JProgressBar into a JFrame and send the Progress object to the CompositeProgressBar constructor and you're all set.
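Here is a rough sketch of that wiring, with several assumptions. The nested Progress interface, the value of PROGRESS_MAX, the class name ProgressDemo and the helpers toBarValue, toLabel and pretendToWork are all stand-ins for illustration, since their real definitions are not shown in this part of the post. It also swaps in one change from the snippet above: updates are posted with SwingUtilities.invokeLater, because Swing components should be touched on the event dispatch thread while the work itself runs on another thread.

```java
import javax.swing.JFrame;
import javax.swing.JProgressBar;
import javax.swing.SwingUtilities;

public class ProgressDemo {
    /** Assumed shape of the post's Progress interface: values run from 0 to 1. */
    interface Progress {
        void advance(double progress);
    }

    /** Assumed bar resolution; the post's actual PROGRESS_MAX value is not shown. */
    static final int PROGRESS_MAX = 1000;

    static int toBarValue(double progress) {
        return (int) (PROGRESS_MAX * progress);
    }

    static String toLabel(double progress) {
        return (int) (100 * progress) + "%";
    }

    /** Wraps a JProgressBar; each update is posted to the event dispatch thread. */
    static Progress convertToProgress(JProgressBar progressBar) {
        return progress -> SwingUtilities.invokeLater(() -> {
            progressBar.setValue(toBarValue(progress));
            progressBar.setString(toLabel(progress));
        });
    }

    /** Stand-in for real work, meant to run off the event dispatch thread. */
    static void pretendToWork(Progress progress) {
        for (int i = 0; i <= 100; i++) {
            try {
                Thread.sleep(30);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            progress.advance(i / 100.0);
        }
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JProgressBar bar = new JProgressBar(0, PROGRESS_MAX);
            bar.setStringPainted(true); // without this the setString text is not drawn
            JFrame frame = new JFrame("Progress demo");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(bar);
            frame.pack();
            frame.setVisible(true);

            Progress progress = convertToProgress(bar);
            // In the post's design this Progress would be handed to
            // new CompositeProgressBar(progress) instead of being used directly.
            new Thread(() -> pretendToWork(progress)).start();
        });
    }
}
```

Running the work on its own thread matters here: if the loop ran on the event dispatch thread, the window would not repaint until the task finished.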

Congratulations. You are now masters of the first level of progress bar Fu. You can write a progress bar that accurately reflects the progress of the overall task, even when the overall task is made out of little, tiny pluggable pieces of code. What is more, those little tiny pluggable pieces of code can now be reused in different contexts, with different progress bars. This is truly a great day for the user.

The remaining question is, how do you get to level 2 of progress bar Fu? Ah, that is a good question young grasshopper.

What if you have a task for which it is completely impossible to gauge how long the task will take?

What if you have a task for which you have a rough estimate of how long it will take, but you have no way to increment the progress bar because your code is blocked doing something else? For example, it may be doing some I/O in a different thread. Alternatively, maybe it's running a third-party process and there's no way for you to get any feedback about what that process is doing. Now what would you do?

For the answers to those questions you'll have to click here to go to part II


Part II

Thursday, April 9, 2009

New desktop

This is the sort of thing I was dreaming about when I was a kid waiting for the release of Apple's "Copland" OS.

http://arstechnica.com/software/news/2009/04/hands-on-bumptop-may-be-the-desktop-revamp-you-waited-for.ars

Three dimensional desktop. Awesome new spatial OS or yet another gimmick?

Tuesday, February 24, 2009

CRTC hearings on internet throttling and network neutrality

The CRTC is holding hearings on internet throttling. The union voice is running a handy submission form where you can post your messages. Normally I'd write a letter but I'm just about out of time.

Here's what I posted:

I submit that the CRTC should stop Internet Service Providers from discriminatory traffic-shaping practices.

Given the near monopoly enjoyed by the high speed internet service providers, they should not be imposing any traffic shaping rules on data flowing over their networks.

The lack of vigorous market competition amongst last mile internet service providers means that Bell has effective control over which applications can be used.

Bit torrent is not a content provider, it is a content delivery system. It allows a single entity to effectively host a huge amount of data without having to serve all that content directly to each user.

This allows for the hosting of huge amounts of data by a single individual to be done very cheaply without having to involve a third party "free" distributor like YouTube, Flickr or a download hosting service.

Bit torrent is an example of a disruptive technology that can change the balance of power between publishers and content providers. It allows anyone to host content directly for no cost at extremely high speed. It is important that this technology not be throttled.

Additionally, other technologies are waiting in the wings. Things like Voice over IP and radio / video streaming (such as internet radio and things like the BBC World Service) are both high bandwidth applications that could end up competing with services that Bell provides (cable-TV like services, video on demand, phone services). It would also be easy to fabricate reasons why these too should be throttled. This is to say nothing about any new service or capability not yet developed.

Many of the largest network providers have also branched out into providing other types of content and services on their networks. These services can be looked at as being proprietary versions of capabilities that are provided or could be provided by providers on the internet. Bell and Videotron can be seen as being in competition with services on the internet. As a result they should not be allowed to throttle or packet shape any one particular service or protocol.

If consumers could get away from Bell's throttling by leaving Bell's Sympatico ISP and joining another third party DSL ISP (like TekSavvy, for example) then it would not be a problem. The third party ISPs could decide their own strategy to deal with an overloaded network. They would almost certainly compete with one another to find innovative ways of solving their congestion issues.

Since no one has yet been successful in splitting the responsibility between the companies that run the wiring and the companies that use the wires to send and receive data over those wires, ISPs must remain neutral and should not be allowed to squelch new protocols and services as they appear on the internet.

As Canada moves deeper into recession, we must create new opportunities for innovative new companies to create new products and services, not protect incumbent network providers' attempts to control which new services and products are provided to consumers and on what terms.

The CRTC can do its part by enacting and enforcing policies that help build an open, fast, and accessible Internet in Canada.

Sunday, February 15, 2009

Blogging with an email address

Neat, I just found out that you can add a blog posting by emailing an address. This is built into Blogger.

This would mean that if you wanted to add a blog posting all you would need to do is send an email to Blogger and it would convert it into a blog posting. I wonder if images would get added too?

Sunday, February 8, 2009

Historical Canadian Unemployment Rate

There's been a lot of talk recently about Canada's unemployment rate. It's now 7.2%. I've been following this for a while and I think it's a good idea to put this jump in the unemployment rate in perspective.

The graph below was generated using the Statistics Canada data available on the Canadian economy website.

The Canadian economy website is incredibly useful. It gives you all the important economic trends including historical data. Trying to find all this stuff on the Statistics Canada website is incredibly difficult. I'm glad they put it all in one place for you.


(Click on the image to make it larger)


Notice the overall trend in the unemployment rate. I can remember the unemployment rate being something like 10%. Those were bad times. I'm very glad it's no longer that high. I'm also very glad that I happen to be in an industry where the unemployment rate is much, much less than even 6%.

It looks like the unemployment rate hasn't been this high since November 2004.

Sunday, January 4, 2009

Emotions and Ekman

Paul Ekman is featured in the book "Blink". He works on how the face produces emotions, which faces correspond to which emotions and which facial expressions are learned vs innate. Here's an interview I found with him.
http://www.youtube.com/watch?v=IA8nYZg4VnI

After that you can cleanse your palate with this:
http://www.youtube.com/watch?v=xpcUxwpOQ_A

Friday, January 2, 2009

NaturallySpeaking 10 test drive

Hello.

I suppose I should actually do a blog posting about Dragon NaturallySpeaking. I've been using Dragon NaturallySpeaking, now, for about six months. The version I'm using is version 10. I got it the moment it came out. In fact I actually bought it before Amazon Canada managed to get any stock of it.

I have to say that the speech to text engine is really quite impressive. I can easily get two hundred words per minute. I suspect a lot of the reason for this is because I speak with a fairly common northeastern American accent. A friend of mine, who has a sort of Québécois French Canadian accent, had a lot of trouble getting Dragon NaturallySpeaking to work for him. I think he had to take pronunciation courses in order to get it to work at all.

Installation went fairly smoothly. I don't really like their installer, though. It doesn't have a progress indicator on it. Well, that's not completely true. It does have a progress indicator on it but the progress indicator doesn't actually show progress. It starts at the beginning and then fills through to the halfway and then goes through to 100% and then starts over again. I didn't count how many times it did this but it was enough to be annoying. The first time the progress indicator came up I was using it to gauge approximately how long it would take to do the install. I could just hear it laughing at me when it started back at zero again. Ha ha. You thought I was done.

When you use Dragon for the first time it does sound checks and makes sure that the microphone you're using is of good enough quality. I use the included Dragon NaturallySpeaking microphone and was rather surprised to find out that it failed the quality test. This turned out to be because I had a second microphone plugged into the back panel of my computer. It was using the crappy microphone that was included with the computer instead of the Dragon microphone. Doh!

My dad was curious to try it as well. Since he has a different accent from mine I was curious to know if it would work as well. At first he tried to use Dragon on 64-bit Vista. It didn't work. Apparently it doesn't support 64-bit Vista. It won't even install. booo.

My dad then tried it on his desktop computer. While it installed fine, we couldn't get past the voice quality test. Unlike me he didn't have another microphone plugged in. I spent some time trying to figure out what was wrong. I could record just fine in a third-party audio application. The microphone sounded great! Unfortunately Dragon just wouldn't work with it. I still don't understand why.

In desperation we tried it on his other laptop computer. This time it worked flawlessly. I'm still annoyed that it didn't work on his 64-bit Vista laptop or his desktop computer. I don't understand why it didn't work on his desktop machine. It's most perplexing. All of this means that Dragon only successfully installed on half the computers I tried it on. That's kind of depressing. I would make sure you can get your money back just in case it doesn't work.

My dad reports that it works great on his older laptop computer. It recognized what he was saying almost perfectly. At least this means it works with a hybrid British and American accent (which is what my dad has) without problems.

Integration with random applications doesn't work as well as with the included DragonPad application (which looks like a modified version of WordPad). I'm actually okay with this. It's not a problem for me to dictate all my text and then copy and paste it where I need it. It would be nice if DragonPad would transparently save as you go, though. I've actually had Dragon crash for no apparent reason on my machine and I lost a bit of work.

That brings us to stability. The software seems kind of flaky, in my opinion. I have fairly high standards when it comes to quality of applications. Most of the time it's not unstable enough to become a bother. The most common error I get is that every once in a while it pops up with some kind of dialogue about how it can't run the speech analyzer or something. I have no idea what the error means. If I close the application, restart it, then run the speech analyzer manually it works just fine.

You do find yourself fiddling around with the microphone and other settings in order to get it to recognize your speech better/faster. I've been having problems with Dragon inserting the word "him" every once in a while when I'm not saying anything. You may have seen this on one of my past posts. Go ahead, laugh. I don't mind. I fixed this problem by tweaking the sound driver I was using and also changing the "speed versus accuracy" slider setting in Dragon NaturallySpeaking's preferences.

The package I got was the most basic one. It didn't have things like integration with Microsoft Word. As a result I can't tell you how well it works with Microsoft Word. It works tolerably well with most applications and text entry boxes, however. Sometimes it doesn't insert a space when it should. Some other times it doesn't capitalize a letter like you would expect. Sometimes it just bugs out in a way that's so weird I can't describe it. These problems don't exist to the same extent in DragonPad. I would expect that integration with Word would be similar.

Dragon NaturallySpeaking does tend to use a lot of RAM. It looks like it's using about 256 megs. My machine has 2 gigs of RAM. I didn't notice any problems I could trace back to lack of RAM. I'm surprised to find out that a program that uses as much RAM as it does isn't 64-bit yet. On the other hand maybe I shouldn't talk since the program I work on, InteleViewer, is even more RAM hungry but isn't 64-bit ready yet either.. at least we *run* on a 64-bit OS, though.

My processor is a 3.2 GHz Core 2 Duo monster machine. Dragon runs fast on this machine as you would expect. One thing I did notice, however, was that it wasn't threaded. If you're running on a multiprocessor machine Dragon will only use one of the processors to do its thing. That means if you're trying to figure out if your machine is fast enough, don't count the number of processors you have. It really doesn't matter. Dragon recommends a 1 GHz CPU with SSE2 support and at least 1 gig of RAM.

My dad's old laptop uses a 1 GHz laptop processor and, reportedly, works fine.

Dragon NaturallySpeaking definitely improves with training. Going through all the included training texts will improve accuracy. Dragon can also go through your old e-mails and documents to learn your writing style, which helps quite a bit too.

The biggest problem I have with Dragon NaturallySpeaking is that I often speak too quickly. Dragon seems to be limited to about two hundred words a minute. Faster than that and it tends to lose words. I tend to speak considerably faster than the average person and slowing down to only 200 words a minute is quite an exercise in self control. Dictating at this speed gave me two realizations. The first is "Wow, I speak really fast". The second is "200 words per minute is insanely fast compared to typing!". Writing these blog posts now takes a fraction of the time.

So there you have it. Apart from the odd quirk or two it actually works very well. I'm a happy camper.

By the way, this post, and the last few posts too, were dictated using Dragon.. so any mistakes you find our it's felt. :-) <= real dragon error there

Thursday, January 1, 2009

Market calculus


I am very surprised to see that Microsoft is still having trouble selling Vista. I would've expected, by this point, it would have been the uncontested operating system.

Every time that Microsoft has released an operating system there has been some resistance to it. Some, like Windows ME, were flops. Others, like Windows 98 SE and Windows XP, were eventually great successes.

What does it take to have a successful operating system? Why did some operating systems do better than others? What I want to concentrate specifically on is how a company can release a follow-up and have it fail to catch on.

Let's look at word processors for a moment. Word processors are like operating systems when it comes to whether or not their follow-ups will be successful. If I buy version 1 of a word processor I make a natural assumption that version 2 will be similar, better, more stable, have more features, and most importantly will be backwards compatible with all the files that I wrote in the previous version of the word processor. It will, in essence, be a better version of the product that I already own.

Let's say I buy version 2 of the word processor I really like. Version 2, however, does not work at all like version 1. It's, in fact, completely different. It doesn't have the bugs of the old version but then again it has brand-new ones. It has new features that the previous one doesn't but then again it's missing features that the previous one had. Finally, it doesn't read the files created by the old version. In a situation like this, I would argue that this is in fact a brand new product.

What is the difference between a version 2 of an existing product and a competitor (completely different product)?

Is it the name? Is it that I know I'm getting something with at least as many features as the previous version? Is it that I know it will work with my old files? Is it that I know it will work in a way that I like and am used to?

If any one of these things is missing will it be a true version 2? How many of these things does it need to miss before calling it version 2 becomes a lie?

Instead of looking at version 1 and version 2 of the product why don't we just think of each version as a completely new product. By looking at it this way, version 2 of a product is actually in competition with version 1.

Version 2 has a lot going for it in its competition with the existing version of itself. For one thing it's called version 2. This naturally implies that it is like the previous version in every way that matters only better. While we could be a bit pedantic and assert that this is not necessarily the case, this is exactly the impression we give when we name a new version of a product version 2.

What does version 2 have to its advantage when in competition with other word processors in the marketplace? Well, if you didn't like version 1 because it was missing a certain feature then why not try version 2? It might have the things you need.

When someone is upgrading from version 1 to version 2 the things that they are interested in are

1 -- I assume it still works the way I like.
2 -- Have they finally fixed that bug that annoys me?
3 -- I assume it is compatible with everything else that I'm running.
4 -- I assume it has all the features I need.
5 -- Does it have any new, useful features that I want or can make use of?

When someone is looking to switch to a competitor the things that they are interested in are

1 -- Does it do everything I need it to? Does it have all the features I want?
2 -- Will it allow me to use my existing files?
3 -- Is it robust enough to be usable?

(Note: someone buying their first version of the word processor does something very similar to the person looking to switch from a competitor. The main difference is the competitor is something that is not a word processor. Another possibility is that this is the first time they have done anything that would require a word processor. In this case what they can do is ask people's advice and ask only question one.)

The barrier to go from version 1 to version 2 of an existing product is almost always lower than to switch to a competitor. The reason for this is because upgrading an existing version tends to be a drop-in replacement. So long as your version 2 is a drop-in replacement of version 1 and doesn't do anything stupid like take away existing features or make things that used to work just fine in the previous version now not work due to bugs, people will always upgrade.

If the barrier from version 1 to version 2 is high then the math used by consumers to decide whether or not to upgrade becomes progressively more similar to the math they use in order to decide whether or not to switch to a competing product. If version 1 and version 2 are different in ways that matter to the consumer then the decision to upgrade or not becomes indiscernible from a decision to switch to a competitor.

The opposite is also true. If a competitor comes out with a product that is a drop-in replacement for an existing product but with more features, fewer bugs, full compatibility and a streamlined workflow, then the consumer is more likely to see the competing product in terms of an upgrade to their existing product. If you have such a product, all you need to do is wait for your competitor to annoy their user base enough and you are practically guaranteed that your market share will increase (assuming you are known as a competitor).

Over the lifetime of a product the calculus of whether to invest developer time in a new feature or in the refinement of an existing feature slowly changes. When a market is new the overwhelming drive should be to add new features and to try to grab as much market share as possible by adding the features required by each segment of the marketplace. As the product matures it will have enough features for every segment of the marketplace. When that happens the competition becomes more about refinement of those features and less about the features themselves.

This switch from competing on features to competing on the refinement of those features can occur very rapidly. As soon as a segment of the market (a niche, if you will) is satisfied with a given feature set they will tend to start to switch their purchasing decisions to be based more on refinements over features. It's not always obvious what the niches are because niches are dependent on feature sets rather than any other kind of categorization.

I think the way to play this game is to map out the niches available to a product in terms of feature sets used (structured in terms of workflows used). Using this map you can then get a handle on which niches are the easiest to target in terms of how easy it is to implement the new features required to have a product for that niche. You then compare this to how many users (and how much money) you can make in that niche.

Once the market is saturated you need to shift to targeting your competitors by making the transition to and from your competitors as painless as possible.

Notice I said to and from your competitors and not simply from your competitors. The reason I say this is because if it's very easy to switch to and from your competitors then there is no risk for your clients in switching to you. If it is difficult to go back, then you need to be significantly better than your competitors before your clients will consider going with you, because switching is far riskier for them. If you're competing in a marketplace where you are far, far superior then you don't need to worry about clients going back to your competitors. If you're working in a marketplace where the differentiation between your competitors and yourself is small then you need to worry about the fact that your client is taking a risk. If your product were drastically better they would go with you, but since it's not, the fear of switching will win out over the potential benefits.

(I could go on about this topic for a long time. If you're interested in this sort of thing I would recommend reading "Crossing the Chasm" and "Inside the Tornado" by Geoffrey A. Moore.)

How does all this play out with Vista and XP? Well, let me ask you this: what is the new feature in Vista that will make you upgrade from XP? Let me also ask you this: what's the downside to upgrading to Vista? Will your drivers work? Will your programs all work? Will your hardware, scanners, printers etc... all work? Is Vista more robust than XP? Is Vista faster than XP? Does Vista work like XP? (I actually know the answer to that question! It's no. Sure, it's similar but the answer is still no.) Are there any incompatibilities between Vista and XP? Like CD-ROM formats? Like network problems?

And there you have it. People aren't buying Vista because it creates problems and it doesn't offer anything compelling enough to justify fighting through the hassle.

I think Microsoft is too concerned with coming out with something different. If they just came out with something slightly better than XP they'd be able to sell it. If they reduced the memory usage, increased the speed, and improved the experience of setting up a network (!!!) then they'd probably have enough to endear themselves to the consumer space. If you do that, though, you can't break backwards compatibility. The thing is, I don't think Microsoft has the ability to come up with something slightly more compelling than XP. The company is undoubtedly geared towards large features and sculpting its software based on user feedback. In order to go beyond that they would need to pull user feedback and go do some real usability testing. Anyway, maybe I'll elaborate on that later.

How will this Vista upgrade disaster interact with the 2 gig / 64 bit barrier? Will the need to address more memory be Vista's killer feature? yuck.



(Funnily enough, years ago when Apple transitioned from Mac OS 9 to Mac OS X I switched to Windows. I've never regretted that decision.)

(Yay, no wikipedia links in this one!)