Thursday, December 21, 2006

too much reliability?

I've started tuning in to David Brin's blog. I've come across his views from time to time. Sometimes I agree with them, often I don't, but they almost always get me thinking. Case in point: in this entry he discusses the failure of the cell network in New Orleans and advocates an emergency packet relay mode built into cell phones:
"How about a simple back-up mode for text messaging? One that could use packet-switching to bypass the cell towers when they are down, and pass messages from phone to phone -- or peer-to-peer -- at least among phones that are of the same type?"
It got me thinking. Maybe the problem with the cell networks, and the power grid for that matter, is not that they're not reliable enough. Maybe the problem is that they're too reliable. You see, one of the big problems with a back-up system is that, after you get it in place, you still have to test it to make sure it'll work. You might be religious about backing up the files on your computer, for instance, but how many times have you tried to restore them?

It reminds me of the pattern with preventing forest fires. For years, fire suppression was the goal. That caused dead wood to accumulate. Now, when fires do break out, they're a lot nastier because there's all this dead wood. If we'd had small fires more often, the dead wood never would have had the chance to accumulate.

It also reminds me of Y2K planning. Someone was surveying Y2K preparations in a developing country. They asked one person where he got his food. Answer: I go next door and slaughter a goat. Question: what would you do if the power went out and your refrigerator stopped working? Answer: I wouldn't slaughter the goat. In areas where service is less reliable, people not only create their own back-up systems, but they also know how to use them, because the unreliable service forces them to "test" the back-ups fairly often.

Obviously, even if decreasing reliability makes the system more rugged, there's going to be a trade-off. Electricity and communication are Good Things. They make the economy work. Make them too unreliable and the economy starts to deteriorate. Perhaps there's a nice mathematical model that could estimate how much reliability is optimal. But in any case, I wonder if designing for "five 9's" reliability (99.999% available, or roughly five minutes of outage per year) might actually be counterproductive in the long run.
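
For what it's worth, the "five 9's" arithmetic is easy to check. Here is a minimal Python sketch that converts a number of nines into allowed outage per year. The function name and the 365-day year are my own assumptions for illustration, not part of any standard definition.

```python
# Minutes in a non-leap year (assumption: 365 days, no leap-year adjustment).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed outage per year for an availability with `nines` nines.

    For example, nines=5 means 99.999% availability, so the
    unavailable fraction is 10**-5.
    """
    unavailable_fraction = 10 ** -nines
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for n in range(1, 6):
        availability_pct = 100 * (1 - 10 ** -n)
        minutes = downtime_minutes_per_year(n)
        print(f"{availability_pct:.3f}% available: {minutes:,.1f} minutes of outage per year")
```

At five nines this works out to about 5.3 minutes a year. Each nine you give up multiplies the outage budget by ten, so three nines (99.9%) already allows close to nine hours.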
