You often hear vendors say that their system has five 9s of uptime, but what exactly is uptime?
Some vendors define uptime only for their own piece of technology. For example, a storage array with five 9s of uptime can tolerate only 5 minutes and 15 seconds of downtime per year. But if your network vendor also quotes five 9s, and so do your power company, your data center, your internet provider and a whole lot of other components... do the math! Consider the chain between your data and you:
- storage array
- SAN
- host (with virtual machines, for example)
- data center network
- data center power feed / power components
- internet / VPN / any connection to your office
- office network
- office power feed / power components
- your desktop pc / laptop
In my example there are 9 consecutive failure domains. Suppose each “domain” has five 9s of guaranteed uptime; the uptime the end user actually sees is then:
0.99999^9 (to the power of 9) ≈ 0.99991, which works out to about 47 minutes of downtime per year
But let’s face it: not all components actually have five 9s uptime.
How much downtime per year does five 9s availability allow?
This list shows the yearly downtime for a given number of nines:
- 99.999% = 5 m 15 sec
- 99.99% = 52 m 36 sec
- 99.9% = 8 h 46 m
- 99% = 87 h 40 m (I don’t know of any component that has this much downtime per year)
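The figures in this list can be reproduced with a few lines of Python. This is only a sketch: the function names are my own, and it assumes an average year of 365.25 days, which is what the numbers above appear to be based on. Rounding to whole seconds may differ from the list by a second.

```python
# Assumption: an average year of 365.25 days (leap years included).
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def downtime_seconds(availability_percent: float) -> float:
    """Allowed downtime per year, in seconds, for an availability given in percent."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


def format_duration(seconds: float) -> str:
    """Render a number of seconds as hours, minutes and seconds (rounded)."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours} h {minutes} m {secs} s"


for pct in (99.999, 99.99, 99.9, 99.0):
    print(f"{pct}% -> {format_duration(downtime_seconds(pct))}")
```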
Let’s make a few assumptions:
- storage array: 99.999%
- SAN: 99.999%
- host (with virtual machines, for example): 99.99%
- data center network: 99.99%
- data center power feed / power components: 99.999%
- internet / VPN / any connection to your office: 99.9%
- office network: 99.95%
- office power feed / power components: 99.95%
- your desktop pc / laptop: 99.5%
The uptime for this specific example is:
99.999% x 99.999% x 99.99% x 99.99% x 99.999% x 99.9% x 99.95% x 99.95% x 99.5%
This comes to roughly 99.28% uptime in total, which means about 63 hours of downtime per year.
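To check the multiplication yourself, here is a minimal sketch. It assumes that the components fail independently and that the end user needs all of them up at the same time, which is the model used in this post. The dictionary keys and the function name are just labels I made up.

```python
import math

# Assumption: an average year of 365.25 days.
HOURS_PER_YEAR = 365.25 * 24

# The assumed availability of each component, in percent.
components = {
    "storage array": 99.999,
    "SAN": 99.999,
    "host": 99.99,
    "data center network": 99.99,
    "data center power": 99.999,
    "internet / VPN": 99.9,
    "office network": 99.95,
    "office power": 99.95,
    "desktop": 99.5,
}


def chain_availability(percentages) -> float:
    """Availability (as a fraction) of a chain in which every component must be up."""
    return math.prod(p / 100 for p in percentages)


total = chain_availability(components.values())
downtime_hours = (1 - total) * HOURS_PER_YEAR

print(f"end-to-end availability: {total:.2%}")
print(f"expected downtime: {downtime_hours:.1f} hours per year")
```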
As you can see, the weakest link weighs heavily on the amount of downtime per year. Improving the components that already have five 9s (storage, SAN and DC power) barely helps: even if all three never failed at all, the total would drop by only about 16 minutes per year and stay at roughly 63 hours.
But if the uptime of the desktop is improved from 99.5% to 99.9%, the downtime goes down from 63 to 28 hours! Wait, WHAT?
So improving the weakest link has a much larger impact on uptime than hardening the toughest product?
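A quick what-if comparison makes the difference visible. Again, this is a sketch under the same assumptions as the rest of this post (independent failures, every component required, a 365.25-day year). The scenario names are hypothetical, and the first scenario is deliberately generous: it pretends the three five-nines components never fail at all.

```python
import math

# Assumption: an average year of 365.25 days.
HOURS_PER_YEAR = 365.25 * 24

# Baseline availabilities in percent, taken from the assumptions above.
baseline = {
    "storage array": 99.999,
    "SAN": 99.999,
    "host": 99.99,
    "data center network": 99.99,
    "data center power": 99.999,
    "internet / VPN": 99.9,
    "office network": 99.95,
    "office power": 99.95,
    "desktop": 99.5,
}


def downtime_hours(components: dict) -> float:
    """Yearly downtime in hours for a chain in which every component must be up."""
    availability = math.prod(p / 100 for p in components.values())
    return (1 - availability) * HOURS_PER_YEAR


# Scenario A: the three five-nines components become perfect (100%).
perfect_top = {
    **baseline,
    "storage array": 100.0,
    "SAN": 100.0,
    "data center power": 100.0,
}

# Scenario B: only the weakest link improves, from 99.5% to 99.9%.
better_desktop = {**baseline, "desktop": 99.9}

for label, scenario in [
    ("baseline", baseline),
    ("perfect storage, SAN and DC power", perfect_top),
    ("desktop at 99.9%", better_desktop),
]:
    print(f"{label}: {downtime_hours(scenario):.1f} hours of downtime per year")
```

Scenario A saves only minutes per year, while scenario B cuts the total by more than half.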
In my opinion, improving a single component from five to six or even seven nines of uptime has little to no effect when the whole “network” of components still has serious flaws. It’s better to invest in products that lift the weakest components from two to three nines of uptime (or similar).