If anyone had any doubts about the importance of technology to the modern business, they need only ask one of Virgin Blue’s staff or customers about the last three days of disruption.
“An external supplier’s hardware failure” is the reason given for the problems, and it shows why we all need to be conscious of the key “choke points” in our business processes, where a disruption will quickly slow operations to a crawl or stop them altogether.
For any organisation, risk arises when those choke points rely on a single thing, whether a person, a computer or a physical widget, to keep the system running. Should that one item fail, the organisation stops. In Virgin’s case, that thing appears to have been a router or server controlling its booking systems.
A single point of failure is the Achilles heel of any organisation. Any one item that can disrupt operations has to be identified and contingencies developed, so that when a failure happens (and it will) the organisation can quickly work around it.
In Virgin’s case, it appears the airline was prepared for a disruption of up to three hours, but when the booking system outage dragged on for 21 hours its fallback procedures were simply overwhelmed.
We often think of these failures as technical, but often it’s something more mundane: a burst water main blocking access to your shop, or the only person with the keys to the fuse box happening to be driving along the Gunbarrel Highway for the next six weeks.
In fact, those human points of failure, where only one person in the organisation knows the combination to the safe, the bank account PIN or the password to the company’s servers, are probably the riskiest of all.
Another common point of failure is relying on supplier contracts and service level agreements. Warranties and indemnities are nice to have, assuming they are enforceable when you need them, but they won’t fix the damage to a company’s reputation when a crisis on Virgin’s scale hits.
Even if you have a guaranteed response time, as it appears Virgin had, you need something in place to keep the business running in the meantime. Remember, too, that “response time” is how long it takes your supplier to start working on the problem, not how long it takes to fix it.
Regardless of how well we plan and how watertight our supplier contracts and SLAs are, crises happen, and that’s when the quality of a business and its management is tested. One sure indicator of a poorly run, bureaucratic organisation is managers hiding at the first sign of trouble.
On that measure, Virgin comes out well. I reluctantly had to call the airline yesterday about a problem and ended up with a good customer experience.
The very helpful Ruby not only called me back when the line dropped out but also revealed that she was a PA, not a regular call centre worker, and that all the office staff, including managers, were manning the phones.
Ruby turned out to be a real gem, not only fixing my problem quickly but also waiving the additional charges without prompting.
That, at least, is an encouraging sign about the organisation, and I hope Ruby and her colleagues get a thank-you from the man with the beard once the problems settle down.
Virgin’s problems, though, show that as business owners and managers we need to understand where the points of failure are in our organisations and how we would deal with them should bad luck strike. It may be worth walking around your business and sitting down with your staff to work through where those points of failure lie.
If you want to read about a really scary point of failure, the story of the worm infecting Iran’s nuclear program is a good place to start. The writers of the Stuxnet worm have done a spectacular job of combining a number of security weaknesses and then exploiting two weak links, unpatched Windows servers and poorly designed programmable logic controller software, to create a mighty mess in the target organisation.
The scary thing about a rootkit like Stuxnet is that once it has got into the system, you can never be sure you’ve properly got rid of it. What’s worse, you may never know what changes it has made to the industrial controllers essential to your plant’s operations and safety.
So thank your lucky stars you aren’t running an Iranian nuclear facility or an Australian domestic airline today; at least you have time to think about where your points of failure might be.
Paul Wallbank will be teaming up with fellow SmartCompany blogger Lara Solomon and Business Angels’ Michelle Gamble to present Small Business Internet Marketing Secrets in Sydney on September 28. Spaces are filling fast, so book quickly for the early bird price.