It's an unfortunate truth, but downtime does happen. This is rarely down to complacent IT staff; more often it occurs because networks are not optimised for peak performance (more on this in a second).
Downtime means different things to different people. For a customer, it is a signal to take their business elsewhere; for a business user, it's a frustrating but forgivable offence. But to a business owner, and this includes IT departments, the impact of downtime means breaking an SLA and potentially losing millions in revenue.
Downtime can cost the average modern-day company in excess of $500,000 per hour, and the productivity impact is estimated at more than $46 million per year for a Fortune 500 company, according to business information experts Dun & Bradstreet. That's a lot of money, and it makes downtime something no company can afford. As reliance on IT systems increases, so does the risk that downtime poses.
Another study in the US put the average cost of an unplanned data centre outage at just under $8,000 per minute, a 41% increase on the $5,600 recorded in 2010. We have no doubt that the number has risen in subsequent years and will continue to climb.
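To put those per-minute figures in perspective, here is a minimal Python sketch of the arithmetic. The function names are our own, and the only inputs are the $5,600 baseline and the 41% increase quoted above; it is an illustration, not a reproduction of the study's method.

```python
def cost_after_increase(base_cost: float, pct_increase: float) -> float:
    """Apply a percentage increase to a baseline cost."""
    return base_cost * (1 + pct_increase / 100)


def cost_per_hour(cost_per_minute: float) -> float:
    """Scale a per-minute outage cost up to a full hour."""
    return cost_per_minute * 60


# Figures taken from the text: $5,600 per minute in 2010, up 41%.
per_minute = cost_after_increase(5600, 41)
per_hour = cost_per_hour(per_minute)

print(f"Per minute: ${per_minute:,.0f}")  # just under $8,000
print(f"Per hour:   ${per_hour:,.0f}")
```

On these numbers an hour-long outage comes to roughly $474,000, which is in the same region as the $500,000 per hour figure quoted earlier.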
So we should not be asking whether downtime is an option (it clearly is not). The real question is: why does downtime still occur? There are several possible causes, but a major contributing factor is network overload. IT complexity has exploded in recent times: more devices are in use, they consume more data, and businesses are adopting an increasing range of diverse and disparate systems that can be difficult to monitor effectively.
With the exception of malicious security incidents, such as a DoS attack, downtime usually occurs because of bottlenecks in the IT architecture as systems run over capacity. In other words, a company's IT landscape must prioritise bandwidth for business-critical applications and remove bottlenecks, so that slowdowns do not turn into downtime.
This can be achieved with the right monitoring tool. For example, the ELK stack (the trio of Elasticsearch, Logstash and Kibana 4) can identify bottlenecks and pinpoint other such problems quickly. Such proactive tools do not just prevent crippling business impacts; they also help IT departments allocate resources effectively to mitigate issues before users are affected, and rapidly find and fix any problems that do occur. Plus, of course, there are huge potential savings to be made from preventing downtime.
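To make the idea of spotting a bottleneck before it becomes downtime a little more concrete, here is a minimal Python sketch. It is not how the ELK stack works internally; the `LinkSample` type, the sample figures and the 80% utilisation threshold are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    """One utilisation reading for a network link (hypothetical structure)."""
    name: str
    capacity_mbps: float
    used_mbps: float

    @property
    def utilisation(self) -> float:
        return self.used_mbps / self.capacity_mbps


def find_bottlenecks(samples, threshold=0.8):
    """Return the names of links running at or above the given utilisation.

    The 0.8 default is an assumed warning level, not an industry standard.
    """
    return [s.name for s in samples if s.utilisation >= threshold]


# Made-up readings: one link close to capacity, two with headroom.
readings = [
    LinkSample("core-switch-uplink", capacity_mbps=1000, used_mbps=930),
    LinkSample("office-wifi", capacity_mbps=300, used_mbps=120),
    LinkSample("backup-vlan", capacity_mbps=1000, used_mbps=400),
]

print(find_bottlenecks(readings))
```

In practice a monitoring tool would collect readings like these continuously and alert on them, giving IT staff time to reallocate bandwidth before users notice a slowdown.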