Resilience has become a hot topic for CGNET’s customers. Resilient IT services – ones that continue to run 24/7 to serve users in every time zone – have become critical for IT departments of every kind. CGNET recently had the opportunity to work with a large international organization, helping it enhance the resilience of its services. In a series of three blog posts, we will describe a few things we learned about IT planning for disaster recovery and availability.
Our customer is a large non-governmental organization with field offices in dozens of countries around the world. Its headquarters is in Europe, and in the last few years it has been consolidating its IT services at headquarters. In the past, when Internet services were slow and expensive in many countries, the organization kept servers in its field offices and hired staff in each location to maintain them. This caused problems: mobile users had trouble accessing those servers, and local staff often could not maintain them. So, as communications improved, the organization decided to centralize IT services at headquarters, both to save money and to provide a higher level of service.
Causes of Downtime
To make this transition, the organization needed to ensure that IT services such as email, intranets and databases remained available online with as little downtime as possible. A service can become unavailable for many reasons: an equipment failure, a communications failure, or a regularly scheduled maintenance task such as a data backup or software upgrade. Because services were moving from the field to headquarters, they had to keep running despite these potential problems, so making them resilient to every likely source of disruption became a high priority.
New Opportunities as Costs Decline
Fortunately, the technology to support resilience has become much more affordable in the last few years. At the core of IT resilience is redundancy: having multiple components or systems available so that when one component is out of service, others can pick up the load. Redundancy is cheaper today because the cost of equipment and communications has gone down while the technology for keeping systems running has improved. Especially important has been the arrival of system virtualization software, such as VMware's. Thanks to virtualization, running a second copy of a server as a virtual machine can be quite inexpensive. Running software in virtual machines also makes it easier to duplicate that software on multiple physical servers or even in multiple data centers.
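To make the value of redundancy concrete, here is a minimal sketch of the standard availability arithmetic for identical copies running in parallel. It assumes, purely for illustration, that each copy fails independently of the others and that the service stays up as long as at least one copy is running. Real failures are often correlated (a shared power feed or network link can take down every copy at once), so treat the result as an optimistic upper bound. The function name and the 99% figure are hypothetical, not taken from the customer's environment.

```python
def combined_availability(component_availability: float, copies: int) -> float:
    """Availability of a service that stays up while at least one of
    `copies` identical components is up.

    Assumption (illustrative only): each copy fails independently.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The service is down only when every copy is down at the same time,
    # which under the independence assumption is the product of the
    # individual probabilities of being down.
    prob_all_down = (1.0 - component_availability) ** copies
    return 1.0 - prob_all_down


if __name__ == "__main__":
    # Hypothetical component that is available 99% of the time.
    for n in (1, 2, 3):
        print(f"{n} copies: {combined_availability(0.99, n):.6f}")
```

Under these assumptions, adding a second 99%-available copy raises availability to about 99.99%, since the service is down only when both copies fail together (0.01 × 0.01 = 0.0001). This is why an inexpensive second virtual machine can pay off so well.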
In the next two blog posts we’ll provide some details of how this organization applied redundancy and virtualization to achieve resilience. First we will look at what they did inside their data center, and then at what they did to make the data center itself more resilient.