EVOLUTION OF MODERN DATACENTER
Business drivers and technological constraints that are shaping the modern datacenter
The datacenter has evolved from mainframes to the current cloud model, driven by various business drivers and technological constraints. Certain attributes are essential; without them, a modern datacenter is of little use to a business organization. Availability: According to NARA, 93% of businesses that lost availability in their datacenter for 10 days or more filed for bankruptcy within one year. When financial data provider Bloomberg went dark one morning in early April 2015, it interrupted the sale of three billion pounds of treasury bills by the United Kingdom's Debt Management Office. That data center outage was caused by a combination of hardware and software failures in the network, which led to disconnections lasting one to two hours for most customers. These instances show how serious the problem of data center downtime (unavailability) is: downtime can cost a lot of money and significantly affect how customers perceive a company.
Reliability is the ability of a system to perform its required functions under stated conditions for a specified period of time, whereas availability is the proportion of time a system is in a functioning condition. Availability is often expressed mathematically as 100% minus unavailability.
The goal for many companies is 99.9999% availability, but each added nine can increase costs greatly. Moving from one level to the next can mean anything from redundant servers to redundant storage frames or even duplicate datacenters. This availability journey can cost thousands or even millions of dollars to reach the 99.9999% uptime level, so the decision to pursue it should not be an IT decision but a business decision.
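The cost of each added nine is easier to appreciate when translated into a yearly downtime budget. The short sketch below (an illustration, not from any vendor tool) computes the minutes of downtime per year implied by a given number of nines:

```python
# Illustrative calculation: yearly downtime budget implied by "N nines"
# of availability. E.g. 4 nines = 99.99% available.
def downtime_minutes_per_year(nines):
    availability = 1 - 10 ** (-nines)       # e.g. 0.9999 for 4 nines
    unavailability = 1 - availability
    return unavailability * 365 * 24 * 60   # minutes in a (non-leap) year

for n in range(3, 7):
    print(n, "nines:", round(downtime_minutes_per_year(n), 2), "min/year")
# 3 nines allow ~525.6 minutes/year; 6 nines allow only ~0.53 minutes/year.
```

Going from three nines to six shrinks the allowable downtime from roughly 8.8 hours a year to about half a minute, which is why each step up the ladder costs so much more.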
AVAILABILITY IN THE MODERN DATACENTER
So what does it take to maintain high levels of uptime in the datacenter? Is buying highly available infrastructure enough? It does not appear so, and here is why. Reputable studies have shown that 75% of downtime is the result of some sort of human error, with the rest due to equipment or software failure. Even well-trained IT people make mistakes when they are in a rush, are tired, aren't really thinking, or take a shortcut. With ever-growing data center complexity, it is impossible to prevent every human error or equipment failure that could lead to an outage. The question before us is: are our datacenters really as resilient as desired, or are occasional outages just a fact of life?
The fundamental problem stems from applications' dependence on 100% infrastructure availability. Imagine if application designers could relax their infrastructure availability requirements and instead design with the idea that outages are normal in the data center. Embracing failure gives us true application resiliency, because failure protection is no longer an infrastructure problem alone. This shift in thinking led Google to produce the Google File System back in 2003, a distributed file system for their data centers designed with system failures in mind.
Relaxing the availability requirements placed on infrastructure means one no longer needs high-end, costly systems. Continued innovation in CPUs (multicore) has made them more powerful, terabyte-capacity disks are now common (lower $/GB), and networks have become much faster (10/40 GbE). Today, commodity servers pack mainframe-class power into a 1U or 2U form factor at a fraction of a mainframe's cost. These modular commodity servers, equipped with redundant components for availability and backed by a better supply chain, are well suited to scaling data center capacity on demand.
In summary, the big shift in the data center is in how availability is viewed from the application's point of view. Today's applications are distributed, designed with failure in mind, and can scale to 1,000+ nodes on commodity servers. Netflix and its Chaos Monkey engineering group make this apparent: after Netflix faced a massive reboot of its application instances in the cloud, the group began repeatedly and regularly exercising failures of their distributed application, continually testing and correcting issues before they could create widespread outages. Netflix has thereby created a service designed with failure in mind to ensure availability at lower cost.
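The idea behind this kind of failure exercising can be sketched in a few lines. The snippet below is only an illustration of the principle, not Netflix's actual tooling: the real Chaos Monkey terminates cloud instances in production instance groups, while here a hypothetical cluster is just a list of instance names.

```python
import random

# Minimal chaos-testing sketch (illustrative; not the real Chaos Monkey).
# Randomly terminate one instance; a failure-tolerant service should keep
# serving requests from the surviving instances.
def kill_random_instance(instances, rng=random):
    if not instances:
        return None
    victim = rng.choice(instances)
    instances.remove(victim)  # simulate the instance disappearing
    return victim

cluster = ["web-1", "web-2", "web-3"]
killed = kill_random_instance(cluster)
print("killed:", killed, "survivors:", cluster)
```

Run regularly against a staging or production environment, such deliberate failures surface hidden single points of failure long before a real outage does.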
Agility: A simple way to measure the agility of an organization is to assess how fast it can respond to changing business circumstances. For the data center, it means how fast a new application deployment request can be fulfilled, whether by buying, building, or repurposing existing IT infrastructure. For example, by adopting agile IT infrastructure, PayPal was able to execute product cycles 7 times faster than a year earlier, when provisioning new servers took 100 tickets and 3 weeks.
Traditionally, IT managers are tasked with planning capacity requirements ahead of time to avoid unplanned downtime, procurement delays, etc. Planning is done to avoid these overheads so that IT staff can concentrate on developing new applications that bring new business to the organization. Capacity planning usually has the following steps:
- Determine the SLAs required for the business
- Analyze how the current infrastructure is meeting those SLAs
- Forward-project future capacity requirements through modeling
There is always a risk of underestimating future requirements, so the modeling includes headroom capacity for unplanned demand. In reality, the allocated capacity is usually higher than what is actually required, wasting capacity and money on unused IT. In a nutshell, such capacity planning usually ends up spending more dollars than necessary, and in the event of business changes it is a huge task for IT to repurpose the existing infrastructure, which can sometimes also lead to undersupply.
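A simple model makes the overprovisioning effect concrete. The sketch below is a hypothetical projection (the function name, growth model, and 25% headroom figure are assumptions for illustration, not from any planning standard): capacity is forward-projected at a compound growth rate and then padded with headroom for unplanned demand.

```python
# Hypothetical capacity-projection sketch. Compound growth plus a fixed
# headroom fraction; real planning tools use richer demand models.
def project_capacity(current_tb, annual_growth_rate, years, headroom=0.25):
    projected = current_tb * (1 + annual_growth_rate) ** years
    return projected * (1 + headroom)  # pad for unplanned requirements

# 100 TB today, 30% annual growth, planned 3 years out with 25% headroom:
need = project_capacity(100, 0.30, 3)
print(round(need, 1))  # → 274.6 TB purchased up front
```

If actual growth comes in below the projection, most of that padded capacity sits idle for years, which is exactly the waste the 'pay as you grow' model avoids.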
AGILITY IN THE MODERN DATACENTER
The advent of distributed/decentralized systems, and their ability to scale in small increments to thousands of nodes using commodity hardware on demand, has made capacity planning a thing of the past. Distributed systems provide the ability to start small and then grow at the pace of organizational growth, leading to 'pay as you grow' services. This is the basis on which Software as a Service (SaaS) cloud computing is offered. The distributed architecture enables these systems to grow quickly, shrink, or be repurposed for something else in just a few clicks.
Today, many distributed applications (e.g. Hadoop, Spark, MongoDB, Cassandra, etc.) are churning through big data to produce actionable business value for organizations. The need of the hour is a data center that can scale to these application demands seamlessly. Apache Mesos is one such framework; it fixes the static partitioning problem in distributed applications via an API for dynamic sharing of resources.
In summary, going forward, distributed applications and commodity hardware will dominate the datacenter, providing the much-needed agility for organizations to respond quickly to changing business requirements, all in just a few clicks.
Efficiency: The first step in measuring the efficiency of a modern datacenter is to measure Power Usage Effectiveness (PUE), defined as Total Facility Energy / IT Equipment Energy.
It is a measure of how effectively power and cooling are delivered to the IT equipment. According to the Uptime Institute, the typical data center has an average PUE of 2.5. This means that for every 2.5 watts coming in at the utility meter, only one watt is delivered to the IT load. Uptime estimates most facilities could achieve a PUE of 1.6 by using the most efficient equipment and best practices. Ideally, one would like to eliminate the overhead entirely and bring the datacenter's PUE down to 1.0.
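The PUE figures above follow directly from the definition. A minimal sketch (the wattage figures are the illustrative numbers from the text, not measurements):

```python
# PUE = total facility energy / IT equipment energy; 1.0 is the ideal,
# meaning every watt drawn at the meter reaches the IT load.
def pue(total_facility_watts, it_equipment_watts):
    return total_facility_watts / it_equipment_watts

print(pue(2.5, 1.0))  # → 2.5, the typical facility per the Uptime Institute
print(pue(1.6, 1.0))  # → 1.6, achievable with efficient equipment
print(pue(1.0, 1.0))  # → 1.0, the ideal with zero power/cooling overhead
```

At a PUE of 2.5, cooling and power distribution consume 1.5 watts of overhead for every watt of useful IT work, so shaving the ratio translates directly into the power bill.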
EFFICIENCY IN THE MODERN DATACENTER
Modern datacenters use best practices to reduce power overheads by managing airflow, utilizing free cooling, and so on. That is just one angle on efficiency; the other is the use of energy-saving software and hardware technologies, which can further reduce data center energy requirements. The following technologies are used in modern datacenters for higher equipment utilization:
- Server Virtualization: shares physical resources among application instances, increasing the effective utilization of the server.
- Compression and Dedupe: data reduction techniques that, for certain applications, can yield 10 times higher usable capacity, reducing the power required to host that capacity compared with hosting it without these techniques.
- Flash Storage: the power consumption of flash is much lower than that of disk systems, so power bills can be reduced drastically for some applications, such as big data analytics.
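The capacity arithmetic behind data reduction is straightforward. A brief sketch (the function and the ratios are illustrative; the achievable ratio depends heavily on the workload, and the 10x figure cited above applies only to certain applications):

```python
# Illustrative effective-capacity calculation for compression/dedupe.
# reduction_ratio is workload-dependent: e.g. 10 means 10:1 reduction.
def effective_capacity_tb(raw_tb, reduction_ratio):
    return raw_tb * reduction_ratio

# 10 TB of raw storage at a 10:1 reduction ratio:
print(effective_capacity_tb(10, 10))  # → 100 TB of usable capacity
# The same data hosted without reduction would need 10x the drives,
# and correspondingly more power to spin and cool them.
```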