Business drivers and technological constraints shaping the modern datacenter

Dataquest – Front Page. The author is a Senior Technical Analyst, NetApp India.

The datacenter has been evolving from mainframes to the current cloud model, driven by various business drivers and technological constraints. Certain aspects of the datacenter are essential; without them, modern datacenters would be of no use to business organizations. Availability: According to NARA, 93% of businesses that lost availability in their datacenter for 10 days or more filed for bankruptcy within one year. Similarly, when financial data provider Bloomberg went dark one morning in early April 2015, it interrupted the sale of three billion pounds of treasury bills by the United Kingdom’s Debt Management Office. That data center outage was caused by a combination of hardware and software failures in the network, which led to disconnections that lasted one to two hours for most customers. These instances indicate how serious the problem of data center downtime (unavailability) is. Downtime can cost a lot of money and significantly affect how customers perceive a company.

Reliability is the ability of a system to perform its required functions under stated conditions for a specified period of time, whereas availability is the proportion of time a system is in a functioning condition. Availability is often expressed mathematically as 100% minus unavailability.

The goal for many companies is 99.9999% availability, but with each nine you add, costs can increase greatly. Moving from one level to the next can involve anything from redundant servers to redundant storage frames or even duplicate datacenters. This availability journey can cost thousands or even millions of dollars to reach the 99.9999% uptime level. The decision to pursue this level of uptime should not be an IT decision, but a business decision.
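To make the cost of each extra nine concrete, the "100% minus unavailability" definition above can be turned into a yearly downtime budget. A minimal sketch (the percentages are the standard "nines" levels, not figures from any specific vendor):

```python
# Convert an availability percentage into an allowed downtime budget per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability_pct):
    """Unavailability = 100% - availability; scale it to minutes per year."""
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR

for pct in (99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% availability -> {downtime_minutes_per_year(pct):8.2f} min/year")
```

At 99.9999% the budget is about half a minute per year, which is why each added nine demands disproportionately more redundancy and money.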


Now, what does it take to maintain high levels of uptime in the datacenter? Is buying highly available infrastructure enough? It doesn’t appear so, and here is why: reputable studies have shown that 75% of downtime is the result of some sort of human error, and the rest is due to some sort of equipment or software failure. Even well-trained IT people make mistakes when they are in a rush, are tired, weren’t really thinking, or just took a shortcut. With ever-growing data center complexity, it is impossible to prevent every human error or equipment failure leading to an outage. The question in front of us is: are our datacenters really as resilient as desired, or are occasional outages just a fact of life?

The fundamental problem stems from applications’ dependence on 100% infrastructure availability. Imagine if application designers had the choice of relaxing infrastructure availability requirements and instead designed with the idea that outages are normal in the data center. Embracing failure gives us true application resiliency, because failure protection is no longer an infrastructure problem alone. This shift in thinking led Google to produce the Google File System back in 2003, a distributed file system for its data centers designed with system failures in mind.

Relaxing the availability requirements on the infrastructure means one no longer requires high-end, costly systems. Continued innovations in CPUs (multicore) have made them more powerful, disks of terabyte capacity are commonplace nowadays (lower $/GB), and networks have become much faster (10/40 GbE). Thus, today, commodity servers pack the power of a mainframe computer into just a 1U or 2U form factor at a fraction of a mainframe’s cost. These modular commodity servers, equipped with redundant components for availability and backed by a better supply chain, are well suited to scaling data center capacity on demand.

In summary, the big shift in the data center is in how availability is viewed from the application point of view. Today’s applications are distributed, designed with failure in mind, and can scale to 1,000+ nodes on commodity servers. This is apparent with Netflix and its Chaos Monkey engineering group. Netflix once faced a massive reboot of its application instances in the cloud. The group repeatedly and regularly exercises failures of its distributed application, continually testing and correcting issues before they can create widespread outages; Netflix has created a service designed with failure in mind to ensure availability at lower cost.
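The Chaos Monkey idea can be sketched in a few lines: randomly terminate an instance of a replicated service and verify that the service as a whole still answers. This is an illustrative toy under the "design for failure" assumption — the class and method names are invented for this sketch and are not Netflix's actual tooling:

```python
import random

class ReplicatedService:
    """Toy service that stays up as long as at least one replica is alive."""
    def __init__(self, replicas):
        self.alive = set(range(replicas))

    def kill_random_instance(self, rng):
        # The "chaos monkey": terminate one healthy replica at random.
        if self.alive:
            self.alive.discard(rng.choice(sorted(self.alive)))

    def is_available(self):
        return len(self.alive) > 0

rng = random.Random(42)
svc = ReplicatedService(replicas=3)
svc.kill_random_instance(rng)  # inject one failure during normal operation...
assert svc.is_available()      # ...and the service survives it
```

Running such failure injection continuously, in production, is what turns "outages are normal" from a slogan into a tested property.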

Agility: A simple way to measure the agility of an organization is to assess how fast it can respond to changing business circumstances. For the data center, it means how fast a new application deployment request can be fulfilled, whether by buying, building, or repurposing existing IT infrastructure. For example, by adopting agile IT infrastructure, PayPal was able to execute product cycles 7 times faster than a year earlier, whereas previously it took them 100 tickets and 3 weeks to provision new servers.

Traditionally, IT managers are tasked with planning capacity requirements ahead of time to avoid unplanned downtime, procurement delays, etc. Planning is done to avoid these overheads so that IT staff can concentrate on developing new applications that bring new business to the organization. Capacity planning usually has the following steps: determine the SLAs required for the business; analyze how the current infrastructure is meeting those SLAs; and forward-project future capacity requirements through modeling.

There is always a risk of underestimating future requirements, hence the modeling includes headroom for unplanned capacity requirements. In reality, the allocated capacity is usually higher than what is actually required, resulting in wasted capacity and money spent on unused IT. In a nutshell, such capacity planning usually ends up spending more dollars than required; and in the event of business changes, repurposing the existing infrastructure is a humongous task for IT, which can sometimes also lead to undersupply.
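The forward-projection step described above can be sketched as a simple compound-growth model with a headroom buffer. The growth rate and headroom fraction below are arbitrary illustrative inputs, not figures from the article:

```python
def projected_capacity(current_tb, annual_growth, years, headroom=0.2):
    """Forward-project capacity demand, then add headroom for the unplanned."""
    demand = current_tb * (1 + annual_growth) ** years
    return demand * (1 + headroom)

# 100 TB today, 30% yearly growth, planned 3 years out with 20% headroom:
plan = projected_capacity(100, 0.30, 3)
print(f"buy {plan:.1f} TB up front")
```

The result (~264 TB bought on day one against ~130 TB needed after year one) is exactly the over-allocation the article criticizes: capacity paid for years before it is used.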


The advent of distributed/decentralized systems, and their ability to scale in small increments to 1,000s of nodes using commodity hardware on demand, has made capacity planning a thing of the past. Distributed systems provide the ability to start small and then grow at the pace of the organization, leading to ‘pay as you grow’ services. This is the basis on which Software as a Service (SaaS) cloud computing is offered. The distributed architecture enables them to grow quickly, shrink, or be repurposed for something else in just a few clicks.

Today, many distributed applications (e.g. Hadoop, Spark, MongoDB, Cassandra, etc) are churning big data to produce actionable business value for organizations. The need of the hour is for the data center to scale to these application demands seamlessly. Apache Mesos is one such framework; it fixes the static partitioning problem in distributed applications via an API for dynamic sharing of resources.

In summary, going forward, distributed applications and commodity hardware will dominate datacenters, providing the much-needed agility for organizations to quickly respond to changing business requirements; all in just a few clicks.

Efficiency: The first step in measuring the efficiency of a modern datacenter is to measure Power Usage Effectiveness (PUE), defined as Total Facility Energy / IT Equipment Energy.

It’s a measure of how effectively you deliver power and cooling to the IT equipment. According to the Uptime Institute, the typical data center has an average PUE of 2.5. This means that for every 2.5 watts coming ‘in’ at the utility meter, only one watt is delivered to the IT load. Uptime estimates most facilities could achieve a PUE of 1.6 by using the most efficient equipment and best practices. Ideally, one would like to eliminate the overhead and bring the datacenter’s PUE down to 1.0.
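The PUE definition above is a straightforward ratio; a quick sketch using the article's own 2.5 figure shows how much power never reaches the IT load:

```python
def pue(total_facility_kw, it_equipment_kw):
    """PUE = Total Facility Energy / IT Equipment Energy (1.0 is the ideal)."""
    return total_facility_kw / it_equipment_kw

# The Uptime Institute's typical facility: 2.5 W in for every 1 W to IT load.
typical = pue(2.5, 1.0)
# Overhead fraction: power spent on cooling, distribution, lighting, etc.
overhead = (typical - 1.0) / typical
print(f"PUE {typical}: {overhead:.0%} of incoming power never reaches IT gear")
```

At a PUE of 2.5, 60% of the power drawn at the meter is overhead; at the achievable 1.6, that drops to below 40%.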


Modern datacenters use best practices to reduce power overheads by managing airflow, utilizing free cooling, etc. That is just one angle on efficiency; the other is the use of energy-saving software and hardware technologies, which can reduce data center energy requirements further. The following technologies are used in modern datacenters for higher equipment utilization.

Server Virtualization: Shares physical resources among application instances, thus increasing the effective utilization of the server. Compression and Dedupe: These data reduction techniques can, for certain applications, yield 10 times the usable capacity, reducing the power that would be required to host the same capacity without them.
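The capacity effect of data reduction is simple arithmetic: at a 10:1 reduction ratio, one physical terabyte holds ten logical terabytes, so far fewer powered drives are needed for the same usable capacity. A minimal illustration (the 10x ratio is the article's best case for "certain applications", not a universal figure):

```python
def physical_tb_needed(logical_tb, reduction_ratio):
    """Physical capacity (and hence powered spindles) shrinks by the ratio."""
    return logical_tb / reduction_ratio

# 500 TB of application data at a 10:1 compression + dedupe ratio:
print(physical_tb_needed(500, 10))  # 50.0 -> one tenth of the drives to power
```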

Flash Storage: The power consumption of flash is much lower than that of disk systems, so power bills can be cut drastically for some applications like big data analytics.
