BY GEORGE CRUMP
George Crump is President and Founder of Storage Switzerland.

Humans are quickly being outnumbered by Internet-connected devices that are constantly collecting and transmitting data. The term used to describe this is the Internet of Things. Regardless of how you feel about it, the explosion in machine-generated data is changing storage and data protection forever. These machines, or things, perform a range of tasks, from relatively simple functions like capturing images and uploading them to social sharing sites to capturing and transmitting more complicated sensor data and sending real-time information on an organization’s various assets. Thanks to analytics, businesses now want the ability to, say, compare the current condition of their assets with their condition five years ago.


The impact on storage at first seems fairly obvious: There is more data to store. The less obvious part is that machine-generated data comes in two distinct types, creating two entirely different challenges. First, there is large-file data, such as images and videos captured from smartphones and other devices. This data type is typically accessed sequentially. The second data type consists of very small files, for example, log-file data captured from sensors. These sensors, while small in size, can create billions of files that must be accessed randomly.

It used to be that a data center would have only one of these data types: It was either in the business of capturing image-based data or it was not. Now, however, data centers must deal with both data types, and the two usually require different storage systems: one designed for large-file sequential I/O and the other for small-file random I/O.

Historically, image-based data has been placed on large-capacity NAS systems, but we are seeing a shift to object-based storage, especially at scale. Sensor data, usually stored on high-performance NAS systems, is moving to all-flash arrays, primarily to allow faster analytics.
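To make the split concrete, here is a minimal sketch of how an ingest step might route incoming files to one of the two kinds of storage described above. It is an illustration only: the size cutoff, the tier labels, and the names `IncomingFile`, `choose_tier`, and `route` are all assumptions, and a real system would likely also weigh access pattern, not just file size.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical cutoff: the article gives no figure separating
# "large-file" from "small-file" data.
LARGE_FILE_THRESHOLD_BYTES = 1024 * 1024

# Hypothetical tier labels, loosely following the article's description.
OBJECT_TIER = "object-storage"   # large files, sequential I/O
FLASH_TIER = "all-flash"         # small files, random I/O


@dataclass(frozen=True)
class IncomingFile:
    name: str
    size_bytes: int


def choose_tier(f: IncomingFile,
                threshold: int = LARGE_FILE_THRESHOLD_BYTES) -> str:
    """Pick a tier from file size alone (a deliberate simplification)."""
    return OBJECT_TIER if f.size_bytes >= threshold else FLASH_TIER


def route(files: Iterable[IncomingFile]) -> Dict[str, List[str]]:
    """Group file names by the tier each file would be sent to."""
    tiers: Dict[str, List[str]] = {OBJECT_TIER: [], FLASH_TIER: []}
    for f in files:
        tiers[choose_tier(f)].append(f.name)
    return tiers
```

Calling `route` on a mix of video files and sensor logs returns a dictionary with the videos listed under the object tier and the logs under the flash tier.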


The data generated from the Internet of Things also has a big impact on data protection. Most, if not all, of this data can never be recreated: The conditions captured in an image or soil sample from last year, for example, will never again be the same as they were on the day it was collected. Therefore, data protection is potentially even more critical than it is for more conventional data. The challenges such data brings to storage also impact data protection, especially when dealing with sensor data, as most backup applications don’t handle billions of files well.

The answer may be to not back this data up at all, but instead integrate data protection into the archive process. An ideal approach might involve a tape-integrated NAS solution that has a disk or flash cache in front of a large tape library. As data lands on the disk cache, copies can be made to multiple tape devices. This provides high-performance access for analytics processing, as well as excellent protection and long-term retention.
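The flow described above, where data lands on a cache and is then copied to several tape devices, can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular tape-integrated NAS product works: ordinary directories stand in for the cache and for each tape device, and the function name `ingest` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List, Sequence


def ingest(source: Path, cache_dir: Path,
           tape_dirs: Sequence[Path]) -> List[Path]:
    """Land a file on the cache, then copy it to every tape target.

    Directories stand in for the disk/flash cache and the tape devices;
    this is an assumption made purely for illustration.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / source.name
    shutil.copy2(source, cached)  # data lands on the cache first

    copies: List[Path] = []
    for tape in tape_dirs:
        tape.mkdir(parents=True, exist_ok=True)
        dest = tape / source.name
        shutil.copy2(cached, dest)  # one protection copy per tape target
        copies.append(dest)
    return copies
```

The cached copy remains available for fast analytics reads, while the returned paths identify the protection copies. Because the copies are made as part of ingest, no separate backup pass has to walk billions of small files later.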

One consistent pattern I have observed among IT managers who deal with the Internet of Things is how quickly data transforms from merely interesting to mission-critical and in need of analysis, retention, and protection. The storage systems for these initiatives almost always start out ad hoc and then become a focal point. If you have sensors, or things, that are creating data, keep an eye on that data now. Protect it and be prepared for it to become more important to the organization.

