Open Source Stor­age So­lu­tions You Can De­pend On

Stor­age space is at a pre­mium with petabytes and ter­abytes of data be­ing gen­er­ated al­most on a daily ba­sis due to mod­ern day liv­ing. Open source stor­age so­lu­tions can help mit­i­gate the stor­age prob­lems of in­di­vid­u­als as well as small and large scale en­ter

OpenSource For You

By: Vivek Ratan

The author has completed his B.Tech in electronics and instrumentation engineering. He is currently working as an automation test engineer at Infosys, Pune and as a freelance educator at LearnerKul, Pune. He can be reached at ratanvivek14@

We have all been ob­serv­ing a sud­den surge in the pro­duc­tion of data in the re­cent past and this will un­doubt­edly in­crease in the years ahead. Al­most all the ap­pli­ca­tions on our smart­phones (like Facebook, In­sta­gram, What­sApp, Ola, etc) gen­er­ate data in dif­fer­ent forms like text and im­ages, or de­pend on data to work upon. With around 2.32 bil­lion smart­phone users across the globe (as per the lat­est data from statista.com) hav­ing in­stalled mul­ti­ple ap­pli­ca­tions, it cer­tainly adds up to a re­ally huge amount of data, daily. Apart from this, there are other sources of data as well like dif­fer­ent Web ap­pli­ca­tions, sen­sors and ac­tu­a­tors used in IoT de­vices, process au­to­ma­tion plants, etc. All this cre­ates a re­ally big chal­lenge to store such mas­sive amounts of data in a man­ner that can be used as and when needed.

We all know that our businesses cannot get by without storing data. Sooner or later, even small businesses need space for data storage: for documents, presentations, e-mails, images, audio files, databases, spreadsheets, etc, which act as the lifeblood of most companies. Besides, many organisations also hold confidential information that must not be leaked or accessed by unauthorised people, in which case security becomes one of the most important aspects of any data storage solution. In critical healthcare applications, an organisation cannot afford to run out of storage space, so capacity needs to be monitored continuously.

Stor­ing dif­fer­ent kinds of data and man­ag­ing its stor­age is crit­i­cal to any com­pany’s be­hind-the-scenes suc­cess. When we look for a so­lu­tion that cov­ers all our stor­age needs, the pos­si­bil­i­ties seem quite end­less, and many of them are likely to con­sume our pre­cious IT bud­gets. This is why we can­not af­ford to over­look open source data stor­age so­lu­tions. Once you dive into the open source world, you will find a huge ar­ray of so­lu­tions for al­most ev­ery prob­lem or pur­pose, which in­cludes stor­age as well.

Rea­sons for the growth in the data stor­age so­lu­tions seg­ment

Let’s check out some of the rea­sons for this:

1. Various recent government regulations, like Sarbanes-Oxley, ask businesses to maintain and keep a backup of different types of data which they might have otherwise deleted.

2. Many of the small busi­nesses have now started ar­chiv­ing dif­fer­ent e-mail mes­sages, even those dat­ing back five or more years for var­i­ous le­gal rea­sons.

3. Also, the per­va­sive­ness of spy­ware and viruses re­quires back­ups and that again re­quires more stor­age ca­pac­ity.

4. There has been a grow­ing need to back up and store dif­fer­ent large me­dia files, such as video, MP3, etc, and make the same avail­able to users on a spe­cific net­work. This is again gen­er­at­ing a de­mand for large stor­age so­lu­tions.

5. Each newer ver­sion of any soft­ware ap­pli­ca­tion or op­er­at­ing sys­tem de­mands more space and me­mory than its pre­de­ces­sor, which is an­other rea­son driv­ing the de­mand for large stor­age so­lu­tions.

Dif­fer­ent types of stor­age op­tions

There are dif­fer­ent types of stor­age so­lu­tions that can be used based on in­di­vid­ual re­quire­ments, as listed be­low.

Flash memory thumb drives: These drives are particularly useful to mobile professionals since they consume little power, are small enough to fit on a keychain and have no moving parts. You can connect a Flash memory thumb drive to your laptop's Universal Serial Bus (USB) port and back up files from the system. Some USB thumb drives also provide encryption to protect files in case the drive gets lost or is stolen. Flash memory thumb drives also let us store our Outlook data (like recent e-mails or calendar items), bookmarks from Internet Explorer, and even some desktop applications. That way, you can leave your laptop at home and just plug the USB drive into any borrowed computer to access all your data elsewhere.

External hard drives: An inexpensive and relatively simple way to add more storage is to connect an external hard drive to your computer. However, external hard disk drives that are directly connected to PCs have several disadvantages. Any file stored only on the drive and nowhere else still needs to be backed up. Also, if you travel for work and need access to some of the files on an external drive, you will have to take the drive with you or remember to copy the required files to your laptop's internal drive, a USB thumb drive, a CD or other storage media. Finally, in case of a fire or other catastrophe at your place of business, your data will not be protected if it is stored only on an external hard drive kept on the premises.

Online storage: There are different services which provide remote storage and backup over the Internet. Such services offer businesses a number of benefits. By backing up your most important files to a highly secure remote server, you protect the data stored at your place of business. You can also easily share large files with your clients, partners or others by providing them with password-protected access to your online storage service, eliminating the need to send those files by e-mail. And in most cases, you can log into your account from any system using a Web browser, which is a convenient way to retrieve files when you are away from your PC. However, remote storage can be a bit slow, especially during the initial backup session, and is only as fast as your network's connection to that storage. For extremely large files, you may require higher speed network access.
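As a rough illustration of why the initial backup is the slow part, the transfer time can be estimated from the amount of data and the speed of the link. The figures used here (500GB for the first backup, 2GB of daily changes, a 20Mbps uplink) are assumptions chosen for the example, and the sketch ignores protocol overhead and compression.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Estimate the hours needed to move size_gb gigabytes over a link_mbps link."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    seconds = size_megabits / link_mbps
    return seconds / 3600


# Assumed figures for illustration only.
initial_backup = transfer_hours(500, 20)   # full first backup
daily_backup = transfer_hours(2, 20)       # later backups send only the changes

print(f"Initial backup: about {initial_backup:.1f} hours")
print(f"Daily backup: about {daily_backup * 60:.1f} minutes")
```

With these assumed numbers the first backup takes more than two days, while later incremental backups finish in minutes, which is why link speed matters most at the start.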

Network attached storage: Network attached storage (NAS) provides fast, reliable and simple access to data in any IP networking environment. Such solutions are quite suitable for small or mid-sized businesses that require large volumes of economical storage which can be shared by multiple users over a network. Given that many small businesses lack IT departments, it helps that this kind of storage is easy to deploy and can be managed and consolidated centrally. A NAS solution can be as simple as a single hard drive with an Ethernet port or even built-in Wi-Fi connectivity.

More sophisticated NAS solutions can also provide additional USB as well as FireWire ports, enabling you to connect external hard drives to scale up the overall storage capacity of the business. A NAS storage solution can also offer print-server capabilities, which let multiple users easily share a single printer. A NAS solution may also include multiple hard drives in a Redundant Array of Independent Disks (RAID) Level 1 array. This storage system contains two or more equivalent hard drives (for example, two 250GB drives) in a single network-connected device. Files written to the first (main) drive are automatically written to the second drive as well. This kind of automated redundancy means that if the first hard drive dies, we will still have access to all our applications and files on the second drive. Such solutions can also help in offloading files being served by other servers on your network, which improves performance. A NAS system allows you to consolidate storage, hence increasing efficiency and reducing costs. It simplifies storage administration, data backup and recovery, and also allows for easy scaling to meet growing storage needs.
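The mirroring behaviour of RAID Level 1 described here can be modelled in a few lines. This is only a toy sketch: each 'drive' is a Python dictionary, and the class and function names are made up for the illustration, not taken from any NAS product.

```python
class MirroredVolume:
    """Toy model of a RAID 1 mirror: every write goes to all healthy drives."""

    def __init__(self, drive_count: int = 2):
        self.drives = [dict() for _ in range(drive_count)]
        self.failed = set()

    def write(self, name: str, data: bytes) -> None:
        # Mirror the file onto every drive that is still healthy.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[name] = data

    def fail(self, index: int) -> None:
        # Simulate a dead drive: its contents are no longer readable.
        self.failed.add(index)
        self.drives[index].clear()

    def read(self, name: str) -> bytes:
        # Serve the file from the first healthy drive that holds it.
        for index, drive in enumerate(self.drives):
            if index not in self.failed and name in drive:
                return drive[name]
        raise IOError(f"no healthy drive holds {name}")


def usable_capacity_gb(drive_sizes_gb: list) -> int:
    """A mirror stores one full copy per drive, so the smallest drive sets the limit."""
    return min(drive_sizes_gb)


volume = MirroredVolume()
volume.write("report.doc", b"quarterly figures")
volume.fail(0)                        # the first (main) drive dies
print(volume.read("report.doc"))      # still served from the second drive
print(usable_capacity_gb([250, 250]))
```

Note the trade-off the sketch makes visible: two 250GB drives in a mirror give 250GB of usable space, not 500GB, because the second drive holds a copy rather than new data.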

Choos­ing the right stor­age so­lu­tion

There are a num­ber of stor­age so­lu­tions avail­able in the mar­ket, which meet di­verse re­quire­ments. At times, you could get con­fused while try­ing to choose the right one. Let’s get rid of that con­fu­sion by con­sid­er­ing some of the im­por­tant as­pects of a stor­age so­lu­tion.

Scalability: This is one of the important factors to be considered while looking for any storage solution. In distributed storage systems, storage capacity can be added in two ways. The first way involves adding disks or replacing the existing disks with ones that have higher storage capacity (also called 'scaling up'). The other method involves adding more nodes to the system (also called 'scaling out'). Whenever you add hardware, you increase the whole system's performance as well as its capacity.
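The difference between the two approaches can be shown with a small model. The classes below are hypothetical and track raw capacity only; a real distributed system would also account for replication and reserved space.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    disks_tb: list  # size of each disk in the node, in TB


@dataclass
class Cluster:
    nodes: list = field(default_factory=list)

    def capacity_tb(self) -> int:
        # Raw capacity: the sum of every disk in every node.
        return sum(sum(node.disks_tb) for node in self.nodes)

    def scale_up(self, node_index: int, disk_index: int, new_size_tb: int) -> None:
        # Scaling up: replace an existing disk with a larger one.
        self.nodes[node_index].disks_tb[disk_index] = new_size_tb

    def scale_out(self, disks_tb: list) -> None:
        # Scaling out: add a whole new node with its own disks.
        self.nodes.append(Node(list(disks_tb)))


cluster = Cluster([Node([4, 4]), Node([4, 4])])
print(cluster.capacity_tb())   # starting capacity
cluster.scale_up(0, 0, 8)      # swap one 4TB disk for an 8TB disk
print(cluster.capacity_tb())
cluster.scale_out([4, 4])      # add a third node
print(cluster.capacity_tb())
```

Scaling up is limited by the number of disk slots in each node, whereas scaling out also adds the new node's processor and network capacity, which is why it tends to raise performance along with capacity.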

Per­for­mance: This is what we look for while choos­ing any stor­age so­lu­tion. One can­not af­ford to com­pro­mise on the per­for­mance of any stor­age so­lu­tion, as this may di­rectly im­pact the per­for­mance of the ap­pli­ca­tion that uses the given stor­age so­lu­tion. Flex­i­ble scal­a­bil­ity al­lows users to in­crease the ca­pac­ity and per­for­mance in­de­pen­dently as per their needs and bud­get.

Re­li­a­bil­ity: We all look for re­sources that can be re­lied upon for a long pe­riod of time, and this is the case even when search­ing for a stor­age so­lu­tion.

Af­ford­abil­ity: Since bud­get and pric­ing are im­por­tant, an open source stor­age so­lu­tion is a good op­tion be­cause it is avail­able free of cost. This is an im­por­tant fac­tor for small busi­nesses that can­not af­ford to spend much just for stor­age so­lu­tions.

Avail­abil­ity: Some­times, data stored in a stor­age so­lu­tion is not avail­able when be­ing fetched by any ap­pli­ca­tion. This can oc­cur be­cause of some disk fail­ure. We all want to avoid such cir­cum­stances, which may lead to un­avail­abil­ity of data. Data should be eas­ily avail­able when it’s be­ing ac­cessed.

Sim­plic­ity: Even the most ad­vanced stor­age so­lu­tions come with man­age­ment in­ter­faces that are as good as or bet­ter than the tra­di­tional stor­age units. All such in­ter­faces show de­tails about each node, ca­pac­ity al­lo­ca­tion, alerts, over­all per­for­mance, etc. This is a sig­nif­i­cant fac­tor to be con­sid­ered while choos­ing a stor­age so­lu­tion.

Support: Last but not least, there should be support from the manufacturer or from a group of developers, including support for applications. Support is essential if you plan on hosting your database, virtual server farm, e-mail or other critical information on the storage solution. You must make sure that the manufacturer or community offers the level of support you require.

Some of the avail­able open source stor­age so­lu­tions

Here’s a glance at some of the good open source so­lu­tions avail­able.

OpenStack: OpenStack is basically a cloud operating system which controls large pools of compute, storage and networking resources throughout a data centre, all of which are managed using a dashboard that gives administrators control while empowering users to provision resources through a Web interface. The OpenStack Object Storage service provides software that stores and retrieves data over HTTP. Objects (also referred to as blobs of data) are stored in an organisational hierarchy which offers anonymous read-only access, ACL defined access, or even temporary access. This type of object storage supports multiple token-based authentication mechanisms that are implemented via middleware.
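To give a feel for what storing and retrieving data over HTTP looks like, here is a minimal sketch that builds the requests without sending them. It assumes the usual Object Storage layout of account, container and object in the URL path and a token passed in the X-Auth-Token header; the endpoint, account name and token below are hypothetical placeholders, and a real deployment would supply them through its identity service.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(endpoint: str, account: str, container: str, obj: str) -> str:
    """Build the URL of an object: <endpoint>/v1/<account>/<container>/<object>."""
    parts = [quote(account, safe=""), quote(container, safe=""), quote(obj, safe="/")]
    return endpoint.rstrip("/") + "/v1/" + "/".join(parts)


def upload_request(url: str, token: str, data: bytes) -> Request:
    # An HTTP PUT with the object's bytes as the body stores the object.
    return Request(url, data=data, method="PUT", headers={"X-Auth-Token": token})


def download_request(url: str, token: str) -> Request:
    # An HTTP GET on the same URL retrieves it.
    return Request(url, method="GET", headers={"X-Auth-Token": token})


# Hypothetical endpoint, account and token, for illustration only.
url = object_url("https://swift.example.com", "AUTH_demo", "photos", "2017/beach.jpg")
put = upload_request(url, "hypothetical-token", b"image bytes")
get = download_request(url, "hypothetical-token")

print(put.get_method(), put.full_url)
print(get.get_method(), get.full_url)
```

Passing either request to urllib.request.urlopen would actually send it. The same URL serves for both storing and fetching, which is what makes object storage easy to use from almost any application that can speak HTTP.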

Ceph: This is a distributed object store and file system designed to provide high performance, scalability and reliability. It is built on the Reliable Autonomic Distributed Object Store (RADOS), and allows enterprises to build their own economical storage systems using commodity hardware. It has been maintained by Red Hat since its acquisition of Inktank in April 2014. It is capable of storing blocks, files and objects. It is scale-out, which means that multiple Ceph storage nodes work together as a single storage system which easily handles many petabytes of data, and adding nodes increases performance and capacity simultaneously. Ceph has many of the basic enterprise storage features, which include replication, thin provisioning, snapshots, auto-tiering and self-healing capabilities.

Rockstor: This is a free and open source NAS solution. The Personal Cloud Server present in it is a very powerful local alternative for public cloud storage, which mitigates the cost and risks associated with public cloud storage.

This net­work at­tached and cloud stor­age platform is quite suit­able for small to medium busi­nesses as well as home users who do not have much IT ex­pe­ri­ence but need to scale up to ter­abytes of data stor­age. If users are more in­ter­ested in Linux and Btrfs, it is a great al­ter­na­tive to FreeNAS.

This cloud stor­age platform can be managed even within a LAN or over the Web us­ing a very simple and in­tu­itive user in­ter­face. And with the in­clu­sion of add-ons (named ‘Rock­ons’), you can ex­tend the fea­ture set to in­clude dif­fer­ent new ap­pli­ca­tions, servers and ser­vices.

Kinetic Open Storage: Backed by different companies like Seagate, EMC, Toshiba, Cisco, Red Hat, NetApp, Dell, etc, Kinetic is a Linux Foundation project which is dedicated to establishing standards for new kinds of object storage architecture. It is designed especially to meet the need for scale-out storage used for unstructured data. Kinetic is basically a way for storage applications to communicate directly with storage devices over Ethernet. Most of the storage use cases targeted by Kinetic involve unstructured data held in systems like Hadoop, NoSQL databases and other distributed file systems, as well as object stores in the cloud such as Amazon S3, Basho's Riak and OpenStack Swift.

Storj DriveShare and MetaDisk: Storj is a new type of cloud storage which is built on peer-to-peer and blockchain technology. It offers decentralised and end-to-end encrypted cloud storage. The DriveShare application allows users to rent out their unused hard drive space so that it can be used by the service. The MetaDisk Web application allows users to save their files to the service securely. The core protocol handles peer-to-peer negotiation and verification of storage contracts. Providers of storage are usually referred to as 'farmers' and those using the storage are called 'renters'. Renters can periodically audit the farmers to check that their files are still being kept secure and safe. Conversely, farmers can decide to stop storing a specific file if its owners do not pay for and audit their services on time. Files are cut up into smaller pieces called 'shards' and are then stored three times redundantly, by default. The network can automatically find a new farmer and move data if copies become unavailable. The system puts different measures in place to prevent renters and farmers from cheating each other, for instance, by manipulating the auditing process. Storj offers several advantages over many traditional cloud based storage solutions. As data is encrypted and cut into shards at the source, there is almost no chance for unauthorised third parties to access it. And because data storage is distributed, availability and download speed increase.
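The idea of shards, threefold redundancy and audits can be sketched as follows. This is not Storj's actual protocol: the function names, the round-robin choice of farmers and the hash comparison used as an 'audit' are simplifying assumptions made for the illustration, and the encryption step is left out.

```python
import hashlib

REDUNDANCY = 3  # the article's stated default: each shard is stored three times


def split_into_shards(data: bytes, shard_size: int) -> list:
    """Cut the data into consecutive pieces of at most shard_size bytes."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def place_shards(shards: list, farmers: list, copies: int = REDUNDANCY) -> list:
    """Assign each shard to `copies` distinct farmers (simple round-robin)."""
    if len(farmers) < copies:
        raise ValueError("not enough farmers for the requested redundancy")
    placement = []
    for index, shard in enumerate(shards):
        digest = hashlib.sha256(shard).hexdigest()
        holders = [farmers[(index + k) % len(farmers)] for k in range(copies)]
        placement.append((digest, holders))
    return placement


def audit(shard_from_farmer: bytes, expected_digest: str) -> bool:
    """Simplified audit: does the farmer's copy still match the recorded hash?"""
    return hashlib.sha256(shard_from_farmer).hexdigest() == expected_digest


shards = split_into_shards(b"contents of a renter's file", 8)
placement = place_shards(shards, ["farmer-a", "farmer-b", "farmer-c", "farmer-d"])

for digest, holders in placement:
    print(digest[:12], holders)

print(audit(shards[0], placement[0][0]))      # an intact copy passes
print(audit(b"tampered", placement[0][0]))    # a modified copy fails
```

Because every shard lives on three different farmers, losing any one farmer leaves two copies from which the network can rebuild the third.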

Figure 1: Qualities of NAS solutions (Image source: googleimages.com)

Figure 2: Main services and components of OpenStack (Image source: googleimages.com)

Figure 3: Architecture for the Ceph storage solution (Image source: googleimages.com)

Figure 4: Ten year Data centre revenue forecast (Image source: googleimages.com)
