The Best Tools for Back­ing Up En­ter­prise Data

To prevent a disastrous loss of data, regular backups are not just recommended but de rigueur. This article surveys the many open source tools available for the purpose and helps systems administrators decide which one best suits their systems.

OpenSource For You

Before discussing the need for backup software, a brief history of storage is useful. In 1953, IBM recognised the importance and immediate application of what it called the ‘random access file’, which it described as having high capacity with rapid random access to files. This led to the invention, at IBM’s San Jose, California laboratory, of what subsequently became the hard disk drive (HDD). The disk drive created a new level in the computer data hierarchy, then termed random access storage but today known as secondary storage.

The commercial use of hard disk drives began in 1957, with the shipment of an IBM 305 RAMAC system that included IBM Model 350 disk storage, for which US Patent No. 3,503,060 was issued on March 24, 1970.

The year 2016 marks the 60th an­niver­sary of the ven­er­a­ble hard disk drive (HDD). Nowa­days, new com­put­ers are in­creas­ingly adopt­ing SSDs (solid-state drives) for main stor­age, but HDDs still re­main the cham­pi­ons of low cost and very high ca­pac­ity data stor­age.

The cost per GB of data has come down sig­nif­i­cantly over the years be­cause of a num­ber of in­no­va­tions and ad­vanced tech­niques de­vel­oped in man­u­fac­tur­ing HDDs. The graph in Fig­ure 1 gives a glimpse of this.

The general assumption is that this cost will fall further. Since storing data is now far cheaper than it was in the 1970s and ‘80s, why should one back up data when it is so cheap to buy new storage? What are the advantages of having a backup?

Today, we generate a lot of data using gadgets such as mobile phones, tablets, laptops, handheld computers and servers. When we exceed the storage capacity of these devices, we tend to push the data to the cloud or take a backup to guard against future disasters. Many corporate and enterprise-level customers generate huge volumes of data, and backups are critical for them.

Backing up data is very important. After taking a backup, we also have to make sure that the data is secure and manageable, and that its integrity is not compromised. With these aspects in mind, many open source backup tools have been developed over the years.

Data backup comes in dif­fer­ent flavours like in­di­vid­ual files and fold­ers, whole drives or par­ti­tions, or full sys­tem back­ups. Nowa­days, we also have the ‘smart’ method, which au­to­mat­i­cally backs up files in com­monly used lo­ca­tions (sync­ing) and we have the op­tion of us­ing cloud stor­age.

Back­ups can be sched­uled, run­ning as in­cre­men­tal, dif­fer­en­tial or full back­ups, as re­quired.
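The difference between the three backup types can be made concrete with a minimal sketch. It assumes that change detection relies only on file modification times; the FileInfo class, the select_files function and the timestamps are hypothetical names and values chosen for illustration, and are not taken from any of the tools reviewed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    mtime: float  # last-modified time, in seconds since the epoch


def select_files(files, mode, last_full, last_backup):
    """Return the paths that a backup run of the given mode would copy.

    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any kind
    """
    if mode == "full":
        # A full backup copies everything, changed or not.
        return [f.path for f in files]
    if mode == "differential":
        # A differential backup copies everything changed since the last full backup.
        return [f.path for f in files if f.mtime > last_full]
    if mode == "incremental":
        # An incremental backup copies only what changed since the last backup of any kind.
        return [f.path for f in files if f.mtime > last_backup]
    raise ValueError(f"unknown backup mode: {mode}")


files = [
    FileInfo("a.txt", mtime=50.0),   # unchanged since the full backup
    FileInfo("b.txt", mtime=150.0),  # changed after the full backup
    FileInfo("c.txt", mtime=250.0),  # changed after the last incremental backup
]

# Full backup taken at t=100, most recent backup of any kind at t=200.
for mode in ("full", "differential", "incremental"):
    print(mode, select_files(files, mode, last_full=100.0, last_backup=200.0))
```

In this example the full run copies all three files, the differential run copies b.txt and c.txt, and the incremental run copies only c.txt. Restoring from incrementals therefore needs the full backup plus every incremental since, while restoring from a differential needs only the full backup plus the latest differential.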

For or­gan­i­sa­tions and large en­ter­prises that are plan­ning on se­lect­ing backup soft­ware tools and tech­nolo­gies, this ar­ti­cle re­views the best open source tools. Be­fore choos­ing the best soft­ware or tool, users should eval­u­ate the fea­tures they pro­vide, with ref­er­ence to sta­bil­ity and open source com­mu­nity sup­port.

Ad­vanced open source stor­age soft­ware like Ceph, Glus­ter, ZFS and Lus­tre can be in­te­grated with some of the pop­u­lar backup tools like Bareos, Bac­ula, AMANDA and CloneZilla; each of these is de­scribed in de­tail in the fol­low­ing sec­tion.


Ceph

Ceph is one of the leading choices in open source software for storage and backup. It provides object storage, block storage and file system storage. It is popular largely because of its CRUSH algorithm, which frees storage clusters from the scalability and performance limitations imposed by centralised data table mapping. Ceph eliminates many tedious tasks for administrators by replicating and rebalancing data within the cluster, and is designed to deliver high performance and very high scalability.
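The idea of computing where data lives, rather than looking it up in a central table, can be illustrated with a short sketch. This is not the CRUSH algorithm; it uses rendezvous (highest-random-weight) hashing as a much simpler stand-in, and the node names and the place function are hypothetical.

```python
import hashlib


def score(object_name: str, node: str) -> int:
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{node}:{object_name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name, nodes, replicas=2):
    """Pick the nodes that should hold an object, with no lookup table.

    Every client that knows the node list computes the same answer.
    """
    ranked = sorted(nodes, key=lambda n: score(object_name, n), reverse=True)
    return ranked[:replicas]


nodes = ["osd-a", "osd-b", "osd-c", "osd-d"]  # hypothetical storage nodes
for name in ("backup-0001", "backup-0002", "backup-0003"):
    print(name, "->", place(name, nodes))
```

Because placement depends only on the object name and the node list, any client can locate data independently, and removing a node that does not hold a given object leaves that object's placement unchanged. The real CRUSH algorithm additionally takes the cluster hierarchy, device weights and placement rules into account.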

Ceph also has RADOS (reliable autonomic distributed object store), which provides the object, block and file system storage described earlier in a single unified storage cluster. The Ceph RBD backup script in the v0.1.1 release of ceph_rb provides a backup solution for Ceph. It was developed to back up specified storage pools rather than only individual images; it also supports retention dates and can implement a synthetic full backup schedule if needed.

Many organisations are now moving towards large scale object storage and take backups regularly. Ceph is a strong fit here, as it provides object storage management along with state-of-the-art backup. It also integrates with private cloud solutions like OpenStack, which helps in managing backups of data in the cloud.

The Ceph script can also archive data, re­move all the old files and purge all snap­shots. This trig­gers the creation of a new, full and ini­tial snap­shot.

OpenS­tack has a built-in Ceph backup driver, which is an in­tel­li­gent so­lu­tion for VM vol­ume backup and main­te­nance. This helps in tak­ing reg­u­lar and in­cre­men­tal back­ups of vol­umes to main­tain con­sis­tency of data. Along with Ceph backup, one can use a tool called CloudBerry for ver­sa­tile con­trol over Ceph based backup and re­cov­ery mech­a­nisms.

Ceph also has good support from the community and from large organisations, many of which have adopted it for storage and backup management and in turn contribute back to the community.

A lot of de­vel­op­ments and en­hance­ments are hap­pen­ing on a con­tin­u­ous ba­sis with Ceph. A num­ber of re­search or­gan­i­sa­tions have pre­dicted that Ceph’s adop­tion rate will in­crease in the fu­ture. Ceph also has cer­tain cost ad­van­tages in com­par­i­son with other soft­ware prod­ucts.

More in­for­ma­tion about the Ceph RBD script can be found at http://ob­sid­i­an­­dat­edCeph-Backup/.


Gluster

Red Hat’s Gluster is another open source, software-defined, scale-out backup and storage solution. It is also called RHGS (Red Hat Gluster Storage). It helps in managing unstructured data for physical, virtual and cloud environments. Its advantages are cost effectiveness and highly available storage that does not compromise on scale or performance.

RHGS has a useful feature called ‘snapshotting’, which takes ‘point-in-time’ copies of Red Hat Gluster Storage server volumes. This lets administrators easily revert to previous states of the data in case of any mishap.

Some of the ben­e­fits of the snap­shot fea­ture are:

Allows file and volume restoration with a point-in-time copy of Red Hat Gluster Storage volume(s)

Has little to no impact on the user or applications, regardless of the size of the volume when snapshots are taken

Supports up to 256 snapshots per volume, providing flexibility in data backup to meet production environment recovery point objectives

Creates a read-only volume that is a point-in-time copy of the original volume, which users can use to recover files

Allows administrators to create scripts to take snapshots of a supported number of volumes in a scheduled fashion

Provides a restore feature that helps the administrator return to any previous point-in-time copy

Allows the instant creation of a clone or a writable snapshot, which is a space-efficient clone that shares the back-end logical volume manager (LVM) with the snapshot
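A scheduled snapshot script has to stay within the 256-snapshot limit listed above, so it typically prunes the oldest snapshots before taking a new one. The following is a minimal sketch of that retention logic only; it does not call Gluster itself, and the snapshots_to_prune function, the snapshot names and the timestamps are hypothetical.

```python
MAX_SNAPSHOTS = 256  # per-volume limit quoted in the list above


def snapshots_to_prune(existing, keep=MAX_SNAPSHOTS - 1):
    """Return the names of the oldest snapshots to delete.

    existing -- list of (name, created_at) tuples for one volume
    keep     -- how many snapshots may remain; the default leaves
                room for one new snapshot under the limit
    """
    ordered = sorted(existing, key=lambda snap: snap[1])  # oldest first
    excess = max(0, len(ordered) - keep)
    return [name for name, _created_at in ordered[:excess]]


# Hypothetical volume that has accumulated 257 hourly snapshots.
existing = [(f"snap-{i:03d}", float(i * 3600)) for i in range(257)]
print(snapshots_to_prune(existing))
```

With 257 existing snapshots and the default of keeping 255, the two oldest (snap-000 and snap-001) are selected for deletion, after which a new snapshot can be taken without exceeding the limit.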

Bareos configured on GlusterFS has the advantage of being able to take incremental backups. One can create a ‘glusterfind’ session, which remembers the time when it was last synced or when processing was completed. For example, a backup application such as Bareos can run every day and get incremental results at each run.
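The session-based cycle can be sketched as follows. This assumes the glusterfind command line offers the create, pre and post subcommands with the argument order shown; check the help output of the installed version before relying on it. The session name, volume name and output path are hypothetical, and the sketch only builds and prints the commands rather than running them.

```python
import shlex


def glusterfind_commands(session, volume, outfile):
    """Build the argument lists for one glusterfind-based incremental cycle."""
    return {
        # Run once: registers the session and records its starting time.
        "create": ["glusterfind", "create", session, volume],
        # Each run: writes the changes since the last completed run to outfile.
        "pre": ["glusterfind", "pre", session, volume, outfile],
        # After a successful backup: marks the run complete, moving the session time forward.
        "post": ["glusterfind", "post", session, volume],
    }


commands = glusterfind_commands("nightly-backup", "gv0", "/tmp/changed-files.txt")
for step in ("create", "pre", "post"):
    print(step, "->", shlex.join(commands[step]))
```

A daily job would run the pre step, hand the resulting file list to the backup application, and run the post step only if the backup succeeded, so that a failed run is retried with the same change window.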

More de­tails on the RGHS snap­shot fea­ture can be found at­­files/st­glus­ter­stor­ages­nap­shot­tech­nol­ogy­over­view­inc0407879­201606­en.pdf.

The best open source backup software tools

AMANDA open source backup software

AMANDA, the Advanced Maryland Automatic Network Disk Archiver, is popular, enterprise grade open source backup and recovery software. According to the project, it runs on servers and desktop systems running Linux, UNIX, BSD, Mac OS X and MS Windows.

AMANDA comes in both an enterprise edition and an open source edition (though the latter may need some customisation). At the time of writing, the latest release of AMANDA Enterprise is 3.3.5.

It is one of the key backup tools deployed in government, database, healthcare and cloud based organisations across the globe.

AMANDA has a number of good features for tackling explosive data growth and providing high data availability. It is positioned as an alternative to complex and expensive backup and recovery software products.

Some of its ad­van­tages and fea­tures are:

Centralised management for heterogeneous environments (involving multiple OSs and platforms)

Powerful protection with simple administration

Wide platform and application support

Industry standard open source support and data formats

Low cost of ownership

Bareos (Backup Ar­chiv­ing Re­cov­ery Open Sourced)

Bareos is cross-network open source backup software that offers high data security and reliability. It emerged from the Bacula Project in 2010 and is being actively developed.

Bareos supports Linux/UNIX, Mac and Windows platforms, and provides both a Web GUI and a CLI.


Clonezilla

Clonezilla is a partition and disk imaging/cloning program, similar to commercial products such as Norton Ghost and True Image. It offers bare metal backup and recovery, and supports massive cloning with high efficiency in multi-node cluster environments.

Clonezilla comes in two vari­ants—Clonezilla Live and Clonezilla SE (Server Edi­tion). Clonezilla Live is suit­able for sin­gle ma­chine backup and re­store, and Clonezilla SE for mas­sive de­ploy­ment. The lat­ter can clone many (40 plus) com­put­ers si­mul­ta­ne­ously.


Duplicati

Designed to be used in a cloud computing environment, Duplicati is a client application for creating encrypted, incremental, compressed backups to be stored on a server. It works with public clouds like Amazon, Google Drive and Rackspace, as well as private clouds and networked file servers. It is compatible with Windows, Linux and Mac OS X.


FOG

Like Clonezilla, FOG is a disk imaging and cloning tool that can aid with both backup and deployment. It is easy to use, supports networks of all sizes, and includes other features like virus scanning, memory testing, disk wiping, disk testing and file recovery. It is compatible with Linux and Windows.


Fig­ure 5: Bareos ar­chi­tec­ture

Fig­ure 4: AMANDA ar­chi­tec­ture

Fig­ure 3: Glus­ter Stor­age cost ef­fec­tive­ness (Source: https://red­hat­stor­­hat. com/2016/11/03/idc-the-eco­nomics-of-soft­ware-de­fined-stor­age/)

Fig­ure 2: Ceph adop­tion rate (Source: https://sa­nen­thu­si­­age­data-cen­ter-tech-predictions-2016/)

Figure 1: Hard drive costs per GB of data (Source: http://www.mkomo.com/cost-per-gigabyte)
