OpenSource For You

The Best Tools for Backing Up Enterprise Data

To prevent a disastrous loss of data, regular backups are not just recommended but essential. Of the many open source tools available for this purpose, this article helps systems administrators decide which one is best for their systems.


Before discussing the need for backup software, it helps to know a little about the history of storage. In 1953, IBM recognised the importance and immediate application of what it called the ‘random access file’, which it described as offering high capacity with rapid random access to files. This led to the invention of what subsequently became the hard disk drive (HDD), developed at IBM’s San Jose, California laboratory. The disk drive created a new level in the computer data hierarchy, then termed random access storage and today known as secondary storage.

The commercial use of hard disk drives began in 1957 with the shipment of an IBM 305 RAMAC system that included IBM Model 350 disk storage, for which US Patent No. 3,503,060 was issued on March 24, 1970.

The year 2016 marked the 60th anniversary of the venerable hard disk drive (HDD). New computers are increasingly adopting SSDs (solid-state drives) for main storage, but HDDs remain the champions of low-cost, high-capacity data storage.

The cost per GB of storage has come down significantly over the years, thanks to a number of innovations and advanced techniques developed in HDD manufacturing. The graph in Figure 1 gives a glimpse of this.

The general assumption is that this cost will fall further. Since storing data now costs a fraction of what it did in the 1970s and ’80s, why should one back up data when new storage is so cheap? What are the advantages of having a backup?

Today, we generate a lot of data using gadgets like mobiles, tablets, laptops, handheld computers and servers. When we exceed the storage capacity of these devices, we tend to push this data to the cloud or take a backup to guard against future disasters. Many corporates and enterprise-level customers generate huge volumes of data, and for them, backups are critical.

Backing up data is very important. After taking a backup, we must also make sure that the data is secure and manageable, and that its integrity is not compromised. With these aspects in mind, many open source backup tools have been developed over the years.
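One common way to check that a backup’s integrity has not been compromised is to record a checksum for every file at backup time and re-check it later. The Python sketch below (standard library only) illustrates the idea; the function names and the choice of SHA-256 are illustrative assumptions, not taken from any of the tools discussed in this article.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a SHA-256 digest for every file under root, keyed by relative path."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose contents changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return sorted(bad)
```

For example, the manifest returned by `build_manifest()` for a (hypothetical) backup directory could be stored alongside the backup, and `verify()` run against that directory before a restore; an empty list means every file still matches.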

Data backup comes in different flavours: individual files and folders, whole drives or partitions, or full system backups. There is also the ‘smart’ method, which automatically backs up (syncs) files in commonly used locations, as well as the option of using cloud storage.

Backups can be scheduled and run as incremental, differential or full backups, as required.
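The difference between the three backup levels comes down to which point in time a run compares against. The following Python sketch assumes a hypothetical `files` dictionary mapping each path to its last-modified time; real backup tools also track deletions, permissions and metadata, which this ignores.

```python
from datetime import datetime


def files_to_back_up(files, level, last_full, last_backup):
    """Choose which files a backup run should copy.

    files: dict mapping path -> last-modified datetime (hypothetical input).
    last_full: when the most recent full backup ran.
    last_backup: when the most recent backup of any level ran.
    """
    if level == "full":
        # Full: copy everything, regardless of history.
        return sorted(files)
    if level == "differential":
        # Differential: everything changed since the last *full* backup.
        cutoff = last_full
    elif level == "incremental":
        # Incremental: only what changed since the last backup of *any* level.
        cutoff = last_backup
    else:
        raise ValueError(f"unknown backup level: {level!r}")
    return sorted(path for path, mtime in files.items() if mtime > cutoff)


if __name__ == "__main__":
    files = {
        "a.txt": datetime(2017, 5, 1),
        "b.txt": datetime(2017, 5, 3),
        "c.txt": datetime(2017, 5, 5),
    }
    last_full = datetime(2017, 5, 2)    # full backup on 2 May
    last_backup = datetime(2017, 5, 4)  # incremental on 4 May
    for level in ("full", "differential", "incremental"):
        print(level, files_to_back_up(files, level, last_full, last_backup))
```

With these dates, the differential run picks up `b.txt` and `c.txt`, while the incremental run picks up only `c.txt`. The trade-off shows up at restore time: a differential restore needs the last full backup plus the latest differential, whereas an incremental restore needs the full backup plus every incremental taken since.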

For organisations and large enterprises planning to select backup tools and technologies, this article reviews the best open source options. Before choosing a tool, users should evaluate the features it provides, along with its stability and the support available from its open source community.

Advanced open source storage software like Ceph, Gluster, ZFS and Lustre can be integrated with popular backup tools like Bareos, Bacula, AMANDA and Clonezilla. The following sections describe Ceph and Gluster, followed by a selection of the best open source backup tools.

Ceph

Ceph is one of the leading open source choices for storage and backup. It provides object storage, block storage and file system storage. It is very popular because of its CRUSH algorithm, which frees storage clusters from the scalability and performance limitations imposed by centralised data table mapping. Ceph eliminates many tedious tasks for administrators by replicating and rebalancing data within the cluster, and delivers high performance and massive scalability.
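The core idea behind CRUSH (Controlled Replication Under Scalable Hashing) is that any client can compute where an object lives instead of asking a central table. Real CRUSH walks a weighted, hierarchical cluster map and respects failure domains such as hosts and racks. The Python sketch below deliberately swaps in plain rendezvous (highest-random-weight) hashing, which only loosely resembles CRUSH’s straw-style bucket selection, just to show computed placement; the OSD names and replica count are hypothetical.

```python
import hashlib


def _score(object_name, osd):
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name, osds, replicas=3):
    """Compute which OSDs hold an object; no central lookup table is consulted."""
    if replicas > len(osds):
        raise ValueError("not enough OSDs for the requested replica count")
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    osds = [f"osd.{i}" for i in range(8)]
    print(place("vm-disk-0001", osds))
    # Removing an OSD only moves the replicas that were stored on it.
    survivors = [o for o in osds if o != "osd.3"]
    print(place("vm-disk-0001", survivors))
```

Because each object ranks the OSDs independently, removing an OSD only relocates the replicas that lived on it, which is the kind of minimal data movement CRUSH is also designed to achieve.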

Ceph also has RADOS (Reliable Autonomic Distributed Object Store), which provides the object, block and file system storage described above in a single unified storage cluster. For backups, the ceph_rbd_bck.sh script (release v0.1.1) provides an RBD backup solution for Ceph. It was written to back up specified storage pools rather than only individual images; it also supports date-based retention and can implement a synthetic full backup schedule if needed.
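The ceph_rbd_bck.sh script itself is not reproduced here. As a rough illustration of how pool-level RBD backups are commonly built from standard `rbd` subcommands (`snap create`, `export`, and `export-diff --from-snap`), the Python sketch below only plans and prints the commands for each image rather than running them. The pool name, date-based snapshot naming scheme and `/backup` destination are hypothetical.

```python
import shlex


def plan_rbd_backup(pool, images, snap, previous=None, dest="/backup"):
    """Return the rbd commands for one backup run over several images in a pool.

    snap / previous: snapshot names for this run and the last run (a
    hypothetical date-based naming scheme). With no previous snapshot the
    run exports a full image; otherwise it exports only the changes.
    """
    commands = []
    for image in images:
        spec = f"{pool}/{image}"
        target = f"{dest}/{pool}-{image}-{snap}"
        # Freeze a point-in-time view of the image first.
        commands.append(["rbd", "snap", "create", f"{spec}@{snap}"])
        if previous is None:
            commands.append(["rbd", "export", f"{spec}@{snap}", target + ".img"])
        else:
            # Only extents that changed between the two snapshots are written.
            commands.append(["rbd", "export-diff", "--from-snap", previous,
                             f"{spec}@{snap}", target + ".diff"])
    return commands


if __name__ == "__main__":
    # In practice the image list could come from `rbd ls <pool>`.
    for cmd in plan_rbd_backup("rbd", ["vm1", "vm2"], "2017-05-02",
                               previous="2017-05-01"):
        print(shlex.join(cmd))
```

Keeping the planning separate from execution makes the schedule easy to review or dry-run before handing the commands to `subprocess.run`.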

Many organisations are now moving towards large-scale object storage and take backups regularly. Ceph is a strong fit for them, as it provides object storage management along with state-of-the-art backup. It also integrates with private cloud solutions like OpenStack, which helps in managing backups of data in the cloud.

The Ceph backup script can also archive data, remove all old backup files and purge all snapshots. Purging the snapshots triggers the creation of a new initial full snapshot.
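Starting a fresh full snapshot is one way to restart a backup chain; a synthetic full backup, mentioned earlier as an option of the script, is another. It is assembled from an existing full backup plus the diffs taken since, so the cluster does not have to be read in full again. The Python sketch below models images as hypothetical dictionaries mapping block index to bytes, with `None` marking a discarded block. It shows the merge logic only, not the on-disk format of `rbd export-diff` files (for real diff files, `rbd merge-diff` combines two consecutive diffs).

```python
def synthetic_full(full, diffs):
    """Replay a chain of diffs (oldest first) onto a full image.

    full: dict mapping block index -> bytes.
    diffs: list of dicts; a value of None means the block was discarded.
    """
    image = dict(full)  # leave the original full backup untouched
    for diff in diffs:
        for block, data in diff.items():
            if data is None:
                image.pop(block, None)
            else:
                image[block] = data
    return image


if __name__ == "__main__":
    full = {0: b"boot", 1: b"data-v1"}
    diffs = [{1: b"data-v2"}, {2: b"log", 0: None}]
    print(synthetic_full(full, diffs))
```

Once the synthetic full exists and has been verified, the diffs it absorbed can be archived or removed under the retention policy.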

OpenStack’s Block Storage service (Cinder) has a built-in Ceph backup driver, which offers an efficient way to back up and maintain VM volumes. It supports regular and incremental volume backups, helping to maintain data consistency. Alongside Ceph’s own backup features, a tool called CloudBerry can be used for finer control over Ceph-based backup and recovery.
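As a rough illustration, the Cinder backup settings that the Ceph documentation lists for this driver can be rendered into a `cinder.conf` fragment with Python’s standard `configparser`. The pool name, Ceph user name and the choice of the `[DEFAULT]` section are assumptions to check against the documentation for your OpenStack release.

```python
import configparser
import io


def cinder_backup_conf(pool="backups", user="cinder-backup",
                       ceph_conf="/etc/ceph/ceph.conf"):
    """Render the Ceph backup-driver settings as cinder.conf text."""
    config = configparser.ConfigParser()
    config["DEFAULT"] = {
        "backup_driver": "cinder.backup.drivers.ceph.CephBackupDriver",
        "backup_ceph_conf": ceph_conf,
        "backup_ceph_user": user,  # Ceph client used by cinder-backup
        "backup_ceph_pool": pool,  # pool that receives the backups
        "backup_ceph_chunk_size": str(128 * 1024 * 1024),  # 128 MiB chunks
        "restore_discard_excess_bytes": "true",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(cinder_backup_conf())
```

With the backup service running on such settings, an incremental backup of an existing volume can be requested with `openstack volume backup create --incremental <volume>`.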

Ceph also has good support from the community and from large organisations, many of which have adopted it for storage and backup management and, in turn, contribute back to the community.

Ceph is being developed and enhanced on a continuous basis. A number of research organisations have predicted that Ceph’s adoption rate will increase in the future (see Figure 2). Ceph also has certain cost advantages compared with other software products.

More information about the Ceph RBD script can be found at http://obsidiancreeper.com/2017/04/03/UpdatedCeph-Backup/.

Gluster

Red Hat’s Gluster, also called Red Hat Gluster Storage (RHGS), is another open source, software-defined, scale-out backup and storage solution. It helps manage unstructured data in physical, virtual and cloud environments. Its advantages are cost effectiveness (see Figure 3) and highly available storage that does not compromise on scale or performance.

RHGS has a useful feature called ‘snapshotting’, which takes point-in-time copies of Red Hat Gluster Storage server volumes. This lets administrators easily revert to previous states of data in case of any mishap.

Some of the benefits of the snapshot feature are:

Allows file and volume restoration with a point-in-time copy of Red Hat Gluster Storage volume(s)

Has little to no impact on users or applications when snapshots are taken, regardless of the size of the volume

Supports up to 256 snapshots per volume, providing flexibility in data backup to meet production environment recovery point objectives

Creates a read-only volume that is a point-in-time copy of the original volume, which users can use to recover files

Allows administrators to create scripts to take snapshots of a supported number of volumes in a scheduled fashion

Provides a restore feature that helps the administrator return to any previous point-in-time copy

Allows the instant creation of a clone or a writable snapshot, which is a space-efficient clone that shares the back-end logical volume manager (LVM) with the snapshot
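A scheduled snapshot job of the kind described in the list above might look like the following Python sketch. It plans (but does not run) calls to the `gluster snapshot delete` and `gluster snapshot create` CLI commands, deleting the oldest snapshots first so that the volume stays under the 256-snapshot limit. The snapshot naming scheme and the `existing` input (which in practice could be parsed from `gluster snapshot list <volume>`) are hypothetical.

```python
from datetime import datetime

SNAPSHOT_LIMIT = 256  # per-volume limit quoted above


def plan_snapshot(volume, existing, now, limit=SNAPSHOT_LIMIT):
    """Return gluster CLI calls for one scheduled snapshot of `volume`.

    existing: this volume's snapshot names, oldest first.
    """
    commands = []
    # Free enough slots for the new snapshot, oldest first.
    excess = len(existing) + 1 - limit
    for old in existing[:max(excess, 0)]:
        # --mode=script skips the interactive confirmation prompt.
        commands.append(["gluster", "--mode=script", "snapshot", "delete", old])
    name = f"{volume}-{now:%Y%m%d-%H%M}"
    # no-timestamp stops gluster appending its own timestamp to the name.
    commands.append(["gluster", "snapshot", "create", name, volume, "no-timestamp"])
    return commands


if __name__ == "__main__":
    print(plan_snapshot("vol1", [], datetime(2017, 5, 1, 2, 0)))
```

Run from cron or a systemd timer, such a job gives a rolling window of point-in-time copies that administrators can restore from.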

Bareos configured on GlusterFS has the advantage of being able to take incremental backups. One can create a ‘glusterfind’ session that remembers when the volume was last synced or when processing last completed. For example, your backup application (Bareos) can run every day and get incremental results on each run.
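The idea behind a glusterfind session is to keep a checkpoint for each backup consumer and report only what changed since that checkpoint. glusterfind splits this into a `pre` step, which produces the list of changed files, and a `post` step, which advances the session once the backup has succeeded. The Python sketch below mimics that split using file modification times; it is an illustration under that assumption, not how glusterfind is implemented internally, and the `ChangeSession` class is hypothetical.

```python
from datetime import datetime


class ChangeSession:
    """Remember when a backup consumer last completed, like a glusterfind session."""

    def __init__(self):
        self.last_run = None  # checkpoint of the last successful backup
        self._pending = None  # start time of the run in progress

    def pre(self, files, now):
        """List files changed since the checkpoint (all files on the first run).

        files: dict mapping path -> last-modified datetime.
        """
        self._pending = now
        if self.last_run is None:
            return sorted(files)
        # >= is deliberately conservative: a file touched exactly at the
        # checkpoint is copied again rather than risk missing it.
        return sorted(p for p, mtime in files.items() if mtime >= self.last_run)

    def post(self):
        """Advance the checkpoint; call only after the backup succeeded."""
        if self._pending is None:
            raise RuntimeError("post() called without a preceding pre()")
        self.last_run = self._pending
        self._pending = None


if __name__ == "__main__":
    session = ChangeSession()
    files = {"/data/a.log": datetime(2017, 5, 1, 10)}
    print(session.pre(files, datetime(2017, 5, 1, 23)))  # first run: every file
    session.post()
```

The checkpoint is the time at which `pre()` started, so files modified while a backup is in progress are picked up again on the next run, and a failed backup (no `post()`) leaves the old checkpoint in place.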

More details on the RHGS snapshot feature can be found at https://www.redhat.com/cms/managedfiles/stglusterstoragesnapshottechnologyoverviewinc0407879201606en.pdf.

The best open source backup software tools

AMANDA open source backup software

AMANDA, the Advanced Maryland Automatic Network Disk Archiver (https://amanda.zmanda.com/), is popular, enterprise-grade open source backup and recovery software. According to the AMANDA project, it runs on servers and desktop systems running Linux, UNIX, BSD, Mac OS X and MS Windows.

AMANDA comes in both an enterprise edition and an open source edition (though the latter may need some customisation). At the time of writing, the latest AMANDA Enterprise release was 3.3.5.

It is one of the key backup tools deployed in government, database, healthcare and cloud-based organisations across the globe.

AMANDA has a number of good features for tackling explosive data growth and ensuring high data availability. It also helps in managing backup and recovery, a task otherwise often handled by complex and expensive software products.

Some of its advantages and features are:

Centralised management for heterogeneous environments (involving multiple OSs and platforms)

Powerful protection with simple administration

Wide platform and application support

Industry standard open source support and data formats

Low cost of ownership

Bareos (Backup Archiving Recovery Open Sourced)

Bareos offers high data security and reliability, along with cross-network open source backup software. It emerged from the Bacula project in 2010 and is actively developed.

Bareos supports Linux/UNIX, Mac and Windows platforms, and offers both a web GUI and a CLI.

Clonezilla

Clonezilla is a partition and disk imaging/cloning program, similar to commercial products such as Norton Ghost and True Image. Its features include bare metal backup and recovery, and it supports massive cloning with high efficiency in environments with many cluster nodes.

Clonezilla comes in two variants: Clonezilla Live and Clonezilla SE (Server Edition). Clonezilla Live is suitable for single-machine backup and restore, while Clonezilla SE is meant for massive deployment and can clone many (40 plus) computers simultaneously.
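Clonezilla keeps images small by saving only the blocks that a file system actually uses, relying on tools such as partclone for this. The Python sketch below illustrates the same space-saving idea in a much cruder way: it treats a disk as a byte string and simply skips all-zero blocks, with a hypothetical 4 KiB block size. A real imaging tool reads the file system’s allocation information instead, so it can also skip unused blocks that still contain stale data.

```python
BLOCK_SIZE = 4096  # hypothetical block size for the sketch


def make_image(disk, block_size=BLOCK_SIZE):
    """Return (size, blocks) where blocks maps index -> data for non-zero blocks."""
    blocks = {}
    for offset in range(0, len(disk), block_size):
        chunk = disk[offset:offset + block_size]
        if any(chunk):  # all-zero blocks are left out of the image
            blocks[offset // block_size] = chunk
    return len(disk), blocks


def restore_image(size, blocks, block_size=BLOCK_SIZE):
    """Rebuild the full disk contents from a sparse image."""
    disk = bytearray(size)  # start from an all-zero target
    for index, data in blocks.items():
        start = index * block_size
        disk[start:start + len(data)] = data
    return bytes(disk)


if __name__ == "__main__":
    disk = bytes(3 * BLOCK_SIZE) + b"boot-sector" + bytes(2 * BLOCK_SIZE)
    size, blocks = make_image(disk)
    print(f"{len(blocks)} of {-(-size // BLOCK_SIZE)} blocks stored")
    assert restore_image(size, blocks) == disk
```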

Duplicati

Designed for cloud computing environments, Duplicati is a client application for creating encrypted, incremental, compressed backups to be stored on a server. It works with public clouds like Amazon, Google Drive and Rackspace, as well as private clouds and networked file servers. It is compatible with Windows, Linux and Mac OS X.
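Duplicati’s incremental backups work at the level of blocks rather than whole files: data is split into blocks, and only blocks not already stored on the backend are uploaded. The Python sketch below shows that idea with content hashes and `zlib` compression. The block size and the in-memory `store` dictionary are hypothetical stand-ins for Duplicati’s remote storage, and the encryption step is omitted because Python’s standard library has no block cipher.

```python
import hashlib
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size for the sketch


def backup(data, store, block_size=BLOCK_SIZE):
    """Store the blocks of `data` that `store` lacks; return (manifest, new_blocks).

    store: dict mapping block hash -> compressed block (stand-in for the server).
    manifest: ordered list of block hashes needed to rebuild `data`.
    """
    manifest, new_blocks = [], 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        key = hashlib.sha256(block).hexdigest()
        if key not in store:
            # A real client would encrypt the compressed block before upload.
            store[key] = zlib.compress(block)
            new_blocks += 1
        manifest.append(key)
    return manifest, new_blocks


def restore(manifest, store):
    """Reassemble data from its manifest of block hashes."""
    return b"".join(zlib.decompress(store[key]) for key in manifest)


if __name__ == "__main__":
    store = {}
    monday = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
    tuesday = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # second block edited
    _, first = backup(monday, store)
    manifest, second = backup(tuesday, store)
    print(first, second)  # the unchanged first block is not stored again
    assert restore(manifest, store) == tuesday
```

This sketch uses fixed-size blocks, so inserting bytes near the start of a file would shift every later block and defeat deduplication for that file; the trade-off keeps the hashing simple.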

FOG

Like Clonezilla, FOG is a disk imaging and cloning tool that can aid with both backup and deployment. It’s easy to use, supports networks of all sizes, and includes other features like virus scanning, memory testing, disk wiping, disk testing and file recovery. Operating systems compatible with it include Linux and Windows.


Figure 1: Hard drive costs per GB of data (Source: http://www.mkomo.com/cost-per-gigabyte)

Figure 2: Ceph adoption rate (Source: https://sanenthusiast.com/top-5-storagedata-center-tech-predictions-2016/)

Figure 3: Gluster Storage cost effectiveness (Source: https://redhatstorage.redhat.com/2016/11/03/idc-the-economics-of-software-defined-storage/)

Figure 4: AMANDA architecture

Figure 5: Bareos architecture
