Open Source Storage Solutions You Can Depend On
Storage space is at a premium, with terabytes and petabytes of data being generated almost daily by modern living. Open source storage solutions can help ease the storage problems of individuals as well as small and large scale enterprises.
We have all observed a sudden surge in the production of data in the recent past, and this will undoubtedly increase in the years ahead. Almost all the applications on our smartphones (like Facebook, Instagram, WhatsApp, Ola, etc) either generate data in different forms like text and images, or depend on data to work. With around 2.32 billion smartphone users across the globe (as per the latest data from statista.com), each having installed multiple applications, this adds up to a huge amount of data every day. Apart from this, there are other sources of data as well, like Web applications, the sensors and actuators used in IoT devices, process automation plants, etc. All this makes it a real challenge to store such massive amounts of data in a manner that allows it to be used as and when needed.
We all know that our businesses cannot get by without storing data. Sooner or later, even small businesses need space for data storage: for documents, presentations, e-mails, images, audio files, databases, spreadsheets, etc, which are the lifeblood of most companies. Besides, many organisations also have confidential information that must not be leaked or accessed by unauthorised people, in which case security becomes one of the most important aspects of any data storage solution. In critical healthcare applications, an organisation cannot afford to run out of storage space, so capacity needs to be monitored continuously.
Storing different kinds of data and managing its storage is critical to any company’s behind-the-scenes success. When we look for a solution that covers all our storage needs, the possibilities seem quite endless, and many of them are likely to consume our precious IT budgets. This is why we cannot afford to overlook open source data storage solutions. Once you dive into the open source world, you will find a huge array of solutions for almost every problem or purpose, which includes storage as well.
Reasons for the growth in the data storage solutions segment
Let’s check out some of the reasons for this:
1. Various government regulations, like Sarbanes-Oxley, require businesses to maintain and keep a backup of different types of data which they might otherwise have deleted.
2. Many of the small businesses have now started archiving different e-mail messages, even those dating back five or more years for various legal reasons.
3. Also, the pervasiveness of spyware and viruses requires backups and that again requires more storage capacity.
4. There has been a growing need to back up and store different large media files, such as video, MP3, etc, and make the same available to users on a specific network. This is again generating a demand for large storage solutions.
5. Each newer version of any software application or operating system demands more space and memory than its predecessor, which is another reason driving the demand for large storage solutions.
Different types of storage options
There are different types of storage solutions that can be used based on individual requirements, as listed below.
Flash memory thumb drives: These drives are particularly useful to mobile professionals since they consume little power, are small enough to fit on a keychain and have no moving parts. You can connect a Flash memory thumb drive to your laptop’s Universal Serial Bus (USB) port and back up files from the system. Some USB thumb drives also provide encryption to protect files in case the drive gets lost or is stolen. Flash memory thumb drives also let us store our Outlook data (like recent e-mails or calendar items), bookmarks from Internet Explorer, and even some desktop applications. That way, you can leave your laptop at home and just plug the USB drive into a borrowed computer to access your data elsewhere.
External hard drives: An inexpensive and relatively simple way to add more storage is to connect an external hard drive to your computer. However, external hard disk drives that are directly connected to PCs have several disadvantages. Any file stored only on the drive and nowhere else still needs to be backed up. Also, if you travel for work and need access to some of the files on an external drive, you will have to take the drive with you or remember to copy the required files to your laptop’s internal drive, a USB thumb drive, a CD or some other storage medium. Finally, in case of a fire or other catastrophe at your place of business, your data will not be completely protected if it’s stored on an external hard drive.
Online storage: There are different services which provide remote storage and backup over the Internet. Such services offer businesses a number of benefits. By backing up your most important files to a highly secure remote server, you protect the data stored at your place of business. You can also easily share large files with your clients, partners or others by providing them with password-protected access to your online storage service, which eliminates the need to send those large files by e-mail. And in most cases, you can log into your account from any system using a Web browser, which is a great way to retrieve files when you are away from your PC. On the downside, remote storage can be a bit slow, especially during an initial backup session, and is only as fast as your network’s connection to that storage. For extremely large files, you may require higher speed network access.
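To get a feel for why the initial backup session can be slow, here is a minimal back-of-the-envelope sketch. The function name, the 500GB backup size and the 80 per cent link efficiency are assumptions chosen purely for illustration; real throughput depends on the service and the network.

```python
def upload_hours(size_gb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Rough estimate of the hours needed to upload size_gb gigabytes.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable after protocol overhead and contention.
    """
    size_megabits = size_gb * 1000 * 8          # decimal gigabytes -> megabits
    effective_mbps = uplink_mbps * efficiency   # usable megabits per second
    return size_megabits / effective_mbps / 3600


if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        print(f"500GB over {mbps} Mbps: about {upload_hours(500, mbps):.1f} hours")
```

Under these assumptions, a first backup of 500GB takes several days on a 10 Mbps uplink but only a little over an hour on a gigabit link.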
Network attached storage: Network attached storage (NAS) provides fast, reliable and simple access to data in any IP networking environment. Such solutions are quite suitable for small or mid-sized businesses that require large volumes of economical storage which can be shared by multiple users over a network. Given that many small businesses lack IT departments, it helps that this kind of storage solution is easy to deploy and can be managed and consolidated centrally. It can be as simple as a single hard drive with an Ethernet port or even built-in Wi-Fi connectivity.
More sophisticated NAS solutions can also provide additional USB as well as FireWire ports, enabling you to connect external hard drives to scale up the overall storage capacity. A NAS solution can also offer print-server capabilities, which let multiple users easily share a single printer. A NAS solution may also include multiple hard drives in a Redundant Array of Independent Disks (RAID) Level 1 array. Such a storage system contains two or more equivalent hard drives (for example, two 250GB drives) in a single network-connected device. Files written to the first (main) drive are automatically written to the second drive as well. This automated redundancy means that if the first hard drive dies, we will still have access to all our applications and files on the second drive. NAS solutions can also offload the serving of files from other servers on your network, which improves their performance. A NAS system allows you to consolidate storage, hence increasing efficiency and reducing costs. It simplifies storage administration, data backup and recovery, and also allows for easy scaling to meet growing storage needs.
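The mirroring behaviour of RAID Level 1 can be illustrated with a small toy model. This is only a sketch of the idea, not how a real RAID controller or NAS firmware is implemented; the class and method names are made up for this example, and each drive is modelled as a simple in-memory dictionary.

```python
class MirroredVolume:
    """Toy RAID 1 model: every write goes to all healthy drives."""

    def __init__(self, drive_count=2):
        if drive_count < 2:
            raise ValueError("mirroring needs at least two drives")
        # Each drive is modelled as a dict mapping block number -> bytes.
        self.drives = [{} for _ in range(drive_count)]
        self.failed = set()

    def write(self, block, data):
        # The same data is written to every drive that is still healthy.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[block] = data

    def fail_drive(self, index):
        # Simulate a dead drive: its contents are no longer readable.
        self.failed.add(index)
        self.drives[index].clear()

    def read(self, block):
        # Serve the read from the first healthy drive.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                return drive[block]
        raise IOError("all drives in the mirror have failed")


if __name__ == "__main__":
    volume = MirroredVolume()
    volume.write(0, b"payroll.xls")
    volume.fail_drive(0)      # the first (main) drive dies
    print(volume.read(0))     # the data is still served from the second drive
```

Note that the usable capacity of such a mirror is that of a single drive, since every drive holds the same data.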
Choosing the right storage solution
There are a number of storage solutions available in the market, which meet diverse requirements. At times, you could get confused while trying to choose the right one. Let’s get rid of that confusion by considering some of the important aspects of a storage solution.
Scalability: This is one of the important factors to be considered while looking for any storage solution. In distributed storage systems, storage capacity can be added in two ways. The first way involves adding disks or replacing the existing disks with ones that have higher storage capacity (also called ‘scaling up’). The other method involves adding nodes (also called ‘scaling out’). Whenever you add hardware, you increase the whole system’s performance as well as its capacity.
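The difference between the two approaches can be sketched with a small capacity model. The `Node` and `Cluster` classes and the disk sizes below are hypothetical and only track raw capacity; they ignore replication overhead and performance, which vary from one product to another.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    disks_tb: list[float] = field(default_factory=list)

    @property
    def capacity_tb(self) -> float:
        return sum(self.disks_tb)


@dataclass
class Cluster:
    nodes: list[Node] = field(default_factory=list)

    @property
    def capacity_tb(self) -> float:
        return sum(node.capacity_tb for node in self.nodes)

    def scale_up(self, node_index: int, disk_tb: float) -> None:
        """Scaling up: add one more disk to an existing node."""
        self.nodes[node_index].disks_tb.append(disk_tb)

    def scale_out(self, disks_tb: list[float]) -> None:
        """Scaling out: add a whole new node with its own disks."""
        self.nodes.append(Node(list(disks_tb)))


if __name__ == "__main__":
    cluster = Cluster([Node([4.0, 4.0]), Node([4.0, 4.0])])
    cluster.scale_up(0, 8.0)         # same number of nodes, more capacity
    cluster.scale_out([4.0, 4.0])    # one more node, more capacity
    print(len(cluster.nodes), cluster.capacity_tb)
```

In this model, scaling up leaves the node count unchanged, while scaling out adds a node that brings its own disks, and in a real system its own CPU and network ports as well.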
Performance: This is what we look for while choosing any storage solution. One cannot afford to compromise on the performance of any storage solution, as this may directly impact the performance of the application that uses the given storage solution. Flexible scalability allows users to increase the capacity and performance independently as per their needs and budget.
Reliability: We all look for resources that can be relied upon for a long period of time, and this is the case even when searching for a storage solution.
Affordability: Since budget and pricing are important, an open source storage solution is a good option because it is available free of cost. This is an important factor for small businesses that cannot afford to spend much just for storage solutions.
Availability: Sometimes, data held in a storage solution is not available when an application tries to fetch it, for example, because of a disk failure. We all want to avoid such situations; data should be readily available whenever it is accessed.
Simplicity: Even the most advanced storage solutions come with management interfaces that are as good as or better than the traditional storage units. All such interfaces show details about each node, capacity allocation, alerts, overall performance, etc. This is a significant factor to be considered while choosing a storage solution.
Support: Last but not least, there should be support from the manufacturer or from a group of developers, including support for applications. Support is essential if you plan on installing your database, virtual server farm, e-mail or other critical information on the storage solution. You must make sure that the manufacturer offers the level of support you require.
Some of the available open source storage solutions
Here’s a glance at some of the good open source solutions available.
OpenStack: OpenStack is basically a cloud operating system which controls large pools of compute, storage and networking resources throughout a data centre, all of which are managed using a dashboard that gives administrators control while empowering users to provision resources through a Web interface. The OpenStack Object Storage service provides software that stores and retrieves data over HTTP. Objects (blobs of data) are stored in an organisational hierarchy which offers anonymous read-only access, ACL-defined access, or even temporary access. Object Storage supports multiple token-based authentication mechanisms that are implemented via middleware.
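To make ‘stores and retrieves data over HTTP’ a little more concrete, the sketch below builds (but does not send) an HTTP request that would upload one object. It assumes the URL layout and the `X-Auth-Token` header of the Object Storage (Swift) API; the endpoint, account, container, object name and token are all hypothetical placeholders, and a real token would first have to be obtained from the authentication service.

```python
import urllib.parse
import urllib.request


def build_upload_request(storage_url, container, object_name, data, token):
    """Build an HTTP PUT request that would store `data` as one object.

    The request is only constructed here, never sent.
    """
    url = "{}/{}/{}".format(
        storage_url.rstrip("/"),
        urllib.parse.quote(container),
        urllib.parse.quote(object_name),
    )
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "X-Auth-Token": token,  # token-based authentication
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    request = build_upload_request(
        "https://swift.example.com/v1/AUTH_demo",  # hypothetical endpoint and account
        "backups",                                 # hypothetical container
        "report.pdf",                              # hypothetical object name
        b"file contents",
        "hypothetical-token",
    )
    print(request.get_method(), request.full_url)
```

Retrieving the same object would be a GET request to the same URL, which is what makes this kind of storage easy to use from almost any programming language.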
Ceph: This is a distributed object store and file system designed to provide high performance, scalability and reliability. It is built on the Reliable Autonomic Distributed Object Store (RADOS), and allows enterprises to build their own economical storage systems using commodity hardware. It has been maintained by Red Hat since its acquisition of Inktank in April 2014. It is capable of storing blocks, files and objects. It is a scale-out system, which means that multiple Ceph storage nodes together form a single storage system which can handle many petabytes of data, and adding nodes increases performance and capacity simultaneously. Ceph has many of the basic enterprise storage features, which include replication, thin provisioning, snapshots, auto-tiering and self-healing capabilities.
RockStor: This is a free and open source NAS solution. Its Personal Cloud Server is a very powerful local alternative to public cloud storage, which mitigates the cost and risks associated with public cloud storage.
This network attached and cloud storage platform is quite suitable for small to medium businesses as well as home users who do not have much IT experience but need to scale up to terabytes of data storage. If users are more interested in Linux and Btrfs, it is a great alternative to FreeNAS.
This cloud storage platform can be managed even within a LAN or over the Web using a very simple and intuitive user interface. And with the inclusion of add-ons (named ‘Rockons’), you can extend the feature set to include different new applications, servers and services.
Kinetic Open Storage: Backed by companies like Seagate, EMC, Toshiba, Cisco, Red Hat, NetApp, Dell, etc, Kinetic is a Linux Foundation project which is dedicated to establishing standards for a new kind of object storage architecture. It is designed especially to meet the need for scale-out storage for unstructured data. Kinetic is basically a way for storage applications to communicate directly with storage devices over Ethernet. Most of the storage use cases targeted by Kinetic involve unstructured data, as found in Hadoop, NoSQL databases and other distributed file systems, as well as in cloud object stores such as Amazon S3, Basho’s Riak and OpenStack Swift.
Storj DriveShare and MetaDisk: Storj is a new type of cloud storage which is built on peer-to-peer and blockchain technology. It offers decentralised and end-to-end encrypted cloud storage. The DriveShare application allows users to rent out their unused hard drive space so that it can be used by the service. The MetaDisk Web application allows users to save their files to the service securely. The core protocol handles peer-to-peer negotiation and verification of the storage contracts. Providers of storage are referred to as ‘farmers’ and those using the storage are called ‘renters’. Renters can periodically audit farmers to check that they are still keeping their files safe and secure. Conversely, farmers can decide to stop storing a specific file if its owners do not pay for and audit their services on time. Files are cut up into smaller pieces called ‘shards’, which are stored three times redundantly by default. The network can automatically find a new farmer and move data if copies become unavailable. The system puts measures in place to prevent renters and farmers from cheating each other, for instance, by manipulating the auditing process. Storj offers several advantages over many traditional cloud based storage solutions. As data is encrypted and cut into shards at the source, there is almost no chance of unauthorised third parties accessing the data. And because data storage is distributed, availability and download speed increase.
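The ideas of sharding, threefold redundancy and auditing can be illustrated with a simplified sketch. This is not the actual Storj protocol: the function names, the round-robin placement and the salted-hash audit are assumptions made for this example, farmers are modelled as in-memory dictionaries, and the encryption step is left out for brevity.

```python
from __future__ import annotations

import hashlib


def split_into_shards(data: bytes, shard_size: int) -> list[bytes]:
    """Cut the data into consecutive pieces of at most shard_size bytes."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def place_shards(shards: list[bytes], farmers: dict[str, dict[int, bytes]],
                 copies: int = 3) -> dict[int, list[str]]:
    """Store each shard on `copies` distinct farmers (round-robin placement)."""
    names = sorted(farmers)
    if copies > len(names):
        raise ValueError("need at least as many farmers as copies")
    placement = {}
    for index, shard in enumerate(shards):
        chosen = [names[(index + k) % len(names)] for k in range(copies)]
        for name in chosen:
            farmers[name][index] = shard
        placement[index] = chosen
    return placement


def audit_response(shard: bytes, challenge: bytes) -> str:
    """Answer an audit: hash of a renter-chosen challenge plus the shard.

    The renter computes the expected answer before uploading; a farmer
    can only reproduce it while still holding the unmodified shard.
    """
    return hashlib.sha256(challenge + shard).hexdigest()


if __name__ == "__main__":
    farmers = {f"farmer{i}": {} for i in range(5)}
    shards = split_into_shards(b"quarterly-report", 4)
    placement = place_shards(shards, farmers)

    challenge = b"renter-chosen-nonce"
    expected = audit_response(shards[0], challenge)
    holder = placement[0][0]
    print(audit_response(farmers[holder][0], challenge) == expected)
```

If a farmer loses or alters a shard, its answer to the challenge no longer matches the expected value, and the renter knows that copy can no longer be trusted.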
Figure 1: Qualities of NAS solutions (Image source: googleimages.com)
Figure 2: Main services and components of OpenStack (Image source: googleimages.com)
Figure 3: Architecture for the Ceph storage solution (Image source: googleimages.com)
Figure 4: Ten year Data centre revenue forecast (Image source: googleimages.com)