OpenSource For You

DEMYSTIFYING STORAGE VIRTUALISATION

Virtualisation of a data centre is incomplete without storage virtualisation, which has turned into a top priority for today's CIOs and data centre administrators. Fortunately, Linux has integrated storage virtualisation well with the kernel and user space.


Storage is the most expensive aspect of a data centre. Compared to server and networking technologies, standardisation in storage has lagged behind. This makes storage management an administrator's nightmare. Storage is also the slowest component and can lead to performance bottlenecks. Storage virtualisation is all about shielding applications from where the data is stored, how it is stored, and the type of storage used.

The rationale and benefits of storage virtualisation are the same as those of server virtualisation. Server and desktop virtualisation have been the primary focus of data centre administrators and CIOs for more than a decade. Storage virtualisation has become mainstream only in the last few years.

In enterprises, information storage needs are increasing at the rate of 30 per cent annually. The lack of a clear picture regarding how resources are used results in data centres operating at only 30-40 per cent of their full capacity. The cost of increasing storage capacity is not restricted to the hardware purchased; it also includes additional operational costs for power, cooling and administration overheads. The principal financial motivation for storage virtualisation is to reduce costs without degrading performance and without additional complexity.

Virtualisation technologies like thin provisioning, data deduplication and automated tiering maximise storage utilisation and also enhance application throughput. Virtualisation also enhances the interoperability between vendor products. Though storage virtualisation technologies have advanced significantly over the last few years, logical volume managers (LVM) have been around for more than a decade.

Different types of storage virtualisation

Virtualisation abstracts the physical storage devices and presents them as logical storage units to the application. The abstraction layer could be present in the host, the network or the storage device.

Host/server-based virtualisation through Logical Volume Managers (LVM): The Logical Volume Manager is a software layer between the file system and the operating system. This layer shields the complexity that lies below. Physically, the data might be stored on one or multiple disks, depending on the size and the RAID technology used. Even a file of just 10 MB may span more than one disk. These operations can take a toll on the server's performance if they are all software based. Business-critical applications that need higher performance use methods such as RAID with hardware assistance from storage vendors.

To understand the LVM concept, we must first familiarise ourselves with the concepts of Physical Volumes, Volume Groups and Logical Volumes. A Physical Volume (PV) is either a full hard disk or a portion of one. A Volume Group (VG) is formed by combining one or more PVs. Logical Volumes (LVs) reside on VGs, and the file systems are created on them. The interesting part of Logical Volumes is that we can create a file system that is larger than the largest hard disk in the pool.
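The PV/VG/LV relationship described above can be illustrated with a minimal Python sketch. The device names and sizes here are hypothetical, and this is a model of the capacity pooling, not the real LVM tools:

```python
# Conceptual sketch of the LVM hierarchy: Physical Volumes are pooled into
# a Volume Group, and Logical Volumes are carved out of the pooled capacity.

pvs = {"/dev/sda1": 80, "/dev/sdb1": 120}   # Physical Volumes (sizes in GB)

# A Volume Group combines one or more PVs into a single pool of capacity.
vg_capacity = sum(pvs.values())              # 200 GB pool

# A Logical Volume can be larger than any single disk in the pool,
# as long as it fits within the Volume Group as a whole.
lv_size = 150
assert lv_size > max(pvs.values())           # bigger than the largest disk
assert lv_size <= vg_capacity                # but must fit in the VG

print(f"VG capacity: {vg_capacity} GB, LV size: {lv_size} GB")
```

This is exactly why a file system on an LV can span more than one disk: the LV sees only the pooled VG capacity, not the individual disk boundaries.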

LVM has been supported since the 2.4.x series of the Linux kernel, so it is available on most popular versions of Linux such as Ubuntu, Red Hat and Linux Mint. There are two versions of LVM: LVM1 and LVM2. Between the two, LVM2 is more popular and depends on a module called the Device Mapper (see Figure 3). The Enterprise Volume Management System (EVMS) also depends on the Device Mapper for the basic function of mapping a block device. On Linux, dmsetup, a command line wrapper to access Device Mapper functionality, is available.

Network-based virtualisation: Technologies such as Storage Area Networks (SANs) make hard disks and tapes available to multiple servers. Almost all enterprise operating systems have the ability to connect to a SAN, either over IP or over Fibre Channel. This also simplifies the management of storage hardware from different vendors.

SCSI has been a standard protocol for computing devices to communicate with peripherals. The same communication has been extended to run over the well-established Internet protocol suite, TCP/IP. This is iSCSI, the low-cost alternative to Fibre Channel. Due to the inherent computational overhead of TCP/IP, iSCSI communication can place a high load on the CPU. To overcome this limitation, some vendors came up with TCP Offload Engines. One of the open source implementations of the iSCSI network protocol is the open-iscsi project, a high-performance implementation of RFC 3720.

Storage device virtualisation: This is where the latest technology updates are happening. Storage vendors are bringing innovative solutions to market at a rapid pace. Virtualisation on storage devices can be at the block level or the file level. Block-level virtualisation can be found in intelligent disk subsystems. The servers access the storage through the I/O channels by means of LUN masking and RAID. The common protocols that run between the server and storage are SCSI, FCoE and iSCSI.

Virtualisation at the file level is available via NAS servers, which take over the responsibility of file system management. Some of the free or open source NAS products are Openfiler, FreeNAS and CryptoNAS. Openfiler has been covered in a series that started in the August 2011 issue of LINUX For You. FreeNAS has extensive features for networking, services, drive management and monitoring. It also has advanced features such as snapshots and thin provisioning. CryptoNAS has been developed with disk encryption as its focus. It comes in two flavours: CryptoNAS-Server and CryptoNAS-CD. CryptoNAS was earlier called CryptoBox.

There are several advantages to storage device virtualisation. Servers are freed from the CPU-intensive tasks of virtualisation operations, and applications perform better as RAID operations are done by the storage controllers. The administration of storage devices can be done in isolation and nearer to the physical devices. Heterogeneous (multi-vendor) storage devices can be deployed to get the best of each vendor's technology. While this is an advantage, the proprietary features of some vendors' products can sometimes make setting up a solution an uphill task.

Storage virtualisation in enterprise environments: In a data centre, when you migrate from physical servers to virtual servers, you gain on CPU utilisation. Physical servers that were utilised in the range of 20 to 60 per cent are now consolidated as VMs on a single physical machine, maximising server utilisation. Higher server utilisation also means that the read-write access density on the storage increases. Addressing this challenge is what decides the success of storage virtualisation.

In enterprise environments, there is the constant challenge of ensuring the performance of business-critical applications and meeting the increasing demand for vast storage capacity, while keeping overall costs as low as possible.

Hierarchical storage management: Hierarchical storage management (also known as data lifecycle management) is a concept in which the most recent data is stored in storage subsystems that are quick to access, on the types of media that offer the fastest access. As the data ages, it can be moved to archival systems that take longer to retrieve from (see Figure 5). By doing this, data centre designers can optimise the price-performance-capacity equation. Hierarchical storage management was in operation even before virtualisation came into force; by virtualising the storage, it becomes simpler to completely automate this process.

In an enterprise, applications are tiered based on business criticality. Applications such as Enterprise Resource Planning and Customer Order Management are the most important and most demanding. They need storage types such as SSDs to meet their application needs.

Applications that assist in business decisions, such as analytics, could form the second tier; these would still need low-latency storage, but the media type to be used would be based on a cost-benefit analysis. Enterprise applications such as Content Management Systems could fall into a category where capacity is a higher priority than I/O performance. Such data is typically stored on media such as disks with RAID capabilities. Email is an application type that could fall into both the business-critical and the capacity-intensive categories.

For statutory reasons and also to maintain the history of business, some data needs to be archived. The need for retrieving such data is rare and a slight delay in its retrieval does not adversely impact business. Such data is typically archived either on tapes or virtual tapes.

Let us now understand some key technologies that drive data centres to full virtualisation.

Data deduplication

Deduplication is the process of eliminating redundancy in data, thus reducing the number of storage devices needed. This has an impact not only on capacity, but also on network bandwidth when data is backed up over the network. There are two types of deduplication: inline and post-process. Inline deduplication kicks in before data is written to the disk. Post-process deduplication analyses the data at a later time, without interfering with the write process. Each has its own advantages and disadvantages.
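The inline variant can be sketched in a few lines of Python. This is a simplified model of the idea (fixed example blocks, content-hash lookup before every write), not any particular product's implementation:

```python
import hashlib

def inline_dedup_store(blocks):
    """Inline deduplication sketch: hash each incoming block before it is
    written; store a block's data only the first time its hash is seen."""
    store = {}                         # physical store: hash -> block data
    layout = []                        # logical view: hashes in write order
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:        # new content: actually write it
            store[digest] = block
        layout.append(digest)          # duplicates only add a reference
    return layout, store

blocks = [b"alpha", b"beta", b"alpha", b"alpha"]
layout, store = inline_dedup_store(blocks)
print(len(layout), "logical blocks,", len(store), "physical blocks")
# Four logical blocks map onto only two physical blocks, and the original
# data can still be reconstructed from the layout.
```

Post-process deduplication would run the same hashing pass over data already on disk, trading extra temporary capacity for an undisturbed write path.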

Thin provisioning

Traditional provisioning requires the full disk space to be available when configuring capacity for an application. There is no definite way to overcome the dilemma of how much extra space needs to be allocated for future growth, so administrators typically allocate storage space using a wild guess. This full allocation not only makes unused capacity unavailable to other applications, but also increases the operational costs of keeping the storage devices running.

In contrast, thin provisioning allows administrators to allocate disk space that is even greater than the capacity currently available in the data centre. This works especially well across the storage configurations of multiple applications. It is just like the over-subscription of network bandwidth that telecom operators practise, or what insurance companies do. The actual allocation of disk space happens just-in-time, so there is no hogging of unused disk space by a few applications.

Thin provisioning does come with a few limitations. One scenario for which it is not recommended is when applications expect the data to be in contiguous blocks for I/O optimisation.
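The over-subscription idea can be made concrete with a small Python sketch. The pool size, volume names and sizes are invented for illustration; real thin pools track allocation per block, not as a single counter:

```python
class ThinPool:
    """Thin-provisioning sketch: logical allocations may oversubscribe the
    physical pool; physical space is consumed only when data is written."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.used_gb = 0
        self.volumes = {}              # name -> provisioned (logical) size

    def provision(self, name, logical_gb):
        # Logical allocation can exceed remaining physical capacity:
        # this is the over-subscription that thin provisioning permits.
        self.volumes[name] = logical_gb

    def write(self, name, gb):
        if self.used_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.used_gb += gb             # just-in-time physical allocation

pool = ThinPool(physical_gb=100)
pool.provision("app-a", 80)
pool.provision("app-b", 80)            # 160 GB provisioned on a 100 GB pool
pool.write("app-a", 30)
pool.write("app-b", 20)
print(pool.used_gb, "GB physically used of", pool.physical_gb)
```

The sketch also shows the operational risk: if the applications' actual writes approach their provisioned sizes, the administrator must add physical capacity before the pool is exhausted.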

Automatic storage tiering

Automatic storage tiering, or auto tiering, is built on the concept of hierarchical storage management. Auto tiering is the dynamic movement of information between disks of different types to meet performance, cost and capacity goals. The movement from one level to another is triggered by policies set by the storage administrator.
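A policy of this kind can be sketched as a simple rule over access frequency. The tier names, thresholds and workload names below are illustrative assumptions, not taken from any product:

```python
# Auto-tiering sketch: place each data item on a tier according to an
# administrator-set policy based on recent access counts.

def retier(access_counts, hot=100, warm=10):
    """Assign each item to a tier: hot data to the fastest media,
    cold data to cheap, slow archival media."""
    placement = {}
    for item, count in access_counts.items():
        if count >= hot:
            placement[item] = "ssd"        # Tier 1: fastest, most expensive
        elif count >= warm:
            placement[item] = "sas"        # Tier 2: balanced cost/latency
        else:
            placement[item] = "archive"    # Tier 3: capacity-oriented
    return placement

print(retier({"erp-db": 500, "reports": 40, "old-logs": 1}))
# The ERP database lands on SSD, the reporting data on the middle tier,
# and the rarely touched logs on archival media.
```

A real implementation would re-run such a policy periodically and migrate the data in the background, which is the "dynamic movement" the text describes.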

Multi-tenancy

Multi-tenancy is about ensuring that the data being accessed by an application has a boundary that allows access only to the applications that have permissions. This enables the same storage to be used by different applications within a single enterprise, or by different enterprises, without compromising data security. Multi-tenancy has the most utility in the cloud environment, where a service provider hosts data services for different companies in the same industry.

Future trends

Two prominent trends that will emerge in 2012 are the adoption of SSD/flash as part of the Tier-1 storage layer and increased storage utilisation. Another trend that could extend beyond 2012 is the convergence of servers and storage devices into single systems.

Figure 1: The server virtualisation and storage virtualisation layers
Figure 2: Three locations of storage virtualisation (courtesy: SNIA Virtualisation taxonomy)
Figure 3: Block diagram of LVM2, EVMS and dependency on Device Mapper in Linux
Figure 4: Network-based virtualisation
Figure 5: The applications pyramid based on the demand for storage performance or capacity
