THIS IS A RAID
It’s out of the clean rooms and into our homes, but should you be running RAID? And if so, which type? By Chris Lloyd
In 1980, IBM’s 3380 hard drive cost $81,000 for the 2.5GB model. For the same money, you could walk into your local GM dealer, and buy five Cadillac Eldorados. The huge cost of high-speed storage provided the impetus to develop something that could match these monsters for less—enter the Redundant Array of Inexpensive Disks, RAID’s original acronym.
The idea was simple: couple together a number of smaller and cheaper drives to build a larger, faster, and more reliable unit. Data is spread across the disks to introduce a level of redundancy. The concept and the odd patent was knocking about in the ’70s, but the first hardware didn’t arrive until 1983 (a simple mirrored system from DEC). The term RAID wasn’t coined until 1987, and it wasn’t until the 1990s that it found favor and started to be used in earnest, before becoming the staple of server rooms. The early systems didn’t prove as inexpensive as hoped, either, so the acronym was quietly changed from “Inexpensive” to “Independent.”
Hard drives have now become much, much cheaper, and adding RAID to your home rig is well within reach. You’ll find software support in Windows and Linux, along with basic hardware support built into most half-decent motherboards. The cost can range from next to nothing, if you have a couple of old drives on the shelf, to $1,000 and upward, if you’re after some serious storage. The first question to ask yourself is: Is it worth it? Quickly followed by: What sort of RAID should you employ?
RAID comes in various numbered levels, standardized by the Storage Networking Industry Association. These define the way the data is stored across the disks. Each level has its strengths and weaknesses. So, let’s introduce the main contenders currently in widespread use.
RAID Level 0
Data is striped across drives—blocks of data are written in parallel on two or more disks. These disks need to be the same size, although unused space can be put to work elsewhere with some clever partitioning. As with all levels, the RAID array appears as a single volume. This level is built for speed; in practice, on a typical setup, a two-disk RAID 0 gives you about a 50 percent increase in raw data transfer rates, although access times remain about the same. The size of each data block—the stripe size—affects performance. Options range from 4KB to 128KB. Generally, the larger, the faster, although larger stripes increase potential wastage. A smaller stripe size can pay off if you have a larger number of disks, or lots of small files. By and large, a compromise somewhere in the middle is best. In theory, each disk you add makes the array incrementally faster. However, there are a number of other factors that come into play, with bandwidth limits and bottlenecks elsewhere in your system, so expect diminishing returns. RAID 0 has no parity or redundancy—if a disk dies, the whole thing is fried, and the more disks you use, the greater that risk. If you place performance above all else, RAID 0 delivers it simply.

RAID Level 1
This level uses mirroring. Blocks of data are copied to multiple disks at once—generally two, unless you really do expect to be unlucky. These need to be of equal size, and overall capacity is halved. If a drive fails, you’ve still got a full copy of the data, so after you’ve replaced the failed drive and copied across the data, you’re back where you started. You can continue to use the array even with one drive down. Performance is pretty much the same as for a single drive, despite any theoretical gains. Read speeds can be increased, but this is down to how the controller and OS work; Windows can struggle to see much difference. RAID 1 is really aimed at data safety done simply.
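The block-to-disk mapping behind striping and mirroring is simple enough to sketch in a few lines of Python. This is a toy model of the layout, not any real controller's implementation:

```python
def raid0_location(block: int, ndisks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, stripe row) in a RAID 0 array."""
    return block % ndisks, block // ndisks

def raid1_locations(block: int, ndisks: int = 2) -> list[tuple[int, int]]:
    """In RAID 1, every block lands at the same offset on every mirror."""
    return [(disk, block) for disk in range(ndisks)]

# Logical blocks 0..5 on a two-disk RAID 0 alternate between the disks,
# which is where the parallel-transfer speedup comes from:
layout = [raid0_location(b, 2) for b in range(6)]
print(layout)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

The alternating pattern also shows the fragility: every file bigger than one stripe has blocks on every disk, so losing any disk loses part of everything.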
RAID Level 5
Now it starts to get a little more complicated. This level uses striping, as Level 0, but adds distributed parity into the mix, to allow for a disk failure. Data blocks are striped as in RAID 0, and a block of parity data is calculated and written to a separate physical drive. You need a minimum of three disks, and you effectively lose one drive to the parity data. The parity blocks are distributed across the physical drives. If a drive fails, its contents can be rebuilt from the remaining data and parity blocks. It’s worth noting that this is far from quick—it can easily take hours or even days to rebuild an array with modern drives. You still have access to your data, even with one drive down. Data read speeds are good, but the need to calculate the parity drags down write speeds. RAID 5 is a compromise between security and performance. Capacity reduction is less than for a straight mirrored system, and read speeds are better. It doesn’t work well with a large number of small random write operations, however. This level used to be recommended for applications where data security was paramount, but times have changed. More on that later.
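The parity trick is plain XOR: the parity block is the XOR of the data blocks in the stripe, so any one missing block can be recomputed from the rest. A minimal sketch, with short byte strings standing in for disk blocks:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks in one stripe
parity = xor_blocks(data)            # written to the stripe's parity drive

# Disk 1 dies: rebuild its block from the survivors plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

Rebuilding a real array means doing this for every stripe on the disk, which is why the process takes so long, and why writes are slow: every write means reading or recalculating the stripe's parity too.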
RAID Level 6
Much like RAID 5, but with double parity. It requires a minimum of four drives. Data is striped to two drives, then the parity data is written to the remaining two drives. No matter how many drives you have, two are effectively lost to parity data. Like RAID 5, the read speeds are good, but write speeds are compromised. The big advantage over RAID 5 is that a RAID 6 system can recover from the loss of two drives at once. You have access to the data, even while a failed drive is being rebuilt, which, like RAID 5, can be painfully slow. It’s the RAID of choice for any system that needs to be up and running at all times, because drives can be swapped out while it’s still operating.
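Under the hood, the second parity block is typically not a second XOR but a Reed-Solomon-style sum over the finite field GF(2^8), which is what lets the array solve for two missing blocks at once. A simplified sketch with one byte per data drive (real controllers work on whole blocks, and this illustrates the principle rather than any specific implementation):

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11B."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def gf_pow(a: int, n: int) -> int:
    r = 1
    while n:
        if n & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        n >>= 1
    return r

def gf_inv(a: int) -> int:
    return gf_pow(a, 254)  # a^254 is a's inverse in GF(2^8)

data = [0x12, 0x34, 0x56, 0x78]          # one byte per data drive
P, Q = 0, 0
for k, d in enumerate(data):
    P ^= d                               # P parity: plain XOR, as RAID 5
    Q ^= gf_mul(gf_pow(2, k), d)         # Q parity: each drive weighted differently

# Drives 1 and 3 fail. XOR the survivors out of P and Q...
i, j = 1, 3
A = P ^ data[0] ^ data[2]                                    # = d1 ^ d3
B = Q ^ gf_mul(gf_pow(2, 0), data[0]) ^ gf_mul(gf_pow(2, 2), data[2])
# ...then solve the resulting two-equation system for the missing bytes.
dj = gf_mul(gf_inv(gf_pow(2, i) ^ gf_pow(2, j)), gf_mul(gf_pow(2, i), A) ^ B)
di = A ^ dj
print(di == data[i] and dj == data[j])  # True
```

The different weights in Q are the whole point: with two independent equations per stripe, two unknowns (two dead drives) are recoverable; a second XOR parity would only ever give one.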
RAID Level 10
Also known as RAID 1+0. A nested array, it’s effectively a combination of striping and mirroring. It requires a minimum of four drives, divided into two groups. Data is striped across the groups, and mirrored within each group. Capacity is always halved, no matter how many drives you add. RAID 10 can survive the loss of one drive, and without the need to rebuild from parity data, rebuild times are good. A second failure is not always fatal—unless it takes out the mirror of the first failed drive. RAID 10 offers the advantages of RAID 0 speeds, and improved security over RAID 1. It is good with large files and streaming. The biggest drawback is the loss of half the capacity, however many drives you add.
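The nested layout can be sketched the same way as the others: stripe across mirror pairs, copy within them. Again, a toy model of the addressing, not a real controller's layout:

```python
def raid10_locations(block: int, npairs: int) -> list[tuple[int, int]]:
    """Return every (disk, offset) holding a block in a RAID 1+0 array.

    Disks 2k and 2k+1 form mirror pair k; blocks are striped across pairs.
    """
    pair = block % npairs
    offset = block // npairs
    return [(2 * pair, offset), (2 * pair + 1, offset)]

# Four disks = two mirror pairs; block 3 lives on both disks of pair 1:
print(raid10_locations(3, 2))  # [(2, 1), (3, 1)]
```

Since every block exists on exactly two disks, a rebuild is just a straight copy from the surviving mirror—no parity math, hence the fast rebuild times.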
More RAIDs
That’s not all. There are dozens of non-standard and proprietary levels of RAID. These include other nested variations, such as RAID 5+0 and 1+0+0, and versions limited to certain OSes, filesystems, or specific hardware. However, they are all pretty much variations on the existing levels. RAID 60, for example, stripes data across two RAID 6 groups. It increases performance due to striping, and doubles the number of disks you can lose before a full data loss. However, you need a minimum of eight disks, and you lose half of that minimum capacity to parity data. There is also an interesting non-standard array called RAID-1E, which combines mirroring and striping, making a copy of each data block—useful for a three-drive array.
WHICH DO I WANT?
If speed is your thing, RAID 0 is the fastest. Others may come close, but for consistent maximum performance, it is the best. You also get to use all the drive capacity you’ve bought. The downside is the increased risk. Any unrecoverable sector is just that: gone for good. If any drive in the array fails, you’ve lost everything—time to format, and start again. The more disks in the array, the higher the chance of failure. It does come into its own when used for very large temporary files, such as those created by image and video editors. You can run these from a fast RAID 0 array safely, because any lost data is easily reconstructed. However, given the performance of modern SSDs, you are far better off investing in those first, rather than a RAID of spinners, unless you really need the capacity.
RAID 0 too risky? RAID 1 offers simple redundancy, but capacity is halved, and performance gains are minimal. If the surviving drive fails while you’re rebuilding its mirror, your data is gone. However, RAID 1 is nice and simple, and you only need two drives. If security is important, you can do better. RAID 5 used to be the default choice, but things have moved on: RAID 5 can recover from a single drive failure, but it has become increasingly vulnerable to multiple failures.
For good data security, RAID 6 is preferable. That extra redundancy means you can have a drive go down, and the array can still survive an unrecoverable read error on a second drive. Sounds unlikely? Modern hard drives are reliable beasts, but even with a tiny chance of error on any one bit, the cumulative effect makes a slim chance a real risk when spread over multiple large disks. A typical drive is rated at one unrecoverable read error per 10^14 bits read; the best manage one per 10^15. Plug those numbers into some math, and the chance of hitting an error while rebuilding a multi-terabyte RAID 5 array quickly climbs past single-digit percentages. RAID 5 no longer offers the level of security it was designed to give, and can’t be recommended. RAID 6 means the loss of an extra drive to parity data—but think about how catastrophic full data loss would be. Given current drive prices, it’s worth losing the capacity to be safe.
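That "plug it into some math" step is short: if each bit read fails independently with probability 10^-14, the chance of at least one unrecoverable error while reading back N terabytes during a rebuild is 1 - (1 - 10^-14)^bits. A back-of-the-envelope sketch (the independence assumption is itself a simplification—real errors cluster):

```python
import math

def p_ure_during_rebuild(terabytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    the surviving drives back during a rebuild."""
    bits = terabytes_read * 1e12 * 8
    # 1 - (1 - p)^n, computed stably via log1p/expm1 for tiny p and huge n
    return -math.expm1(bits * math.log1p(-ure_per_bit))

for tb in (1, 4, 8):
    print(f"{tb} TB read -> {p_ure_during_rebuild(tb):.0%} chance of a URE")
```

Reading back 8TB of survivors at a 10^-14 error rate gives close to a coin flip's chance of the rebuild hitting an error, which is exactly why RAID 5 has fallen out of favor for big arrays, and why the 10^-15-rated drives are worth the premium there.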
That said, RAID 6 is due to hit the same problems as RAID 5 in a few years’ time, given the rate at which drives are getting bigger. Triple-parity RAID isn’t far away now, although rebuild and data scrubbing times may make it cumbersome in practice. We could soon see the end of traditional parity-based redundancy.
That leaves us with RAID 10, which combines the best bit of RAID 0, the speed, with the best bit of RAID 1, the redundancy. It lacks the data security of parity checking, but is a good compromise between safety and speed. You do lose half the capacity, though. Which do you want? It depends on what you need, but if security is key, it has to be RAID 6. Apart from specialist applications, RAID 10 looks good—drives are cheap enough to stomach the capacity loss for the extra security and speed.
THE HARDWARE
A full-on RAID controller includes a dedicated processor, called a RoC (RAID-on-Chip), cache memory, and more. Many consumer motherboards have RAID support, but the abilities and quality vary, because they lack the dedicated hardware, pushing RAID functions onto the rest of the board, the processor, and software. You’ll find the motherboard is limited in some ways: only supporting RAID 0, 1, and 5; using whole drives only; no support for hot-swapping; and more. Disparagingly called “fake RAID” by the denizens of server rooms, these controllers are usually less flexible than software RAID, without the benefits of a full hardware implementation.
If you’re thinking of using your motherboard’s controller, read the specifications carefully. These simpler hardware controllers work well enough in a home rig, but it’s worth knowing the limitations before you start. If you are going for the security of RAID 6, a more capable controller needs to be in your budget. It can get confusing working out exactly how capable a hardware RAID is, but the price of a controller should be a big clue. If it’s tens of dollars, it isn’t a full hardware implementation. A “proper” PCIe hardware RAID controller can run into the hundreds.
Running RAID in software sounds like it’ll compromise performance, but processors are fast enough now, and if you run without parity, a software RAID can be light on CPU load. Depending on your OS, it can be very flexible, too—Linux, take a bow.
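On Linux, the software route usually means the kernel's md driver, managed with mdadm. A minimal sketch of building a two-drive mirror—the device names are placeholders for your own blank disks, and creating the array destroys anything on them:

```shell
# Create a two-drive RAID 1 array from two blank disks (hypothetical device names)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# Watch the initial sync progress, then put a filesystem on the array
cat /proc/mdstat
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt/raid
```

Swapping `--level=1` for `--level=5` or `--level=10` (with enough drives) is all it takes to try the other layouts, which is the flexibility hardware controllers struggle to match.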
SATA or SAS? Depending on your controller, you’ll have a choice. Mixing types is frowned upon. SAS drives are sold as “enterprise” drives, and are technically more competent, and theoretically faster. However, benefiting from the extra performance would mean building a serious hardware RAID. SAS drives come in at around $100 for 1TB. Best to go SATA, where $50 buys you 1TB, and $150 gets you 5TB. Don’t forget to budget for an enclosure and cables if you plan on a large RAID.
Full hardware RAID systems expect TLER hard drives: Time Limited Error Recovery. If a drive encounters a problem—reallocating a bad block, for example—it can take a few seconds or longer to resolve. However, your RAID controller is likely to lose patience first, and mark the whole drive as having failed. TLER drives spend seven seconds trying to fix the problem, then send a warning message to the controller, and defer further attempts. This enables the controller and drive to co-operate more effectively to repair problems. Without it, you might replace good disks needlessly. If the drive has the word “Enterprise” in its name, TLER is enabled. Otherwise, you have to check it’s compatible, and turn it on in firmware. If you’re running a software or simple hardware RAID, you don’t have to worry about TLER.
A simple rule of thumb is to use as many drives as possible. Due to the parallel nature of RAID, the more drives you have, the faster it is. If you are using parity, more drives mean that rebuild times are shorter, as each drive holds less of the parity data. The situation for SSDs is reversed, as the higher capacity drives are faster; although, again, it’s worth checking the specifications.

The HP 462919 offers capabilities comparable to most consumer motherboards.
If a drive does go down, you’ll probably want a replacement quickly. Best to be prepared, and have an extra drive ready and waiting. Remember: RAID really wants matched drives—not just the same capacity, but identical ones. If you don’t have one ready, you might have to compromise, or spend time looking for a decent match.
SHOULD YOU?
Many RAID systems can recover from the loss of a disk; some more than one. Don’t let this beguile you into thinking it’s a safe place for important data. It isn’t a substitute for a proper backup. A power spike can still kill the whole rack. And you have no incremental backup, so if you overwrite an important file, it’s gone as effectively as on a failed drive. You’re still putting all your eggs in one basket, albeit a more robust one. You still need an effective backup.
Just because you can, it doesn’t follow that you should. Hardware RAID can get expensive and complicated fairly quickly. This is a technology that has come down from the server rooms, and can be daunting. Think what you want to achieve and why; RAID might not be the answer. In its simplest form—using Windows 10 Storage Spaces to configure a couple of drives, for example—it is painless enough, but from then on, it can get tricky fast. It’s no magic bullet, either—there’s no such thing as completely secure storage or fault-free operation. Still interested? Good. Done right, the performance can be intoxicating. Just don’t get carried away.