HDDs up, this is a RAID!
Handy admin tips for locating those many drives in your redundant arrays.
Sometimes novice administrators don’t understand that cloud disk performance management and optimisation can be an art form. Some cloud providers limit individual disk I/O performance to ensure decent performance is available to all.
As an example, in Azure, as disk sizes increase, I/O allowances don’t scale linearly. Those cloud vendors want you to buy the much more expensive SSD rather than the cheaper HDD. Disks can get very expensive as performance and size increase.
Fear not – there is a way around this issue that keeps performance up and costs down: using the Linux mdadm command (software RAID). This enables the administrator to stripe a group of less performant disks together and get the combined performance of those individual disks. In effect, it creates a RAID 0 stripe (striping with no redundancy) using software RAID – normally the loss of one disk would mean losing the whole array, but as we'll see, that matters less in the cloud.
As an example, rather than having one 1TB disk limited to 60MB/s and 120 IOPS, you can use mdadm to merge four 256GB disks, which have more generous allowances per GB, and quadruple the performance for just a few dollars more – and substantially less than the SSD equivalent. This can be the difference between needing an expensive SSD and getting acceptable performance from a standard HDD.
The great thing about cloud disks is that the real physical disks underlying the cloud are (or should be!) already redundant at the hardware level, so there is little risk in using RAID 0 (no disk redundancy) in the cloud VM. A word of caution: use this only for data disks, not boot disks. Striping a boot disk may be possible, but it's unnecessary because the boot disk should only contain the essentials – and, as previously mentioned, the disks are already redundant at the real hardware level.
Implementing such a useful feature in Linux is quite straightforward. Using Azure as an example, create an Azure VM with a boot disk and four additional data disks. Once the VM is up and running, log in via SSH and run sudo lsblk to make sure those four additional disks are present and not already mounted or otherwise in use.
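A sketch of that check (assuming the data disks appear as /dev/sdb through /dev/sde – actual device names vary by distribution and VM type):

```shell
# List whole disks with their size, type and any mountpoint
sudo lsblk -d -o NAME,SIZE,TYPE,MOUNTPOINT

# A disk that is safe to use should carry no existing
# filesystem signature; blkid exits non-zero in that case
sudo blkid /dev/sdb || echo "/dev/sdb appears unused"
```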
With that list of disks, creating the new RAID 0 device becomes quite straightforward, as we'll explain.
Once you’ve verified which disks are the data disks, use the command below to create the virtual disk:
$ sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
/dev/sdb /dev/sdc /dev/sdd /dev/sde
Using this creates /dev/md0 – a virtual RAID disk that is made up of our data disks /dev/sdb, /dev/sdc, /dev/sdd and /dev/sde. Let’s ensure the array is automatically assembled during boot time with:
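Before putting data on the array, it's worth confirming the stripe assembled cleanly – a quick sanity check along these lines:

```shell
# The kernel's summary of all md arrays and their state
cat /proc/mdstat

# Detailed view of the array: level, chunk size, member disks
sudo mdadm --detail /dev/md0
```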
$ sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
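Note that the fstab entry assumes the array already has a filesystem and a mount point – neither exists yet on a fresh array. A minimal sketch, matching the ext4 type and /mnt/md0 path used in this article:

```shell
# Create an ext4 filesystem across the striped array
sudo mkfs.ext4 /dev/md0

# Create the directory it will be mounted on
sudo mkdir -p /mnt/md0
```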
Finally, edit fstab to add the array so that it automounts on boot. Use sudo nano /etc/fstab and add:
/dev/md0 /mnt/md0 ext4 defaults,nofail,discard 0 0
Once the disk is created, the administrator can use it in the same way as any other physical disk, because it shows up as /dev/md0. We suggest using LVM to manage it appropriately. Using LVM makes it trivial to set up custom disk sizing for different uses.
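If you do go the LVM route, /dev/md0 becomes a physical volume rather than being formatted directly. A sketch – the volume group and logical volume names here are arbitrary examples, not requirements:

```shell
# Turn the RAID device into an LVM physical volume
sudo pvcreate /dev/md0

# Create a volume group (here called "vgdata") on it
sudo vgcreate vgdata /dev/md0

# Carve out a 100GB logical volume and format it
sudo lvcreate -L 100G -n lvapp vgdata
sudo mkfs.ext4 /dev/vgdata/lvapp
```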
Finally, and very importantly, when setting up the filesystems to mount at boot, make sure the fstab file is valid by running sudo mount -a. This command rereads the fstab file, checks that the entry is valid and mounts everything listed in it. This step is very important because if the fstab file contains an error, it may well prevent booting into multi-user mode on reboot, which means SSH won't be up and fixing it becomes a pain.
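Once sudo mount -a returns cleanly, a quick confirmation that the array landed where expected:

```shell
# Show the mounted filesystem, its size and free space
df -h /mnt/md0

# Or query the mount table directly for the mountpoint
findmnt /mnt/md0
```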
It may or may not be worth your effort – it depends on the workload versus cost – but it can be useful for production scenarios where you need performance at the best possible cost. A last word of caution: each Azure SKU has a finite number of data disks it supports. Something to bear in mind, but balance that against paying a few extra dollars to quadruple the performance.