Data protection must change in the virtualization age
Backup has never been as easy or reliable as it should be, and virtualization just makes it worse
My first job in the storage market was working for a backup software manufacturer in the late '80s. We were one of the first companies to advocate backing up data across a network to an automated tape library. For a lot of reasons, it was very complicated and failed more often than it worked. Fast-forward almost 25 years, and the data protection process still doesn't work very well. Why is this the case? Why is backup the job that nobody wants? Why do you even have to think about backups? Great questions. While backup software and the devices we back up to have made significant improvements in capabilities and ease of use, they still fail almost as often as they work.
RAPID FILE GROWTH
The single biggest challenge to the backup process is not the size but the quantity of data. The sheer amount of data you have to protect is clearly an issue, but it's one we have dealt with for years. Addressing it means constantly upgrading network connections, as well as deploying faster backup storage devices.
The bigger challenge is the number of files that need to be protected. We used to warn customers about servers that had millions of files; that is now commonplace. Now we warn customers about billions of files. Backing up these servers via the file-by-file copy method common in legacy backup systems is almost impossible. In many cases, it takes longer to walk the file system than it does to actually copy the files to the backup target.
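The file-by-file approach described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual implementation; the function name is hypothetical. The point is that the walk itself issues a directory read plus a metadata lookup for every file, so with millions or billions of files the enumeration phase can dominate the total backup time no matter how fast the target storage is.

```python
import os
import shutil

def file_by_file_backup(source_root, target_root):
    """Legacy-style backup: walk the entire tree, copying one file at a time.

    Every file costs a stat() and an open/close on top of the data transfer,
    which is why walking a huge file system can take longer than the copy.
    """
    copied = 0
    for dirpath, dirnames, filenames in os.walk(source_root):
        rel = os.path.relpath(dirpath, source_root)
        dest_dir = os.path.join(target_root, rel)
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            shutil.copy2(src, os.path.join(dest_dir, name))  # data + metadata
            copied += 1
    return copied
```

Block-level and image-based backup methods avoid this per-file overhead entirely, which is one reason they have displaced the file walk for very large stores.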
RAPID SERVER GROWTH
Virtualization of servers, networks, storage, and just about everything else brings significant flexibility to datacenter operations. It has also led to the creation of an “app” mentality among users and line-of-business owners. Everything is an app now, and that means yet another virtual machine created in the virtual infrastructure. The growth rate of VMs within an organization after a successful virtualization rollout is staggering. All of these VMs, or at least most of them, must be protected. While most, if not all, backup solutions have resolved the issue of in-VM backups, few are dealing with the massive growth of VM count. Often each VM needs to be its own job, and that means managing potentially hundreds of jobs.
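The job-management problem above is easy to see in miniature. The sketch below is hypothetical (the function name is mine, not from any backup product): with one job per VM, the scheduler's real work becomes capping how many jobs run at once so the network and backup target aren't overwhelmed.

```python
def batch_vm_jobs(vm_names, max_concurrent):
    """Group one-job-per-VM backups into batches of at most max_concurrent.

    With hundreds of VMs, the administrator is no longer tuning individual
    jobs but managing waves of them against a fixed backup window.
    """
    return [vm_names[i:i + max_concurrent]
            for i in range(0, len(vm_names), max_concurrent)]
```

Ten VMs with four concurrent slots, for example, yield three waves of 4, 4, and 2 jobs; at hundreds of VMs the wave count alone can exceed the nightly backup window.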
GROWTH OF USER EXPECTATIONS
These realities are compounded by the fact that user expectations are at an all-time high. Users now interact with online services that appear to never be down, and they expect the same from their IT. In other words, recovery has to be instant. Even the time needed to copy data back from the backup server may be too long.
THE FIX MAY BE BETTER PRIMARY STORAGE
The fix for all this may be to make primary storage more accountable for its own protection. Clearly it does that to some extent already, providing protection from drive and controller failure. But given all the above challenges, it also needs to provide longer-term, point-in-time data protection, so that if an application corrupts its data, you can instantly roll back to a copy made an hour earlier.
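The point-in-time rollback idea can be shown with a toy model. This is a deliberately simplified sketch (the class and method names are mine): each snapshot records the state at a timestamp, and rollback selects the newest snapshot at or before the requested moment. Real storage arrays implement this with copy-on-write metadata rather than full copies, which is what makes the rollback near-instant.

```python
class SnapshotStore:
    """Toy model of point-in-time protection on primary storage."""

    def __init__(self):
        self._snapshots = []  # (timestamp, state) pairs, oldest first

    def snapshot(self, timestamp, state):
        # Record a full copy of the state; real systems store only the
        # blocks that changed since the previous snapshot.
        self._snapshots.append((timestamp, dict(state)))

    def rollback_to(self, point_in_time):
        # Return the newest snapshot taken at or before the requested time.
        candidates = [s for ts, s in self._snapshots if ts <= point_in_time]
        if not candidates:
            raise LookupError("no snapshot at or before that time")
        return dict(candidates[-1])
```

If snapshots are taken hourly, rolling back from a corruption event means selecting the last clean hour rather than restoring from a separate backup copy.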
At the same time, data protection needs to change. We’ve seen intelligence added to systems so they can rapidly back up large file stores. We’ve also seen instant-recovery products that allow for a VM to run directly from a backup. But there are challenges with instant recovery that need to be addressed, like how well that instantly recovered VM will perform from a disk backup device and how that VM will be migrated back into production.