Something as simple as how you tell your back-up product which files and databases to back up can have a massive impact on your recoverability.
Proper back-up selection is a balance between ensuring that everything that should be backed up is indeed backed up and avoiding backing up worthless data.
Physical server inclusion
Virtually all back-up products require some initial installation and configuration at the level of a physical server. For any of the tactics mentioned in this article to work, one must first install the appropriate software and set up authorisation on each physical server in the data centre.
That includes every VMware or Hyper-V server (not to be confused with each VM on those servers), every physical UNIX or Windows server, and any cloud services that are being backed up. Someone must make that initial connection and authentication before the back-up system can perform its magic.
Selective inclusion
The most common method of including files, objects, or databases in a back-up system is to manually select them when configuring the back-ups of a given system. Here are three examples of selective inclusion:
- Clicking through the vCenter or Hyper-V control panel and manually selecting which VMs to back up
- Manually selecting one or more databases from a list of all databases
- Manually selecting one or more filesystems or subdirectories
The reason this is the most common method is that it fits the way people think; they want to perform back-ups, so they specify what they want to back up. It also helps minimise the amount of data backed up that has no value, because very few people would select a test VM or database, or a file system such as /tmp on UNIX.
The concern with selective inclusion is what happens over time. If only systems you manually select will be backed up, what happens when the configuration changes?
For example, what happens when you add new VMs to a given VMware server? What happens if you move a given VM from VMware to Hyper-V, or even the cloud?
If you manually selected it in VMware, it will not automatically start getting backed up when it moves to another configuration. Back-up experts generally warn against this type of back-up selection method because the risk of data loss is simply too high.
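The drift described above can be made visible by comparing the current inventory against the manually selected list. What follows is a minimal sketch, assuming both are available as plain lists of names; the function name and the sample data are hypothetical and not tied to any particular back-up product.

```python
def find_unprotected(inventory, selected):
    """Return names present in the inventory but missing from the selection."""
    selected_set = set(selected)
    return sorted(name for name in inventory if name not in selected_set)


# Hypothetical data: the selection was made before 'vm-new' was created.
inventory = ["vm-db01", "vm-web01", "vm-new"]
selected = ["vm-db01", "vm-web01"]

print(find_unprotected(inventory, selected))
```

A check like this only reports the gap; with selective inclusion, someone still has to act on the report and add the missing systems by hand.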
Automatic inclusion
Once a given VM or database server has been added to the back-up configuration, another very common method is to simply specify that all VMs, databases, or filesystems found therein should be backed up.
This is the safest method of back-up inclusion because it ensures that every new data source will be backed up. It addresses the concern about selective inclusion because new VMs, or a VM that was moved from one type of configuration to another, would automatically get backed up without anyone having to be notified.
Some say this method virtually ensures that you will back up worthless data. While this is true, it also ensures that you will automatically be backing up important data.
The worst-case scenario with selective inclusion is that a really important file system, database, or VM doesn’t get backed up. With an automatic inclusion system, the worst-case scenario is that you are also backing up garbage.
Selective exclusion
Selective exclusion is typically used in conjunction with an automatic-inclusion system. A customer configures their back-up system to back up every VM, database, or file system except those that are specifically called out on a list of exclusions. It is a way to have your cake and eat it, too, as it allows you to use automatic inclusion to ensure all important data is backed up, while also automatically excluding known worthless data.
This can be done in a UI, where a customer clicks through and manually selects drives or databases that he or she knows hold no value. An administrator trying to save space might add test databases or VMs, or filesystems like /tmp, to the exclusion list to ensure that space is not wasted on them.
Another way to set up selective exclusion is to use wildcards or regular expressions to identify what should not be backed up.
For example, one could specify *.tmp, *.bak, *.cache as wildcard exclusion patterns; any files found with those extensions would not be backed up. Those familiar with regular expressions can get very creative with them in order to exclude particular types of files no matter where they are found.
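As an illustration of wildcard exclusion layered on top of automatic inclusion, here is a minimal Python sketch using the standard library's `fnmatch` module. The patterns come from the example above; the function names and the sample paths are hypothetical, and a real back-up product will have its own pattern syntax and matching rules.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Wildcard exclusion patterns taken from the example above.
EXCLUDE_PATTERNS = ["*.tmp", "*.bak", "*.cache"]


def is_excluded(path, patterns=EXCLUDE_PATTERNS):
    """Return True if the file name matches any exclusion pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def select_for_backup(paths, patterns=EXCLUDE_PATTERNS):
    """Automatic inclusion with selective exclusion: keep everything not excluded."""
    return [path for path in paths if not is_excluded(path, patterns)]


# Hypothetical file list used only for illustration.
candidates = [
    "/data/report.docx",
    "/data/session.tmp",
    "/var/app/db.bak",
    "/home/user/notes.txt",
]
print(select_for_backup(candidates))
```

The sketch matches on the file name only and is case-sensitive, so a file named `OLD.BAK` would still be selected. Whether that is desirable depends on the platform, which is one reason to test exclusion patterns before relying on them.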
Tag-based inclusion
A very modern way of including data in a back-up is to use tags, which are quite prevalent in the VM world. This allows you to specify not only that VMs with a certain tag should be backed up, but also how they should be backed up.
For example, you could specify that VMs with a #database tag should be backed up with the database back-up policy that will handle those VMs in a particular way. The same is true for VMs with hashtags like #fileserver, #test, etc. You can create several different types of back-up policies that behave in particular ways, and then apply those policies to different VMs via hashtags.
This is a form of automatic inclusion, as any new VMs would be automatically added to the appropriate back-up policy based on the hashtag. You can also continue to use an automatic exclusion system to ensure that garbage data doesn't get backed up.
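The tag-to-policy mapping can be sketched in a few lines of Python. The tags are the ones mentioned above; the policy names, the `POLICY_BY_TAG` table, and the `policy_for` function are assumptions made for illustration and do not correspond to any specific product's API.

```python
# Hypothetical mapping from VM tag to back-up policy name.
POLICY_BY_TAG = {
    "#database": "database-policy",
    "#fileserver": "fileserver-policy",
    "#test": "test-policy",
}


def policy_for(vm_tags):
    """Return the policy for the first recognised tag, or None if no tag matches."""
    for tag in vm_tags:
        if tag in POLICY_BY_TAG:
            return POLICY_BY_TAG[tag]
    return None


# A newly created VM picks up its policy from its tag alone.
print(policy_for(["#database"]))
# A VM with no recognised tag gets no policy in this sketch.
print(policy_for(["#untagged-experiment"]))
```

Note that the second VM falls through with no policy at all, which is the gap a catch-all mechanism is meant to close.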
Default inclusion
Whenever using automatic inclusion or tag-based inclusion, you need some kind of catch-all mechanism. For example, if a VM or database is not automatically selected via some type of hashtag or other mechanism, you want to make sure that it is still backed up. The more you use intelligent systems like tag-based inclusion, the more important a default inclusion system becomes.
If your back-up system supports it, it works like this: Any VM or database that is not already selected by an automatic policy or a tag-based policy will be backed up by this policy. Obviously, the policy will not be tailored to the needs of that particular system, but at least some kind of back-ups are happening.
You could then monitor this particular policy to see if any systems are ever backed up using a default inclusion system. If they are, perhaps you should examine why that is happening and solve that by putting them in the appropriate type of back-up configuration.
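A minimal sketch of a default policy plus the monitoring step might look like the following. It assumes VMs are described as a dictionary of name to tag list; `DEFAULT_POLICY`, `assign_policies`, and `caught_by_default` are hypothetical names introduced only for this example.

```python
DEFAULT_POLICY = "default-policy"  # hypothetical catch-all policy name


def assign_policies(vms, policy_by_tag):
    """Assign each VM the policy of its first recognised tag, else the default."""
    assignments = {}
    for name, tags in vms.items():
        matched = next(
            (policy_by_tag[tag] for tag in tags if tag in policy_by_tag),
            DEFAULT_POLICY,
        )
        assignments[name] = matched
    return assignments


def caught_by_default(assignments):
    """List systems that only the catch-all policy picked up, for review."""
    return sorted(
        name for name, policy in assignments.items() if policy == DEFAULT_POLICY
    )


# Hypothetical inventory: one tagged VM, one untagged, one with an unknown tag.
vms = {"vm-db01": ["#database"], "vm-legacy": [], "vm-web01": ["#web"]}
assignments = assign_policies(vms, {"#database": "database-policy"})
print(caught_by_default(assignments))
```

Every system ends up with some policy, and the list returned by `caught_by_default` is the set of systems worth examining and moving into a more appropriate configuration.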
Remember this fundamental rule of back-up-system design: You cannot restore that which has not been backed up. No one ever got fired because they backed up too much data, but many people have been fired because they didn't back up enough data.
Do your best to eliminate wasted back-ups, but try to err on the side of caution. Be more concerned with data that is not being backed up than with worthless data being backed up. That should help keep you from creating what many people call a resume-producing event.