
The invention of synthetic full back-ups is one of the most important advancements in back-up technology in the last few decades, right up there with disk-based back-ups, deduplication, continuous data protection (CDP), and the cloud.
Here’s how they came to be and an explanation of what benefits they might offer.
Traditional back-up options
There are essentially two broad categories of what the back-up industry calls back-up levels: you are either backing up everything (full back-up) or backing up only what has changed (incremental back-up). There are different types of incremental back-ups, but they are not relevant to this discussion.
A typical set-up runs incremental back-ups every night and full back-ups every week – or even less often than that.
The reason for periodic full back-ups is what happens when CIOs perform a restore. Traditional back-up software will restore all data found on the full back-up – even if some of that data has been replaced by newer versions that will be found on incremental back-ups.
The restore process will then begin restoring new or updated files from the various incremental back-ups, in the order that they were created.
This process of performing multiple restores, some of which are restoring data that will be overwritten, is inefficient to say the least. If the restores are coming from tape, CIOs must also add the time required to insert and load each tape, seek to the appropriate place on the tape, and eject the tape once it is no longer needed. This process can take over five minutes per tape.
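The restore sequence described above can be sketched in a few lines of Python. This is only an illustration, not the logic of any particular product: each back-up is modelled as a hypothetical dictionary of file paths to contents, the file names are invented, and the overhead figure assumes one tape per back-up at the five minutes per tape mentioned above.

```python
def restore(full, incrementals):
    """Replay the full back-up, then each incremental in creation order."""
    restored = {}
    writes = 0
    for backup in [full, *incrementals]:
        for path, data in backup.items():
            restored[path] = data  # later back-ups overwrite earlier versions
            writes += 1
    return restored, writes


# Hypothetical back-up contents, for illustration only.
full = {"a.txt": "a-v1", "b.txt": "b-v1", "c.txt": "c-v1"}
incrementals = [
    {"a.txt": "a-v2"},
    {"a.txt": "a-v3", "d.txt": "d-v1"},
]

restored, writes = restore(full, incrementals)

# Every write beyond one per surviving file restored data that was later replaced.
wasted_writes = writes - len(restored)

# Assumption: one tape per back-up, five minutes of handling per tape.
tape_overhead_minutes = 5 * (1 + len(incrementals))

print(restored)
print(wasted_writes, tape_overhead_minutes)
```

In this toy run a.txt is written three times although only its last version is kept, which is the inefficiency the text describes. The sketch ignores deleted files, which a real restore would also have to handle.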
This means that with this type of configuration, the more frequent full back-ups are, the faster restores will be, because fewer incremental back-ups have to be replayed and less superseded data is restored. From a restore perspective, full back-ups every night would be ideal.
This is why it was very common to perform a full back-up once a week on all systems. As systems got more automated, some practitioners moved to monthly or quarterly full back-ups.
However, performing a full back-up on an active server or VM creates a significant load on that server. This gives an incentive for a back-up administrator to decrease the frequency of full back-ups as much as possible, even if it results in restores that take longer. This push and pull between back-up and restore efficiency is the main reason that synthetic back-ups came to be.
What is a synthetic full back-up?
A synthetic full back-up is a back-up that behaves as a full back-up during restores, but is not created by running a full back-up against the systems being protected. In fact, in a typical synthetic full back-up configuration, full back-ups are all but done away with. There are three main methods of accomplishing this.
The first, and probably the most common, method of creating a synthetic full back-up is to create one from the available back-ups. The back-up system keeps a catalog of all data it finds during each back-up.
So at any given point it knows all of the files – and which versions of those files – that would be on a full back-up if it were to create one in the traditional way. It simply copies each of those files from one medium to another. This method will work with tape or disk, as long as multiple devices are available.
This method of performing a synthetic full back-up can take quite some time, because the whole full back-up has to be copied; that is its only real downside. However, the process can be run at any time of day without any impact on the systems being backed up. In fact, the servers or VMs being backed up are completely uninvolved. The resulting back-up is in every sense a full back-up, and subsequent incremental back-ups can be based on it.
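As a rough sketch of the copy method, the Python below builds a catalog from existing back-ups and then copies the latest version of each file into a new full back-up. The back-up IDs, file names and function names are all hypothetical, and real products keep far richer catalogs; the point is only that the protected servers are never contacted.

```python
def build_catalog(backups):
    """Map each file path to the ID of the most recent back-up that holds it.

    `backups` must be ordered oldest first (dicts keep insertion order).
    """
    catalog = {}
    for backup_id, files in backups.items():
        for path in files:
            catalog[path] = backup_id  # newer back-ups replace older entries
    return catalog


def synthesize_full(backups, catalog):
    """Copy the cataloged version of every file onto a new 'medium'."""
    return {path: backups[backup_id][path] for path, backup_id in catalog.items()}


# Hypothetical existing back-ups: one full and two incrementals.
backups = {
    "full-sun": {"a.txt": "a-v1", "b.txt": "b-v1", "c.txt": "c-v1"},
    "incr-mon": {"a.txt": "a-v2"},
    "incr-tue": {"a.txt": "a-v3", "d.txt": "d-v1"},
}

catalog = build_catalog(backups)
synthetic_full = synthesize_full(backups, catalog)

print(catalog)
print(synthetic_full)
```

The copy step reads only from existing back-up media, so its cost is the time to move the data rather than any load on production systems.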
The second method is only possible when using disk as your primary back-up target. It's also only possible if the back-up system is storing each changed file or block as a separate object in its storage system. This is in contrast to the way back-up systems have traditionally stored back-ups, where many files are put inside a container (e.g. tar or a proprietary back-up format).
If all changed files or blocks are stored as individual chunks of data, then a synthetic full back-up can be created by simply building a snapshot-like view that points to the current version of every chunk that makes up the full back-up.
There are many advantages to this method, starting with the fact that it takes virtually no time to create the synthetic full back-up, as there is no movement of data. This means that synthetic full back-ups can be created much more often, and in fact most systems that support this will do this after every back-up.
This means that, while the system performs only incremental back-ups, all of its back-ups behave as full back-ups. This is typically referred to as a block-level incremental forever back-up system, as it never again requires a full back-up to be created, either traditionally or using the copy method mentioned above.
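The idea of a full back-up that is only a set of references can be illustrated with a small Python sketch. It assumes a hypothetical in-memory chunk store keyed by a SHA-256 hash of each chunk's content; the text does not say how real systems identify chunks, so treat that and all names here as assumptions.

```python
import hashlib

chunk_store = {}  # chunk ID -> chunk data; stands in for the disk target


def store_chunk(data):
    """Store one chunk as a separate object and return its ID."""
    chunk_id = hashlib.sha256(data).hexdigest()
    chunk_store[chunk_id] = data
    return chunk_id


def incremental_backup(previous_view, changed_blocks):
    """Store only the changed blocks; return a view that acts as a full back-up."""
    view = dict(previous_view)  # copies references only, no chunk data moves
    for block_name, data in changed_blocks.items():
        view[block_name] = store_chunk(data)
    return view


def restore(view):
    """Read every block the view points to, exactly as for a full back-up."""
    return {name: chunk_store[chunk_id] for name, chunk_id in view.items()}


# First back-up sees every block as changed; the second sees only block1.
view1 = incremental_backup({}, {"block0": b"aaaa", "block1": b"bbbb"})
view2 = incremental_backup(view1, {"block1": b"BBBB"})

print(restore(view2))
print(len(chunk_store))
```

Both views remain restorable as complete back-ups, yet the store holds only three chunks, since the unchanged block is referenced rather than copied.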
Finally, there is an approach that is somewhat of a hybrid of these two approaches. This is only possible with target deduplication systems.
Like the second approach mentioned above, all back-ups are stored as small chunks of data, resulting in each changed file or block being represented by many small chunks stored in the target deduplication system. This means that it is possible for this appliance to create a virtual full back-up – similar to the incremental forever method mentioned above – in very little time.
This process can also be controlled via a back-up product, where the back-up product tells the target deduplication system to create a synthetic full back-up. Like the second approach mentioned above, this method is very efficient and happens nearly instantaneously.
Does your system support synthetic full back-ups?
Whether they use the copy method, block-level incremental forever back-ups, or the virtual full back-up method used by target deduplication systems, synthetic full back-ups have become quite common in commercial back-up systems.
If you are not using this functionality, it might be time to investigate whether it is possible with your hardware and software set-up. If it is not possible, this could be an indication that your back-up system is a little behind the times.