Why can't a differential backup specify its base?

A differential backup uses what is called the differential change map to build a list of pages that have been modified since the last full backup. This list is a "differential" list, hence the name of the backup type, and the reason the backup can only ever be restored over top of the associated full backup.

Performing a full backup resets the differential change map. From that point forward, any page that is modified is recorded in the map. If you then take a differential, that backup only contains pages that have been modified since the last full backup, and recorded in the map.

In your analogy, the two full backups, which serve as a base for the entire restore process would likely have different contents, and therefore different differential maps. If you restore a diff based on the first backup over the 2nd backup, the database would likely be corrupted. In fact, SQL Server prevents the restore of a diff backup over anything except the original full backup it is based upon.

When you ask SQL Server to take a differential backup, the only "base" for the differential is the single differential change map present in the database at the time the differential backup starts. This is why you cannot specify the base for the differential backup.

In response to a comment from @MartinSmith - you might be able to use COPY_ONLY backups to restore a differential backup over a number of full backups. Consider the following scenario:

BACKUP DATABASE xyz TO DISK = 'path_to_backup.bak';
BACKUP DATABASE xyz TO DISK = 'path_to_backup_2.bak' WITH COPY_ONLY;
BACKUP DATABASE xyz TO DISK = 'path_to_backup_3.bak' WITH COPY_ONLY;
BACKUP DATABASE xyz TO DISK = 'path_to_backup_4.bak' WITH COPY_ONLY;
BACKUP DATABASE xyz TO DISK = 'path_to_backup_diff.bak' WITH DIFFERENTIAL;

The differential backup in step 5 should be capable of being restored over any of the backups taken in steps 1 to 4, since the differential change map is only cleared when the full backup in step 1 occurs. The COPY_ONLY backups in steps 2, 3, and 4, do not reset the change map. Since the differential change map accumulates changes made since the full backup, each of the successive COPY_ONLY backups contains enough information for the differential backup to work against any of the previous 4 backups.

Although it seems like it should work, in practice, restoring a differential over top of a copy_only backup results in the following error:

Msg 3136, Level 16, State 1, Line 1
This differential backup cannot be restored because the database has not been restored to the correct earlier state.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

I've created a SQL Server 2012 platform repro for testing differential and copy_only restores, and saved the file on gist.github.com - WARNING the script will drop any database named RestoreTest as its first step.

The feature you want could exist in principle. It would not be efficient with the current database structures (see Max Vernon's answer). SQL Server would have to either maintain a set of diff maps or compare the current DB contents against the full backup that you specify as a base.

There are applications that deduplicate large files. You can make two full backups and only the changed data will actually be stored. This is like a diff with custom base. exdupe for example can do that.

The nice thing about that is that it works with any set of backup files at all. In fact starting with the 3rd full backup file you will only pay incremental (not differential) space usage. The space usage is the difference to the previous backup file (not to the first). Deduplicating storage have similar behavior.

Why does the feature that you describe not exist? Each feature consumes budget causing other features to not be present. This one apparently did not make it far enough on the priority list. I'm not sure what it would be good for. Seems like a fairly esoteric requirement to use custom bases.

Don't confuse transaction log backups with differential backups, they have different purposes! What you're calling a "differential backup", whereby you "note all the changes to the cells", is in fact a transaction log.

A differential backup's purpose is to keep the size of the resulting backup file small by only recording the information that has changed since the last full backup, and to keep the restore time within your recovery time objective (RTO).

A transaction log backup's purpose is to let you replay the transactions to an arbitrary point in time - often, but definitely not necessarily to "the most recent anything to happen".

What you're talking about is in fact possible - but you need to restore the full backup, and then restore the transaction logs.

If you have the day 1 full backup and all of the transaction log backups between day 1 and day 5, there's nothing stopping you from restoring the day 1 backup and replaying the transaction log until you have the data as it was on day 4. You could also start from the day 2 backup, which would be slightly faster to restore, as you'd be replaying fewer transactions. You could also restore the day 1 full backup, the day 3 differential backup, and then restore the transaction logs to day 4.

Edit: OK, your edited analogy makes a little more sense. The answer is then "because you can already achieve what you want with transaction log backups". A differential backup is merely a cheap and convenient way of recording a whole bunch of transaction log activity. It doesn't offer any data recovery granularity that a transaction log backup doesn't offer. There's only so many features that offer "mere convenience" that make it into a product.

Why can't a differential backup specify its base?

Tags:

Sql Server

Backup

Related

Recent Posts