Database Design for Revisions?

  1. Do not put it all in one table with an IsCurrent discriminator attribute. This just causes problems down the line, requires surrogate keys and all sorts of other problems.
  2. Design 2 does have problems with schema changes. If you change the Employees table you have to change the EmployeeHistories table and all the related sprocs that go with it. Potentially doubles you schema change effort.
  3. Design 1 works well and if done properly does not cost much in terms of a performance hit. You could use an xml schema and even indexes to get over possible performance problems. Your comment about parsing the xml is valid but you could easily create a view using xquery - which you can include in queries and join to. Something like this...
CREATE VIEW EmployeeHistory
AS
, FirstName, , DepartmentId

SELECT EmployeeId, RevisionXML.value('(/employee/FirstName)[1]', 'varchar(50)') AS FirstName,

  RevisionXML.value('(/employee/LastName)[1]', 'varchar(100)') AS LastName,

  RevisionXML.value('(/employee/DepartmentId)[1]', 'integer') AS DepartmentId,

FROM EmployeeHistories 

I think the key question to ask here is 'Who / What is going to be using the history'?

If it's going to be mostly for reporting / human readable history, we've implemented this scheme in the past...

Create a table called 'AuditTrail' or something that has the following fields...

[ID] [int] IDENTITY(1,1) NOT NULL,
[UserID] [int] NULL,
[EventDate] [datetime] NOT NULL,
[TableName] [varchar](50) NOT NULL,
[RecordID] [varchar](20) NOT NULL,
[FieldName] [varchar](50) NULL,
[OldValue] [varchar](5000) NULL,
[NewValue] [varchar](5000) NULL

You can then add a 'LastUpdatedByUserID' column to all of your tables which should be set every time you do an update / insert on the table.

You can then add a trigger to every table to catch any insert / update that happens and creates an entry in this table for each field that's changed. Because the table is also being supplied with the 'LastUpdateByUserID' for each update / insert, you can access this value in the trigger and use it when adding to the audit table.

We use the RecordID field to store the value of the key field of the table being updated. If it's a combined key, we just do a string concatenation with a '~' between the fields.

I'm sure this system may have drawbacks - for heavily updated databases the performance may be hit, but for my web-app, we get many more reads than writes and it seems to be performing pretty well. We even wrote a little VB.NET utility to automatically write the triggers based on the table definitions.

Just a thought!