Selecting most recent and specific version in each group of records, for multiple groups

To get only latest revisions:

SELECT * from t t1
WHERE t1.rev = 
  (SELECT max(rev) FROM t t2 WHERE t2.id = t1.id)

To get a specific revision, in this case 1 (and if an item doesn't have the revision yet the next smallest revision):

SELECT * from foo t1
WHERE t1.rev = 
  (SELECT max(rev) 
   FROM foo t2 
   WHERE t2.id = t1.id
   AND t2.rev <= 1)

It might not be the most efficient way to do this, but right now I cannot figure a better way to do this.

Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set that was updated via a long running batch update without any windows of time where we had partially updated data visible.

Aging

Replace the rev column with an age column
Create a view of the current latest data with filter: age = 0
To create a new version of your data ...
- INSERT: new rows with age = -1 - This was my slow long running batch process.
- UPDATE: UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and also ages older data in a single transaction.
- DELETE: rows having age > N in the subset - Optionally purge old data

Indexing

Create a composite index with age and then id so the view will be nice and fast and can also be used to look up by id. Although this key is effectively unique, its temporarily non-unique when you're ageing the rows (during UPDATE SET age=age+1) so you'll need to make it non-unique and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.

Rollback

Finally ... Lets say you're having a bad day and the batch processing breaks. You can quickly revert to a previous data set version by running:

UPDATE table-name SET age = age - 1 -- Roll back a version
DELETE table-name WHERE age < 0 -- Clean up bad stuff

Existing Table

Suppose you have an existing table that now needs to support aging. You can use this pattern by first renaming the existing table, then add the age column and indexing and then create the view that includes the age = 0 condition with the same name as the original table name.

This strategy may or may not work depending on the nature of technology layers that depended on the original table but in many cases swapping a view for a table should drop in just fine.

Notes

I recommend naming the age column to RowAge in order to indicate this pattern is being used, since it's clearer that its a database related value and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.

Unlike other solutions, this pattern works for non SQL Server databases.

This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later

Sample data:

DECLARE @foo TABLE (
    id int,
    rev int,
    field nvarchar(10)
)

INSERT @foo VALUES
    ( 1, 1, 'test1' ),
    ( 2, 1, 'fdsfs' ),
    ( 3, 1, 'jfds' ),
    ( 1, 2, 'test2' )

The query:

DECLARE @desiredRev int

SET @desiredRev = 2

SELECT * FROM (
SELECT 
    id,
    rev,
    field,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM @foo WHERE rev <= @desiredRev 
) numbered
WHERE rn = 1

The inner SELECT returns all relevant records, and within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.

The outer SELECT just selects the first member (so, the one with highest rev) from each id group.

Output when @desiredRev = 2 :

id          rev         field      rn
----------- ----------- ---------- --------------------
1           2           test2      1
2           1           fdsfs      1
3           1           jfds       1

Output when @desiredRev = 1 :

id          rev         field      rn
----------- ----------- ---------- --------------------
1           1           test1      1
2           1           fdsfs      1
3           1           jfds       1

Selecting most recent and specific version in each group of records, for multiple groups

Aging

Indexing

Rollback

Existing Table

Notes

Tags:

Sql

Sql Server

Tsql

Related

Recent Posts