Strategies for "checking out" records for processing
I am not a huge fan of either the extra "lock" table or the idea of locking the whole table to grab the next record. I get why it is being done, but that also hurts concurrency for operations that are updating to release a locked record (surely two processes can't be fighting over that when it is not possible for two processes to have locked the same record at the same time).
My preference would be to add a ProcessStatusID (typically TINYINT) column to the table with the data being processed. And is there a field for LastModifiedDate? If not, then it should be added. If yes, then do these records get updated outside of this processing? If records can be updated outside of this particular process, then another field should be added to track StatusModifiedDate (or something like that). For the rest of this answer I will just use "StatusModifiedDate" as it is clear in its meaning (and in fact, could be used as the field name even if there is currently no "LastModifiedDate" field).
The values for ProcessStatusID (which should be placed into a new lookup table called "ProcessStatus" and Foreign Keyed to this table) could be:
- Completed (or even "Pending" in this case as both mean "ready to be processed")
- In Process (or "Processing")
- Error (or "WTF?")
At this point it seems safe to assume that from the application, it just wants to grab the next record to process and won't be passing anything in to help make that decision. So we want to grab the oldest (at least in terms of StatusModifiedDate) record that is set to "Completed" / "Pending". Something along the lines of:
SELECT TOP 1 pt.RecordID
FROM ProcessTable pt
WHERE pt.StatusID = 1
ORDER BY pt.StatusModifiedDate ASC;
We also want to update that record to "In Process" at the same time to prevent the other process from grabbing it. We could use the OUTPUT
clause to let us do the UPDATE and SELECT in the same transaction:
UPDATE TOP (1) pt
SET pt.StatusID = 2,
pt.StatusModifiedDate = GETDATE() -- or GETUTCDATE()
OUTPUT INSERTED.RecordID
FROM ProcessTable pt
WHERE pt.StatusID = 1;
The main problem here is that while we can do a TOP (1)
in an UPDATE
operation, there is no way to do an ORDER BY
. But, we can wrap it in a CTE to combine those two concepts:
;WITH cte AS
(
SELECT TOP 1 pt.RecordID
FROM ProcessTable pt (READPAST, ROWLOCK, UPDLOCK)
WHERE pt.StatusID = 1
ORDER BY pt.StatusModifiedDate ASC;
)
UPDATE cte
SET cte.StatusID = 2,
cte.StatusModifiedDate = GETDATE() -- or GETUTCDATE()
OUTPUT INSERTED.RecordID;
The obvious question is whether or not two processes doing the SELECT at the same time can grab the same record. I am pretty sure that the UPDATE with OUTPUT clause, especially combined with the READPAST and UPDLOCK hints (see below for more details), will be fine. However, I have not tested this exact scenario. If for some reason the above query does not take care of the race condition, then adding the following will: application locks.
The CTE query above can be wrapped in sp_getapplock and sp_releaseapplock to create a "gate keeper" for the process. In doing so, only one process at a time will be able to enter in order to run the query above. The other process(es) will be blocked until the process with the applock releases it. And since this step of the overall process is just to grab the RecordID, it is fairly quick and won't be blocking the other process(es) for very long. And, just as with the CTE query, we are not blocking the entire table, thereby allowing other updates to other rows (to set their status to either "Completed" or "Error"). Essentially:
BEGIN TRANSACTION;
EXEC sp_getapplock @Resource = 'GetNextRecordToProcess', @LockMode = 'Exclusive';
{CTE UPDATE query shown above}
EXEC sp_releaseapplock @Resource = 'GetNextRecordToProcess';
COMMIT TRANSACTION;
Application locks are very nice but should be used sparingly.
Lastly, you just need a stored procedure to handle setting the status to either "Completed" or "Error". And that can be a simple:
CREATE PROCEDURE ProcessTable_SetProcessStatusID
(
@RecordID INT,
@ProcessStatusID TINYINT
)
AS
SET NOCOUNT ON;
UPDATE pt
SET pt.ProcessStatusID = @ProcessStatusID,
pt.StatusModifiedDate = GETDATE() -- or GETUTCDATE()
FROM ProcessTable pt
WHERE pt.RecordID = @RecordID;
Table Hints (found at Hints (Transact-SQL) - Table):
READPAST (seems to fit this exact scenario)
Specifies that the Database Engine not read rows that are locked by other transactions. When READPAST is specified, row-level locks are skipped. That is, the Database Engine skips past the rows instead of blocking the current transaction until the locks are released...READPAST is primarily used to reduce locking contention when implementing a work queue that uses a SQL Server table. A queue reader that uses READPAST skips past queue entries locked by other transactions to the next available queue entry, without having to wait until the other transactions release their locks.
ROWLOCK (just to be safe)
Specifies that row locks are taken when page or table locks are ordinarily taken.
UPDLOCK
Specifies that update locks are to be taken and held until the transaction completes. UPDLOCK takes update locks for read operations only at the row-level or page-level.