Is a MERGE with OUTPUT better practice than a conditional INSERT and SELECT?
Because you are using a Sequence, you can use the same NEXT VALUE FOR function -- that you already have in a Default Constraint on the Id Primary Key field -- to generate a new Id value ahead of time. Generating the value first means that you don't need to worry about not having SCOPE_IDENTITY, which in turn means that you don't need either the OUTPUT clause or an additional SELECT to get the new value; you will have the value before you do the INSERT, and you don't even need to mess with SET IDENTITY_INSERT ON / OFF :-)
So that takes care of part of the overall situation. The other part is handling the concurrency issue of two processes, at the exact same time, not finding an existing row for the exact same string, and proceeding with the INSERT. The concern is about avoiding the Unique Constraint violation that would occur.
One way to handle these types of concurrency issues is to force this particular operation to be single threaded. The way to do that is by using application locks (which work across sessions). While effective, they can be a bit heavy-handed for a situation like this where the frequency of collisions is probably fairly low.
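For illustration, here is a minimal sketch of what that application-lock variant might look like as the body of such a procedure. It reuses the @SomeName and @ID parameter names from the stored procedure defined below, and the table and sequence from the test setup; the resource name N'NameLookup_GetOrInsert' is just an arbitrary label I made up for the sketch:

DECLARE @LockResult INT;

BEGIN TRAN;

-- Serialize all "get or insert" calls behind one named lock.
-- The resource name is arbitrary; it just has to be the same in every caller.
EXEC @LockResult = sp_getapplock
    @Resource = N'NameLookup_GetOrInsert',
    @LockMode = N'Exclusive',
    @LockOwner = N'Transaction',
    @LockTimeout = 5000; -- in milliseconds

IF (@LockResult < 0) -- 0 or 1 = lock granted; negative = timeout / deadlock / error
BEGIN
    ROLLBACK TRAN;
    RAISERROR(N'Could not acquire application lock.', 16, 1);
    RETURN;
END;

SELECT @ID = nl.[Id]
FROM   dbo.NameLookup nl
WHERE  nl.[ItemName] = @SomeName;

IF (@ID IS NULL)
BEGIN
    SET @ID = NEXT VALUE FOR dbo.MagicNumber;

    INSERT INTO dbo.NameLookup ([Id], [ItemName])
    VALUES (@ID, @SomeName);
END;

COMMIT TRAN; -- releases the app lock, since @LockOwner = 'Transaction'

Because the lock is on a single name, every caller serializes on it regardless of which ItemName value it is working with, which is exactly why this approach is heavy-handed.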
The other way to deal with the collisions is to accept that they will sometimes occur, and handle them rather than try to avoid them. Using the TRY...CATCH construct, you can effectively trap a specific error (in this case: "unique constraint violation", Msg 2601) and re-execute the SELECT to get the Id value, since we know that it now exists due to being in the CATCH block with that particular error. Other errors can be handled in the typical RAISERROR / RETURN or THROW manner.
Test Setup: Sequence, Table, and Unique Index
USE [tempdb];

CREATE SEQUENCE dbo.MagicNumber
    AS INT
    START WITH 1
    INCREMENT BY 1;

CREATE TABLE dbo.NameLookup
(
    [Id] INT NOT NULL
        CONSTRAINT [PK_NameLookup] PRIMARY KEY CLUSTERED
        CONSTRAINT [DF_NameLookup_Id] DEFAULT (NEXT VALUE FOR dbo.MagicNumber),
    [ItemName] NVARCHAR(50) NOT NULL
);

CREATE UNIQUE NONCLUSTERED INDEX [UIX_NameLookup_ItemName]
    ON dbo.NameLookup ([ItemName]);
GO
Test Setup: Stored Procedure
CREATE PROCEDURE dbo.GetOrInsertName
(
    @SomeName NVARCHAR(50),
    @ID INT OUTPUT,
    @TestRaceCondition BIT = 0
)
AS
SET NOCOUNT ON;

BEGIN TRY
    SELECT @ID = nl.[Id]
    FROM   dbo.NameLookup nl
    WHERE  nl.[ItemName] = @SomeName
    AND    @TestRaceCondition = 0;

    IF (@ID IS NULL)
    BEGIN
        SET @ID = NEXT VALUE FOR dbo.MagicNumber;

        INSERT INTO dbo.NameLookup ([Id], [ItemName])
        VALUES (@ID, @SomeName);
    END;
END TRY
BEGIN CATCH
    IF (ERROR_NUMBER() = 2601) -- "Cannot insert duplicate key row in object"
    BEGIN
        SELECT @ID = nl.[Id]
        FROM   dbo.NameLookup nl
        WHERE  nl.[ItemName] = @SomeName;
    END;
    ELSE
    BEGIN
        ;THROW; -- SQL Server 2012 or newer
        /*
        DECLARE @ErrorNumber INT = ERROR_NUMBER(),
                @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE();

        RAISERROR(N'Msg %d: %s', 16, 1, @ErrorNumber, @ErrorMessage);
        RETURN;
        */
    END;
END CATCH;
GO
The Test
DECLARE @ItemID INT;
EXEC dbo.GetOrInsertName
@SomeName = N'test1',
@ID = @ItemID OUTPUT;
SELECT @ItemID AS [ItemID];
GO
DECLARE @ItemID INT;
EXEC dbo.GetOrInsertName
@SomeName = N'test1',
@ID = @ItemID OUTPUT,
@TestRaceCondition = 1;
SELECT @ItemID AS [ItemID];
GO
Question from O.P.
Why is this better than the MERGE? Won't I get the same functionality without the TRY by using the WHERE NOT EXISTS clause?
MERGE has various "issues" (several references are linked in @SqlZim's answer, so no need to duplicate that info here). And there is no additional locking in this approach (less contention), so it should be better on concurrency. In this approach you'll never get a Unique Constraint violation, all without any HOLDLOCK, etc. It's pretty much guaranteed to work.
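For reference, the MERGE-based alternative being compared against generally looks something like the following (a sketch only, not the O.P.'s exact code). Note that the OUTPUT clause here only returns a row when the INSERT actually fires, so getting the Id of an already-existing name would still require a separate SELECT (or a dummy UPDATE to force a row through OUTPUT):

MERGE dbo.NameLookup WITH (HOLDLOCK) AS target
USING (SELECT @SomeName AS [ItemName]) AS source
      ON (target.[ItemName] = source.[ItemName])
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([ItemName])
    VALUES (source.[ItemName])
OUTPUT INSERTED.[Id]; -- returns a row only for a newly inserted name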
The reasoning behind this approach is:
- If you have enough executions of this procedure such that you need to worry about collisions, then you don't want to:
  - take any more steps than are necessary
  - hold locks on any resources for longer than necessary
- Since collisions can only happen upon new entries (new entries submitted at the exact same time), the frequency of falling into the CATCH block in the first place will be pretty low. It makes more sense to optimize the code that will run 99% of the time instead of the code that will run 1% of the time (unless there is no cost to optimizing both, which is not the case here).
Comment from @SqlZim's answer (emphasis added)
I personally prefer to try and tailor a solution to avoid doing that when possible. In this case, I don't feel that using the locks from serializable is a heavy-handed approach, and I would be confident it would handle high concurrency well.
I would agree with this first sentence if it were amended to state "and when prudent". Just because something is technically possible does not mean that the situation (i.e. intended use-case) would benefit from it.
The issue I see with this approach is that it locks more than is being suggested. It is important to re-read the quoted documentation on "serializable", specifically the following (emphasis added):
- Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
Now, here is the comment in the example code:
SELECT [Id]
FROM dbo.NameLookup WITH (SERIALIZABLE) /* hold that key range for @vName */
The operative word there is "range". The lock being taken is not just on the value in @vName, but more accurately a range starting at the location where this new value should go (i.e. between the existing key values on either side of where the new value fits), but not the value itself. Meaning, other processes will be blocked from inserting new values, depending on the value(s) currently being looked up. If the lookup is being done at the top of the range, then inserting anything that could occupy that same position will be blocked. For example, if values "a", "b", and "d" exist, and one process does the SELECT on "f", then it will not be possible to insert values "g" or even "e" (since either of those would currently be placed immediately after "d"). But inserting a value of "c" will be possible, since it wouldn't be placed in the "reserved" range.
The following example should illustrate this behavior:
(In query tab (i.e. Session) #1)
INSERT INTO dbo.NameLookup ([ItemName]) VALUES (N'test5');
BEGIN TRAN;
SELECT [Id]
FROM dbo.NameLookup WITH (SERIALIZABLE) /* hold that key range for @vName */
WHERE ItemName = N'test8';
--ROLLBACK;
(In query tab (i.e. Session) #2)
EXEC dbo.NameLookup_getset_byName @vName = N'test4';
-- works just fine
EXEC dbo.NameLookup_getset_byName @vName = N'test9';
-- hangs until you either hit "cancel" in this query tab,
-- OR issue a COMMIT or ROLLBACK in query tab #1
EXEC dbo.NameLookup_getset_byName @vName = N'test7';
-- hangs until you either hit "cancel" in this query tab,
-- OR issue a COMMIT or ROLLBACK in query tab #1
EXEC dbo.NameLookup_getset_byName @vName = N's';
-- works just fine
EXEC dbo.NameLookup_getset_byName @vName = N'u';
-- hangs until you either hit "cancel" in this query tab,
-- OR issue a COMMIT or ROLLBACK in query tab #1
Likewise, if value "C" exists, and value "A" is being selected (and hence locked), then you can insert a value of "D", but not a value of "B":
(In query tab (i.e. Session) #1)
INSERT INTO dbo.NameLookup ([ItemName]) VALUES (N'testC');
BEGIN TRAN
SELECT [Id]
FROM dbo.NameLookup WITH (SERIALIZABLE) /* hold that key range for @vName */
WHERE ItemName = N'testA';
--ROLLBACK;
(In query tab (i.e. Session) #2)
EXEC dbo.NameLookup_getset_byName @vName = N'testD';
-- works just fine
EXEC dbo.NameLookup_getset_byName @vName = N'testB';
-- hangs until you either hit "cancel" in this query tab,
-- OR issue a COMMIT or ROLLBACK in query tab #1
To be fair, in my suggested approach, when there is an exception, there will be 4 entries in the Transaction Log that won't be happening in this "serializable transaction" approach. BUT, as I said above, if the exception happens 1% (or even 5%) of the time, that is far less impactful than the far more likely case of the initial SELECT temporarily blocking INSERT operations.
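If you want to see those log records for yourself, the undocumented sys.fn_dblog function can show them right after a call that lands in the CATCH block. This is a diagnostic sketch only, not something for production code:

-- Run a call that hits the CATCH block, then look at the tail of the log.
-- A failed singleton INSERT typically shows up as four records:
-- LOP_BEGIN_XACT, LOP_INSERT_ROWS, the compensating LOP_DELETE_ROWS,
-- and LOP_ABORT_XACT.
SELECT TOP (10) [Current LSN], [Operation], [Context], [Transaction Name]
FROM   sys.fn_dblog(NULL, NULL)
ORDER BY [Current LSN] DESC;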
Another, albeit minor, issue with this "serializable transaction + OUTPUT clause" approach is that the OUTPUT clause (in its present usage) sends the data back as a result set. A result set requires more overhead (probably on both sides: in SQL Server to manage the internal cursor, and in the app layer to manage the DataReader object) than a simple OUTPUT parameter. Given that we are only dealing with a single scalar value, and that the assumption is a high frequency of executions, that extra overhead of the result set probably adds up.
While the OUTPUT clause could be used in such a way as to return an OUTPUT parameter, that would require additional steps to create a temporary table or table variable, and then to select the value out of that temp table / table variable into the OUTPUT parameter.
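That extra plumbing would look something like this (a sketch, reusing the @vName and @vValueId names from the procedure shown later):

DECLARE @NewId TABLE ([Id] INT NOT NULL);

INSERT INTO dbo.NameLookup ([ItemName])
OUTPUT INSERTED.[Id] INTO @NewId ([Id]) -- capture the generated Id
VALUES (@vName);

SELECT @vValueId = tmp.[Id]
FROM   @NewId tmp;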
Further Clarification: Response to @SqlZim's Response (updated answer) to my Response to @SqlZim's Response (in the original answer) to my statement regarding concurrency and performance ;-)
Sorry if this part is a wee bit long, but at this point we are just down to the nuances of the two approaches.
I believe the way the information is presented could lead to false assumptions about the amount of locking one could expect to encounter when using serializable in the scenario as presented in the original question.
Yes, I will admit that I am biased, though to be fair:
- It is impossible for a human to not be biased, at least to some small degree, and I do try to keep it at a minimum,
- The example given was simplistic, but that was for illustrative purposes, to convey the behavior without over-complicating it. Implying excessive frequency was not intended, though I do understand that I also didn't explicitly state otherwise, and it could be read as implying a larger problem than actually exists. I will try to clarify that below.
- I did also include an example of locking a range between two existing keys (the second set of "Query tab 1" and "Query tab 2" blocks).
- I did find (and volunteer) the "hidden cost" of my approach: the four extra Tran Log entries each time the INSERT fails due to a Unique Constraint violation. I have not seen that mentioned in any of the other answers / posts.
Regarding @gbn's "JFDI" approach, Michael J. Swart's "Ugly Pragmatism For The Win" post, Aaron Bertrand's comment on Michael's post (regarding his tests showing which scenarios have decreased performance), and your comment on your "adaptation of Michael J. Swart's adaptation of @gbn's Try Catch JFDI procedure" stating:
If you are inserting new values more often than selecting existing values, this may be more performant than @srutzky's version. Otherwise I would prefer @srutzky's version over this one.
With respect to that gbn / Michael / Aaron discussion related to the "JFDI" approach, it would be incorrect to equate my suggestion to gbn's "JFDI" approach. Due to the nature of the "Get or Insert" operation, there is an explicit need to do the SELECT to get the ID value for existing records. This SELECT acts as the IF EXISTS check, which makes this approach equate more closely to the "CheckTryCatch" variation of Aaron's tests. Michael's re-written code (and your final adaptation of Michael's adaptation) also includes a WHERE NOT EXISTS to do that same check first. Hence, my suggestion (along with Michael's final code and your adaptation of his final code) won't actually hit the CATCH block all that often. It could only happen when two sessions, given the same non-existent ItemName, run the INSERT...SELECT at the exact same moment, such that both sessions receive a "true" for the WHERE NOT EXISTS at the exact same moment, and thus both attempt to do the INSERT at the exact same moment. That very specific scenario happens much less often than either selecting an existing ItemName or inserting a new ItemName when no other process is attempting to do so at the exact same moment.
WITH ALL OF THE ABOVE IN MIND: Why do I prefer my approach?
First, let's look at what locking takes place in the "serializable" approach. As mentioned above, the "range" that gets locked depends on the existing key values on either side of where the new key value would fit. The beginning or ending of the range could also be the beginning or ending of the index, respectively, if there is no existing key value in that direction. Assume we have the following index and keys (^ represents the beginning of the index while $ represents the end of it):

Range #:    |--- 1 ---|--- 2 ---|--- 3 ---|--- 4 ---|
Key Value:  ^         C         F         J         $
If session 55 attempts to insert a key value of:

- A, then Range #1 (from ^ to C) is locked: session 56 cannot insert a value of B, even if unique and valid (yet). But session 56 can insert values of D, G, and M.
- D, then Range #2 (from C to F) is locked: session 56 cannot insert a value of E (yet). But session 56 can insert values of A, G, and M.
- M, then Range #4 (from J to $) is locked: session 56 cannot insert a value of X (yet). But session 56 can insert values of A, D, and G.
As more key values are added, the ranges between key values become narrower, hence reducing the probability / frequency of multiple values being inserted at the same time fighting over the same range. Admittedly, this is not a major problem, and fortunately it appears to be a problem that actually decreases over time.
The issue with my approach was described above: it only happens when two sessions attempt to insert the same key value at the same time. In this respect it comes down to which has the higher probability of happening: two different, yet close, key values being attempted at the same time, or the same key value being attempted at the same time? I suppose the answer lies in the structure of the app doing the inserts, but generally speaking I would assume it to be more likely that two different values that just happen to share the same range are being inserted. But the only way to really know would be to test both on the O.P.'s system.
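If you want to observe these range locks directly, leave the transaction in Query tab #1 open and inspect sys.dm_tran_locks from another session. This is a diagnostic sketch; RangeS-S is the shared key-range lock mode taken by the SERIALIZABLE read:

SELECT tl.request_session_id,
       tl.resource_type,
       tl.request_mode,         -- RangeS-S = shared key-range lock
       tl.resource_description  -- hash of the locked key
FROM   sys.dm_tran_locks tl
WHERE  tl.resource_type = N'KEY'
AND    tl.request_mode LIKE N'Range%';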
Next, let's consider two scenarios and how each approach handles them:
All requests being for unique key values:

In this case, the CATCH block in my suggestion is never entered, hence no "issue" (i.e. 4 tran log entries and the time it takes to do that). But, in the "serializable" approach, even with all inserts being unique, there will always be some potential for blocking other inserts in the same range (albeit not for very long).

High frequency of requests for the same key value at the same time:

In this case -- a very low degree of uniqueness in terms of incoming requests for non-existent key values -- the CATCH block in my suggestion will be entered regularly. The effect of this will be that each failed insert will need to auto-rollback and write the 4 entries to the Transaction Log, which is a slight performance hit each time. But the overall operation should never fail (at least not due to this).

(The previous version of the "updated" approach could suffer from deadlocks here, because the serializable behavior only prevents INSERT operations in the range that has been read and hence locked; it doesn't prevent SELECT operations on that range. An updlock hint was added to address this, and it no longer gets deadlocks.) The serializable approach, in this case, would seem to have no additional overhead, and might perform slightly better than what I am suggesting.
As with many / most discussions regarding performance, due to there being so many factors that can affect the outcome, the only way to really have a sense of how something will perform is to try it out in the target environment where it will run. At that point it won't be a matter of opinion :).
Updated Answer
Response to @srutzky
Another, albeit minor, issue with this "serializable transaction + OUTPUT clause" approach is that the OUTPUT clause (in its present usage) sends the data back as a result set. A result set requires more overhead (probably on both sides: in SQL Server to manage the internal cursor, and in the app layer to manage the DataReader object) than a simple OUTPUT parameter. Given that we are only dealing with a single scalar value, and that the assumption is a high frequency of executions, that extra overhead of the result set probably adds up.
I agree, and for those same reasons I do use output parameters when prudent. It was my mistake not to use an output parameter in my initial answer; I was being lazy.
Here is a revised procedure using an output parameter, additional optimizations, along with next value for that @srutzky explains in his answer:
create procedure dbo.NameLookup_getset_byName (@vName nvarchar(50), @vValueId int output) as
begin
  set nocount on;
  set xact_abort on;

  set @vValueId = null;

  if nullif(@vName,'') is null
    return; /* if @vName is empty, return early */

  select @vValueId = Id /* go get the Id */
    from dbo.NameLookup
    where ItemName = @vName;

  if @vValueId is not null /* if we got the id, return */
    return;

  begin try; /* if it is not there, then get the lock */
    begin tran;

    select @vValueId = Id
      from dbo.NameLookup with (updlock, serializable) /* hold key range for @vName */
      where ItemName = @vName;

    if @@rowcount = 0 /* if we still do not have an Id for @vName */
    begin; /* get a new Id and insert @vName */
      set @vValueId = next value for dbo.IdSequence; /* get next sequence value */

      insert into dbo.NameLookup (ItemName, Id)
        values (@vName, @vValueId);
    end;

    commit tran;
  end try
  begin catch;
    if @@trancount > 0
    begin;
      rollback transaction;
      throw;
    end;
  end catch;
end;
Update note: Including updlock with the select will grab the proper locks in this scenario. Thanks to @srutzky, who pointed out that this could cause deadlocks when only using serializable on the select.
Note: This might not be the case, but if it is possible that the procedure will be called with a value for @vValueId, include set @vValueId = null; after set xact_abort on;, otherwise it can be removed.
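Calling the revised procedure follows the same pattern as the tests earlier in this thread (shown here purely as a usage sketch):

declare @Id int;

exec dbo.NameLookup_getset_byName
    @vName = N'test1'
  , @vValueId = @Id output;

select @Id as [Id];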
Concerning @srutzky's examples of key range locking behavior:
@srutzky only uses one value in his table, and locks the "next"/"infinity" key for his tests to illustrate key range locking.
While his tests illustrate what happens in those situations, I believe the way the information is presented could lead to false assumptions about the amount of locking one could expect to encounter when using serializable in the scenario as presented in the original question.
Even though I perceive a bias (perhaps falsely) in the way he presents his explanation and examples of key range locking, they are still correct.
After more research, I found a particularly pertinent blog article from 2011 by Michael J. Swart: Mythbusting: Concurrent Update/Insert Solutions.
In it, he tests multiple methods for accuracy and concurrency. Method 4: Increased Isolation + Fine Tuning Locks is based on Sam Saffron's post Insert or Update Pattern For SQL Server, and it was the only method in the original test to meet his expectations (joined later by merge with (holdlock)).
In February of 2016, Michael J. Swart posted Ugly Pragmatism For The Win. In that post, he covers some additional tuning he made to his Saffron upsert procedures to reduce locking (which I included in the procedure above).
After making those changes, Michael wasn't happy that his procedure was starting to look more complicated, and consulted with a colleague named Chris. Chris read all of the original Mythbusting post, read all of the comments, and asked about @gbn's TRY CATCH JFDI pattern. This pattern is similar to @srutzky's answer, and is the solution that Michael ended up using in that instance.
Michael J Swart:
Yesterday I had my mind changed about the best way to do concurrency. I describe several methods in Mythbusting: Concurrent Update/Insert Solutions. My preferred method is to increase the isolation level and fine tune locks.
At least that was my preference. I recently changed my approach to use a method that gbn suggested in the comments. He describes his method as the “TRY CATCH JFDI pattern”. Normally I avoid solutions like that. There’s a rule of thumb that says developers should not rely on catching errors or exceptions for control flow. But I broke that rule of thumb yesterday.
By the way, I love the gbn’s description for the pattern “JFDI”. It reminds me of Shia Labeouf’s motivational video.
In my opinion, both solutions are viable. While I still prefer to increase the isolation level and fine tune locks, @srutzky's answer is also valid and may or may not be more performant in your specific situation.
Perhaps in the future I too will arrive at the same conclusion that Michael J. Swart did, but I'm just not there yet.
It isn't my preference, but here is what my adaptation of Michael J. Swart's adaptation of @gbn's Try Catch JFDI procedure would look like:
create procedure dbo.NameLookup_JFDI (
    @vName nvarchar(50)
  , @vValueId int output
) as
begin
  set nocount on;
  set xact_abort on;

  set @vValueId = null;

  if nullif(@vName,'') is null
    return; /* if @vName is empty, return early */

  begin try /* JFDI */
    insert into dbo.NameLookup (ItemName)
      select @vName
      where not exists (
        select 1
          from dbo.NameLookup
          where ItemName = @vName);
  end try
  begin catch /* ignore duplicate key errors, throw the rest */
    if error_number() not in (2601, 2627) throw;
  end catch;

  select @vValueId = Id /* get the Id */
    from dbo.NameLookup
    where ItemName = @vName;
end;
If you are inserting new values more often than selecting existing values, this may be more performant than @srutzky's version. Otherwise I would prefer @srutzky's version over this one.
Aaron Bertrand's comments on Michael J. Swart's post link to relevant testing he has done, and led to this exchange. Excerpt from the comment section on Ugly Pragmatism For the Win:
Sometimes, though, JFDI leads to worse performance overall, depending on what % of calls fail. Raising exceptions has substantial overhead. I showed this in a couple of posts:
http://sqlperformance.com/2012/08/t-sql-queries/error-handling
https://www.mssqltips.com/sqlservertip/2632/checking-for-potential-constraint-violations-before-entering-sql-server-try-and-catch-logic/
Comment by Aaron Bertrand — February 11, 2016 @ 11:49 am
and the reply of:
You’re right Aaron, and we did test it.
It turns out that in our case, the percent of calls that failed was 0 (when rounded to the nearest percent).
I think you illustrate the point that as much as possible, evaluate things on a case-by-case basis over following rules-of-thumb.
It’s also why we added the not-strictly-necessary WHERE NOT EXISTS clause.
Comment by Michael J. Swart — February 11, 2016 @ 11:57 am
New links:
- Ugly Pragmatism For The Win - Michael J. Swart - 2016-02-11
- Mythbusting: Concurrent Update/Insert Solutions - Michael J. Swart - 2011-09-08
- @gbn's TRY CATCH JFDI Pattern
- Checking for potential constraint violations before entering SQL Server TRY and CATCH logic - Aaron Bertrand
- Performance impact of different error handling techniques - Aaron Bertrand
- SQL Server Isolation Levels: A Series - Paul White
Original answer
I still prefer the Sam Saffron upsert approach vs using merge, especially when dealing with a single row.
I would adapt that upsert method to this situation like this:
declare @vName nvarchar(50) = 'Invader';
declare @vValueId int = null;

if nullif(@vName,'') is not null /* this gets your where condition taken care of before we start doing anything */
begin
  begin tran;

  select @vValueId = Id
    from dbo.NameLookup with (serializable)
    where ItemName = @vName;

  if @@rowcount > 0
  begin;
    select @vValueId as id;
  end;
  else
  begin;
    insert into dbo.NameLookup (ItemName)
      output inserted.id
      values (@vName);
  end;

  commit tran;
end;
I would be consistent with your naming, and as serializable is the same as holdlock, pick one and be consistent in its use. I tend to use serializable because it is the same name used when specifying set transaction isolation level serializable.

By using serializable or holdlock, a range lock is taken based on the value of @vName, which makes any other operations wait if they are selecting or inserting values into dbo.NameLookup that include the value in the where clause.
For the range lock to work properly, there needs to be an index on the ItemName column; this applies when using merge as well.
Here is what the procedure would look like, mostly following Erland Sommarskog's whitepapers for error handling, using throw. If throw isn't how you are raising your errors, change it to be consistent with the rest of your procedures:
create procedure dbo.NameLookup_getset_byName (@vName nvarchar(50)) as
begin
  set nocount on;
  set xact_abort on;

  declare @vValueId int;

  if nullif(@vName,'') is null /* if @vName is null or empty, select Id as null */
  begin
    select Id = cast(null as int);
  end
  else /* else go get the Id */
  begin try;
    begin tran;

    select @vValueId = Id
      from dbo.NameLookup with (serializable) /* hold key range for @vName */
      where ItemName = @vName;

    if @@rowcount > 0 /* if we have an Id for @vName, select @vValueId */
    begin;
      select @vValueId as Id;
    end;
    else /* else insert @vName and output the new Id */
    begin;
      insert into dbo.NameLookup (ItemName)
        output inserted.Id
        values (@vName);
    end;

    commit tran;
  end try
  begin catch;
    if @@trancount > 0
    begin;
      rollback transaction;
      throw;
    end;
  end catch;
end;
go
To summarize what is going on in the procedure above: set nocount on; set xact_abort on; like you always do; then, if our input variable is null or empty, select Id = cast(null as int) as the result. If it isn't null or empty, then get the Id for our variable while holding that spot in case it isn't there. If the Id is there, send it out. If it isn't there, insert it and send out that new Id.
Meanwhile, other calls to this procedure trying to find the Id for the same value will wait until the first transaction is done and then select & return it. Other calls to this procedure or other statements looking for other values will continue on because this one isn't in the way.
While I agree with @srutzky that you can handle collisions and swallow the exceptions for this sort of issue, I personally prefer to try and tailor a solution to avoid doing that when possible. In this case, I don't feel that using the locks from serializable is a heavy-handed approach, and I would be confident it would handle high concurrency well.

Quote from the sql server documentation on the table hints serializable / holdlock:
SERIALIZABLE
Is equivalent to HOLDLOCK. Makes shared locks more restrictive by holding them until a transaction is completed, instead of releasing the shared lock as soon as the required table or data page is no longer needed, whether the transaction has been completed or not. The scan is performed with the same semantics as a transaction running at the SERIALIZABLE isolation level. For more information about isolation levels, see SET TRANSACTION ISOLATION LEVEL (Transact-SQL).
Quote from the sql server documentation on transaction isolation level serializable:
SERIALIZABLE Specifies the following:

- Statements cannot read data that has been modified but not yet committed by other transactions.
- No other transactions can modify data that has been read by the current transaction until the current transaction completes.
- Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
Links related to the solution above:

- Insert or Update pattern for Sql Server - Sam Saffron
- Documentation on serializable and other Table Hints - MSDN
- Error and Transaction Handling in SQL Server Part One – Jumpstart Error Handling - Erland Sommarskog
- Erland Sommarskog's advice regarding @@rowcount (which I didn't follow in this instance)
MERGE has a spotty history, and it seems to take more poking around to make sure that the code is behaving how you want it to under all that syntax. Relevant merge articles:

- An Interesting MERGE Bug - Paul White
- UPSERT Race Condition With Merge - sqlteam
- Use Caution with SQL Server's MERGE Statement - Aaron Bertrand
- Can I optimize this merge statement - Aaron Bertrand
- If you are using indexed views and MERGE, please read this! - Aaron Bertrand
One last link: Kendra Little did a rough comparison of merge vs insert with left join, with the caveat where she says "I didn’t do thorough load testing on this", but it is still a good read.
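For context, the insert with left join pattern from that comparison would look roughly like this against the table from this question. It is only a sketch, and on its own it is still subject to the same race condition unless paired with holdlock / serializable or the try...catch handling discussed above:

insert into dbo.NameLookup (ItemName)
select s.ItemName
from (values (@vName)) as s (ItemName)
left join dbo.NameLookup nl
  on nl.ItemName = s.ItemName
where nl.ItemName is null; /* insert only if the name is not already there */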