Friday, November 23, 2007

How to make a thread safe insert/update procedure on SQL Server

When developing an application that does a lot of inserts and updates, it is sometimes easier to wrap both statements in one stored procedure. This results in a simple procedure that will check if a certain record exists. When no record is found a new one is inserted, when a record is found the values are updated.

We will use the following table script as example to explain the problem:

create table updatetest 
  (
   id     int primary key, 
   name   varchar(20), 
   amount numeric
  )


insert into updatetest values (1, 'first', 100)
insert into updatetest values (2, 'second', 500)
insert into updatetest values (3, 'third', 30)
insert into updatetest values (4, 'fourth', 200)

When making an insert/update procedure most people write something like this:

CREATE PROCEDURE updatevalues(@id     int, 
                              @name   varchar(20), 
                              @amount numeric)
AS

IF EXISTS(SELECT * FROM updatetest WHERE id = @id)
  BEGIN
    UPDATE updatetest 
    SET name = @name, 
        amount= @amount
    WHERE id = @id
  END
ELSE
  BEGIN
    INSERT INTO updatetest VALUES (@id, @name, @amount)
  END
This looks like the perfect solution to our problem except for one detail. When this procedure is executed from several threads at the same time to update the same record, it will crash.

Let's alter our procedure script a bit to make it easier to reproduce the problem.

CREATE PROCEDURE updatevalues(@id     int, 
                              @name   varchar(20), 
                              @amount numeric)
AS

IF EXISTS(SELECT * FROM updatetest WHERE id = @id)
  BEGIN
    waitfor delay '00:00:10'
    UPDATE updatetest 
    SET name = @name, 
        amount= @amount
    WHERE id = @id
  END
ELSE
  BEGIN
    waitfor delay '00:00:10'
    INSERT INTO updatetest VALUES (@id, @name, @amount)
  END

Compile this script and then run the following statements (at the same time) in 2 query windows.

exec updatevalues 7, 'Seven', 777

Surprise surprise, look at the error we get in the second query window...

"Msg 2627, Level 14, State 1, Procedure updatevalues, Line 17
Violation of PRIMARY KEY constraint 'PK__updatetest__29AC2CE0'. Cannot insert duplicate key in object 'updatetest'.
The statement has been terminated.

(0 row(s) affected)

Why does this happen? Simple, the first execution checks if the record exists and decides to insert a new record because the if exists returns false. The second thread is executing the same query at the same time and of course the existance check returns false as well. They both try to insert and a primary key error happens.
Threading without locking gives errors.

Some people think that this will never happen, but they are wrong, from the moment you are working with large amounts of data and users, this conflict will happen very often.

Luckily there is an easy way to solve this (very annoying) problem by using Lock hints in our advantage.

We will update our test procedure to include a transaction and an Exclusive Lock:

CREATE PROCEDURE updatevalues(@id     int, 
                              @name   varchar(20), 
                              @amount numeric)
AS

BEGIN TRANSACTION
IF EXISTS(SELECT * FROM updatetest (XLOCK, SERIALIZABLE) WHERE id = @id)
  BEGIN
    waitfor delay '00:00:10'
    UPDATE updatetest 
    SET name = @name, 
        amount= @amount
    WHERE id = @id
  END
ELSE
  BEGIN
    waitfor delay '00:00:10'
    INSERT INTO updatetest VALUES (@id, @name, @amount)
  END
COMMIT TRANSACTION
RETURN(0)

The updated code has as effect that the second thread will wait until the first thread has completed it's operation. It is no longer possible that another operation on the same record is "sneaking" in between the check and the insert or update.
Threading without locking gives errors.

Since I simplified the procedure script for the example you can download a complete version of the code.

10 comments:

Anonymous said...

Thanks man!
You thought me a good lesson :)

Waleed Eissa said...

Great post, thanks for sharing.

BTW, I just read a stripped down (and slightly altered) version of this post on another blog, someone else thinks the post is great that they decided to publish it on their own blog, under their name though :)

http://www.semdendoncker.com/blog/?p=17

Filip De Vos said...

Thank you Wal, I sent him a message to correctly attribute the work to me.

Sem said...

No problem, I will add your link as a source.
Sorry about copying your post.

Waleed Eissa said...

No problem, I'm glad this was settled down

Anonymous said...

Great post!

I have one question though:
Would this code behave correctly?

--------------------
BEGIN TRANSACTION

UPDATE updatetest with(holdlock)
SET name = @name,
amount= @amount
WHERE id = @id

if (@@rowcount = 0)
INSERT INTO updatetest VALUES (@id, @name, @amount)

COMMIT TRANSACTION
---------------

I'm trying to avoid the extra reads of the EXISTS(...).

Filip De Vos said...

This is a perfectly viable solution, but performance wise it is similar to the "if exists approach"

Where the "update/insert" construction is faster on updates(44% of query plan), the "if exists" solution is faster on inserts (41% of query plan)

Anonymous said...

I got another question concerning the holdlock mechanism. In both cases, if no rows match the criteria, then an insert will be done.

My question is, what's locked during that period?

In the case of the update, the selected row is the only thing locked. But in the case of the insert, I'm not sure... From what I've read, it locks a range on the index of the query ... whatever that means ...

Do you have a better understanding of what happens with the locking in the case of the insert? Is it causing a table lock?

Filip De Vos said...

The "if Exists" creates a "Shared Table Lock", so during the lock period the other queries are blocked. Nothing you can do about that unfortunately.

Ahmed said...

Thanks Filip. That's EXACTLY what I was looking for :)