Friday, May 9, 2008

How to strip trailing zeros

A client came to me with another string formatting problem. I would never recommend formatting strings in T-SQL, but rather to do formatting on the client side. The question was to strip trailing zeros from a numeric value stored in a string but retain the decimals when they are not zero.

   
1.0501needs to show1.0501
1.0500needs to show1.05
1.0000needs to show1
This is easy to do by replacing all zeroes with a space and then using RTRIM() to remove the right spaces. Finally replace the spaces again with zeroes and you have the correct result. Except when all decimals were zero, in that case the decimal point needs to be suppressed as well. I poured the code in a function that is easier to use.
CREATE FUNCTION dbo.fn_strip_zeros(@number numeric(38,10))
RETURNS varchar(38)
WITH ENCRYPTION
AS
  BEGIN
    DECLARE @result varchar(38)
    DECLARE @decimal varchar(3)

    SET @decimal = substring(convert(varchar(38), convert(numeric(38,10),1)/5 ), 2,1)

    set @result = rtrim(replace(replace(rtrim(replace(@number,'0',' ')),' ','0') + ' ', @decimal + ' ', ''))

    RETURN @result
  END

GO

Friday, November 23, 2007

How to make a thread safe insert/update procedure on SQL Server

When developing an application that does a lot of inserts and updates, it is sometimes easier to wrap both statements in one stored procedure. This results in a simple procedure that will check if a certain record exists. When no record is found a new one is inserted, when a record is found the values are updated.

We will use the following table script as example to explain the problem:

create table updatetest 
  (
   id     int primary key, 
   name   varchar(20), 
   amount numeric
  )


insert into updatetest values (1, 'first', 100)
insert into updatetest values (2, 'second', 500)
insert into updatetest values (3, 'third', 30)
insert into updatetest values (4, 'fourth', 200)

When making an insert/update procedure most people write something like this:

CREATE PROCEDURE updatevalues(@id     int, 
                              @name   varchar(20), 
                              @amount numeric)
AS

IF EXISTS(SELECT * FROM updatetest WHERE id = @id)
  BEGIN
    UPDATE updatetest 
    SET name = @name, 
        amount= @amount
    WHERE id = @id
  END
ELSE
  BEGIN
    INSERT INTO updatetest VALUES (@id, @name, @amount)
  END
This looks like the perfect solution to our problem except for one detail. When this procedure is executed from several threads at the same time to update the same record, it will crash.

Let's alter our procedure script a bit to make it easier to reproduce the problem.

CREATE PROCEDURE updatevalues(@id     int, 
                              @name   varchar(20), 
                              @amount numeric)
AS

IF EXISTS(SELECT * FROM updatetest WHERE id = @id)
  BEGIN
    waitfor delay '00:00:10'
    UPDATE updatetest 
    SET name = @name, 
        amount= @amount
    WHERE id = @id
  END
ELSE
  BEGIN
    waitfor delay '00:00:10'
    INSERT INTO updatetest VALUES (@id, @name, @amount)
  END

Compile this script and then run the following statements (at the same time) in 2 query windows.

exec updatevalues 7, 'Seven', 777

Surprise surprise, look at the error we get in the second query window...

"Msg 2627, Level 14, State 1, Procedure updatevalues, Line 17
Violation of PRIMARY KEY constraint 'PK__updatetest__29AC2CE0'. Cannot insert duplicate key in object 'updatetest'.
The statement has been terminated.

(0 row(s) affected)

Why does this happen? Simple, the first execution checks if the record exists and decides to insert a new record because the if exists returns false. The second thread is executing the same query at the same time and of course the existance check returns false as well. They both try to insert and a primary key error happens.
Threading without locking gives errors.

Some people think that this will never happen, but they are wrong, from the moment you are working with large amounts of data and users, this conflict will happen very often.

Luckily there is an easy way to solve this (very annoying) problem by using Lock hints in our advantage.

We will update our test procedure to include a transaction and an Exclusive Lock:

CREATE PROCEDURE updatevalues(@id     int, 
                              @name   varchar(20), 
                              @amount numeric)
AS

BEGIN TRANSACTION
IF EXISTS(SELECT * FROM updatetest (XLOCK, SERIALIZABLE) WHERE id = @id)
  BEGIN
    waitfor delay '00:00:10'
    UPDATE updatetest 
    SET name = @name, 
        amount= @amount
    WHERE id = @id
  END
ELSE
  BEGIN
    waitfor delay '00:00:10'
    INSERT INTO updatetest VALUES (@id, @name, @amount)
  END
COMMIT TRANSACTION
RETURN(0)

The updated code has as effect that the second thread will wait until the first thread has completed it's operation. It is no longer possible that another operation on the same record is "sneaking" in between the check and the insert or update.
Threading without locking gives errors.

Since I simplified the procedure script for the example you can download a complete version of the code.

Friday, August 10, 2007

Building a small Virtual PC with windows xp and SQL Server 2005 Developer edition

I love Virtual PC 2007, but I hate how every VPC image I used so far is at least 5GB and easily grows to 10GB. It is hard to keep multiple images active on your machine. So I started looking for a way to reduce the size of an image.

About the only usefull resource I found was the site of http://www.bold-fortune.com. This website contains the log of a guy that deletes windows files one by one and tests his system for stability.

My goal was to end up with the smallest possible Windows XP installation with the SQL Server 2005 service installed (and running).

Here is how I got there:

  • Created a Virtual PC with Virtual PC 2007 with 128Mb memory enabled.
  • Installed Windows XP SP2 and switched off as many of the optional stuff as I could.
  • Set the swap file size to a fixed 128MB
  • Installed virtual pc additions.
  • Disabled the Windows Security center
  • Disabled windows file protection (following the method described on the bold-fortune site)
  • Installed the SQL Server service with the following components: Database Server, Shared components, Replication support.
  • In The surface configuration wizard enable remote connections over tcp/ip.
  • Disable all protocols except tcp/ip in the server configuration manager.
  • Followed the deletion steps as described by bold-fortune except "WMI" and "Logical Disk manager". (also left the file drvmain.sdb)
    Between every delete I recycled SQL Server to see if I could still connect remotely and run queries afterwards.

When a in my view important service or component was deleted I restarted the vpc to see if everything was still in working order. (I check in event viewer for errors.)

Another small fix I did was running the following in a cmd.exe window:

net config server /autodisconnect:-1
This fixes most of the "MRxSmb the redirector failed to determine the connection type on the loopback connection"

When all this was done I switched on compression on the complete c-drive of the vpc (make sure sql server is running because it does not when the system databases are compressed). Then used the Whitney Defrag utility to defragment the drive. This defragmenter is extremely nice on virtual pc's because it has a tiny installation footprint and it moves all the clusters to the start of the disk.

When all is ready it is time to start compacting the Virtual PC. Start this process by capturing the ISO file "C:\Program Files\Microsoft Virtual PC\Virtual Machine Additions\Virtual Disk Precompactor.iso". This will automatically start zeroing out all the free space of the virtual hard disk.

Once the virtual hard disk is pre-compacted, shut down your Virtual Machine and compact the virtual hard disk by using the "Virtual Disk Wizard".

This (lenghty) process delivered me a Virtual PC of 826 MB and with compression on a footprint of 550 MB on disk which enables me to keep several virtual machines around. I do realise that a vpc with so little memory is slow and not usable for any serious testing, but it enables me to comfortably experiment with clustering and replication.

You can download the batch files i created to shrink my vpc images but be carefull. If you run these files on your normal pc, they WILL delete a LOT of files and certain software will NOT WORK anymore. So please be carefull. And I do not guarantee that these batch files will work perfectly for you. For me they do exactly what I need and leave me with a good usable VPC image for testing.

So If you want to try the same thing I did, you can download the batch files and start deleting windows.

Monday, August 6, 2007

What to do when you want to drop a FileGroup

I was working on a system to partition tables on SQL Server 2005 which means I was creating database filegroups and database files and I needed to drop my newly created objects regularly to retest my code. But sometimes it was not possible to drop the filegroups with complaints that the filegroup was not empty. After a lot of googling I found that it is possible that statistics are defined on the filegroup (most likely automatic stats). To see this you can run the following query.

select object_name(id) AS TableName, * 
from dbo.sysindexes 
where groupid = object_id('<yourfilegroup>')
After dropping the statistics I could drop the filegroup without problems.

Wednesday, July 4, 2007

A small update to the sp_help_partition procedure.

One of the flaws in the sp_help_partition function I posted earlier is the lack of information if the partition function is a Range RIGHT or a Range LEFT function. To make that clearer I added a check on the boundary_value_on_right field from the system tables. Also added an order by to make the presentation nicer.

SELECT
          object_name(i.[object_id]) as [Table],
          ps.name                    as PartitionScheme,
          pf.name                    as PartitionFunction,
          dds.destination_id         as PartitionNumber,
          fg.Name                    as FileGroupName,
          case when boundary_value_on_right = 1 then 'RIGHT' else 'LEFT' end [Range], 
          prv.value                  as RangeValue
    FROM sys.indexes i
      INNER JOIN sys.partition_schemes ps on ps.data_space_id = i.data_space_id
      INNER JOIN sys.partition_functions pf on ps.function_id = pf.function_id
      INNER JOIN sys.destination_data_spaces dds on dds.partition_scheme_id = ps.data_space_id
      INNER JOIN sys.filegroups fg on fg.data_space_id = dds.data_space_id
      LEFT  JOIN sys.partition_range_values prv on prv.boundary_id = dds.destination_id
    WHERE i.type = 1 
I hope this version works better for you. Download the complete script.

Monday, July 2, 2007

Zero Padding a numeric value in Transact SQL

A quick one before I go home (and just because I used it 10 minutes ago)... I zero pad a numeric value like this:

DECLARE @mynumber INT,
        @padding  INT
SELECT @mynumber = 123, 
       @padding = 8
SELECT REPLACE(STR(@mynumber, @padding), ' ', '0')
The result of this example will be
 
00000123
I admit, it is not very elegant, but is string manipulation ever elegant in TSQL...

High Performance writing to XML Log files.

A characteristic of logfiles is the "append" nature. We append new content at the back of the file. When writing to a normal text file this is very easy and fast.

2007-07-01 13:56:04.313 start process
2007-07-01 13:56:25.837 do something
2007-07-01 13:56:25.853 the end
But when we want to use XML files to write log information things become a little bit more complicated because the xml needs to be valid, and we need a "closing" tag for the file.
<?xml version="1.0" encoding="UTF-8"?>
<log>
<logline date="2007-07-01 13:56:04.313" text="start process" />
<logline date="2007-07-01 13:56:25.837" text="do something" />
<logline date="2007-07-01 13:56:25.853" text="the end" />
</log>
This structure means we can no longer just "append" lines to the file. But we need to "seek" the closing tag and write before that. This is an extremely slow process especially when the logfiles become big. (I have seen 1Gb+ logfiles before.) Or even worse, we need to open the logfile as a DOM document. (RAM explosion anyone...) To solve this problem I use a little xml trick.
NOTE: This code will not be parsed correctly by every XML parser on the planet. But all the parsers I have used do it correctly.
The little logfile you saw in the previous example can be rewritten to be stored in 2 separate files. One with the opening and closing tags, and one with the log lines. Then the loglines get included in the "header" file which keeps it valid xml. And we can append lines in the "loglines" file. First the "header" file which contains an include section for the "loglines.xml" file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log [
&lt;!ENTITY loglines SYSTEM "loglines.xml">
]>
<log>
&loglines;
</log>
Then the loglines.xml file itself. (Note that this file does not validate.)
<logline date="2007-07-01 13:56:04.313" text="start process" />
<logline date="2007-07-01 13:56:25.837" text="do something" />
<logline date="2007-07-01 13:56:25.853" text="the end" />
Now when you parse the header.xml file it returns the complete log file as valid xml and you can append new log lines to the loglines.xml file without having to parse the complete file. As a little bonus... To format the log file as a nice html page you can apply the following xsl stylesheet.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
  <html>
  <body>
    <h2>Log Lines</h2>
    <table border="1">
    <tr bgcolor="#9acd32">
      <th align="left">Date</th>
      <th align="left">Message</th>
    </tr>
    <xsl:apply-templates select="log/logline" >
    <xsl:sort select="@date"/>
    </xsl:apply-templates>
    </table>
  </body>
  </html>
  </xsl:template>
  
  <xsl:template match="logline">
    <tr>
      <td><xsl:value-of select="@date"/></td>
      <td><xsl:value-of select="@text"/></td>
    </tr>
  </xsl:template>
  
</xsl:stylesheet>
You can download the files used in the examples by right clicking on the links and selecting "Save As...". logfile.xml, loglines.xml and logfile.xsl.