Friday, May 25, 2007

Using a schema collection to validate output of FOR XML RAW

Schema collections are used in SQL Server 2005 to create an xml datatype field that is validated with an XML Schema. Unfortunately there is an xsd feature that is not implemented very nicely is the processContents="lax" attribute. The attribute processContents is implemented, but where normal xsd supports the values strict, lax and skip. The sql server schema collection parser only supports strict and skip. And unfortunately using lax does not degrade, it just gives a compilation error. So reusing prebuild xsd files is going to create some issues that need to be resolved before you can compile them in the database. I will show an example where using the processContents="lax" technique would be usefull. The following schema can be used to create a datatype to store a result of a "for xml raw" query. When you set processContents="lax" you will get a compilation error. Although it would be usefull, we have to be content with processContents="skip".

IF EXISTS(SELECT * FROM sys.schemas WHERE name ='xml_raw_schema')
    DROP XML SCHEMA COLLECTION  dbo.xml_raw_schema
GO
CREATE XML SCHEMA COLLECTION xml_raw_schema AS
 N'<?xml version="1.0"?>
 <xsd:schema xmlns:targetNamespace="http://schemas.mycompany.com/myproduct"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <xsd:element name="table">
     <xsd:complexType>
        <xsd:sequence>
           <xsd:element name="row" minOccurs="0" maxOccurs="unbounded">
              <xsd:complexType>
                 <xsd:anyAttribute processContents="skip" />
              </xsd:complexType>
           </xsd:element>
        </xsd:sequence>
     </xsd:complexType>
   </xsd:element>
</xsd:schema>'

You can use the schema collection like this:
declare @xml xml(xml_raw_schema)
SET @xml = (select * from sys.tables FOR XML RAW, TYPE, ROOT('table'))
select @xml
Which shows (in my test case) the following output:
<table>
  <row name="Orders" object_id="2073058421" schema_id="1" parent_object_id="0" type="U " type_desc="USER_TABLE" create_date="2007-04-12T17:30:08.150" modify_date="2007-04-12T17:46:17.043" is_ms_shipped="0" is_published="0" is_schema_published="0" lob_data_space_id="0" max_column_id_used="3" lock_on_bulk_load="0" uses_ansi_nulls="1" is_replicated="0" has_replication_filter="0" is_merge_published="0" is_sync_tran_subscribed="0" has_unchecked_assembly_data="0" text_in_row_limit="0" large_value_types_out_of_row="0" />
</table>
Although we have to be carefull that it does not slow down processing too much I find it a very valuable technique because you can achieve a very basic level of type safety.

No comments: