Monday, July 2, 2007

High Performance writing to XML Log files.

A characteristic of logfiles is the "append" nature. We append new content at the back of the file. When writing to a normal text file this is very easy and fast.

2007-07-01 13:56:04.313 start process
2007-07-01 13:56:25.837 do something
2007-07-01 13:56:25.853 the end
But when we want to use XML files to write log information things become a little bit more complicated because the xml needs to be valid, and we need a "closing" tag for the file.
<?xml version="1.0" encoding="UTF-8"?>
<log>
<logline date="2007-07-01 13:56:04.313" text="start process" />
<logline date="2007-07-01 13:56:25.837" text="do something" />
<logline date="2007-07-01 13:56:25.853" text="the end" />
</log>
This structure means we can no longer just "append" lines to the file. But we need to "seek" the closing tag and write before that. This is an extremely slow process especially when the logfiles become big. (I have seen 1Gb+ logfiles before.) Or even worse, we need to open the logfile as a DOM document. (RAM explosion anyone...) To solve this problem I use a little xml trick.
NOTE: This code will not be parsed correctly by every XML parser on the planet. But all the parsers I have used do it correctly.
The little logfile you saw in the previous example can be rewritten to be stored in 2 separate files. One with the opening and closing tags, and one with the log lines. Then the loglines get included in the "header" file which keeps it valid xml. And we can append lines in the "loglines" file. First the "header" file which contains an include section for the "loglines.xml" file.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log [
&lt;!ENTITY loglines SYSTEM "loglines.xml">
]>
<log>
&loglines;
</log>
Then the loglines.xml file itself. (Note that this file does not validate.)
<logline date="2007-07-01 13:56:04.313" text="start process" />
<logline date="2007-07-01 13:56:25.837" text="do something" />
<logline date="2007-07-01 13:56:25.853" text="the end" />
Now when you parse the header.xml file it returns the complete log file as valid xml and you can append new log lines to the loglines.xml file without having to parse the complete file. As a little bonus... To format the log file as a nice html page you can apply the following xsl stylesheet.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
  <html>
  <body>
    <h2>Log Lines</h2>
    <table border="1">
    <tr bgcolor="#9acd32">
      <th align="left">Date</th>
      <th align="left">Message</th>
    </tr>
    <xsl:apply-templates select="log/logline" >
    <xsl:sort select="@date"/>
    </xsl:apply-templates>
    </table>
  </body>
  </html>
  </xsl:template>
  
  <xsl:template match="logline">
    <tr>
      <td><xsl:value-of select="@date"/></td>
      <td><xsl:value-of select="@text"/></td>
    </tr>
  </xsl:template>
  
</xsl:stylesheet>
You can download the files used in the examples by right clicking on the links and selecting "Save As...". logfile.xml, loglines.xml and logfile.xsl.

No comments: