XSL Recognizing Newlines

One of the major improvements in XSLT2 is support for sequences as a replacement for node-sets. One of the new functions that takes advantage of this support is tokenize(). It is comparable to Python's split function, which takes a string and a delimiter and returns a list of the substrings separated by that delimiter; the main difference is that tokenize() treats its second argument as a regular expression. Perl and Ruby have an equivalent split function, while Unix shells such as zsh and ksh93 provide similar functionality via different mechanisms.
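To make the comparison concrete, here is a minimal Python sketch of the two flavours of splitting. The sample strings are made up for illustration. Because tokenize() takes a regular expression as its pattern, re.split is probably the closer analogue than str.split.

```python
import re

text = "line1\nline2\nline3\nline4"

# str.split with a literal delimiter returns a list of substrings.
parts = text.split("\n")

# XPath 2.0 tokenize() interprets its second argument as a regular
# expression, so re.split is the closer Python analogue. This pattern
# accepts both Unix (\n) and Windows (\r\n) line endings.
regex_parts = re.split(r"\r?\n", "line1\r\nline2\nline3")

print(parts)
print(regex_parts)
```

Both calls return an ordered list of strings, which is roughly what a sequence of strings is in XSLT2.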

Consider the following trivial example (file.xml):

<?xml version="1.0"?>
<root>
  <text>line1
        line2
        line3
        line4</text>
</root>


Suppose you want to convert the contents of the <text> element into HTML, making each line of text a separate paragraph. To do this, you need a way of splitting the element text into a series of strings using the newline character ('\n') as the delimiter. In other words, the processor must recognize where the newlines are in the element text and act accordingly.
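As a rough illustration of the intended result, the transformation can be sketched in Python. The function name lines_to_paragraphs is hypothetical, and the " ".join(line.split()) idiom only approximates XPath's normalize-space(), since Python treats more characters as whitespace than XML does.

```python
from html import escape


def lines_to_paragraphs(text):
    """Split text on newlines and wrap each line in a <p> element."""
    paragraphs = []
    for line in text.split("\n"):
        # Trim leading/trailing whitespace and collapse internal runs,
        # approximating XPath's normalize-space().
        normalized = " ".join(line.split())
        # Escape markup characters so the result is safe to embed in HTML.
        paragraphs.append("<p>" + escape(normalized) + "</p>")
    return paragraphs


# The text content of the <text> element, including its indentation.
sample = "line1\n        line2\n        line3\n        line4"

for paragraph in lines_to_paragraphs(sample):
    print(paragraph)
```

The indentation in front of line2 through line4 is part of the element text, which is why each piece needs its whitespace normalized after the split.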

Some XSLT1 processors, such as xsltproc, provide the EXSLT str:tokenize() function as an extension function. The following example (file.xsl) works with the xsltproc processor:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:str="http://exslt.org/strings"
   extension-element-prefixes="str">

<xsl:output method="html"/>

<xsl:template match="//text/text()">
   <xsl:for-each select="str:tokenize(.,'&#xA;')">
     <p><xsl:value-of select="normalize-space(.)"/></p>
   </xsl:for-each>
</xsl:template>

</xsl:stylesheet>


Here is the output when the processor transforms the input document file.xml:

$ xsltproc file.xsl file.xml
  <p>line1</p><p>line2</p><p>line3</p><p>line4</p>
$


Here is an XSLT2 stylesheet to do the same transformation:

<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html"/>

<xsl:template match="//text/text()">
   <xsl:for-each select="tokenize(., '\n')">
     <p><xsl:value-of select="normalize-space(.)"/></p>
   </xsl:for-each>
</xsl:template>

</xsl:stylesheet>


Here is the output using the Saxon Java-based XSLT2 processor, invoked through a small wrapper script:

$ cat saxon
#!/bin/bash
java -jar /usr/share/java/saxon.jar "-s:$2" "-xsl:$1"
$
$ ./saxon file.xsl file.xml
<p>line1</p>
<p>line2</p>
<p>line3</p>
<p>line4</p>
$ 


One thing I like about the Saxon XSLT processor is that it tends to output the transformed document in a more readable format. As you can see, Saxon writes each paragraph on a separate line, whereas xsltproc runs all of the paragraphs together on a single line. Semantically the outputs are equivalent, but the default output from the Saxon processor is easier on the eye.
 
