Fikin Ant Tasks Pearls: XML Processing on Steroids


What is the simplest way to process an XML document?

Okay, I agree this is not a fair question. Let's make it more explicit.

Is there a simpler way to deal with the Java JAXP API?

How about the following code:

import javax.xml.transform.stream.StreamResult;
import net.sf.fikin.xml.*;

Xslt xsl = new Xslt( Xslt.newStreamSource("my.xsl") );
xsl.transform( Xslt.newStreamSource("in.xml"), new StreamResult( "my.txt" ) );

This is equivalent to:

import javax.xml.transform.*;
import javax.xml.transform.stream.*;

// create TrAX factory
TransformerFactory fact = TransformerFactory.newInstance();
// compile the stylesheet into a reusable template
Templates tmpl = fact.newTemplates( new StreamSource( "my.xsl" ) );
// transform
Transformer tr = tmpl.newTransformer();
tr.transform( new StreamSource( "in.xml" ), new StreamResult( "my.txt" ) );
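To make the comparison concrete, here is a hypothetical minimal wrapper that hides the same TrAX steps behind one constructor and one call, using only the standard javax.xml.transform API. SimpleXslt is an illustrative name; it is not the Fikin Xslt class, whose internals are not shown here. It compiles the stylesheet once into Templates and creates a fresh Transformer per call, because Templates objects are thread-safe while Transformer objects are not.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Hypothetical convenience wrapper (not the Fikin Xslt class): compiles the
 * stylesheet once and hides the TrAX plumbing behind a single call.
 */
final class SimpleXslt {
    private final Templates templates;

    SimpleXslt(Source stylesheet) throws TransformerException {
        // compile once; Templates can be reused across transformations
        this.templates = TransformerFactory.newInstance().newTemplates(stylesheet);
    }

    /** Transforms the input into a String, passing the given stylesheet parameters. */
    String transformToString(Source input, Map<String, ?> params) throws TransformerException {
        // a Transformer is not thread-safe, so create a fresh one per call
        Transformer tr = templates.newTransformer();
        for (Map.Entry<String, ?> p : params.entrySet()) {
            tr.setParameter(p.getKey(), p.getValue());
        }
        StringWriter out = new StringWriter();
        tr.transform(input, new StreamResult(out));
        return out.toString();
    }

    /** Wraps in-memory XML as a Source (files would use new StreamSource("in.xml")). */
    static Source fromString(String xml) {
        return new StreamSource(new StringReader(xml));
    }
}
```

With this in place, a whole transformation becomes one expression such as new SimpleXslt(new StreamSource("my.xsl")).transformToString(new StreamSource("in.xml"), params), which is the kind of brevity the Xslt example above is after.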

Okay, it is shorter, but why would the first be much better than the second? So far this looks like a mere exercise in coding style, right?

Let's complicate things a bit, then.

What is the simplest way to read the text of each elem child of a list element in an input document (in.xml) into a Java array?

Obviously, you have two options: SAX or DOM.

Let's look at the SAX solution:

import java.io.*;
import java.util.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class MySaxHandler extends DefaultHandler {
    boolean gather = false;
    StringBuffer text = new StringBuffer();
    Vector arr = new Vector();

    // get the populated array
    public Vector getArray() { return arr; }

    public void startElement( String uri, String localName, String qName, Attributes attributes ) throws SAXException {
        if ( localName.equals( "elem" ) ) { gather = true; text.setLength( 0 ); }
    }

    public void endElement( String uri, String localName, String qName ) throws SAXException {
        if ( gather ) arr.add( text.toString() );
        gather = false;
    }

    public void characters( char[] ch, int start, int length ) throws SAXException {
        // the parser may deliver one text node in several chunks, so accumulate
        if ( gather ) text.append( ch, start, length );
    }
}

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware( true ); // otherwise localName is empty
SAXParser parser = factory.newSAXParser();
MySaxHandler saxHandler = new MySaxHandler();
parser.parse( new FileInputStream( "in.xml" ), saxHandler );
// saxHandler.getArray() now holds the element texts

And the DOM solution:

import java.io.*;
import java.util.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;

Vector arr = new Vector();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse( new FileInputStream( "in.xml" ) );
NodeList list = doc.getElementsByTagName( "elem" );
// collect the text of each elem in document order
for ( int i = 0; i < list.getLength(); i++ )
    arr.add( list.item( i ).getTextContent() );

And now let's see how this looks with Xslt:

// my.xsl looks like this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:xslt="xalan://net.sf.fikin.xml.Xslt"
    xmlns:arr="xalan://java.util.Vector" >
  <xsl:param name="callbackObj" />
  <xsl:variable name="myPojo" select="xslt:lookupObject( $callbackObj )" />
  <xsl:template match="list/elem">
    <xsl:value-of select="arr:add( $myPojo, string(.) )" />
  </xsl:template>
</xsl:stylesheet>

// and the Java side:
import java.util.*;
import net.sf.fikin.xml.*;

Vector arr = new Vector();
Xslt xsl = new Xslt( Xslt.newStreamSource("my.xsl") );
Hashtable params = new Hashtable( 1 );
params.put( "callbackObj", xsl.exportJavaObject( arr ) );
xsl.transform( Xslt.newStreamSource("in.xml"), params );
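The article does not show how exportJavaObject and lookupObject work internally. One plausible mechanism, stated here purely as an assumption, is a handle registry: the Java side registers the object and passes a plain string handle as the stylesheet parameter, and an extension function resolves that handle back to the live object. The sketch below covers only the registry half; ObjectRegistry and its methods are hypothetical names, and wiring it into an XSLT processor is not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical export/lookup handle registry (assumed mechanism, not the
 * Fikin implementation): objects travel into a stylesheet as string handles.
 */
final class ObjectRegistry {
    private static final Map<String, Object> OBJECTS = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private ObjectRegistry() {}

    /** Registers the object and returns an opaque string handle for it. */
    public static String export(Object obj) {
        String handle = "obj-" + NEXT_ID.incrementAndGet();
        OBJECTS.put(handle, obj);
        return handle;
    }

    /** Resolves a handle back to the registered object, or null if unknown. */
    public static Object lookup(String handle) {
        return OBJECTS.get(handle);
    }

    /** Drops the object after the transformation so it can be garbage-collected. */
    public static void release(String handle) {
        OBJECTS.remove(handle);
    }
}
```

A string handle has the advantage that it survives any parameter conversion the processor performs, at the cost of having to release the entry once the transformation is finished.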

So, what is the point here, besides having a choice?

There are a few things worth considering.

First, we can avoid DOM and SAX completely (both are cumbersome interfaces to deal with) and use pure Java and XSLT instead. This gives us a shorter codebase and a solution that scales very well.

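I say scalable because XSLT is the ultimate XML processing language: as the following examples show, changes to the input schema or to the selection logic stay confined to a match pattern in the stylesheet.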

Let's look at some more examples illustrating this point.

Let's change the input document schema but keep our goal:

// in.xml is changed to this:
<ns1:list xmlns:ns1="include-this-elems" xmlns:ns2="do-not-include-this-data" >
  <ns1:elem>1</ns1:elem>
  <ns2:elem>2</ns2:elem>
</ns1:list>

// now, updating the Xslt-based solution requires only this change:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:xslt="xalan://net.sf.fikin.xml.Xslt"
    xmlns:arr="xalan://java.util.Vector"
    xmlns:ns1="include-this-elems" >
  <xsl:param name="callbackObj" />
  <xsl:variable name="myPojo" select="xslt:lookupObject( $callbackObj )" />
  <xsl:template match="ns1:list/ns1:elem">
    <xsl:value-of select="arr:add( $myPojo, string(.) )" />
  </xsl:template>
</xsl:stylesheet>
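For comparison with that one-line stylesheet change, this is roughly what a DOM solution needs for the namespaced input: a namespace-aware parser, getElementsByTagNameNS, and an explicit parent check that mirrors the ns1:list/ns1:elem pattern. NsDomReader is a hypothetical name, and the sketch parses from a string rather than in.xml so that it is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical DOM counterpart of the ns1:list/ns1:elem template. */
final class NsDomReader {
    static final String NS1 = "include-this-elems";

    /** Collects the text of every ns1:elem whose parent is ns1:list. */
    static List<String> read(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        // namespace URIs are only available when the parser is namespace-aware
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        List<String> arr = new ArrayList<>();
        NodeList elems = doc.getElementsByTagNameNS(NS1, "elem");
        for (int i = 0; i < elems.getLength(); i++) {
            Node e = elems.item(i);
            Node parent = e.getParentNode();
            // mirror the match pattern: the parent must be ns1:list
            if (NS1.equals(parent.getNamespaceURI()) && "list".equals(parent.getLocalName())) {
                arr.add(e.getTextContent());
            }
        }
        return arr;
    }
}
```

Note that the ns2:elem element is skipped only because its namespace URI differs; the local name alone would not distinguish it.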

Now let's add indexing logic that gives us only those data elements that also appear in the second list (the index).

// in.xml is changed to this:
<ns1:list xmlns:ns1="include-this-elems" xmlns:ns2="do-not-include-this-data" >
  <ns1:data>
    <ns1:elem>1</ns1:elem>
    <ns2:elem>2</ns2:elem>
    <ns1:elem>3</ns1:elem>
  </ns1:data>
  <ns1:index>
    <ns1:elem>1</ns1:elem>
  </ns1:index>
</ns1:list>

// now, updating the Xslt-based solution requires only this change:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:xslt="xalan://net.sf.fikin.xml.Xslt"
    xmlns:arr="xalan://java.util.Vector"
    xmlns:ns1="include-this-elems" >
  <xsl:param name="callbackObj" />
  <xsl:variable name="myPojo" select="xslt:lookupObject( $callbackObj )" />
  <xsl:template match="ns1:list/ns1:data/ns1:elem[ /ns1:list/ns1:index/ns1:elem = . ]">
    <xsl:value-of select="arr:add( $myPojo, string(.) )" />
  </xsl:template>
</xsl:stylesheet>

For homework, compare this with the corresponding DOM and/or SAX solutions.
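One possible answer to the homework, sketched with plain DOM under the same assumptions (hypothetical class name, input parsed from a string): collect the index values first, then keep the data elements whose text is among them. The set lookup reproduces the XPath = comparison in the predicate, which is true when any index element has the same string value as the current node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical DOM counterpart of the indexed ns1:data/ns1:elem template. */
final class IndexedDomReader {
    static final String NS1 = "include-this-elems";

    static List<String> read(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        List<String> arr = new ArrayList<>();
        // the pattern is anchored at /ns1:list
        if (!NS1.equals(root.getNamespaceURI()) || !"list".equals(root.getLocalName())) return arr;

        // pass 1: string values listed under ns1:index
        Set<String> index = new HashSet<>();
        for (Element idx : children(root, "index"))
            for (Element e : children(idx, "elem")) index.add(e.getTextContent());

        // pass 2: ns1:data/ns1:elem values present in the index, in document order
        for (Element data : children(root, "data"))
            for (Element e : children(data, "elem"))
                if (index.contains(e.getTextContent())) arr.add(e.getTextContent());
        return arr;
    }

    /** Direct element children in the NS1 namespace with the given local name. */
    private static List<Element> children(Element parent, String localName) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && NS1.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                out.add((Element) n);
            }
        }
        return out;
    }
}
```

Even this compact version needs two passes and a helper, where the stylesheet needed a single predicate.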

Second, we can "bind" XSLT and POJOs via callbacks. This gives us great flexibility to refactor the POJO classes as much as we like without breaking the main transformation logic.

For example, we may change the class hierarchy without much difficulty.

// let's replace Vector with our own class
public class MyArray {
    int[] arr = new int[10];
    int index = 0;
    public void add( int val ) { arr[ index++ ] = val; }
}

// let's use it in the transformation
...
MyArray arr = new MyArray();
params.put( "callbackObj", xsl.exportJavaObject( arr ) );
...

// and modify my.xsl accordingly
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:xslt="xalan://net.sf.fikin.xml.Xslt"
    xmlns:arr="xalan://MyArray" >
  <xsl:param name="callbackObj" />
  <xsl:variable name="myPojo" select="xslt:lookupObject( $callbackObj )" />
  <xsl:template match="list/elem">
    <xsl:value-of select="arr:add( $myPojo, number(.) )" />
  </xsl:template>
</xsl:stylesheet>
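The same flexibility covers a callback class's internals. As a hypothetical illustration, MyArray can be refactored to grow on demand (the fixed int[10] above overflows on the eleventh element) while the stylesheet keeps calling the same add method unchanged. The size and get accessors are additions for illustration only. The class is left package-private here so the sketch compiles alongside other code; for Xalan to call it reflectively it most likely needs to be public, as in the version above.

```java
import java.util.Arrays;

/**
 * Hypothetical refactoring of MyArray: same add(int) entry point the
 * stylesheet calls, but backed by a growable array.
 */
class MyArray {
    private int[] arr = new int[10];
    private int size = 0;

    /** The method invoked from the stylesheet via arr:add( $myPojo, number(.) ). */
    public void add(int val) {
        if (size == arr.length) {
            arr = Arrays.copyOf(arr, size * 2); // double capacity when full
        }
        arr[size++] = val;
    }

    /** Number of values added so far. */
    public int size() { return size; }

    /** Value at position i, in the order the values were added. */
    public int get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return arr[i];
    }
}
```

Because the stylesheet only depends on the add method, swapping the storage strategy requires no change to my.xsl.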

And that is pretty much all!

Third, we can extend XSLT with custom code that is not part of the XSLT specification.

But hold on a second, isn't there a catch? What about XSLT interoperability? Isn't this going to be a problem at some point?

Perhaps, but most probably not. In my experience it mostly has not been, and this is not an exaggeration. The number of situations where I had to switch from Xalan (which ships with the J2SDK TrAX implementation) to Saxon has been zero. And I had to switch from Java to C only once (the outcome was that C had no performance advantage apart from process startup time).

I'm fully aware that not everyone feels comfortable with an approach that deviates from the XSLT specification. But that alone is not a sufficient reason to opt for a much lengthier coding solution.

Using scripting and callback POJOs can greatly improve XSLT transformations, which has a positive impact on development effort and code complexity.

I'd like to encourage you to play with this idea and explore how far it proves convenient for your own problems.