I refer to all XML manipulation via XSL stylesheets as simply “XSL”. One could be quite particular about calling something XSLT rather than XSL, but I lump everything that reads, manipulates or produces XML data with XSLT and XPath together as XSL.
XML is generally the data being consumed and/or produced by an XSL, but not always. I’ve written XSLs that transform XML into CSV or go the opposite way, and XSLs that produce PDFs using XSL-FO (with Apache FOP). There are typically many ways to achieve a data transformation, and XSL is a powerful one.
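As a rough illustration of the XML to CSV case, here is a minimal sketch using the XSLT processor that ships with the JDK. The stylesheet, the element names (people, person, name, city) and the sample data are all made up for this example; they are not taken from any of the XSLs mentioned above.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class XmlToCsvSketch {

    // Hypothetical stylesheet: emits one "name,city" line per <person>.
    // method='text' stops the processor from writing any XML markup.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/people'>"
      +   "<xsl:for-each select='person'>"
      +     "<xsl:value-of select='name'/>"
      +     "<xsl:text>,</xsl:text>"
      +     "<xsl:value-of select='city'/>"
      +     "<xsl:text>&#10;</xsl:text>"
      +   "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the input XML and returns the result as a string.
    static String transform(String xsl, String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Made-up sample input.
        String xml = "<people>"
                + "<person><name>Ann</name><city>Perth</city></person>"
                + "<person><name>Bob</name><city>Hobart</city></person>"
                + "</people>";
        System.out.print(transform(XSL, xml));
    }
}
```

Running main should print one line per person, with the name and city separated by a comma. Real CSV output would also need quoting for values that contain commas or quotes, which this sketch ignores.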
Another use of XSL is validating a given XML dataset. This effectively entails parsing a particular XML input and producing an alternate XML output based upon “errors” in the source data.
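To make the validation idea concrete, here is a small sketch, again using only the JDK. The rule ("every country needs a non-empty capital"), the element names and the error format are assumptions invented for this example, as is the sample data.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class XslValidationSketch {

    // Hypothetical rule: every <country> must have a non-empty <capital>.
    // The output is an <errors> document with one <error> per offending entry.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/countries'>"
      +   "<errors>"
      +     "<xsl:for-each select='country[not(normalize-space(capital))]'>"
      +       "<error>Missing capital: <xsl:value-of select='name'/></error>"
      +     "</xsl:for-each>"
      +   "</errors>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the validation stylesheet and returns the error report as a string.
    static String validate(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Made-up input: the second entry deliberately has no <capital>.
        String xml = "<countries>"
                + "<country><name>Australia</name><capital>Canberra</capital></country>"
                + "<country><name>Atlantis</name></country>"
                + "</countries>";
        System.out.println(validate(xml));
    }
}
```

An error report with no error elements in it then means the input passed, which is easy for a calling program to check.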
I’m not going to go into the details of XSL very much, but for a quick start on XSL, have a read over at W3Schools here.
Developing an XSL is one thing. You might have solved your immediate problem, but how do you know it will stay solved?
The source format could change, or a developer could inadvertently change the XSL while working on an entirely unrelated task (a shared template, perhaps). You need tests for your XSLs, even if only to detect regressions.
I’ve implemented a very simple framework layered on top of XMLUnit that allows repeatable unit tests on an XSL in a very painless way.
With the tiny extra layer, you can write a unit test by simply providing the XSL, the input XML and the output XML. At this stage, I haven’t really tried it for anything like XML -> CSV transformations but I’m sure it could be helpful there too with some tweaking. Perhaps Google-Diff-Match-Patch could be helpful in this case.
XMLUnit is a library that provides a few useful functions for comparing two pieces of XML, telling you whether they’re exactly the same or the same apart from differing white-space.
It looks like they’ve just updated to version 1.4 as of February, so there might be something new in there. I’ve been using version 1.3, from September 2009, up until now.
I recommend having a look at the user guide here.
For the example, I’ve sourced an XML list of countries from here.
This data contains more than we want and isn’t entirely tidy, with elements named like “Col1”, “Col2”, “Col8”.
For the example I’m going to produce a list of “Independent State” countries with their Name, Capital and TLD.
To make the XSL a little less trivial, I’ll also only choose those entries with:
- a name beginning with ‘A’
- a currency of ‘Dollar’
- a TLD starting with ‘.a’
So the result will be a much shorter list containing only the matching countries, each reduced to those three fields.
Transforming with XSL
This transformation process is pretty typical of the job XSL is called to do: we take input in one form and produce output in another.
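A stylesheet applying the filters listed earlier could look roughly like the sketch below. It is not the stylesheet from this post: the source data uses names like “Col1” and “Col2”, whereas this example assumes a tidier, hypothetical input with name, capital, currency, tld and status elements so that the XPath predicates are easy to read.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class CountryFilterSketch {

    // Hypothetical stylesheet: keeps independent states whose name starts with 'A',
    // whose currency is 'Dollar' and whose TLD starts with '.a', and copies only
    // the name, capital and tld elements of each match.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/countries'>"
      +   "<countries>"
      +     "<xsl:for-each select=\"country[status='Independent State']"
      +         "[starts-with(name,'A')][currency='Dollar'][starts-with(tld,'.a')]\">"
      +       "<country><xsl:copy-of select='name | capital | tld'/></country>"
      +     "</xsl:for-each>"
      +   "</countries>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Builds one <country> entry in the assumed input format.
    static String country(String name, String capital, String currency, String tld) {
        return "<country><name>" + name + "</name><capital>" + capital + "</capital>"
             + "<currency>" + currency + "</currency><tld>" + tld + "</tld>"
             + "<status>Independent State</status></country>";
    }

    // Applies the filter stylesheet and returns the resulting XML as a string.
    static String filter(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<countries>"
                + country("Australia", "Canberra", "Dollar", ".au")
                + country("Austria", "Vienna", "Euro", ".at")
                + country("Bahamas", "Nassau", "Dollar", ".bs")
                + "</countries>";
        // Only the Australia entry satisfies all of the predicates.
        System.out.println(filter(xml));
    }
}
```

Chaining the conditions as separate predicates keeps each rule visible on its own, which makes it easier to see which one a test failure relates to.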
If you’re using IntelliJ, you can execute the XSL directly from the Run menu and simply have to provide the popup dialog with the input file.
From there, you can simply hit the play/run button (or SHIFT+F10) to run.
Upon execution our source XML is transformed by the XSL and the resulting XML is output to the console.
This looks to be the form we want, but how can we make that a repeatable process?
When testing code, one generally exercises as much of the functionality and as many of the logical branches as possible, to gain some confidence that it works as expected. This isn’t as easy with XSL, as it’s pretty much an all-or-nothing kinda deal, but there are some things that can be done to provide some comfort about the correctness of your XSL.
Producing the input and expected output files should not be much of a burden on the developer, as one typically has access to some kind of input data, perhaps from another system. The expected output XML can often be created from whatever specs you have been given, as it is ultimately what you’re writing the XSL for.
The Diff is done by XMLUnit but the thin layer I’ve written produces some reasonably useful output to the developer should the test fail.
The unit test itself is very simple: it just calls the XslTestHelper class, which abstracts away any complexities to do with XSL and classes like TransformerFactory, StreamResult and DocumentBuilderFactory.
The XslTestHelper class performs the heavy lifting.
Firstly, there is execute, which is called by the unit test and calls out to another class, XslUtils, which does the actual transformation of the input XML with the XSL. The result of that is then put through XMLUnit to produce a diff. The assertValid method is then called, which simply checks whether the diff process has produced any tangible differences (i.e. ignoring white-space).
The last method, printDifferences, is merely a display method that produces a somewhat nicely formatted list of the differences detected (if any).
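The overall flow (transform, compare against the expected XML ignoring white-space, report on failure) can be sketched with the JDK alone. To be clear, this is not the XslTestHelper from this post and it does not use XMLUnit: the class name XslTestHelperSketch, its methods and the tiny sample stylesheet are all hypothetical, and DOM's isEqualNode stands in for XMLUnit's far more detailed diff.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class XslTestHelperSketch {

    // Made-up stylesheet for trying the helper out: wraps the <v> children of <in> in <out>.
    static final String SAMPLE_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/in'><out><xsl:copy-of select='v'/></out></xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms inputXml with xsl and reports whether the result matches
    // expectedXml once whitespace-only text nodes have been removed from both.
    static boolean execute(String xsl, String inputXml, String expectedXml) throws Exception {
        StringWriter actual = new StringWriter();
        TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(inputXml)), new StreamResult(actual));
        boolean same = parseAndStrip(expectedXml).isEqualNode(parseAndStrip(actual.toString()));
        if (!same) {
            // Crude stand-in for a formatted list of differences.
            System.out.println("Expected: " + expectedXml);
            System.out.println("Actual:   " + actual);
        }
        return same;
    }

    // Parses the XML and returns its root element with indentation-only text removed.
    static Element parseAndStrip(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        stripWhitespace(root);
        return root;
    }

    // Recursively removes text nodes that contain nothing but whitespace.
    static void stripWhitespace(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // The expected XML is indented differently from the transformer's output.
        System.out.println(execute(SAMPLE_XSL, "<in><v>1</v></in>", "<out>\n  <v>1</v>\n</out>"));
    }
}
```

Stripping every whitespace-only text node is a blunt approach that suits data-style XML like the country list; documents with mixed content, where spaces between inline elements matter, would need something more careful.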
I’ve deliberately edited the input file to cause a test failure and produce a diff. In this case I’ve added an extra ‘s’ to the domain for Australia.
There are many ways you can edit XML & XSL and sometimes if you’re only tweaking something then just a text editor like Notepad++ will suffice. However, when you’re getting deep into something, or writing something from scratch, an XML editing tool can be a great help.
Some of the commercial options available include:
These are all reasonable choices, but my tool of choice recently has become IntelliJ IDEA which is amazing. In my current role I have been doing a lot of XSLs lately and I’ve both XmlSpy and IntelliJ installed on my machine. I have found however, that I just really don’t need XmlSpy at all as IntelliJ provides all the code-completion functionality I need. It’s actually more useful in the sense that IntelliJ is almost reading your mind when it comes to where you want the cursor upon inserting code, whereas XmlSpy is a little less desirable in that it doesn’t actually complete the XML element for you.
One last advantage is that I can be sure (at least more so than with XmlSpy) that my XSL will produce the same results as when it is executed in the non-development environment. Some variances tend to crop up from time to time between the different XSL processor implementations, so it is logical to develop code in IntelliJ that will be run using the same JAXP Xalan library. Also worthy of mention is that my XSLs can live alongside all the other code I’m writing, without my having to jump out to another application.
The code described in this post is available here. I’ve also gone down the path of using Gradle to manage the build of this project as I’ve been following it quite closely over the last year or so but have not really played with it up until now, so it’s something I’ll no doubt delve deeper into in the future. To execute the tests in the project, simply run gradlew test and it should sort out all the dependencies.
XSL can help you solve transformation problems in simpler ways than you might imagine. Give it a try … and then write unit tests for it :)
Coming up in Part 2, I’ll delve into more fine grained unit tests.