The Curious Dev

Various coding insights from a curious dev

Testing XSL - Part 1

XSL/XSLT

I refer to all XML manipulation via XSL stylesheets as simply “XSL”. One could get quite particular about calling something XSLT rather than XSL, but really I just lump everything that reads, manipulates or produces XML data with XSLT and XPath together as XSL.

XML is generally the data being consumed and/or produced by the XSL, but not always. I’ve written XSLs that transform XML into CSV (and others going the opposite way), and XSLs that produce PDFs using XSL-FO (with Apache FOP). There are typically many ways to achieve a data transformation, and XSL is a powerful one.
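To make the XML-to-CSV case a little more concrete, here is a minimal sketch in plain Java, using only the JAXP classes that ship with the JDK. The stylesheet, the class name XmlToCsvSketch and the choice of columns (Col1 and Col6 from the country list used later in this post) are illustrative assumptions, not one of the stylesheets mentioned above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToCsvSketch {
    // Illustrative stylesheet: method="text" emits plain text, one "name,capital" line per row.
    // Values are written as-is; real CSV would also need quoting for commas inside values.
    static final String CSV_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/document'>"
        + "<xsl:for-each select='row'>"
        + "<xsl:value-of select='Col1'/><xsl:text>,</xsl:text>"
        + "<xsl:value-of select='Col6'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet above to the given XML and returns the text output
    static String toCsv(String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(CSV_XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<document>"
            + "<row><Col1>Australia</Col1><Col6>Canberra</Col6></row>"
            + "<row><Col1>Antigua and Barbuda</Col1><Col6>Saint John's</Col6></row>"
            + "</document>";
        System.out.print(toCsv(xml));
    }
}
```

The only real difference from an XML-to-XML stylesheet is the output method and the explicit xsl:text separators.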

Another use of XSL is validation of a given XML dataset. This effectively entails parsing a particular XML input and producing an alternate XML output based upon “errors” in the source data.

I’m not going to go into the details of XSL very much, but for a quick start on XSL, have a read over at W3Schools here.

Testing XSLs

Developing an XSL is one thing: you might have solved your immediate problem, but how do you know it will stay solved?

The source format could change, or a developer could inadvertently change the XSL while working on an entirely unrelated task (a shared template perhaps). You need testing on your XSLs, even if it’s just for detecting regressions.

I’ve implemented a very simple framework layered on top of XMLUnit that allows repeatable unit tests on an XSL in a very painless way.

With the tiny extra layer, you can write a unit test by simply providing the XSL, the input XML and the output XML. At this stage, I haven’t really tried it for anything like XML -> CSV transformations but I’m sure it could be helpful there too with some tweaking. Perhaps Google-Diff-Match-Patch could be helpful in this case.

XMLUnit

XMLUnit is a library that provides a few useful functions for comparing two pieces of XML, reporting whether they’re exactly the same or the same apart from differing white-space.
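To show what “the same apart from white-space” means in practice, here is a rough JDK-only sketch in Java. It is not XMLUnit and not how XMLUnit is implemented; the class and method names are made up for illustration, and it only drops white-space-only text nodes between elements before comparing the two trees.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceCompareSketch {
    // Parses the XML and removes text nodes that contain nothing but white-space
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            stripBlankText(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Walks the tree, deleting white-space-only text nodes (typically indentation)
    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    // True when both documents have the same elements, attributes and text once indentation is ignored
    static boolean sameIgnoringWhitespace(String expectedXml, String actualXml) {
        return parse(expectedXml).getDocumentElement()
            .isEqualNode(parse(actualXml).getDocumentElement());
    }

    public static void main(String[] args) {
        String compact = "<Countries><Country><Name>Australia</Name></Country></Countries>";
        String indented = "<Countries>\n  <Country>\n    <Name>Australia</Name>\n  </Country>\n</Countries>";
        System.out.println(sameIgnoringWhitespace(compact, indented));
    }
}
```

What XMLUnit adds on top of a yes/no answer like this is a list of the individual differences, which is what makes a failing test readable.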

It looks like they’ve just updated to version 1.4 as of February, so there might be something new in there. I’ve been using version 1.3 (September 2009) up until now.

I recommend having a look at the user guide here.

Source/Input XML

For the example, I’ve sourced an XML list of countries from here, which has the form:

countrylist.xml
<document>
  <row>
      <Col0>9</Col0>
      <Col1>Australia</Col1>
      <Col2>Commonwealth of Australia</Col2>
      <Col3>Independent State</Col3>
      <Col4/>
      <Col5/>
      <Col6>Canberra</Col6>
      <Col7>AUD</Col7>
      <Col8>Dollar</Col8>
      <Col9>+61</Col9>
      <Col10>AU</Col10>
      <Col11>AUS</Col11>
      <Col12>036</Col12>
      <Col13>.au</Col13>
  </row>
</document>

Expected/Output XML

This data contains more than we want and isn’t entirely tidy with the elements named like “Col1”, “Col2”, “Col8”.

For the example I’m going to produce a list of “Independent State” countries with their Name, Capital and TLD.

To make the XSL a little less trivial, I’ll also only choose those entries with:

  • a name beginning with ‘A’
  • a currency of ‘Dollar’
  • a TLD starting with ‘.a’

So the result will be of the form:

result.xml
<Countries>
  <Country>
      <Name>Antigua and Barbuda</Name>
      <Capital>Saint John's</Capital>
      <TLD>.ag</TLD>
  </Country>
  <Country>
      <Name>Australia</Name>
      <Capital>Canberra</Capital>
      <TLD>.au</TLD>
  </Country>
</Countries>

Transforming with XSL

This transformation process is pretty typical of the job XSL is called to do: we take input in one form and output it in another.

TransformCountries.xsl
<?xml version='1.0' encoding='utf-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output encoding="UTF-8" indent="yes" method="xml"/>

    <xsl:template match="/document">
        <xsl:element name="Countries">
            <xsl:for-each select="row">
                <xsl:if test="starts-with(Col1,'A')
                    and Col3='Independent State'
                    and Col8='Dollar'
                    and starts-with(Col13,'.a')">
                    <xsl:element name="Country">
                        <xsl:element name="Name">
                            <xsl:value-of select="Col1"/>
                        </xsl:element>
                        <xsl:element name="Capital">
                            <xsl:value-of select="Col6"/>
                        </xsl:element>
                        <xsl:element name="TLD">
                            <xsl:value-of select="Col13"/>
                        </xsl:element>
                    </xsl:element>
                </xsl:if>
            </xsl:for-each>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

If you’re using IntelliJ, you can execute the XSL directly from the Run menu and simply have to provide the popup dialog with the input file.

Run XSL in IntelliJ

From there, you can simply hit the play/run button (or SHIFT+F10) to run.

Upon execution our source XML is transformed by the XSL and the resulting XML is output to the console.

XSL Output

This looks to be the form we want, but how can we make that a repeatable process?

Testing

When testing code, one generally exercises as much of the functionality and as many of the logical branches as possible, to gain some confidence that it works as expected. This isn’t as easy with XSL, as it’s pretty much an all-or-nothing kinda deal, but there are some things that can be done to provide some comfort about the correctness of your XSL.

In my simple solution below, I’ve merely brought a Diff into a JUnit test, so all it requires is an input XML file along with an expected result XML file.

Producing these files should not be much of a burden, as typically one would have access to some kind of input data, from another system perhaps. The expected output XML can often be created directly from whatever specs you have been provided, as it is ultimately what you’re writing the XSL for.

The Diff is done by XMLUnit but the thin layer I’ve written produces some reasonably useful output to the developer should the test fail.

Here is the very simple unit test which simply calls the XslTestHelper class, abstracting away any complexities to do with XSL and classes like TransformerFactory, StreamResult and DocumentBuilderFactory.

XslUtilsTest.groovy
import org.junit.Test

class XslUtilsTest {
    @Test
    void testCountryTransformWithXMLUnit() {
        //the stylesheet, the input XML and the expected output XML
        XslTestHelper.execute(
            XslUtils.loadXslFromFile("xsl/TransformCountries.xsl"),
            XslUtils.loadXmlFromFile("xml/countrylist.xml"),
            XslUtils.loadXmlFromFile("xml/countrylistExpected.xml")
        )
    }
}

Here is the XslTestHelper class which performs the heavy lifting.

XslTestHelper.groovy
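import org.custommonkey.xmlunit.DetailedDiff
import org.custommonkey.xmlunit.Diff
import org.custommonkey.xmlunit.Difference
import org.custommonkey.xmlunit.XMLUnit
import org.w3c.dom.Document
import javax.xml.transform.Source
import static org.junit.Assert.assertTrue

class XslTestHelper {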
    static void execute(Source xsl, String inputXml, String expectedXml) throws Exception {
        //important, makes the diffs ignore the whitespace between XML elements
        XMLUnit.setIgnoreWhitespace(true)

        String actualXml = XslUtils.transformXmlWithXsl(inputXml, xsl)
        Document expectedDocument = XMLUnit.buildDocument(XMLUnit.newControlParser(), new StringReader(expectedXml))
        Document actualDocument = XMLUnit.buildDocument(XMLUnit.newControlParser(), new StringReader(actualXml))

        Diff diff = new Diff(expectedDocument, actualDocument)
        DetailedDiff detailDiff = new DetailedDiff(diff)
        detailDiff.overrideElementQualifier(null)

        XslTestHelper.assertValid(detailDiff)
    }

    static void assertValid(DetailedDiff detailedDiff) {
        //use junit assertion to check the xsl output is the same as that expected
        assertTrue(
            XslTestHelper.printDifferences(detailedDiff.getAllDifferences()),
            (detailedDiff.getAllDifferences() != null
                && detailedDiff.getAllDifferences().size() == 0
                && detailedDiff.similar()
                && detailedDiff.identical()
            )
        )
    }

    private static String printDifferences(List<Difference> diffs) {
        //pull all diffs together to display nicely in junit assertion fail
        def sb = new StringBuilder()
        diffs.each { d ->
            sb << "${d}\n"
        }

        return sb.toString()
    }
}

Firstly, there is execute which is called by the unit test and calls out to another class XslUtils which does the actual transformation of the input XML with the XSL. The result of that is then put through XMLUnit to produce a DetailedDiff.

The assertValid method is then called which simply checks to see if the diff process has produced any tangible differences (i.e. ignoring white-space).

The last method, printDifferences is merely a display method to produce a somewhat nicely formatted list of the differences detected (if any).
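The XslUtils class itself isn’t shown in this post. As a rough idea of what its transformation step might look like, here is a sketch in plain Java using only the JDK’s JAXP classes. The class name XslTransformSketch, the String overload and the tiny sample stylesheet and input are assumptions made for illustration; the real XslUtils may well differ.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XslTransformSketch {
    // Cut-down, illustrative version of the country stylesheet: keep names starting with 'A'
    static final String SAMPLE_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/document'>"
        + "<Countries><xsl:for-each select='row'>"
        + "<xsl:if test=\"starts-with(Col1,'A')\">"
        + "<Country><Name><xsl:value-of select='Col1'/></Name></Country>"
        + "</xsl:if>"
        + "</xsl:for-each></Countries>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final String SAMPLE_INPUT =
        "<document><row><Col1>Australia</Col1></row><row><Col1>Belgium</Col1></row></document>";

    // Hypothetical stand-in for XslUtils.transformXmlWithXsl: apply the stylesheet, return the result as text
    static String transformXmlWithXsl(String inputXml, Source xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    // Convenience overload for stylesheets held in a String
    static String transformXmlWithXsl(String inputXml, String xslText) {
        return transformXmlWithXsl(inputXml, new StreamSource(new StringReader(xslText)));
    }

    public static void main(String[] args) {
        System.out.println(transformXmlWithXsl(SAMPLE_INPUT, SAMPLE_XSL));
    }
}
```

Returning the result as a String keeps the helper easy to feed into a diff, which is all the test layer needs.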

I’ve deliberately edited the input file to cause a test failure and produce a diff; in this case I’ve added an extra ‘s’ to the domain for Australia.

Output of Failed Unit Test

Tools

There are many ways you can edit XML & XSL and sometimes if you’re only tweaking something then just a text editor like Notepad++ will suffice. However, when you’re getting deep into something, or writing something from scratch, an XML editing tool can be a great help.

Some of the commercial options available include:

These are all reasonable choices, but my tool of choice has recently become IntelliJ IDEA, which is amazing. In my current role I have been writing a lot of XSLs lately, and I have both XmlSpy and IntelliJ installed on my machine. I have found, however, that I really don’t need XmlSpy at all, as IntelliJ provides all the code-completion functionality I need. It’s actually more useful, in the sense that IntelliJ is almost reading your mind when it comes to where you want the cursor after inserting code, whereas XmlSpy is a little less helpful in that it doesn’t actually complete the XML element for you.

One last advantage is that I can be surer (at least more so than with XmlSpy) that my XSL will produce the same results as when it is executed in the non-development environment. Variances tend to crop up from time to time between the different XSL processor implementations, so it is logical to develop in IntelliJ using the same JAXP Xalan library that the code will run against. Also worthy of mention is that my XSLs can live alongside all the other code I’m writing, without my having to jump out to another application.

Code

The code described in this post is available here. I’ve also gone down the path of using Gradle to manage the build of this project as I’ve been following it quite closely over the last year or so but have not really played with it up until now, so it’s something I’ll no doubt delve deeper into in the future. To execute the tests in the project, simply run gradlew test and it should sort out all the dependencies.

Summary

XSL can help you solve transformation problems in simpler ways than you might imagine. Give it a try … and then write unit tests for your stylesheets :)

Coming up in Part 2, I’ll delve into more fine grained unit tests.

Messaging With RabbitMQ

Over the last few weeks, since I finished reading RabbitMQ in Action, I have played about with RabbitMQ a little … and I like it!

The whole integration / middleware / messaging space is wide and varied, but for this post I’m just referring to messaging with RabbitMQ.

What is RabbitMQ?

RabbitMQ is a mature open source project (with commercial support) and has been around for quite a few years. Rabbit implements the AMQP protocol, and plugins add support for STOMP and MQTT, with some tinkering done to utilise the (perhaps) more pervasive JMS. The wide range of supported platforms and languages points to wide adoption.

Messaging is often at the core of an integration / middleware solution but can also be very useful for decoupling various modules of a system.

This post is just a brief intro to one particular form of messaging with RabbitMQ, using a “fanout” exchange, whereby messages are published to an exchange on the broker, which then broadcasts each message to every queue bound to that exchange (in our case, just one), regardless of the routing key.
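To picture the fanout behaviour before any broker is involved, here is a toy in-memory model in Java. It does not use the RabbitMQ client at all; the class FanoutSketch and the second queue name are made up purely to show that every bound queue receives a copy of each message, whatever routing key it was published with.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FanoutSketch {
    // queue name -> messages delivered to that queue so far
    private final Map<String, List<String>> boundQueues = new LinkedHashMap<>();

    // Binding a queue means it will receive a copy of everything published from now on
    void bind(String queueName) {
        boundQueues.putIfAbsent(queueName, new ArrayList<>());
    }

    // The routing key is accepted but deliberately ignored, as a fanout exchange does
    void publish(String routingKey, String message) {
        for (List<String> queue : boundQueues.values()) {
            queue.add(message);
        }
    }

    List<String> messagesIn(String queueName) {
        return boundQueues.getOrDefault(queueName, new ArrayList<>());
    }

    public static void main(String[] args) {
        FanoutSketch exchange = new FanoutSketch();
        exchange.bind("WeatherQueue");
        exchange.bind("DashboardQueue"); // hypothetical second consumer queue
        exchange.publish("weather", "24.1");
        exchange.publish("some.other.key", "25.3");
        System.out.println(exchange.messagesIn("WeatherQueue"));
        System.out.println(exchange.messagesIn("DashboardQueue"));
    }
}
```

Both queues end up with both messages, even though the second one was published with a different routing key.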

A similar and perhaps more powerful approach is Publisher/Subscriber (“PubSub”) using a “topic” exchange, where potentially many Subscribers “listen” on one or many topics for specific message types (as configured by the routing key in each Subscriber’s binding to the exchange). A scenario where multiple subscribers to one stream of data might be useful is HR data in a company: information about all new employees is published to a topic and various applications consume that data in their own ways. For example, an Incident Reporting System might keep track of all employees’ contact details, while a Payroll System picks up updated bank account information. In a future post I might go deeper into this.

There doesn’t have to be just one publisher either. You might have several systems feeding information into an exchange (weather station sensors perhaps), which is then subscribed to by many other systems.

RabbitMQ can be configured in many ways to suit your project needs, be they performance considerations, reliability, guaranteed delivery etc. I recommend getting RabbitMQ in Action to further explore these options.

Installing RabbitMQ

  • get the Erlang runtime from here and install
  • then, get the RabbitMQ server from here and install
  • if you’re using Groovy, you can use this @Grape/@Grab to sort out the RabbitMQ client dependencies:
@Grapes(
  @Grab(group='com.rabbitmq', module='amqp-client', version='3.0.1')
)

If you’re using Java, you will likely need to manually download the client library from here, unless you’re using Maven or equivalent.

Creating a Producer

I’ve written a pretty simple little groovy class that generates a payload of raw text from one of my earlier scripts, as described in my post here: Simple Data Collectors.

Essentially, it just calls the temperature script and publishes the returned temperature to “WeatherExchange”, which is configured as a “fanout” exchange. I’ve thrown in a random sleep (up to 30 secs) to mix it up a bit.

WeatherProducer.groovy
@Grapes(
  @Grab(group='com.rabbitmq', module='amqp-client', version='3.0.1')
)
import com.rabbitmq.client.*

class WeatherProducer {
  ConnectionFactory factory = null
  Connection conn = null
  String exchangeName = "WeatherExchange"
  String routingKey = "weather"
  String queueName = "WeatherQueue"
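  //the BOM station to poll; left as null, processBOM() would have no URL to read
  Expando bomSite = new Expando(source: "http://www.bom.gov.au/fwo/IDW60901/IDW60901.94608.json", name: "Perth")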
      
  def start() {
      Channel channel = getNewChannel()
      channel.exchangeDeclare(exchangeName, "fanout", true)
      channel.queueDeclare(queueName, true, false, false, null)
      def extractTemperatures = new ExtractTemperatures()

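      try {
          while (true) {
              //random pause of up to 30 seconds between readings
              int sleepTime = (int) (Math.random() * 30000)

              String myData = extractTemperatures.processBOM(bomSite)

              channel.basicPublish(exchangeName, routingKey, MessageProperties.PERSISTENT_TEXT_PLAIN, myData.getBytes())

              println "Sleeping for ${sleepTime}ms"
              sleep(sleepTime)
          }
      } finally {
          //the loop never ends normally, so clean up here if an exception breaks out of it
          channel.close()
          conn.close()
      }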
  }
  
  Channel getNewChannel() {
      if (factory == null || conn == null) {
          factory = new ConnectionFactory()
          factory.setUsername("guest")
          factory.setPassword("guest")
          factory.setVirtualHost("/")
          factory.setHost("localhost")
          factory.setPort(5672)
          conn = factory.newConnection()
      }
      
      return conn.createChannel()
  }
}

Weather Producer

Creating a Subscriber

WeatherConsumer.groovy
@Grapes(
  @Grab(group='com.rabbitmq', module='amqp-client', version='3.0.1')
)
import com.rabbitmq.client.*

class WeatherConsumer {
  ConnectionFactory factory = null
  Connection conn = null
  String exchangeName = "WeatherExchange"
  String routingKey = "weather"
  String queueName = "WeatherQueue"
  
  def execute() {
      Channel channel = getNewChannel()
      channel.queueBind(queueName, exchangeName, "#");
      println " Queue: ${queueName} "

      boolean noAck = false;
      def consumer = new QueueingConsumer(channel);
      channel.basicConsume(queueName, noAck, consumer);
      boolean running = true

      while(running) {
          try {
              QueueingConsumer.Delivery delivery = consumer.nextDelivery()
              println new String(delivery.body) + " - " + System.currentTimeMillis()
              //acknowledge only once a message has actually been received and handled
              channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false)
          } catch (InterruptedException ie) {
              //interrupted while waiting for a delivery, so stop consuming
              println ie.getMessage()
              running = false
          }
      }
  }
  
  Channel getNewChannel() {
      if (factory == null || conn == null) {
          factory = new ConnectionFactory()
          factory.setUsername("guest")
          factory.setPassword("guest")
          factory.setVirtualHost("/")
          factory.setHost("localhost")
          factory.setPort(5672)
          conn = factory.newConnection()
      }
      
      return conn.createChannel()
  }
}

Weather Consumer

Running It

I’ve written a very simple script file RunWeatherMessager.groovy that takes one argument and simply starts up a Producer or a Consumer with a ‘pub’ or ‘sub’. Like this:

c:\>groovy RunWeatherMessager.groovy pub

Just open two consoles and run a Producer and a Consumer.

RunWeatherMessager.groovy
//guard against a missing argument rather than failing on args[0]
def mode = args ? args[0] : ""

if (mode == "pub") {
  println "Starting Weather producer"
  def pub = new WeatherProducer()
  pub.start()
}
else if (mode == "sub") {
  println "Starting Weather consumer"
  def sub = new WeatherConsumer()
  sub.execute()
}
else {
  println "===ERROR==="
  println "Only valid options are 'pub' or 'sub'."
}

Code

The above code examples are here

Simple Data Collectors

The city that I live in, Perth, is sprawled out over a large distance and so the BOM’s temperature for Perth isn’t entirely accurate. Enter the Ag Dept’s temperature sensors here or more specifically, the one for Wanneroo here. These sensors are updated every minute (most of the time) and provide a more accurate measure of what the weather is doing right now. Rather than visiting the page manually, I wrote a script to grab the temperature for use elsewhere, such as on a dashboard.

I’ve previously written something similar that extracted the Perth value from the BOM which delightfully provide a JSON feed. So I rolled them in together and maybe I’ll use them both in the future, maybe to show how effective our afternoon sea-breeze is the closer one is to the coast :)

The BOM script

This script simply grabs the text of the given URL and parses it with JsonSlurper, which has shipped with Groovy since 1.8 (note the explicit import).

ExtractTemperatures.groovy
import groovy.json.*

class ExtractTemperatures {
  def processBOM(Expando site) {
      //create a URL and get the text from it
      def content = new URL(site.source).text

      //create a new JsonSlurper and parse the json data
      def jsonSlurper = new JsonSlurper()
      def jsonData = jsonSlurper.parseText(content)
      
      //extract just the entry we want
      def latestEntry = jsonData.observations.data[0]
      
      println "\n==== BOM ${site.name} ===="
      println "airTemp = ${latestEntry.air_temp}"
      println "windSpeed = ${latestEntry.wind_spd_kmh}"
      println "windDirection = ${latestEntry.wind_dir}"
      
      return latestEntry.air_temp
  }
  
  //snip...
}

The BOM provide JSON feeds for most of their data. There are links at the bottom of each station-specific page for most of the BOM station sites; see Perth’s here.

The Ag. Dept. script

The WA Ag. Dept. have many stations but I’m only really interested in the Wanneroo one as that is quite close to my location. The script simply grabs the text from the specified page and then through 6 specific regex queries the Temperature, Wind Speed and Wind Direction are extracted.

ExtractTemperatures.groovy
import groovy.json.*

class ExtractTemperatures {
  //snip...
  
  def processAgDeptWithRegex(Expando site) {
      //generic URL values
      String agDeptPrefix = "http://agspsrv34.agric.wa.gov.au/climate/livedata/"
      String agDeptsuffix = "webpag.htm"
      
      //retrieve web page text for the particular site
      String sourceText = new URL(agDeptPrefix + site.source + agDeptsuffix).text
      
      //extract all lines with the "Yellow" formatting, then just get the first one (which we know is the correct one) for Temperature.
      String firstYellowRow = sourceText.findAll(/<td width="68"><font face="Courier New"[\s]?color="yellow">[\s]?(?:[0-9]+\.[0-9]?)[<\/font>]?<\/td>/)[0]
      String currentTemp = firstYellowRow.findAll(/[0-9]+\.[0-9]/)[0] //just get the "decimal" element
      
      //extract all lines with the "Yellow" formatting, then just get the sixth one (which we know is the correct one) for Wind Speed.
      String windSpeedElement = sourceText.findAll(/<td width="68"><font face="Courier New"[\s]?color="yellow">[\s]?(?:[0-9]+\.[0-9]?)[<\/font>]?<\/td>/)[5]
      String currentWindSpeed = windSpeedElement.findAll(/[0-9]+\.[0-9]/)[0] //just get the "decimal" element
      
      //extract all lines with the "Yellow" formatting, then just get the one with ENSW variations in it for Wind Direction.
      String windDirectionElement = sourceText.findAll(/<td width="68"><font face="Courier New"[\s]?color="yellow">[\s]?(?:[ENSW]+)[\s]?[<\/font>]?<\/td>/)[0]
      String currentWindDirection = windDirectionElement.findAll(/(?!=ow">[\s]?)[ENSW]+[\s]?(?=<\/)/)[0] //just get the element value
      
      println "\n==== Ag. Dept. ${site.name} ===="
      println "airTemp = ${currentTemp}"
      println "windSpeed = ${currentWindSpeed}"
      println "windDirection = ${currentWindDirection}"

      return currentTemp
  }
}

Regex is like black magic at times, and writing this script was aided greatly by RegexBuddy, well worth the small purchase price of $40ish. No doubt those huge regexes that extract the exact line in the sourceText could be a little tighter (they’re long mainly because they match exact strings), but they work and are not noticeably slow, so I’m happy.

Some regex tips from the above code:

  • [\s]? optionally match a single white-space character
  • [0-9]+ match one or more digits
  • [0-9]? optionally match a single digit, i.e. zero or one (zero to many would be [0-9]*)
  • (?!=ow">[\s]?) is meant to anchor the start of the match just after the text ow"> (as in color=”yellow”), with an optional space. Strictly, (?!=ow">…) is a negative lookahead for the literal text =ow">, which never blocks a match here; the construct that really does this anchoring is a positive lookbehind, (?<=ow">[\s]?). The wrapping ( and ) are the important bits here.
  • [ENSW]+ match one or more of the characters ‘E’, ‘N’, ‘S’ or ‘W’, allowing for combinations like SSW or SW or just E.
  • (?=<\/) positive lookahead, ending the match where the text </ follows (as in the closing “td” tag for the data line). As with the construct above, the wrapping ( and ) are important here.
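The tips above can be tried out in isolation. This is a small sketch in Java against two made-up table cells (my guess at the general shape of a data row, not text copied from the Ag. Dept. page). It runs the pattern from the script next to a lookbehind variant so the difference is visible; the class and constant names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTipsSketch {
    // Made-up sample cells in the general shape the script expects
    static final String TEMP_CELL =
        "<td width=\"68\"><font face=\"Courier New\" color=\"yellow\"> 22.5</td>";
    static final String WIND_CELL =
        "<td width=\"68\"><font face=\"Courier New\" color=\"yellow\"> SSW </td>";

    // The pattern as used in the script: the leading (?!=...) group does not anchor anything
    static final String ORIGINAL = "(?!=ow\">[\\s]?)[ENSW]+[\\s]?(?=</)";

    // Variant with a positive lookbehind, so the match must start right after ow"> and an optional space
    static final String LOOKBEHIND = "(?<=ow\">[\\s]?)[ENSW]+(?=[\\s]?</)";

    // Returns the first match of the regex in the text, or null if there is none
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // one or more digits, a dot, then a single digit
        System.out.println("[" + firstMatch("[0-9]+\\.[0-9]", TEMP_CELL) + "]");
        // the original pattern keeps the trailing space inside the match
        System.out.println("[" + firstMatch(ORIGINAL, WIND_CELL) + "]");
        // the lookbehind variant leaves the surrounding spaces out of the match
        System.out.println("[" + firstMatch(LOOKBEHIND, WIND_CELL) + "]");
    }
}
```

The square brackets in the output are only there to make a trailing space visible.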

Bringing it all together

Finally, I’ve written a little script to pull these two functions together, dropped into a thread with an infinite loop that sleeps for 60 seconds after each run, so that we keep on getting this data rather than just once.

RunWeatherCollectors.groovy
g = new ExtractTemperatures()

def th = Thread.start {
  while (true) {
      println "\n\n ${new Date().toString()}"
      
      def bomPerth = new Expando(source: "http://www.bom.gov.au/fwo/IDW60901/IDW60901.94608.json", name: "Perth")
      g.processBOM(bomPerth)
  
      def agWanneroo = new Expando(source: "wn", name: "Wanneroo")
      g.processAgDeptWithRegex(agWanneroo)
      
      sleep 60000
  }
}

Code

The above code examples are here