The Curious Dev

Various programming sidetracks and shiny-object detours

Words From Numbers

Recently I discovered that a local site, westernrails.com, provides a weekly update on some train movements in the Perth / WA area, in the form of an e-mag (PDF). (I’m a bit of a train geek, but that’s a different post.)

Unfortunately the site doesn’t offer a feed, so as a bit of an exercise I decided to write a little downloader script that would grab the whole list of back issues. For some reason the site editor has named the first 100 PDF files with the numbers written as words, e.g. 20 => twenty.

This “problem” could certainly be tackled a couple of other ways. Parsing the HTML to get all the links was one option, but I went with the words-from-numbers option. One failing of this approach is that it doesn’t cater for misspelt numbers (tweleve).
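For comparison, the HTML-parsing route could look something like the minimal sketch below. It assumes the issue links appear as plain href="...pdf" attributes on a single index page. I haven’t checked the site’s actual markup, so the regular expression and the findPdfLinks name are illustrative only.

```groovy
// Hypothetical helper: collect every href ending in .pdf from a page's HTML.
// Assumes links are written as href="..." or href='...'; a real HTML parser
// would be more robust than a regular expression.
List<String> findPdfLinks(String html) {
    def matcher = html =~ /href\s*=\s*["']([^"']+\.pdf)["']/
    // With a capture group, each match is [fullMatch, group1]; keep the URL
    matcher.collect { it[1] }.unique()
}

// Usage, assuming all the links are on one page:
// def links = findPdfLinks(new URL('http://www.westernrails.com/').text)
```

The catch with this route is that relative links would still need the site address prepended before downloading.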

Numbers to Words

So here’s a little library to return a number as a word:

class NumberUtils {

    // Returns the English word(s) for 1 to 100, joining the parts of a
    // compound number with an underscore (e.g. 27 => "twenty_seven").
    // Returns null for anything outside that range.
    static String generateWordFromNumber(Integer nbr) {
        def ones = [1:"one", 2:"two", 3:"three", 4:"four", 5:"five", 6:"six", 7:"seven", 8:"eight", 9:"nine"]
        def teens = [11:"eleven", 12:"twelve", 13:"thirteen", 14:"fourteen", 15:"fifteen", 16:"sixteen", 17:"seventeen", 18:"eighteen", 19:"nineteen"]
        def tens = [10:"ten", 20:"twenty", 30:"thirty", 40:"forty", 50:"fifty", 60:"sixty", 70:"seventy", 80:"eighty", 90:"ninety", 100:"one hundred"]

        if (nbr in ones) {
            return ones[nbr]
        }
        else if (nbr in tens) {
            return tens[nbr]
        }
        else if (nbr in teens) {
            return teens[nbr]
        }
        else if (nbr > 20 && nbr < 100) {
            // Split into the tens part (27 => 20) and the ones part (27 => 7)
            Integer tensKey = nbr.intdiv(10) * 10
            Integer onesKey = nbr % 10
            return tens[tensKey] + "_" + ones[onesKey]
        }

        // 0, negatives and anything above 100 end up here
        println "ERROR: ${nbr} => numbers outside 1-100 not handled ... yet"
        return null
    }
}

This lets you simply call NumberUtils.generateWordFromNumber(27) from within your own class; for this example, 27 comes back as twenty_seven. There is probably some Groovy syntactical magic that I can’t think of right now to cut this down.
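Taking a stab at that magic, one possible shorter version (a sketch; wordFor is a made-up name and this isn’t the code in the repo) swaps the three maps for two lists and uses integer division to split the number:

```groovy
// Hypothetical compact alternative to NumberUtils.generateWordFromNumber.
// Handles 1..100 and joins compound numbers with "_", like the original.
String wordFor(int n) {
    def small = ['', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
                 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
                 'seventeen', 'eighteen', 'nineteen']
    def tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']

    if (n < 1 || n > 100) return null      // outside the range this sketch covers
    if (n == 100) return 'one hundred'
    if (n < 20) return small[n]            // 1..19 are direct lookups
    // 20..99: the tens word, plus "_<ones>" unless the ones digit is zero
    // (findAll() drops the empty string that small[0] gives for 20, 30, ...)
    [tens[n.intdiv(10)], small[n % 10]].findAll().join('_')
}
```

So wordFor(27) gives twenty_seven and wordFor(40) gives forty, matching the longer version.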

Downloads

The other thing I’ve pushed into its own class, because I might find it useful in the future, is the part that downloads the file and saves it to disk:

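class DownloadUtils {

    // Downloads the resource at 'address' and saves it as outputDir + filename.
    // outputDir is joined to filename as-is, so it needs a trailing slash.
    static def downloadFile(address, outputDir, filename) {
        println "Downloading '${address}' ... "

        def file = new FileOutputStream(outputDir + filename)
        def out = new BufferedOutputStream(file)
        try {
            // Groovy's << copies everything from the InputStream into the OutputStream
            out << new URL(address).openStream()
        } finally {
            // Closing the buffered stream flushes it and closes the file, even on error
            out.close()
        }

        println "Done. Saved to ${outputDir}${filename}"
    }
}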

This class has one method which accepts 3 parameters:

  • the source URL
  • the destination directory
  • the name to give the file

We simply create a FileOutputStream from the destination parameters and wrap it in a BufferedOutputStream. Then comes the Groovy magic:

out << new URL(address).openStream()

This one line opens a stream to the URL and copies everything it reads into the destination file.
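One thing to be aware of: as far as I know, Groovy’s << copies and flushes the data but closes neither stream, and downloadFile only closes the output side, so the URL’s input stream is left open. A variant built on Groovy’s withOutputStream and withInputStream helpers, which close their streams even if the copy fails, might look like this (downloadTo is a hypothetical name):

```groovy
// Hypothetical variant of DownloadUtils.downloadFile that closes both streams.
// new File(dir, name) also removes the need for a trailing slash on the directory.
File downloadTo(String address, String outputDir, String filename) {
    def target = new File(outputDir, filename)
    target.withOutputStream { out ->
        new URL(address).withInputStream { input ->
            out << input    // copy everything from the URL into the file
        }
    }
    target
}
```

Because it only relies on URL, it also works with file: URLs, which is handy for trying it out without hitting the site.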

Putting It Into Action

So with the above library classes we can now bring it together and download our PDFs.

This script is written to accept 3 parameters:

  • Destination directory
  • Starting Number
  • Ending Number
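All three parameters are required and the last two must be whole numbers. The script reads them straight from args, so a missing or non-numeric value ends in an exception. A guard along these lines could give a friendlier message instead (a sketch only; parseArgs is a hypothetical name, and treating 1 as the lowest valid issue is my assumption):

```groovy
// Hypothetical guard for BulkDownloader.groovy's three arguments.
// Returns [outputDir, issueMin, issueMax], or null after printing a message.
List parseArgs(String[] argv) {
    if (argv.length != 3 || !argv[1].isInteger() || !argv[2].isInteger()) {
        println 'Usage: groovy BulkDownloader.groovy <outputDir> <lowestIssue> <highestIssue>'
        return null
    }
    int min = argv[1] as int
    int max = argv[2] as int
    if (min < 1 || min > max) {    // assumes issues are numbered from 1
        println "Invalid issue range ${min}..${max}"
        return null
    }
    [argv[0], min, max]
}
```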

Here’s the main script:

import NumberUtils
import DownloadUtils

/*
A little script to download a bulk number of files (PDFs in this case) with a very similar naming pattern.

Usage:
c:\>groovy BulkDownloader.groovy c:/Temp/ 1 194

Where:
param1 = path to save to (must end with a slash, as it is joined directly to the file name)
param2 = lowest issue number to retrieve
param3 = highest issue number to retrieve
*/

def g = new Downloader()
g.outputDir = args[0]
g.issueMin = Integer.parseInt(args[1])
g.issueMax = Integer.parseInt(args[2])
g.process()

class Downloader {
    def site = 'http://www.westernrails.com/'
    def prefix = 'West_Aust_railscene_e-Mag_issue_'
    def suffix = '.pdf'
    def outputDir = ''
    Integer issueMin = -1
    Integer issueMax = -1

    def process() {
        def fileList = []

        // Issues 1 to 100 use the number written as words, e.g. ..._issue_twenty_seven.pdf
        if (issueMin <= 100) {
            for (i in issueMin..Math.min(issueMax, 100)) {
                def wordNbr = NumberUtils.generateWordFromNumber(i)
                if (wordNbr) {
                    // "one hundred" => "one_hundred"
                    wordNbr = wordNbr.replace(" ", "_")
                    fileList.add("${site}${prefix}${wordNbr}${suffix}")
                }
            }
        }

        // Issues from 101 onwards use digits, e.g. ..._issue_number_101.pdf
        // (guarded, because a Groovy range like 101..50 would count backwards)
        if (issueMax > 100) {
            for (i in Math.max(issueMin, 101)..issueMax) {
                fileList.add("${site}${prefix}number_${i}${suffix}")
            }
        }

        fileList.each { srcUrl ->
            // Save each file under the last path segment of its URL
            def outputFile = srcUrl.tokenize("/")[-1]
            DownloadUtils.downloadFile(srcUrl, outputDir, outputFile)
        }
    }
}
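The naming rules are the fiddly part of process(), so they could also be pulled out into a function that only builds the file names, which would make them easy to unit test. A sketch of that idea (issueFileNames is a hypothetical name, and toWords stands in for NumberUtils.generateWordFromNumber so the snippet runs on its own):

```groovy
// Hypothetical: build the issue file names for min..max without downloading.
// toWords stands in for NumberUtils.generateWordFromNumber (null when unhandled).
List<String> issueFileNames(int min, int max, Closure<String> toWords) {
    def prefix = 'West_Aust_railscene_e-Mag_issue_'
    (min..max).collect { i ->
        // 1..100 use words (spaces become underscores); 101+ use number_<n>
        def name = i <= 100 ? toWords(i)?.replace(' ', '_') : "number_${i}"
        name ? "${prefix}${name}.pdf".toString() : null
    }.findAll()    // drop numbers that toWords couldn't convert
}
```

In the real script the closure would simply be { NumberUtils.generateWordFromNumber(it) }, and the site address gets prepended when downloading.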

Future Enhancements

So this script is certainly functional (I’ve now got all the issues of the e-mag in my Dropbox), but some day I’ll probably expand it to be more generic, so it can download from any location.

Another option is to schedule the script to run periodically, downloading new editions of the e-mag as they come out (and keeping track of which issue it’s up to).
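For the “keeping track” part, one simple approach (my assumption, not something the script does today) is to store the last downloaded issue number in a small text file next to the downloads. The file name last_issue.txt and both helper names are hypothetical:

```groovy
// Hypothetical helpers for remembering the newest issue already downloaded.
// The number is stored as plain text in <outputDir>/last_issue.txt.
int readLastIssue(String outputDir) {
    def marker = new File(outputDir, 'last_issue.txt')
    marker.exists() ? marker.text.trim() as int : 0    // 0 = nothing downloaded yet
}

void saveLastIssue(String outputDir, int issue) {
    new File(outputDir, 'last_issue.txt').text = issue.toString()
}

// A scheduled run could then set issueMin = readLastIssue(outputDir) + 1 and
// call saveLastIssue(outputDir, issueMax) once every download has succeeded.
```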

Code

I’m still thinking about how I’ll structure my code for these little projects, but for now, these classes will just be dumped in a repo: https://bitbucket.org/sbennettmcleish/bulkdownloader.

I’m thinking of creating a generic repo for all these tools to be packaged logically with a Gradle build script and maybe even some unit tests :)

Update: all code is now here
