One of the many podcasts I listen to on my daily commute is the Thrasher’s Wheat Radio Hour, which offers a little insight into the huge and varied catalog of Neil Young.
I’ve been a fan for many years now and when I discovered this show I immediately wanted to subscribe to the podcast feed in my favourite “podcatcher” Miro. But this was to no avail … I could not find the feed link! The most recent show is merely added to an existing blog post as an update on the original page: http://neilyoungnews.thrasherswheat.org/2012/08/podcast-thrashers-wheat-radio-hour-show.html.
This wouldn’t do, so I decided I could roll my own feed of the show.
After initially writing a reasonably useful regex to parse the blog post for the shows and produce a feed, I then discovered an even easier way. The mp3 files for the show are simply placed in the http://thrasherswheat.org/twradio/ directory and upon browsing there one gets a simple listing of the files.
Adapting my regex to use this was straightforward.
(?<=href=")TWR-Episode([0-9][0-9]?\.mp3|-[0-9][0-9]?-on-[0-9][0-9]?-[0-9][0-9]?-[0-9][0-9]\.(mp3|wav))(?=">\s)
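To see what a pattern like this picks out, here is a small sketch run against a made-up directory listing. The sample markup is an assumption on my part (the real page may be laid out differently), and the pattern used here anchors on href=" with a lookbehind and escapes the dot before the extension.

```groovy
// Hypothetical listing markup, loosely modelled on a server-generated index page.
def sampleListing = '''\
<li><a href="TWR-Episode7.mp3"> TWR-Episode7.mp3</a></li>
<li><a href="TWR-Episode-12-on-03-04-13.mp3"> TWR-Episode-12-on-03-04-13.mp3</a></li>
<li><a href="TWR-Episode-13-on-03-11-13.wav"> TWR-Episode-13-on-03-11-13.wav</a></li>
<li><a href="notes.txt"> notes.txt</a></li>
'''

// Match only filenames that sit directly inside an href attribute.
def episodePattern = /(?<=href=")TWR-Episode([0-9][0-9]?\.mp3|-[0-9][0-9]?-on-[0-9][0-9]?-[0-9][0-9]?-[0-9][0-9]\.(mp3|wav))(?=">\s)/

// findAll returns every full match as a String, in document order.
def matches = sampleListing.findAll(episodePattern)
matches.each { println it }
```

Because the lookbehind and lookahead only assert context, each match is the bare filename, so it can be appended straight onto the directory URL.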
Next, I needed a way of serving up the feed and the logical solution for me was Grails on Heroku.
To be fair, using Grails to produce a single XML feed is serious overkill: the produced WAR is nearly 90MB. A lighter solution might be Ratpack, but I’ll delve into that another day. Grails has been my favourite web framework for the last few years, and rather than optimising for a small WAR file, I am attracted to the quick development cycle on offer.
The basic structure of the app is thus:
- A FeedFetcherService service that screen scrapes the directory and returns an ordered list of URLs for each show
- A view at feedFetcher/index.gsp that serves up the XML to the requesting application
- A FeedFetcherController controller that calls the above service and forwards the data to the view
Note that there are no domain classes. At this point there’s no need to store anything, so the feed is simply built anew upon every request.
Service
class FeedFetcherService {

    //private String endpointUrl = "http://neilyoungnews.thrasherswheat.org/2012/08/podcast-thrashers-wheat-radio-hour-show.html"
    String endpointUrl = "http://thrasherswheat.org/twradio/"

    def checkSource() {
        println "using URL: ${endpointUrl}"
        //retrieve web page text
        String sourceText = new URL(endpointUrl).text
        //extract all links to episodes with regex
        def podcasts = sourceText.findAll(/(?<=href=")TWR-Episode([0-9][0-9]?\.mp3|-[0-9][0-9]?-on-[0-9][0-9]?-[0-9][0-9]?-[0-9][0-9]\.(mp3|wav))(?=">\s)/)
        return processList(podcasts)
    }

    def processList(def podcasts) {
        //build a map of episode number -> filename
        def downloadList = [:]
        podcasts.each { p ->
            //the episode number is the digits straight after "Episode" or "Episode-"
            String pNbr = p.findAll(/(?<=Episode-?)([0-9][0-9]?)(?=(-on-|\.mp3))/)[0]
            println "filename: ${endpointUrl}${p}"
            //fail fast if the number could not be extracted
            assert pNbr?.isInteger()
            downloadList[pNbr] = p
        }
        //sort in reverse episode nbr order (numerically, so 10 comes before 9)
        return downloadList.sort { p1, p2 -> p2.key.toInteger() <=> p1.key.toInteger() }
    }
}
So in the above code, the checkSource method grabs the text content returned from http://thrasherswheat.org/twradio/ and runs the regex on it to produce a list of strings that correspond to the episodes.
The processList method is then called to collect the filenames into a map keyed by episode number and then sort the map in reverse episode-number order, so the newest show comes first.
Those two methods probably represent 75% of the effort.
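The keying and sorting step can be sketched on its own, away from Grails. The filenames below are invented for illustration, and I use a simple capture group here rather than the lookaround pattern from the service.

```groovy
// Hypothetical filenames in both naming styles.
def files = ['TWR-Episode7.mp3',
             'TWR-Episode-12-on-03-04-13.mp3',
             'TWR-Episode-9-on-02-11-13.mp3']

// Key each filename by its episode number, parsed as an Integer.
def byEpisode = files.collectEntries { name ->
    def matcher = name =~ /Episode-?([0-9]{1,2})/
    assert matcher.find() : "no episode number in ${name}"
    [(matcher.group(1).toInteger()): name]
}

// Sort the entries numerically, highest episode first.
def newestFirst = byEpisode.sort { a, b -> b.key <=> a.key }
newestFirst.each { nbr, name -> println "${nbr}: ${name}" }
```

Comparing the keys as numbers rather than strings matters: as strings, "9" would sort above "12".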
Controller
import java.text.SimpleDateFormat

class FeedFetcherController {

    def feedFetcherService

    def index() {
        def eps = feedFetcherService.checkSource()
        List<?> epList = []
        eps.each {
            def ep = new Expando()
            ep.key = it.key
            ep.value = it.value
            ep.url = "http://thrasherswheat.org/twradio/" + it.value
            //strip the 4-character file extension (.mp3 or .wav)
            ep.description = it.value.toString().substring(0, it.value.toString().size()-4)
            epList << ep
        }
        [episodes: epList, timeNow: getTimeNow()]
    }

    private String getTimeNow() {
        //RFC 822 style date; fixed locale so day and month names are always English
        SimpleDateFormat sdf = new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss z", Locale.ENGLISH)
        def cal = GregorianCalendar.getInstance()
        return sdf.format(cal.time)
    }
}
The controller calls the service and then builds a list of Expando objects (use ’em!), one per episode. This list is then handed to the view along with the current time.
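For anyone who hasn't met Expando, here is a minimal standalone sketch of building one episode object and formatting a feed timestamp. The filename is made up, and the fixed locale, GMT zone and numeric offset (Z) are my own choices for a repeatable example rather than anything the feed requires.

```groovy
import java.text.SimpleDateFormat

// An Expando accepts initial properties via the map constructor
// and new ones by plain assignment afterwards.
def ep = new Expando(key: '12', value: 'TWR-Episode-12-on-03-04-13.mp3')
ep.url = 'http://thrasherswheat.org/twradio/' + ep.value
// Drop the file extension to get a rough description.
ep.description = ep.value.replaceFirst('\\.(mp3|wav)$', '')

// RFC 822 style timestamp with English names and a numeric zone offset.
def fmt = new SimpleDateFormat('EEE, d MMM yyyy HH:mm:ss Z', Locale.ENGLISH)
fmt.timeZone = TimeZone.getTimeZone('GMT')
def stamp = fmt.format(new Date(0L))   // the Unix epoch, used as a fixed sample instant

println "${ep.description} -> ${ep.url} (${stamp})"
```

Pinning the locale avoids day and month names changing with the server's default language.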
View
<%@ page contentType="text/xml;charset=UTF-8" %><?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Thrasher's Wheat Radio</title>
    <description>The TWR podcast</description>
    <link>http://neilyoungnews.thrasherswheat.org/2012/08/podcast-thrashers-wheat-radio-hour-show.html</link>
    <language>en-us</language>
    <copyright>Copyright 2013</copyright>
    <lastBuildDate>${timeNow}</lastBuildDate>
    <pubDate>${timeNow}</pubDate>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <webMaster>scott@dands.ws</webMaster>
    <g:each in="${episodes}" var="ep">
    <item>
      <title>${ep.value}</title>
      <link>http://neilyoungnews.thrasherswheat.org/2012/08/podcast-thrashers-wheat-radio-hour-show.html</link>
      <guid isPermaLink="false">${ep.key}</guid>
      <description>${ep.description}</description>
      <enclosure url="${ep.url}" length="1" type="audio/mpeg"/>
      <category>Podcasts</category>
      <pubDate>${timeNow}</pubDate>
    </item>
    </g:each>
  </channel>
</rss>
In the interest of speed, I decided to just hack this up: I grabbed the XML from another podcast, stripped out all unnecessary data and hardcoded as much as possible. The view simply iterates over the provided list of Expandos to produce the item elements.
In the case of Miro at least, I believe all that is needed is for the GUID value to be unique for each episode; it can then track what it already has, what is new, etc.
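As an aside, the same item elements could also be produced outside a GSP. The sketch below swaps in Groovy's MarkupBuilder, which is not what the app uses, and the episode data is invented. It marks each guid with isPermaLink="false", since in RSS 2.0 a guid is otherwise assumed to be a URL.

```groovy
import groovy.xml.MarkupBuilder

// Hypothetical episode data, keyed as in the service's map.
def episodes = [[key: '12', value: 'TWR-Episode-12-on-03-04-13.mp3'],
                [key: '7',  value: 'TWR-Episode7.mp3']]
def baseUrl = 'http://thrasherswheat.org/twradio/'

def writer = new StringWriter()
new MarkupBuilder(writer).rss(version: '2.0') {
    channel {
        title("Thrasher's Wheat Radio")
        episodes.each { ep ->
            item {
                title(ep.value)
                // The episode number is unique per show, so it serves as the guid.
                guid(isPermaLink: 'false', ep.key)
                enclosure(url: baseUrl + ep.value, length: '1', type: 'audio/mpeg')
            }
        }
    }
}
def xml = writer.toString()
println xml
```

Whichever way the XML is generated, the guid just has to stay stable and unique per episode from one request to the next.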
Heroku Deployment
I won’t go into detail about the Heroku deployment as it is well documented here and it’s really much like any other Heroku app in that when you’re ready to deploy you:
- create a Heroku app: heroku create
- create a Procfile
- commit it to the git repo and then push: git push heroku master
And that’s it. Probably less than two hours development time with a little bit of tweakage here and there.
The feed is available here:
murmuring-depths-8428.herokuapp.com/feedFetcher/index
Code
As the code is in a Heroku Git repo, I don’t have a link to share, but maybe I’ll throw it up on GitHub if someone is interested.