SBS NextBus: Behind the Scenes

I must admit that I am slightly surprised by the positive reception to the SBS NextBus app that I announced in my previous post. I thought it would languish in obscurity like most of the other toys I’ve created over the years, but a positive recommendation from Mr Brown turned on a firehose of traffic that, after almost a week now, still hasn’t died down.

The attention also reminded me of how old this blog is now. When I started it, Sounds from the Dungeon sounded like a cool name and even had some meaning to boot. Now I am old and jaded and it grates on my nerves whenever someone writes something like, Deepak of Sounds from the Dungeon says... It’s high time I streamlined the whole antrix.net experience: clearing out all the archaic and disconnected stuff around here, reorganizing the content into a cohesive whole and retiring the dungeon metaphor in the process.

Anyway, getting back to topic, I do hope that the traffic to the SBS NextBus app converts to a steady userbase; there’s nothing like your users screaming at you to provide motivation to maintain the app!

Speaking of maintaining the app, I committed a couple of feature additions to SBS NextBus last night and with those, I think the app is basically done. The only updates it is likely to see are when the parsers break because SBS Transit decides to change something upstream. I hope such situations will be rare!

With the app basically done, I figured it would be worthwhile to describe here some of the technical bits involved in building the app. If you are the kind that enjoys such geekery, read on :-)

As is obvious from the domain - sbsnextbus.appspot.com - the app runs on Google App Engine. The basic application is fairly straightforward, but there are a couple of twists in its implementation worth noting. First of all, the core logic behind the app is dead simple. When a user requests bus arrival info at a stop, the app must:

  1. Fetch the bus stop’s page from the mobile iris site and parse it for the services operating at that stop.
  2. For each service, fetch the page from iris that shows its arrival info at that stop.
  3. Collate the results and render the output page.
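To make the three steps concrete, here is an illustrative sketch. The helper names and regexes below are made up, and the canned strings stand in for pages that, in the real app, would come from urlfetch calls against the mobile iris site:

```python
import re

def parse_services(stop_page_html):
    # step 1: pull the service numbers out of the stop page
    return re.findall(r'service=(\d+)', stop_page_html)

def parse_timings(service_page_html):
    # step 2: pull the arrival estimate out of a per-service page
    match = re.search(r'(\d+)\s*min', service_page_html)
    return match.group(1) + ' min' if match else 'n/a'

def collate(stop_page_html, service_pages):
    # step 3: collate the results into one structure for the output page
    return [(svc, parse_timings(service_pages[svc]))
            for svc in parse_services(stop_page_html)]

# canned pages standing in for what iris would serve
stop_page = '<a href="?service=21">21</a> <a href="?service=65">65</a>'
pages = {'21': 'Next bus: 4 min', '65': 'Next bus: 12 min'}
print(collate(stop_page, pages))  # → [('21', '4 min'), ('65', '12 min')]
```

The real parsers are fussier, of course, since they have to deal with whatever html iris actually emits.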

I had the basic code structure and a rough-cut implementation done in about an hour, most of which was spent writing the html parsing bits, which I based on the html the mobile iris site served to Firefox. This was a mistake. When I actually started fetching pages from App Engine’s development server, the parsers broke down because iris was serving different html to those requests! One quick fix would have been to emulate Firefox’s User-Agent in the requests to iris. Unfortunately, App Engine’s urlfetch currently doesn’t provide any way of overriding the user agent identifier, so I basically had to rewrite the parsing bits to handle the different html. It didn’t take much time but still counted as wasted effort.
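The lesson is to parse defensively. For illustration only (the markup below is invented, not iris’s actual html), one extractor that tolerates more than one shape of markup beats assuming the view one particular browser gets:

```python
import re

# Two made-up markup variants standing in for what a site might serve
# to a desktop browser versus to App Engine's default urlfetch client.
desktop_html = '<td class="svc"><a href="/svc?no=196">196</a></td>'
urlfetch_html = '<span>Service 196</span>'

def extract_service(html):
    # try the richer markup first, then fall back to the plainer variant
    for pattern in (r'no=(\d+)', r'Service\s+(\d+)'):
        match = re.search(pattern, html)
        if match:
            return match.group(1)
    return None

print(extract_service(desktop_html))   # → 196
print(extract_service(urlfetch_html))  # → 196
```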

Once the basic app was ready, I had to tackle what I knew going in would be a problem. For those of you who are not aware of its design, App Engine does not allow long-running requests: each HTTP request to your app must be served within a short time. If you exceed the interval at your disposal, your app process is killed and the user’s request terminated. Although Google does not document the exact duration, going by App Engine’s monitoring dashboard, the limit is around 8 seconds, i.e. every HTTP request to your app has to be handled and a response sent back within 8 seconds. Since App Engine also forbids spawning threads or processes, you are essentially limited to less than 8 seconds of sequential processing from request to response. For the NextBus app, if it needs to query the iris site for 10 bus numbers at a particular bus stop, all those requests happen sequentially and there’s no telling how long each remote request will take to complete. It is thus quite likely that the 8-second limit is exceeded and the app process killed for taking too long. The fact that App Engine’s urlfetch call does not support specifying request timeouts just exacerbates the problem.

Thankfully, the App Engine environment doesn’t just summarily kill your app on exceeding the allotted time quota. Instead, your app receives a DeadlineExceededError exception and is given a few more cycles to finish cleaning up before it is killed for good. We can exploit this last chance at redemption in a manner similar to this pseudocode:

try:
    for service in services:
        timings = memcache.get(service)
        if timings is None:
            # fetch page from iris, parse the html and extract timings
            timings = fetch_and_parse(service)
            # add to memcache so a retried request can skip this service
            memcache.set(service, timings, time=60)
        # add timings to http response
        results.append(timings)
    # return http response
except google.appengine.runtime.DeadlineExceededError:
    # send as response a HTTP redirect to ourselves:
    # 'http://domain/stop/?number=[stop number]&random=[random int]'

So what’s happening here? The core idea is that if our application process receives the DeadlineExceededError exception, we save our state and redirect the user to make the same request one more time. The subsequent request continues from where the last one left off and, hopefully, finishes the processing within its own time budget. If not, rinse and repeat!
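The random integer in the redirect URL above is just a cache-buster, so that the browser (or any cache in between) actually re-issues the request instead of replaying a cached response. A minimal sketch, with a made-up stop number:

```python
import random

def retry_url(stop_number):
    # redirect back to ourselves; the random parameter keeps browsers and
    # intermediate caches from serving a stale copy of the retried request
    return '/stop/?number=%s&random=%d' % (stop_number, random.randint(0, 999999))

print(retry_url('83139'))
```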

State communication between requests can be done via URL request parameters, via the Datastore, or via memcache. For this app, I chose the memcache approach. Once a bus service’s arrival timing is known, it is immediately saved to memcache with an expiry time of 1 minute, since after a minute the arrival timings would be practically useless.
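Outside App Engine there’s no google.appengine.api.memcache to import, so here is a toy stand-in, purely to illustrate the set-with-expiry semantics the app relies on (the real API is memcache.set(key, value, time=60)):

```python
import time

# A toy stand-in for App Engine's memcache; real code would use
# google.appengine.api.memcache instead.
class ExpiringCache(object):
    def __init__(self):
        self._store = {}

    def set(self, key, value, time_secs):
        # remember the value along with its expiry deadline
        self._store[key] = (value, time.time() + time_secs)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            # stale entry: evict and pretend it was never there
            del self._store[key]
            return None
        return value

cache = ExpiringCache()
cache.set('svc-21', '4 min', 60)  # arrival timing, cached for one minute
print(cache.get('svc-21'))        # → 4 min
```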

Thus, this trick of redirecting to self and spreading the processing load across multiple requests lets us effectively work around the 8-second limitation imposed by App Engine. However, there’s one final issue to be resolved: the class of the exception thrown seems to vary with the environment. After a bit of googling and testing, I ended up with something like this:

try:
    from google.appengine.runtime import DeadlineExceededError
except ImportError:
    # on the development server
    from google.appengine.runtime.apiproxy_errors import DeadlineExceededError

try:
    # this was noticed in the logs after deployment
    from google3.apphosting.runtime import DeadlineExceededError as DeadlineExceededError2
except ImportError:
    DeadlineExceededError2 = DeadlineExceededError

# some app code

try:
    # app logic
except (DeadlineExceededError, DeadlineExceededError2):
    # serve redirect

Hopefully, the inconsistency in the exception class resulting in the boilerplate above will be fixed in a future App Engine update.

That wraps up all that I remember as worth writing about the SBS NextBus app. But before I end, a few words on the general experience of working with App Engine. After oohEmbed.com, this is the second app I’ve built that runs on App Engine. For these kinds of apps with bare-bones functionality, using App Engine is a no-brainer simply because the deployment is so damn easy! For the NextBus app, after I had everything working on the local development server, I just had to go online and register the app id (sbsnextbus), upload the code and visit sbsnextbus.appspot.com. The code just worked and I went from development to deployment in under a minute! The only code change I had to make subsequently was the DeadlineExceededError import dance noted above. Compare this to configuring Python scripts to be served under cgi, fcgi or mod_python in traditional hosting environments, where getting the deployment right often takes as long as developing the entire app.

From these two app-building experiences, I can confidently say that if you are writing simple Python web apps, you should strongly consider the App Engine environment. However, this is a very qualified recommendation since I haven’t yet used the Datastore and have no idea how easy or difficult it would be to adapt to that data model.