nosewheelie

Technology, mountain biking, politics & music.

Processing large datasets in real time

I’ve just written up a summary of our latest project on the Workingmouse wiki:

I recently had the good fortune to complete a project at Veitch Lister Consulting (VLC), a transport planning consultancy, processing large datasets in real time.

As a consultancy we usually work in corporate environments and as such are bound by the architectural constraints of the organisation. These typically include the familiar “enterprise” constraints such as language (Java 1.4 is common), application servers (WebLogic or WebSphere) and databases (Oracle or DB2). However, this was one of those rare projects where we had no technical constraints and the good fortune to be working with great people.

Going in, we had three basic requirements: 1) it must be callable from a Rails-based web app, 2) it must return responses to typical requests to the web app in real time (30 seconds was our target) and 3) it must query against a dataset that was initially estimated at around 45 TB, but later came down to around 100 GB. As we were contracted for a finite period, we also needed to make sure we had trained the three existing developers in whatever tools we chose. I won’t be talking about the process we used; suffice to say it was XP-like, supported by tools such as my BDD framework Instinct and web-based agile project management tools.

Source: Processing Large Datasets in Real Time.

Written by Tom Adams

December 20th, 2007 at 10:30 am

Posted in HPC

One Response to 'Processing large datasets in real time'

  1. Hey Tom,

    Great article, as I said before. It might be good to mention that we profiled with YourKit and got the biggest performance boost by removing redundant objects and optimizing hashCode(). So simple, yet so good for you! :)
    Also, premature optimization is totally unnecessary. None of the things we thought would really hold us back (such as using text files rather than binary files) did in reality. Profiling with realistic data showed us exactly what should be optimized.

    sanj

    20 Dec 07 at 8:39 pm
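
To make the hashCode() point in the comment above concrete, here is a minimal sketch in Java of one common way to cheapen hashing for an immutable map key: compute the hash once in the constructor and return the stored value. The class name `ZonePair` and its fields are hypothetical and chosen purely for illustration; the original post does not say which classes were optimized or how, so treat this as one possible approach rather than what the project actually did.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable map key; the name and fields are illustrative only.
final class ZonePair {
    private final String origin;
    private final String destination;
    private final int hash; // computed once; safe because the fields never change

    ZonePair(String origin, String destination) {
        this.origin = Objects.requireNonNull(origin);
        this.destination = Objects.requireNonNull(destination);
        this.hash = 31 * origin.hashCode() + destination.hashCode();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ZonePair)) return false;
        ZonePair that = (ZonePair) other;
        // Compare the cheap cached hash first, then the fields.
        return hash == that.hash
            && origin.equals(that.origin)
            && destination.equals(that.destination);
    }

    @Override
    public int hashCode() {
        return hash; // no recomputation and no allocation per lookup
    }
}

class HashCodeSketch {
    public static void main(String[] args) {
        Map<ZonePair, Integer> trips = new HashMap<>();
        trips.put(new ZonePair("A", "B"), 42);
        // A distinct but equal key finds the same entry.
        System.out.println(trips.get(new ZonePair("A", "B"))); // prints 42
    }
}
```

Caching only works because the key is immutable; if a field could change after the key was placed in a map, the stored hash would go stale. As the comment says, whether this is worth doing at all is something to confirm with a profiler on realistic data.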
