nosewheelie

Technology, mountain biking, politics & music.

Archive for December, 2007

Processing large datasets in real time

with one comment

I’ve just written up a summary of our latest project on the Workingmouse wiki:

I’ve recently had the good fortune to complete a project at Veitch Lister Consulting (VLC), a transport planning consultancy, processing large datasets in real time.

As a consultancy we usually work in corporate environments and as such are bound by the architectural constraints of the organisation. These typically include the usual “enterprise” constraints such as language (Java 1.4 is common), application servers (usually WebLogic or WebSphere) and databases (Oracle or DB2). However, this was one of those rare projects where you have no technical constraints and have the good fortune to be working with great people.

Going in, we had three basic requirements: 1) it must be callable from a Rails-based web app, 2) it must return responses to typical requests to the web app in real time (30 seconds was our target), and 3) it must query against a dataset that was initially estimated at around 45 TB but later came down to around 100 GB. As we were contracted for a finite period, we also needed to make sure we had trained up the existing three developers in whatever tools we chose. I won’t be talking about the process we used; suffice to say it was XP-like, supported by tools such as my BDD framework Instinct and web-based agile project management tools.

Source: Processing Large Datasets in Real Time.

Written by Tom Adams

December 20th, 2007 at 10:30 am

Posted in HPC

The large scale computing space is getting interesting

without comments

Via Gruber via Winer comes Amazon’s SimpleDB:

Amazon SimpleDB is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers.

Based on the limited architecture information you can glean from the API docs, it looks like it may be a column store under the hood. I’ve recently come off a project where we used both Amazon’s EC2 and S3; if this thing is as good as those, it will be a nice addition to the arsenal. Of course it means not having your data close, and possibly not having as much fine-grained control over indexing etc., but for most people this won’t be an issue (indeed, in Kowari/Mulgara we indexed everything, like SDB does).
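
To make the “index everything” point concrete, here’s a toy Java sketch of the column-store-ish idea the API docs hint at. This is an illustration only, not the SimpleDB API: each item is a bag of attribute/value pairs, and every attribute gets its own value-to-items index, so any attribute can be queried without declaring indexes up front.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Illustration only: not Amazon's API, just the "index every attribute" idea
    // (much like what we did in Kowari/Mulgara).
    final class ToyAttributeStore {
        // attribute -> (value -> item ids)
        private final Map<String, Map<String, Set<String>>> indexes =
                new HashMap<String, Map<String, Set<String>>>();

        void put(final String itemId, final String attribute, final String value) {
            Map<String, Set<String>> valueIndex = indexes.get(attribute);
            if (valueIndex == null) {
                valueIndex = new HashMap<String, Set<String>>();
                indexes.put(attribute, valueIndex);
            }
            Set<String> itemIds = valueIndex.get(value);
            if (itemIds == null) {
                itemIds = new HashSet<String>();
                valueIndex.put(value, itemIds);
            }
            itemIds.add(itemId);
        }

        // Which items have this value for this attribute? Answered straight off the index.
        Set<String> query(final String attribute, final String value) {
            final Map<String, Set<String>> valueIndex = indexes.get(attribute);
            if (valueIndex == null || !valueIndex.containsKey(value)) {
                return Collections.<String>emptySet();
            }
            return Collections.unmodifiableSet(valueIndex.get(value));
        }
    }

The trade-off is the one mentioned above: you get queries over any attribute for free, but you give up fine-grained control over what gets indexed and how.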

Semantic web developers have known for years that there is no such thing as the universal database any more (if there ever was); it all depends on what your data looks like and how you access it. It’s good to see the mainstream is starting to catch up.

Written by Tom Adams

December 17th, 2007 at 8:36 am

Posted in HPC

Instinct 0.1.6 Release

without comments

I’m happy to announce the release of Instinct 0.1.6. Thanks to all our new users and especially to the guys at VLC & SAP who’ve helped us apply Instinct in anger, and Sanjiv, who’s come on board with development.

Downloads are available from the project site.

This release includes a raft of updates, most notably auto-creation of specification doubles (mocks, stubs & dummies), automatic reset and verification of mocks, a cleanup of the state-based expectation API, fixes for Eclipse JUnit integration, custom classpath support in the Ant task and inherited contexts.
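
To give a feel for what these changes look like in a context class, here’s a small sketch. The isEqualTo() naming and the @BeforeSpecification annotation come straight from the notes below; the @Specification and @Mock annotations, the expect entry point and the package paths are written from memory, so treat them (and the ShoppingCart/PricingService domain types) as assumptions rather than verbatim 0.1.6 API.

    // NOTE: Instinct package paths and some annotation names here are from memory
    // and may not match 0.1.6 exactly; ShoppingCart and PricingService are made-up
    // domain types for the example.
    import static com.googlecode.instinct.expect.Expect.expect;

    import com.googlecode.instinct.marker.annotate.BeforeSpecification;
    import com.googlecode.instinct.marker.annotate.Mock;
    import com.googlecode.instinct.marker.annotate.Specification;

    // No @Context annotation is needed any more in this release.
    public final class AnEmptyShoppingCart {
        @Mock private PricingService pricingService; // specification double, auto-created and auto-verified
        private ShoppingCart cart;

        @BeforeSpecification
        void createCart() {
            cart = new ShoppingCart(pricingService);
        }

        @Specification
        void hasATotalOfZero() {
            // state-based expectation in the new natural-language style
            expect.that(cart.total()).isEqualTo(0);
        }
    }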

Here’s the full list of updates:

Core

  • Remove the need for @Context annotation.
  • Automatic creation of specification doubles: mocks, stubs and dummies.
  • Automatic reset and verification of mocks.
  • @BeforeSpecification, @AfterSpecification, @Specifications (and naming convention equivalents) can be used across base and subclasses.

Expectation API

  • Make expectations more like natural language, e.g. isEqualTo(), doesNotEqual(), etc. Existing code using equalTo(), etc. will need to be updated.
  • Collection checkers: hasTheSameContentAs(Collection) and hasTheSameContentAs(E…). These only check content and not the order of elements.
  • Ensure all “collection” classes (Array, Map, Set, List, String, CharSequence) have similar size checkers available.
  • Added a file checker.
  • Better error messages for hasBeanProperty and hasBeanPropertyWithValue.

JUnit integration

  • Fix Eclipse unrooted context.

Ant integration

  • Support for custom classpath.
  • Quiet specification result formatting (only shows errors and pending specs).
  • Use correct project logging level for errors, etc.

jMock integration

  • Support states: Mockery.states(String); see the jMock sketch after this list.

Infrastructure

  • Removed reliance on Boost, transferred all relevant Boost classes locally.
  • jMock 2.4.
  • Downgraded to CGLib 2.1.3 (for Maven integration).

Bugs

  • Miscellaneous NullPointerExceptions and null-related problems in the state expectation API.
  • (defect-3) IterableChecker should have a containsOnly method or something.
  • (defect-8) @BeforeSpecification does not run if implemented in an abstract base class.
  • (defect-20) Eclipse JUnit 4 InstinctRunner shows tests under the “Unrooted Tests” node.
  • (defect-22) Context tree view shows base class and subclass when only the subclass is run.
  • (defect-23) Overridden specifications run twice.
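
On the jMock integration item above, “states” refers to jMock 2’s States feature for constraining when expectations may occur. The snippet below is plain jMock 2 usage of Mockery.states(String) (Channel is a made-up collaborator for the example); how Instinct exposes the Mockery to a context isn’t shown here.

    import org.jmock.Expectations;
    import org.jmock.Mockery;
    import org.jmock.States;

    // A made-up collaborator to mock.
    interface Channel {
        void open();
        void send(String message);
    }

    final class StatesExample {
        public static void main(final String[] args) {
            final Mockery mockery = new Mockery();
            final Channel channel = mockery.mock(Channel.class);
            final States connection = mockery.states("connection").startsAs("closed");

            mockery.checking(new Expectations() {{
                one(channel).open();
                then(connection.is("open"));
                one(channel).send("ping");
                when(connection.is("open")); // sending is only legal once the channel is open
            }});

            channel.open();
            channel.send("ping");
            mockery.assertIsSatisfied();
        }
    }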

Written by Tom Adams

December 14th, 2007 at 4:57 pm

Posted in Agile, BDD, Instinct, Java, TDD

Barry Schwartz: The paradox of choice

without comments

Following on a little from yesterday’s post on stuff, here’s a video about how having too much choice may not be a good thing.

Psychologist Barry Schwartz takes aim at a central belief of western societies: that freedom of choice leads to personal happiness. In Schwartz’s estimation, all that choice is making us miserable. We set unreasonably high expectations, question our choices before we even make them, and blame our failures entirely on ourselves. His relatable examples, from consumer products (jeans, TVs, salad dressings) to lifestyle choices (where to live, what job to take, whom and when to marry), underscore this central point: Too many choices undermine happiness.

Source: The paradox of choice.

Written by Tom Adams

December 12th, 2007 at 11:21 am

What is the Story of Stuff?

without comments

From its extraction through sale, use and disposal, all the stuff in our lives affects communities at home and abroad, yet most of this is hidden from view. The Story of Stuff is a 20-minute, fast-paced, fact-filled look at the underside of our production and consumption patterns. The Story of Stuff exposes the connections between a huge number of environmental and social issues, and calls us together to create a more sustainable and just world. It’ll teach you something, it’ll make you laugh, and it just may change the way you look at all the stuff in your life forever.

Source: Story of stuff.

Written by Tom Adams

December 11th, 2007 at 1:03 pm