nosewheelie

Technology, mountain biking, politics & music.

The dreaded 0xE800003A

with one comment

We’ve recently been seeing a lot of errors in new projects when trying to deploy the application onto a real iPhone (i.e. not the simulator) using Xcode, but not hitting the same problems with ad hoc releases. This is the dreaded ApplicationVerificationFailed 0xE800003A error. Run in horror! Here is a good explanation of the error and one potential fix.

There’s plenty of advice out there on how to fix this, ranging from restarting Xcode to simply ensuring the app ID in your Info.plist bundle identifier matches the one in your profile (on the dev portal). I’d been through all of these with no luck (as had another of our developers).

Though fairly different, they all boil down to one thing: follow a meticulous set of steps and ensure everything is neat and consistent. So what do you do if that still doesn’t work? Here’s what worked for us; it may or may not work for you. Good luck.

Here’s the problem in a nutshell: development provisioning profiles are generated with a get-task-allow of true (open one in a text editor and check), but if you are including an entitlements file in your Debug & Release configurations (remember consistency) you will have set it to false, as per the Apple documentation.
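For reference, here’s roughly what the key looks like inside an Entitlements.plist (a minimal sketch; a real file will typically carry other keys too):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- false here conflicts with the true in a development provisioning profile -->
    <key>get-task-allow</key>
    <false/>
</dict>
</plist>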

Here’s the solution: remove the entitlements file from your Debug & Release configurations (it’s the “Code Signing Entitlements” section in your project or target properties).

That’s it. All good. Ship it!

So why were ad hoc builds working for us? The setting for get-task-allow in the entitlements file must match what is in the provisioning profile. This was true for our ad hoc profiles: both the entitlements and the profile were set to false. If you include the entitlements in a Debug configuration and change get-task-allow to true, you should now be able to deploy on device using a development profile (remember, these have get-task-allow set to true). Of course you don’t want to do this, as you need it set to false for ad hoc and App Store builds. So the easiest way to resolve it is to remove the entitlements from the Debug & Release configurations.

Written by Tom Adams

May 22nd, 2009 at 11:09 am

Pip Thorley Memorial Ride 2009

with 2 comments

Early in the morning on the 3rd May, about 75 cyclists gathered outside the pool in Beaudesert to embark on the 20th annual Beaudesert to Casino ride (held every Labour Day long weekend). Organised by the Beaudesert BUG, this year’s (and last year’s) ride was dedicated to the memory of former BUG member and the instigator of the first ride, Pip Thorley, who had died in a car accident a few years ago. We were lucky enough to have Pip’s two sons along for the ride, and his wife gave a moving speech at the post-ride dinner that night.

The route

The route starts at Beaudesert and heads out along the Mt Lindesay Highway through Laravale, Rathdowney and Palen Creek before starting the climb up Mt Lindesay. At the top of the climb, it turns left down the Summerland Way through Kyogle and on to Casino.


The ride

The riders were broken into three groups: the first (slowest) group left at 6am, our group (average speed) left at 6.45am and the last group (fastest riders) left at about 7.30am. We left in a pack of about 20 riders and headed out towards Rathdowney (~35 km) where we had our first break. We then re-grouped and headed out through Palen Creek and started the 15 km climb around Mt Lindesay. This was a beautiful climb through rainforest and calling bellbirds. The pack broke into a couple of groups up the mountain, and I ended up with locals Stuart & Pete as we descended the Summerland Way towards morning tea (~75 km).

Replenished, I headed out at the front of the pack with a few of the fitter riders. We inadvertently ended up dropping the rest of the bunch, averaging around 40 km/h for the next 10 or 20 km. Catching up to the breakaway pair of Pete & Stuart, I dropped off the back of the gun group and coasted into lunch at Rukenvale Primary School (~110 km) with the old boys.

A quick lunch later and we were back on the bikes for the final 50 km or so through Kyogle and on to Casino. This was the hardest part of the ride: very flat with long open stretches and a strong headwind. Pete, Stuart & I ended up together again, and we slipstreamed our way into Casino around 2.25 pm, among the first 5 or so in our riding group.

All up we rode 157.53 km in 6:29:13 on the bike. We averaged 24.2 km/h and reached a top speed down Mt Lindesay of 56.8 km/h.

All in all a great ride, highly recommended if you can swing an invite!

Pip Thorley Memorial Ride 2009

Written by Tom Adams

May 17th, 2009 at 10:40 am

Posted in Cycling


Simplifying JSON Parsing Using FunctionalKit

without comments

Introduction

At MoGeneration we write a lot of iPhone clients that integrate with back-end web services. Thankfully, most of these expose their data as JSON, which is quite easy to parse into a corresponding NSDictionary using json-framework. However, dealing with errors and non-existent values, and then turning the parsed NSDictionary into domain model instances, can be tricky. It often involves writing a lot of repetitive code and manual error handling. We can, however, do a lot better.

This post will show you the basics of improving the way JSON is parsed using some simple techniques from FunctionalKit. There’s a lot of in-depth explanation here; look to the code samples if you want a quick summary.

JSON

So to begin with, here’s the JSON. It comes from a current project for the guys over at Perkler. This is the result of looking up the current user’s likes (stuff they’re interested in knowing about):

{
  "meta":{
    "action":"getLikes",
    "output":"json",
    "search":false,
    "location": {
      "geo_id":"1999",
      "geo_location":"Victoria, Australia",
      "geo_latitude":"-36.558800",
      "geo_longitude":"145.468994",
      "geo_altitude":"0.000000",
      "geo_country_code":"AU",
      "geo_administrative_area":"Victoria",
      "geo_locality":"",
      "geo_thoroughfare":"",
      "geo_postalcode":"",
      "geo_accuracy":"2",
      "score":"0"
    }
  },
  "likes":["bikes","coffee","girls","haskell"]
}

For this post we’ll ignore the metadata; we’re mainly concerned with the likes array.

Ground Rules

Let’s begin by setting up some ground rules; there’s a bunch of error conditions we need to handle:

  • We don’t know if the results are valid JSON; our parser returns nil on error;
  • The likes array may not be there;
  • The likes array may be nil;
  • The likes array may be empty.

There are other issues such as the underlying HTTP transport failing, but we’re a little further along the chain here, so we’re not worrying about that (though, these techniques work just as well there).

Baseline

Back in the bad old days we’d do a bunch of nil checks and only proceed if things weren’t nil. We’d also return nil, because people in Objective-C like that sort of thing.

Here’s the top level code that takes a string, parses it and invokes our parser to turn it into an array of PLLikes.

- (NSArray *)getLikes:(NSString *)jsonEncodedResults {
    NSDictionary *jsonDecodedResults = [jsonEncodedResults JSONValue];
    return [likesParser parseGetLikesResults:jsonDecodedResults];
}

And here’s our parser implementation, with its myriad nil checks; we’re letting it do all the checking (including handling the nil that JSONValue may return).

@implementation GetLikesParser
 
- (NSArray *)parseGetLikesResults:(NSDictionary *)results {
    if (results == nil) {
        return nil;
    } else {
        NSArray *likes = [results objectForKey:@"likes"];
        if (likes == nil) {
            return nil;
        } else {
            NSMutableArray *convertedlikes =
                [NSMutableArray arrayWithCapacity:[likes count]];
            for (NSString *like in likes) {
                [convertedlikes addObject:[PLLike value:like]];
            }
            return convertedlikes;
        }
    }
}
 
@end

Isn’t that nice!

Introducing Option

We’ve established our baseline; now, what’s wrong with this and how can we make it better?

To start with, we have a bunch of nil checks, and the code doing the work is buried under a bunch of layers, obscuring what it actually does. Let’s start to make it better by introducing Option. Option is a way of denoting that we either have a value or we don’t; in other words, an optional value. If we have a value we say we have a Some with the value in it, and if we have nothing, we have a None. Simple hey? These are represented in FunctionalKit using FKOption.

Let’s rewrite the above example to use Option. To make it easy to follow, we’ll do a direct translation.

@implementation GetLikesParser
 
- (FKOption *)parseGetLikesResults:(NSDictionary *)results {
    FKOption *maybeResults = [FKOption fromNil:results];
    if (maybeResults.isNone) {
        return [FKOption none];
    } else {
        FKOption *maybeLikes = [FKOption fromNil:[maybeResults.some objectForKey:@"likes"]];
        if (maybeLikes.isNone) {
            return [FKOption none];
        } else {
            NSMutableArray *convertedlikes = 
                  [NSMutableArray arrayWithCapacity:[maybeLikes.some count]];
            for (NSString *like in maybeLikes.some) {
                [convertedlikes addObject:[PLLike value:like]];
            }
            return [FKOption some:convertedlikes];
        }
    }
}
 
@end

Some commentary on the above example:

To create an option from a potentially nil value, we use the constructor fromNil:. If the value is nil we’ll get a None, otherwise we’ll get a Some containing the value.

We prefix the variable name with “maybe”; this is not required, it’s just something that I like to do, as it denotes that it “may be” the thing we want, or it may not be.

To pull the value out of an optional value, we call the some property (or message it if you like). To construct a Some with a value we know is non-nil, we call the some: constructor.
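As a quick hypothetical taster (the values are made up, the API is FKOption’s as used throughout this post):

id possiblyNil = nil;                             // stand-in for a value that may not be present
FKOption *maybe = [FKOption fromNil:possiblyNil]; // a None here, a Some otherwise
FKOption *definitely = [FKOption some:@"bikes"];  // we know @"bikes" is non-nil
FKOption *nothing = [FKOption none];              // explicitly no value
NSString *like = definitely.some;                 // pulling the value back out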

We’re also returning an optional value, so the code in our top level method needs to handle this too; a sketch of that follows below. Notice also that as Objective-C doesn’t support parametric polymorphism, we’ve lost some degree of compiler safety: we no longer know at compile time what the FKOption holds. It’s really an FKOption[NSArray[PLLike]], but we can’t enforce that.
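Here’s a minimal sketch (assumed, not code from our real app) of what that top level method might now look like, using only the FKOption API shown above and falling back to an empty array when we have a None:

- (NSArray *)getLikes:(NSString *)jsonEncodedResults {
    NSDictionary *jsonDecodedResults = [jsonEncodedResults JSONValue];
    FKOption *maybeLikes = [likesParser parseGetLikesResults:jsonDecodedResults];
    // A None means something was nil along the way; degrade to an empty array.
    return maybeLikes.isNone ? [NSArray array] : maybeLikes.some;
}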

But is this really any better? As we’ve done a literal translation, we still have a bunch of checks that really aren’t much better than what we had. They’re nil checks in a different form.

Don’t fret, we can do better.

Staying inside the Option

What all these nil check equivalents (option.isNone) are really doing is just making sure we only continue executing while we have a non-nil value, or a Some in this case. What we want is: if we have a None, return that; if we have a Some, do something with the value inside it. We can apply this rule at each level of our checking.

At this point, we’ll also pull the function out that does the work. This will make it simpler to figure out the actual core of our problem (that our checks were obscuring) as well as giving us a nice hook to create a function from.

Let’s have a crack at that code.

@implementation GetLikesParser
 
- (FKOption *)parseGetLikesResults:(NSDictionary *)results {
    FKOption *maybeResults = [FKOption fromNil:results];
    FKOption *maybeLikes = [maybeResults bind:functionTS(self, pullOutLikes:)];
    return [maybeLikes map:functionTS(self, parseLikes:)];
}
 
- (FKOption *)pullOutLikes:(NSDictionary *)results {
    return [FKOption fromNil:[results objectForKey:@"likes"]];
}
 
- (NSArray *)parseLikes:(NSArray *)likes {
    NSMutableArray *convertedlikes = [NSMutableArray arrayWithCapacity:[likes count]];
    for (NSString *like in likes) {
        [convertedlikes addObject:[PLLike value:like]];
    }
    return convertedlikes;
}
 
@end

How it all works

Before we discuss what this code actually does, let’s have a look at it more closely. Nowhere have we actually explicitly pulled the value out of the Option. We’ve left it inside the Option; it takes care of safely unpacking the value (if it’s there) and providing it to our functions!

So what is this magic map: function doing? If you’ve used Ruby or a functional language, you’ve probably used it before. Cocoa has a similar concept in NSArray’s -(void)makeObjectsPerformSelector:(SEL)aSelector. The important thing to note is that Option is a container class for other values, just like an array is. In fact, you can think of Option as an array that will contain either zero or one element, but no more.

Mapping across a container class is the same as iterating over it using a conventional for loop, with the benefit that the container class takes care of the iteration; as calling code we’re never exposed to it.

Here’s how map: works. Each time around the loop, the element at that point (at that index if you like) is pulled out of the container and provided to a function. The function transforms the element from its current value into another one. Take as an example an array of numbers: we could map over this array, turning each number into a string. The result of mapping across this array of numbers is an array of strings. Notice also that because we’re just iterating over the container, if the container is empty we never start the iteration, so we never invoke the function (this is important!).

To abstract this a little, you start with a container of some type, and end up with a container of another type. An important concept to note is that the container type never changes; start with an array, end up with an array.

If you’re up for a little bit more of an interlude, the function that map: takes has a type as follows: f :: a ➝ b. This means that it will take something of type a and return something of type b, where a and b can be any types, for example a function that converts a number to a string. So using our abstraction from above, if we had a container of type C (say an NSArray) that contained as (say NSNumbers) and a function that turned as into bs (NSNumbers into NSStrings), then we can get a C containing bs. Looking at the types again, map: looks like this: map :: C a ➝ (a ➝ b) ➝ C b. Phew…
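To make the types concrete, here’s a hypothetical example using the map: extension FunctionalKit adds to NSArray (we use it on our likes below); stringFromNumber: is an assumed helper, not part of any framework:

// An a ➝ b function: NSNumber ➝ NSString.
- (NSString *)stringFromNumber:(NSNumber *)number {
    return [number stringValue];
}

// map :: C a ➝ (a ➝ b) ➝ C b, where C is NSArray.
NSArray *numbers = [NSArray arrayWithObjects:
    [NSNumber numberWithInt:1], [NSNumber numberWithInt:2], nil];
NSArray *strings = [numbers map:functionTS(self, stringFromNumber:)]; // @"1", @"2"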

Let’s get back to our Option again.

If we consider Option to be a container class (which it is), similar to an array, then if we map over an empty Option – a None – all we get back is an empty Option – a None. However, if we map over a non-empty Option – a Some – we get back a non-empty Option – a Some. This is the magic behind how our code can deal with the presence or absence of nils; if we have nothing, the functions never get called, and we get back another (empty) Option.
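Or, reusing the hypothetical stringFromNumber: helper from the array example above:

FKOption *none = [FKOption fromNil:nil];                                // a None
FKOption *some = [FKOption fromNil:[NSNumber numberWithInt:42]];        // a Some
FKOption *mappedNone = [none map:functionTS(self, stringFromNumber:)];  // still a None; the helper is never called
FKOption *mappedSome = [some map:functionTS(self, stringFromNumber:)];  // a Some containing @"42"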

This style of mapping can be chained together for as many options as we like: if we don’t have anything, nothing happens; if we do, we process it and keep going. If at any point in the chain we have a nil, the chain effectively stops processing. However, we only write one set of code to do this, so we’re basically pretending that errors don’t happen, but if they do, they’re handled effectively.

You will also have noticed the other magic function we’re using: bind:. I’ve not talked about it in detail as it is very similar to map:, the only difference being that whereas map: takes a function from a ➝ b, bind: takes a function from a ➝ C b and produces a flattened C b, that is: bind :: C a ➝ (a ➝ C b) ➝ C b. This allows us to safely handle a potential nil coming out of an NSDictionary (for the @"likes" key) and still chain together our processing.
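To see why that flattening matters, contrast the two on pullOutLikes: from above, which itself returns an FKOption (a hypothetical side-by-side, not code from the parser):

FKOption *maybeResults = [FKOption fromNil:results];
// map: would re-wrap the already-optional result: a Some containing another FKOption.
FKOption *nested = [maybeResults map:functionTS(self, pullOutLikes:)];
// bind: flattens as it goes: a plain FKOption holding the likes array.
FKOption *flat = [maybeResults bind:functionTS(self, pullOutLikes:)];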

Back to the code

So now we've had that nice little chat, what is the code doing?

We have three basic blocks of work going on: 1) mapping over options, 2) pulling out likes from an NSDictionary, and 3) turning string likes into PLLikes. We've pulled these last two blocks of work out into methods by themselves. We might normally do this to clean up the code when refactoring, but in this case it allows us to call the methods as the functions passed to a map:. Remember the interlude above?

FunctionalKit provides a number of ways to create functions. functionTS is a macro that constructs a new FKFunction from a target object and a selector (there are long-hand, non-macro ways to do this also). Sending a selector to a target object is fairly routine in Objective-C; all FKFunction does is wrap the target and selector up into a convenient package and allow us to pass it into map:. Once Objective-C gets closures, the need to pull out these methods will go away, eliminating the need to use these macros in maps, binds, etc.

Simplifying

Now that we know what mapping is, that last function is crying out to be a map! Let's see this in action.

@implementation GetLikesParser
 
- (FKOption *)parseGetLikesResults:(NSDictionary *)results {
    FKOption *maybeResults = [FKOption fromNil:results];
    FKOption *maybeLikes = [maybeResults bind:functionTS(self, pullOutLikes:)];
    return [maybeLikes map:functionTS(self, parseLikes:)];
}
 
- (FKOption *)pullOutLikes:(NSDictionary *)results {
    return [FKOption fromNil:[results objectForKey:@"likes"]];
}
 
- (NSArray *)parseLikes:(NSArray *)likes {
    return [likes map:functionTS(self, parseLike:)];
}
 
- (PLLike *)parseLike:(NSString *)like {
    return [PLLike value:like];
}
 
@end

Woah... that's lots of little functions doing not much (wouldn't you love a closure?). Luckily we can go even further: we can use the function macros on class methods, allowing us to remove that last function.

@implementation GetLikesParser
 
- (FKOption *)parseGetLikesResults:(NSDictionary *)results {
    FKOption *maybeResults = [FKOption fromNil:results];
    FKOption *maybeLikes = [maybeResults bind:functionTS(self, pullOutLikes:)];
    return [maybeLikes map:functionTS(self, parseLikes:)];
}
 
- (FKOption *)pullOutLikes:(NSDictionary *)results {
    return [FKOption fromNil:[results objectForKey:@"likes"]];
}
 
- (NSArray *)parseLikes:(NSArray *)likes {
    return [likes map:functionTS([PLLike classAsId], value:)];
}
 
@end

Looking better, but there's another function we can remove; all pullOutLikes: does is provide a nil-safe accessor to our dictionary. FunctionalKit provides a nil-safe extension on NSDictionary: - (FKOption *)maybeObjectForKey:(id)key. Let's try that, shall we?

@implementation GetLikesParser
 
- (FKOption *)parseGetLikesResults:(NSDictionary *)results {
    FKOption *maybeResults = [FKOption fromNil:results];
    FKOption *maybeLikes = [maybeResults bind:functionTS(self, pullOutLikes:)];
    return [maybeLikes map:functionTS(self, parseLikes:)];
}
 
- (FKOption *)pullOutLikes:(NSDictionary *)results {
    return [results maybeObjectForKey:@"likes"];
}
 
- (NSArray *)parseLikes:(NSArray *)likes {
    return [likes map:functionTS([PLLike classAsId], value:)];
}
 
@end

OK, that's a little nicer, but we've still got a function that does not much at all. Let's try another macro, functionSA, which creates a function from a selector and its argument. In our example we can pass it directly to our option's bind: method.

@implementation GetLikesParser
 
- (FKOption *)parseGetLikesResults:(NSDictionary *)results {
    FKOption *maybeResults = [FKOption fromNil:results];
    FKOption *maybeLikes = [maybeResults bind:functionSA(maybeObjectForKey:, @"likes")];
    return [maybeLikes map:functionTS(self, parseLikes:)];
}
 
- (NSArray *)parseLikes:(NSArray *)likes {
    return [likes map:functionTS([PLLike classAsId], value:)];
}
 
@end

All right, we're getting close.

Some heavy lifting

So our code is now nil-safe, but it's still a little verbose for our liking. Can we remove any more of these little functions we've created? It turns out we can.

We already have a function to turn an NSString representation of a like into a PLLike: functionTS([PLLike classAsId], value:). We see it in use when mapping over our array of likes. Wouldn't it be nice to have a way to turn that function on an individual like into a function on an array of likes? Thankfully such a thing exists. The process of taking a function on a single element of a container class and applying it to the container class itself is called lifting a function.

Let's rewrite the example using a lift to turn our parsing function on a single like into a parsing function on an array of likes.

@implementation GetLikesParser
 
- (FKOption *)parseGetLikesResults:(NSDictionary *)results {
    FKOption *maybeResults = [FKOption fromNil:results];
    FKOption *maybeLikes = [maybeResults bind:functionSA(maybeObjectForKey:, @"likes")];
    return [maybeLikes map:[NSArray liftFunction:functionTS([PLLike classAsId], value:)]];
}
 
@end

We've now been able to remove yet another little function clouding up our code. Objective-C noise notwithstanding, our code is closer to the core of what we're trying to achieve: if we have results, pull the likes out; if we have likes, turn each one into a PLLike.

Let's go one step further and inline the maybeLikes variable.

@implementation GetLikesParser
 
- (FKOption *)parseGetLikesResults:(NSDictionary *)results {
    FKOption *maybeLikes = [[FKOption fromNil:results] bind:functionSA(maybeObjectForKey:, @"likes")];
    return [maybeLikes map:[NSArray liftFunction:functionTS([PLLike classAsId], value:)]];
}
 
@end

Compare this completed version to the one we started with: it's one hell of a lot nicer, isn't it? Which one would you prefer?

Conclusion

The process that we've followed may seem a little convoluted and foreign, but we've really just applied simple rules at each step of the way. Once you get used to functional techniques, these patterns become easier to spot and their application easier to handle. Granted, understanding this kind of code does take some time, but the benefits of doing so are massive: as we've seen, code literally melts away, becoming clearer and less bug-prone.

Another major benefit is that each of these little chunks of logic can be viewed in complete isolation, allowing you to easily reason about the behaviour of each one, and then about the whole. The code is now also closer to the actual semantics of what we're trying to achieve; we have no for-loop boilerplate clouding our intent.

If you're developing iPhone apps, or even for Mac OS X, give FunctionalKit a go, and get in touch if you're interested in contributing.

Written by Tom Adams

April 1st, 2009 at 2:18 pm

Processing Large Datasets in Real Time

without comments

Introduction

This is an article I wrote in late 2007 for the Workingmouse wiki. With the WM wiki no longer running, I’ve republished it here for posterity. Some content may no longer be current or relevant.

Tom Adams, Workingmouse, December 2007

I’ve had the good fortune to recently complete a project at Veitch Lister Consulting (VLC) (a transport planning consultancy) processing large datasets in real time. This article is a summary of the technical aspects of the project; I won’t be talking about the process we used, suffice to say it was XP-like, supported by tools such as my BDD framework Instinct and web-based agile project management tools. The project ran for around 7-9 weeks, and the team comprised five developers (three full-time on the project): two Workingmouse developers, myself and Sanjiv Sahayam, and three VLC developers, Jamie Cook, Glen Maddern and Nick Partridge.

As consultants we usually work in corporate environments and as such are bound by the architectural constraints of the organisation. These usually include the usual “enterprise” constraints such as language (Java 1.4 is common), application servers (usually WebLogic or WebSphere) and databases (Oracle or DB2). However, this was one of those rare projects where you have no technical constraints and have the fortune to be working with great people.

Finding a working solution

Going in, we had three basic requirements: 1) it must be callable from a Rails-based web app, 2) it must return responses to typical requests to the webapp in real time (30 seconds was our target) and 3) it must query against a dataset that was initially estimated at around 45 TB, but later came down to around 100 GB.

Our technical approach initially was two-fold: firstly to understand the nature of the problem and the size of the datasets, and secondly to research potential solutions to the problem. I won’t talk too much about the nature of the problem domain and the type of data, except to say that the data is the output of transportation simulations and ended up being around 100 GB (of raw uncompressed text), which has since been converted into a binary format. To use SQL nomenclature, our data was divided into three tables; our requests to the data storage layer consisted of a single query performing two joins between these three tables. The system then performed further data analysis based on the results of the query. We had complete control over the data generation (this company wrote the piece that generates the data) and the storage format, something that we could exploit to provide performance far above conventional generic storage engines (i.e. our solution cannot be generalised). The system we were developing was completely read only, which simplified our design considerably.

Our dataset was initially estimated to be around 45 TB (based on extrapolations from the data generation component), which led us to think we’d need to distribute the data across multiple machines. Our first investigations (e.g. Hadoop) were based around this. Later, as we worked out we could batch process the majority of the data offline (as an input to the real-time system) and input only a subset, we were able to look towards conventional tools such as SQL databases.

As there were enough problems to solve already, we didn’t want to write our own storage engine. We looked towards conventional SQL databases to achieve this, including:

  • PostgreSQL
  • MySQL
  • Oracle
  • DB2

We spent several weeks tuning the data, the queries, the indices and the databases themselves; however, we were not able to get the performance we required out of any of the conventional SQL databases. To be fair, we didn’t complete our investigations into DB2, and Oracle failed to complete its install on two separate occasions. PostgreSQL was the worst performer and MySQL, while good for the smaller datasets we threw at it (~10 GB), would not have been performant enough even if it maintained linear scalability on the larger dataset. We also looked at distributed memory solutions like those provided by GigaSpaces and Terracotta, however we no longer needed to distribute the data, so they weren’t suited.

After switching away from traditional SQL databases, we looked at a number of column databases, including:

  • C-Store
  • MonetDB
  • HBase

However, these were either unsupported and outdated (C-Store), core-dumped when queried (two versions of MonetDB, both source & binaries), or were not feature-rich enough for our needs at the time (HBase).

We also briefly looked at a few commercial column stores.

In the end we ruled these out too, as we were starting to think we could easily solve our problem using a plain vanilla filesystem and some simple parsing code, and we didn’t want to deal with the sales teams at these companies (I’ve personally been there before and knew the process would not be pretty).

The data storage solution we came up with was laughably simple: we stored CSV files on a filesystem (ext3 on our dev boxes) and indexed into the data using directories. As we had to search linearly (the equivalent of a table scan) through some of the CSV files, we split these files into chunks. We were thus able to distribute the requests to each of our nodes based on chunk. The downside of this approach was that the chunking was a manual process; we had to select the number of chunks (31 initially) up front, there was no dynamic indexing. The upside was that we didn’t need to maintain an indexing scheme (using B-trees or AVL trees) and the files could be batch updated with only a small configuration change (the number of chunks). We’ve found that our data storage was extremely performant, contributing only 18% of the total time it takes to process a request; the remainder of the time is spent in analysing the data and distribution overhead. Subsequent conversion of the CSV files to a binary format resulted in even faster processing, as we no longer need to parse out the CSV delimiters.
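As a purely hypothetical sketch of the layout (the real keys and file names differ), the directory names acted as the index and the chunk files as the unit of distribution:

data/
  <key>/           <- one directory per lookup key; no B-trees or AVL trees to maintain
    chunk-00.csv   <- one chunk per request sent to a node; 31 chunks initially
    chunk-01.csv
    chunk-30.csv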

Our initial performance tests of the core algorithm showed that we’d need to distribute the computation (and potentially the data) in order to achieve our “real-time” goal. Based on this we looked at languages such as Erlang and Scala for their Actor concurrency model; both languages also have ways of distributing work across multiple machines. Although our algorithm (and data) was parallelisable, we didn’t feel that it was amenable to a high level of concurrency, and our research showed us that we could build the system using Java-based tools. This, along with the fact that none of the developers were proficient in either Erlang or Scala, swayed us back towards Java (even considering all of its problems). Workingmouse does perform research around functional languages, including the newly developed Scalaz library, however at the time only one of our guys was highly proficient in Scala and we were on a very tight timeline. The risk associated with four out of five in the team learning a new language and supporting tools was thought to be too great.

Now that we had a candidate language, we looked at a bunch of Java-based “grid” or parallel computing frameworks, including:

  • Hadoop
  • GridGain
  • Coherence
  • Terracotta
  • GigaSpaces

These tools perform different functions; some distribute only the data and some distribute only the computation. By now we were fairly sure that we only needed to distribute the computation, so technologies such as Coherence, Terracotta and GigaSpaces were not appropriate. During this time we’d been prototyping simple algorithms using Hadoop and GridGain. We’d found GridGain very easy to get started with, and performance tests showed that it added only 10-200 ms of overhead to calls when compared to local (in-JVM) invocations. Hadoop was promising, however it was also quite complicated to set up (installation and creation of readers for input) and use, and provided components that we didn’t need, such as the distributed filesystem. We were also unsure how suited it was to interactive use; it looked to be optimised for long-running batch operations. In the end, the ease of use and flexibility of GridGain made it a good candidate for the initial distribution, and it continues to be used to date.

GridGain

GridGain has what it calls SPIs (service provider interfaces), which are a fancy way of saying an API whose implementation may be provided by a third party, and which I thought had long since been relegated to the bad old days of XML parsers. These make it easy to swap in different implementations of node discovery or node-to-node communication, for example. We used this to great effect when we discovered that Amazon’s EC2 did not support IP multicast: we easily swapped the multicast discovery out for a JGroups implementation, which has since been swapped out for a JMS-based implementation.

For all its good points, however, there are some downsides (I’m being quite picky here, it’s a really good framework to use and works quite well).

  • We initially had some issues with GridGain, such as jobs being lost and nodes dropping off the grid. To counter this, we went back and wrote a sample project that performs word counts, and encountered no issues, so these looked to be an issue with our code.
  • The error messages that come out of GridGain are horrible. They’ve attempted to clean up the stack traces, but have made it worse, as they’ve interrupted the normal flow of Java stack traces, and interspersed it with marketing and documentation material. This makes reading errors quite hard.
  • The code itself is quite poor; we had lots of trouble following the flow of execution from the startup script to discover how nodes are actually started. This complicated our understanding of the system and its nomenclature. Most methods are huge: tens of lines long, with nested try-catch blocks and variables initialised outside the try-catch block and used later. I know this is normal coding practice for Java developers, but there are much better and cleaner ways to handle this. The whole code base needs a good refactor. I actually think this is a reflection of their non-test-driven approach to development. I read a post on InfoQ regarding TDD from one of the developers, who said they’d tried it but basically given up. This shows in the poor design and tight coupling of the classes, making them hard to use outside their original context (see the Spring example below).
  • The documentation, while quite extensive, does not cover things such as an architectural overview, what a “grid” is (it’s a node), how the queueing mechanism works, etc. All stuff you’re going to need if you use GridGain in anger.
  • While GridGain provides SPIs, if you need to change the way GridGain works outside of these SPIs, it’s not very easy to do so. We encountered two instances of this. Firstly, we found the built-in JBoss serialisation orders of magnitude slower than the default Java 1.6 serialisation for the objects we were sending (lists of simple structures containing 2-3 integers), and it produced an order of magnitude more bytes on the stream. There doesn’t appear to be a way to replace this without re-implementing a lot of code. Secondly, when we changed to JGroups for discovery, we needed to pass the master node (our single node that submits jobs to the grid) a JGroups configuration file, which must be an absolute path. As we were running inside a servlet container we couldn’t be sure the webapp had been exploded to the filesystem, and even if it had been, we had no easy way of knowing (from a Spring config file) where the file lived. Paths are not resolved relative to the Spring config file but relative to the GridGain home directory. This meant we had to ship a JGroups config file in a known location, outside of our normal distribution package (the WAR file). This file must also be a real file on the filesystem; you can’t pass a stream or a resource on the classpath.
  • Because the documentation was sparse, we spent a lot of time proving theories about how the “grid” behaved under certain circumstances, to try to get an understanding of how it all fitted together: can we run multiple nodes that submit jobs to the same grid (yes)? Does it load balance these (unsure)? What constitutes a “grid” (it’s the discovery SPI; whatever other nodes a node can see are the “grid”)? Can we run multiple nodes in the same VM (no, though some documentation claims you can)? How does the queueing work? What happens if we saturate the grid with jobs? Does GridGain pre-allocate jobs to nodes, or can it take advantage of new nodes coming up after a task has been split (we don’t think it can, but are unsure)? Can nodes participate in more than one “grid”? Etc.
  • As GridGain uses serialisation to send objects across the wire, everything must be Serializable. If it’s not, you’ll get obscure errors which are hard to trace back to serialisation issues. We ended up writing a test utility to ensure that classes we expected to be Serializable (the task, job and job params) were. We also used FindBugs in the build to ensure we didn’t miss any inadvertently.
  • We needed to be able to submit a number of tasks to the grid concurrently (where each task was a user request on the Rails app). Our initial tests showed that this bombarded the nodes with jobs until they were unable to cope, causing the master node to fail over and eventually fail (as there were no nodes “available”). There is an SPI that partially addresses this (the collision SPI), however it only addresses “collisions” (i.e. messages arriving at the same time) on the consumer end (the processing/worker node) of the connection. There does not seem to be a way to batch up messages on the master (producer) node. This becomes a problem when the code submitting tasks to the grid runs concurrently (like a web service receiving multiple requests). We hadn’t needed to address this yet so didn’t look too hard for GridGain solutions, but other possibilities include rolling your own queue (perhaps via java.util.concurrent) or batching up requests on the Rails side. This also has effects on the architecture of your system. As GridGain seems to send the jobs out as soon as they’re split, only nodes that are available at the time of the initial send are available to participate in the task. So if a node fails and another comes online, it does not seem to pick up the jobs, increasing the overall processing time of the task.
  • GridGain includes a peer class loading facility, which basically means that whenever a class is needed by the JVM, the classloader looks in the local classpath first and, if it’s not found, will pull the class off any other nodes in the grid that have it available (caching it locally). This is good for development, where you can make a change to your job or job parameters class and have them automatically re-synced to all the nodes. However, we were having issues with class loading (which turned out to be serialisation and keeping old classes in the classpath) and wanted to turn this off. Although the documentation claims it can be disabled, we couldn’t get our grid to work without it on.
  • Submission of a task requires the task class; you cannot give it an instance, which implies a no-args constructor on the task class. This is a bit odd.
  • This isn’t really an issue, but the IP multicast stuff works too well locally. It’s great for getting up and going, but you can easily throw jobs onto nodes that you didn’t intend to. The ability to integrate it into an automated build also suffers because of this. We ended up using a custom multicast address per local developer machine (auto-generated from the machine’s IP). Other discovery SPIs should be similarly configurable; JGroups & JMS can use localhost, for example.
  • And lastly, but perhaps worst of all, the nomenclature is all wrong. The entity GridGain calls a “grid” is really a “node” on the grid. This confused us for a couple of weeks and caused us to incorrectly name the “grid” with a single name, when in fact all you are doing is naming nodes. We spent the time talking about the “conceptual grid”, where nodes all communicate based on the “grid” name. This had an impact on our initial architecture, whereby we thought we could have nodes participating in more than one grid at the same time. This was appealing to us as we basically had two different kinds of requests, and we thought we might be able to dynamically partition our nodes into either a RequestA grid or a RequestB grid. This was not the case; the “grid” is defined by the discovery SPI, not the “grid” (node) name.

We also encountered issues with the discovery SPI, initially driven by EC2 not supporting IP multicast, and later by issues with JGroups. With our configuration of JGroups, we found that with larger grids (8 or 16 nodes), often the master node wouldn’t discover every other node available. We could often rectify the situation by starting our nodes in a specific order (all processing nodes, then the master node), but we would still occasionally miss a grid node or two. What was more worrying was that with long-running CPU-intensive tasks some of the nodes would drop off the grid (according to the master node’s logs). We probably would’ve persisted with JGroups even with this problem, except that once a node dropped off the grid it was never re-discovered by the master node. Eventually the grid would dwindle to our single non-processing node (our master node was non-processing) and fall over. Because of this we ended up swapping the JGroups discovery implementation out for a JMS-based one. Given we already had a JMS service running inside the application server, switching to it was fairly painless. Grid discovery with JMS seems to work just as well as IP multicast, and there does not appear to be any more overhead on our processing times.

Results

After we’d found an initial working solution it was time to tweak it. We’d actually been doing this all along, however we now had a couple of solid weeks to spend solely on performance tuning. We had two lines of attack: firstly, tuning the JVM and its garbage collector configuration (I’d had great success with this in the past), and secondly, profiling the code for CPU hotspots and memory usage. Our biggest wins turned out to be reducing the memory overhead and optimising our hashCode() and equals() implementations. Things we thought would hold us back (before profiling) turned out to be not that big an issue at all; our text-based CSV reading, for example, was contributing only around 2% to the overall processing time.

Part of our algorithm called for loading 3.1 million objects into memory (and retaining them for fast lookup). Each of these objects extended a base class (Primordial) which gave us sensible (field-based) toString(), hashCode() and equals() methods. This greatly simplified development, but meant that for every object we also had an instance of Primordial and its parent Object in memory, giving us 9.3 million objects. Primordial also had two instance fields that delegated to provide the equality and to-string capabilities, giving us another 6.2 million objects: 15.5 million objects in total, requiring more than half a GB of RAM. By moving these instance fields into our objects and making them static (class fields), we reduced the object count to 6,200,002 objects and significantly improved the memory and CPU characteristics of the application, as we were no longer making the garbage collector attempt to reclaim space it couldn’t.

Our algorithm also makes heavy use of maps and sets, so our hashCode() and equals() methods were getting a workout. The Primordial implementation uses field values (obtained reflectively) to provide sensible defaults for these methods. While this is normally fine (we used this approach in a large batch processing application for several years with no problems), for this algorithm it proved an issue. By writing custom implementations of hashCode() and equals() for the two or three objects that required it, we were able to drop our processing time to a fifth (from 25 seconds to 5 seconds).

I should note here that we didn’t perform any premature optimisations; we profiled with real requests against real data, giving us the true problems, not the things we thought would be problems. In all we spent around a week and a half tuning and were able to decrease the total processing time from around 50 seconds to around 1 second (for a single node on the local machine). We were lucky, however, in that we always had performance at the forefront and had chosen a good architecture, one that didn’t need to change following performance testing (I had thought we’d need to do some radical rework to achieve the desired performance).

As a sample of the numbers we were getting, here is a performance graph that comes from us trying to determine how the number of nodes affects processing time. These numbers were measured before we’d done the performance optimisations mentioned above; the current system is around 5 times faster than this. Processing time in milliseconds is on the Y-axis and the number of GridGain nodes is on the X-axis; the results for 31 nodes are extrapolated.

Grid performance results over multiple nodes

The red and blue lines show the average processing time as the number of nodes is increased (averaged across 5 and 20 requests respectively). For a single node the system performs the query in 30 seconds; for 16 nodes it takes 7 seconds.

The green line (time delta) shows the improvement we get in going from N nodes to N + 1 nodes. For a small number of nodes this is very significant – going from 2 to 3 nodes reduces the total time by 9086 ms – but it becomes less significant as more nodes are added. Going from 8 to 11, 11 to 16 and 16 to 31 nodes only drops the processing time by 1243 ms, 1255 ms and 1219 ms respectively. So for this test, almost doubling the number of nodes from 16 to 31 (our maximum number of nodes for the chunk size discussed above) only improves processing time by one second. At some point you approach a limit where adding more nodes does not decrease (single-request) processing time significantly. It may, however, provide more resilience, so that the grid can handle more concurrent requests and cope better with failure.

The yellow line shows the theoretical performance we should be getting by increasing the number of nodes (assuming no network or other overhead). If we average the total overhead out across all the nodes, as we increase the number of nodes this blows out from 95 milliseconds for 3 nodes to 210 milliseconds for 16 nodes. This overhead corresponds roughly to the GridGain overhead we saw in our GridGain sample project.
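To make the idealisation explicit (our reading of the graph, not a formula from the original article): if T(1) is the single-node processing time, perfectly parallel processing over n nodes gives T(n) = T(1) / n. With the 30-second single-node time above, the ideal 16-node time would be 30 / 16 ≈ 1.9 seconds, against the 7 seconds measured; the gap is the distribution overhead being averaged out here.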

Deployment

After we’d developed a workable solution we needed a hosting environment. The guys we were working with were pretty keen not to have to host the grid themselves and were looking at Amazon’s EC2 as a low-cost, easy ramp-up solution. The process of setting up nodes was pretty painless; we had registered, set up a node and documented the issues in about a day.

The only real issues we had with EC2 were that it doesn’t support IP multicast, so we had to change the way our nodes found each other, and that IPs are not static, so management of clusters could become unwieldy. We also had issues with the IP address the machine thought it was on (the internal IP) versus its external address, which meant code that returned the address of the server (the WSDL generation) returned the wrong (internal) address to external clients.

There are issues with persistent data (there is none), so you need to store anything you want to keep somewhere else, like S3. Instances boot from images (AMIs) so you can keep stuff there also, but it doesn’t get saved when the instance goes down, and there’s a limit on the amount of data you can store. We used S3 to store our data and were planning on automating the copying of the data across to nodes on boot. The EC2 VMs are quite nice; instances are easy to manage and work as advertised. The small instances can be a bit slow, we found them slower than our desktop machines, but the large instances are very nice, and 64-bit.

To ease deployment, we used Capistrano to deploy our code, start and stop our nodes and the app server. Some of the guys also wrote scripts to automate the booting of our images, returning the (dynamic) IPs for SSH access.

Conclusions

Based on our work, here are a few items to consider in summary.

  • Conventional SQL databases are not the solution to all data storage problems; often you can do better using just the filesystem, if you don’t require a general solution.
  • Because of their nature, current Map/Reduce frameworks (i.e. Hadoop) may not be appropriate for all problems. If you don’t need a distributed filesystem, there might be other solutions.
  • Discover how your application behaves and what it needs from a grid. Does it need to be distributed? If so, do you need a computation grid or a data grid?
  • Beware of vendors’ marketing spin.
  • If you go with GridGain, invest the time to learn how it works, especially for your problem space. Choose the size of your split (how much work gets done on a node) appropriately. The longer a job runs, the more chance a failing job has of delaying the overall processing time as it may fail at the end of the job, requiring a resend to another node. Hadoop seems to cater for this by allowing idle nodes to pick up work not yet allocated.
  • Keep your nodes as consistent as possible; if you can keep them exactly the same, all the better, as it eases deployment and management of nodes.
  • Get an end-to-end solution going as soon as possible, it’ll flush out lots of issues.
  • Automation is a good thing.
  • Small teams can achieve a lot in a short period of time.
  • This is the first project I’ve used the XP idea of a metaphor on (thanks Andrew!) and it worked really well. Ours was “lightning”, we wanted the application to be small, simple and lightweight. Requests should flow in and out of the system as fast and simply as possible. This guided technology choices and design decisions.
  • Premature optimisation is a bad thing. Keeping your code well structured will aid in refactoring it if and when you meet real problems, not perceived ones.

Video

Nick Partridge and I gave a talk on this topic at the February 2008 Queensland Java User’s Group, a video of this presentation is available here: Off the Grid – Introduction to Grid Computing using GridGain.

Written by Tom Adams

March 30th, 2009 at 9:03 pm

Posted in Agile, HPC, Instinct, Java

We’re back!

without comments

So it’s been about 7 months since we started MoGeneration and about 6 months since I’ve updated this blog! Coincidence?

Thanks to Adam Cooper for the WordPress tech support.

Written by admin

March 28th, 2009 at 8:48 pm

Posted in Blog

MoGeneration releases The Australian Business Section iPhone site

without comments

MoGeneration is pleased to announce the release of our latest iPhone site, The Australian Business Section, at http://iphone.theaustralian.com.au/.

This is the latest in a series of iPhone-optimised sites developed by us, and the first site released under the new company brand. If you have an iPhone, the iPhone simulator on a Mac, or Safari (Mac or Windows), please check it out. Feedback is very welcome (there’s a feedback link at the bottom of each page).

Written by Tom Adams

October 16th, 2008 at 1:16 pm

Posted in Technology

Brisbane to GC cycle challenge

without comments

The Leapstream Faries have just completed the Brisbane to Gold Coast cycle challenge. We came in at 12 pm, making nearly five hours in the saddle over around 110 km. Good job to Greg, Larry and Cameron (and me). Even better were the guy on the unicycle and the dads carrying kids!

Written by Tom Adams

October 12th, 2008 at 1:16 pm

Posted in Mtb

CITCON Asia-Pacific 2009

without comments

The location for CITCON Asia-Pacific 2009 has just been announced, and the winner is: Brisbane! Huzzah.

Written by Tom Adams

September 14th, 2008 at 4:45 pm

Posted in Agile, BDD, TDD


Bash 101

without comments

Reposting from the internal Workingmouse blog…

If you do any significant work in a shell, keyboard shortcuts make life so much easier. I find myself writing these out all the time, so here they are for posterity (these are Emacs bash bindings, not vi). Meta is the Alt key on most systems (you’ll have to enable this on the Mac, and turn off Alt access to menus on Ubuntu), so read Meta as Alt in the examples below. Meta is also almost always Escape; note, however, that you cannot hold down the Escape key (say, to go two words back), you must release it and press it again.

Moving around

  • Ctrl-a – Cursor to start of line.
  • Ctrl-e – Cursor to end of line.
  • Meta-b – Move word back.
  • Meta-f – Move word forward.

Editing text

  • Ctrl-d – Delete character forward (same as the Delete key on most systems).
  • Ctrl-u – Delete to start of line.
  • Ctrl-k – Delete to end of line.
  • Ctrl-w – Delete word back. Uses whitespace as the boundary, so is good for deleting arguments to commands.
  • Meta-d – Delete word forward.
  • Meta-Backspace – Delete word back. Does not use whitespace as boundary.

History

  • Ctrl-p – Previous entry in history (also up arrow).
  • Ctrl-n – Next entry in history.
  • Ctrl-o – Execute current command (out of history) and show the next history entry.
  • Ctrl-r – Reverse search in history. Press again to search further back, Enter or Ctrl-o to execute.
  • Enter – Execute current command.
  • Meta-. – Paste in the last argument of the last command. Press again to go further back in history.

Other

  • Ctrl-l – Clear screen.

More information is available in the man page for sh (note that this is Apple’s implementation, but it has good verbiage).

Written by Tom Adams

September 11th, 2008 at 10:18 am

Posted in Technology


MoGeneration

without comments

After just a little planning, I’m pleased to let the world know that there’s a new mobile development house in Australia, MoGeneration, specialising in Mobile 2.0 web & iPhone development. This has been a long time coming and I’d like to thank Pollenizer for greasing the wheels. We’re off to a good start and already have a few projects and a few native iPhone products in the pipeline.

Stay tuned for more!

Written by Tom Adams

September 10th, 2008 at 2:39 pm

Posted in Mobile,Objective C
