nosewheelie

Technology, mountain biking, politics & music.

Archive for April, 2007

Scalable Semantic Web Stores

with 19 comments

I listened with disappointment this morning to the latest Talking with Talis podcast, featuring Tom Ilube from Garlik. He comments that they spent 18 months building their own scalable triple store, and I can’t help thinking what a waste of time and effort that was. He goes on to state that it now scales to several billion triples across multiple nodes, and that at the time no triple store scaled to the level required. However, Tucana/Kowari scaled to this level four years ago (a single node could hold about 500 million triples at the time).

I can’t help but wonder if it was Steve Harris’ association with the AKT project that was the real reason behind them building their own store (see Dave’s comments for a possible reason). Another possible reason is Tucana/Kowari/Mulgara’s Jena support – originally put in to provide a migration path for companies looking to move on from research projects to scalable infrastructure – which, as Jena was (and is) the de facto semweb tool of choice, people used to evaluate Kowari’s scalability.

He also makes the point that Garlik are one of the first companies doing semweb on a commercial scale. This is also not correct: there was Tucana, as well as a bunch of other companies and government agencies we had contact with, some of which are still around today.

I also can’t help wondering if we were just too early; from memory, Tucana started in semweb in 2000/2001 and went under in 2004. Perhaps we were so far ahead that we were too freaky: people today are still freaked out by semweb, and it’s supposedly hit the mainstream.

Perhaps Dave & Brian’s startup Zepheira will have more success in a more forgiving marketplace. Their website is certainly Web 2.0™ compliant, if that helps…

Update: It seems my comments may have been interpreted in a manner that I didn’t intend. Firstly, I think the work the current batch of semweb companies are doing is great; in fact, I’m a little jealous I’m not still amongst it. Secondly, the Jena support in Kowari may have hurt us in some areas, but it certainly helped more than it hindered. I think it was the right decision to make at the time, and I certainly hold some of the responsibility if we lost people over Jena, as at that time I was working in sales. I’ve updated the wording of this post to make this clearer.

Written by Tom Adams

April 24th, 2007 at 7:56 am

Posted in Semweb

Dependency injection leads to procedural code?

with one comment

We use DI extensively at the day job and have evolved a specific way of doing it: we use constructors to inject all dependencies, set them as fields, then use them in other methods on the class. We end up with code something like the following (though with more dependencies):

public final class BehaviourContextClass {
    private final BehaviourContextRunner contextRunner;

    public BehaviourContextClass(BehaviourContextRunner contextRunner) {
        this.contextRunner = contextRunner;
    }

    public <T> BehaviourContextResult run(Class<T> behaviourContextType,
            BehaviourContextRunStrategy behaviourContextRunStrategy,
            SpecificationRunStrategy specificationRunStrategy) {
        // do some stuff...
        return contextRunner.run(behaviourContextType, behaviourContextRunStrategy,
                    specificationRunStrategy);
    }
}

We push things such as refactoring really hard. This, combined with some historic pressure pushing us this way, has led us to code which is very procedural and not very object oriented: we have data objects for representing state, and service objects for performing work. The example above shows a service-oriented class that does a little bit of work then simply delegates to one of its dependencies.

Some say it looks functional, and superficially it may, but this was not intended, and we don’t enjoy the benefits of functional code; our methods often have side effects, for example. Not being “traditionally” object oriented may not be a bad thing; there are problems with OO, and in the end it usually comes down to a series of trade-offs. For example, our code is not OO, but it is DRY, well factored, loosely coupled, easy to understand, maintain and reuse, and very well tested.

The real problem comes in when we try to introduce what I call dynamic dependencies – instances whose value changes, such as a number or a string – using our DI container, Spring. Spring supports fixed-value injection, such as a database URL, but provides no way to perform partial wiring, where the container supplies some of the dependencies and the caller supplies the rest. In the above example the dynamic data is behaviourContextType, which is passed into run(...). In the current context, the constructor exists only to allow Spring to pass in dependencies; it has no meaning from a traditional OO point of view.
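To make the gap concrete: the container knows the runner at startup, but the dynamic dependency only arrives at call time. A hand-rolled factory can do the partial application that we can’t get from the container directly. This is a simplified, self-contained sketch with hypothetical names, not our production code or a Spring API:

```java
// Sketch of "partial wiring": the container supplies the fixed
// dependency up front, the caller supplies the dynamic one later.
// Types are simplified stand-ins for those in the post.
interface Runner {
    String run(Class<?> type);
}

final class ContextClassFactory {
    private final Runner runner; // fixed dependency, wired once at startup

    ContextClassFactory(Runner runner) {
        this.runner = runner;
    }

    // The dynamic dependency is supplied at call time.
    ContextClass create(Class<?> behaviourContextType) {
        return new ContextClass(runner, behaviourContextType);
    }
}

final class ContextClass {
    private final Runner runner;
    private final Class<?> type;

    ContextClass(Runner runner, Class<?> type) {
        this.runner = runner;
        this.type = type;
    }

    String run() {
        return runner.run(type);
    }
}

public class PartialWiringDemo {
    public static void main(String[] args) {
        // The container's job: wire the factory with the fixed dependency.
        ContextClassFactory factory = new ContextClassFactory(new Runner() {
            public String run(Class<?> type) {
                return "ran " + type.getSimpleName();
            }
        });
        // The caller's job: supply the dynamic data.
        System.out.println(factory.create(String.class).run()); // prints "ran String"
    }
}
```

The cost, of course, is an extra factory class per wired type, which is exactly the kind of boilerplate we’d like the container to absorb.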

In fact, for the large majority of our code, our constructors exist solely for this purpose. Unfortunately with Spring this is the lesser of a number of evils: setter injection is even worse, as the object can be created and used in an invalid state. You can overcome this with static analysis of the affected class to ensure it’s never created outside of the container, however this solution isn’t obvious when looking at the code and limits what you can do with the class.
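To illustrate the invalid-state problem with setter injection, here’s a minimal, self-contained sketch (simplified stand-in types, not our production classes): nothing prevents the object being used before the container has called the setter.

```java
// With setter injection the dependency field cannot be final, and the
// object is constructible (and callable) before it has been wired.
interface Runner {
    String run(Class<?> type);
}

final class SetterInjectedService {
    private Runner runner; // cannot be final; null until the setter is called

    // The container calls this after construction...
    public void setRunner(Runner runner) {
        this.runner = runner;
    }

    public String run(Class<?> type) {
        return runner.run(type); // ...but a caller can get here first: NPE
    }
}

public class SetterInjectionDemo {
    public static void main(String[] args) {
        SetterInjectedService service = new SetterInjectedService();
        try {
            service.run(String.class); // used before wiring: invalid state
        } catch (NullPointerException e) {
            System.out.println("used before wiring: NullPointerException");
        }
        service.setRunner(new Runner() {
            public String run(Class<?> type) {
                return "ran " + type.getSimpleName();
            }
        });
        System.out.println(service.run(String.class)); // prints "ran String"
    }
}
```

With constructor injection the first call simply cannot compile-and-run in that order; the invalid window doesn’t exist.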

From a design perspective, the example class encapsulates a type (it is a representation of a Type/Class) that is runnable in some form. A more object oriented way to achieve this notion would be to pass the type into the constructor, as follows.

public final class BehaviourContextClass {
    private final Class<?> behaviourContextType;
    ...

    public <T> BehaviourContextClass(Class<T> behaviourContextType) {
        this.behaviourContextType = behaviourContextType;
    }

   ...
}

This is certainly better, but how do we get our dependency in? I’ve seen a few things tried, for example reflective field access (“the container is everybody’s friend”), but in its raw form this is not very explicit: how does the container know which fields to inject? Only the null ones? A better way would be to mark the fields in some way as being injected; this provides explicitness in the code and allows the container to easily identify fields that require injection. The code now looks like the following.

public final class BehaviourContextClass {
    @Injected private BehaviourContextRunner contextRunner;
    private final Class<?> behaviourContextType;

    public <T> BehaviourContextClass(Class<T> behaviourContextType) {
        this.behaviourContextType = behaviourContextType;
    }

    public BehaviourContextResult run(BehaviourContextRunStrategy behaviourContextRunStrategy,
            SpecificationRunStrategy specificationRunStrategy) {
        // do some stuff...
        return contextRunner.run(behaviourContextType, behaviourContextRunStrategy,
            specificationRunStrategy);
    }
}

We now have a real object that contains state and behaviour, so we’re good OOers, and we’re fully DI Compliant™. Obviously you need tool support to allow this: you need DI container support, as well as testing infrastructure that will drop in test doubles when you need them. One downside is that you can no longer new up classes without getting NullPointerExceptions, so we are arguably no better off than when using setter injection. So you need a way to create classes without using new that is DI aware. In Java, another approach might be to leverage aspects, so that when the new keyword is used, the DI container captures this and drops in the configured dependency. Groovy’s test infrastructure does something similar, for example.
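A minimal sketch of what part of that tool support might look like, assuming a hypothetical @Injected marker annotation (as above) and simplified stand-in types: a reflective injector scans for annotated fields and sets them, which a container could use to supply the real dependency and a test harness could use to drop in a double.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical marker annotation; must be retained at runtime so the
// injector can see it reflectively.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Injected {}

interface Runner {
    String run(Class<?> type);
}

final class ContextClass {
    @Injected private Runner runner; // wired by the injector below
    private final Class<?> type;

    ContextClass(Class<?> type) {
        this.type = type;
    }

    String run() {
        return runner.run(type);
    }
}

public class FieldInjectionDemo {
    // Set the dependency into every @Injected field it is assignable to.
    static void inject(Object target, Object dependency) throws Exception {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(Injected.class)
                    && field.getType().isInstance(dependency)) {
                field.setAccessible(true); // private fields too
                field.set(target, dependency);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ContextClass context = new ContextClass(String.class);
        // A test would pass a double here instead of the real runner.
        inject(context, new Runner() {
            public String run(Class<?> type) {
                return "ran " + type.getSimpleName();
            }
        });
        System.out.println(context.run()); // prints "ran String"
    }
}
```

This still doesn’t solve the new problem on its own, since somebody has to remember to call the injector, which is why intercepting construction (via aspects or the container) is the missing piece.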

So is all this effort worth it, or are we barking up the wrong tree?

While I’ve been working on Instinct, I’ve been toying with not using DI, partly as I don’t want to have to depend on a container, and partly as I want to explore the design classes take on without it. Specifically, is it DI that drives this procedural code, or is it our usage of it?

You may argue (somewhat effectively perhaps) that the problems we are having are due to us using DI the wrong way, or too much. But where do you draw the line? I’ve heard people say “I’d only wire up the Service objects” with a straight face, but what constitutes a “Service object”? Is it the API that provides services to external facing clients? Why aren’t internal clients considered in the same manner? Why shouldn’t we exploit the benefits DI gets us on “internal” code also?

To be fair, we’ve been making some assumptions. We want every dependency to be wired in; this gives us greater flexibility (in that the binding to our implementations is now behind a level of indirection) at the expense of explicitness in the code, though sometimes this tradeoff isn’t worth it. There is no single “service layer”; all code is created similarly, requires the same flexibility, and demands the same benefits as the traditional service layer. And we take testability very seriously; everything is test driven, so we need to be able to drop doubles in during tests. In fact, I’ve had someone tell me that he believes this is the major driver for us using DI.

I’m not convinced of the answer; as usual, it comes down to a series of tradeoffs. This is an interesting space to watch, and it seems like more people are exploring it (google Spring and anaemic domain model), though perhaps our extreme usage of DI is our only downfall.

Finally, here are some links that discuss dependency injection.

Written by Tom Adams

April 10th, 2007 at 4:08 pm

Posted in BDD,Instinct,Java,Ruby,TDD