nosewheelie

Technology, mountain biking, politics & music.

Scalable Semantic Web Stores

with 19 comments

I listened with disappointment this morning to the latest Talking with Talis podcast with Tom Ilube from Garlik. He makes the comment that they spent 18 months building their own scalable triple store, and I can’t help thinking what a waste of time and effort. He goes on to state that it now scales to several billion triples across multiple nodes, and that at the time, no triple store scaled to the level required. However, Tucana/Kowari scaled to this level four years ago (a single node could hold about 500 million triples at the time).

I can’t help but wonder if it was Steve Harris’ association with the AKT project that was the real reason behind them building their own store (see Dave’s comments for a possible reason). Another possible reason is Tucana/Kowari/Mulgara’s Jena support, originally put in to provide a migration path for companies looking to move on from research projects to scalable infrastructure. As Jena was, and is, the de facto semweb tool of choice, people used that support to evaluate Kowari’s scalability.

He also makes the point that Garlik are one of the first companies doing semweb on a commercial scale. This is also not correct. There was Tucana, as well as a bunch of other companies and government agencies we had contact with, some of which are still around today.

I also can’t help wondering if we were just too early; from memory, Tucana started in semweb in 2000/2001 and went under in 2004. Perhaps we were so far ahead that we were too freaky. People today are still freaked out by semweb, and it’s supposedly hit the mainstream.

Perhaps Dave & Brian’s startup Zepheira will have more success in a more forgiving marketplace. Their website is certainly Web 2.0™ compliant if that helps…

Update: It seems my comments may have been interpreted in a manner that I didn’t intend. Firstly, I think the work the current batch of semweb companies are doing is great; in fact, I’m a little jealous I’m not still amongst it. Secondly, the Jena support in Kowari may have hurt us in some areas, but it certainly helped more than it hindered us. I think it was the right decision to make at the time, and I certainly bear some of the responsibility if we lost people over Jena, as at that time I was working with sales. I’ve updated the wording of this post to make this clearer.

Written by Tom Adams

April 24th, 2007 at 7:56 am

Posted in Semweb

19 Responses to 'Scalable Semantic Web Stores'


  1. Hey, Tom.

    Don’t take this one so personally. I spoke with Steve Harris at ISWC 2006 in December. He said some interesting things then, such as that the Garlik store is pure RDF triples. There is no attempt at in-store analysis, nor a full query language. He acknowledged Tucana/Kowari/Mulgara’s lead in those areas. He claims that was all Garlik needed and so he built them a fast and scalable store to do it. They also had an immediate need for federation, something we didn’t focus our efforts on.

    Personally, I’m glad to see Garlik’s success and am happy to be riding the SemWeb wave, now that it is finally here.

    David Wood

    24 Apr 07 at 2:02 pm

  2. Actually, ISWC was November. *shrug*

    David Wood

    24 Apr 07 at 2:03 pm

  3. Tom

    Want to come on and do a podcast to talk about Tucana/Kowari/Mulgara?

    They’ve been mentioned in a few of the recent ‘casts, so maybe we should hear an inside perspective too?

    Paul

    Paul Miller

    24 Apr 07 at 11:34 pm

  4. Dave,

    I suppose I do take it a little personally, mostly because I see it as a big lost opportunity. I think it’s great that there is still work going on in the area, especially Zepheira and Garlik (in fact I’m a little jealous). I don’t have any animosity towards them, perhaps my comments came out a bit harsh, I think some have interpreted them that way which wasn’t my intention.

    Coming from my current pragmatic/agile viewpoint Steve did exactly the right thing. In fact I came up against a similar thing when I started the Kowari SPARQL work. I found it too hard to develop on top of Kowari as it was, and moved my work to JRDF where Andrew did a great job with it.

    Tom

    Tom Adams

    25 Apr 07 at 9:00 am

  5. I did remember another reason for Jena - to try and save us from having to write our own SemWeb demos.

    Andrew

    25 Apr 07 at 3:07 pm

  6. Actually, that should probably be “so many of our own SemWeb demos”.

    Andrew

    25 Apr 07 at 3:11 pm

  7. What David says is right, plus we have a very high churn rate, and tight schedules - the Garlik store imports at over 70k triples/sec and that’s only just fast enough to keep up with incoming data.

    At the time I started working on what became Garlik, late in 2005, the Tucana store wasn’t fast enough (I’ve not tried it since), plus there was no evidence I was aware of that it scaled to the capacity the Garlik store was written to reach. People were only publicly talking about hundreds of millions of triples at that time.

    - Steve

    Steve Harris

    25 Apr 07 at 7:22 pm

  8. Steve,

    Wow, those reads are quite impressive, I think we were only getting around 4-5k t/s at that time, not sure of the latest figures.

    As for scaling, from memory the public figures we released mid-2004 were around 250m per node, though we could scale to around 460m. Dave could probably quote the latest figures.

    As I’ve stated, I think it’s great that so much active work is going on in this area, I wish you guys all the best.

    Tom

    Tom Adams

    30 Apr 07 at 3:23 pm

  9. Hi Tom,

    I’m pretty sure the slower “import” rates for Kowari were being caused by the Jena RDF/XML parser. I’m sure this is where the profiler was showing up a bottleneck (along with the construction of URI objects).

    Tate would know the specific details.

    Rob

    Rob

    15 May 07 at 1:53 pm

