23.02, Tuesday 23 Jul 2002

Have a quick look at Dirk. In a way it's the opposite extreme from the www. On the web, there are pages, and pages are linked together by unidirectional embedded hyperlinks of a single flavour (that is, there's only one type of link: you click on it, it takes you to another page). Dirk is based around the idea of ultra-simple objects which are connected in both directions by non-embedded links that carry information (that is, if you follow a link you can follow it back, and the link has a piece of text starting "because..." attached to it). Two extremes of hypertext. What they share is the concept of connections. Nodes and arcs. To-object, from-object and connection.
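To make the contrast concrete, here's a minimal Python sketch of the two link models. It's only an illustration: I'm not describing how Dirk actually stores anything, and the names (WebPage, Arc, ArcStore) and the example data are made up for the purpose.

```python
from dataclasses import dataclass, field


@dataclass
class WebPage:
    """Web-style node: links live inside the page and point one way only."""
    url: str
    links_to: list = field(default_factory=list)  # bare target URLs, no annotation


@dataclass(frozen=True)
class Arc:
    """Dirk-style connection (hypothetical model): kept outside the objects, with a reason."""
    from_object: str
    to_object: str
    because: str


class ArcStore:
    """Holds arcs separately from the objects so they can be followed either way."""

    def __init__(self):
        self.arcs = []

    def connect(self, from_object, to_object, because):
        self.arcs.append(Arc(from_object, to_object, because))

    def forward(self, name):
        """Arcs leaving the named object."""
        return [a for a in self.arcs if a.from_object == name]

    def backward(self, name):
        """Arcs arriving at the named object: the direction the web can't follow."""
        return [a for a in self.arcs if a.to_object == name]


# Web: page a knows about page b, but nothing on page b records that a links to it.
page_a = WebPage("http://example.org/a", links_to=["http://example.org/b"])

# Dirk-like: one arc, followable from either end, carrying its "because..." text.
store = ArcStore()
store.connect("tea", "kettle", "because you need hot water")
print(store.forward("tea")[0].to_object)        # kettle
print(store.backward("kettle")[0].from_object)  # tea
```

The point of keeping arcs in their own store is that neither object owns the link, which is what makes following it backwards cheap.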

This is where RDF comes in, as a way of explaining this kind of information. The (very good) RDF Primer explains these three portions like this: "the part that identifies the thing the statement is about is called the subject. The part that identifies the property or characteristic of the subject that the statement specifies is called the predicate, and the part that identifies the value of that property is called the object" (I've edited that down slightly). It's good terminology.

Now RDF has a number of advantages. First, it can be written down in XML (it doesn't need to be), so there are already standard tools to query it and make it. Secondly, it's machine readable. Thirdly, it represents structures we're familiar with (the web, databases, metadata) in consistent ways. Fourthly (and here's the good bit), all of the subject, object and predicate can be URIs. That is, they can reference another location on the www.

More about that last property. Imagine you want to say that the property "written-by" of this webpage is "Matt Webb". Instead of just using the text "written-by" as the predicate, you can pull Dublin Core into RDF and reference the standard Dublin Core [or DC] "creator" predicate instead. (Dublin Core is a metadata standard, and "pulling it in" takes advantage of XML namespaces, which let your document inherit standards other people have written.) And instead of just saying "Matt Webb" you could use the URI "mailto:matt@interconnected.org", which uniquely identifies me online. Why is this good? It's good because machines can then identify the sameness of the DC "creator" attribute, and the sameness of me as the creator. Semantic Web, here we come.
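Here's a rough Python sketch of why URIs make that "sameness" checkable. The Triple type, the created_by helper and the example.org page addresses are invented for illustration; the Dublin Core creator URI is the standard one from the DC element set.

```python
from collections import namedtuple

# One RDF-style statement: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# The Dublin Core "creator" element, named by its URI rather than by loose text.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
MATT = "mailto:matt@interconnected.org"

# Hypothetical statements gathered from different pages.
statements = [
    Triple("http://example.org/this-page", DC_CREATOR, MATT),
    Triple("http://example.org/another-page", DC_CREATOR, MATT),
    # Plain-text version: a machine can't tell that this means the same thing.
    Triple("http://example.org/third-page", "written-by", "Matt Webb"),
]


def created_by(statements, person_uri):
    """Return the subjects whose DC creator is exactly this URI."""
    return [
        t.subject
        for t in statements
        if t.predicate == DC_CREATOR and t.object == person_uri
    ]


print(created_by(statements, MATT))
```

The first two statements match by simple string equality on the URIs. The third, which uses free text for both predicate and object, is invisible to the query, and that's exactly the problem shared URIs solve.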

But you don't have to just use RDF for Dublin Core metadata (page titles, subjects, creator data mainly). RDF is extensible in other ways too. Friend Of A Friend [or FOAF] is a way of fleshing out that URI we used earlier to identify a person. What's more, part of FOAF lets you specify who your friends are, and as part of that point to where their publicly accessible FOAF file is. These aren't hard to make -- in fact, here's a simple Javascript app to make your own file: FOAF-a-matic.
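To show how little there is to one of these files, here's a sketch in Python that writes a small FOAF document using only the standard library. It uses the real FOAF terms (Person, name, mbox, knows) plus rdfs:seeAlso to point at a friend's own file, but the helper functions, the friend and the example.org addresses are all invented, and I'm not claiming this is the exact shape FOAF-a-matic produces.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("foaf", FOAF)


def add_person(parent, name, mbox, foaf_url=None):
    """Append a foaf:Person to parent; optionally point at that person's own FOAF file."""
    p = ET.SubElement(parent, f"{{{FOAF}}}Person")
    ET.SubElement(p, f"{{{FOAF}}}name").text = name
    ET.SubElement(p, f"{{{FOAF}}}mbox", {f"{{{RDF}}}resource": mbox})
    if foaf_url:
        ET.SubElement(p, f"{{{RDFS}}}seeAlso", {f"{{{RDF}}}resource": foaf_url})
    return p


def build_foaf(me, friends):
    """Return an RDF/XML string describing me and the people I know."""
    root = ET.Element(f"{{{RDF}}}RDF")
    me_el = add_person(root, me["name"], me["mbox"])
    for friend in friends:
        knows = ET.SubElement(me_el, f"{{{FOAF}}}knows")
        add_person(knows, friend["name"], friend["mbox"], friend.get("foaf_url"))
    return ET.tostring(root, encoding="unicode")


foaf_xml = build_foaf(
    {"name": "Matt Webb", "mbox": "mailto:matt@interconnected.org"},
    [
        {
            "name": "A Friend",  # hypothetical friend
            "mbox": "mailto:friend@example.org",
            "foaf_url": "http://example.org/friend/foaf.rdf",
        }
    ],
)
print(foaf_xml)
```

A crawler that reads this file can follow the seeAlso address to the friend's file, then on to their friends, and so on. That's the whole trick.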

Why do I suddenly mention all of this? It's because I see:

  • Blogchalking which encourages people to put their location information in metatags so we can use Google to find weblogs by location. But Dublin Core already supports location information, and if people inserted the RDF into their weblog templates too then tools like Blogdex could be geographically sensitive.
  • Blogrolling which lets you manage links to your friends' weblogs without seeing the HTML, and they could be using the same data to write out interlinking FOAF files so we could browse recommended weblogs in novel and innovative ways.

...and if the creators of these tools had easy access to simple RDF tools, we could be boosting up the lowest common denominator of weblogs to a point where they comprise a lush substrate on which to build fascinating and useful tools to explore and filter the www. And this is in exactly the same way that the architecture of the www, with URIs, allowed things like hyperlinks and weblogs to occur in the first place.

The best start I've seen is Movable Type's TrackBack which embeds RDF on the page to start linking weblog posts together. But it's still only a start. I'd like to see a grand conversation between the authors of publishing tools pinning down the properties weblogs need to fulfil their potential, and then building these in invisibly for the user. Because weblogs have yet to expand as much as they will, and when they do their course will be hard to change. The future has to be built now, in this microcosm, in this monobloc.