03.14, Friday 13 Aug 2004

What I'd like from Technorati is for me to be able to plug into them, loosely joined, as part of the ecosystem of the www.

I'm at Hypertext 2004 seeing all these people who can apply cool algorithms to linked nodes, and know how to display them. That'd be great to do with weblogs. In fact, it's ideal territory. Here's how it should go: first, you need loads of people trying different algorithms on anything new like this to see what works and what doesn't. Then everyone publishes (puts a website up), the best are copied, and we iterate. But the problem is that spidering and parsing blogspace is really hard. Like, so hard that people do the spidering and parsing stuff, extract some links, and then they're worn out, they stop. And outside research labs, most people don't even get that far (I met a guy from IBM with a local copy of the web).

Feature request #1: Figuring out permalinks is hard. Context is hard. Breaking a weblog into posts is hard. So let's not bother. What's the lowest-level thing that anybody would get value from? All we need is the url of a link, the link text (and title, if available), the url of the weblog on which it was found, and the time. That's all, and Technorati have it already. I'd like a pubsub mechanism to get that from Technorati, a data stream, like the data stream of updated weblogs at weblogs.com, that other services can sit on top of and build on. I'd like to be able to set a script running on my computer and pipe every single link found, and when and where it was found, into a database, for analysis. That's what enables the large numbers of people hacking that evolvability requires. But, you say, that's Technorati's USP, to give timely search results! Fine, do what the stock markets do: delay the data by 24 hours, it doesn't matter. But give us a stream, and be part of the ecosystem of the www.
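
A minimal sketch of the consuming end of such a stream, assuming each item arrives as a record with exactly the fields listed above. The names `LinkEvent`, `delayed`, `open_db` and `store` are hypothetical, not a Technorati API; the 24-hour delay is applied here as a simple filter only to illustrate the stock-market idea, and SQLite stands in for "a database".

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class LinkEvent:
    """One item in the hypothetical link stream."""
    url: str              # where the link points
    text: str             # the link text
    title: Optional[str]  # the link's title attribute, if any
    blog_url: str         # the weblog the link was found on
    found_at: datetime    # when the link was found


def delayed(events: Iterable[LinkEvent], now: datetime,
            delay: timedelta = timedelta(hours=24)) -> Iterator[LinkEvent]:
    """Release only events at least `delay` old, like a delayed stock ticker."""
    cutoff = now - delay
    return (e for e in events if e.found_at <= cutoff)


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local table to pipe the stream into."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links "
        "(url TEXT, text TEXT, title TEXT, blog_url TEXT, found_at TEXT)"
    )
    return conn


def store(conn: sqlite3.Connection, events: Iterable[LinkEvent]) -> int:
    """Insert events into the links table and return how many were stored."""
    rows = [(e.url, e.text, e.title, e.blog_url, e.found_at.isoformat())
            for e in events]
    conn.executemany("INSERT INTO links VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    now = datetime(2004, 8, 13, 3, 14, tzinfo=timezone.utc)
    events = [
        LinkEvent("http://example.com/a", "a post", None,
                  "http://blog-one.example/", now - timedelta(hours=30)),
        LinkEvent("http://example.com/b", "another post", "B",
                  "http://blog-two.example/", now - timedelta(hours=2)),
    ]
    conn = open_db()
    print(store(conn, delayed(events, now)))  # only the 30-hour-old link passes
```

Keeping the record this flat is the point: nothing here needs permalinks or post boundaries, so anyone with a script can start analysing.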

What'll happen? Who knows. But when I've looked at analyses of the way links move round blogs, it's the "links spread like infections" story most of the time. We could look for other patterns. We could have interesting visualisations. We could identify blogs that move in the same circles (because they get the same links in the same kind of time order), or see communities of interest. I don't know, and that's the wonderful thing. What I do know is that the people with the maths and the algorithms and the expertise to do this don't have the corpus of data to operate on. And that Technorati do have this data (already!), and can give it away so we can loosely join to them.
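
One simple way to look for blogs that move in the same circles, sketched under assumptions: each event is a hypothetical `(blog_url, link_url, time)` tuple taken from a stream like the one described above, similarity is plain Jaccard overlap of the sets of links two blogs have posted (a swapped-in choice, not an established method for blogspace), and time order is reduced to counting which blog linked first to each shared url.

```python
from collections import defaultdict
from itertools import combinations


def links_by_blog(events):
    """Map each blog url to {link url: earliest time it was seen on that blog}."""
    seen = defaultdict(dict)
    for blog_url, link_url, when in events:
        prev = seen[blog_url].get(link_url)
        if prev is None or when < prev:
            seen[blog_url][link_url] = when
    return seen


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_blogs(events, threshold=0.5):
    """Return (blog_x, blog_y, score, x_linked_first, shared) for similar pairs."""
    seen = links_by_blog(events)
    pairs = []
    for x, y in combinations(sorted(seen), 2):
        score = jaccard(set(seen[x]), set(seen[y]))
        if score >= threshold:
            shared = set(seen[x]) & set(seen[y])
            # How often blog x picked up a shared link before blog y did.
            x_first = sum(1 for u in shared if seen[x][u] < seen[y][u])
            pairs.append((x, y, score, x_first, len(shared)))
    return sorted(pairs, key=lambda p: -p[2])
```

A pair where one blog nearly always links first hints at who follows whom; a pair with high overlap but mixed order looks more like a shared community.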

I've also been looking at XFN, a way of marking up links according to whether the person at the other end is a friend or colleague or whatever. Distributed data for a social network, basically. Tantek Celik and Eric Meyer are here, presenting it. It has a controlled vocabulary (accepted words, basically) which you can use to describe your relationships with people. But controlled vocabularies make me uncomfortable for some uses (this is one), and why stop at people? I'd be more interested in getting more value out of the links we post. See above, in other words.

But I'm also looking at del.icio.us and seeing that people sort and tag their links there in a wonderful way: the tag vocabulary is bottom-up. By giving immediate use to metadata, people add metadata. By giving use and visibility to the general usage of metadata, there's an incentive to converge on the same vocabulary. Great. But it's still a closed system, so we can't have a thousand algorithms blooming and so on. So, let's merge XFN, del.icio.us, and feature request #1.

Feature request #2: Let's have a web-wide del.icio.us. Let's define that, for every link I post on my weblog, I can add a tags attribute, and I'd use it like this: tags="wayfinding brain"

All I want blogging systems to do: Let me attach those tags to all links, just as del.icio.us does. They will, if people want it, or we'll do it ourselves with plugins.

All I want Technorati to do: Push those tags out in the stream with the url, link text, blog url and time. That's all. It's a regular expression away.
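
To show how little the tags attribute asks of a parser, here is a sketch that pulls the url, link text, title and tags out of a post's HTML and stamps each link with the blog url and time. It uses Python's standard `html.parser` rather than a regular expression, since attributes can appear in any order; `TaggedLinkParser` and `extract` are hypothetical names, and the output dictionary is only a guess at what a stream record might contain.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class TaggedLinkParser(HTMLParser):
    """Collect every <a href=...> with its text, title and space-separated tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link being read, while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            if a.get("href"):
                self._current = {
                    "url": a["href"],
                    "title": a.get("title"),
                    "tags": (a.get("tags") or "").split(),
                    "text": [],
                }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = "".join(self._current["text"]).strip()
            self.links.append(self._current)
            self._current = None


def extract(html, blog_url, found_at):
    """Return one record per link: url, text, title, tags, blog url and time."""
    parser = TaggedLinkParser()
    parser.feed(html)
    parser.close()
    return [dict(link, blog_url=blog_url, found_at=found_at.isoformat())
            for link in parser.links]


if __name__ == "__main__":
    post = ('<p>See <a href="http://example.com/maps" '
            'tags="wayfinding brain">this study</a>.</p>')
    print(extract(post, "http://blog.example/",
                  datetime(2004, 8, 13, tzinfo=timezone.utc)))
```

Links without a tags attribute still come through, with an empty tag list, so the same stream serves both feature requests.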

What I want someone else (or me) to do, which they (or I) will: clone the del.icio.us interface for this stream. It won't work for some blogs, fine. But it'll work for many. Give visibility to the tags of people I'm subscribed to, to provide a pressure for us to converge vocabularies (bottom-up, see). It's useful because I can have easy access to my own links, and everyone else's links. But it's merged with what I already do, so all the feedback loops are in place. It should just work. And the bottom-up categorisation of the www can begin, useful to each of us at every step of the way. (Imagine automatic deductions that say "sci-fi is usually used with books", a loose category hierarchy that is fluid and handy. Opinion, fact, all the rest. Pragmatic.)
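
The "sci-fi is usually used with books" kind of deduction can be approximated by counting how often tags appear together on the same link. This sketch assumes each link's tags arrive as a list of strings; `cooccurrence`, `usually_with` and the 0.5 threshold are illustrative choices, not part of any existing service.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tag_sets):
    """Count each tag, and each unordered pair of tags, across all links."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tag_sets:
        unique = sorted(set(tags))  # ignore a tag repeated on one link
        tag_counts.update(unique)
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return tag_counts, pair_counts


def usually_with(tag_sets, tag, min_share=0.5):
    """Return (other_tag, share) for tags on at least min_share of links with `tag`."""
    tag_counts, pair_counts = cooccurrence(tag_sets)
    total = tag_counts[tag]
    if total == 0:
        return []
    result = []
    for (a, b), n in pair_counts.items():
        other = b if a == tag else (a if b == tag else None)
        if other is not None and n / total >= min_share:
            result.append((other, n / total))
    return sorted(result, key=lambda p: (-p[1], p[0]))
```

Because the counts are recomputed from whatever tags people actually use, the resulting hierarchy stays fluid: as usage shifts, so do the deductions.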

Why don't we have these things already? Because parsing blogs is hard. But some people do it already, people like Technorati. Technorati have bootstrapped off the ping mechanism that made the first wave of blog tools (aggregators) possible. Now it'd be great if the rest of us could bootstrap off Technorati.