09.15, Tuesday 24 Aug 2004

Diego Doval's atomflow. Go get, go play. Command-line tools for manipulating and querying an Atom datastore. And yes, as Ben says, simple, powerful, and our first fruit of EuroFoo.

So here's the objective: Blogging tools are monolithic. The same tool that deals with capturing the metadata also deals with user management and templating, and all the other integration features that get thrown in too. It doesn't have to be like that. The people who make great templates shouldn't also have to be the people who know what metadata to keep (last modified date? author? what sort of categories?), and everyone who writes a blog store should be able to benefit from the great templates that are written.

I've talked before about why I like Atom. It's because it's the fixed point around which all the rest can crystallise. The thing is, software decisions about what metadata to keep, about where to draw the abstraction layers, and about how to design the interfaces: these things are really hard. We can attack the problem in two ways. First, the design of Atom and the Atom API codifies a number of these design decisions. Second, for the problems that haven't been decided yet (like how to query the datastore to get something that can be templated back), we take a best guess at how it should be done, and also make sure the system is evolvable so it can adapt as we learn.

See also my old post on adaptive design for weblog software. I've been working towards this myself, a little, but not using Atom (it didn't have a stable draft at the time), and that's what I was demoing to Diego and Ben on Saturday, after Ben's session and our talk at the whiteboard (which I wish I had a photo of).

The requirements, then, given the above, are these:

You need a command-line tool to implement the Atom API. You could call it like this:

  • # cat entry.atom | atom-api --method=service.post
  • # cat entry.atom | atom-api --method=service.edit --post-id=http://example.org/etc
  • # cat entry.txt | make-atom --author=X --categories=A,B,C | atom-api --method=service.post

Then around that you can add an HTTP wrapper for your editing tools (or maybe it implements an HTTP service itself). The editing tools deal with user management (maybe integrating with permissions). Notice that atom-api is a command-line equivalent of the Atom API implemented as a REST web service, and we can pipe other tools into it that convert content into Atom Entry format first.
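As an illustration of the make-atom step in the third invocation above, here is a minimal Python sketch: plain text in, one Atom Entry out. It assumes the first line of the text is the title and the rest is the body, and it uses the later Atom 1.0 namespace (http://www.w3.org/2005/Atom); the 2004 drafts used a different one. The flags mirror the whiteboard invocation and are not atomflow's actual interface.

```python
import argparse
import sys
import uuid
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace (the 2004 drafts used a different one).
ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def q(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM}}}{tag}"


def make_entry(text, author, categories, now=None):
    """Build an Atom entry; assumes line 1 is the title, the rest the body."""
    now = now or datetime.now(timezone.utc)
    title, _, body = text.strip().partition("\n")
    entry = ET.Element(q("entry"))
    ET.SubElement(entry, q("id")).text = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(entry, q("title")).text = title.strip()
    ET.SubElement(entry, q("updated")).text = now.isoformat(timespec="seconds")
    author_el = ET.SubElement(entry, q("author"))
    ET.SubElement(author_el, q("name")).text = author
    for term in categories:
        ET.SubElement(entry, q("category"), term=term)
    ET.SubElement(entry, q("content"), type="text").text = body.strip()
    return entry


def main(argv=None, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical make-atom CLI: plain text on stdin, one Atom entry on stdout."""
    parser = argparse.ArgumentParser(prog="make-atom")
    parser.add_argument("--author", required=True)
    parser.add_argument("--categories", default="", help="comma-separated")
    args = parser.parse_args(argv)
    categories = [c for c in args.categories.split(",") if c]
    entry = make_entry(stdin.read(), args.author, categories)
    stdout.write(ET.tostring(entry, encoding="unicode"))
```

Add the usual `if __name__ == "__main__": main()` guard to run it as `cat entry.txt | python make_atom.py --author=X --categories=A,B,C`. Keeping conversion separate from storage means a different converter (Markdown, email, a scraper) can feed the same store.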

The atom-api tool is absolutely essential, the core of the whole system, and it can play that role because everything about it is tightly specified.
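A sketch of that atom-api kernel in Python, under loud assumptions: the store is a plain directory holding one file per entry, named by a SHA-1 hash of the entry's atom:id; service.post refuses to overwrite and service.edit refuses to create. The method names come from the invocations above, while the `--store` flag and the file layout are made up for this sketch.

```python
import argparse
import hashlib
import sys
from pathlib import Path
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # assumption: Atom 1.0 namespace


def entry_path(store, entry_id):
    """Map an entry's atom:id (a URI) to a file in the store directory."""
    digest = hashlib.sha1(entry_id.encode("utf-8")).hexdigest()
    return Path(store) / f"{digest}.atom"


def entry_id_of(xml_text):
    """Return the atom:id of a serialized entry, or None."""
    return ET.fromstring(xml_text).findtext(f"{{{ATOM}}}id")


def service_post(store, xml_text):
    """Store a new entry; refuse to overwrite one that already exists."""
    entry_id = entry_id_of(xml_text)
    if not entry_id:
        raise ValueError("entry has no atom:id")
    path = entry_path(store, entry_id)
    if path.exists():
        raise FileExistsError(entry_id)
    path.write_text(xml_text, encoding="utf-8")
    return entry_id


def service_edit(store, post_id, xml_text):
    """Replace an existing entry; the new text must carry the same atom:id."""
    if entry_id_of(xml_text) != post_id:
        raise ValueError("entry id does not match --post-id")
    path = entry_path(store, post_id)
    if not path.exists():
        raise FileNotFoundError(post_id)
    path.write_text(xml_text, encoding="utf-8")
    return post_id


def main(argv=None, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical atom-api CLI: one entry on stdin, its id on stdout."""
    parser = argparse.ArgumentParser(prog="atom-api")
    parser.add_argument("--method", required=True,
                        choices=["service.post", "service.edit"])
    parser.add_argument("--post-id")
    parser.add_argument("--store", default="store")  # made-up flag
    args = parser.parse_args(argv)
    Path(args.store).mkdir(parents=True, exist_ok=True)
    xml_text = stdin.read()
    if args.method == "service.post":
        stdout.write(service_post(args.store, xml_text) + "\n")
    elif args.post_id:
        stdout.write(service_edit(args.store, args.post_id, xml_text) + "\n")
    else:
        parser.error("service.edit needs --post-id")
```

Printing the entry id on success gives the next tool in a pipe something to work with, and raising on a clash keeps a post from silently clobbering an edit.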

The next tool in the stack queries the store, taking different slices through it.

  • # atom-slice --day 2004-08-24
  • # atom-slice --latest 15

And this would output an Atom Feed. You need a slice for every way you can access your weblog: day archives, month archives, by category, by keyword, by post. The interface for this tool is less tightly defined, but it can grow the same way grep has: continually refactored, made more powerful, faster, more flexible, while essentially staying the same.
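Here is a sketch of atom-slice over a directory store holding one *.atom file per entry (an assumed layout, not necessarily what atomflow does). It compares atom:updated values as strings, which only orders correctly if every timestamp uses the same format and time zone, e.g. UTC with a trailing Z; the `--store` flag is invented here.

```python
import argparse
import sys
from pathlib import Path
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # assumption: Atom 1.0 namespace
ET.register_namespace("", ATOM)


def load_entries(store):
    """Parse every *.atom file in the (assumed) one-file-per-entry store."""
    return [ET.parse(p).getroot() for p in sorted(Path(store).glob("*.atom"))]


def updated(entry):
    """atom:updated as a string; compares correctly only for uniform UTC stamps."""
    return entry.findtext(f"{{{ATOM}}}updated", default="")


def slice_day(entries, day):
    """Entries updated on the given YYYY-MM-DD."""
    return [e for e in entries if updated(e).startswith(day)]


def slice_latest(entries, n):
    """The n most recently updated entries."""
    return sorted(entries, key=updated, reverse=True)[:n]


def to_feed(entries, title="slice"):
    """Wrap entries in a bare-bones feed, newest first.

    A valid Atom feed also needs atom:id and atom:updated; omitted for brevity.
    """
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    feed.extend(sorted(entries, key=updated, reverse=True))
    return feed


def main(argv=None, stdout=sys.stdout):
    """Hypothetical atom-slice CLI: exactly one of --day or --latest."""
    parser = argparse.ArgumentParser(prog="atom-slice")
    which = parser.add_mutually_exclusive_group(required=True)
    which.add_argument("--day", help="YYYY-MM-DD")
    which.add_argument("--latest", type=int)
    parser.add_argument("--store", default="store")  # made-up flag
    args = parser.parse_args(argv)
    entries = load_entries(args.store)
    if args.day:
        chosen = slice_day(entries, args.day)
    else:
        chosen = slice_latest(entries, args.latest)
    stdout.write(ET.tostring(to_feed(chosen), encoding="unicode"))
```

New ways of slicing (by category, by keyword) would be extra functions plus extra flags, which is the grep-style growth described above: more options, same shape of tool.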

Last we need a templating tool. This is nothing more than a tool that takes an Atom Feed and some XSLT and produces HTML.

  • # atom-slice --latest 15 | transform --xsl blog-template.xsl

This is the least well defined. I suspect we'll evolve, then specify, some kind of templating standard: what exit codes are required to return a 404, how to add information about the archives from the datastore, and so on. But that's fine: so long as it works at first, it can be refined and replaced later.
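Python's standard library has no XSLT processor, so this sketch of the transform step swaps in `string.Template` instead, the kind of drop-in replacement for XSL listed further down. It also guesses at one answer to the exit-code question: status 1 when the feed has no entries, which a CGI wrapper could turn into a 404. That convention is invented here, not part of any standard.

```python
import sys
from html import escape
from string import Template
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # assumption: Atom 1.0 namespace
NOT_FOUND = 1  # hypothetical exit status meaning "nothing here, serve a 404"

PAGE = Template("<html><head><title>$title</title></head><body>\n$entries</body></html>\n")
ENTRY = Template('<div class="entry"><h2>$title</h2><p>$content</p></div>\n')


def text(el, tag):
    """HTML-escaped text of an Atom child element ('' if missing)."""
    return escape(el.findtext(f"{{{ATOM}}}{tag}", default=""))


def transform(feed_xml):
    """Render an Atom feed as HTML; return (html, exit_status)."""
    feed = ET.fromstring(feed_xml)
    entries = feed.findall(f"{{{ATOM}}}entry")
    if not entries:
        return "", NOT_FOUND
    body = "".join(ENTRY.substitute(title=text(e, "title"), content=text(e, "content"))
                   for e in entries)
    return PAGE.substitute(title=text(feed, "title"), entries=body), 0


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical transform CLI: feed on stdin, HTML on stdout, status returned."""
    html, status = transform(stdin.read())
    stdout.write(html)
    return status
```

Wired up with `sys.exit(main())`, it sits at the end of a pipe such as `atom-slice --latest 15 | python transform.py`, and the wrapper only has to look at the exit status to decide between a page and a 404.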

The system I've been working towards myself (that I was showing to Diego and Ben on Saturday) is really simple. The components are:

  • Inject script to convert a weblog post from plain text into the XML format required for the datastore, and to store it: cat entry.txt | winject (or use the pbpaste command instead of cat)
  • Sync script to move my local weblog store up to the webserver: wrsync
  • Slice scripts to pull a collection of posts from the datastore: dateslice.py --day=2004-08-24 (or latest.py -n15)
  • Then finally we get to the web view, where there are cgi scripts to handle a query like week.cgi?date=2004-08-24, which just fetches some posts (using the command-line slice tool) and applies some XSL.
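The week.cgi step has to turn one date into a week's worth of days before calling the slice tool. A minimal sketch, assuming weeks run Monday to Sunday (the post doesn't say) and reusing the `dateslice.py --day=...` form from the list; the internals of that script aren't shown here.

```python
from datetime import date, timedelta
from urllib.parse import parse_qs


def week_days(query_string):
    """'date=YYYY-MM-DD' -> the seven ISO dates of its week (Monday first, an assumption)."""
    day = date.fromisoformat(parse_qs(query_string)["date"][0])
    monday = day - timedelta(days=day.weekday())
    return [(monday + timedelta(days=i)).isoformat() for i in range(7)]


def slice_commands(days):
    """One day-slicer invocation per day, in the dateslice.py --day=... form."""
    return [["dateslice.py", f"--day={d}"] for d in days]
```

In a CGI script the query arrives in the `QUERY_STRING` environment variable, so the call would be `week_days(os.environ.get("QUERY_STRING", ""))`; each argv list could then be run with `subprocess.run(..., capture_output=True)` and the outputs merged before the XSL is applied. A real script would also turn a missing or malformed date into an error page rather than a traceback.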

Having built this, I believe that saving an entry to the datastore is a separate job from converting plain text into an entry. It's also what leads me to believe we need a standard (extensible, adaptable) interface for making slices across the datastore, because of the requirement to add other weblog feeds into the mix, archive information and so on: the front page of a weblog isn't just a transform of a feed, it's a transform of a compound document.
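One way to picture that compound document: wrap the front-page feed, a list of archive months and any other weblog feeds in a single XML document for the template to transform. The `<page>`, `<archives>` and `<month>` element names are hypothetical, chosen for this sketch; nothing here is a proposed standard.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # assumption: Atom 1.0 namespace


def archive_months(entries):
    """Distinct YYYY-MM prefixes of atom:updated, newest first."""
    months = {e.findtext(f"{{{ATOM}}}updated", default="")[:7] for e in entries}
    return sorted((m for m in months if m), reverse=True)


def compound_page(front_feed, all_entries, other_feeds=()):
    """Bundle the front-page feed, archive months and other feeds into one document.

    <page>, <archives> and <month> are hypothetical, unnamespaced element names.
    """
    page = ET.Element("page")
    page.append(front_feed)
    archives = ET.SubElement(page, "archives")
    for month in archive_months(all_entries):
        ET.SubElement(archives, "month").text = month
    page.extend(other_feeds)
    return page
```

The template then receives `ET.tostring(page, encoding="unicode")` and can reach into both the Atom elements and the wrapper, which is why the slicing interface has to cover more than "give me some entries".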

So where will atomflow lead? I think the fixed point of storing Atom documents in a store is the most important thing, so a command-line implementation of the Atom API is where it should lead. It's the kernel of the whole system. Next come the slicing script and the transform standard. With these we can have:

  • A front-end cgi and manager that lets you pick and choose different templates
  • Shared templates in XSL
  • Drop-in transform script replacements that understand, say, some Perl templating language instead of XSL
  • A stand-alone script to transform the day's posts to email and send them out
  • Separate user-management and posting tools, integrating into different packages, or some different markup language or something
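The email script is a small example of a transform whose output isn't HTML. This sketch builds a plain-text message from a day's Atom feed with Python's `email.message.EmailMessage`; the addresses and subject are placeholders, and actually sending it (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # assumption: Atom 1.0 namespace


def feed_to_email(feed_xml, sender, recipient, subject):
    """Turn an Atom feed (say, one day's slice) into a plain-text email."""
    feed = ET.fromstring(feed_xml)
    sections = []
    for entry in feed.findall(f"{{{ATOM}}}entry"):
        title = entry.findtext(f"{{{ATOM}}}title", default="(untitled)")
        body = entry.findtext(f"{{{ATOM}}}content", default="").strip()
        # Title, an underline of the same length, then the body.
        sections.append(f"{title}\n{'=' * len(title)}\n\n{body}\n")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("\n".join(sections) or "No posts today.\n")
    return msg
```

Fed by something like `atom-slice --day 2004-08-24`, this needs nothing from the blogging tool beyond the feed itself.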

Then there are comments, category management, tagging and so on. All complexities that can be built on the same foundation, or an evolved version thereof.

I can see packaged systems that specialise in templating and distribute these command-line tools transparently, so people never know they're there... until they want to install a new indexing and word burst association tool and it just works.

What this feels like to me is a system open to change. Current installable weblog systems don't feel like this. Okay, Movable Type's APIs are really good and provide for a scriptable and reasonably flexible system. But we shifted from RPC web services to REST as an architecture for flexibility, from API to loosely joined. And command-line tools connected by pipes are loosely joined too, and so more flexible than APIs. We're just taking the strength of the Atom stack, which goes upwards, and pushing it back downwards, behind the scenes. And that's why I like Atom: this is what I care about. RSS doesn't operate in this territory, and to be fair, it doesn't need to, and shouldn't. You have to add complexity to get this type of functionality, and RSS's simplicity comes from avoiding it. RSS is simplicity for feed producers; Atom is simplicity for feed consumers, entry consumers, transformed entry consumers, entry and feed processors, everything.

Anyway, atomflow. A fantastic first step. I outlined above what got scribbled on the whiteboard. Diego's atomflow is already different, because it's evolved and he's taken into account pragmatics and what's useful. That's good: the more minds, the more changes, the better. And here's the thing: Diego wrote it in a matter of hours! It's as fully functional as we need, already! He's thought hard about the interfaces, so nobody needs to rethink that design: it's already good and, importantly, calcified into the code. And listen to what he says about how he's using it already: So I built a scraper for News.com that takes their stories on the frontpage, and outputs them to stdout as Atom entries. Then I pipe the result into atomflow, which allows me to query the result anyway I need, and, more interestingly, subsequent pipes to atomflow calls that can narrow down content when the query parameters are not enough. Awesome.
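Narrowing content through successive pipes is the grep model again. Here is a hypothetical keyword filter in that spirit; it is not atomflow's code or interface. It reads a feed, drops entries whose title and content don't mention the keyword, and writes the smaller feed, so several can be chained.

```python
import argparse
import sys
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"  # assumption: Atom 1.0 namespace
ET.register_namespace("", ATOM)


def narrow(feed_xml, keyword):
    """Keep only entries whose title or content mentions keyword (case-insensitive)."""
    feed = ET.fromstring(feed_xml)
    needle = keyword.lower()
    # findall returns a list, so removing children while looping is safe.
    for entry in feed.findall(f"{{{ATOM}}}entry"):
        text = " ".join(entry.findtext(f"{{{ATOM}}}{tag}", default="")
                        for tag in ("title", "content"))
        if needle not in text.lower():
            feed.remove(entry)
    return ET.tostring(feed, encoding="unicode")


def main(argv=None, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical filter: feed on stdin, narrower feed on stdout."""
    parser = argparse.ArgumentParser(prog="narrow")
    parser.add_argument("keyword")
    args = parser.parse_args(argv)
    stdout.write(narrow(stdin.read(), args.keyword))
```

Because input and output are both Atom feeds, `... | python narrow.py atom | python narrow.py tools` works the same as a single, more specific query would.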

I'll be transforming my weblog store to Atom Entries and my templating to work with Atom Feeds very soon and running with it. Let's see what happens. I'll add the extra command-line tools I need, and release those back out again. A thousand flowers, etc.