14.43, Tuesday 2 Jan 2001

PageRank is the technique Google use to order their search results. Essentially, it works by ranking pages higher when many other pages link to them, with each linking page contributing a score that is higher if it is itself more popular. Fair enough (it explains why Google has all those cached pages available: it's a side effect of having to keep them for the calculations). But what I didn't know is that instead of following every link on every page, a randomised typical usage pattern is used. From a given page, the "surfer" follows a random link, and then another random link on the new page, and so on, until they get bored (typical surfers only click through a few links) and start again from another randomly selected page. The overall ranking is the same, but much easier to calculate (there's no state information to maintain, and the ranking program can be much smaller in memory and less processor intensive). The original paper by Google cofounder Larry Page is extremely interesting and has an easy-to-follow presentation: PageRank: Bringing Order to the Web.
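
As a rough illustration of the random surfer, here is a minimal Python sketch. It is not Google's implementation: the four-page `web`, the function name `random_surfer_rank`, the step count and the 15% chance of getting bored on any given click are all assumptions made up for the example. It simply counts how often the surfer lands on each page and uses the share of visits as the rank.

```python
import random
from collections import Counter


def random_surfer_rank(links, steps=200_000, boredom=0.15, seed=0):
    """Estimate page ranks by simulating one random surfer.

    links   -- dict mapping each page to the list of pages it links to
               (every link target must also be a key of the dict)
    boredom -- assumed chance, per click, of giving up and jumping
               to a randomly selected page
    """
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    pages = sorted(links)
    visits = Counter()
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        outlinks = links[page]
        if not outlinks or rng.random() < boredom:
            # Bored, or stuck on a page with no links: start somewhere else.
            page = rng.choice(pages)
        else:
            # Otherwise follow a random link from the current page.
            page = rng.choice(outlinks)
    # A page's rank is the share of all visits it received.
    return {p: visits[p] / steps for p in pages}


# A hypothetical toy web: three pages link to "c", nothing links to "d".
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = random_surfer_rank(web)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

On this toy web "c" should come out on top, since three pages link to it, and "d" at the bottom, since the surfer only ever arrives there by a random jump.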

Thought the first: This explains why weblogs rank so high in Google. The integrity of PageRank relies on the fact that you only own your own page, so you can't force many links to your 'site to raise your rank. The weblog community has several features that break this model: the tendency for links lists to be on every weblog page (and there are often many pages of archives too), and the large amount of reciprocal linking. The community appears as a very highly connected network, and this effect is magnified because weblog pages carry a large number of links compared to other 'sites on the web.
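
To get a feel for how much a links list on every page might matter, here is a small Python sketch under made-up assumptions: a five-page toy web, a hypothetical `pagerank` function, and a damping factor of 0.85. It calculates the ranks by plain repeated iteration over all the links, not by the random surfer, so it is a stand-in and not how Google does it. In `sparse` only my friend's front page links to me; in `dense` the same links list also sits on the friend's archive pages.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively compute ranks for a dict of page -> list of linked pages.

    Every link target must also be a key of the dict. The damping value
    and iteration count are assumptions chosen for this example.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start with equal ranks
    for _ in range(iterations):
        # Every page gets a small base share regardless of links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # A page with no links spreads its rank over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical toy web: the links list is on the friend's front page only.
sparse = {
    "me": ["elsewhere"],
    "friend": ["friend/2000-11", "friend/2000-12", "me"],
    "friend/2000-11": ["friend"],
    "friend/2000-12": ["friend"],
    "elsewhere": ["friend"],
}

# Same web, but the links list is repeated on every archive page too.
dense = dict(sparse)
dense["friend/2000-11"] = ["friend", "me"]
dense["friend/2000-12"] = ["friend", "me"]

sparse_rank = pagerank(sparse)
dense_rank = pagerank(dense)
print("front page only:", round(sparse_rank["me"], 3))
print("every page:     ", round(dense_rank["me"], 3))
```

In this toy case my rank goes up when the archives repeat the links list, even though no new 'site has decided to link to me. Real weblogs have far more archive pages than two, so I'd expect the effect to be larger there, though this sketch doesn't show that.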

Thought the second: Could I use this technique with Dirk? Currently I rank objects by second-degree connection. I wonder if there is a way to rank connections that would depreciate the less useful ones?