21.27, Wednesday 27 Oct 2004

Normalized data is for sissies [good thread at Kottke] said Cal [pdf]. I don't buy the technical reasons given in the comments pro normalisation--Cal's right, and solid tech will get around the problems. Unnormalised data is an emulation of view tables anyway. The best reason for normalisation I can think of is to do with the social structure of the development team and how it changes over time, coupled with the ways the db is accessed.
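
To pin down the view-emulation point, here's a toy sketch in Python with SQLite (the schema and names are mine, invented for illustration, nothing to do with Flickr's real tables). A denormalised table is just the join a view would compute on demand, stored ahead of time:

    import sqlite3

    db = sqlite3.connect(":memory:")

    # Normalised: the username lives in exactly one place.
    db.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
        INSERT INTO users  VALUES (1, 'cal');
        INSERT INTO photos VALUES (10, 1, 'Sunset');
    """)

    # A view is the join, computed on demand every time...
    db.execute("""
        CREATE VIEW photo_pages AS
        SELECT photos.id, users.username, photos.title
        FROM photos JOIN users ON photos.user_id = users.id
    """)

    # ...and a denormalised table is the same join, precomputed and stored.
    db.executescript("""
        CREATE TABLE photos_denorm (id INTEGER PRIMARY KEY, username TEXT, title TEXT);
        INSERT INTO photos_denorm SELECT * FROM photo_pages;
    """)

    print(db.execute("SELECT * FROM photo_pages").fetchall())    # [(10, 'cal', 'Sunset')]
    print(db.execute("SELECT * FROM photos_denorm").fetchall())  # [(10, 'cal', 'Sunset')]

Same rows either way; the difference is when the join happens and who keeps the copy up to date.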

Half of software architecture is making sure that somebody can fix a bug in a hurry, add features without breaking the rest, and be lazy without doing the wrong thing. A lot of that depends on whether you're training up new developers, how big the codebase is (and whether you should expect people to hold all of it in mind when adding new features), etc.

If you use the db pretty raw, without wrappers to take care of access, if the schema changes a fair amount, if the data is heavily interdependent, if the system is pretty big, if the schema needs to grow quickly, or if you want developers to work on a small bit of the system without risking weird bugs elsewhere, then you should normalise the data.

Unnormalised data means you've got the potential to change one bit and leave a bit that depends on that data--or replicates it--inconsistent. That's action at a distance, something to be avoided in software architecture. The world doesn't work like that, and people don't stop to think counterintuitively when they're in a hurry.
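
Here's that inconsistency in miniature, the same toy Python/SQLite setup as above (schema invented again): copy the username into each photo row, rename the user in one place, and the copy silently goes stale.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
        -- Denormalised: the username is copied into every photo row.
        CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, username TEXT, title TEXT);
        INSERT INTO users  VALUES (1, 'cal');
        INSERT INTO photos VALUES (10, 1, 'cal', 'Sunset');
    """)

    # The user renames themself. One UPDATE fixes the canonical row...
    db.execute("UPDATE users SET username = 'calh' WHERE id = 1")

    # ...but nothing forces anyone to remember the copies, so the photo
    # row still says 'cal'. Two versions of one fact, now disagreeing.
    print(db.execute("SELECT username FROM users").fetchall())   # [('calh',)]
    print(db.execute("SELECT username FROM photos").fetchall())  # [('cal',)]

Nothing in the schema forces that second UPDATE; it lives in the developers' heads, which is exactly the social problem.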

Mind you, a small team, good developers, experienced developers, a site which is fluid in other ways (a decent staging environment, nodes which are easy to pull out in an emergency), code which is easy to fix + a fast rollout system--all of those make flying close to the wire much more tempting and pretty much harmless. Given the team over at Flickr, they can do pretty much anything they want. I would be interested to know how their infrastructure (especially the db) influences the adaptiveness of the code. Doesn't it mean that adding similar-sized features will take more and more work as the code base grows?

This drive towards just-do-it, go-back-and-refactor-later code seems to me like it'd cause earthquakes later on: a small change turns into a cascade of refactoring, and it could be any small change, completely unpredictable. Again, with a small company and no ship schedule, not a problem. It's another way the architecture is bound up with the social structure of the developers and the business model.