Tuesday, February 15, 2005

Making it from scratch (with TDD and Python)

Ah, the smell of fresh-baked bread. Or maybe it's fresh-cut wood, depending on your choice of metaphor for programming.

The last couple of weeks, I've been working on a prototype platform API for Chandler, called "Spike". It's purely a speculative feasibility prototype, so it's not guaranteed that it or anything remotely like it will ever end up in Chandler. (So don't bug any OSAF folks about it, especially since I haven't even finished the first of its five implementation layers!)

Anyway, my design goal was to create a "dirt simple" API for developing Chandler content types, and so I chose to create Spike "from scratch", using a test-driven development approach and no libraries except for what already comes with Python 2.4. For one thing, I want to minimize the number of things somebody has to understand in order to do something useful. For another, I don't want any quirks of an external library (including the libraries I wrote myself) to get in the way of explaining or "selling" the API.

The interesting thing is just how much fun it's been! If you prefer a carpentry metaphor, then it's like getting to tailor every joint to the kind of wood, the placement of the knots, and the final placement of the piece. If you prefer a cooking metaphor, then it's like getting to use the best seasonal ingredients to bring out the true flavor of the dish. Either way, it's fun. :)

It also gives you a chance to experience your mastery, or lack thereof, of the art in question. In this case, the Spike schema API may not be the tiniest domain metamodelling facility I've ever built, but it's probably one of the most featureful, as well as one of the smallest in terms of code size (about 500 lines), and simplest in internal implementation. It is quite simply a thing of beauty, an irridescent jewel of interlocking features. (Which is of course independent of whether it's actually the right API for Chandler to offer, which is why you should not assume it will ever end up in there!)

A few weeks ago I was reading Christopher Alexander's classic article, "A City is not a Tree", which describes the principle of overlap in design, so perhaps my design choices in Spike were subconsciously influenced by it. Alexander points out that the life of a city (and the beauty of a lot of art) comes from the overlaps it contains, where adjoining areas share a common functional part. He argues for cities whose structure is more like a semilattice and less like a hierarchy (i.e. not a tree).

Spike's overall design is based on a concept of layers, but its lowest layer (the one I've almost finished building) overlaps itself in a number of interesting ways, because I made various "adjoining areas" share common functional parts. This rich overlap reduces the number of things you have to learn in order to be able to do useful things.

For example, the event system is used to implement various kinds of validation and constraint maintenance operations (such as updating bidirectional references), rather than having separate features for each of these uses. All attributes are represented uniformly using sets, so there is only a single implementation type needed, and only a single event type for all "change events".

These pieces will have even more overlap as the other layers are added, because the event system (inspired by PyDispatcher) will also be used to implement the persistence lifecycle, connecting to the UI layers' MVC system, and various kinds of hooks needed for Chandler extensions to connect with the platform.

So, low-level concepts like events will get reused at all levels of the system, which means someone learning the API will encounter them over and over again, without having to learn separate event models for GUI objects, query changes, object changes, etc.

This is the same kind of conceptual density that Python itself has. In Java, you have to use "new" plus a constructor to create an object, but Python just makes classes callable, and reuses the concept of "calling" to invoke the constructor. This kind of synergy is a repeating theme in Python, where simple concepts like calling, iteration, mappings, sequences, and attributes occur over and over again, reused for different purposes. My hope is that these concepts of repetition and simplicity will make the Spike API very easy to learn and use, in the same manner that Python itself is.

But of course, only time will answer that question. In the meantime, I'll enjoy the pleasant smell of pie (Py?) wafting through my "kitchen", and get back to work on making the next dish.

Update: on rereading this after posting, I realized that it sounds like I'm practically breaking my arm patting myself on the back for my design choices. What somehow got lost in the writing was that these design choices mostly emerged spontaneously out of the process of test-driven development itself! That is, using TDD forced me to use a very bottom-up process of moment-to-moment design and development, during which the project's overall time pressure forced me to figure out how to write things "once and only once".

In other words, I originally intended a big part of this article to be about exploring how TDD helps push you towards simple, beautiful designs. I've designed many metamodel systems, but Spike was the first I designed using TDD, and it turned out much better than any of the ones I designed without TDD. Anyway, I'm not going to go back and rewrite the whole damn article now, so if you want to read it as me being incredibly arrogant, feel free, 'cause you probably would have done that anyway no matter what I said. :)