Wednesday, April 06, 2005

Reaping the Whirlwind

Wow. Almost two weeks since PyCon, and one week since my last post. Lots of interesting stuff going on. On Friday, I'm off to Las Vegas with my wife for the International Lingerie Show. No, not that kind of lingerie show; it's a trade show, where store buyers gather the new catalogs and place orders. In any case, I won't actually be going to the show, I'll be doing my normal day's work from our hotel room while she goes off to wander the aisles in search of deals and hot new items for her store. The only really interesting thing I saw at the ILS one year was Penn Jillette (of magic team Penn and Teller) checking out a sex swing with his girlfriend. At least I assume it was his girlfriend. And by "checking out", I mean examining the construction of, not actually using it. Get your mind out of the gutter. :)

Python Eggs are coming along nicely, but slowly. They're competing for my attention with numerous other projects for my off-work hours, such as getting PyProtocols' generic functions ready for a "1.0a0" release - minimal docs, and hopefully usable code. I got an e-mail from David Mertz the other day, who plans to do an IBM developerWorks article about generic functions, and he was really pushing for an actual release to point people to. I don't know if I'll be able to get it ready in time for his article, though. I still need to refactor the indexing mechanism for inequalities just a bit. Well, a lot actually. I'm trying to get it so that you don't pay an O(n^2) price at registration time if you're only doing '==' comparisons, and ideally I'd like to get even the other comparison operators down to an O(n log n) price.

These costs are only paid when building the rule indexes, not when actually calling the function, so it's not that bad, except that '==' operations are actually pretty common, especially since I started refactoring peak.web to use generic functions. In peak.web, a generic function is used to map the combination of an object type (or individual object) and a name, to the method, template, or other mechanism used to obtain or render the named subobject in a URL. This involves lots of "name=='something'" rules being created, so O(n^2) starts to add up fast. There is really no reason the algorithm has to be O(n^2) for '==' operations; it's only that way because the same index is usually used for inequality comparisons on the same expression. So, I need to work out a way that adding rules with '==' is an O(1) operation, meaning that it's O(n) to add n rules, instead of O(n^2). Anyway, once I get this done, along with some related API cleanups, I'll feel comfortable making a release that David can talk about in his article.
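To make the complexity argument concrete, here is a minimal sketch of the kind of index that would make '==' rules cheap to add. It is not the PyProtocols implementation; the `EqualityIndex` class and its method names are made up for illustration, and it assumes the compared values are hashable. Each value is hashed straight to its list of rules, so adding a rule never touches the rules already registered.

```python
from collections import defaultdict


class EqualityIndex:
    """Hypothetical index for rules of the form ``expr == value``.

    Illustration only; not the actual PyProtocols data structure.
    """

    def __init__(self):
        # Maps each compared-against value to the rules that require it.
        self._rules_by_value = defaultdict(list)

    def add(self, value, rule):
        # One hash lookup plus one append: average-case O(1) per rule,
        # so registering n rules costs O(n) overall.
        self._rules_by_value[value].append(rule)

    def matching(self, value):
        # Lookup is also a single hash probe; return a copy so callers
        # cannot mutate the index by accident.
        return list(self._rules_by_value.get(value, ()))


index = EqualityIndex()
index.add("index_html", "render_front_page")
index.add("login", "render_login_form")
index.add("login", "log_login_attempt")

print(index.matching("login"))
print(index.matching("no_such_name"))
```

The catch, as described above, is that an index like this only works while every rule on the expression is an equality test. Once inequality rules share the same expression, the index needs some ordered structure as well, which is where the extra cost comes from.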

Oh, and speaking of articles, I just got my author's copies of the Dr. Dobb's Journal "Algorithms" issue. Yes, I actually got an article published in Dr. Dobb's. On a personal level, it's pretty weird, because 10 or 15 years ago I would have been so pumped to have an article in any computer magazine, let alone the prestigious DDJ. But now, the whole world of print seems so much less relevant; I've published so much more material on the net that a simple three-page article in print seems like a shadow of nothing. "Publish or perish", at least in the computer field, seems to mean, "Publish to the net", because if you're in print you're already months out of date.

It was interesting, though, to look at the ads and take a glimpse back into the world of yesteryear, when I wrote proprietary software and open source was still an obscure concept. I look at a lot of the stuff and go, "Wow, people still pay for these things? So that they then get code they can't share with anybody else either? Wow." These days, the idea of paying for software that then comes with chains seems pretty silly. (Note that I'm not saying there is never a time when proprietary licensing is a good thing, nor am I saying that there are no products worth paying for. I am in fact not saying anything that I didn't actually say in plain words here. Summarize or paraphrase at your peril, as I will soon begin giving public Illiteracy Awards to people who can't distinguish between what I actually typed and what thoughts popped up in their minds while they read it.)

Oh and speaking of thoughts that pop up in the mind -- boy, how's that for a generic transition? -- it is now reasonably definite that Chandler is going to end up with a schema definition API similar to the one I drafted in Spike. Apparently, the experience at the PyCon sprint of folks struggling with the current XML-based schema mechanism was sufficient to establish a consensus that (at least for the schema) XML Must Go, to be replaced with simple Python class definitions and properties, as are used by almost any other Python metadata framework or object-database mapping.

I think this is a really good thing because, aside from validating my prediction that XML would be a barrier to developer adoption, it means that the code is going to get smaller and simpler. First, the parcel loader currently has to make two passes over the XML: one to load the schema, and another to load instances of the schema. Once all the schema is in Python rather than XML, that whole schema pass can go away, and its startup time overhead along with it. Second, the Python format for defining schema is considerably less verbose and doesn't involve using XML namespaces to do the equivalent of "import". So, it will be a lot easier to type and read, and the total number of lines of code being written will go down. Developers also won't have their classes split across two files, one a .py file with the methods, and the other an XML file with the attributes.
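For a rough idea of what "simple Python class definitions and properties" can look like, here is a sketch in present-day Python. It is not the Spike or Chandler API: `Attribute`, `Contact`, and `schema_of` are hypothetical names invented for this example. The point is only that the attributes and the methods live in one class, and the schema can be read back from that class without parsing a second file.

```python
class Attribute:
    """Hypothetical schema attribute, declared as a class-level property."""

    def __init__(self, type_, default=None):
        self.type = type_
        self.default = default
        self.name = None

    def __set_name__(self, owner, name):
        # Python tells the descriptor which name it was assigned to.
        self.name = name

    def __get__(self, obj, owner=None):
        if obj is None:
            return self  # class-level access returns the declaration
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        # Minimal validation against the declared type.
        if not isinstance(value, self.type):
            raise TypeError(
                f"{self.name} must be {self.type.__name__}, "
                f"got {type(value).__name__}"
            )
        obj.__dict__[self.name] = value


class Contact:
    # Schema and behavior side by side in one .py file.
    full_name = Attribute(str, default="")
    email = Attribute(str, default="")

    def display(self):
        return f"{self.full_name} <{self.email}>"


def schema_of(cls):
    """Collect the declared attributes of a class as {name: type}."""
    return {
        name: attr.type
        for name, attr in vars(cls).items()
        if isinstance(attr, Attribute)
    }


contact = Contact()
contact.full_name = "Ada Lovelace"
contact.email = "ada@example.com"
print(contact.display())
print(schema_of(Contact))
```

Nothing here needs a repository or database to run, so an object like `contact` can be created and checked in a plain unit test.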

It does mean, however, that I'm going to be busier than usual for a while, as I port the tools I built for Spike to work with the new API, which will be a bit different, as it needs to integrate with the rest of how Chandler exists today. And we're only doing the schema definition API, not the schema events APIs or any of that, so it's probably not going to look much like Spike in overall architecture. But it will allow for unit testing Chandler objects without an active database, which was the original impetus behind my investigating this approach.

Whew. So much to do, so little time. Maybe when I'm in Vegas I'll have a chance to post one of the longer stories I've been wanting to write this past week.