Sunday, May 22, 2005

Eggs get closer to hatching...

Well, I finally managed to squeeze out a few cycles to work on Python Eggs, which I'd left virtually untouched since the work Bob Ippolito and I did on them at PyCon. The net result is that this weekend I finished the core dependency resolution engine: the part of the egg runtime that lets eggs specify what other eggs they depend on (including required versions and requested optional features), and lets applications request that eggs be searched for and automatically added to sys.path along with their dependencies. There's even a hook that allows you to add support for automatic downloading of dependencies, although no such support will be included with the base system. (Automated downloads just have too many security issues and policy questions, so it'd be crazy to turn them on by default. In any case, GUI applications will want to integrate the download process into their UI in some fashion.)

The two big things that aren't done yet are: 1) actually scanning sys.path directories for .egg files and .egg-info directories, and 2) support for "namespace packages" so that mega-frameworks like Zope, PEAK and Twisted can be split into independent .egg files. In addition to these two big things, there are also a lot of little features and cleanups that would be useful to have. For example, peak.web can't be made .egg-friendly for web components until there's an API equivalent to listdir() for .egg file contents. You can pretty much see all the open issues in the Implementation Status section of the wiki page.

Still, this is an exciting milestone, because the egg system not only handles cyclic dependencies, reports version conflicts, and takes care of all sorts of other details, it can now handle "option" dependencies as well. An option is some feature of a package that may or may not be used by a given user of the package, and which may incur other dependencies. For example, let's say I was going to create an .egg for peak.web, with a distribution name of "PEAK-Web". PEAK-Web will depend on PEAK-Core, and also on the WSGIRef library. It also has optional support for FastCGI, but in order to use that support, you would need the FCGIApp library.

In a more simplistic dependency management system, PEAK-Web would have to do one of the following things to support this optional dependency:
  1. depend on FCGIApp (forcing you to install it when you don't need it)
  2. not depend on FCGIApp (forcing you to figure out whether you need it)
  3. create a PEAK-Web-FastCGI package whose only purpose is to depend on PEAK-Web and FCGIApp, which you then depend on in place of depending on PEAK-Web.
These are all ugly, so we invented a better solution for Python Eggs. PEAK-Web will instead define an option called "FastCGI", and it will have an "EGG-INFO/depends.txt" file that looks something like this:

This tells the egg runtime that PEAK-Web always depends on PEAK-Core and WSGIRef, but it only needs FCGIApp if the FastCGI option is requested.
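To make the idea concrete, here is a minimal Python sketch of how a runtime might read such a file. The file layout shown (bare lines for unconditional dependencies, a bracketed section per option) is an assumption made for illustration, and parse_depends() and requirements_for() are hypothetical helpers, not part of the egg runtime.

```python
# Hypothetical layout for EGG-INFO/depends.txt: the section syntax below is an
# assumption for illustration, not the documented egg format.
DEPENDS_TXT = """\
PEAK-Core
WSGIRef

[FastCGI]
FCGIApp
"""


def parse_depends(text):
    """Return (always_required, {option_name: [requirements]})."""
    always, options = [], {}
    current = always  # lines before any [section] are unconditional
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # A bracketed header starts the requirement list for one option.
            current = options.setdefault(line[1:-1], [])
        else:
            current.append(line)
    return always, options


def requirements_for(text, requested_options=()):
    """List what is needed for the base package plus the requested options."""
    always, options = parse_depends(text)
    needed = list(always)
    for name in requested_options:
        needed.extend(options[name])  # KeyError if the option is unknown
    return needed
```

With no options requested, requirements_for(DEPENDS_TXT) yields only PEAK-Core and WSGIRef; passing ["FastCGI"] adds FCGIApp to the list.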

How do you do that? Well, in your application's top-level script, you could call require("PEAK-Web[FastCGI]>=0.5a4"), and this will find and add to sys.path all the necessary eggs. If the right eggs can't be found, it raises a DistributionNotFound error; if two eggs have conflicting version requirements, it raises a VersionConflict error. (You also get an error if an egg that's already on sys.path has a version that's incompatible with your request, or with the requirements of your request's dependencies.) While this doesn't eliminate the need for you to be aware of a package's optional features, it does at least eliminate the need to have dummy packages just to bundle optional dependencies.
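The general shape of that resolution step can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual egg runtime: the AVAILABLE catalogue, the tuple-based requirement format, the version numbers, and the resolve() function are all hypothetical, and the two exception classes are local stand-ins that merely share names with the errors mentioned above. Versions are plain tuples here so that no version-string parsing is needed.

```python
class DistributionNotFound(Exception):
    """No available distribution satisfies a requirement."""


class VersionConflict(Exception):
    """An available distribution is older than a requirement allows."""


# Hypothetical catalogue of available eggs:
# name -> (version, unconditional requirements, {option: requirements}).
# A requirement is a (name, options, minimum_version) tuple.
AVAILABLE = {
    "PEAK-Web": ((0, 5), [("PEAK-Core", (), (0, 5)), ("WSGIRef", (), (0, 1))],
                 {"FastCGI": [("FCGIApp", (), (1, 0))]}),
    "PEAK-Core": ((0, 5), [("PEAK-Web", (), (0, 5))], {}),  # deliberate cycle
    "WSGIRef": ((0, 1), [], {}),
    "FCGIApp": ((1, 0), [], {}),
}


def resolve(requirements, available=AVAILABLE):
    """Return distribution names in the order they would be activated."""
    activated = []   # stands in for the entries added to sys.path
    seen = set()     # (name, option) pairs whose requirements were expanded
    pending = list(requirements)
    while pending:
        name, options, minimum = pending.pop(0)
        if name not in available:
            raise DistributionNotFound(name)
        version, base, optional = available[name]
        if version < minimum:
            raise VersionConflict("%s %r < %r" % (name, version, minimum))
        if name not in activated:
            activated.append(name)
        # None stands for the unconditional requirements of the distribution.
        for key in (None,) + tuple(options):
            if (name, key) in seen:
                continue  # already expanded; this is what breaks cycles
            seen.add((name, key))
            pending.extend(base if key is None else optional[key])
    return activated
```

Calling resolve([("PEAK-Web", ("FastCGI",), (0, 5))]) walks PEAK-Web, its unconditional requirements, and the FastCGI option's requirement, while the seen set keeps the deliberate PEAK-Core/PEAK-Web cycle from looping forever. A real resolver also has to reconcile requirements against eggs that are already active, which this sketch leaves out.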

Anyway, you can't actually use this yet, because I still haven't implemented the part that scans specified directories for .egg files to use. Oops. Hopefully I'll get that done next weekend. In the meantime, if you're adventurous, you can check out the latest setuptools from the Python CVS sandbox and play around with it. Ian Bicking and I also just added some updated documentation to the Building Eggs section of the wiki page, which should make it a bit easier to understand how to package your own or someone else's libraries as .egg files.

Update as of May 23: I squeezed in a few more minutes this evening and managed to actually hack up a halfway working distribution scanner, so the require() API now appears to be working. If anybody wants to start experimenting, I look forward to hearing about your experiences.