Monday, March 21, 2005

The Eggs are Coming...

Bob Ippolito and I hacked out a lot of the Python Eggs implementation today. (My new shorthand for explaining them is "An egg is to a Python as a jar is to Java".) I wrote a "bdist_egg" command for setuptools, which is a Python sandbox project where I prototype potential new distutils features. (Two of my previous setuptools features made it into Python 2.4; I'm hoping eggs will be a no-brainer for inclusion in 2.5, especially since the bdist_egg command doesn't really depend on anything else in setuptools.)

Anyway, if you want to play with it, you'll need to check out python/nondist/sandbox/setuptools, and change your setup.py to import the "setup" function from setuptools instead of from distutils. Then, you can use "setup.py bdist_egg" or "setup.py bdist --formats=egg" or any of the other ways you can specify that same thing.
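For the curious, the change to a setup script is roughly a one-line swap of the import. Here's a minimal sketch; the project name, version, and package name are all made-up placeholders, not anything from a real project:

```python
# setup.py for a hypothetical project; every name below is a placeholder.

# Before the change, this line would have read:
#     from distutils.core import setup
from setuptools import setup

setup(
    name="example_project",     # placeholder distribution name
    version="0.1",              # placeholder version
    packages=["example_pkg"],   # placeholder pure-Python package
)
```

Run that script with the bdist_egg command and it should build an egg for the placeholder package.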

The resulting .egg file can be added to PYTHONPATH or sys.path, as long as it contains only pure Python modules, or if you have the "pkg_resources" runtime installed. Bob has been working on "pkg_resources", but due to various hitches (including a weird bug in Python's zipimport.c), it doesn't actually work yet. But the idea is that .egg files, unlike normal Python zipimport files, can include C extensions as well as pure Python. The "bdist_egg" command automatically generates pure Python stub modules that request extraction of the extensions when the stub is loaded by the normal zipimport machinery. Extensions (and any other contained files that absolutely have to be "real" files) are extracted to a (configurable) cache directory.
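To see the pure-Python case in action without any of the new tooling, you can fake an egg by hand. The sketch below just zips up a one-function module under an .egg filename and puts the archive on sys.path; the module name "hello_egg" and the version in the filename are invented for the demo, and a real bdist_egg output would carry more than this.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny zip archive with an .egg name in a scratch directory.
# "hello_egg" is a made-up module used only for this demonstration.
workdir = tempfile.mkdtemp()
egg_path = os.path.join(workdir, "hello_egg-0.1.egg")
with zipfile.ZipFile(egg_path, "w") as archive:
    archive.writestr(
        "hello_egg.py",
        "def greet():\n    return 'hello from the egg'\n",
    )

# Putting the archive itself on sys.path lets the normal zipimport
# machinery find the pure Python module stored inside it.
sys.path.insert(0, egg_path)
importlib.invalidate_caches()

import hello_egg

message = hello_egg.greet()
print(message)
print(hello_egg.__file__)  # a path that points inside the archive
```

Nothing here needs "pkg_resources"; that only comes into play once the archive holds C extensions or other things that have to be real files.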

Which reminds me. Remember a few paragraphs ago when I mentioned a couple of setuptools features that made it into Python 2.4? Well, one of those features is a way to specify "package data" files, which are data files that get installed within a package's target directory. "bdist_egg" supports this feature by packing those data files into the egg, where they can be accessed via the standard PEP 302 "loader.get_data()" facility. To simplify your use of this, the "pkg_resources" module offers a resource access API that doesn't care whether the data you're accessing is in a "real" directory, an egg, or a regular zipfile. So, we'll be advocating the addition of pkg_resources to Python 2.5, at least once it's working and stable. In the meantime, we'll probably add an option to the bdist_egg command to bundle pkg_resources inside generated eggs, to ensure that it's available when the egg is used.
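Since pkg_resources isn't working yet, here's a sketch of the underlying idea using only the standard library as it exists today: "pkgutil.get_data()" goes through the PEP 302 "get_data()" hook, so the same call reads a data file from a plain directory or from a zipped archive. This is a stand-in, not the pkg_resources API, and the package and file names are invented for the demo.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()

# 1) A hypothetical package living in a "real" directory.
disk_root = os.path.join(workdir, "on_disk")
pkg_dir = os.path.join(disk_root, "plain_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "greeting.txt"), "w") as f:
    f.write("hello from a directory")

# 2) The same layout packed into a zip archive with an .egg name.
archive_path = os.path.join(workdir, "zipped_pkg-0.1.egg")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("zipped_pkg/__init__.py", "")
    archive.writestr("zipped_pkg/greeting.txt", "hello from an egg")

# Make both locations importable.
sys.path[:0] = [disk_root, archive_path]
importlib.invalidate_caches()


def read_greeting(package_name):
    """Read greeting.txt from a package, wherever the package is stored."""
    # pkgutil.get_data() asks the package's loader for the bytes via its
    # get_data() method, so the caller never sees the storage format.
    return pkgutil.get_data(package_name, "greeting.txt").decode("utf-8")


from_directory = read_greeting("plain_pkg")
from_archive = read_greeting("zipped_pkg")
print(from_directory)
print(from_archive)
```

The point is that read_greeting() never asks where the package lives, which is the property the resource access API is meant to give you.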

There's still a fairly long list of features we want to implement in the runtime, but I'll be pretty happy once we get C extensions and "namespace packages" working right, because that'll give us the same zero-install convenience as Java's "jar" files.
