Well, I finally managed to squeeze out a few cycles to work on Python Eggs, which I’d left virtually untouched since the work Bob Ippolito and I did on them at Pycon. The net result is that this weekend I finished the core dependency resolution engine, the part of the egg runtime that lets eggs specify what other eggs they depend on (including required version(s) and requested optional features), and lets applications request that eggs be searched for and automatically added to sys.path along with their dependencies. There’s even a hook that allows you to add support for automatic downloading of dependencies, although no such support will be included with the base system. (Automated downloads just have too many security issues and policy questions, so it’d be crazy to turn them on by default. In any case, GUI applications will want to integrate the download process into their UI in some fashion.)
The two big things that aren’t done yet are: 1) actually scanning sys.path directories for .egg files and .egg-info directories, and 2) support for “namespace packages” so that mega-frameworks like Zope, PEAK and Twisted can be split into independent .egg files. In addition to these two big things, there are also a lot of little features and cleanups that would be useful to have. For example, peak.web can’t be made .egg-friendly for web components until there’s an API equivalent to listdir() for .egg file contents. You can pretty much see all the open issues in the Implementation Status section of the wiki page.
Still, this is an exciting milestone, because the egg system can not only handle cyclic dependencies, report version conflicts, and all sorts of other details, it can now handle “option” dependencies as well. An option is some feature of a package that may or may not be used by a given user of the package, and which may incur other dependencies. For example, let’s say I was going to create an .egg for peak.web, with a distribution name of “PEAK-Web”. PEAK-Web will depend on PEAK-Core, and also on the WSGIRef library. It also has optional support for FastCGI, but in order to use that support, you would need the FCGIApp library.
In a more simplistic dependency management system, PEAK-Web would have to do one of the following things to support this optional dependency:
- depend on FCGIApp (forcing you to install it when you don’t need it)
- not depend on FCGIApp (forcing you to figure out whether you need it)
- create a PEAK-Web-FastCGI package whose only purpose is to depend on PEAK-Web and FCGIApp, which you then depend on in place of depending on PEAK-Web.
These are all ugly, so we invented a better solution for Python Eggs. PEAK-Web will instead define an option called “FastCGI”, and it will have an “EGG-INFO/depends.txt” file that looks something like this:
PEAK-Core==0.5a4
WSGIRef>=0.1
[FastCGI]
FCGIApp>=0.1
This tells the egg runtime that PEAK-Web always depends on PEAK-Core and WSGIRef, but it only needs FCGIApp if the FastCGI option is requested.
How do you do that? Well, in your application’s top-level script, you could call require("PEAK-Web[FastCGI]>=0.5a4"), and this will find and add to sys.path all the necessary eggs, or raise a DistributionNotFound (or VersionConflict) error if the right eggs can’t be found, or if two eggs have conflicting version requirements. (Or if an egg that’s already on sys.path has a version incompatible with your requested version, or that’s incompatible with your request’s dependencies’ requirements.) While this doesn’t eliminate the need for you to be aware of a package’s optional features, it does at least eliminate the need to have dummy packages just to bundle optional dependencies.
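To make the option mechanism concrete, here is a minimal sketch of how a depends.txt file like the one above could be read and combined with a request string such as "PEAK-Web[FastCGI]>=0.5a4". It is an illustration only, not the egg runtime's actual code: the function names parse_depends, parse_request and needed are made up for this example, and the handling of blank lines and "#" comments is an assumption.

```python
import re

# The example depends.txt for PEAK-Web from the text above.
DEPENDS_TXT = """\
PEAK-Core==0.5a4
WSGIRef>=0.1
[FastCGI]
FCGIApp>=0.1
"""


def parse_depends(text):
    """Map each option name to its requirement strings.

    Requirements listed before any [Option] header are unconditional
    and are stored under the key None. (Hypothetical helper.)
    """
    sections = {None: []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed: blanks/comments skipped
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


def parse_request(spec):
    """Split 'Name[Opt1,Opt2]>=1.0' into (name, options, version_spec)."""
    match = re.match(r"^\s*([A-Za-z0-9_.-]+)\s*(?:\[([^\]]*)\])?\s*(.*)$", spec)
    if match is None:
        raise ValueError("cannot parse requirement: %r" % spec)
    name, opts, version_spec = match.groups()
    options = [o.strip() for o in (opts or "").split(",") if o.strip()]
    return name, options, version_spec.strip()


def needed(sections, options):
    """Unconditional requirements plus those of each requested option."""
    requirements = list(sections[None])
    for option in options:
        if option not in sections:
            raise KeyError("unknown option: %s" % option)
        requirements.extend(sections[option])
    return requirements


sections = parse_depends(DEPENDS_TXT)
name, options, version_spec = parse_request("PEAK-Web[FastCGI]>=0.5a4")
print(name, version_spec, needed(sections, options))
```

Requesting "PEAK-Web>=0.5a4" without the bracketed option yields only the two unconditional requirements, which is the whole point: FCGIApp is pulled in only when somebody asks for FastCGI.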
Anyway, you can’t actually use this yet, because I still haven’t implemented the part that scans specified directories for .egg files to use. Oops. Hopefully I’ll get that done next weekend. In the meantime, if you’re adventurous, you can check out the latest setuptools from the Python CVS sandbox and play around with it. Ian Bicking and I also just added some updated documentation to the Building Eggs section of the wiki page, which should make it a bit easier to understand how to package your own or someone else’s libraries as .egg files.
Update as of May 23: I squeezed in a few more minutes this evening and managed to actually hack up a halfway working distribution scanner, so the require() API now appears to be working. If anybody wants to start experimenting, I look forward to hearing about your experiences.
I don’t get it. Part of the reason Java is so hard to teach to newbies is that the classpath/jar stuff confuses the hell out of them.
What makes pyeggs more than a solution in search of a problem?
Well, we’re in about the same boat with sys.path, so it’s not that much better than Java right now. But I think in general the comparison to JARs isn’t a very attractive one.
The useful parts, to me, are (a) versioned imports (b) easy local installation (you can drop an egg in a local directory easily), (c) no installation (it’s all one file or directory), (d) an implicit index of installed packages (but read from the actual files, so no sync issues).
Personally I think the (optional) zip file aspect isn’t very interesting. From a management point of view, I think directories are nearly as easy to manipulate as zip files, and avoid a whole slew of problems.
“””I don’t get it. Part of the reason Java is so hard to teach to newbies is that the classpath/jar stuff confuses the hell out of them.”””
That’s what ‘require()’ is for. The major difference between eggs and jars is that with the dependency resolution system in place, you can just put the eggs into a directory on sys.path, instead of having to put each individual egg on sys.path. Then, an application need only call ‘require(“SomePkg>=1.2”)’ to automatically locate it and add it to sys.path — along with all its dependencies! This is dramatically simpler than what you have to do with jars, unless you’re using OSGi or Eclipse or something, and even then it’s a lot more complex than just calling ‘require()’.
Oh, and by the way, have you found that teaching the distutils to Python newbies is any easier than explaining jars? I’d find that surprising, as my experience has been quite the opposite.
“””What makes pyeggs more than a solution in search of a problem?”””
Well, I could go on and on and on about this — and I have — but here are a few high points in addition to the ones Ian mentioned: breaking up monolithic mega-packages into smaller components. Making it easier to distribute and install a small package that depends on other packages. Making it easier to create “plugins” for use with applications that are extensible in Python. Providing an infrastructure for applications to automatically install/update/download plugins. Simplifying dependency management for executable builders like py2app and py2exe, while providing a runtime API for executable-packaged applications to be able to access their data files.
On the surface it might seem as though eggs are just a trivial feature addition to Python, but I expect them to make a significant improvement in how Python libraries and frameworks get packaged, distributed, and used. I also expect them to be a huge boost to the utility of WSGI, once there’s a way to deploy WSGI applications using eggs.
Thanks for the explanations.
(What I found easier to teach newbies in python is “python setup.py install” and then you’re done. You don’t have to get into sys.path for a long time.)
“””What I found easier to teach newbies in python is “python setup.py install” and then you’re done. You don’t have to get into sys.path for a long time.”””
Ah, well, my “newbies” are usually developers who are experienced in other languages, or even in Python, but new to the distutils. It’s amazing how quickly you hit issues like the install not working because you can’t update the OS install of Python, and if you’re using a package that uses an install subdirectory (like some of the SciPy stuff used to and maybe still does), then you’re going to find that you can install to an alternate location, but then the package doesn’t work unless you munge sys.path some more. If you don’t own the box you’re installing on (e.g. a non-root user on Unix), using the distutils to install is a bitch. Also, if you’re installing on Windows and don’t have a C compiler, it can be another kind of bitch. And so on.
What this generally leads to is that the Python community doesn’t have a strong tradition of code reuse across developer-organization boundaries. People tend to have one major thing that they distribute, and it tends to have everything it needs bundled in, and it tends to reinvent wheels. This is not something that happens as often with the Perl and Java communities, because Perl has CPAN (and CPAN alternate-location installs actually work!) and Java has jars.
I think that if it’s really *easy* to obtain and use packages, then the reuse boundaries will start to break down. With eggs it will get a lot easier, but to get *really* easy, we’re also going to need to make command-line and GUI tools (like apt-get and the CPAN shell) to download a desired package and its dependencies. For that, we’re going to need some more cataloging improvements so that the packages can be located for downloading. But I think that step will come after the basic egg infrastructure is available and working.
Indeed, I think that if I get the basic egg infrastructure out there, and even if I don’t do anything else, it’s likely that multiple people will independently come to the conclusion that a search-and-download tool for eggs would be a handy thing to have, and write competing ones. 🙂
Anyway, as I said, the pkg_resources module has a hook built in to allow easy creation of custom downloaders; the AvailableDistributions class can be subclassed so that it can find distributions that aren’t physically present on the system, and put them on a download list while resolving a list of requirements (like “make it so I can use Chandler”). ‘require()’ is just an abbreviation for creating a stock AvailableDistributions object and asking for a list of the distributions needed for the requested packages, so to create a downloader all you really need is to have a metadata database plugged into a subclass of AvailableDistributions.
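The shape of that hook can be sketched with a toy model. This is not the pkg_resources API: the classes Catalog and DownloadingCatalog, the obtain() method, and the dictionaries standing in for a metadata database are all invented for illustration, and version checking is left out entirely to keep the sketch short.

```python
class Catalog:
    """Toy stand-in for a collection of locally available distributions.

    `installed` maps a distribution name to the names it depends on.
    """

    def __init__(self, installed):
        self.installed = dict(installed)

    def obtain(self, name):
        """Hook called when `name` is not installed; the default gives up."""
        return None

    def resolve(self, names):
        """Return every distribution needed, in the order first required."""
        resolved, seen, queue = [], set(), list(names)
        while queue:
            name = queue.pop(0)
            if name in seen:
                continue  # already handled, which also makes cycles harmless
            seen.add(name)
            deps = self.installed.get(name)
            if deps is None:
                deps = self.obtain(name)
            if deps is None:
                raise LookupError("distribution not found: %s" % name)
            resolved.append(name)
            queue.extend(deps)
        return resolved


class DownloadingCatalog(Catalog):
    """Subclass that consults a (hypothetical) remote metadata index."""

    def __init__(self, installed, remote_index):
        super().__init__(installed)
        self.remote_index = dict(remote_index)
        self.download_list = []

    def obtain(self, name):
        deps = self.remote_index.get(name)
        if deps is not None:
            # Not present locally: remember it so a downloader can fetch it.
            self.download_list.append(name)
        return deps


installed = {"PEAK-Web": ["PEAK-Core", "WSGIRef"], "PEAK-Core": []}
remote_index = {"WSGIRef": []}

catalog = DownloadingCatalog(installed, remote_index)
print(catalog.resolve(["PEAK-Web"]))  # everything PEAK-Web needs
print(catalog.download_list)          # what would have to be downloaded
```

The base class fails as soon as something is missing, while the subclass only changes what "missing" means. That is the design idea described above: resolution logic stays in one place, and a downloader plugs in by overriding a single lookup step.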
Hm, now that I’m thinking of it, it might be handy to add a __main__ section to pkg_resources so that if you do “python -m pkg_resources install ‘FooBar>=1.2′” it’ll download stuff from PyPI. On second thought, it should probably be a separate ‘pkg_install’ module, so as not to load extra junk into the runtime. Ah well, that’s all for later anyway.
You might want to check out Zero Install and Autopackage if you’re looking for inspiration outside of the Python/Java world.
Phillip – I’ve been thinking about that download-and-install problem lately, and for the moment it’s my personal front line. Call it the scattered priorities of open source development… but anyway, give a look at catalog-sig, as I brought some stuff up there recently about extending PyPI, and I hopefully will be able to write it up more formally tonight. I think the changes are fairly minimal.
From there my own priority will be downloading and installing distutils packages, since that’s the norm for the moment, but that code should be transferable to a more automated/integrated system like eggs and require().
Ian, for what it’s worth, the problems of downloading and installing distutils packages are very different from those involving eggs, so you may find you’re creating a lot more infrastructure than you’d like. For example, distutils packages have no reasonable way to specify dependencies, but they have only one real downloadable – the source. Conversely, eggs deal with dependencies quite smashingly well, but there are downloadables for each platform unless the package is “pure”. These are very distinct problems, and after spending time early last year trying to do distutils dependency support, I concluded that the egg problems are much more solvable; you just need to be able to find out the download URL for your platform for a given egg, and the egg runtime can take care of the rest. (By which I mean that once you download the egg, the runtime can tell you what other eggs you need to download, if any.)
Of course, that doesn’t mean you shouldn’t go ahead and solve the download-and-build problem, I’d just encourage you to have the process capable of building eggs, so that people can more easily put up “build mirrors” that download non-egg packages and then publish them as eggs. 🙂
The biggest open issue that I currently see for egg distribution is that distutils’ “platform” strings kind of suck. If I understand correctly, you can have platform strings that are different, yet compatible, and also platforms that are incompatible, whose platform strings are the same. (E.g. “win32” is used even on 64-bit windows.)
Currently, the egg system is using distutils platform strings to identify an egg’s build platform, but this needs to change before egg usage becomes widespread.
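If you want to see the platform string your own interpreter reports, a minimal sketch follows. It assumes a Python whose standard library includes the sysconfig module; the distutils spelling of the same idea is distutils.util.get_platform(). The exact value differs from machine to machine.

```python
import sysconfig

# Build-platform string for the running interpreter; the value depends on
# the operating system and architecture (for example "linux-x86_64").
platform_string = sysconfig.get_platform()
print(platform_string)
```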
Speaking of becoming widespread, you mention that distutils source packages are the norm, but keep in mind that if you really solve the download-and-install problem, whatever solution you provide will in fact become the norm quite rapidly, as everybody starts bugging distributors to make something that’s usable with the download-and-install system. So, make it use eggs! That way, I can piggyback on your success. 🙂
I just see one little task within reach — automatically downloading and installing packages — that would be useful and attainable. I’m definitely not going to try to reproduce Eggs. If I have to enumerate all the required packages, including indirect dependencies, and give explicit releases (or more abstract most-recent-release or svn-trunk), that’s okay with me — it’s further along than we are now. And installation probably will simply mean “run setup.py and install in X”, where if you aren’t installing globally I construct some sensible set of distutil options for installing locally (of which there are several, and I’d like to insulate users from the quite boring specifics). Almost all of this (except the distutils options) should be applicable to eggs.
Anyway, that’s my goal. I’m heading into kind of a pinch, so I don’t know if I’ll get there in a timely fashion, especially when it requires discussion; I’m not feeling terribly discussive lately 😉 But I’m hopeful.
Hi,
I had the problem of creating a self-contained Python release that I could install as a user without subverting the host OS. I ended up creating pyvm.sourceforge.net, my personal Python machine, similar in aim to the Java JDK/JRE. It is relocatable, it supports the setup.py build cycle, and it does what I need.
Feel free to give it a try (if you happen to be on a SuSE/Red Hat system you can use the binary).
regards,
antonio cavallo