Thursday, July 21, 2005

The Quiet Revolution

Python Eggs are popping up everywhere now. The Trac project will apparently become the first extensible, open-source Python application to use eggs as a plugin format. (I had thought Chandler would have earned that distinction, but it's going to be a good few months before it will really get on the roadmap.) There's been talk on the Python Mac SIG about platform version tags for eggs on OS X. Ian Bicking is blogging about eggs as a packaging and deployment mechanism for web components. The ever-linkable Django uses setuptools to simplify its setup, although none of the tutorials or installation instructions mention it. (If they would just run "setup.py register" after adding a little more info to "setup.py", then people could use "easy_install Django" to find and download it via PyPI.)

I've also gotten emails from a few people who are using eggs to manage software projects in their workplaces, as well as to manage external dependencies, and Ian Bicking also recently asked about using egg metadata for an at-work project, too. Not bad for an 0.5 alpha. Not bad at all.

There's a CVS version of 0.6 in the works too. If you're planning to extend or embed setuptools or the pkg_resources API, you should be working with the CVS version, because the APIs went through some changes relative to 0.5, as I refactored them to conform with eggs' new architecture overview and terminology guide.

All in all, it looks like the beginning of a quiet revolution in open source software development with Python. I'm still not promoting these things to the wider world of Python users, though, as I don't want too many people stubbing their fingers and toes on my rough alpha edges. (In fact, I sometimes worry that Django's current popularity might lead to lots of people bumping into problems with setuptools, and creating a backlash. So far that hasn't happened, though.)

So how long till beta? How the heck would I know? Try the alphas and send me bug reports, feature requests, and patches. Better yet, try the CVS version and build more tools that work with eggs. Paul Moore has volunteered to create some simple package management tools, for example, to list or delete installed packages. What ideas do you have?

Monday, July 11, 2005

The Way Forward is to Look Back

Backward compatibility is a blessing for users and a curse for developers, but often the only way to move a technology forward is by providing backward compatibility. With that in mind, I'm pleased to announce setuptools and EasyInstall version 0.5a9, which now attempt to deal sanely with such things as old "unmanaged" package installations (i.e., direct to site-packages or elsewhere) and packages that may or may not be using __file__ or __path__ to look at source or data files. (This was a lot of work, even though I somehow managed to squeeze it in this weekend while also working nearly 14 hours on helping my wife get her store's new location set up, toting heavy boxes and stacks of building materials about!)

In addition to all the cool new features, I also just finished the Developer's Guide and Command Reference for setuptools. The "extending and reusing" section is still all XXX's, but anything else you might want to know should now be there, including stuff on how to use it with Pyrex, how automatic package and source file finding works, how to easily do daily builds and tagged snapshot builds, automatically delete outdated snapshots, how to manage different options for multiple projects while keeping your scripts simple, zipfile safety analysis, and much much more.

Indeed, it's practically a tome on "everything you might want to know about practical build and distribution management for Python libraries" using setuptools and Python Eggs. Well, as much of a tome as has ever existed for that, anyway. It does assume, however, that you have read and understood the distutils manuals, especially Distributing Python Modules. (But Installing Python Modules is a good read, too.)

Anyway, I think I'm finally satisfied with the end-user and developer/packager functionality provided by EasyInstall and setuptools, respectively, so it's time to go back to work on the pkg_resources runtime module, which still needs some improvements. About six weeks ago, I shifted gears from work on the runtime to work on all the user-visible flash and dash, because I was worried that my "market window" for getting Python Eggs some "mindshare" in the community was expiring. Now that EasyInstall and setuptools are getting some attention and use among the SIGs, I can now afford to back off on the marketing/features side, and go make the core engine really work right. :)

For that matter, I can now finally start using setuptools for the use cases I had in mind when I first conceived of it almost a year and a half ago. Namely: distributing packages without bundling their dependencies.

Back then, I envisioned a future bustling marketplace of Python libraries, where reuse was a practical reality instead of just a theoretical possibility. That day isn't here yet, but I can now begin setting an example with the gradual breakup of PEAK into lots of narrowly focused packages like DynamicImports, SecurityRules, VersionTool, PyUUID, PyCLI, and so on. I'm also looking forward to being able to reuse others' packages, without giving more than a moment's thought to "how will I make sure users have this?", and a few seconds to add it to my setup.py.

Yes, looking back is definitely the way forward, because if Python Eggs were only a solution for new and refactored packages, they would be no solution at all. Looking back means looking towards our "legacy" - the wealth of packages that we inherit via PyPI and indeed any web page that links to a Python package's source distribution or win32 binary distribution. With "legacy" often being considered a bad word in the IT field, it's important sometimes to remember its original meaning as an inheritance, especially an inheritance of great value. EasyInstall now lets us connect our rich past, with our even richer future.

Saturday, July 09, 2005

Blood, Toil, Tears, Sweat, and Reference Documentation

There's now a Command Reference for setuptools. Enjoy.

Friday, July 08, 2005

Let there be docs...

for setuptools.

Not a lot there as yet, but if you just want to have your package depend on things, or include some data files, or other relatively simple stuff (and even some advanced stuff), you'll probably find what you need.

In the new features department, setuptools 0.5a8 adds quite a few, like rotating distributions (i.e. "keep the N newest of this type") and command aliases (per project, user, or site) that let you define shortcuts for lengthy setup commands. There's upload support to let you put your packages (source or egg) on PyPI. And a whole lot more, none of which is documented yet, except for entries in the table of contents. But at least it's a start.

Wednesday, July 06, 2005

Reinvent no more forever

Python as a community is plagued by massive amounts of wheel-reinvention. The infamous web framework proliferation problem is just the most egregious example.

Why is Python "blessed" with so much reinvention? Because it's often cheaper to rewrite than to reuse. Python code is easy to write, but hard to depend on. You pretty much have to:
  1. limit yourself to platforms with a suitable packaging system,
  2. bundle all your dependencies into your distribution, or
  3. make your users do all the work themselves
Ouch. No wonder rewriting looks easier. The only way to stop this trend is to make it easier to reuse than to rewrite, which has been my mission with setuptools and EasyInstall. As of setuptools-0.5a5, it's possible to sanely develop packages with dependencies, just by listing them in the 'install_requires' keyword to setuptools.setup(), and including the 'ez_setup.py' bootstrap file in the same directory as your 'setup.py'. When a user does 'setup.py install', your package is installed as an egg, and all your dependencies are downloaded and installed if not already present. Similarly, if a user installs your package with EasyInstall, the same thing happens. What more could you want?

Well, seriously, there are lots of little improvements still on my list -- like documentation, for example. A tutorial on how to convert a package to using setuptools would be awesome. I also want to add some support for automatically detecting whether a package can be safely installed as a zipped .egg file, using a conservative algorithm based on the absence of data files outside the metadata directory.

But for right now, I plan to start by putting my dependencies where my mouth is. I will bundle no more forever, for I am tired of keeping them up-to-date with their upstream maintainers. I shall rend my monolithic packages into smaller projects, so that people can make use of cool parts of PEAK without having to install anything but the part they want. I will find out if this stuff really works. From where the setuptools now stand, I will bundle no more forever.

Friday, July 01, 2005

It's log, log, log

What rolls down stairs
Alone or in pairs;
Rolls over your neighbor's dog?
What's great for a snack
And fits on your back?
It's log, log, log!
-- Ren & Stimpy
It's good to see that other people are experimenting with alternative logging systems for Python. I've never been entirely happy with PEP 282 myself. The interface is okay, but not great; it drops some of the syslog levels and ends up with a set of levels that are either too verbose or not verbose enough. For all practical purposes you can't run with less than WARN level on output, and most of what you log will be either INFO or DEBUG, which basically translates into verbose or insanely verbose.

PEAK adds back in a NOTICE level between INFO and WARN, which allows you to use that as your default filtering level for typical logs and thereby avoid the incessant yammering of overenthusiastic components. But that's not really the big problem with PEP 282, which does after all allow you to add extra levels if you want (just without the benefit of a shortcut method unless you monkeypatch the Logger class).

The big problem with PEP 282 logging is the configuration. Or rather, the lack thereof. If you don't configure it, you get annoying spew from your loggers about how you didn't configure anything for that particular logger, going right to sys.stderr. This then encourages people to manipulate log handlers in library code in order to avoid this, which is exactly the wrong thing to do. Library code should never ever ever ever mess with logging configuration, that's not its job and it will break applications. On the other hand, it's stupid to make an application overly responsible for log configuration in the simple case; most logging should be directable via environment variables or configuration files in standard locations. If I'm writing a simple script, it ought to be easy to get it to do the right thing.

Unfortunately, what library authors want to do with their log output may well be in conflict with what the application author wants. An issue that the library considers critical may well be of trivial consequence to the application. To push the burden of configuring appropriate log levels and filters from 50 library authors onto the users is just plain silly.

In essence, the simple level-based logging concept works only in situations where there is an end-to-end notion of levels , and this ignores the essential concept of modular information hiding. Let's say I have a module (like distutils.dir_util) that manages directories, and logs various things about what I'm doing to them. Depending on the application -- or even different operations in the same application! -- I may want different verbosity to apply.

I ran across this issue recently working on EasyInstall, because the code in different distutils modules has widely varying ideas of what constitutes INFO versus DEBUG level, and for certain operations I wanted it to treat them differently. Similarly, EasyInstall runs packages' setup.py scripts with a lower verbosity level than it employs itself; this is because for the EasyInstall user, the point is mostly about what EasyInstall is doing. The setup script runs in a sandbox and is prevented from actually installing anything directly, so the user probably doesn't really care about the details.

No logging system of which I'm currently aware -- except for an early prototype of PEAK's logging facility, and an earlier logging framework that Ty developed -- actually offer this kind of modularity, where log messages bubble up through a dynamic, component context or caller/callee context, rather than a fixed logger hiearchy or a keyword system. I suspect there are ways to abuse the PEP 282 logging system and the new "py" logging facilities so that they sort of do the same thing, but there's an even more fundamental flaw going on here.

The fundamental flaw of logging systems is that they all basically assume they're creating text. Oh sure, most logging systems, including PEP 282 and the ones Ty and I developed, have some notion of a log record, with information like level, timestamp, and all that sort of thing. But the payload of these logs is basically text, rather than a programmatically-useful description of an event. You're forced to use crude tools like regular expression matching or simple level cutoffs in order to do any kind of content filtering, and god forbid you want to do something like analyze how often event B occurs within X time of event A.

When I first started writing this little rant a couple of days ago, I didn't really know what to do about that. I just thought I'd be writing a wistful little rant about the futility of using logging systems and that would be the end of it. But a funny thing happened midway through the text; I realized that there's not only a good solution to this problem, but in fact it already pretty much exists.

Here's the thing that I realized. Anything you want to "log" is really a record of some event that occurred. An "event" is really an operation, like a function or method call. In the broad sense, what you want is to be able to define flexible, contextual mappings between events and log output, timing analysis, profiling, or other operations upon the events in some form. Ideally, you would also want to define textual messages for recording the operations in a log file or showing to a human viewer, but these definitions' only purpose would be translating events to output, if your filtering rules chose to output them.

The obvious way to link these message translations is with decorators, which coincidentally will encourage you to structure operations into coherent chunks. Sometimes, you'll want to log something that doesn't really have a method of its own, in which case you might create an empty method that doesn't do anything; it's just a stub you call for purposes of offering a progress report to interested callers.

For this to work, we'd need some sort of flexible interception capability that can efficiently wrap an object's methods on an instance-by-instance basis, so that a higher-level component using that object can translate its events into events that make sense for the higher-level component. It can then aggregate those events, log them, ignore them, whatever. Of course, that component is itself really just propagating events in the same way, so only the highest-level component in a given assembly really controls the ultimate output.

Of course, I already have a nice efficient rule processing system in generic functions, and they already have nice flexible wrapping. To make them work with existing objects, I'd need only add some type of interception/injection facility to turn existing functions into generic functions. The rest, while not easy, is at least doable. Mostly it amounts to developing simple APIs that provide shortcuts for commonly-used features, although ideally I'd also like to make the system able to "wire" or "stack" rules together so that any "bubbling up" occurs in a single generic function invocation, rather than resulting in any kind of recursive delegation to higher component levels.

But let's consider a simple case. I have a method, let's say it's copyFile(). I decorate it with a message like "Copying %(source)s to %(destination)s" and a default level like INFO. This wraps the method with a generic function interceptor that will call another generic function, something like logMessage(sender,level,message,args), which then can be configured with rules as to where the message goes. However, this is basically the default message and level for this method. I can define the rules so that it never actually gets logged at all, e.g. by having a rule that if "self is someObject", do nothing. Or I can make it call some event function on my own class. Or do something completely different.

All this rule definition would get exceedingly tedious if you had to hardcode it in terms of specific objects and manually create rules. However, the rules themselves can be expressed as concepts like "the foo attribute of objects of class Bar should have these methods intercepted with such and such rules". This can then be used to create a virtual component hierarchy among objects, even if they aren't "placeful" components like those in PEAK or Zope 3. These "higher order" rules can potentially also do things like, "while this method is executing, intercept this non-method function in such and such a way" -- a useful technique for wrapping utility functions for progress reporting, timing, or logging, or to wrap "legacy" use of the PEP 282 logging framework.

If you think this sounds a lot like AOP, you're right. I pretty much decided about a year ago that AOP = Generic Functions + Transparent Interception, but I've been treating the interception part as a YAGNI (You ain't gonna need it), and therefore haven't implemented it before. Now, however, logging and the like seem like an excellent use case. I'll probably draw some on the accumulated wisdom of the Object Teams approach to AOP in order to develop this, as their conceptual model is heavily based on this kind of "wiring" of components.

I don't expect, however, that this is something I'll be able to actually implement soon. For one thing, my current generic function implementation is unlikely to perform well if it has to add and remove a lot of methods as objects come and go; it's optimized for functions that don't change much. For another thing, I have yet to actually make a release of the generic function package, and I plan to split it off into a separate RuleDispatch project, separate from PyProtocols, although still depending on it. That project in turn is waiting on completion of some features in EasyInstall and setuptools, to make it possible for packages to install their dependencies even when run using "setup.py install".