Saturday, August 13, 2005

Ruby Gems, Python Eggs, and the beauty of unittest

It's interesting to look at the Ruby Gems project documentation, as it's amazingly similar in some ways to Python Eggs. Well, similar to what I want eggs to be, anyway. Eggs have got an awful lot of Python-oriented and plugin-oriented features that gems don't appear to have, but gems are a lot more, um, polished. :) More specifically, they've already got their basic code signing approach worked out, along with tools to list and uninstall installed packages, and they even have a built-in webserver to let you share your local packages. Which is wild, but cool. It's sort of like if PyPI were to work by scanning eggs in its local filesystem in order to serve a package index. I imagine that might make it rather slow to start up, but at least with eggs you have nearly all the useful information in the directory entries and don't have to open the files. So, I imagine you could actually get a pretty good index going that way.

A webserver isn't the top priority for eggs at the moment, though. The 0.6 development version in CVS is going through some growing pains, like the need to finish a basic manual for the pkg_resources module, and to add some more explanatory messages to ez_setup and setuptools about the downloading process. There are also some discussions in progress about the "test" command, and supporting "py.test". I think the unfortunate thing about py.test is that it doesn't extend Python's unittest module.

unittest has gotten something of a bad rap, I think. Regardless of whether you like its basic testing facilities or not, it is an extremely good framework. In fact, I think it's one of the most beautiful frameworks in the Python standard library. Its functionality is cleanly separated into four roles, each of which can be filled by any object implementing the right interface: runners, loaders, cases, and results. Because of this exceptionally clean factoring, the basic framework is amazingly extensible. But a lot of people don't realize it, and so they create competing, incompatible frameworks like twisted.trial and py.test.

I'm not saying these other frameworks are bad; it's just that the additional functionality could usually be added by implementing a replacement loader, runner, case, or result class, depending on what kind of features you want to add. For example, all of py.test's many advertised features could be cleanly implemented as backward-compatible extensions to the unittest framework. Those extensions would then run under other test runners like unittestgui, cleanly integrate with the existing setuptools "test" command, etc.

People gripe about stuff like finding and running tests with unittest, but I think that's because they don't know about extending it, or perhaps how to use it properly in the first place. Earlier this year, I wrote a simple 20-line "loader" for unittest that scans subpackages for modules whose names start with "Test". And the unittest.main() function lets you specify what loader you want to use. So, by passing a module or package name on the command line, I can recursively scan that package and all of its subpackages for tests, and I didn't have to write a whole new test framework to do it.

I gather, from the hype around various unittest replacements, that this kind of automatic test finding is considered a big deal. Unfortunately, it seems like nobody realizes how easy it is to extend unittest to do these things.

So, a brief tutorial is in order, I think. A "runner" is the top-level thing that runs tests and reports on the results. A "results" object records the success or failure of individual tests as they are executed, possibly reporting on the progress as it occurs; usually it is created by the runner and is specific to the runner. For example, there's a TextTestResult class that outputs the dots or "ok" messages, and a similar class that updates the progress bar in the GUI version. A "loader" finds and accumulates cases to be run into a TestSuite. (A TestSuite is technically a "case" too; but you'll probably never need to subclass it unless you want to do something fancy like implement py.test-style incremental gathering instead of the default gather-then-test behavior.) The default loader can scan a module for test case classes, or run a function returning a case or suite of cases, among other things.
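To make that division of labor concrete, here is a minimal sketch that wires the four roles together by hand instead of going through unittest.main(). ArithmeticTests and its two methods are made up for illustration, and the classes used are the stock ones from a current Python 3 unittest, which may differ in detail from the Python 2.4 versions described here.

```python
import io
import unittest


class ArithmeticTests(unittest.TestCase):
    """Hypothetical cases, present only so there is something to run."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_multiplication(self):
        self.assertEqual(2 * 3, 6)


# Loader: finds cases and accumulates them into a TestSuite.
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(ArithmeticTests)
gathered = suite.countTestCases()

# Runner: the top-level object; it creates its own result object,
# runs the suite against it, and reports to the given stream.
stream = io.StringIO()
runner = unittest.TextTestRunner(stream=stream, verbosity=2)

# Result: records each test's outcome as the suite executes.
result = runner.run(suite)
report = stream.getvalue()
```

Each of the four objects here could be swapped for a different implementation without the others needing to know about it.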

Contrary to apparent popular belief, it is not necessary for you to subclass any particular unittest implementations of any of these ideas. (The default loader uses isinstance() and issubclass() to identify test cases in a module, but this is easily changed in a custom loader if needed.) A "case" object need only implement __call__(result), __str__(), and shortDescription() to be fully compatible with the runner. The __call__ method should call result.startTest(case), result.stopTest(case), and in between call one of result.addSuccess(case), result.addError(case, sys.exc_info()), or result.addFailure(case, sys.exc_info()), as appropriate. The rest is up to the case to manage.
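As a rough sketch of that protocol, here is a case class that does not inherit from anything in unittest. FunctionCase and the three sample functions are hypothetical names. One caveat, as far as I can tell from current Python 3: the stock result classes read a failureException attribute from the case when formatting a failure or error, so the sketch supplies one in addition to the methods listed above.

```python
import sys
import unittest


class FunctionCase:
    """Hypothetical duck-typed case: wraps a plain function, no TestCase base."""

    # Assumption: stock result classes look this attribute up on the case
    # when they format the traceback for a failure or error.
    failureException = AssertionError

    def __init__(self, func):
        self.func = func

    def __call__(self, result):
        result.startTest(self)
        try:
            try:
                self.func()
            except self.failureException:
                result.addFailure(self, sys.exc_info())
            except Exception:
                result.addError(self, sys.exc_info())
            else:
                result.addSuccess(self)
        finally:
            result.stopTest(self)

    def __str__(self):
        return "FunctionCase(%s)" % self.func.__name__

    def shortDescription(self):
        return self.func.__doc__


def passing():
    """A check that succeeds."""


def failing():
    """A check that fails."""
    raise AssertionError("1 + 1 is not 3")


def erroring():
    """A check that blows up."""
    raise ValueError("unexpected")


# A plain TestResult stands in for whatever result a runner would create.
result = unittest.TestResult()
for case in (FunctionCase(passing), FunctionCase(failing), FunctionCase(erroring)):
    case(result)
```

The result object ends up with one recorded failure and one recorded error, exactly as it would for TestCase subclasses.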

The only methods a custom loader needs to implement are "loadTestsFromName" and "loadTestsFromNames". However, if you subclass the default loader class (unittest.TestLoader), you can selectively override various aspects of its functionality. For my recursive scanning loader, I just overrode the "loadTestsFromModule" method, so that it checked whether the module was a package, and if so also searched its subpackages and any contained modules whose names began with "Test". This was easier than writing a completely custom loader, and if I wanted to create a py.test-like test finding algorithm, I'd definitely start out the same way, by subclassing TestLoader and adding new functionality. Plugging the loader in is cheap, too, because creating a custom script to run unittests using a custom loader is just a couple of lines: "from unittest import main; main(module=None, testLoader=MyCustomLoader())". That's not so hard, is it?
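The original 20-line loader isn't reproduced in this post, so what follows is only a guess at its shape, written against current Python 3. RecursiveLoader, iter_cases, build_demo_package, and the throwaway scan_demo_pkg package are all invented for the demonstration; pkgutil and importlib are my choice of tools for walking a package, not necessarily what the original used.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import unittest


class RecursiveLoader(unittest.TestLoader):
    """Hypothetical loader: also scans subpackages and Test* modules."""

    def loadTestsFromModule(self, module, *, pattern=None):
        suite = super().loadTestsFromModule(module, pattern=pattern)
        path = getattr(module, "__path__", None)  # only packages have __path__
        if path is None:
            return suite
        for _finder, name, ispkg in pkgutil.iter_modules(path):
            if ispkg or name.startswith("Test"):
                child = importlib.import_module(module.__name__ + "." + name)
                suite.addTest(self.loadTestsFromModule(child, pattern=pattern))
        return suite


def iter_cases(suite):
    """Yield the individual cases from a possibly nested suite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_cases(item)
        else:
            yield item


CASE_SOURCE = (
    "import unittest\n"
    "class Case(unittest.TestCase):\n"
    "    def test_something(self):\n"
    "        self.assertTrue(True)\n"
)


def build_demo_package(root):
    """Write a small throwaway package tree for the loader to scan."""
    files = {
        "scan_demo_pkg/__init__.py": "",
        "scan_demo_pkg/TestTop.py": CASE_SOURCE,
        "scan_demo_pkg/helpers.py": CASE_SOURCE,  # not Test*, so skipped
        "scan_demo_pkg/sub/__init__.py": "",
        "scan_demo_pkg/sub/TestDeep.py": CASE_SOURCE,
    }
    for rel, source in files.items():
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as handle:
            handle.write(source)


tmp = tempfile.mkdtemp()  # left on disk; remove it yourself if that matters
build_demo_package(tmp)
sys.path.insert(0, tmp)
importlib.invalidate_caches()

package = importlib.import_module("scan_demo_pkg")
suite = RecursiveLoader().loadTestsFromModule(package)
found = sorted(case.id() for case in iter_cases(suite))
```

Only the two modules whose names begin with "Test" contribute cases; helpers.py is ignored even though it contains a TestCase subclass.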

If I also wanted to change the progress reporting, I'd create a new "result" class, and then subclass TextTestRunner, overriding _makeResult() to return an instance of my altered result class. unittest.main() also accepts a "testRunner" keyword argument, so that's maybe another line to add to my script.
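Here is a small sketch of that result-plus-runner pairing, again against current Python 3. LoggingResult, LoggingRunner, and the two check functions are hypothetical names. Newer versions of TextTestRunner also accept a resultclass argument that avoids the subclass, but the sketch sticks to the _makeResult() override described above.

```python
import io
import unittest


class LoggingResult(unittest.TextTestResult):
    """Hypothetical result class that also keeps an ordered outcome log."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.log = []

    def addSuccess(self, test):
        super().addSuccess(test)
        self.log.append(("pass", test.id()))

    def addFailure(self, test, err):
        super().addFailure(test, err)
        self.log.append(("fail", test.id()))


class LoggingRunner(unittest.TextTestRunner):
    """Runner whose only change is which result class it creates."""

    def _makeResult(self):
        return LoggingResult(self.stream, self.descriptions, self.verbosity)


def check_addition():
    assert 2 + 2 == 4


def check_broken():
    raise AssertionError("deliberate failure for the demo")


suite = unittest.TestSuite([
    unittest.FunctionTestCase(check_addition),
    unittest.FunctionTestCase(check_broken),
])
result = LoggingRunner(stream=io.StringIO()).run(suite)
```

Passing testRunner=LoggingRunner() to unittest.main() would be the extra line in the script.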

In short, there's very little reason to create a whole new testing framework; the one we have is just fine, even if its default features may not be to everybody's liking. But it's a framework, which means you're supposed to put something in it. Because it's a standard framework, we have the opportunity to let all our testing tools work together, instead of forcing people to jury-rig their tests together. The doctest module in Python 2.4 provides APIs to create unittest-compatible test cases from doctests, which means that I can use doctest in conjunction with my existing 800+ unittest-based test cases. I can't do that with py.test or twisted.trial or any other from-scratch framework that discards the superlative design of the unittest module. unittest may be a mediocre tool, but it's an excellent framework that would allow us all to develop tools that work together, much like WSGI enables interoperability between web applications and web servers. unittest deserves a lot more recognition than it gets, as it could be just the thing to stop us ending up with as many mutually-incompatible testing frameworks as we have mutually-incompatible web frameworks.
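As an illustration of the doctest point, here is a minimal sketch that runs a doctest and an ordinary unittest case in one suite under one runner. The documented_demo module and the check_upper function are invented for the example; doctest.DocTestSuite is the doctest API that produces the unittest-compatible suite.

```python
import doctest
import io
import types
import unittest

# Hypothetical module whose docstring carries a single doctest example.
documented = types.ModuleType("documented_demo")
documented.__doc__ = """
>>> sorted({"b": 2, "a": 1})
['a', 'b']
"""


def check_upper():
    assert "egg".upper() == "EGG"


# One suite holding both kinds of test: doctest cases and a regular case.
combined = unittest.TestSuite([
    doctest.DocTestSuite(documented),
    unittest.FunctionTestCase(check_upper),
])

stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream).run(combined)
```

The runner neither knows nor cares that one of the two cases came from a docstring.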