It’s interesting to look at the RubyGems project documentation, as it’s amazingly similar in some ways to Python Eggs. Well, similar to what I want eggs to be, anyway. Eggs have an awful lot of Python-oriented and plugin-oriented features that gems don’t appear to have, but gems are a lot more, um, polished. 🙂 More specifically, they’ve already worked out their basic code-signing approach, and they have tools to list installed packages and uninstall packages; they even have a built-in webserver to let you share your local packages. Which is wild, but cool. It’s sort of like if PyPI were to serve a package index by scanning the eggs in its local filesystem. I imagine that might make it rather slow to start up, but at least with eggs nearly all the useful information is in the directory entries, so you don’t have to open the files. So, I imagine you could actually get a pretty good index going that way.
A webserver isn’t the top priority for eggs at the moment, though. The 0.6 development version in CVS is going through some growing pains, like the need to finish a basic manual for the pkg_resources module and to add more explanatory messages to ez_setup and setuptools about the downloading process. There are also some discussions in progress about the “test” command and about supporting “py.test”. I think the unfortunate thing about py.test is that it doesn’t extend Python’s unittest module.
unittest has gotten something of a bad rap, I think. Regardless of whether you like its basic testing facilities or not, it is an extremely good framework. In fact, I think it’s one of the most beautiful frameworks in the Python standard library. Its functionality is cleanly separated into four roles, each of which can be filled by any object implementing the right interface: runners, loaders, cases, and results. Because of this exceptionally clean factoring, the basic framework is amazingly extensible. But a lot of people don’t realize it, and so they create competing, incompatible frameworks like twisted.trial and py.test.
I’m not saying these other frameworks are bad, it’s just that the additional functionality could usually be added by implementing a replacement loader, runner, case, or result class, depending on what kind of features you want to add. For example, all of py.test’s many advertised features could be cleanly implemented as backward-compatible extensions to the unittest framework, which would then run under other test runners like unittestgui, integrate cleanly with the existing setuptools “test” command, etc.
People gripe about things like finding and running tests with unittest, but I think that’s because they don’t know about extending it, or perhaps how to use it properly in the first place. Earlier this year, I wrote a simple 20-line “loader” for unittest that scans subpackages for modules whose names start with “Test”. And the unittest.main() function lets you specify which loader you want to use. So, by passing a module or package name on the command line, I can recursively scan all packages for tests, and I didn’t have to write a whole new test framework to do it.
I gather, from the hype around various unittest replacements, that this is considered a big deal. Unfortunately, it seems like nobody realizes how easy it is to extend unittest to do these things.
So, a brief tutorial is in order, I think. A “runner” is the top-level thing that runs tests and reports on the results. A “results” object records the success or failure of individual tests as they are executed, possibly reporting on the progress as it occurs; usually it is created by the runner and is specific to the runner. For example, there’s a TextTestResult class that outputs the dots or “ok” messages, and a similar class that updates the progress bar in the GUI version. A “loader” finds and accumulates cases to be run into a TestSuite. (A TestSuite is technically a “case” too, but you’ll probably never need to subclass it unless you want to do something fancy like implement py.test-style incremental gathering instead of the default gather-then-test behavior.) The default loader can scan a module for test case classes, or call a function that returns a case or suite of cases, among other things.
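To make the four roles concrete, here is a minimal sketch in modern Python 3 (not the Python 2.4-era code this post was written against). The `Arithmetic` class is purely illustrative; everything else is the standard unittest API.

```python
import io
import unittest


class Arithmetic(unittest.TestCase):
    """Case: each test_* method becomes one test case instance."""

    def test_add(self):
        self.assertEqual(1 + 1, 2)

    def test_mul(self):
        self.assertEqual(2 * 3, 6)


# Loader: finds the test methods and accumulates them into a TestSuite.
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(Arithmetic)

# Result: records each outcome. A suite is called with a result, just like a case.
result = unittest.TestResult()
suite(result)

# Runner: creates its own result object (a TextTestResult) and reports on it.
# A fresh suite is loaded because recent Python 3 versions drop a suite's
# references to its tests once they have run.
runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=2)
text_result = runner.run(loader.loadTestsFromTestCase(Arithmetic))
```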
Contrary to apparent popular belief, it is not necessary for you to subclass any particular unittest implementations of any of these ideas. (The default loader uses isinstance() and issubclass() to identify test cases in a module, but this is easily changed in a custom loader if needed.) A “case” object need only implement __call__(result), __str__(), and shortDescription() to be fully compatible with the runner. The __call__ method should call result.startTest(case), result.stopTest(case), and in between call one of result.addSuccess(case), result.addError(case, sys.exc_info()), or result.addFailure(case, sys.exc_info()), as appropriate. The rest is up to the case to manage.
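Here is a sketch of such a duck-typed case that wraps a plain function; `FunctionCase` is a hypothetical name. One caveat for current Python 3: the stock `TestResult` also reads a `failureException` attribute from the case when it formats a failure or error traceback, so the sketch defines one.

```python
import sys
import unittest


class FunctionCase:
    """A test case that does not subclass unittest.TestCase."""

    # The stock TestResult consults this when formatting tracebacks.
    failureException = AssertionError

    def __init__(self, func):
        self.func = func

    def __call__(self, result):
        result.startTest(self)
        try:
            self.func()
        except self.failureException:
            result.addFailure(self, sys.exc_info())
        except Exception:
            result.addError(self, sys.exc_info())
        else:
            result.addSuccess(self)
        finally:
            result.stopTest(self)

    def __str__(self):
        return self.func.__name__

    def shortDescription(self):
        # First line of the function's docstring, if it has one.
        doc = self.func.__doc__
        return doc.strip().splitlines()[0] if doc else None
```

Because it only relies on the result protocol, an instance can be handed straight to `unittest.TextTestRunner().run()` or added to a `TestSuite`.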
The only methods you need to implement a custom loader are “loadTestsFromName” and “loadTestsFromNames”. However, if you subclass the default loader class (unittest.TestLoader), you can selectively override various aspects of its functionality. For my recursive scanning loader, I just overrode the “loadTestsFromModule” method so that it checked whether the module was a package and, if so, also searched its subpackages and any contained modules whose names began with “Test”. This was easier than writing a completely custom loader, but even if I wanted to create a py.test-like test-finding algorithm, I’d start out the same way, by subclassing TestLoader and adding new functionality. That’s because creating a custom script to run unittests using a custom loader takes just a couple of lines: “from unittest import main; main(module=None, testLoader=MyCustomLoader())”. That’s not so hard, is it?
If I also wanted to change the progress reporting, I’d create a new “result” class, and then subclass TextTestRunner, overriding _makeResult() to return an instance of my altered result class. unittest.main() also accepts a “testRunner” keyword argument, so that’s maybe another line to add to my script.
In short, there’s very little reason to create a whole new testing framework; the one we have is just fine, even if its default features may not be to everybody’s liking. But it’s a framework, which means you’re supposed to put something in it. Because it’s a standard framework, we have the opportunity to let all our testing tools work together, instead of forcing people to jury-rig their tests together. The doctest module in Python 2.4 provides APIs to create unittest-compatible test cases from doctests, which means that I can use doctest in conjunction with my existing 800+ unittest-based test cases. I can’t do that with py.test or twisted.trial or any other from-scratch framework that discards the superlative design of the unittest module. unittest may be a mediocre tool, but it’s an excellent framework that would allow us all to develop tools that work together, much like WSGI enables interoperability between web applications and web servers. unittest deserves a lot more recognition than it gets, as it could be just the thing to stop us from ending up with as many mutually incompatible testing frameworks as we have mutually incompatible web frameworks.
Interesting. Probably one of the reasons people don’t do this is that it’s far from clear that it’s possible, just based on the documentation.
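The doctest integration looks roughly like this in modern Python: `doctest.DocTestSuite` turns a module’s docstring examples into unittest-compatible cases that can share a suite with ordinary `TestCase` tests. `double`, `DoubleTest`, and `combined_suite` are illustrative names, and the sketch passes its own module via `sys.modules[__name__]`, assuming it runs as a normal script or imported module.

```python
import doctest
import sys
import unittest


def double(x):
    """Return twice x.

    >>> double(21)
    42
    """
    return x * 2


class DoubleTest(unittest.TestCase):
    def test_negative(self):
        self.assertEqual(double(-3), -6)


def combined_suite():
    suite = unittest.TestSuite()
    suite.addTests(unittest.defaultTestLoader.loadTestsFromTestCase(DoubleTest))
    # One unittest-compatible case per docstring that contains examples.
    suite.addTests(doctest.DocTestSuite(sys.modules[__name__]))
    return suite
```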
OK, you could look at the code (which I assume is how you found this out) but what is there to suggest that this is worth doing? On the contrary, I’ve seen postings claiming that unittest is hostile to extension, because it uses “private” names internally. That sort of allegation, unfounded or not, has a bad habit of shaping people’s thoughts…
Would it be worth submitting a patch to the unittest documentation covering this?
I think I, at least, have stated that it is hostile to extension. Reading this description, perhaps it isn’t; if I had given more credence to the documented API and ignored the implementation, I might have done fine. But I worked from the code, trying to subclass the given classes, and I found that very discouraging.
About testing frameworks:
I disagree about unittest being easy to extend, having worked hard at it. The implementation isn’t factored well, the interdependencies are difficult to work around.
Filter which tests to run based on a regular expression.
Run tests in parallel with a thread pool.
Not fun with unittest’s current factoring.
And do it all so that the test suites themselves don’t require any changes (e.g. a different superclass).
The basic problem is that test suites run themselves recursively: to modify the running behavior, you need to change the TestSuite classes themselves (TestRunner being misnamed – it only calls a suite’s run method, it doesn’t run each test).
Not to dis unittest, I think it’s great! That’s where TestOOB comes in. (http://testoob.sourceforge.net)
It’s a testing framework that works out of the box with unittest tests. It’s a refactoring of unittest’s services – the test suites themselves are considered only as containers of individual tests.
It’s pretty cool, although I’m biased. Since we don’t use unittest’s code for running or reporting, we refactor it to be as simple and extensible as we can.
BTW – I think py.test is cool too, but integrating with existing suites is higher on my priorities, so instead I get inspiration from its nice features for inclusion in TestOOB.
If you’re missing a testing feature, instead of solving it once, solve it for everyone. Even when it’s possible to implement a feature simply, it would be nicer to implement it once for everyone’s use, and to make sure that the implementation plays nice with other features (features implemented privately rarely do).
So help TestOOB be better, and ask for any new features you’d like, or even send a patch for a feature you’ve implemented.
“””Filter which tests to run based on a regular expression.”””
That’s so easy to do with unittest it’s practically obscene. Just subclass TestLoader to create MatchingTestLoader(match_expr) and override a few of the right methods (getTestCaseNames(), and maybe loadTestsFromName() if you want to filter class names as well as individual test names).
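A sketch of that loader, filtering on method names only; `MatchingTestLoader` follows the name used above. (Python 3.7 later added a built-in `-k` option and `TestLoader.testNamePatterns` for glob-style filtering.)

```python
import re
import unittest


class MatchingTestLoader(unittest.TestLoader):
    """Loader that keeps only test methods whose names match a regex."""

    def __init__(self, match_expr):
        super().__init__()
        self.pattern = re.compile(match_expr)

    def getTestCaseNames(self, testCaseClass):
        # Reuse the default test* lookup (already sorted) and filter it.
        names = super().getTestCaseNames(testCaseClass)
        return [name for name in names if self.pattern.search(name)]
```

It plugs in the same way as any other loader, e.g. `unittest.main(testLoader=MatchingTestLoader(r"parse|format"))`, where the pattern is just an example.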
“””Run tests in parallel with a thread pool.”””
Also obscenely easy; in your TestLoader subclass, set ‘suiteClass’ to a ParallelTestSuite class that implements your desired threading logic. (Similarly, to do py.test-style incremental running, just make a TestLoader subclass with a suiteClass set to an IncrementalSuite class.)
Now, I’m ignoring here the fact that running an arbitrarily selected collection of tests in a thread pool is almost certainly a Very Bad Idea. An explicitly designated ParallelTestSuite, however, might be useful for certain kinds of tests. In any case, bad idea or not, you can certainly do it with near-miraculous ease; it’s even easier to implement compatibly than regex matching is!
“””Even when it’s possible to implement a feature simply, it would be nicer to implement it once for everyone’s use, and to make sure that the implementation plays nice with other features (features implemented privately rarely do).”””
My point is that none of the *publicly* implemented features (py.test, TestOOB, trial, etc.) are compatible with each other. If they all implemented the unittest-defined interfaces, they would have at least a prayer of compatibility. Those who fail to study unittest are doomed to reinvent it – needlessly and usually incompatibly.
Moving from the general discussion about unit testing to the more specific topic of setuptools and unit testing…
I can certainly see the arguments for writing to a standard interface, and the unittest interface does have a good set of extension points.
Like others, I’ve been spoiled by py.test’s convenient collection of test methods. I don’t care about the fact that you don’t have to subclass TestCase or anything like that. I just don’t want to manually write test suites, which is in line with the example you gave in the text.
I’d like to make it so that running python setup.py test will run the tests. setuptools lets you specify a test suite, but does not let you specify TestLoaders or anything else like that.
The path of least resistance would be for me to write a suite script that collects up the tests itself. Another approach would be to monkeypatch the defaultTestLoader, replacing it with my own. A final option would be to have something in setuptools that lets you specify a TestLoader.
What do you think is a good way to easily run all of the tests for a project?
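For the suite-script route, setuptools’ `test_suite` option can name a function that takes no arguments and returns a `unittest.TestSuite`; the stock loader’s `loadTestsFromName()` calls such a callable and uses what it returns. A minimal sketch, where `all_tests`, `ParserTest`, and `FormatterTest` are hypothetical stand-ins for a project’s own names:

```python
import unittest


class ParserTest(unittest.TestCase):  # stand-in for a real test class
    def test_split(self):
        self.assertEqual("a b".split(), ["a", "b"])


class FormatterTest(unittest.TestCase):  # stand-in for a real test class
    def test_upper(self):
        self.assertEqual("ab".upper(), "AB")


def all_tests():
    """Collect the project's tests into one suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case_class in (ParserTest, FormatterTest):
        suite.addTests(loader.loadTestsFromTestCase(case_class))
    return suite
```

Something like `test_suite="mypkg.tests.all_tests"` in `setup()`, or `python -m unittest mypkg.tests.all_tests`, would then pick up the whole collection (`mypkg.tests` is a placeholder).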
Phillip, I completely agree with your preference for using existing, popular frameworks and recognizing their power. I also agree that many disparate frameworks usually mean efforts spent that can’t be used together.
Here’s the history and reasoning that led me to write TestOOB, even though unittest existed and I was using it and having fun.
I wanted some features not built into unittest, and so I implemented them. Each feature was a new effort to plug into unittest’s extension mechanisms. Not entirely difficult, but I kept writing many lines of code for an idea that could be expressed in a short sentence. Each extension had to play nice with the others, in different combinations.
Here is the fundamental point where we disagree: you claim that the only prayer for compatibility is using unittest extension points. But when you implement 5 different TestLoader subclasses, how do you combine their abilities? How do you make sure your test suite (e.g. a thread-using test suite) works correctly with non-thread-safe TestResults?
In my own experience, I had to create compatibility mechanisms myself. At one point, I realised that what I’ve written is, in fact, a testing framework that can handle unittest test suites as an input. A good case of compatibility – you can use it with tests written for unittest.
The final step was realizing that I was not prepared to invest so much work in this framework for private use, so I made it a public project.
Extending unittest with its standard mechanisms does not, in my opinion and experience, lend itself to interoperability.
But developing a framework, or contributing to an existing one, is a much better step along this road, because your code will be available to more people than yourself, will be tested and debugged in the context of other features and environments, and so on with the benefits of a released package.
You can create a ‘unittest extensions’ framework that uses only unittest’s API and collect your extensions and others’, and I would think that’s a great idea! It’s what I had in mind with TestOOB at first, until I saw benefits in refactoring the internals and decided to leave unittest purity behind.
I still believe that only if you create a framework or contribute to an existing one will you get any sort of long-term compatibility.
“””A final option would be to have something in setuptools that lets you specify a TestLoader.”””
Yeah, I’m thinking I’ll add that in 0.6a2 or thereabouts.
“””How do you make sure your test suite (e.g. a thread-using test suite) works correctly with non-thread-safe TestResults?”””
By putting the thread-safety in adapters that get passed down to the tests, of course.
“””I still believe that only if you create a framework or contribute to an existing one will you get any sort of long-term compatibility.”””
Me too – the existing framework, however, is unittest, which is conveniently already distributed with the stdlib. Also, I think it’s important to note that there is now a new trend in Python frameworks, just barely beginning: *open* frameworks, that span individual authors and package distributions.
For example, setuptools now has three people distributing subframeworks: easydeb, testido, and buildutils are all setuptools extension frameworks. Setuptools establishes the basis for these other frameworks and tools to plug in, but the “framework” encompasses them all, because it’s open-ended.
WSGI is another “open framework”, in that it allows web application writers and server authors’ systems to interoperate.
In the same way, the unittest framework can be such an open framework standard. Python really needs to make this leap, from closed, monolithic frameworks to open standards.
After reading this article, it seems so easy to extend unittest. So here is my first task:
I want to write a TestRunner that simply repeats tests X times, with X given as the parameter --repeatX (or -rX).
I looked into the code and it doesn’t seem that easy anymore …
It’s not just writing a new TestRunner; I would need to extend TestProgram (which parses the command-line arguments) and then extend the TestRunner.
Or is it easier than I think, and I just didn’t understand your article?
–Wolfram
Well, you could just change the TestProgram, and have it call the TestRunner N times. Or you could have a MultiTestRunner that gets the count from an environment variable. But for an integrated setup, yeah, you’d need to subclass both.
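A sketch of the runner half of that answer, assuming Python 3. `RepeatingTestRunner` and the `TEST_REPEAT` environment variable are hypothetical. Rather than re-running one suite object (recent Python 3 suites drop their tests after a run), it flattens the suite into individual cases, which can safely run more than once, and builds a fresh suite listing them `repeat` times. Flattening does bypass any behavior added by custom suite subclasses.

```python
import os
import unittest


def _flatten(test):
    """Yield the individual cases inside a (possibly nested) suite."""
    if isinstance(test, unittest.TestSuite):
        for child in test:
            yield from _flatten(child)
    else:
        yield test


class RepeatingTestRunner(unittest.TextTestRunner):
    """Text runner that runs every test `repeat` times.

    The count comes from the `repeat` argument, or else from a
    hypothetical TEST_REPEAT environment variable (default 1).
    """

    def __init__(self, *args, repeat=None, **kwargs):
        super().__init__(*args, **kwargs)
        if repeat is None:
            repeat = int(os.environ.get("TEST_REPEAT", "1"))
        self.repeat = repeat

    def run(self, test):
        cases = list(_flatten(test))
        # A fresh suite that lists the whole sequence `repeat` times.
        return super().run(unittest.TestSuite(cases * self.repeat))
```

Passed as `unittest.main(testRunner=RepeatingTestRunner)`, the count would come from the environment variable; a real `--repeat` option would still mean subclassing TestProgram, as the reply says.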
“””I’m not saying these other frameworks are bad, it’s just that the additional functionality could usually be added by implementing a replacement loader, runner, case, or result class, depending on what kind of features you want to add.”””
I recently had to extend unittest considerably to handle a few very advanced features and I don’t find it so modular.
Specifically I needed to customize unittest so that a certain setup and teardown would happen in the beginning of running all the tests in a Class in certain cases, and in the beginning of a Suite run in other cases.
One single line in the unittest module, and all the subsequent assumptions about how the tests were loaded, made this incredibly difficult. Line 498:
return self.suiteClass(map(testCaseClass, testCaseNames))
The rest of the module assumes that the tests are loaded in this manner, and you can’t change that assumption without altering the rest of the module via inheritance.
I’d love to break that up and make it more modular, but I can’t find a SIG or any other place where I might submit a proposal.
“””The rest of the module assumes that the tests are loaded in this manner, and you can’t change that assumption without altering the rest of the module via inheritance.”””
You can make your own loader(s) that do whatever you like, as long as the resulting object acts like a test suite. Have a look at ‘setuptools’ or ‘nose’ for examples.
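A sketch of that loader-based approach, assuming Python 3: a suite subclass that runs a setup hook before its members and a teardown hook after them, plus a loader that builds one such suite per TestCase class. `FixtureSuite`, `FixtureLoader`, and the `group_setup`/`group_teardown` hook names are all hypothetical, and the simplified `loadTestsFromTestCase` omits the default’s `runTest` fallback. (Python 2.7 and later also provide `setUpClass`/`tearDownClass` and `setUpModule`/`tearDownModule` for the common cases.)

```python
import unittest


class FixtureSuite(unittest.TestSuite):
    """Suite that runs optional hooks around all of its members."""

    def __init__(self, tests=(), setup=None, teardown=None):
        super().__init__(tests)
        self._setup = setup
        self._teardown = teardown

    def run(self, result, debug=False):
        if self._setup is not None:
            self._setup()
        try:
            return super().run(result, debug)
        finally:
            if self._teardown is not None:
                self._teardown()


class FixtureLoader(unittest.TestLoader):
    def loadTestsFromTestCase(self, testCaseClass):
        names = self.getTestCaseNames(testCaseClass)
        return FixtureSuite(
            [testCaseClass(name) for name in names],
            # Hypothetical class-level hooks; absent hooks are simply skipped.
            setup=getattr(testCaseClass, "group_setup", None),
            teardown=getattr(testCaseClass, "group_teardown", None),
        )
```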