Saturday, February 10, 2007

WSGI Middleware Considered Harmful

<rant mode="on" type="get-off-my-lawn-you-crazy-younguns">

WSGI middleware is hazardous to your API's health.  There, I said it.

It's not the actual idea of WSGI middleware I have a problem with, but what a lot of people seem to think the idea of WSGI middleware is.

Unfortunately, the way I wrote the part of PEP 333 that talks about how frameworks might someday become middleware has produced an entire school of thought that takes it to mean that middleware should be used to create protocols for extending WSGI itself!

More precisely, people seem to think that if you're going to offer a service via middleware, you have to stick the API for that service into the WSGI environment.  Apparently, they reason that because WSGI puts things in the environment, and because servers are allowed to provide extension APIs via the environment, it makes sense to have middleware whose sole purpose is to put APIs into the environment.

But this doesn't make any sense.  If your application requires that API to be present, then it's not middleware any more!

Middleware, you see, exists in order to wrap applications with additional behavior.  If you can't arbitrarily decide whether to stick it in front of an application, it's not middleware any more.  It's part of the application.

And if it's part of the application, trying to put it on the outside of the application is a complete freaking waste of time!

Meanwhile, you have to pull stuff out of the environment in order to use it, doing crap like environ['some.session.service'].whatever() to do something that could've been written more clearly as SomeSessionService(environ).whatever(), without having to stack a bunch of so-called "middleware" on top of your application object.
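To make the complaint concrete, here's a minimal Python sketch of the pattern being criticized. SessionMiddleware, SessionAPI, and the 'some.session.service' key are hypothetical names used only for illustration. The application works when wrapped and fails with a KeyError when it isn't, which is exactly why this kind of "middleware" is really part of the application.

```python
# Hypothetical names throughout: SessionMiddleware, SessionAPI and the
# 'some.session.service' key exist only for this illustration.

class SessionAPI:
    """Stand-in for whatever service the middleware offers."""

    def __init__(self, environ):
        self.environ = environ

    def whatever(self):
        return 'session data'


class SessionMiddleware:
    """'Middleware' whose only job is to stuff an API into the environment."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ['some.session.service'] = SessionAPI(environ)
        return self.app(environ, start_response)


def app(environ, start_response):
    # This lookup only works if SessionMiddleware is stacked in front of us,
    # so the "middleware" is really a mandatory part of the application.
    data = environ['some.session.service'].whatever()
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [data.encode('utf-8')]


def start_response(status, headers, exc_info=None):
    pass  # minimal stand-in for a server's start_response


print(SessionMiddleware(app)({}, start_response))  # wrapped: works
try:
    app({}, start_response)                        # unwrapped: breaks
except KeyError as exc:
    print('KeyError:', exc)
```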

So please, end the madness now.  If you are tempted to put something into the environment using middleware, so that an application can pull it out later by direct access to the environment, please, just don't.

Instead, write library functions or classes that do whatever needs to be done, and which (maybe) store data in the environment, using library-specific keys.  Do not make users of your library access these keys directly; instead, give them functions or methods that do useful things with this private data.
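A minimal sketch of that library-style alternative, assuming a hypothetical SomeSessionService class and a private 'mylib.session' key (neither is a real package). The data still lives in the environment, but callers only ever call methods and never touch the key. This toy version keeps state for a single request only; persisting it across requests (cookies, storage) is deliberately left out.

```python
# Hypothetical library: SomeSessionService and the 'mylib.session' key are
# illustrative names, not an existing package. Storage is per-request only;
# a real session library would also load/save state (e.g. via a cookie).

_SESSION_KEY = 'mylib.session'  # private, library-specific environ key


class SomeSessionService:
    def __init__(self, environ):
        self.environ = environ

    def _data(self):
        # Lazily create this request's session dict under our private key.
        return self.environ.setdefault(_SESSION_KEY, {})

    def get(self, name, default=None):
        return self._data().get(name, default)

    def set(self, name, value):
        self._data()[name] = value


def app(environ, start_response):
    # No middleware stack needed: the application just calls the library.
    session = SomeSessionService(environ)
    session.set('greeting', 'hello')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [session.get('greeting').encode('utf-8')]
```

Because the key is private to the library, it can change its storage format later without breaking any application code.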

Do not make users wrap their application with your so-called "middleware", when a callback or decorator will do.  Yes, technically that's still middleware, but practically it's a lot easier to use middleware expressed via a decorator or by passing a callback to a library function, rather than having to configure a complex middleware stack at the server deployment level.
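Here's a rough sketch of "middleware via decorator", using a hypothetical require_user decorator that refuses requests with no REMOTE_USER set. It's ordinary WSGI wrapping, but it lives in the application's own code, so nothing has to be configured in the server's deployment stack.

```python
import functools


def require_user(app):
    """Hypothetical decorator: refuse requests that have no REMOTE_USER.

    Technically this is WSGI middleware, but it is applied in application
    code, not configured in the server's deployment stack.
    """
    @functools.wraps(app)
    def wrapper(environ, start_response):
        if not environ.get('REMOTE_USER'):
            start_response('403 Forbidden', [('Content-Type', 'text/plain')])
            return [b'Login required\n']
        return app(environ, start_response)
    return wrapper


@require_user
def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('Hello, %s\n' % environ['REMOTE_USER']).encode('utf-8')]
```

Using functools.wraps keeps the wrapped function's name and docstring, which makes the decorated application easier to debug.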

Reserve true middleware, i.e. deployment middleware, for tools that are not specific to the application: testers, debuggers, loggers, proxies, routers, caches, compressors, and so on.

There are many, many legitimate applications for deployment-stack middleware like this, and almost without exception they are tools that don't care what application they're used with, and neither does the application know they exist.
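For contrast, a sketch of the kind of middleware that does belong in the deployment stack: a hypothetical RequestLogger that writes one line per request. It wraps start_response only to observe the status, passes exc_info through, returns the server's write() callable, and hands back the application's iterable untouched, so the server's close() call still reaches the application.

```python
import sys


class RequestLogger:
    """Hypothetical deployment middleware: log method, path and status.

    It knows nothing about the application it wraps, and the application
    does not know it is there.
    """

    def __init__(self, app, stream=sys.stderr):
        self.app = app
        self.stream = stream

    def __call__(self, environ, start_response):
        def logging_start_response(status, headers, exc_info=None):
            self.stream.write('%s %s -> %s\n' % (
                environ.get('REQUEST_METHOD', '-'),
                environ.get('PATH_INFO', '') or '/',
                status,
            ))
            # Pass exc_info through and return the server's write() callable.
            return start_response(status, headers, exc_info)

        # Return the app's iterable as-is so close() reaches the app.
        return self.app(environ, logging_start_response)
```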

But application "middleware", while being WSGI middleware in a technical sense, should not be forced into the deployment stack, where it's harder to deploy.  Give your users a real API, for crying out loud, and let them use it inside their application.

Okay.  'nuff said.


P.S. One more reason to avoid implementing APIs as middleware is that it's extremely difficult to write correct, WSGI-compliant middleware, and very easy to do some very stupid things in the process.  Always test your middleware using wsgiref.validate on both sides of the middleware.  That is, wrap a validator around not only the middleware itself, but also the application the middleware is wrapping!  If you don't do that, you'll miss all kinds of compliance bugs.
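A sketch of the "validate both sides" advice, using the standard library's wsgiref.validate.validator. Here passthrough_middleware stands in for whatever middleware you're testing, and run_once is a hypothetical helper that fakes a single GET request with wsgiref.util.setup_testing_defaults. The inner validator checks what the middleware passes down to the application; the outer one checks what it presents up to the server.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello world!\n']


def passthrough_middleware(app):
    """Hypothetical middleware under test; substitute your own."""
    def wrapper(environ, start_response):
        return app(environ, start_response)
    return wrapper


# Inner validator: checks what the middleware passes *down* to the app.
# Outer validator: checks what the middleware presents *up* to the server.
stack = validator(passthrough_middleware(validator(hello_app)))


def run_once(app):
    """Drive one fake GET request through a WSGI app; return (status, body)."""
    environ = {'QUERY_STRING': ''}  # set explicitly to avoid a validator warning
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        return lambda data: None  # minimal stand-in for the server's write()

    result = app(environ, start_response)
    try:
        body = b''.join(result)
    finally:
        # The validator insists that the response iterable gets closed.
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], body


print(run_once(stack))
```

If the middleware mangles the environ, the status, the headers, or the response iterable, one of the two validators raises an AssertionError that points at the offending side.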

P.P.S. Notice that this means WSGI 1.0 has failed in its goal of making middleware "both simple and robust".  I have some ideas for a WSGI 2.0 that would drastically simplify both applications and middleware, but at the cost of eliminating the write() callable for unbuffered streaming.  My idea for 2.0 would allow WSGI 1.0 apps to be called from a 2.0 server (using 2.x middleware, of course), but write() operations could only be streamed if the middleware used greenlets or threads or some other async mechanism.

Here's what a WSGI 2.0 application could look like:

def simple_app(environ):
    """Simplest possible WSGI 2.0 application object"""
    return (
        '200 OK',
        [('Content-type','text/plain')],
        ['Hello world!\n']
    )
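To give a taste of the simplification, here's a sketch of a 2.0-style middleware, assuming the proposed signature takes only environ and returns a (status, headers, body) triple. add_header is a hypothetical name. Under WSGI 1.0 the same job means intercepting start_response; here it's a plain function call and a tuple rebuild.

```python
def add_header(app, name, value):
    """Hypothetical 2.0-style middleware: append one response header.

    Assumes the proposed signature app(environ) -> (status, headers, body).
    """
    def wrapper(environ):
        status, headers, body = app(environ)
        # No start_response to intercept and no write() to wrap.
        return status, headers + [(name, value)], body
    return wrapper


def hello_app(environ):
    return '200 OK', [('Content-Type', 'text/plain')], [b'Hello world!\n']


app = add_header(hello_app, 'X-Example', 'yes')
```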

I'll leave it as an exercise for the reader to figure out how much simpler everything (including libraries, applications, and middleware) can be under this approach, at the expense of relegating write()-based streaming applications to second-class citizens that have to be run in another thread... which they already have to be under servers like Twisted, anyway.
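And a sketch of the compatibility story from the P.P.S.: a hypothetical wsgi1_to_wsgi2 adapter that lets a WSGI 1.0 application be called through the proposed 2.0 interface. It buffers anything passed to write() instead of streaming it, which is exactly the trade-off described above; streaming would need threads, greenlets, or some other async mechanism.

```python
def wsgi1_to_wsgi2(app):
    """Hypothetical adapter: expose a WSGI 1.0 app through the proposed
    2.0 interface, app(environ) -> (status, headers, body).

    Anything the 1.0 app passes to write() is buffered, not streamed.
    """
    def wrapper(environ):
        response = {}
        written = []  # chunks passed to write(), in order

        def start_response(status, headers, exc_info=None):
            if exc_info is not None:
                if written:
                    # A 1.0 server would already have sent headers on write(),
                    # so PEP 333 requires re-raising the original error.
                    raise exc_info[1].with_traceback(exc_info[2])
            elif 'status' in response:
                raise AssertionError('start_response() called twice without exc_info')
            response['status'] = status
            response['headers'] = list(headers)
            return written.append  # write() just appends to the buffer

        result = app(environ, start_response)
        try:
            chunks = list(result)  # 1.0 apps may call start_response lazily
        finally:
            if hasattr(result, 'close'):
                result.close()  # honour the 1.0 close() protocol
        if 'status' not in response:
            raise RuntimeError('application never called start_response()')
        return response['status'], response['headers'], written + chunks
    return wrapper
```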