dirtSimple.org: what stands in the way, becomes the way
WSGI Middleware Considered Harmful

<rant mode=“on” type=“get-off-my-lawn-you-crazy-younguns”>

WSGI middleware is hazardous to your API’s health.  There, I said it.

It’s not the actual idea of WSGI middleware I have a problem with, but what a lot of people seem to think the idea of WSGI middleware is.

Unfortunately, the way I wrote the part of PEP 333 that talks about how frameworks might someday become middleware has produced an entire school of thought that takes it to mean middleware should be used to create protocols for extending WSGI itself!

More precisely, people seem to think that if you’re going to offer a service via middleware, you have to stick the API for that service into the WSGI environment.  Apparently, because WSGI puts things in the environment, and because servers are allowed to provide extension APIs via the environment, they conclude that it makes sense to have middleware whose sole purpose is to put APIs into the environment.

But this doesn’t make any sense.  If your application requires that API to be present, then it’s not middleware any more!

Middleware, you see, exists in order to wrap applications with additional behavior.  If you can’t arbitrarily decide whether to stick it in front of an application, it’s not middleware any more.  It’s part of the application.

And if it’s part of the application, trying to put it on the outside of the application is a complete freaking waste of time!

Meanwhile, you then have to pull stuff out of the environment in order to use it, doing crap like environ['some.session.service'].whatever() to do something that could’ve been written more clearly as SomeSessionService(environ).whatever(), which doesn’t require you to stack a bunch of so-called “middleware” on top of your application object.

So please, end the madness now.  If you are tempted to put something into the environment using middleware, so that an application can pull it out later by direct access to the environment, please, just don’t.

Instead, write library functions or classes that do whatever needs to be done, and which (maybe) store data in the environment, using library-specific keys.  Do not make users of your library access these keys directly; instead, give them functions or methods that do useful things with this private data.
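To make that concrete, here is a minimal sketch of the kind of library described above, assuming Python 3 and the standard library's `urllib.parse.parse_qs`. The `myforms.parsed_query` key and the `query_params()`/`query_param()` helpers are hypothetical names invented for illustration; the point is only that the environ key stays private to the library, and callers get functions instead of a key to dig out themselves.

```python
from urllib.parse import parse_qs

# Hypothetical library-private key: applications never touch it directly.
_QUERY_KEY = "myforms.parsed_query"


def query_params(environ):
    """Parse QUERY_STRING once per request, caching the result in environ."""
    parsed = environ.get(_QUERY_KEY)
    if parsed is None:
        parsed = parse_qs(environ.get("QUERY_STRING", ""))
        environ[_QUERY_KEY] = parsed
    return parsed


def query_param(environ, name, default=None):
    """Return the first value submitted for `name`, or `default`."""
    values = query_params(environ).get(name)
    return values[0] if values else default


def hello_app(environ, start_response):
    # The app calls the library directly; no middleware stack is involved.
    name = query_param(environ, "name", "world")
    body = ("Hello, %s!\n" % name).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Nothing has to be configured at deployment time: if an application never calls `query_param()`, the library simply costs nothing.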

Do not make users wrap their application with your so-called “middleware”, when a callback or decorator will do.  Yes, technically that’s still middleware, but practically it’s a lot easier to use middleware expressed via a decorator or by passing a callback to a library function, rather than having to configure a complex middleware stack at the server deployment level.
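A sketch of the decorator style, assuming Python 3; `add_header()` is a hypothetical name and the header it adds is arbitrary. Underneath it is still WSGI middleware (it wraps `start_response`), but it is applied right where the application is defined, not in a deployment stack.

```python
import functools


def add_header(name, value):
    """Decorator factory: add one response header to a WSGI application."""
    def decorator(app):
        @functools.wraps(app)
        def wrapped(environ, start_response):
            def start_response_with_header(status, headers, exc_info=None):
                # Pass exc_info through positionally and return the server's
                # write() callable, as WSGI requires of start_response.
                return start_response(status, headers + [(name, value)], exc_info)
            return app(environ, start_response_with_header)
        return wrapped
    return decorator


@add_header("Cache-Control", "no-store")
def private_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"private data\n"]
```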

Reserve true middleware (i.e., deployment middleware) for tools that are not specific to the application: testers, debuggers, loggers, proxies, routers, caches, compressors, and so on.

There are many, many legitimate applications for deployment-stack middleware like this, and almost without exception they are tools that don’t care what application they’re used with, and neither does the application know they exist.
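For contrast, here is a sketch of the kind of application-agnostic deployment middleware meant here: an access logger, assuming Python 3. The class names and log format are hypothetical. Notice how much ceremony even this small tool needs to stay compliant: it has to wrap `start_response`, pass `exc_info` through, and forward `close()` to the wrapped application's iterable.

```python
import sys
import time


class _LoggedBody:
    """Wraps an app's response iterable and logs once the server closes it."""

    def __init__(self, iterable, log_line):
        self._iterable = iterable
        self._log_line = log_line      # callable taking the byte count
        self.bytes_sent = 0

    def __iter__(self):
        for chunk in self._iterable:
            self.bytes_sent += len(chunk)
            yield chunk

    def close(self):
        # Forward close() to the app's iterable (if it has one), then log,
        # even if the app's close() raises.
        try:
            if hasattr(self._iterable, "close"):
                self._iterable.close()
        finally:
            self._log_line(self.bytes_sent)


class AccessLogMiddleware:
    """Application-agnostic access logger: it can wrap any WSGI app."""

    def __init__(self, app, write=sys.stderr.write):
        self.app = app
        self.write = write

    def __call__(self, environ, start_response):
        started = time.time()
        statuses = []

        def logging_start_response(status, headers, exc_info=None):
            statuses.append(status)
            # Pass exc_info through and return the server's write() callable.
            return start_response(status, headers, exc_info)

        def log_line(nbytes):
            code = statuses[-1].split(" ", 1)[0] if statuses else "-"
            self.write("%s %s %s %d %.3fs\n" % (
                environ.get("REQUEST_METHOD", "-"),
                environ.get("PATH_INFO") or "/",
                code, nbytes, time.time() - started))

        result = self.app(environ, logging_start_response)
        return _LoggedBody(result, log_line)
```

Neither side knows about the other beyond the WSGI interface itself, which is exactly what makes it safe to add or remove at deployment time.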

But application “middleware”, while being WSGI middleware in a technical sense, should not be forced into the deployment stack, where it’s harder to deploy.  Give your users a real API, for crying out loud, and let them use it inside their application.

Okay.  ’nuff said.

</rant>

P.S. One more reason to avoid implementing APIs as middleware is that it’s extremely difficult to write correct, WSGI-compliant middleware, and very easy to do some very stupid things in the process.  Always test your middleware using wsgiref.validate on both sides of the middleware.  That is, wrap a validator not only around the middleware itself, but also around the application the middleware is wrapping!  If you don’t do that, you’ll miss all kinds of compliance bugs.
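A sketch of the two-sided check using the standard library's `wsgiref.validate.validator`, assuming Python 3. The helpers `validate_both_sides()` and `run()` and the deliberately buggy `buggy_prefix_stripper()` are hypothetical names for illustration. The bug (leaving `PATH_INFO` without its leading slash) is caught only by the *inner* validator, because the outer one checks the environ before the middleware has modified it.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def validate_both_sides(middleware, app):
    """Outer validator: what the middleware shows the server.
    Inner validator: what the middleware does to the app."""
    return validator(middleware(validator(app)))


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def buggy_prefix_stripper(app, prefix="/app/"):
    """Deliberately buggy: leaves PATH_INFO without a leading slash."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(prefix):
            environ["SCRIPT_NAME"] += prefix.rstrip("/")
            environ["PATH_INFO"] = path[len(prefix):]   # bug: drops the "/"
        return app(environ, start_response)
    return middleware


def run(app, path):
    """Drive `app` roughly the way a server would; return the body bytes."""
    environ = {"QUERY_STRING": ""}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    result = app(environ, lambda status, headers, exc_info=None: None)
    try:
        return b"".join(result)
    finally:
        result.close()   # the validators complain if close() is skipped
```

`run(validate_both_sides(buggy_prefix_stripper, hello_app), "/other")` passes, while the same call with `"/app/hello"` raises `AssertionError` from the inner validator.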

P.P.S. Notice that this means WSGI 1.0 has failed in its goal of making middleware “both simple and robust”.  I have some ideas for a WSGI 2.0 that would drastically simplify both applications and middleware, but at the cost of eliminating the write() callable for unbuffered streaming.  My idea for 2.0 would allow WSGI 1.0 apps to be called from a 2.0 server (using 2.x middleware, of course), but write() operations could only be streamed if the middleware used greenlets or threads or some other async mechanism.
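Here is a rough sketch of the compatibility direction described above: exposing a WSGI 1.0 application under the hypothetical 2.0 convention (call it with `environ`, get back `(status, headers, body)`), assuming Python 3. `wsgi1_to_wsgi2()` is a made-up name, and the 2.0 convention is only the proposal sketched in this post. Since a plain function call can't hand data back before it returns, this version simply buffers anything passed to `write()` instead of streaming it.

```python
def wsgi1_to_wsgi2(app):
    """Adapt a WSGI 1.0 app to a hypothetical WSGI 2.0-style callable."""

    def app2(environ):
        response = []   # will hold [status, headers]
        chunks = []     # body output, in the order the app produced it

        def start_response(status, headers, exc_info=None):
            # Nothing is sent until app2 returns, so a later call carrying
            # exc_info may simply replace the status and headers.
            if response and exc_info is None:
                raise RuntimeError("start_response called twice without exc_info")
            response[:] = [status, headers]
            return chunks.append    # write() only buffers; it cannot stream

        result = app(environ, start_response)
        try:
            for chunk in result:
                chunks.append(chunk)
        finally:
            if hasattr(result, "close"):
                result.close()
        if not response:
            raise RuntimeError("application never called start_response")
        status, headers = response
        return status, headers, chunks

    return app2
```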

Here’s what a WSGI 2.0 application could look like:

```python
def simple_app(environ):
    """Simplest possible WSGI 2.0 application object"""
    return (
        '200 OK',
        [('Content-type','text/plain')],
        ['Hello world!\n']
    )
```

I’ll leave it as an exercise for the reader to figure out how much simpler everything (including libraries, applications, and middleware) can be under this approach, at the expense of turning write()-driven streaming applications into second-class citizens that have to be run in another thread… which they already have to be under servers like Twisted, anyway.
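One partial take on that exercise, as a sketch under the same hypothetical 2.0 convention (assuming Python 3 and bytes bodies, which WSGI servers on Python 3 require): middleware becomes an ordinary function that calls the app and edits its return value, and running a 2.0-style app on an existing WSGI 1.0 server takes only a few lines. `with_header()` and `wsgi2_to_wsgi1()` are hypothetical names.

```python
def with_header(app2, name, value):
    """2.0-style middleware: no start_response to wrap, no close() to forward."""
    def wrapped(environ):
        status, headers, body = app2(environ)
        return status, headers + [(name, value)], body
    return wrapped


def wsgi2_to_wsgi1(app2):
    """Run a 2.0-style app on an ordinary WSGI 1.0 server."""
    def app1(environ, start_response):
        status, headers, body = app2(environ)
        start_response(status, headers)
        return body
    return app1


def simple_app(environ):
    # Same shape as the example above, but with a bytes body for Python 3.
    return ("200 OK", [("Content-type", "text/plain")], [b"Hello world!\n"])
```

Compare `with_header()` with a WSGI 1.0 header-adding middleware: the 2.0 version is just a function call and a tuple rebuild.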

Comments
  • Yes, getting rid of start_response as a separate entity would make it way, way easier to write middleware. Input-filtering middleware isn’t very hard as is (e.g., authentication middleware, which should be done as middleware). Output filtering is harder, though WSGIFilter makes it fairly simple. The other problem is adding more handlers for the end of the request (app_iter.close()): it’s tricky to do in a generic fashion, and can adversely affect performance (because you have to twiddle around with iterators in ways that can cover up their true nature, when all you want to do is add another function to close). A list of close functions would be much easier to handle.

    I haven’t brought up any of these things because I’m personally very reluctant to see WSGI change at this time — while it certainly has some difficult points to get around, it’s also an extremely stable base to build on. I’m not sure if the simplification is worth it.

  • I think you make a good point, although one thing I have noticed about writing applications as middleware is that it suggests a programming style that seems to help with reuse. If each part of the application is middleware and there is coupling between middleware, I agree it really isn’t middleware anymore. But if I am writing many similar applications, placing different parts in middleware makes it possible for me to create my own (albeit coupled) system for writing apps.

  • “””A list of close functions would be much easier to handle.”””

    WSGI 2 would do away with close functions, in favor of normal GC semantics. This is sufficient for the vast majority of applications, and a WSGI<->WSGI2 converter can easily be implemented on this basis using __del__ (or try/finally in Python 2.5).

    I don’t foresee a WSGI 2 *replacing* WSGI 1, but rather supplementing it. There are a lot of weak spots and holes in the existing spec that need fixing anyway; if there’s going to be a new version, we might as well simplify things, and provide a backward-compatibility route in the process. WSGI 1 and WSGI 2 should be losslessly interchangeable, modulo the issues with write().

  • I suppose if app_iter isn’t allowed to, like, do stuff, then you can generally close stuff with try/finally or just generally when the application returns (e.g., write out any session data). But if the person writing the app_iter expects to have context similar to what the original function call was like, this won’t work.

    I’d be fine with greater restrictions on generator app_iters, as generally the request should be mostly figured out by that time, and so putting any state you need into the app_iter object itself is fine. I might almost be inclined to clear the environ before iterating over app_iter to make sure it doesn’t try to get at anything it shouldn’t (or at least doing that in wsgiref.validate).

    There are also some problems with people sending response data through the environ (basically by putting callbacks in the environ). Sometimes maybe that makes sense, but not usually. Some better way of extending the response would be a useful addition to WSGI, as there’s no room for it currently (except custom headers, limited to string values and with issues about filtering those out before they go to the client, and with extra attributes on app_iter or other custom app_iter types).

  • “””if the person writing the app_iter expects to have context similar to what the original function call was like, this won’t work.”””

    Actually, there’s no difference from the existing WSGI spec here. Context is context. All you need to implement a close() method is to store a weakref to the iterator in some global location with a callback to the original item’s close(). This can also be trivially implemented using an iterator object whose __del__ calls the original close(), if any. (Which is what a WSGI2->WSGI1 wrapper would do.)

    “””I might almost be inclined to clear the environ before iterating over app_iter to make sure it doesn’t try to get at anything it shouldn’t (or at least doing that in wsgiref.validate).”””

    Uh, wha? You appear to be misunderstanding the spec. The environ passed to an application belongs to the application, not the server. The application is free to do as it pleases with it, and the server shouldn’t mess with it thereafter. (The only reason this latter idea isn’t in the spec is because it never occurred to me anybody would be crazy enough to DO that.)

    “””There are also some problems with people sending response data through the environ (basically by putting callbacks in the environ).”””

    Well, any time you don’t comply with the spec, you invite problems. This is as explicitly forbidden in the spec as I knew how to make it. That is, the spec clearly says that if you offer alternate ways of sending data, it’s your responsibility to ensure that you don’t break the WSGI-compliant ways, including ways that involve arbitrary middleware between you and the “real” application.

    “””Some better way of extending the response would be a useful addition to WSGI, as there’s no room for it currently”””

    Intentionally so, and IMO it should get *more* so, not less so.

    “””(except custom headers, limited to string values and with issues about filtering those out before they go to the client, and with extra attributes on app_iter or other custom app_iter types).”””

    This is exactly the kind of nonsense I ranted against in this article. All you need is library functions to operate on the *actual* response — you only think you need to communicate with the middleware in-band like this, because you’re thinking that components that need this are “middleware”. When they’re what Ben’s calling “MFC”s (Middleware Framework Components), the right thing to do is to explicitly communicate with them *out-of-band*, via library calls, decorators, etc.

    The motivation for WSGI 2 is to make it much easier to have “library middleware” or “MFC”s that you apply from an application, rather than everybody getting stuck in the ass-backwards thinking that APIs should be tunnelled through WSGI, when they should be sent out-of-band.

    It’s like people trying to do everything over HTTP instead of defining specialized protocols. Except in this case, MFCs are *inside* the WSGI firewall; there’s no need to put them outside and then tunnel to them. And thinking that way just creates the illusion that WSGI needs to do more than just HTTP.

  • What are your thoughts on using middleware as endpoints in a Rails stack to improve response time? The Rails stack can add a lot of overhead, and if my API only needs to go as deep as a database hit and converting a collection to JSON, I want that to be as performant as possible.
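The first comment's point about adding end-of-request handlers (and its remark that "a list of close functions would be much easier to handle") can be sketched in WSGI 1.0 terms as a wrapper iterable, assuming Python 3. `ResponseWithCloseHooks` is a hypothetical name, not a standard-library class. As that comment warns, wrapping hides the original iterable's type, so server optimizations keyed on it (such as `wsgi.file_wrapper` objects) no longer apply.

```python
class ResponseWithCloseHooks:
    """WSGI 1.0 response iterable that runs extra callbacks on close().

    The wrapped iterable's own close() (if any) runs first. Every callback
    runs even if an earlier one raises; the first error is re-raised at the end.
    """

    def __init__(self, iterable, *callbacks):
        self._iterable = iterable
        self._callbacks = []
        original_close = getattr(iterable, "close", None)
        if original_close is not None:
            self._callbacks.append(original_close)
        self._callbacks.extend(callbacks)
        self._closed = False

    def __iter__(self):
        return iter(self._iterable)

    def add_close_callback(self, callback):
        """Register one more function to run at the end of the request."""
        self._callbacks.append(callback)

    def close(self):
        if self._closed:        # make repeated close() calls harmless
            return
        self._closed = True
        first_error = None
        for callback in self._callbacks:
            try:
                callback()
            except Exception as exc:
                if first_error is None:
                    first_error = exc
        if first_error is not None:
            raise first_error
```

A middleware would return `ResponseWithCloseHooks(app(environ, start_response), cleanup)`, where `cleanup` stands for whatever end-of-request function it needs.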
