Tuesday, August 02, 2011

WSGI, Web Frameworks, and Requests: Explicit or Implicit?

In Python web programming and frameworks, there is a constant juggling act that takes place between "explicit" and "implicit".  Too explicit, and the code may get too verbose or unwieldy.  Too implicit, and the code may lose clarity or maintainability.

And nowhere is this tension more clearly seen than in the area of "request" objects.  After all, nearly every web programming framework has some notion of a "request" at its core: usually some sort of object with an API.

Now, as you may recall from my previous article, the Web-SIG originally set out in 2003 to standardize a universal "request" API for Python, but I diverted this effort towards a different sort of request API -- the WSGI "environ" object.

Where web framework request APIs usually emphasize methods and properties, the WSGI "environ" object is just a big bag of data.  It doesn't have any operations or properties.

But the upside to this downside is that the environment is extensible in a way that a request object is not.  You can add whatever you want to it, and you can call functions on it to do things that a request object would do with methods.  (Yay, freedom!)

But the new downside to that upside is that if you want to use library functions on the environ instead of framework "request" object methods, you now have to pass the environ back into the library functions!  (Boo, hiss.)
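To make the trade-off concrete, here's a minimal sketch in plain Python (all helper names are hypothetical, not taken from any real library): a helper that works as a "function on the environ", an app that stashes its own data in the environ under a namespaced key, and the cost of it all, which is that the environ has to be handed to every call.

```python
from urllib.parse import parse_qs


# Hypothetical library function that operates on a plain WSGI environ dict.
def query_param(environ, name, default=None):
    """Return the first value of query-string parameter `name`, or `default`."""
    values = parse_qs(environ.get("QUERY_STRING", "")).get(name)
    return values[0] if values else default


# Because the environ is just a dict, any code can add its own data to it,
# conventionally under a namespaced key to avoid collisions.
def remember_visit(environ):
    environ["myapp.visited"] = True


def app(environ, start_response):
    remember_visit(environ)
    # The cost of the freedom: every helper must be passed the environ explicitly.
    name = query_param(environ, "name", "world")
    body = ("Hello, %s!" % name).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```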

Binding To The Environment

So, WSGI-era web libraries (like WebOb and Werkzeug) tend to define their next-generation "request" objects as wrappers bound to the environ.  As Ian Bicking put it:

"Everything WebOb does is basically functions on the environ"

Of course, this isn't the only strategy for managing request information.  Some web app frameworks dodge the argument-passing issue by using thread locals, or worse yet, global variables.  But they're still trying to solve the same problem: connecting actions that a web application needs to perform, with some notion of the "current request".

And in both cases, a key driver for the API design is brevity and ease-of-use (implicit) vs. clarity and consistency (explicit).

On the explicit side, it's annoying to be constantly saying "foo = bar(environ, etc)", if only because it somehow looks less Pythonic than "foo = request.bar(etc)".

So in effect, what we want in our frameworks is a way to (implicitly) bind operations to the "request", so that it isn't necessary to explicitly spell out the connection in every line of code.  (Even if we're still explicitly referencing the request object.)

In fact, we don't even want to have to include boilerplate like 'request = Request(environ)' at the top of our apps' code, and so we'd much rather have this binding take place outside our code entirely.
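Here's a hypothetical sketch of that boilerplate in action: a thin `Request` wrapper (not WebOb's or Werkzeug's actual code) that keeps its state in the environ, caching the parsed query string under an assumed namespaced key so that every wrapper built on the same environ shares the work.

```python
from urllib.parse import parse_qs


class Request:
    """Hypothetical request wrapper: all of its state lives in the environ."""

    def __init__(self, environ):
        self.environ = environ

    @property
    def method(self):
        return self.environ.get("REQUEST_METHOD", "GET")

    @property
    def args(self):
        # Cache the parsed query string in the environ itself, under a
        # namespaced key, so any other wrapper on the same environ reuses it.
        key = "example.parsed_args"
        if key not in self.environ:
            self.environ[key] = parse_qs(self.environ.get("QUERY_STRING", ""))
        return self.environ[key]


def app(environ, start_response):
    request = Request(environ)  # the boilerplate we'd rather not write
    term = request.args.get("q", [""])[0]
    body = ("You searched for: %s" % term).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]
```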

Now, this is where things get really interesting!  In order to get rid of this boilerplate, web libraries and frameworks will usually do one of two things.  Either:

  1. They provide a decorator to change the calling signature while keeping external WSGI compliance (like WebOb), or
  2. They ditch WSGI entirely and use a different calling signature  (like Django)
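Here's a minimal sketch of option 1, assuming hypothetical `Request` and `Response` classes.  The decorator's name echoes WebOb's `webob.dec.wsgify`, but this is a standalone illustration, not WebOb's implementation: the decorated function takes a request object, while callers still see a normal `(environ, start_response)` WSGI app.

```python
from urllib.parse import parse_qs


class Request:
    """Hypothetical minimal request wrapper around a WSGI environ."""
    def __init__(self, environ):
        self.environ = environ
        self.args = parse_qs(environ.get("QUERY_STRING", ""))


class Response:
    """Hypothetical minimal response object."""
    def __init__(self, text, status="200 OK"):
        self.body = text.encode("utf-8")
        self.status = status
        self.headers = [("Content-Type", "text/plain; charset=utf-8"),
                        ("Content-Length", str(len(self.body)))]


def wsgify(func):
    """Turn `func(request) -> Response` into a standard WSGI application."""
    def wsgi_app(environ, start_response):
        response = func(Request(environ))
        start_response(response.status, response.headers)
        return [response.body]
    return wsgi_app


@wsgify
def greet(request):
    # Inside, this code is now coupled to *this* library's Request/Response types.
    name = request.args.get("name", ["world"])[0]
    return Response("Hello, " + name)
```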

And in either case, we're now more or less back where we started, pre-WSGI, as you are now writing code with a calling signature that's implicitly coupled to a specific library or framework.

Sure, you get certain benefits in exchange for making this commitment, and you're less tightly coupled to libraries using option 1.  But it's still a pretty exclusive commitment.  If you want to use code from more than one library, you're going to have to write the boilerplate for each of them, except for whichever one you choose to be your "primary": the main one that calls you and/or decorates your code.

The Original Goal Of WSGI

Now, the original idea for WSGI (well, my original idea, anyway) was that by letting "request" objects wrap the environ, and using "functions on the environ", we could get out of this situation.  As I wrote in the original PEP 333 rationale section:

"If middleware can be both simple and robust, and WSGI is widely available in servers and frameworks, it allows for the possibility of an entirely new kind of Python web application framework: one consisting of loosely-coupled WSGI middleware components.

"Indeed, existing framework authors may even choose to refactor their frameworks' existing services to be provided in this way, becoming more like libraries used with WSGI, and less like monolithic frameworks. This would then allow application developers to choose "best-of-breed" components for specific functionality, rather than having to commit to all the pros and cons of a single framework."

But what I didn't understand then was just how annoying it is to have to explicitly pass the environ into every library function you want to use!

(Actually, it's not just that it's annoying from a number-of-keystrokes point of view, it's also more foreign to a Python programmer's sensibilities.  We don't usually mind receiving an explicit "self", but for some reason, we seem to hate sending one!)

And that (in a somewhat roundabout way) is how I ended up adding the experimental "binding" protocol to WSGI Lite.

Specifically, what the binding protocol provides is a way to generically bind things to the environ dictionary, and pass them into your application's calling signature, while retaining WSGI compliance for any code that calls your function.

In other words, the binding protocol is a way to make it so that you can use as many libraries, functions, or objects for your request as you want, without needing to pass an 'environ' parameter to them over and over.
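To give a rough idea of the mechanics, here's a deliberately simplified sketch.  It is not the actual WSGI Lite API: the `bind` decorator, the binding rules, and the convention that the inner function returns a `(status, headers, body)` tuple are all assumptions for illustration.  Each keyword maps a parameter name to a callable that takes the environ, and the decorated result is still an ordinary WSGI app.

```python
def bind(**rules):
    """Hypothetical binding decorator (not the real WSGI Lite API).

    Each keyword maps a parameter name to a callable that takes the environ
    and returns the value to pass in for that parameter.
    """
    def decorate(func):
        def wsgi_app(environ, start_response):
            kwargs = {name: rule(environ) for name, rule in rules.items()}
            # Assumed convention: the inner function returns (status, headers, body).
            status, headers, body = func(environ, **kwargs)
            start_response(status, headers)
            return body
        return wsgi_app
    return decorate


# Binding rules: ordinary functions of the environ.
def method(environ):
    return environ.get("REQUEST_METHOD", "GET")


def path(environ):
    return environ.get("PATH_INFO", "/")


@bind(method=method, path=path)
def echo(environ, method, path):
    text = ("%s %s" % (method, path)).encode("ascii")
    return "200 OK", [("Content-Type", "text/plain")], [text]
```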

Now, in the simplest case, you can just use the binding protocol as a generic way to obtain any given library's request objects.  You can say, "my 'request' parameter maps to a WebOb request", for example.

But the really interesting cases come about when you stop thinking in terms of "request" objects and start thinking about what your application really does.

The Meaning of "Lite"

For example, why not bind a session object to your function's 'session' argument?  Or maybe what you really want is to just receive an authenticated user object in your 'user' parameter, and a cart object in your 'cart' parameter, instead of first getting a session, just so you can get to the user and cart.

In other words, what if you made your application goals more explicit?

Now currently, getting access to such application-specific objects requires either painfully-verbose boilerplate off of a raw WSGI environment, or an increasingly tight coupling to an increasingly monolithic framework that does more of the work for you.

But, with the Lite binding protocol, you can now represent anything that's tied to "the current request", just by creating a callable object that takes an environment parameter.

Which means you don't really need "request" objects any more in your main code, because you can simply arrange to be called with whatever objects you need, to do the thing you're actually doing.

And so your application code stops being about manipulating "web stuff", to focus more on whatever it is that your app actually does...  while still being just a WSGI app from the point of view of its caller.

(This, by the way, is part of why I dubbed the concept "WSGI Lite", despite the fact that it adds new protocols to WSGI: it effectively lets you take most of the "WSGI" out of "WSGI applications".)
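As a hypothetical illustration of the session/user/cart example above, each of these "request-bound" objects can just be a function of the environ, built on top of the others.  The in-memory `SESSIONS` and `USERS` tables and the `sid` cookie name are stand-ins for a real session store and database.

```python
from http.cookies import SimpleCookie

# Hypothetical in-memory stores standing in for a real session backend/database.
SESSIONS = {"abc123": {"user_id": 7}}
USERS = {7: {"name": "Ada", "cart": ["book", "lamp"]}}


def session(environ):
    """Look up the session named by the 'sid' cookie (empty dict if none)."""
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    sid = cookie["sid"].value if "sid" in cookie else None
    return SESSIONS.get(sid, {})


def user(environ):
    """The logged-in user record, or None; built on top of `session`."""
    return USERS.get(session(environ).get("user_id"))


def cart(environ):
    """The user's cart: code that asks for this never touches the session."""
    u = user(environ)
    return u["cart"] if u else []
```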

The Great "Apps vs. Controllers" Debate

Now, if you look at how non-WSGI-centric, "full-stack" frameworks (like Django, TurboGears, etc.) operate, they often have things they call "controllers": functions with more specialized signatures for doing this kind of "more app, less web" kind of stuff.  However, these frameworks tend to end up being very un-WSGI internally, because plain WSGI doesn't handle this sort of thing very well.

But with the WSGI Lite binding protocol, you can write controllers with whatever signature you like, while remaining "WSGI all the way down".  Anything you want as an argument, you can just create a binding rule for, which can be as simple as a string (to pull out an environ key), a short function that computes a value, or a tiny classmethod that returns an object wrapping the environ.
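Here's one way those three kinds of binding rule might be resolved.  This is a sketch under assumed conventions (the `resolve` and `bind_all` helpers and the `Headers` class are hypothetical): a string is treated as an environ key, and anything callable is called with the environ, which covers both plain functions and classmethods.

```python
def resolve(rule, environ):
    """Hypothetical rule resolver: strings are environ keys, callables are called."""
    if isinstance(rule, str):
        return environ.get(rule)
    return rule(environ)


def remote_addr(environ):
    # A short function that computes a value from the environ.
    return environ.get("REMOTE_ADDR", "unknown")


class Headers:
    """A tiny wrapper object; `from_environ` is usable as a binding rule."""

    def __init__(self, environ):
        self.environ = environ

    @classmethod
    def from_environ(cls, environ):
        return cls(environ)

    def get(self, name):
        # WSGI exposes most request headers as upper-cased HTTP_* keys with
        # '-' turned into '_' (Content-Type and Content-Length are exceptions).
        return self.environ.get("HTTP_" + name.upper().replace("-", "_"))


RULES = {
    "path": "PATH_INFO",              # string rule: pull out an environ key
    "client": remote_addr,            # function rule
    "headers": Headers.from_environ,  # classmethod rule returning a wrapper
}


def bind_all(rules, environ):
    return {name: resolve(rule, environ) for name, rule in rules.items()}
```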

And if a binding rule is itself a callable (like a function or a method), it too can use the binding protocol and ask for its own arguments to be calculated from the request.

And that means that you can take, say, a generic binding rule that fetches a parsed form, and use it to write an application-specific binding rule that looks up something in a database.

At which point, you can now write a controller that uses that binding rule to get something it needs as an argument.
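A hypothetical sketch of that layering: the `uses` decorator below declares which rules feed a callable's arguments, and `resolve` computes them recursively, caching each result in the environ so a shared rule like `form` runs only once per request.  The `PRODUCTS` dictionary stands in for a real database, and none of these names come from WSGI Lite itself.

```python
from urllib.parse import parse_qs


def uses(**deps):
    """Hypothetical decorator: declare which rules feed this callable's args."""
    def mark(func):
        func.bindings = deps
        return func
    return mark


def resolve(rule, environ):
    """Compute `rule` for this request, resolving its own bindings first.

    Results are cached in the environ, so a rule shared by several others
    runs at most once per request.
    """
    cache = environ.setdefault("example.binding_cache", {})
    if rule not in cache:
        deps = getattr(rule, "bindings", None)
        if deps is None:
            cache[rule] = rule(environ)  # plain rule: a function of the environ
        else:
            kwargs = {name: resolve(dep, environ) for name, dep in deps.items()}
            cache[rule] = rule(**kwargs)
    return cache[rule]


# Generic, library-level rule: the parsed query-string "form".
def form(environ):
    return parse_qs(environ.get("QUERY_STRING", ""))


# Hypothetical stand-in for a database table.
PRODUCTS = {"42": "Teapot", "7": "Lamp"}


# Application-specific rule built on the generic one.
@uses(form=form)
def product(form):
    return PRODUCTS.get(form.get("id", [""])[0])


# A controller: it just says what it needs and what it does with it.
@uses(product=product)
def show_product(product):
    return "Product: %s" % (product or "not found")
```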

Where All This Is Going

Now, if you look at where all this is going, you'll see that you're going to end up with a very small application body: just the code that actually does things with the information that came in, and decides what to send back out.

Something, in fact, that looks very much like a "controller" would in a non-WSGI, full-stack web framework...  yet isn't locked into one particular full-stack framework.

Now, I don't know how clear any of the above was without code examples.  (Probably not very.)  But the endgame that I'm trying to describe is a future in which both "full stack" and "WSGI-centric" frameworks use a common protocol to provide their features to applications.

And, more importantly, a future where full-stack features do not require learning a full stack framework.

And where every application is its own framework.

In effect, the binding protocol is a tool that allows every app to define its own embedded DSL: the set of high-level data objects and operations that it needs in order to do whatever it does.

And these high-level, application-specific objects and operations are composed of lower-level, domain-generic objects and operations (such as form parsers and validators, URL parameter extractors, session and cookie managers, etc.), obtained from libraries or frameworks.

And all of these objects are passed around via the environment and binding rules, while retaining WSGI Lite calling signatures...  making the entire thing "WSGI all the way down".

And yet, the code contained in those applications would not look like "WSGI" as we know it today.  For example:

@lite(
    user = myapp.authorized_user,
    output = myapp.format_chooser,
)
def todo_list(environ, user, output):
    return output(user.todo_items())

Or, perhaps the Python 3 version would look like this:

@lite
def todo_list(
        environ,
        user:   myapp.authorized_user,
        output: myapp.format_chooser
    ):
    return output(user.todo_items())

Neither of these looks anything like "WSGI" code as we know it today; it's more like a full-stack framework's code. But where the bindings in a full-stack framework are implicit (like automatically formatting the output with a template or turning it into JSON), all of the bindings here are explicit.
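For the curious, here's one way a decorator could support both of the `todo_list` forms above.  It is a hypothetical stand-in rather than WSGI Lite's actual implementation: `myapp` is faked with a `SimpleNamespace`, `authorized_user` always returns the same user, `format_chooser` picks JSON or plain text from the Accept header, and the app function is assumed to return a `(status, headers, body)` tuple.

```python
import json
from types import SimpleNamespace


def lite(func=None, **rules):
    """Hypothetical stand-in for the @lite decorator; not WSGI Lite's code.

    Usable as @lite(name=rule, ...) or as bare @lite with rules given as
    parameter annotations. Each rule is a callable taking the environ.
    """
    def decorate(f):
        bindings = dict(rules)
        # Python 3 form: treat callable parameter annotations as rules.
        for name, rule in f.__annotations__.items():
            if name != "return" and callable(rule):
                bindings.setdefault(name, rule)

        def wsgi_app(environ, start_response):
            kwargs = {name: rule(environ) for name, rule in bindings.items()}
            status, headers, body = f(environ, **kwargs)
            start_response(status, headers)
            return body
        return wsgi_app
    return decorate(func) if func is not None else decorate


class User:
    def __init__(self, items):
        self._items = items

    def todo_items(self):
        return list(self._items)


def authorized_user(environ):
    # Hypothetical: a real rule would check a session or credentials.
    return User(["write post", "fix typo"])


def format_chooser(environ):
    """Return a renderer chosen from the Accept header (hypothetical rule)."""
    if "application/json" in environ.get("HTTP_ACCEPT", ""):
        return lambda data: ("200 OK", [("Content-Type", "application/json")],
                             [json.dumps(data).encode("utf-8")])
    return lambda data: ("200 OK", [("Content-Type", "text/plain; charset=utf-8")],
                         ["\n".join(data).encode("utf-8")])


myapp = SimpleNamespace(authorized_user=authorized_user,
                        format_chooser=format_chooser)


@lite(user=myapp.authorized_user, output=myapp.format_chooser)
def todo_list(environ, user, output):
    return output(user.todo_items())


@lite
def todo_list_annotated(environ,
                        user: myapp.authorized_user,
                        output: myapp.format_chooser):
    return output(user.todo_items())
```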

And not only is explicit better than implicit, but...

Readability Counts!

You can see right away, for example, that this app is using some sort of chooser to render the output in some request-determined format, and you can track down the relevant code, without having to first learn all of the implicit knowledge of a particular framework's construction.

And, the point of this app function is immediately obvious - it displays a user's todo list. (Something that would otherwise be hidden under a pile of web I/O code, if this were written to plain WSGI or with a WSGI-centric library or framework.)
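For contrast, here's roughly what the same todo list might look like in plain WSGI.  It's a hypothetical sketch (REMOTE_USER-based authentication and an in-memory `TODOS` table are assumptions), but notice how the one line that actually fetches the items is surrounded by authentication and format-selection plumbing.

```python
import json

# Hypothetical in-memory data standing in for a real database.
TODOS = {"ada": ["write post", "fix typo"]}


def todo_list(environ, start_response):
    # Authentication: here, simply trust REMOTE_USER as set by the server.
    username = environ.get("REMOTE_USER")
    if username not in TODOS:
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"Please log in"]
    items = TODOS[username]  # the actual point of the app
    # Output-format selection, done by hand.
    if "application/json" in environ.get("HTTP_ACCEPT", ""):
        body = json.dumps(items).encode("utf-8")
        content_type = "application/json"
    else:
        body = "\n".join(items).encode("utf-8")
        content_type = "text/plain; charset=utf-8"
    start_response("200 OK", [("Content-Type", content_type),
                              ("Content-Length", str(len(body)))])
    return [body]
```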

And what this means is that, if this approach becomes a focal point for Python web development, being a Python web programmer would not be a matter of being a "Django developer" or "TurboGears developer" or "Pyramid developer" or any other sort of developer...

Other than a Python developer.

Because any Python developer could pick this up, without having to have all the implicit, framework-specific knowledge already in their head.

And hopefully, this will help get us to a situation where, instead of people saying, "you should use Python for your web app because framework X is great"...

People will say, "you should use Python for your web app because it lets you focus on what your application is really doing, and no matter what libraries you use, your code will be readable and maintainable, even by people who haven't used those libraries."

Or maybe just, "you should use Python for your web app because it's a great language for web development!"

Plumbing The Pipe Dream

Now, is all that just a pipe dream?

Maybe so. After all, there are still a lot of hurdles between here and there!

(For starters, I think that the actual binding protocol probably still needs some work!)

But if you want to make a "pipe" dream real, you've got to start with the requirements for the plumbing.

So right now, I'm collecting use cases from frameworks as I encounter them, to see what services the popular frameworks provide, and how they could be expressed as bindings.

But I'm also really interested in the problems that such frameworks have, in terms of how they currently communicate state, configuration, and other information to user code. Are there any open issues the binding protocol could solve now, or could solve with some additions?

Because that's what's really going to make the difference to adoption here. The authors of established libraries and frameworks aren't going to change things just because I said this is a neat idea!

But if we can make the protocol solve some existing problems -- like helping to get rid of thread-local objects, for example -- then folks have another reason to get on board with a common protocol, besides it being a common protocol.

So, that's the interesting question that lies ahead:

Do you have any warts in your current app, library, or framework that this might help you solve? Or a feature you think it could help you add?

Leave me a comment here, or drop me an email via the Web-SIG!