Sunday, September 18, 2005

The New Kid on the Templating Block

When Kevin Dangoor sent me the TurboGears announcement yesterday, and I went to check the site out, the first thing that impressed me was the page on the Kid template language. I just had to click over to the main Kid site to check it out.

The truth is, I didn't really want to be impressed, because I'd written another XML template language that's very sophisticated and extensible and all that, and I was not-so-unconsciously looking for some defect in Kid to pounce on. But there really weren't very many, if any. In fact, as far as I can tell at this point, Kid appears to be (roughly) power-equivalent with my own PWT (peak.web.templates) language, but much easier to understand, use, and extend. It's more Pythonic, too.

I'd previously been under the impression that Kid was just a TAL offshoot mixed with embedding Python code instead of TALES. And maybe that was true at one time, but it sure isn't now. Ryan Tomayko has actually found a way to do constrained embedding of Python in processing instructions, such that it can't devolve into PHP-style slop. Other constrained embeddings, in attributes, text, and TAL-style control attributes, complete the mix. I have to admit, I'm really impressed with the basic design of the language, and very tempted to ditch PWT in favor of either Kid or something that rips off its design. :)

See, it turns out that in PWT I seem to have ended up creating another one of those XML-based domain-specific language boat anchors. I tried to make a language that was simple and very peak.web-centric, but over time it accreted more and more functionality that probably could've been avoided if I'd used a more Python-based approach. Although PWT integrates security access checks on attributes, and wraps objects in dynamically-selected views, these would probably be straightforward things to add to Kid. Indeed, since most of the viewing and security features of PWT are implemented in peak.web's viewing context objects anyway, in Kid it would just be a question of using those features, perhaps with a little API cleanup.

At this point I haven't actually used Kid for anything, nor read enough of its source code to really understand its execution model. I gather that its templates are compiled to Python code for speed, which is certainly nice, but I suspect it may then lose some speed compared to PWT by generating DOM events rather than outputting strings. (PWT gloms adjacent non-dynamic XML elements and text snippets into a single string constant for output.) It's not clear to me yet when match templates get applied - is it at compile time or execution time? I guess it must be execution, because there's dynamic manipulation of the DOM tree.
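To make the "glomming" idea concrete, here is a minimal sketch in plain Python. It is not PWT's actual code: the `coalesce` and `render` names and the list-of-parts template representation are made up for illustration, and it does no XML escaping. Static parts are strings, dynamic parts are callables taking a context, and adjacent static parts are merged once, at "compile" time.

```python
def coalesce(parts):
    """Merge runs of adjacent static strings; leave dynamic parts (callables) alone."""
    merged = []
    buffer = []
    for part in parts:
        if isinstance(part, str):
            buffer.append(part)
        else:
            if buffer:
                merged.append("".join(buffer))
                buffer = []
            merged.append(part)
    if buffer:
        merged.append("".join(buffer))
    return merged


def render(compiled, context):
    """Emit static strings as-is and call each dynamic part with the context."""
    return "".join(p if isinstance(p, str) else str(p(context)) for p in compiled)


# Hypothetical template: five static snippets around one dynamic insertion.
template = ["<ul>", "<li>", "Hello, ", lambda ctx: ctx["name"], "</li>", "</ul>"]
compiled = coalesce(template)  # three parts: one string, the callable, one string
html = render(compiled, {"name": "world"})
```

The point of the exercise is that rendering only has to touch three parts instead of six, and the saving grows with the amount of static markup between insertions.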

And I'm not sure how I'd use Kid with peak.web's resource system, which is designed to allow dynamic skinning via components supplied in eggs. That is, it should be possible to apply a layout chosen at runtime, whereas Kid appears to only allow choosing a layout at compile time. Although I suppose you could implement the dynamicness by having the Kid layout look up and invoke a function, so that's not so bad.

I do see why Ryan considers the extends/match functionality rough, though. I'd personally prefer it to take a more PWT-ish approach. In PWT, you apply DOMlets by tagging child elements as parameters, and then the invoked DOMlet can call those elements like functions. This is less powerful than Kid's match construct, in that the DOMlet doesn't get to manipulate the DOM structure of parameters - they're opaque. On the other hand, it means the template author retains control of their template, and the system can be simpler and do streaming output instead of manipulating sub-DOMs. I can envision a variant of Kid that's generator based, yielding Unicode snippets instead of generating SAX events.
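As a rough illustration of the generator-based idea, here is a sketch in plain Python. It is neither PWT nor Kid code, and `page_layout`, `title`, and `body` are hypothetical names. The DOMlet receives its parameters as callables, calls them like functions, and never looks inside what they produce, so output can stream as text snippets.

```python
def page_layout(title, body):
    """A hypothetical DOMlet: calls its parameters, never inspects their output."""
    yield "<html><head><title>"
    yield from title()
    yield "</title></head><body>"
    yield from body()
    yield "</body></html>"


def title():
    # A parameter: an opaque chunk of template the DOMlet can invoke.
    yield "Report"


def body():
    # Another parameter, this one producing several snippets.
    for n in range(3):
        yield "<p>row %d</p>" % n


# Snippets could be written to a socket one at a time; here they are joined.
html = "".join(page_layout(title, body))
```

The tradeoff described above shows up directly: `page_layout` can decide whether, when, and how often to call `body`, but it cannot rewrite the markup that `body` emits.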

Yeah, I think I would personally prefer to ditch the extends/match pair. I'm not sure if Kid allows you to use "def" inside a "replace", but if it did, and it evaluated the "replace" expression after the nested def's were executed, you would then have a rough equivalent to DOMlets, although certain PWT features might be more awkward to express. (For example, PWT's "content:is" would require both a "py:def" and a "py:strip" in Kid.) Anyway, you would just do something like py:replace="domlet(arg1,arg2,...)" and use py:def="arg1" etc. on the nested elements. Or perhaps there could be a py:apply="domlet" and it would be called with keyword arguments for the nested def's automatically.

Or maybe not. Thinking about how I'd implement DOMlets in Kid makes me realize just how awkward they are compared to Kid's match facility. I think perhaps I've been blinded by premature optimization, because in my experience Python DOMs are slow as molasses compared to streaming text. The real difference between Kid's extends/match and DOMlets is that DOMlets are designed to allow the XML content they manipulate to be completely opaque, so as to avoid any DOM manipulation at runtime. This means that the PWT language has to have a predefined way ("content:is" and "this:is" attributes) to identify chunks of XML to be manipulated, so that all the matching happens at compile time.

Amusingly, PWT is actually more powerful in principle here, because an outer DOMlet can pass parameters to inner DOMlets at runtime, whereas Kid is limited to postprocessing DOM segments that are already generated. Outer DOMlets also can control the parsing of their contents, allowing new mini-languages to be created. Kid can do this too, but apparently only by "interpreting" sub-DOMs, not by "compiling" them. So, PWT is actually an extensible language, whereas Kid is not. At first this might seem like an advantage to PWT, but in fact it's only a (potential) performance advantage, because Kid allows you to create new interpreted mini-languages as an alternative to extending Kid itself.

Finally, I think Kid is somewhat more verbose than PWT, in that it sometimes requires multiple constructs where PWT only needs one, or explicit parameter passing that PWT does not. On the other hand, this isn't necessarily a disadvantage either. Explicit parameter passing seems to make Kid templates easier to follow, while PWT's implicit "current object" (stolen from Twisted's "Woven") has to be mentally tracked as you trace through subtemplates.

So as you can see, as I traced through my comparison, I found that each thing I first thought to be a disadvantage for Kid turned out to be not so much of a disadvantage. I have the general impression that anything PWT can do, there's probably a way to do it just as cleanly -- maybe more so -- with Kid.

Indeed, the only real limitation I've found so far is in error handling. PWT's "content:uses" and "this:uses" attributes allow you to make an element optional, if a certain value is "not found" or "not allowed", by trapping those errors during the evaluation of the path expression. ZPT's TALES expression language lets you do something similar, in a more verbose way. Kid doesn't appear to have a comparable facility at all, unless you emulate it by creating an interpreted mini-language. This is one of the few places where it seems like it might really be worse to have distinct mini-languages rather than a single extensible language.
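For readers who haven't seen this style of error trapping, here is a minimal sketch of the behavior in plain Python. It is not PWT's implementation: the `NotFound` and `NotAllowed` classes, the slash-separated path syntax, and the underscore rule standing in for a security check are all assumptions made for the example.

```python
class NotFound(Exception):
    """Raised when a step of the path does not exist."""


class NotAllowed(Exception):
    """Raised when a step of the path is off limits (stand-in for a security check)."""


def lookup(context, path):
    """Walk a slash-separated path through nested dictionaries."""
    value = context
    for name in path.split("/"):
        if name.startswith("_"):
            raise NotAllowed(path)
        try:
            value = value[name]
        except (KeyError, TypeError):
            raise NotFound(path)
    return value


def optional(render_element, context, path):
    """Render the element only if the path resolves; otherwise emit nothing."""
    try:
        value = lookup(context, path)
    except (NotFound, NotAllowed):
        return ""
    return render_element(value)


context = {"user": {"name": "Ann", "_password": "secret"}}
bold = lambda value: "<b>%s</b>" % value
shown = optional(bold, context, "user/name")
missing = optional(bold, context, "user/email")
hidden = optional(bold, context, "user/_password")
```

Only the two lookup errors are trapped, so any other exception raised while rendering still propagates instead of silently producing an empty element.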

However, the big picture is pretty clear. Just as I've previously ranted about Java programmers bringing their past perspectives to Python, I too have been bringing outdated coping strategies to my web templating design. My experience was colored by things like having developed applications that needed to process thousands of variable insertions and conditionals in a single page, and do it lots of times per second. (It was also colored by how shockingly slow the earliest versions of DTML were at doing them.)

But the applications that had those requirements were written to support "version 4" browsers (IE 4 and NS 4), and would most likely be done today with JSON and AJAX if the target audience's browsers allowed it. Which means that the slow parts of the rendering would have happened client-side and not been a performance issue for the templating system anyway. Other kinds of complex pages -- like portals with lots of portlets -- could probably be managed by caching key DOM parts.

So, starting with the assumption that a templating system's performance needs to be measured in thousands of inserts per second on commodity hardware may not be the best way to get a really usable templating system. Proving once again, I suppose, that premature optimization is the root of all evil.

At the same time, I'm quite curious about what kind of performance Kid would actually get with heavy use of match templates, and how it would compare to using XSLT or DOMlets. (Of course, if you can get away with using browser-side XSLT, I'm sure there's no comparison in terms of server load.) But I'm definitely intrigued enough to want to take a closer look at Kid as a possible replacement for PWT, especially in conjunction with "dynamically-scoped" or "contextual" variables. (A new idea for Python that I'm working on, and which I'll post more about later. The basic idea is somewhat like Lisp dynamic variables, but with the ability for pseudothreading systems like Stackless, Twisted and peak.events to easily switch the set of all active variables when context-switching.)
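To give a feel for what a contextual variable looks like in use, here is a small sketch built on the standard library's `contextvars` module. That module is much newer than this post and is not the design being worked on here; it is only used as a stand-in with a similar flavor, and `current_skin` and the render functions are hypothetical names.

```python
import contextvars

# A variable whose value depends on the active context, not on the call signature.
current_skin = contextvars.ContextVar("current_skin", default="plain")


def render():
    # Deeply nested template code can read the variable without it being passed in.
    return "<body class='%s'/>" % current_skin.get()


def render_with_skin(skin):
    # Copy the whole set of active variables, then change one inside the copy.
    ctx = contextvars.copy_context()

    def run():
        current_skin.set(skin)
        return render()

    return ctx.run(run)


fancy = render_with_skin("fancy")
plain = render()  # the outer context never saw the change
```

Because a context object holds every variable at once, a scheduler that switches tasks only needs to switch which context is active, which is the property that matters for pseudothreading systems.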

Saturday, September 17, 2005

What's a megaframework?

Okay, so TurboGears is cool. Not because it's another Python web framework, but because it's a... megaframework? Okay, so it's not a framework, but what the heck is a megaframework?

Not that I have a better name for it, mind you. It is definitely a new thing in the Python world - a compelling project built almost entirely from separately-packaged Python components, powered by setuptools and EasyInstall.

It seems to me that I once blogged about how most of a thing's value comes from its incidentals - creature comforts like a good installation process. Or maybe I didn't, but it's true anyway, and it's a big reason that I've put so much effort into setuptools and especially EasyInstall. TurboGears might not be the web framework to end all Python web frameworks, but it sure as heck doesn't reinvent any wheels! While there were other efforts out there to pull together some of the very same tools to make a Rails-killer, Kevin's version of the approach has installation instructions that basically amount to:
  1. Install Python and EasyInstall
  2. Tell EasyInstall to install TurboGears
And that's a Really Cool Thing.

Of course, as I've mentioned here previously, Django also uses setuptools, and so does Paste. It seems it's becoming de rigueur for new Python web frameworks, and rather rightly so. Even Python web applications like the increasingly ubiquitous Trac are using it. I find it rather amusing that there are actually more people using setuptools and WSGI to implement web frameworks, than there are people implementing applications with all the web frameworks I've ever written! Of course, I also feel like a proud papa. Neither of my kids (WSGI and setuptools) is flashy or outspoken, and as long as they do their jobs, you hardly notice that they're there. But things go better when they're around, and that's a nice feeling to have.

Update as of September 18th: setuptools 0.6a2 is now out; it adds support for automatically generating platform-appropriate console scripts (or .exe wrappers on Windows) from your "main" functions, which can be in any module you like. Just list your main functions in setup.py and it'll do the rest. No more script files to create, no more filename munging to put .py extensions on Windows scripts while trying to leave them off for other OSes. Future versions will support a similar mechanism for GUI scripts (similar to .pyw on Windows).
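Here is a sketch of what that looks like, assuming a hypothetical module named `hello_tool` and a script named `hello`. The `entry_points` declaration uses the `console_scripts` group with the `name = module:function` syntax from the setuptools documentation; it is shown as a comment so that the module itself stays runnable on its own.

```python
# hello_tool.py: a hypothetical module containing a "main" function.
import sys


def main(argv=None):
    """The function that the generated `hello` script would call."""
    args = sys.argv[1:] if argv is None else argv
    name = args[0] if args else "world"
    print("Hello, %s!" % name)
    return 0


# setup.py for the same project (kept in a comment in this sketch):
#
# from setuptools import setup
#
# setup(
#     name="hello-tool",
#     version="0.1",
#     py_modules=["hello_tool"],
#     entry_points={
#         "console_scripts": ["hello = hello_tool:main"],
#     },
# )
```

After installation there would be a `hello` command on the path (with an .exe wrapper on Windows), and no hand-written script file anywhere in the project.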

Wednesday, September 14, 2005

Setuptools 0.6a1 has been released

See the announcement and documentation links, or visit the PyPI setuptools page.

Sunday, September 04, 2005

Made to SCALE

Last week, I started to describe SCALE, the Syntax for Configuration And Language Extension. I haven't had enough time yet to actually implement it, but this evening I was able to whip up an implementation of its low-level parser. As it turns out, the low-level parser is surprisingly useful on its own. For instance, the current parser documentation includes an example of a Python source code reformatter that gets rid of leading tabs and changes all indentation to a uniform depth. The example reformatter is just seven lines long, but it knows the difference between lines that begin statements and those that continue statements, and it's smart about multi-line strings as well.
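To show the kind of distinction involved, here is a comparable sketch that uses the standard library's `tokenize` module instead of the SCALE parser. It is not the seven-line example from the parser documentation, and `reindent` is a name made up here. It re-indents only the lines that begin a statement, and leaves continuation lines, the insides of multi-line strings, and comment-only lines exactly as they were.

```python
import io
import tokenize


def reindent(source, width=4):
    """Give every line that begins a statement a uniform indentation depth."""
    depth = 0
    new_statement = True
    starts = {}  # line number -> block depth, for lines that begin a statement
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type == tokenize.NEWLINE:
            new_statement = True  # a logical line just ended
        elif tok.type in (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER):
            continue
        elif new_statement:
            starts[tok.start[0]] = depth
            new_statement = False
    out = []
    for number, line in enumerate(source.splitlines(keepends=True), 1):
        if number in starts:
            line = " " * (width * starts[number]) + line.lstrip(" \t")
        out.append(line)
    return "".join(out)


# Tab-indented input with a statement continued across two lines.
messy = "def f():\n\tif True:\n\t\treturn (1 +\n\t\t\t2)\n\treturn 0\n"
tidy = reindent(messy)
```

The continuation line keeps its original whitespace in this sketch, which is harmless to Python because indentation inside parentheses is not significant.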

Basically, the low-level parser for SCALE understands "Python-like" syntax, in that it can handle any language that uses Python's rules for string quotation, comments, line continuations, indented blocks, identifiers, and numeric constants. (Oh, and PEP 263-style source encoding declarations.) This means, for example, that you can build higher-level languages that use Python to express parts of their semantics. If you wanted to implement a parser generator that took a grammar file with embedded blocks of Python in it, you'd only need to walk the parsed block to extract the grammar rules, then use the dsl.detokenize() function from SCALE to turn the nested Python blocks back into strings - at whatever indentation level you need in your generated output.

That wasn't so much what I originally intended. I first set out to just make a simple configuration language based on Python expressions: SCALE itself. However, in order to implement SCALE, I found that it was easier to test the low-level parsing facilities if they were kept fairly independent of the language I was planning to build on them. And instead of taking a SAX-like approach, I ended up creating simple abstract syntax trees instead. One advantage to this is that as long as the overall lexical structure is correct, you can actually do lazy parsing of statements if you want to. (Because the low-level parser identifies statements and blocks without needing to "understand" any of the contents.)
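The "structure first, contents later" idea can be sketched in a few lines. This is only an illustration, not the SCALE parser: `parse_blocks` is a hypothetical name, and the sketch assumes space-only indentation and ignores string quoting, comments, and line continuations, all of which the real parser handles. Each statement is kept as an unparsed string alongside its nested block.

```python
def parse_blocks(text):
    """Build a tree of (statement_text, children) from indentation alone."""
    root = []
    stack = [(-1, root)]  # (indent of the owning statement, its list of children)
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines carry no structure
        indent = len(raw) - len(raw.lstrip(" "))
        # Close every block whose owner is indented at least as far as this line.
        while indent <= stack[-1][0]:
            stack.pop()
        node = (raw.strip(), [])
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root


# A made-up configuration file in a Python-like layout.
config = (
    "server:\n"
    "    host = 'example.org'\n"
    "    port = 8080\n"
    "logging:\n"
    "    level = 'debug'\n"
)
tree = parse_blocks(config)
```

Nothing here has looked inside `host = 'example.org'` yet; a higher-level language can walk the tree and parse only the statements it actually needs, which is what makes lazy parsing possible.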

I still need to implement a few more low-level utilities like eval_stmt() and exec_block() before I implement the SCALE parser. I also need to finalize my ideas for what the default semantics of a SCALE block should be, based on the WSGI use cases. Most likely this will involve resuming the earlier Web-SIG discussion about deployment configuration formats, in particular the debate about positional vs. keyword arguments, whether pipelines are allowed by default, etc.

In the meantime, though, the low-level parser is available via Subversion for experimenters to play with.

Update as of 2005-09-04: I've now also implemented parsing of SCALE declaration statements, so you can also experiment with creating and parsing SCALE-based configuration formats, although the semantic support (i.e. eval/exec, scope lookup etc.) is still nonexistent. In other words, you can parse SCALE files and determine their overall syntactical validity, but there's nothing there to let you actually execute a SCALE file yet. I probably won't get around to that until next weekend, though in the meantime I may experiment with creating a SCALE dialect for defining Chandler UI components.