Sunday, September 04, 2005

Made to SCALE

Last week, I started to describe SCALE, the Syntax for Configuration And Language Extension. I haven't had enough time yet to actually implement it, but this evening I was able to whip up an implementation of its low-level parser. As it turns out, the low-level parser is surprisingly useful on its own. For instance, the current parser documentation includes an example of a Python source code reformatter that gets rid of leading tabs and changes all indentation to a uniform depth. The example reformatter is just seven lines long, but it knows the difference between lines that begin statements and lines that continue them, and it's smart about multi-line strings as well.
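To give a rough idea of what such a reformatter has to keep track of, here is a minimal sketch built on the standard library's tokenize module rather than on SCALE's parser. The function name reindent and its width parameter are made up for illustration, and it is a good deal longer than seven lines. It rewrites the indentation only of lines that begin a statement, and leaves continuation lines, the insides of multi-line strings, and comment-only lines exactly as they were.

```python
import io
import tokenize


def reindent(source, width=4):
    """Re-indent lines that begin statements to `width` spaces per level.

    Hypothetical helper, not part of SCALE: it uses the stdlib tokenizer
    to tell statement-starting lines apart from continuation lines.
    """
    depth = 0
    at_statement_start = True
    starts = {}  # physical line number (1-based) -> block depth

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type == tokenize.NEWLINE:
            # End of a logical line; the next real token begins a statement.
            at_statement_start = True
        elif tok.type in (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER):
            continue
        elif at_statement_start:
            starts[tok.start[0]] = depth
            at_statement_start = False

    lines = source.splitlines(keepends=True)
    for lineno, level in starts.items():
        stripped = lines[lineno - 1].lstrip(" \t")
        lines[lineno - 1] = " " * (width * level) + stripped
    return "".join(lines)


if __name__ == "__main__":
    sample = "if x:\n\ty = (1,\n\t\t2)\n\ts = '''a\n\tb'''\nz = 3\n"
    print(reindent(sample))
```

The tokenizer does the hard part here: it only emits NEWLINE at the end of a complete statement, so a line inside parentheses or inside a triple-quoted string is never mistaken for the start of a new one.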

Basically, the low-level parser for SCALE understands "Python-like" syntax, in that it can handle any language that uses Python's rules for string quotation, comments, line continuations, indented blocks, identifiers, and numeric constants. (Oh, and PEP 263-style source encoding declarations.) This means, for example, that you can build higher-level languages that use Python to express parts of their semantics. If you wanted to implement a parser generator that took a grammar file with embedded blocks of Python in it, you'd only need to walk the parsed block to extract the grammar rules, then use the dsl.detokenize() function from SCALE to turn the nested Python blocks back into strings - at whatever indentation level you need in your generated output.
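As a rough illustration of the "whatever indentation level you need" part, here is a sketch of re-emitting an embedded block of Python inside generated code. It does not use dsl.detokenize(); it stands in the standard library's textwrap module instead, and the wrap_action function and the sample rule are hypothetical. Unlike a token-aware approach, this naive version also shifts the lines inside multi-line strings, so treat it as a picture of the goal rather than a faithful equivalent.

```python
import textwrap


def wrap_action(name, action_source, indent="    "):
    """Wrap an embedded block of Python in a generated function.

    Hypothetical helper: the block is dedented to column zero, then
    re-indented to sit inside the generated `def`.
    """
    body = textwrap.indent(textwrap.dedent(action_source), indent)
    return "def %s(tokens):\n%s" % (name, body)


if __name__ == "__main__":
    # An action block as it might appear, deeply indented, in a grammar file.
    embedded = "        total = sum(tokens)\n        return total\n"
    generated = wrap_action("rule_expr", embedded)
    print(generated)

    namespace = {}
    exec(generated, namespace)
    print(namespace["rule_expr"]([1, 2, 3]))
```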

That wasn't so much what I originally intended. I first set out to just make a simple configuration language based on Python expressions: SCALE itself. However, in order to implement SCALE, I found that it was easier to test the low-level parsing facilities if they were kept fairly independent of the language I was planning to build on them. And rather than taking a SAX-like approach, I ended up creating simple abstract syntax trees. One advantage of this is that, as long as the overall lexical structure is correct, you can actually do lazy parsing of statements if you want to, because the low-level parser identifies statements and blocks without needing to "understand" any of their contents.
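To make the "statements and blocks without understanding the contents" idea concrete, here is a small sketch, again using the standard library's tokenize module rather than SCALE's own parser. The parse_blocks function and the sample configuration text are invented for illustration. It builds a nested tree of (statement text, child statements) pairs purely from the lexical structure, so the statements don't have to be valid Python, and nothing is interpreted until some later code decides to look at a particular statement.

```python
import io
import tokenize


def parse_blocks(source):
    """Return a list of (statement_text, children) nodes.

    Hypothetical helper: only NEWLINE, INDENT, and DEDENT tokens shape
    the tree; statement contents are kept as text and never interpreted.
    Tokens are re-joined with single spaces, so original spacing is lost.
    """
    root = []
    stack = [root]   # stack[-1] is the block currently being filled
    current = []     # tokens of the statement being collected

    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            # The new block belongs to the most recently added statement.
            stack.append(stack[-1][-1][1])
        elif tok.type == tokenize.DEDENT:
            stack.pop()
        elif tok.type == tokenize.NEWLINE:
            text = " ".join(t.string for t in current)
            stack[-1].append((text, []))
            current = []
        elif tok.type not in (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER):
            current.append(tok)
    return root


if __name__ == "__main__":
    config = (
        "server main:\n"
        "    port = 8080\n"
        "    handler app:\n"
        "        debug = True\n"
        "log = 'x'\n"
    )
    for statement, children in parse_blocks(config):
        print(statement, children)
```

Note that "server main:" is not a legal Python statement, yet it still parses into the tree, because only the quoting, continuation, and indentation rules are being applied.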

I still need to implement a few more low-level utilities like eval_stmt() and exec_block() before I implement the SCALE parser. I also need to finalize my ideas for what the default semantics of a SCALE block should be, based on the WSGI use cases. Most likely this will involve resuming the earlier Web-SIG discussion about deployment configuration formats, in particular the debate about positional vs. keyword arguments, whether pipelines are allowed by default, etc.

In the meantime, though, the low-level parser is available via Subversion for experimenters to play with.

Update as of 2005-09-04: I've now also implemented parsing of SCALE declaration statements, so you can experiment with creating and parsing SCALE-based configuration formats, although the semantic support (i.e., eval/exec, scope lookup, etc.) is still nonexistent. In other words, you can parse SCALE files and determine their overall syntactic validity, but there's nothing there to let you actually execute a SCALE file yet. I probably won't get around to that until next weekend, though in the meantime I may experiment with creating a SCALE dialect for defining Chandler UI components.