Last week, I started to describe SCALE, the Syntax for Configuration And Language Extension. I haven’t had enough time yet to implement it fully, but this evening I was able to whip up an implementation of its low-level parser. As it turns out, the low-level parser is surprisingly useful on its own. For instance, the current parser documentation includes an example of a Python source code reformatter that gets rid of leading tabs and changes all indentation to a uniform depth. The example reformatter is just seven lines long, but it knows the difference between lines that begin statements and lines that continue them, and it’s smart about multi-line strings as well.
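To give a feel for what such a reformatter has to do, here is a rough sketch of the same idea. It does not use SCALE's parser (whose API isn't shown in this post); it leans on the standard library's tokenize module instead, and the function name reindent and its width parameter are made up for illustration. Only the first physical line of each logical statement is re-indented, so continuation lines and the insides of multi-line strings are left alone.

```python
import io
import tokenize

def reindent(source, width=4):
    """Re-indent statement-starting lines to `width` spaces per block level.

    Hypothetical helper built on the stdlib tokenizer, not on SCALE.
    """
    lines = source.splitlines(keepends=True)
    out = list(lines)
    depth = 0
    at_statement_start = True
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type == tokenize.NEWLINE:
            # End of a logical line: the next real token begins a statement.
            at_statement_start = True
        elif tok.type in (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER):
            # Blank lines and comment-only lines are left untouched here.
            continue
        elif at_statement_start:
            row = tok.start[0] - 1
            out[row] = " " * (width * depth) + lines[row].lstrip()
            at_statement_start = False
    return "".join(out)

messy = "if x:\n\ty = (1,\n\t\t2)\n\tz = '''a\n\tb'''\n"
tidy = reindent(messy)
```

In this sketch the tab before `y` and `z` becomes four spaces, while the continuation line inside the parentheses and the second line of the triple-quoted string keep their original tabs.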
Basically, the low-level parser for SCALE understands “Python-like” syntax, in that it can handle any language that uses Python’s rules for string quotation, comments, line continuations, indented blocks, identifiers, and numeric constants. (Oh, and PEP 263-style source encoding declarations.) This means, for example, that you can build higher-level languages that use Python to express parts of their semantics. If you wanted to implement a parser generator that took a grammar file with embedded blocks of Python in it, you’d only need to walk the parsed block to extract the grammar rules, then use the dsl.detokenize() function from SCALE to turn the nested Python blocks back into strings – at whatever indentation level you need in your generated output.
That wasn’t so much what I originally intended. I first set out to just make a simple configuration language based on Python expressions: SCALE itself. However, in order to implement SCALE, I found that it was easier to test the low-level parsing facilities if they were kept fairly independent of the language I was planning to build on them. And rather than taking a SAX-like approach, I ended up creating simple abstract syntax trees. One advantage of this is that as long as the overall lexical structure is correct, you can actually do lazy parsing of statements if you want to, because the low-level parser identifies statements and blocks without needing to “understand” any of their contents.
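The lazy-parsing point can be illustrated with a small sketch. Again, this is not SCALE's parser: it uses the standard tokenize module to find statement boundaries purely lexically, and the name split_statements is invented for the example. Each statement is kept as raw text and only handed to a real parser (here ast.parse) when somebody asks for it, so a statement with bad contents does not stop the others from being found.

```python
import ast
import io
import tokenize

# Token types that never start or end a logical statement.
_SKIP = (tokenize.NL, tokenize.COMMENT, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER)

def split_statements(source):
    """Yield (first_line_number, raw_text) for each logical statement.

    Hypothetical helper: boundaries come from the tokenizer alone, so the
    contents of each statement are never parsed here. Block headers such
    as "if x:" are yielded as their own entries.
    """
    lines = source.splitlines(keepends=True)
    start = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NEWLINE:
            if start is not None:
                yield start, "".join(lines[start - 1:tok.start[0]])
            start = None
        elif start is None:
            start = tok.start[0]

source = "port = 8000\nbroken = 1 +\nname = ('a'\n        'b')\n"
statements = list(split_statements(source))

# Parsing happens only on demand, one statement at a time.
first_tree = ast.parse(statements[0][1])
```

All three statements are located even though the second one is not valid Python; the error only surfaces if and when that particular statement is parsed.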
I still need to implement a few more low-level utilities like eval_stmt() and exec_block() before I implement the SCALE parser. I also need to finalize my ideas for what the default semantics of a SCALE block should be, based on the WSGI use cases. Most likely this will involve resuming the earlier Web-SIG discussion about deployment configuration formats, in particular the debate about positional vs. keyword arguments, whether pipelines are allowed by default, etc.
In the meantime, though, the low-level parser is available via Subversion for experimenters to play with.
Update as of 2005-09-04: I’ve now also implemented parsing of SCALE declaration statements, so you can also experiment with creating and parsing SCALE-based configuration formats, although the semantic support (i.e. eval/exec, scope lookup etc.) is still nonexistent. In other words, you can parse SCALE files and determine their overall syntactical validity, but there’s nothing there to let you actually execute a SCALE file yet. I probably won’t get around to that until next weekend, though in the meantime I may experiment with creating a SCALE dialect for defining Chandler UI components.
I’d be interested in seeing some examples written in SCALE to see where it’s better than plain Python for doing configuration.
I have a hunch that the SCALE syntax will eliminate extra punctuation and keywords that are required for doing Python-based configuration, but I’m not sure if my hunch is correct. I’m also guessing that my hunch is missing some of the cool things that are possible.
The key differences are:
1. No statements, functions, classes, etc. are allowed.
2. Arbitrary computation or namespace manipulation can be done by the software that’s using the configuration, so that e.g. certain software-defined names are contextually available, without the configuration user having to import things. This means the configuration is arbitrarily extensible, and also that you can have implicit variables and reduce explicit coupling in the configuration. For example, you could have an expression that sets some parameter like font size, and then have a block underneath that expression that “inherits” or “acquires” that setting.
3. Arbitrary hierarchies can be specified in the configuration, and the parent/child relationship data can thus be used, without having to use clumsier methods to establish hierarchy like nested function calls, class statements, begin/end calls, etc. You simply construct the hierarchy with ‘:’ and indentation.
4. Expressions can have a context that’s an egg specifier, a reference to another file, a module, or a code block. This makes it really useful for complex configuration tasks and ones that may involve plugins to extensible applications or frameworks.
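The "acquires" idea from point 2, combined with the parent/child data from point 3, can be sketched in a few lines of plain Python. This is only an illustration of the lookup rule, not SCALE's implementation; the Node class, its lookup method, and the example names are all hypothetical.

```python
class Node:
    """A configuration node that acquires settings from its ancestors."""

    def __init__(self, name, parent=None, **settings):
        self.name = name
        self.parent = parent
        self.settings = settings
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def lookup(self, key):
        # Walk up the hierarchy until some ancestor defines the setting.
        node = self
        while node is not None:
            if key in node.settings:
                return node.settings[key]
            node = node.parent
        raise KeyError(key)

window = Node("window", font_size=12)
panel = Node("panel", window)
label = Node("label", panel, font_size=9)   # overrides the acquired value
button = Node("button", panel)              # acquires it from "window"
```

Here `button.lookup("font_size")` finds the value set on the window two levels up, while the label uses its own override. In a SCALE-style format the nesting would come from ':' and indentation rather than from passing parents around by hand.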
SCALE’s main competition in the Python space right now is hierarchical formats like ZCML, ZConfig, and nested .ini formats like ConfigObject. It also compares well against clumsy .ini kludges like the logging.config .ini format.
It’s not, however, intended to be a replacement for configuration that consists of a dozen or so name-value pairs, although it also does that job just fine as long as you put quotes around your strings! The only advantages it offers in that space are Unicode and encoding support, which of course you’d also get with pure Python.
Is SCALE hierarchical (tree-like) only?
I remember reading about semilattices in your blog (http://dirtsimple.org/2005/02/making-it-from-scratch-with-tdd-and.html)
Do you plan to have a standard way to support such structures?
SCALE allows nodes to be named, and they can then be referred to in other expressions. Depending on the application, this can be done sequentially or lazily, and the object that is bound to the name may not precisely be the same as the original expression. But yes, it definitely allows sharing and overlap; if you look at the recent “deployment configuration” discussions on the Web-SIG you’ll see some examples of the idea.
Hello Philip,
as much as I often find your concepts bright and spot-on, this one I simply don’t grok.
You invent new syntax here which A) is very close to Python (and thus can rely on parts of the standard Python machinery) but B) isn’t Python proper.
A) makes your concept hard to buy for non-Python folks (in contrast to, say, JSON or YAML – see http://www.pault.com/pault/pxml/xmlalternatives.html) while B) is a no-no for us hardcore Pythonistas.
You probably have very good reasons to pursue this approach in spite of these built-in evolutionary disadvantages – please tell us.
– TE
"""hard to buy for non-Python folks (in contrast to, say, JSON or YAML"""
I’m not trying to sell those people on it. It’s a Python format for Python programs to create domain-specific languages with, using less verbose syntax than would be possible with procedural Python, even if you used the new “with” statement in Python 2.5. “with” is probably the closest competitor to SCALE that you can get in procedural Python, but it doesn’t let you do lazy (out-of-order) evaluation.
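For comparison, here is roughly what the procedural "with" approach looks like. This is a minimal sketch with invented names (Builder, node), not code from any existing library. It builds a hierarchy well enough, but each block body runs immediately, in source order, at the moment the "with" statement is reached, which is why it can't give you lazy or out-of-order evaluation.

```python
from contextlib import contextmanager

class Builder:
    """Builds a tree of dicts from nested "with" blocks (hypothetical)."""

    def __init__(self):
        self.root = {"name": "root", "children": []}
        self._stack = [self.root]
        self.order = []  # records when each node was entered

    @contextmanager
    def node(self, name):
        child = {"name": name, "children": []}
        self._stack[-1]["children"].append(child)
        self.order.append(name)
        self._stack.append(child)
        try:
            yield child
        finally:
            self._stack.pop()

b = Builder()
with b.node("server"):
    with b.node("logging"):
        pass
    with b.node("routes"):
        pass
```

After this runs, `b.order` lists the nodes strictly in the order they were written. A parsed tree, by contrast, can be walked in any order the application likes, or not at all.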