Thursday, December 02, 2004

Python Is Not Java

I was recently looking at the source of a wxPython-based GUI application, about 45.5KLOC in size, not counting the libraries used (e.g. Twisted). The code was written by Java developers who are relatively new to Python, and it suffers from some performance issues (like a 30-second startup time). In examining the code, I found that they had done lots of things that make sense in Java, but which suck terribly in Python. Not because "Python is slower than Java", but because there are easier ways to accomplish the same goals in Python, that wouldn't even be possible in Java.

So, the sad thing is that these poor folks worked much, much harder than they needed to, in order to produce much more code than they needed to write, that then performs much more slowly than the equivalent idiomatic Python would. Some examples:
  • A static method in Java does not translate to a Python classmethod. Oh sure, it results in more or less the same effect, but the goal of a classmethod is actually to do something that's usually not even possible in Java (like inheriting a non-default constructor). The idiomatic translation of a Java static method is usually a module-level function, not a classmethod or staticmethod. (And static final fields should translate to module-level constants.)

    This isn't much of a performance issue, but a Python programmer who has to work with Java-idiom code like this will be rather irritated by typing Foo.Foo.someMethod when it should just be Foo.someFunction. But do note that calling a classmethod involves an additional memory allocation that calling a staticmethod or function does not.

    Oh, and all those Foo.Bar.Baz attribute chains don't come for free, either. In Java, those dotted names are looked up by the compiler, so at runtime it really doesn't matter how many of them you have. In Python, the lookups occur at runtime, so each dot counts. (Remember that in Python, "Flat is better than nested", although it's more related to "Readability counts" and "Simple is better than complex," than to being about performance.)

  • Got a switch statement? The Python translation is a hash table, not a bunch of if-then statments. Got a bunch of if-then's that wouldn't be a switch statement in Java because strings are involved? It's still a hash table. The CPython dictionary implementation uses one of the most highly-tuned hashtable implementations in the known universe. No code that you write yourself is going to work better, unless you're the genetically-enhanced love child of Guido, Tim Peters, and Raymond Hettinger.

  • XML is not the answer. It is not even the question. To paraphrase Jamie Zawinski on regular expressions, "Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems."

    This is a different situation than in Java, because compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain. In Python, XML is something you use for interoperability, not your core functionality, because you simply don't need it for that. In Java, XML can be your savior because it lets you implement domain-specific languages and increase the flexibility of your application "without coding". In Java, avoiding coding is an advantage because coding means recompiling. But in Python, more often than not, code is easier to write than XML. And Python can process code much, much faster than your code can process XML. (Not only that, but you have to write the XML processing code, whereas Python itself is already written for you.)

    If you are a Java programmer, do not trust your instincts regarding whether you should use XML as part of your core application in Python. If you're not implementing an existing XML standard for interoperability reasons, creating some kind of import/export format, or creating some kind of XML editor or processing tool, then Just Don't Do It. At all. Ever. Not even just this once. Don't even think about it. Drop that schema and put your hands in the air, now! If your application or platform will be used by Python developers, they will only thank you for not adding the burden of using XML to their workload.

    (The only exception to this is if your target audience really really needs XML for some strange reason. Like, they refuse to learn Python and will only pay you if you use XML, or if you plan to give them a nice GUI for editing the XML, and the GUI in question is something that somebody else wrote for editing XML and you get to use it for free. There are also other, very rare, architectural reasons to need XML. Trust me, they don't apply to your app. If in doubt, explain your use case for XML to an experienced Python developer. Or, if you have a thick skin and don't mind being laughed at, try explaining to a Lisp programmer why your application needs XML!)

  • Getters and setters are evil. Evil, evil, I say! Python objects are not Java beans. Do not write getters and setters. This is what the 'property' built-in is for. And do not take that to mean that you should write getters and setters, and then wrap them in 'property'. That means that until you prove that you need anything more than a simple attribute access, don't write getters and setters. They are a waste of CPU time, but more important, they are a waste of programmer time. Not just for the people writing the code and tests, but for the people who have to read and understand them as well.

    In Java, you have to use getters and setters because using public fields gives you no opportunity to go back and change your mind later to using getters and setters. So in Java, you might as well get the chore out of the way up front. In Python, this is silly, because you can start with a normal attribute and change your mind at any time, without affecting any clients of the class. So, don't write getters and setters.

  • Code duplication is quite often a necessary evil in Java, where you must often write the same method over and over with minor variations (usually because of static typing constraints). It is not necessary or desirable to do this in Python (except in certain rare cases of inlining a few performance-critical functions). If you find yourself writing the same function over and over again with minor variations, it's time to learn about closures. They're really not that scary.

    Here's what you do. You write a function that contains a function. The inner function is a template for the functions that you're writing over and over again, but with variables in it for all the things that vary from one case of the function to the next. The outer function takes parameters that have the same names as those variables, and returns the inner function. Then, every place where you'd otherwise be writing yet another function, simply call the outer function, and assign the return value to the name you want the "duplicated" function to appear. Now, if you need to change how the pattern works, you only have to change it in one place: the template.

    In the application/platform I looked at, just one highly trivial application of this technique could have cut out hundreds of lines of deadweight code. Actually, since the particular boilerplate has to be used by developers developing plugins for the platform, it will save many, many more hundreds of lines of third-party developer code, while simplifying what those developers have to learn.
This is only the tip of the iceberg for Java->Python mindset migration, and about all I can get into right now without delving into an application's specifics. Essentially, if you've been using Java for a while and are new to Python, do not trust your instincts. Your instincts are tuned to Java, not Python. Take a step back, and above all, stop writing so much code.

To do this, become more demanding of Python. Pretend that Python is a magic wand that will miraculously do whatever you want without you needing to lifting a finger. Ask, "how does Python already solve my problem?" and "What Python language feature most resembles my problem?" You will be absolutely astonished at how often it happens that thing you need is already there in some form. In fact, this phenomenon is so common, even among experienced Python programmers, that the Python community has a name for it. We call it "Guido's time machine", because sometimes it seems as though that's the only way he could've known what we needed, before we knew it ourselves.

So, if you don't feel like you're at least ten times more productive with Python than Java, chances are good that you've been forgetting to use the time machine! (And if you miss your Java IDE, consider the possibility that it's because your Python program is much more complex than it needs to be.)

The comment thread is now closed; it was beginning to degenerate into offtopic Python versus Java arguments.

<< Nov 26: We've Got Options
>> Dec 06: Mocking wxPython

^^ Home