dirtSimple.orgwhat stands in the way, becomes the way

The Perception of Speed

Glyph Lefkowitz (of Twisted fame) hits the nail precisely on the head, with this incisive summary of why C programmers think Python is slow. After Guido asked the python-dev list for ideas on combatting this common perception, I suggested that adding a compiler – however primitive – would help. Glyph, however, went one better; he effectively pointed out that all we really need to do is stop saying, “rewrite speed-sensitive code in C”. Instead, we should be saying “add type declarations to speed-sensitive code, and compile it with Pyrex.” As a practical matter, Pyrex is C with Python-like syntax, anyway, at least if you type-declare everything to death.

As for me, I totally don’t get the idea that Python is slow. It still boggles my mind every time I run ‘timeit’ on some operation and it comes back as taking half a microsecond on my desktop PC. My first computer couldn’t do anything in half a microsecond, even if it was a single assembly instruction! And then I programmed in interpreted Basic for the better part of a decade, using assembly for speed-sensitive parts.

“Sheesh. Why these kids today, they don’t know what it’s like. Back in my day, we had to program five miles uphill every morning through the snow to get to school… and don’t get me started about unit tests. Our unit tests were, we read the entire program and pretended to be the computer because we couldn’t afford a computer to type the program into. And we liked it!”

[Feel free to continue this rant in the comments, by contributing stereotypical rambling-old-man jokes about how things were so bad in the old days and today’s programmers are spoiled brats by comparison.]

Join the discussion
  • I think Glyph is right when it comes to the perspective of C programmers. At the same time, to non-C programmers we shouldn’t be talking much about compilation at all. For most performance issues (like 95% or more) the answer isn’t rewriting slow portions in C, it’s rewriting slow portions in Python. For most programmers, the suggested steps should be:

    1. Try it; maybe performance isn’t the issue you expect it to be.
    2. If it is, run a profiler.
    3. Look for places to leverage Python’s built-in functions. Implement algorithms in terms of Python’s built-in data structures, like dictionaries and lists.
    4. If startup is slow, do things lazily.
    5. If it’s slow after startup, cache results.
    6. Find the (small) performance-critical portions, and use some of the Python-specific tricks, like avoiding global and method lookup.
    7. Try psyco. It can’t hurt, and requires about 20 seconds of effort, including reading the necessary docs.
    8. Then, if you’ve done steps 1-7, maybe try rewriting a portion in a faster compiled language. But you’ll probably achieve your goals faster by going through steps 2-7 a second time before you try this.

    When people ask for advice about performance, the first message is often “well, you can always rewrite portions in C if you need to”, and that’s just bad advice. We need to stop saying that.

    I’ll also reiterate that it would be great if someone wrote a book about Python performance. It’d be really good for Python, *and* I think it would sell well.

  • How much do you pay a programmer to write fast(est) code so she needs much more time, and how much do you pay for a bigger machine?

    Of course if bad algorithms are used (exponential ones) thinking of a better solution is mandatory, but these exponential algs. are slow in every language… .

    The argument “C is fast therefore I use it” is the last rear up of people thinking they are the elite because they can handle pointers better than compilers/interpreters.

  • Then you should write the book!

    I don’t think reasonable people care that Python is slow, and I think that “You can always (EASILY) rewrite portions in C” is a good answer, because it speaks to those who understand that most of a program is not time critical (and, today, Nothing need be written in a lower-level language than C).

    For those who don’t understand that fact, I don’t want them clogging up Python lists (or Blogs like this)!

    … Closing: Python, for what it is, is Not slow. Explaining to C programmers that Python “is not slow” is like telling asm programmers that C is not slow. Python, and C, are slow when compared to lower-level languages; you should only use them when the speed doesn’t really matter.

  • PJE – Thanks for the support :). I am glad that we are in agreement about this, since it’s an issue that I am frequently confronted with – I hope it is a prelude to a consensus emerging in the community.

    Ian – You’re right, of course, completely, for about 80% of all code. However, there is a sizeable minority where the application domain (and hence the bottlenecks) are well-known in advance, and there are envelope-pushing things that one wants to do with the CPU. For example: scalable scene-graph representation and triangle culling for unbounded volume outdoor environments in massively multiplayer games – a problem I’ve never personally worked on, but I hear about all the time. All the optimization strategies that one would normally apply to Python are true, but they remain true even if the code is initially written in a fast, compiled language (which means C++) with an eye towards efficiency.

    For these applications, Python is totally off the map, even though the design problems are really hard and the architecture would really benefit from a Pythonic style.

    It’s true that many application programmers are just wasting a lot of time by implementing lots of interactively-executed, single-user, non-speed-critical code in C, without knowing whether it actually needs to be fast or not. However, I try to assume that whenver I am talking to a C/C++ programmer hell-bent on optimization and dismissive of Python as too slow, that they are working on a problem like the one I mentioned above.

    I think it’s important for the community to give a little more respect to those situations where the developer really knows, up front, that they are going to need an assload of performance. Not simply to cave in and say “well then, I guess you shouldn’t use Python”, but to focus on the wide periphery of the application that isn’t part of those speed-critical parts and explain how Python could be used to improve it.

    With a compiler like the one being suggested here, the *whole* application could be written in Python; and then the one or two loops that really need to be insanely fast *can* be insanely fast and won’t have to be written in another language with a whole separate tool infrastructure.

    There is something I wrote a long time ago which is relevant to this discussion: Extend, Don’t EmbedI’m currently trying to figure out how to update it to cover this new dimension.

  • Back when I was a Freshman engineering student we had to code in Fortran; I/O was via a Keypunch and Card-reader; after you submitted your job you went to the cafeteria to get a snack and came back after the Operator had put your output into your cubby hole. Then the debugging started (lather rinse repeat). Oh, and you dreaded the thought of dropping your deck of cards while walking from the Keypunch room to the Card-reader room.

  • First, I wanted to see if I could get a more experienced Python guy looking at pyAlarm and making a few pointers about it.

    Second, about speed. Python *is* pretty fast already, if you do it right. One thing I’ve noticed, in C or C++ it’s common to iterate through a list, because the performance hit isn’t great and it gets you moving into other parts of the program.

    In the pyExtractor, I parse an entire webpage into words and tally the tags each word appears in. Originally the data structure was a dict(). Then I needed to sort the dict() (heh heh). It was a complex data structure and the simple, common idiom didn’t work (tried it). So I rewrote the data structure as a class, implemented all the dict() stuff and also some of the list() stuff. So I wound up with a class that could sort. IN doing so, I made the same move I would’ve in C++, I used some for loops to iterate through the growing data structure. Took an entire cigarette to parse my homepage.

    So I thought “Hmm, maybe I should throw out this fake dict() and rewrite it with a container”. Then I threw in a lookup table as a last resort to avoid rewriting everything. The difference in speed is quite phenomenal. Don’t blink if you want to see the program working is all I can say. 😉

    I wrote the same stuff in C++ awhile back and had it most of the way there. I used STL. It actually performed more like the long cigarette version, but I used STL to manage the data. maybe I did it wrong.

    I wrote similar stuff in PHP. That was actually quick, but still not as quick.

    I suppose I took advantage of internal stuff, and even then there are probably still plenty of ways to speed it up. And this is string processing in a “slow” language.

    You know, back in my day we didn’t get to argue about whether or not you should optimize your code. We only had 64k of RAM to work with, and we only got that by not using the kernel and rewriting all the hardware access! Screw that, I’ll take Python over any of that.

  • I would be happy with something like pyrex in the python standardlib. I would also be happy with a compiler that produces executables that run happyily in presence of a python.dll or python.so.

    Arguing with evangelists of one aproach to programming, is like arguing with people to whom everything looks like a nail. Admittedly, both people from the C or python camp are same in this respect, but apart from the speed issue, it can’t really be argued that python isn’t the more vesataile hammer then C.

    Even if performance of python was all that bad ( which it isn’t ), in times when computing power is avaiable like fresh air, there’s so much more precious resources ( time, workers, nerves, security, bug-freeness ) to worry about.

    Python isn’t horribly slow. It’s only slow if you compare it on a per-instruction level, and I think some 30/40 years ago people stopped arguing that assembler is slower then doing the bytecode by hand…

    An interesting tale to tell would be the one of a friend of mine. He started a project, and did a lot of it in C++. He embedded python, and then extended it where he needed interfacing. The python part was rather simple and used for scripting like purpose. About half a year into the project he began cursing why he didn’t wrote it all in python and extended at need. Why you may ask? Simple, when you’ve more then plenty of computing power, but a deer shortage of human time, writing code with less efford gets a real issue.

  • I’ve had to use C++ not because python is slow but because python is slow to start. The script I used to have in python I called several times per day on the command line. It stalls for about 1-2 seconds and then runs it. If I run it again immediately after it takes about 0.1 seconds. Why? Ask Intel or Guido or Bjarne.
    I rewrote it in C++ and compiled and binary. Now it took 0.1 seconds every time I ran it. No starting up delay.

    My quess is that this is because of the python interpretor byte compiles to .pyc the first time or if it hasn’t done so in a long time. What can be done about this?

  • In my day, all we had was 0’s and 1’s, and the 1’s were so expensive that we had to work hard in the mines for a month just to afford three of them! And that was with the employee discount!

  • I like the idea, of saying write it in python/pyrex with type declarations… but… That still does not solve a lot of C programmers speed problems.

    One area where python keeps getting slower on every release is startup time. Startup time would be a great area to speed python up in.

    python makes around 580 system calls to print ‘hello’. Perl and bash use around 37-45 system calls.

    If anyone knows how to speed up python startup time, it would make an excellent tutorial!

    A real basic c program on my machine can finish its job in this much time:
    real 0m0.003s
    user 0m0.001s
    sys 0m0.001s

    Python takes this long to print ‘hello’.
    real 0m0.060s
    user 0m0.040s
    sys 0m0.011s

    That is 10-20 times slower!!!

    For programs that need to be run repeatedly, this all adds up.

    Python still kicks javas arse for startup time though 🙂

    Python is around middle speed for lots of things according to the language shootout pages. Which is about right in my experience. 10-15 times slower than the top performers. As everyone knows here, python is fast enough for a crap load of problems. Even php, as slow as it is, is fast enough for lots of stuff.

    Converting all the shootout problems from python into pyrex should not be too hard. Then you have some empirical evidence to show others.

    Maybe also say that they should not write their time critical bits in slow C. Instead write it with a run time assembler(pySoftwire). Dynamically generated assembly can be 2-10 times faster than C! Or write it with Cg for graphics/numerical code. Or with psql for database code etc etc.

    A lot of C programmers put inline assembly into their code. Mentioning that it is possible from within python would ease one part of their minds.

    Manipulating Cg, and assembler from within python is easier than from within C. Allthough probably about the same from within C++.

    ps. The buying new machines argument just doesn’t add up sometimes. If your code allready runs on six machines, then buying twice as many machines is not as cheap as tweaking the code for a couple of hours or days. If all the empirical evidence is true, and python code is 10-15 times slower than the top performers, then you would need to buy 60 new machines!!!

    Also, some groups are not in the position to buy new machines. They have time, but not money.

    I especially find the buy new machines argument sad for widely used libraries. If the library is going to be used by one million people, then spending a little time making it even 5% faster may be well worth it.

  • “””My quess is that this is because of the python interpretor byte compiles to .pyc the first time or if it hasn’t done so in a long time.”””

    Nope. It’s more likely that startup pulls in a ton of Python library modules, like ‘re’ (for regular expressions), and the ‘site’ module scans the site-packages directory for .pth files. I have heard that you can speed these things up by packaging the standard library in a .zip file and putting the .zip file on PYTHONPATH. But I don’t have any direct experience with that. I suspect that what you are experiencing for subsequent runtimes is due to operating system cache effects.

    One option for addressing your issue is to use ReadyExec, which has a small C client that basically hands off stdin/stdout/stderr to a running daemon. This only works on Unix-like operating systems, though.

    There has been some talk now and then on Python-Dev about speeding up startup, and I think some other changes were done in 2.4 for this, though I’m not sure.

  • Softwire and inline-assembly me be alike in programming style, but conceptually they’re worlds apart. This may not seem obvious now but let me explain.

    Compiled languages translate a description of a problem into machine-code. Like any translation the specific choosen translation tries to aproximate the meaning, because of course there are more possible translations then there is harddisk-space to contain codepaths.
    However, beeing able to generate machine-code at runtime is an entirely different matter. Suddenly only codepaths that are really run are translated, because it’s obvious from the domain-perspective.

    This makes runtime-assembly incredibly more flexible then it’s “set into stone” relatives “executables”.

    Now I think it’s really hard to make good use of it, but it sure would be nice to have the functionality. And pysoftwire is surely going to be a step into this direction.

  • In response to Anonymous, if you want to process the elements in a dictionary in alphabetical order (I assume that is what you mean by ‘sorting a dictionary’), the idiom I would use is:

    keyValues = theDictionary.items()
    for key, value in keyValues:
    … print key, value

  • Peter, your blog’s comments are broken; it insists on asking for user ID+password. Here’s what I tried to tell you over there about the performance problems in your code:

    If you’re using Python 2.4, try rewriting the ‘arrested’ method as:

    return sqrt(sum(x*x for x in self.directions))==J

    Or better yet:

    return sum(x*x for x in self.directions)==J2

    (where J2 = J*J)

    This should give you an enormous speedup. You should also get another small speedup by computing ‘range(MAXSTEPS)’ *once*, and keeping it in a constant, rather than computing it 20,000 times as you are currently doing. There are other inefficient things being done in your code, but these two are probably the most immediate in terms of bang for the buck.

    Also, the Basic program isn’t doing the same thing as your program is doing — it’s doing several things differently, including the fact that it’s not doing any floating point math and doesn’t make any function calls, so of course it’s going to be faster! This is an apples-to-oranges comparison. If you wrote a straightforward translation of the Basic program, it would be a hell of a lot faster than your Python program, and would probably produce similar output as well.

  • Peter, at least for the 1d case you can get a phenomenal speedup by skipping the whole random walk class and just do:

    distance = 0;
    for j in range(NUM_DRUNKARDS):
    for i in range(MAXSTEPS):
    if distance == J:
    bins[i] += 1
    distance += random.random()>0.5 and 1 or -1

  • If you need a list but also need to have fast (dictionary like)
    access to it; then I’d write a class around a list and the
    bisect module. I’ve done it before; it does binary searches
    and inserts into the wrapped list so it’s almost as fast as
    a dictionary for lookups, and obviously much faster then
    keylist = mydict.keys(); keylist.sort() for cases where you
    want to iterate in sorted order (for output reporting or



Stay In Touch

Follow our feeds or subscribe to get new articles by email on these topics:

  • RSS
  • RSS
  • RSS


Get Unstuck, FAST

Cover photo of "A Minute To Unlimit You" by PJ Eby
Skip to toolbar