Some patterns for fast Python. Know any others?

- Avoid overengineering data structures. Tuples are better than objects (try namedtuple too, though). Prefer simple fields over getter/setter functions.

- Built-in datatypes are your friends. Use more numbers, strings, tuples, lists, sets, dicts. Also check out the collections library, esp. deque.

- Be suspicious of function/method calls; creating a stack frame is expensive.

- Don't write Java (or C++, or Javascript, ...) in Python.

- Are you sure it's too slow? Profile before optimizing!

- The universal speed-up is rewriting small bits of code in C. Do this only when all else fails.
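A tiny sketch of the first point, using a hypothetical Point record in three shapes (the class-vs-namedtuple-vs-tuple trade-off the post describes):

```python
from collections import namedtuple

# Hypothetical example: three ways to represent a 2-D point.
class PointClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y

PointTuple = namedtuple("PointTuple", ["x", "y"])

p1 = PointClass(3, 4)
p2 = PointTuple(3, 4)   # named access, tuple speed and memory
p3 = (3, 4)             # plain tuple: cheapest to create, indexed access

# All three expose the same data; in CPython the tuple-based versions
# are cheaper to create and compare than class instances.
assert (p1.x, p1.y) == (p2.x, p2.y) == p3
```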
Some would argue that those are mostly patterns for fast CPython.  :-)
I like decorators. They can do some nifty tricks :)
+Grey Geek Needing to write getters and setters is a sign of poorly made language, tho :/
The rules of optimisation that I have always given, when needed:
1) Don't
2) (For experts only) Don't yet.
3) Profile before you do.

In the last few years I have had to give out rule Zero: Have you established if pypy is a viable option yet?
I'm strongly biased in favor of profiling, too. I can't remember how many times this saved me from optimizing the wrong part of a system, and helped me finding (sometimes not so) surprising bottlenecks. I suggest to use cProfile and RunSnakeRun for this. :-)
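For anyone unfamiliar with cProfile, here is a minimal sketch of profiling a single (deliberately naive, hypothetical) function from code rather than from the command line:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately naive: builds a throwaway list just to sum it.
    return sum([i * i for i in range(n)])

pr = cProfile.Profile()
pr.enable()
slow_sum(100000)
pr.disable()

# Print the five most expensive entries by cumulative time.
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```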
Understand the performance characteristics of basic operations on the builtin types. For example checking for membership in a list is O(N) but for a set or dictionary it is O(1). Adding lists and tuples creates new objects, inserting to the left of a list is O(N) whilst append is O(1) (use a deque instead). And so on.
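A quick timeit sketch of the membership-test difference (absolute numbers vary by machine; only the ratio matters):

```python
import timeit

data_list = list(range(10000))
data_set = set(data_list)

# Worst case for the list: the needle is the last element,
# so the O(N) scan has to walk all 10000 items.
t_list = timeit.timeit(lambda: 9999 in data_list, number=1000)
t_set = timeit.timeit(lambda: 9999 in data_set, number=1000)

# On CPython the O(1) set lookup is typically orders of magnitude faster.
print("list:", t_list, "set:", t_set)
```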
Strings in Python really shine. Use them instead of lists of chars, etc.
Best generic solution is to use pypy.

Batch return values in yields (return a list or tuple of many values).

Spawn gzip/bzip compressor/decompressor externally to have it multithreaded instead of using python internal gzip.

One really big downside is that function calls are so slow, which leads to ugly code all over the place, though that is mostly fixable with pypy.

Do not mix data types in containers for pypy.
Don't use threads in Python (and don't use multiprocessing either; it's about 3 times slower than something like 0mq or even just shelling out). +1 to +Michael Foord on knowing the big O of the things you are working with.

Also, understand your memory footprint: avoiding circular references can prevent large (transient) memory bloat occurring, and memory bloat hurts performance a lot (not just because gc.collect gets slower, but because it can trigger paging).

I've written some heinous code using basic datatypes. I'd take that one off the list:)
+Michael Foord I completely agree, this can sometimes reduce execution time in orders of magnitude! :-)
Using generators and generator expressions for intermediate values can save the time (and memory overhead) of creating (and deallocating) lists of values that you're only going to discard anyway.
+Michael Foord I'm -0 on using generator[ expression]s unless you know you have huge lists of values to iterate over. The concrete list [comprehension] is usually faster than the genexpr until memory allocation becomes critical.
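Both spellings compute the same result; the trade-off above is only about how the intermediate values are produced:

```python
nums = range(1000)

# Concrete list comprehension: materializes all values up front.
total_list = sum([x * x for x in nums])

# Generator expression: produces values lazily, no intermediate list.
total_gen = sum(x * x for x in nums)

assert total_list == total_gen
```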
"".join([string1, string2, ...])
Basically, avoid everything that makes a high-level programming language because the compiler/interpreter won't optimize it away! :-/
For those of us JUST teaching ourselves programming, and using Python as our first learning language - can you describe (or link to) something describing profiling in this context?
I shouldn't say too much, since I'm not very familiar with Python yet. But the first three points there don't feel right to me. You seem to be advising against abstraction and encapsulation, which seem to me to be among a programmer's best tools.

Now, I appreciate that Python is a high-level language, and that its primitives will be more convenient than those of, say, C. But still, when designing a language, you cannot (should not try to) fulfill every programmer's wishes in the core language.

"Be suspicious of function/method calls; creating a stack frame is expensive."? This sounds like a case of premature optimization if I ever heard of one. If a programmer can easily inline a call, then so can the compiler.

I completely agree with the last three points. But the first three seem to contradict the fifth.
If regular expressions are eating up a lot of execution time, use pyre2:  It's significantly faster than the PCRE engine for most regular expressions, and compatible with the builtin re module.
Interesting that JIT has no place in the list
Where appropriate, use comprehensions instead of explicit loops.
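A minimal illustration of that point, on hypothetical data:

```python
words = ["fast", "python", "patterns"]

# Explicit loop:
lengths_loop = []
for w in words:
    lengths_loop.append(len(w))

# Comprehension: shorter, and avoids the repeated lookup of
# lengths_loop.append inside the loop body.
lengths_comp = [len(w) for w in words]

assert lengths_loop == lengths_comp == [4, 6, 8]
```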
+Paul Fisher  Many many thanks.  I come to this from designing boardgames, so the concept of "If it's a procedure the player is going to use a lot, SIMPLIFY, SIMPLIFY, SIMPLIFY" is one I'm familiar with.
"Better Than is the enemy of Good Enough."
"Future You will thank Past You when they look at this code." -- The Unofficial Python Mantra.
+Michiel Helvensteijn It doesn't make sense to create wrappers that don't add anything. Property getters/setters are a prime example of that. In languages like Java you have to do it solely that you can later swap out the trivial implementation for something more complicated without having to rewrite the clients. In Python, there's no syntactic difference between accessing a field and accessing a property for an API client, so you can start with a field and change to a property without breaking anyone. Given that, why bother with trivial getters/setters?
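A sketch of that field-first approach: a plain attribute, and a later version where a property slots in behind the same client-side syntax (class names here are made up for illustration):

```python
class Account:
    def __init__(self, balance):
        self.balance = balance  # plain field: no getter/setter needed

class CheckedAccount:
    def __init__(self, balance):
        self._balance = balance

    # Later, when validation becomes necessary, a property slots in
    # with exactly the same access syntax as the plain field.
    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

a = Account(10)
b = CheckedAccount(10)
assert a.balance == b.balance  # identical syntax for the API client
```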
+Pavel Minaev Agreed. Very much so. But I'd say that's a point of style, not speed, which is what the original point was about. I'm assuming here that in any decent Python implementation, creating pointless method wrappers around fields will not have any noticeable effect on speed, if any. They would be optimized away, no?
I agree with the point about using plain attributes directly, and if necessary the "property()" function/decorator. But the point that worries me a little is the one about functions/methods and the stack frames - it just strikes me how common it is to find huge functions/methods because people didn't want to refactor them, or didn't know how. IMO, clear code comes first; then you worry about performance.
"Don't write Java (or C++, or JavaScript, ...) in Python." is something I have heard many times but I have no idea what that means.
A good C module can save a lot of time that can be used to try out new ideas rather than optimizing:

lengths = np.sqrt(np.sum(vectors ** 2, axis=1))

Other than that, overly fine-grained functions hurt a lot.
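Spelled out in pure Python (for comparison, or when numpy isn't available), the one-liner above computes per-row Euclidean lengths:

```python
import math

# Hypothetical rows of 2-D vectors.
vectors = [(3.0, 4.0), (6.0, 8.0), (0.0, 0.0)]

# Pure-Python version of np.sqrt(np.sum(vectors ** 2, axis=1)):
# square each component, sum per row, take the square root.
lengths = [math.sqrt(sum(c * c for c in v)) for v in vectors]

assert lengths == [5.0, 10.0, 0.0]
```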
+David Wiebe Every programming language has its own idioms. If you come to Python with Java/C++ in your mind, your code won't be 'Pythonic' - you'll be trying to emulate Java/C++ in Python. And as we all know - emulation is slow. :)
I love deques and slicing in Python. Lend themselves very well to MergeSort for instance:
from collections import deque

def Merge(left, right):
    ret = []
    while len(left) > 0 or len(right) > 0:
        if len(left) > 0 and len(right) > 0:
            if left[0] <= right[0]:
                ret.append(left.popleft())
            else:
                ret.append(right.popleft())
        elif len(left) > 0:
            ret.append(left.popleft())
        elif len(right) > 0:
            ret.append(right.popleft())
    return ret

def MergeSort(S):
    if len(S) <= 1:
        return S
    q = len(S) // 2
    left = MergeSort(S[:q])
    right = MergeSort(S[q:])
    return Merge(deque(left), deque(right))
Use dicts to store and retrieve information--they are very fast.  Break your task up to simple and testable/verifiable functions so that you make less errors on each part.  Don't be afraid to give variables and functions long and descriptive names so you can reread your code and figure out what it does. Do things the way they make sense to you and don't worry about optimizing your code until you find out which part is slow.

Ignore the Python nay-sayers out there...I have never found anything that I have done in Python that was too slow (or, at least, that I couldn't speed up drastically when I needed to and without resorting to C).  You can achieve blazing execution times in other languages, but while the guy next to you is saving 300 milliseconds of processing time by writing his code in C++ you will be off having a beer because you wrote and tested your Python code in a fraction the time and then finished the job.
Aren't these suggestions pretty universal for all programming languages (with the exception perhaps of interpreted languages)?
Instantiating a class or calling a function has an inherent performance hit in any language!
The next book in the "Architecture of Open Source Applications" series will be titled "The Performance of Open Source Applications" (see for some details). If someone's interested in doing a piece on how to speed up Python (Guido, I'm looking at you...), or anything else, please give us a shout.
When you say, "- Don't write Java (or C++, or Javascript, ...) in Python." are you referring to Jython?
It's more likely he's trying to tell people not to try and bring patterns from other languages to Python.

Oh yeah, and one more thing: Whatever anyone tells you, don't be afraid of methods. You will not be able to maintain code if everything is crammed into a single method.
+Daniel Betz your second paragraph was what I wanted to say, thanks for reinforcing that! Single Responsibility Principle can be a real maintenance saver. :-)
Getters and setters are evil. Evil, evil, I say! Do not write getters and setters. This is what the 'property' built-in is for. And do not take that to mean that you should write getters and setters, and then wrap them in 'property'. That means that until you prove that you need anything more than a simple attribute access, don't write getters and setters. They are a waste of CPU time, but more important, they are a waste of programmer time. Not just for the people writing the code and tests, but for the people who have to read and understand them as well.
+Diogo Baeder I learned that the hard way. I inherited an in-house database cataloging tool, written in C#. The single largest method spanned over 1000 lines. That's insane. Luckily, there's Resharper. :D
I don't know python yet however I will save this for a later day, thank you owo
+Guido van Rossum Re generators vs lists, I suspect that cases where generators are significantly slower than lists and it matters are quite rare.  I often write things in a generator-style simply because it seems cleaner and easier to read, not to mention that it often handles "huge" for free.  (Sometimes lists are better, of course.)
Funny. I often write lists because they seem cleaner to me. :-)
This is trending? Geez G+ is still packed with nerds :P
To the people who see some of this advice as being against using higher level OO concepts -- remember, this is when tuning for performance.
A lot of OO [and a lot of Python] is about writing maintainable code, which means comprehensible code.

Sometimes this does not mesh well with high performance.

That's not to say maintainable code can't be fast.  It can.  However, when you need to eke out the last few cycles of performance, sometimes you need to sacrifice "encapsulation and abstraction".

I often see Perl as "reaching toward CS ideals from the real world", and Python as "reaching toward the real world from CS ideals."  So, whilst we want to write idealistic code in Python, sometimes we have to make concessions to reality [limits of the implementation.]
I disagree with the last one: I'd rather maintain clean C than tightly squeezed Python that avoids named fields and function calls.

I'd say: rewrite to C (or Cython or Pyrex) once you're sure the algorithm is right and there's nothing obviously inefficient in the Python. Writing optimised Python is a fool's errand: it's hard to read and it's still not all that fast.
Comprehensions are better than loops.
+Micheal Hill some of these like "measure before (and after) optimizing" are universal, but the specifics are actually quite Python-specific.

For instance in C or C++ accessing a field in a struct, or allocating a struct, is going to be very similar in time to operating on an array.  You would never prefer to use an array for performance reasons.

Function calls, too, have some cost, but the cost is moderately low and the compiler can often inline them.  I'm not going to measure it again now, but I would say in C a function call cost is on the order of 20 typical straight-line statements, but in Python it's more like 200.  I would be very reluctant to manually inline a function in C but in Python it may be the best choice.  (Though, as I said above, at that point it would be better to rewrite into some other language.)
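A rough way to measure the call overhead yourself with timeit (the 20x/200x figures above are the commenter's estimates; this sketch only shows that the gap exists, not its exact size):

```python
import timeit

def add(a, b):
    return a + b

# Same arithmetic, with and without the Python-level call overhead.
t_call = timeit.timeit("add(x, y)", setup="x, y = 1, 2",
                       globals={"add": add}, number=100000)
t_inline = timeit.timeit("x + y", setup="x, y = 1, 2", number=100000)

print("with call:", t_call, "inlined:", t_inline)
```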
Thanks for the "deque" tip. After reading your post, I found a few places where I could take advantage of this data structure.
Python has made it fun to code again...truth.
+Martin Pool I agree; if we're at the point where inlining functions really matters, it's time to consider using modules in C or another very-high-performance approach. To be honest, in my whole career I've never been in a situation where it made sense to sacrifice readability for performance - every time I've needed such high performance, the (successful) approach was to use high-performance alternatives to plain CPython (or to the plain language when I worked with other languages).

Most of the time, in my experience, when people make such "code optimizations" (the kind that sacrifice readability), they are just shooting in the wrong direction; such situations include:
- Bad SQL queries;
- Unnecessary I/O (disk, network) trips;
- Naturally slow I/O (like consuming a slow webservice, or one with high latency)
- Bad algorithms (like +Michael Foord mentioned, regarding big O)
and none of these are because of additional function/method calls.

Do you need blazing-fast Python? Try PyPy, Cython, or rewriting bits in C, but don't sacrifice readability.
Where do I start to learn Python... help! help! help! help!
"str" + "str" is faster than "%s%s" % (s, s) and even ''.join([s, s]) unless you're working with thousands of strings, though rarely more readable, and it rarely matters unless you're doing it thousands of times.

RAM (in-process dict) < memcached < disk IO < net IO - a good caching strategy saves lives.

Know your APIs: Disk IO, DB hits, or any network trips are (almost) always more expensive than any calculations you're doing - getting a thousand records can be quite fast (with modern DBs), but getting one record a thousand times is almost always intolerably slow - so watch your loops!

If you're crunching numbers, numpy is your friend (and so are its buddies, cython and pypy). If you're not, none of these tips probably apply to you, unless you have some crazy nano-second scale requirements for your libraries - and if so - Python may not be the best language! All languages are tools, and some tools are better suited to certain tasks than others.

Lastly, readability and maintainability always trumps performance - always - until performance is your only concern. In my experience, that almost never, ever happens. As long as you're not writing poor-performing code (for instance, hitting a DB in a large loop), you're going to be fine, until it's time to profile!
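A quick timeit harness for comparing the three concatenation spellings on your own machine (which wins depends on string count and size, as noted above):

```python
import timeit

setup = "a, b = 'spam', 'eggs'"

# Three spellings of the same two-string concatenation.
t_plus = timeit.timeit("a + b", setup=setup, number=200000)
t_fmt = timeit.timeit("'%s%s' % (a, b)", setup=setup, number=200000)
t_join = timeit.timeit("''.join([a, b])", setup=setup, number=200000)

print("plus:", t_plus, "format:", t_fmt, "join:", t_join)
```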
Don't write Java (or C++, or Javascript, ...) in Python.
I'm so Guilty in doing exactly that! But only because I was thrown in the deep end of python development after using C#, Java, and C++ for so long.
I fear that I started with non-Pythonic coding and it stuck with me. There are so many opinions about how to write Python that it ends up confusing more than helping!

small example on how i thought it was confusing: 
- Classes and Constructors? (how to have constructors with multiple variables?? I've seen so many ways to do so, but which is the pythonic way?) 
All very nice suggestions Guido, but the most effective general speedup is to have a JIT.  PyPy comes to mind:
Why isn't this part of "standard" Python?
I wish PyGTK guys read this aloud, like, a few hundred times — as penance.
"- Be suspicious of function/method calls; creating a stack frame is expensive."

I would much rather change the language if I had to get down to this level (or change to using C). This seems to encourage writing fewer large functions rather than many smaller ones, and that just seems so at odds with being able to cleanly abstract specific elements of logic. (Exception: unless this advice is applied to a very small handful of deep inner-loop functions.)
I would add two more rules of thumb to the list:

- do only one thing at a time and focus on code purpose not on a specific implementation. Now let's see if there is some existing fast library doing just that...

- memoizing is your friend, and objects are great for that (this one proved really efficient dealing with Project Euler problems)
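In modern Pythons the memoizing decorator is already in the standard library; a minimal sketch with functools.lru_cache on a deliberately naive Fibonacci:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Naive recursion would be exponential; memoization makes each
    # fib(k) compute exactly once, so the whole thing is linear in n.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

assert fib(50) == 12586269025
```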
If it matches your problem space, itertools in tandem with the operator module looks like pure Python, but is pretty much all C under the hood, and is consequently both easy to use and very fast.

Unfortunately, since most of the functions in the operator module don't take keyword args, or alternatively, since there's no C-implemented equivalent to functools.partial which allows pre-filling positional args by index out of order, for many tasks, operator becomes useless and you're forced back into the land of the lambda.
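A small example of the happy path: operator.itemgetter as a C-implemented sort key, with the lambda equivalent shown for contrast (records here are made up for illustration):

```python
from operator import itemgetter

# Hypothetical records: (name, age) pairs.
records = [("bob", 32), ("alice", 27), ("carol", 45)]

# itemgetter(1) is implemented in C, so sorted() invokes the key
# function without Python-level call overhead.
by_age = sorted(records, key=itemgetter(1))

# The pure-Python equivalent pays the lambda-call cost per element.
by_age_lambda = sorted(records, key=lambda r: r[1])

assert by_age == by_age_lambda
```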
What +Dhananjay Nene said! And ...

Why bother with an OO language if you are not going to use classes? You might as well just write it in C and be done with it.
I think the "avoid function calls" bit is mostly to get people to think twice about filling their inner loop with calls to a short function that otherwise could have been manually inlined - in these cases, the majority of CPU time would be spent on (un)winding the call stack. Inner loops, of course, are where performance "really matters".

I might call Java an "OO language." Python, however, is most certainly multi-paradigm, and you can write non-trivial programs using no OO whatsoever, just as you can write non-trivial programs using nothing but OO, and depending on the problem, either solution may be entirely pythonic.
Avoid attribute lookups by caching in a local, especially if it's occurring in a high-iteration loop.
+Ralph Corderoy that's a good example of "moving loop invariants out of the loop". Everything that doesn't change can be cached as local variables, even instance methods.

This is one of the optimizations that the pypy JIT does for you.

Worth noting that local variable lookup is faster than global variable or builtin lookup, and much faster than attribute lookup. One micro-optimization is to move everything (even builtins) into local variables. It tends to make your code less readable, so only do it if the performance benefit is really worth it.
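A sketch of that micro-optimization, hoisting a bound-method lookup out of a loop (only worth doing in genuinely hot loops, for the readability reason given above):

```python
def squares_slow(n):
    out = []
    for i in range(n):
        out.append(i * i)   # out.append is looked up every iteration
    return out

def squares_fast(n):
    out = []
    append = out.append     # hoist the bound method into a local slot
    for i in range(n):
        append(i * i)       # local lookup: just an index, no dict search
    return out

assert squares_slow(5) == squares_fast(5) == [0, 1, 4, 9, 16]
```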
I think "avoid function calls" is a very difficult matter.
+Michael Foord, yes, loop-invariant hoisting. Attribute lookup can be particularly painful. The dis module's dis() can be fun for seeing what's going on. Really, locals aren't "looked up": a local is given a slot at compile-time and its index is used from then on.