Why does Python code run faster in a function?
You might ask why it is faster to store local variables than globals. This is a CPython implementation detail.
Remember that Python code is compiled to bytecode, which the CPython interpreter then runs. When a function is compiled, its local variables are stored in a fixed-size array (not a dict) and each variable name is assigned an index into it. This is possible because you can't dynamically add local variables to a function. Retrieving a local variable is then literally a pointer lookup into that array plus a refcount increase on the PyObject, which is trivial.
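You can see that fixed name-to-slot mapping from Python itself. This is a minimal sketch assuming CPython 3; the function `f` is made up for illustration. It reads the compiled code object's `co_varnames` (the local names in slot order) and `co_nlocals`, then lists the fast-local instructions the compiler emitted.

```python
import dis

def f(a, b):
    c = a + b  # assigned inside f, so the compiler makes c a local as well
    return c

# The compiler fixes the set of local names up front; their order is the slot order.
print(f.__code__.co_varnames)  # ('a', 'b', 'c')
print(f.__code__.co_nlocals)   # 3

# Accesses to a, b and c compile to *_FAST instructions that read or write slots,
# with no dict lookup involved.
print([ins.opname for ins in dis.get_instructions(f) if "FAST" in ins.opname])
```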
Contrast this with a global lookup (LOAD_GLOBAL), which is a true dict search involving a hash and so on. Incidentally, this is why you need to write global i if you want i to be global: if you ever assign to a variable inside a scope, the compiler treats it as local and emits fast instructions (STORE_FAST, LOAD_FAST) for every access to it unless you tell it not to.
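The effect of the global statement is easy to observe. In this sketch the names `counter`, `bump_without_global`, `bump_with_global` and `store_opnames` are hypothetical: the first function assigns to `counter` without declaring it global, so the compiler makes it local and the read on the right-hand side fails; the second declares it global and gets STORE_GLOBAL instead of STORE_FAST.

```python
import dis

counter = 0  # module-level (global) variable

def bump_without_global():
    # Assigning to counter anywhere in the body makes it local to the whole
    # function, so reading it on the right-hand side fails: its slot is empty.
    counter = counter + 1

def bump_with_global():
    global counter  # tell the compiler to use the module dict, not a local slot
    counter = counter + 1
    return counter

def store_opnames(func):
    """Names of the store instructions the compiler emitted for func."""
    return [ins.opname for ins in dis.get_instructions(func) if ins.opname.startswith("STORE")]

try:
    bump_without_global()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)

print(bump_with_global())                  # 1
print(store_opnames(bump_without_global))  # expected to be a STORE_FAST
print(store_opnames(bump_with_global))     # ['STORE_GLOBAL']
```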
By the way, global lookups are still pretty well optimised. Attribute lookups such as foo.bar are the really slow ones!
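A common consequence is hoisting a repeated attribute lookup into a local before a hot loop. This sketch (function names and loop size are arbitrary) compares `math.sqrt` looked up on every iteration, which costs a global lookup for `math` plus an attribute lookup for `sqrt`, against a local alias bound once. Treat any numbers it prints with caution: timings vary by machine and version, and CPython 3.11+ specialises many global and attribute lookups (PEP 659), so the gap may be small there.

```python
import math
import timeit

def with_attribute(n=100_000):
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)  # global lookup of math, then attribute lookup of sqrt, each iteration
    return total

def with_local_alias(n=100_000):
    sqrt = math.sqrt  # look the attribute up once and keep it in a fast local
    total = 0.0
    for i in range(n):
        total += sqrt(i)
    return total

for func in (with_attribute, with_local_alias):
    print(func.__name__, timeit.timeit(func, number=10))
```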
Here is a small illustration of local variable efficiency: a Python 2 loop over xrange(100000000) whose body does nothing.
Inside a function, the bytecode is:
```
  2           0 SETUP_LOOP              20 (to 23)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_FAST               0 (i)

  3          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
```
At the top level, the bytecode is:
```
  1           0 SETUP_LOOP              20 (to 23)
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               3 (100000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                 6 (to 22)
             16 STORE_NAME               1 (i)

  2          19 JUMP_ABSOLUTE           13
        >>   22 POP_BLOCK
        >>   23 LOAD_CONST               2 (None)
             26 RETURN_VALUE
```
The key difference is STORE_FAST versus STORE_NAME, which runs once per loop iteration, and STORE_FAST is faster (!). The compiler can use it because inside a function i is a local, whereas at top level it is a global.
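To measure this yourself on Python 3 (where range replaces xrange), you can time the same loop compiled as top-level code and as a function body. This is a sketch under stated assumptions: `N` is scaled down from the original 100000000 to keep the run short, the names `TOP_LEVEL`, `run_top_level` and `run_in_function` are hypothetical, and `exec` with a fresh dict stands in for running a script at module level.

```python
import timeit

N = 1_000_000  # smaller than the original 100000000 so the run finishes quickly

# Module-level version: compiled in "exec" mode, so i is stored with STORE_NAME.
TOP_LEVEL = compile(f"for i in range({N}):\n    pass\n", "<toplevel>", "exec")

def run_top_level():
    exec(TOP_LEVEL, {})  # a fresh dict plays the role of the module's globals

# Function version: i is a local, so it is stored with STORE_FAST.
def run_in_function():
    for i in range(N):
        pass

print("top level:", timeit.timeit(run_top_level, number=5))
print("function: ", timeit.timeit(run_in_function, number=5))
```

Passing only a globals dict to `exec` means the loop variable ends up in that dict, exactly as a module-level `i` would end up in the module namespace.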
To examine bytecode, use the dis module. I was able to disassemble the function directly, but to disassemble the top-level code I had to use the compile builtin.
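Both approaches look roughly like this on Python 3. The function `main` and the `SOURCE` string are illustrative stand-ins for the loop above. Opcode names and offsets differ from the Python 2 listings (for example, SETUP_LOOP no longer exists), but the STORE_FAST versus STORE_NAME contrast is the same.

```python
import dis

SOURCE = "for i in range(10):\n    pass\n"

def main():
    for i in range(10):
        pass

# A function object can be handed to dis directly; i shows up as STORE_FAST.
dis.dis(main)

# Top-level code has no function object, so compile the source text first;
# here i shows up as STORE_NAME and range as LOAD_NAME.
dis.dis(compile(SOURCE, "<toplevel>", "exec"))
```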