Why is [] faster than list()?
list() requires a global lookup and a function call, but [] compiles to a single instruction. See:
Python 2.7.3
>>> import dis
>>> dis.dis(lambda: list())
1 0 LOAD_GLOBAL 0 (list)
3 CALL_FUNCTION 0
6 RETURN_VALUE
>>> dis.dis(lambda: [])
1 0 BUILD_LIST 0
3 RETURN_VALUE
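The listing above is from Python 2.7.3, and opcode names and offsets vary between interpreter versions. As a rough, version-tolerant sketch (assuming Python 3.4+ for dis.get_instructions; the helper name opnames is made up for illustration), you can compare just the opcode names:

```python
import dis

def opnames(func):
    """Return the opcode names of func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

literal_ops = opnames(lambda: [])
call_ops = opnames(lambda: list())

# The literal builds the list directly with BUILD_LIST.
print(literal_ops)
# The call first loads the global name, then issues a CALL* opcode
# (its exact name depends on the Python version).
print(call_ops)
```

The exact lists printed depend on the interpreter, but the literal should never need a global load or a call opcode.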
Because list is a callable that builds a new list from another iterable, say a string, while [] creates a list directly. Try this (it might make more sense to you):
>>> x = "wham bam"
>>> a = list(x)
>>> a
['w', 'h', 'a', 'm', ...]
While
>>> y = ["wham bam"]
>>> y
['wham bam']
gives you an actual list containing whatever you put in it.
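The same contrast as a small runnable script (the variable names here are just for illustration):

```python
text = "wham bam"

# list() iterates over its argument, so a string is split into characters.
chars = list(text)

# The literal wraps the object as-is in a one-element list.
wrapped = [text]

print(chars)
print(wrapped)
print(len(chars), len(wrapped))
```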
Because [] and {} are literal syntax, Python can emit bytecode that creates the list or dictionary object directly:
>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
1 0 BUILD_LIST 0
3 RETURN_VALUE
>>> dis.dis(compile('{}', '', 'eval'))
1 0 BUILD_MAP 0
3 RETURN_VALUE
list and dict are separate objects. Their names need to be resolved, the arguments have to be pushed onto the stack, the current frame has to be saved so it can be resumed later, and a call has to be made. All of that takes more time.
For the empty case, that means you have at the very least a LOAD_NAME (which has to search through the global namespace as well as the builtins module) followed by a CALL_FUNCTION, which has to preserve the current frame:
>>> dis.dis(compile('list()', '', 'eval'))
1 0 LOAD_NAME 0 (list)
3 CALL_FUNCTION 0
6 RETURN_VALUE
>>> dis.dis(compile('dict()', '', 'eval'))
1 0 LOAD_NAME 0 (dict)
3 CALL_FUNCTION 0
6 RETURN_VALUE
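One visible consequence of the name lookup: because list is resolved at run time, it can be rebound, while the [] syntax cannot. A small sketch, where the namespace dict and the fake_list stand-in are made up for illustration:

```python
def fake_list():
    """Hypothetical stand-in that shadows the builtin list."""
    return "not a list"

# Globals passed to eval(); the name "list" here hides the builtin.
namespace = {"list": fake_list}

called = eval("list()", namespace)   # looks up the name, finds fake_list
literal = eval("[]", namespace)      # no lookup involved, always a real list

print(called)
print(literal)
```

This is why the compiler cannot simply turn list() into the same bytecode as []: it cannot know in advance what the name will refer to.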
You can time the name lookup separately with timeit:
>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119
The time discrepancy there is probably a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:
>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125
So having to call the object takes an additional 1.00 - 0.31 - 0.30 == 0.39 seconds per 10 million calls.
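The same subtraction can be scripted. This is only a sketch: the variable names are illustrative, the run count is lowered to keep it quick, and the absolute numbers will differ by machine and Python version (with enough noise, the estimate can even come out near zero).

```python
import timeit

N = 10**6  # fewer runs than the 10**7 used above, to keep this quick

lookup_time = timeit.timeit('list', number=N)    # name lookup only
literal_time = timeit.timeit('[]', number=N)     # build via the literal
call_time = timeit.timeit('list()', number=N)    # lookup + call + build

# Rough estimate of the pure call overhead, following the arithmetic above.
call_overhead = call_time - lookup_time - literal_time
print("call overhead: %.3f s per %d calls" % (call_overhead, N))
```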
You can avoid the global lookup cost by aliasing the global names as locals (in a timeit setup, everything you bind to a name is a local):
>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137
but you can never overcome that CALL_FUNCTION cost.
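The effect of aliasing can also be checked at the bytecode level. In this sketch the function names are illustrative, and binding the alias through a default argument is one assumed way of making it a local. The alias replaces the global load with a fast local load, but a call opcode remains:

```python
import dis

def with_global():
    return list()

def with_local_alias(_list=list):
    # _list is a local variable, bound once when the function is defined.
    return _list()

def opnames(func):
    """Return the opcode names of func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

global_ops = opnames(with_global)
alias_ops = opnames(with_local_alias)

print(global_ops)  # includes LOAD_GLOBAL and a CALL* opcode
print(alias_ops)   # LOAD_FAST* instead of LOAD_GLOBAL, still a CALL* opcode
```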