sorted() using generator expressions rather than lists
The first thing sorted()
does is to convert the data to a list. Basically the first line (after argument validation) of the implementation is
newlist = PySequence_List(seq);
See also the full source code version 2.7 and version 3.1.2.
Edit: As pointed out in the answer by aaronasterling, the variable newlist
is, well, a new list. If the parameter is already a list, it is copied. So a generator expression really has the advantage of using less memory.
There's no way to sort a sequence without knowing all the elements of the sequence, so any generator passed to sorted()
is exhausted.
The easiest way to see which is faster is to use timeit
and it tells me that it's faster to pass a list rather than a generator:
>>> import random
>>> randomlist = range(1000)
>>> random.shuffle(randomlist)
>>> import timeit
>>> timeit.timeit("sorted(x for x in randomlist)",setup = "from __main__ import randomlist",number = 10000)
4.944492386602178
>>> timeit.timeit("sorted([x for x in randomlist])",setup = "from __main__ import randomlist",number = 10000)
4.635165083830486
And:
>>> timeit.timeit("sorted(x for x in xrange(1000,1,-1))",number = 10000)
1.411807087213674
>>> timeit.timeit("sorted([x for x in xrange(1000,1,-1)])",number = 10000)
1.0734657617099401
I think this is because when sorted()
converts the incoming value to a list it can do this more quickly for something that is already a list than for a generator. The source code seems to confirm this (but this is from reading the comments rather than fully understanding everything that is going on).