Lambda function in list comprehensions
The big difference is that the first example actually invokes the lambda f(x), while the second example doesn't.
Your first example is equivalent to [(lambda x: x*x)(x) for x in range(10)], while your second example is equivalent to [f for x in range(10)].
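A quick way to see the difference is to look at what each list actually holds. This is only a minimal sketch; the names called and not_called are made up for illustration:

```python
f = lambda x: x * x

# First form: the lambda is called for every x, so the list holds numbers.
called = [f(x) for x in range(10)]

# Second form: the lambda expression is evaluated but never called,
# so the list holds ten function objects.
not_called = [lambda x: x * x for x in range(10)]

print(called[:4])                            # [0, 1, 4, 9]
print(all(callable(g) for g in not_called))  # True
print(not_called[3](5))                      # 25 (uses the argument, not the loop's x)
```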
This question touches a very stinking part of the "famous" and "obvious" Python syntax - what takes precedence: the lambda, or the for of the list comprehension?
I don't think the purpose of the OP was to generate a list of squares from 0 to 9. If that was the case, we could give even more solutions:
squares = []
for x in range(10): squares.append(x*x)
- this is the good ol' way of imperative syntax.
But it's not the point. The point is W(hy)TF is this ambiguous expression so counter-intuitive? And I have an idiotic case for you at the end, so don't dismiss my answer too early (I had it on a job interview).
So, the OP's comprehension returned a list of lambdas:
[(lambda x: x*x) for x in range(10)]
This is of course just 10 different copies of the squaring function, see:
>>> [lambda x: x*x for _ in range(3)]
[<function <lambda> at 0x00000000023AD438>, <function <lambda> at 0x00000000023AD4A8>, <function <lambda> at 0x00000000023AD3C8>]
Note the memory addresses of the lambdas - they are all different!
You could of course have a more "optimal" (haha) version of this expression:
>>> [lambda x: x*x] * 3
[<function <lambda> at 0x00000000023AD2E8>, <function <lambda> at 0x00000000023AD2E8>, <function <lambda> at 0x00000000023AD2E8>]
See? 3 times the same lambda.
Please note that I used _ as the for variable. It has nothing to do with the x in the lambda (a for variable named x would be shadowed lexically by the lambda's parameter!). Get it?
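If comparing memory addresses by eye feels fragile, the same point can be checked with object identity. A small sketch, with distinct and shared as illustrative names:

```python
# The comprehension evaluates the lambda expression once per iteration.
distinct = [lambda x: x * x for _ in range(3)]

# List repetition evaluates it once and repeats the reference.
shared = [lambda x: x * x] * 3

print(len({id(f) for f in distinct}))  # 3
print(len({id(f) for f in shared}))    # 1
print(shared[0] is shared[2])          # True
```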
I'm leaving out the discussion of why the syntax precedence isn't such that it all means:
[lambda x: (x*x for x in range(10))]
which could be: [[0, 1, 4, ..., 81]], or [(0, 1, 4, ..., 81)], or, which I find most logical, a list of 1 element - a generator returning the values. That is just not the case; the language doesn't work this way.
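To check how the parser really groups the expression, one option is the standard ast module. A minimal sketch (the variable names are arbitrary); it shows that the lambda is the element of the comprehension and does not swallow the for clause:

```python
import ast

# Parse the OP's expression and inspect the resulting tree.
tree = ast.parse("[lambda x: x*x for x in range(10)]", mode="eval")
comp = tree.body

print(type(comp).__name__)           # ListComp
print(type(comp.elt).__name__)       # Lambda
print(type(comp.elt.body).__name__)  # BinOp, i.e. just x*x
```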
BUT What, If...
What if you DON'T overshadow the for variable, AND use it in your lambdas???
Well, then crap happens. Look at this:
[lambda x: x * i for i in range(4)]
this means of course:
[(lambda x: x * i) for i in range(4)]
BUT it DOESN'T mean:
[(lambda x: x * 0), (lambda x: x * 1), ... (lambda x: x * 3)]
This is just crazy!
The lambdas in the list comprehension are closures over the scope of this comprehension. Lexical closures, so they refer to i by reference, and not to the value it had when they were created!
So, this expression:
[(lambda x: x * i) for i in range(4)]
IS roughly EQUIVALENT to:
[(lambda x: x * 3), (lambda x: x * 3), ... (lambda x: x * 3)]
I'm sure we could see more here using a Python disassembler (by which I mean e.g. the dis module), but for a Python-VM-agnostic discussion this is enough.
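Without going as far as dis, the shared variable can be observed through each function's __closure__ attribute. This is a sketch under one assumption: the comprehension is wrapped in a hypothetical helper, make_multipliers, so that it behaves the same way across Python 3 versions:

```python
def make_multipliers():
    # Hypothetical helper; it only exists to give the comprehension
    # a function scope to live in.
    return [lambda x: x * i for i in range(4)]

fs = make_multipliers()
print([f(10) for f in fs])  # [30, 30, 30, 30]

# Every lambda holds the very same cell for i, not a private copy.
cells = [f.__closure__[0] for f in fs]
print(all(c is cells[0] for c in cells))  # True
print(cells[0].cell_contents)             # 3, the last value i took
```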
So much for the job interview question.
Now, how to make a list of multiplier lambdas which really multiply by consecutive integers? Well, similarly to the accepted answer, we need to break the direct tie to i by wrapping it in another lambda, which gets called inside the list comprehension expression:
Before:
>>> a = [(lambda x: x * i) for i in (1, 2)]
>>> a[1](1)
2
>>> a[0](1)
2
After:
>>> a = [(lambda y: (lambda x: y * x))(i) for i in (1, 2)]
>>> a[1](1)
2
>>> a[0](1)
1
(I had the outer lambda variable also = i, but I decided this is the clearer solution - I introduced y so that we can all see which witch is which).
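The same trick reads more easily with a named factory function in place of the outer lambda. A sketch only; make_multiplier, factor and multiply are names I made up, not anything from the question:

```python
def make_multiplier(factor):
    # Each call creates a fresh scope, so factor is fixed
    # separately for every returned function.
    def multiply(x):
        return factor * x
    return multiply

a = [make_multiplier(i) for i in (1, 2)]
print(a[0](1), a[1](1))  # 1 2
```

It does exactly what the nested lambda does: the loop variable is passed as an argument at creation time, so the inner function closes over the factory's own parameter and not over i.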
Edit 2019-08-30:
Following a suggestion by @josoler, which is also present in an answer by @sheridp - the value of the list comprehension "loop variable" can be "embedded" inside an object - the key is for it to be accessed at the right time. The section "After" above does it by wrapping it in another lambda and calling it immediately with the current value of i. Another way (a little bit easier to read - it produces no 'WAT' effect) is to store the value of i inside a partial object, and have the "inner" (original) lambda take it as an argument (supplied by the partial object at the time of the call), i.e.:
After 2:
>>> from functools import partial
>>> a = [partial(lambda y, x: y * x, i) for i in (1, 2)]
>>> a[0](2), a[1](2)
(2, 4)
Great, but there is still a little twist for you! Let's say we want to make it easier on the code reader, and pass the factor by name (as a keyword argument to partial). Let's do some renaming:
After 2.5:
>>> a = [partial(lambda coef, x: coef * x, coef=i) for i in (1, 2)]
>>> a[0](1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: <lambda>() got multiple values for argument 'coef'
WAT?
>>> a[0]()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: <lambda>() missing 1 required positional argument: 'x'
Wait... We're changing the number of arguments by 1, and going from "too many" to "too few"?
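Well, it's not a real WAT: when we pass coef to partial in this way, it becomes a keyword argument, while the positional argument of a[0](1) still fills the first parameter, which is also coef. The keyword-bound parameter must therefore come after the positional x argument, like so: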
After 3:
>>> a = [partial(lambda x, coef: coef * x, coef=i) for i in (1, 2)]
>>> a[0](2), a[1](2)
(2, 4)
I would prefer the last version over the nested lambda, but to each their own...
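For anyone still puzzled by the two tracebacks in "After 2.5", it may help to look at what the partial object stores. A sketch, using a hypothetical named function scale as a stand-in for the lambda:

```python
from functools import partial

def scale(coef, x):  # hypothetical stand-in for the lambda in "After 2.5"
    return coef * x

p = partial(scale, coef=3)
print(p.args, p.keywords)  # () {'coef': 3}

# p(1) amounts to scale(1, coef=3): the 1 fills coef positionally,
# and the stored keyword then tries to fill coef a second time.
try:
    p(1)
except TypeError as exc:
    print("TypeError:", exc)

# Passing x by keyword avoids the clash without reordering the parameters.
print(p(x=2))  # 6
```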
Edit 2020-08-18:
Thanks to commenter dasWesen, I found out that this stuff is covered in the Python documentation: https://docs.python.org/3.4/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result - it deals with loops instead of list comprehensions, but the idea is the same - global or nonlocal variable access in the lambda function. There's even a solution - using default argument values (like for any function):
>>> a = [lambda x, coef=i: coef * x for i in (1, 2)]
>>> a[0](2), a[1](2)
(2, 4)
This way the coef value is bound to the value of i at the time of function definition (see James Powell's talk "Top To Down, Left To Right", which also explains why mutable default values are shunned).
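The bound values can be inspected directly through the __defaults__ attribute. A small sketch, which also shows one caveat of this approach: coef stays an ordinary parameter, so a caller can override it by accident:

```python
a = [lambda x, coef=i: coef * x for i in (1, 2)]

# Each function carries its own default, captured at definition time.
print([f.__defaults__ for f in a])  # [(1,), (2,)]
print(a[0](2), a[1](2))             # 2 4

# Caveat: a second positional argument silently replaces coef.
print(a[0](2, 10))                  # 20
```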
The first one
f = lambda x: x*x
[f(x) for x in range(10)]
runs f() for each value in the range, so it computes f(x) for each value.
The second one
[lambda x: x*x for x in range(10)]
evaluates the lambda expression for each value in the range without calling it, so it generates a list of those functions.
The first one creates a single lambda function and calls it ten times.
The second one doesn't call the function. It creates 10 different lambda functions. It puts all of those in a list. To make it equivalent to the first you need:
[(lambda x: x*x)(x) for x in range(10)]
Or better yet:
[x*x for x in range(10)]