Complexity of Java 7's current Lambda proposal? (August 2010)
Modulo some scope-disambiguation constructs, almost all of these methods follow from the actual definition of a lambda abstraction:
λx.E
To answer your questions in order:
I don't think there are any particular things that make the proposals by the Java community better or worse than anything else. As I said, it follows from the mathematical definition, and therefore all faithful implementations are going to have almost exactly the same form.
Anonymous first-class functions bolted onto imperative languages tend to end up as a feature that some programmers love and use frequently, and that others ignore completely - therefore it is probably a sensible choice to give it some syntax that will not confuse the kinds of people who choose to ignore the presence of this particular language feature. I think hiding the complexity and particulars of implementation is what they have attempted to do by using syntax that blends well with Java, but which has no real connotation for Java programmers.
It's probably desirable for them to use some bits of syntax that are not going to complicate existing definitions, and so they are slightly constrained in the symbols they can choose to use as operators and such. Certainly Java's insistence on remaining backwards-compatible limits the language evolution slightly, but I don't think this is necessarily a bad thing. The PHP approach is at the other end of the spectrum (i.e. "let's break everything every time there is a new point release!"). I don't think that Java's evolution is inherently limited except by some of the fundamental tenets of its design - e.g. adherence to OOP principles, VM-based.
I think it's very difficult to make strong statements about language evolution from Java's perspective. It is in a fairly unusual position: it is very, very popular, but it is also relatively old. Microsoft had the benefit of several years' worth of Java legacy before they decided to even start designing a language called "C#". The C programming language has basically stopped evolving. C++ has had few significant changes that found any mainstream acceptance. Java has continued to evolve through a slow but consistent process. If anything, I think it is better equipped to keep on evolving than any other language with a similarly huge installed code base.
I have not followed the process and evolution of the Java 7 lambda proposal, and I am not even sure what the latest proposal wording is. Consider this a rant/opinion rather than a statement of truth. Also, I have not used Java for ages, so the syntax might be rusty and incorrect in places.
First, what are lambdas to the Java language? Syntactic sugar. While in general lambdas enable code to create small function objects in place, that support was already present, to some extent, in the Java language through the use of inner classes.
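To make the "already present through inner classes" point concrete, here is a minimal sketch of a function object built with an anonymous inner class. The names `IntTransform` and `InnerClassSquare` are made up for this illustration and do not come from any proposal.

```java
// Hypothetical one-method interface standing in for a function type.
interface IntTransform {
    int apply(int x);
}

class InnerClassSquare {
    // Builds a small function object in place with an anonymous inner class.
    static IntTransform square() {
        return new IntTransform() {
            @Override
            public int apply(int x) {
                return x * x;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(square().apply(5)); // prints 25
    }
}
```

The body is a single expression, but it takes several lines of ceremony to express, which is roughly what the lambda syntax is meant to remove.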
So how much better is the syntax of lambdas? Where does it outperform previous language constructs? Where could it be better?
For starters, I dislike the fact that there are two available syntaxes for lambda functions (but this goes along the lines of C#, so I guess my opinion is not widespread). I guess if we want to sugar coat, then `#(int x)(x*x)` is sweeter than `#(int x){ return x*x; }`, even if the double syntax does not add anything else. I would have preferred only the second syntax, which is more generic, at the extra cost of writing `return` and `;` in the short versions.
To be really useful, lambdas can take variables from the scope in which they are defined and form a closure. Consistent with inner classes, lambdas are restricted to capturing 'effectively final' variables. Consistency with the previous features of the language is a nice feature, but for sweetness, it would be nice to be able to capture variables that can be reassigned. For that purpose, they are considering that variables present in the context and annotated with `@Shared` will be captured by reference, allowing assignments. To me this seems weird, as how a lambda can use a variable is determined at the place of declaration of the variable rather than where the lambda is defined. A single variable could be used in more than one lambda, and this forces the same behavior in all of them.
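For comparison, this is roughly how shared, reassignable state can be emulated with inner classes today. It is only a sketch: `@Shared` exists only in the proposal, and the one-element array trick plus the names `IntSource` and `SharedCaptureSketch` are my own stand-ins for illustration.

```java
// Hypothetical zero-argument function type, used only for this sketch.
interface IntSource {
    int next();
}

class SharedCaptureSketch {
    // Returns two function objects that close over the same mutable cell.
    static IntSource[] makePair() {
        // The reference `cell` is final, so an inner class may capture it,
        // but the element it points to can still be reassigned.
        final int[] cell = {0};
        IntSource increment = new IntSource() {
            @Override
            public int next() {
                cell[0] += 1;
                return cell[0];
            }
        };
        IntSource peek = new IntSource() {
            @Override
            public int next() {
                return cell[0];
            }
        };
        return new IntSource[] {increment, peek};
    }

    public static void main(String[] args) {
        IntSource[] pair = makePair();
        pair[0].next();
        pair[0].next();
        System.out.println(pair[1].next()); // prints 2
    }
}
```

Both objects see every update to the cell, which is the "same behavior in all of them" effect: the sharing is decided where the variable is declared, not where each function object is written.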
Lambdas try to simulate actual function objects, but the proposal does not get completely there. Up to now an identifier denotes either an object or a method; to keep the parser simple, that distinction has been kept, and calling a lambda requires using a `!` after the lambda name: `#(int x)(x*x)!(5)` will return `25`. This brings a new syntax for lambdas that differs from the rest of the language, where `!` stands somehow as a synonym for `.execute` on a virtual generic interface `Lambda<Result,Args...>`. But why not make it complete?
A new generic (virtual) interface `Lambda` could be created. It would have to be virtual as the interface is not a real interface, but a family of such: `Lambda<Return>`, `Lambda<Return,Arg1>`, `Lambda<Return,Arg1,Arg2>`... They could define a single execution method, which I would like to be like C++ `operator()`, but if that is a burden then any other name would be fine, embracing the `!` as a shortcut for the method execution:
interface Lambda<R> {
R exec();
}
interface Lambda<R,A> {
R exec( A a );
}
Then the compiler need only translate `identifier!(args)` to `identifier.exec( args )`, which is simple. The translation of the lambda syntax would require the compiler to identify the proper interface being implemented and could be matched as:
#( int x )(x *x)
// translated to
new Lambda<int,int>() { public int exec( int x ) { return x*x; } }
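As a rough approximation in Java as it exists today, the family can be sketched with one interface per arity. This rests on two assumptions of mine, not of the proposal: the distinct names `Lambda0` and `Lambda1` (Java does not let two generic interfaces share a name and differ only in the number of type parameters) and `Integer` in place of `int` (primitives cannot be type arguments).

```java
// Hypothetical arity-specific stand-ins for the "virtual" Lambda family.
interface Lambda0<R> {
    R exec();
}

interface Lambda1<R, A> {
    R exec(A a);
}

class LambdaFamilySketch {
    // Hand-written equivalent of what #(int x)(x*x) might be translated to.
    static Lambda1<Integer, Integer> square() {
        return new Lambda1<Integer, Integer>() {
            @Override
            public Integer exec(Integer x) {
                return x * x;
            }
        };
    }

    public static void main(String[] args) {
        // square!(5) in the proposal's call syntax would become:
        System.out.println(square().exec(5)); // prints 25
    }
}
```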
Exposing the interface would also allow users to define inner classes that can be used as lambdas in more complex situations. For example, if a lambda function needed to capture a variable annotated as `@Shared` in a read-only manner, or to preserve the state of the captured object as it was at the place of capture, a manual implementation of the `Lambda` would be available:
new Lambda<int,int>() {
    // copy taken at the place of capture
    final int value = context_value;
    public int exec( int x ) { return x * value; }
};
This is similar to how inner classes are defined today, and is thus natural to current Java users. It could be used, for example, in a loop to generate multiplier lambdas:
Lambda<int,int>[] array = new Lambda[10];
for (int i = 0; i < 10; ++i ) {
    array[i] = new Lambda<int,int>() {
        final int multiplier = i;
        public int exec( int x ) { return x * multiplier; }
    };
}
// note this is disallowed in the current proposal, as `i` is
// not effectively final and as such cannot be 'captured'. Also
// if `i` was marked @Shared, then all the lambdas would share
// the same `i` as the loop and thus would produce the same
// result: multiply by 10 --probably quite unexpectedly.
//
// I am aware that this can be rewritten as:
// for (int ii = 0; ii < 10; ++ii ) { final int i = ii; ...
//
// but that is not simplifying the system, just pushing the
// complexity outside of the lambda.
This would allow usage of lambdas, and of methods that accept lambdas, both with the new simple syntax, `#(int x){ return x*x; }`, and with the more complex manual approach for specific cases where the sugar coating interferes with the intended semantics.
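The multiplier loop can be approximated in current Java along these lines. This is a sketch, not the proposal's semantics: `Multiplier` and `MultiplierTable` are hypothetical names, a `List` replaces the array to sidestep generic array creation, and the per-iteration copy is taken just outside the anonymous class, which is the workaround mentioned in the comments above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-method interface playing the role of Lambda<int,int>.
interface Multiplier {
    int exec(int x);
}

class MultiplierTable {
    // Builds n function objects; the k-th one multiplies its argument by k.
    static List<Multiplier> build(int n) {
        List<Multiplier> table = new ArrayList<Multiplier>();
        for (int i = 0; i < n; ++i) {
            // Fresh final copy per iteration, so each object keeps its own value.
            final int multiplier = i;
            table.add(new Multiplier() {
                @Override
                public int exec(int x) {
                    return x * multiplier;
                }
            });
        }
        return table;
    }

    public static void main(String[] args) {
        List<Multiplier> table = build(10);
        System.out.println(table.get(3).exec(7)); // prints 21
    }
}
```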
Overall, I believe that the lambda proposal can be improved in different directions, that the way it adds syntactic sugar is a leaky abstraction (you have to deal externally with issues that are particular to the lambda), and that by not providing a lower-level interface it makes user code less readable in use cases that do not perfectly fit the simple use case.