What does Expression.Quote() do that Expression.Constant() can’t already do?

Short answer:

The quote operator is an operator which induces closure semantics on its operand. Constants are just values.

Quotes and constants have different meanings and therefore have different representations in an expression tree. Having the same representation for two very different things is extremely confusing and bug prone.

Long answer:

Consider the following:

(int s)=>(int t)=>s+t

The outer lambda is a factory for adders that are bound to the outer lambda's parameter.

Now, suppose we wish to represent this as an expression tree that will later be compiled and executed. What should the body of the expression tree be? It depends on whether you want the compiled state to return a delegate or an expression tree.

Let's begin by dismissing the uninteresting case. If we wish it to return a delegate then the question of whether to use Quote or Constant is a moot point:

        var ps = Expression.Parameter(typeof(int), "s");
        var pt = Expression.Parameter(typeof(int), "t");
        var ex1 = Expression.Lambda(
                Expression.Lambda(
                    Expression.Add(ps, pt),
                pt),
            ps);

        var f1a = (Func<int, Func<int, int>>) ex1.Compile();
        var f1b = f1a(100);
        Console.WriteLine(f1b(123));

The lambda has a nested lambda; the compiler generates the interior lambda as a delegate to a function closed over the state of the function generated for the outer lambda. We need consider this case no more.

Suppose we wish the compiled state to return an expression tree of the interior. There are two ways to do that: the easy way and the hard way.

The hard way is to say that instead of

(int s)=>(int t)=>s+t

what we really mean is

(int s)=>Expression.Lambda(Expression.Add(...

And then generate the expression tree for that, producing this mess:

        Expression.Lambda(
            Expression.Call(typeof(Expression).GetMethod("Lambda", ...

blah blah blah, dozens of lines of reflection code to make the lambda. The purpose of the quote operator is to tell the expression tree compiler that we want the given lambda to be treated as an expression tree, not as a function, without having to explicitly generate the expression tree generation code.

The easy way is:

        var ex2 = Expression.Lambda(
            Expression.Quote(
                Expression.Lambda(
                    Expression.Add(ps, pt),
                pt)),
            ps);

        var f2a = (Func<int, Expression<Func<int, int>>>)ex2.Compile();
        var f2b = f2a(200).Compile();
        Console.WriteLine(f2b(123));

And indeed, if you compile and run this code you get the right answer.

Notice that the quote operator is the operator which induces closure semantics on the interior lambda which uses an outer variable, a formal parameter of the outer lambda.

The question is: why not eliminate Quote and make this do the same thing?

        var ex3 = Expression.Lambda(
            Expression.Constant(
                Expression.Lambda(
                    Expression.Add(ps, pt),
                pt)),
            ps);

        var f3a = (Func<int, Expression<Func<int, int>>>)ex3.Compile();
        var f3b = f3a(300).Compile();
        Console.WriteLine(f3b(123));

The constant does not induce closure semantics. Why should it? You said that this was a constant. It's just a value. It should be perfect as handed to the compiler; the compiler should be able to just generate a dump of that value to the stack where it is needed.

Since there is no closure induced, if you do this you'll get a "variable 's' of type 'System.Int32' is not defined" exception on the invocation.

(Aside: I've just reviewed the code generator for delegate creation from quoted expression trees, and unfortunately a comment that I put into the code back in 2006 is still there. FYI, the hoisted outer parameter is snapshotted into a constant when the quoted expression tree is reified as a delegate by the runtime compiler. There was a good reason why I wrote the code that way which I do not recall at this exact moment, but it does have the nasty side effect of introducing closure over values of outer parameters rather than closure over variables. Apparently the team which inherited that code decided to not fix that flaw, so if you are relying upon mutation of a closed-over outer parameter being observed in a compiled quoted interior lambda, you're going to be disappointed. However, since it is a pretty bad programming practice to both (1) mutate a formal parameter and (2) rely upon mutation of an outer variable, I would recommend that you change your program to not use these two bad programming practices, rather than waiting for a fix which does not appear to be forthcoming. Apologies for the error.)

So, to repeat the question:

The C# compiler could have been made to compile nested lambda expressions into an expression tree involving Expression.Constant() instead of Expression.Quote(), and any LINQ query provider that wants to process expression trees into some other query language (such as SQL) could look out for a ConstantExpression with type Expression instead of a UnaryExpression with the special Quote node type, and everything else would be the same.

You are correct. We could encode semantic information that means "induce closure semantics on this value" by using the type of the constant expression as a flag.

"Constant" would then have the meaning "use this constant value, unless the type happens to be an expression tree type and the value is a valid expression tree, in which case, instead use the value that is the expression tree resulting from rewriting the interior of the given expression tree to induce closure semantics in the context of any outer lambdas that we might be in right now.

But why would we do that crazy thing? The quote operator is an insanely complicated operator, and it should be used explicitly if you're going to use it. You're suggesting that in order to be parsimonious about not adding one extra factory method and node type amongst the several dozen already there, that we add a bizarre corner case to constants, so that constants are sometimes logically constants, and sometimes they are rewritten lambdas with closure semantics.

It would also have the somewhat odd effect that constant doesn't mean "use this value". Suppose for some bizarre reason you wanted the third case above to compile an expression tree into a delegate that hands out an expression tree that has a not-rewritten reference to an outer variable? Why? Perhaps because you are testing your compiler and want to just pass the constant on through so that you can perform some other analysis on it later. Your proposal would make that impossible; any constant that happens to be of expression tree type would be rewritten regardless. One has a reasonable expectation that "constant" means "use this value". "Constant" is a "do what I say" node. The constant processor's job is not to guess at what you meant to say based on the type.

And note of course that you are now putting the burden of understanding (that is, understanding that constant has complicated semantics that mean "constant" in one case and "induce closure semantics" based on a flag that is in the type system) upon every provider that does semantic analysis of an expression tree, not just upon Microsoft providers. How many of those third-party providers would get it wrong?

"Quote" is waving a big red flag that says "hey buddy, look over here, I'm a nested lambda expression and I have wacky semantics if I'm closed over an outer variable!" whereas "Constant" is saying "I'm nothing more than a value; use me as you see fit." When something is complicated and dangerous we want to be making it wave red flags, not hiding that fact by making the user dig through the type system in order to find out whether this value is a special one or not.

Furthermore, the idea that avoiding redundancy is even a goal is incorrect. Sure, avoiding unnecessary, confusing redundancy is a goal, but most redundancy is a good thing; redundancy creates clarity. New factory methods and node kinds are cheap. We can make as many as we need so that each one represents one operation cleanly. We have no need to resort to nasty tricks like "this means one thing unless this field is set to this thing, in which case it means something else."


This question has already received an excellent answer. I'd additionally like to point to a resource that can prove helpful with questions about expression trees:

There is was a CodePlex project by Microsoft called Dynamic Language Runtime. Its documentation includes the document titled, "Expression Trees v2 Spec", which is exactly that: The specification for LINQ expression trees in .NET 4.

Update: CodePlex is defunct. The Expression Trees v2 Spec (PDF) has moved to GitHub.

For example, it says the following about Expression.Quote:

4.4.42 Quote

Use Quote in UnaryExpressions to represents an expression that has a "constant" value of type Expression. Unlike a Constant node, the Quote node specially handles contained ParameterExpression nodes. If a contained ParameterExpression node declares a local that would be closed over in the resulting expression, then Quote replaces the ParameterExpression in its reference locations. At run time when the Quote node is evaluated, it substitutes the closure variable references for the ParameterExpression reference nodes, and then returns the quoted expression. […] (p. 63–64)


After this a really excellent answer, it's clear what are the semantics. It's not so clear why they are designed that way, consider:

Expression.Lambda(Expression.Add(ps, pt));

When this lambda is compiled and invoked it evaluates the inner expression and returns the result. Inner expression here is an addition, so the ps+pt is evaluated and the result is returned. Following this logic, the following expression:

Expression.Lambda(
    Expression.Lambda(
              Expression.Add(ps, pt),
            pt), ps);

should return an inner's lambda compiled method reference when the outer lambda is invoked (because we say that lambda compiles to a method reference). So why do we need a Quote?! To differentiate the case when the method reference is returned vs. the result of that reference invocation.

Specifically:

let f = Func<...>
return f; vs. return f(...);

Due to some reason .Net designers chose Expression.Quote(f) for the first case and plain f for the second. To my view this causes a great deal of confusion, since in most programming languages returning a value is direct (no need for Quote or any other operation), but invocation does require extra writing (parentheses + arguments), which translates to some kind of invoke at MSIL level. .Net designers made it the opposite for the expression trees. Would be interesting to know the reason.