How does non-strict and lazy differ?
Non-strict and lazy, while informally interchangeable, apply to different domains of discussion.
Non-strict refers to semantics: the mathematical meaning of an expression. The world to which non-strict applies has no concept of the running time of a function, memory consumption, or even a computer. It simply talks about what kinds of values in the domain map to which kinds of values in the codomain. In particular, a strict function must map the value ⊥ ("bottom" -- see the semantics link above for more about this) to ⊥; a non strict function is allowed not to do this.
Lazy refers to operational behavior: the way code is executed on a real computer. Most programmers think of programs operationally, so this is probably what you are thinking. Lazy evaluation refers to implementation using thunks -- pointers to code which are replaced with a value the first time they are executed. Notice the non-semantic words here: "pointer", "first time", "executed".
Lazy evaluation gives rise to non-strict semantics, which is why the concepts seem so close together. But as FUZxxl points out, laziness is not the only way to implement non-strict semantics.
If you are interested in learning more about this distinction, I highly recommend the link above. Reading it was a turning point in my conception of the meaning of computer programs.
An example for an evaluation model, that is neither strict nor lazy: optimistic evaluation, which gives some speedup as it can avoid a lot of "easy" thunks:
Optimistic evaluation means that even if a subexpression may not be needed to evaluate the superexpression, we still evaluate some of it using some heuristics. If the subexpression doesn't terminate quickly enough, we suspend its evaluation until it's really needed. This gives us an advantage over lazy evaluation if the subexpression is needed later, as we don't need to generate a thunk. On the other hand, we don't lose too much if the expression doesn't terminate, as we can abort it quickly enough.
As you can see, this evaluation model is not strict: If something that yields _|_ is evaluated, but not needed, the function will still terminate, as the engine aborts the evaluation. On the other hand, it may be possible that more expressions than needed are evaluated, so it's not completely lazy.
Yes, there is some unclear use of terminology here, but the terms coincide in most cases regardless, so it's not too much of a problem.
One major difference is when terms are evaluated. There are multiple strategies for this, ranging on a spectrum from "as soon as possible" to "only at the last moment". The term eager evaluation is sometimes used for strategies leaning toward the former, while lazy evaluation properly refers to a family of strategies leaning heavily toward the latter. The distinction between "lazy evaluation" and related strategies tend to involve when and where the result of evaluating something is retained, vs. being tossed aside. The familiar memoization technique in Haskell of assigning a name to a data structure and indexing into it is based on this. In contrast, a language that simply spliced expressions into each other (as in "call-by-name" evaluation) might not support this.
The other difference is which terms are evaluated, ranging from "absolutely everything" to "as little as possible". Since any value actually used to compute the final result can't be ignored, the difference here is how many superfluous terms are evaluated. As well as reducing the amount of work the program has to do, ignoring unused terms means that any errors they would have generated won't occur. When a distinction is being drawn, strictness refers to the property of evaluating everything under consideration (in the case of a strict function, for instance, this means the terms it's applied to. It doesn't necessarily mean sub-expressions inside the arguments), while non-strict means evaluating only some things (either by delaying evaluation, or by discarding terms entirely).
It should be easy to see how these interact in complicated ways; decisions are not at all orthogonal, as the extremes tend to be incompatible. For instance:
Very non-strict evaluation precludes some amount of eagerness; if you don't know whether a term will be needed, you can't evaluate it yet.
Very strict evaluation makes non-eagerness somewhat irrelevant; if you're evaluating everything, the decision of when to do so is less significant.
Alternate definitions do exist, though. For instance, at least in Haskell, a "strict function" is often defined as one that forces its arguments sufficiently that the function will evaluate to _|_ ("bottom") whenever any argument does; note that by this definition, id
is strict (in a trivial sense), because forcing the result of id x
will have exactly the same behavior as forcing x
alone.