Please explain some of Paul Graham's points on Lisp
Matt's explanation is perfectly fine -- and he takes a shot at a comparison to C and Java, which I won't do -- but for some reason I really enjoy discussing this very topic once in a while, so -- here's my shot at an answer.
On points (3) and (4):
Points (3) and (4) on your list seem the most interesting and still relevant now.
To understand them, it is useful to have a clear picture of what happens with Lisp code -- in the form of a stream of characters typed in by the programmer -- on its way to being executed. Let's use a concrete example:
;; a library import for completeness,
;; we won't concern ourselves with it
(require '[clojure.contrib.string :as str])
;; this is the interesting bit:
(println (str/replace-re #"\d+" "FOO" "a123b4c56"))
This snippet of Clojure code prints out aFOObFOOcFOO
. Note that Clojure arguably does not fully satisfy the fourth point on your list, since read-time is not really open to user code; I will discuss what it would mean for this to be otherwise, though.
So, suppose we've got this code in a file somewhere and we ask Clojure to execute it. Also, let's assume (for the sake of simplicity) that we've made it past the library import. The interesting bit starts at (println
and ends at the )
far to the right. This is lexed / parsed as one would expect, but already an important point arises: the result is not some special compiler-specific AST representation -- it's just a regular Clojure / Lisp data structure, namely a nested list containing a bunch of symbols, strings and -- in this case -- a single compiled regex pattern object corresponding to the #"\d+"
literal (more on this below). Some Lisps add their own little twists to this process, but Paul Graham was mostly referring to Common Lisp. On the points relevant to your question, Clojure is similar to CL.
The whole language at compile time:
After this point, all the compiler deals with (this would also be true for a Lisp interpreter; Clojure code happens always to be compiled) is Lisp data structures which Lisp programmers are used to manipulating. At this point a wonderful possibility becomes apparent: why not allow Lisp programmers to write Lisp functions which manipulate Lisp data representing Lisp programmes and output transformed data representing transformed programmes, to be used in place of the originals? In other words -- why not allow Lisp programmers to register their functions as compiler plugins of sorts, called macros in Lisp? And indeed any decent Lisp system has this capacity.
So, macros are regular Lisp functions operating on the programme's representation at compile time, before the final compilation phase when actual object code is emitted. Since there are no limits on the kinds of code macros are allowed to run (in particular, the code which they run is often itself written with liberal use of the macro facility), one can say that "the whole language is available at compile time".
The whole language at read time:
Let's go back to that #"\d+"
regex literal. As mentioned above, this gets transformed to an actual compiled pattern object at read time, before the compiler hears the first mention of new code being prepared for compilation. How does this happen?
Well, the way Clojure is currently implemented, the picture is somewhat different than what Paul Graham had in mind, although anything is possible with a clever hack. In Common Lisp, the story would be slightly cleaner conceptually. The basics are however similar: the Lisp Reader is a state machine which, in addition to performing state transitions and eventually declaring whether it has reached an "accepting state", spits out Lisp data structures the characters represent. Thus the characters 123
become the number 123
etc. The important point comes now: this state machine can be modified by user code. (As noted earlier, that's entirely true in CL's case; for Clojure, a hack (discouraged & not used in practice) is required. But I digress, it's PG's article I'm supposed to be elaborating on, so...)
So, if you're a Common Lisp programmer and you happen to like the idea of Clojure-style vector literals, you can just plug into the reader a function to react appropriately to some character sequence -- [
or #[
possibly -- and treat it as the start of a vector literal ending at the matching ]
. Such a function is called a reader macro and just like a regular macro, it can execute any sort of Lisp code, including code which has itself been written with funky notation enabled by previously registered reader macros. So there's the whole language at read time for you.
Wrapping it up:
Actually, what has been demonstrated thus far is that one can run regular Lisp functions at read time or compile time; the one step one needs to take from here to understanding how reading and compiling are themselves possible at read, compile or run time is to realise that reading and compiling are themselves performed by Lisp functions. You can just call read
or eval
at any time to read in Lisp data from character streams or compile & execute Lisp code, respectively. That's the whole language right there, all the time.
Note how the fact that Lisp satisfies point (3) from your list is essential to the way in which it manages to satisfy point (4) -- the particular flavour of macros provided by Lisp heavily relies on code being represented by regular Lisp data, which is something enabled by (3). Incidentally, only the "tree-ish" aspect of the code is really crucial here -- you could conceivably have a Lisp written using XML.
1) A new concept of variables. In Lisp, all variables are effectively pointers. Values are what have types, not variables, and assigning or binding variables means copying pointers, not what they point to.
(defun print-twice (it)
(print it)
(print it))
'it' is a variable. It can be bound to ANY value. There is no restriction and no type associated with the variable. If you call the function, the argument does not need to be copied. The variable is similar to a pointer. It has a way to access the value that is bound to the variable. There is no need to reserve memory. We can pass any data object when we call the function: any size and any type.
The data objects have a 'type' and all data objects can be queried for its 'type'.
(type-of "abc") -> STRING
2) A symbol type. Symbols differ from strings in that you can test equality by comparing a pointer.
A symbol is a data object with a name. Usually the name can be used to find the object:
|This is a Symbol|
this-is-also-a-symbol
(find-symbol "SIN") -> SIN
Since symbols are real data objects, we can test whether they are the same object:
(eq 'sin 'cos) -> NIL
(eq 'sin 'sin) -> T
This allows us for example to write a sentence with symbols:
(defvar *sentence* '(mary called tom to tell him the price of the book))
Now we can count the number of THE in the sentence:
(count 'the *sentence*) -> 2
In Common Lisp symbols not only have a name, but they also can have a value, a function, a property list and a package. So symbols can be used to name variables or functions. The property list is usually used to add meta-data to symbols.
3) A notation for code using trees of symbols.
Lisp uses its basic data structures to represent code.
The list (* 3 2) can be both data and code:
(eval '(* 3 (+ 2 5))) -> 21
(length '(* 3 (+ 2 5))) -> 3
The tree:
CL-USER 8 > (sdraw '(* 3 (+ 2 5)))
[*|*]--->[*|*]--->[*|*]--->NIL
| | |
v v v
* 3 [*|*]--->[*|*]--->[*|*]--->NIL
| | |
v v v
+ 2 5
4) The whole language always available. There is no real distinction between read-time, compile-time, and runtime. You can compile or run code while reading, read or run code while compiling, and read or compile code at runtime.
Lisp provides the functions READ to read data and code from text, LOAD to load code, EVAL to evaluate code, COMPILE to compile code and PRINT to write data and code to text.
These functions are always available. They don't go away. They can be part of any program. That means any program can read, load, eval or print code - always.
How are they different in languages like C or Java?
Those languages don't provide symbols, code as data or runtime evaluation of data as code. Data objects in C are usually untyped.
Do any other languages other than LISP family languages have any of these constructs now?
Many languages have some of these capabilities.
The difference:
In Lisp these capabilities are designed into the language so that they are easy to use.
For points (1) and (2), he is talking historically. Java's variables are pretty much the same, which is why you need to call .equals() to compare values.
(3) is talking about S-expressions. Lisp programs are written in this syntax, which provides lots of advantages over ad-hoc syntax like Java and C, such as capturing repeated patterns in macros in a far cleaner way than C macros or C++ templates, and manipulating code with the same core list operations that you use for data.
(4) taking C for example: the language is really two different sub languages: stuff like if() and while(), and the preprocessor. You use the preprocessor to save having to repeat yourself all the time, or to skip code with #if/#ifdef. But both languages are quite separate, and you can't use while() at compile time like you can #if.
C++ makes this even worse with templates. Check out a few references on template metaprogramming, which provides a way of generating code at compile time, and is extremely difficult for non-experts to wrap their heads around. In addition, it's really a bunch of hacks and tricks using templates and macros that the compiler can't provide first class support for - if you make a simple syntax error, the compiler is unable to give you a clear error message.
Well, with Lisp, you have all this in one single language. You use the same stuff to generate code at run time as you learn in your first day. This isn't to suggest metaprogramming is trivial, but it is certainly more straightforward with first class language and compiler support.