What does it mean that "Lisp can be written in itself?"
He isn't saying that Lisp can be used to write a Lisp compiler. He's saying that the language is made up from its own data structures. So while you can't build up a C function out of C data structures, you can do that in Lisp. A program is made up of lists that are executed by your computer, and the effect of those lists can be to create other lists that are then executed, and the effect of those lists can be to create still more lists to be executed. C doesn't have this property. C code can't, for example, manipulate its own AST.
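To make the "lists that create other lists" idea concrete, here is a minimal sketch in Python rather than Lisp, with nested Python lists standing in for Lisp lists and strings standing in for symbols. The `evaluate` function and the handful of operators it understands (`quote`, `list`, `eval`, `+`, `*`) are assumptions made for illustration only; real Lisp evaluation is richer than this.

```python
import operator

# Nested Python lists stand in for Lisp lists; strings stand in for symbols.
OPS = {"+": operator.add, "*": operator.mul}

def evaluate(expr):
    """Evaluate a tiny, hypothetical subset of Lisp encoded as Python lists."""
    if not isinstance(expr, list):
        return expr                      # numbers and bare symbols evaluate to themselves here
    head, *args = expr
    if head == "quote":                  # (quote x) returns x unevaluated, as data
        return args[0]
    if head == "list":                   # (list a b ...) builds a new list from evaluated args
        return [evaluate(a) for a in args]
    if head == "eval":                   # (eval x) evaluates x, then runs the result as code
        return evaluate(evaluate(args[0]))
    if head in OPS:
        left, right = (evaluate(a) for a in args)
        return OPS[head](left, right)
    raise ValueError(f"unknown operator: {head!r}")

# This program's effect is to create another list...
builder = ["list", ["quote", "*"], ["+", 1, 2], 4]
generated = evaluate(builder)            # the list ['*', 3, 4]
# ...and that list is itself a program that can be executed.
result = evaluate(generated)             # 12
```

The point is that `generated` is an ordinary list value that the program computed, and the same evaluator that produced it can run it.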
Well, the link you provided does go on to say that, if you continue reading, he will answer your question in detail.
The unusual thing about Lisp-- in fact, the defining quality of Lisp-- is that it can be written in itself. To understand what McCarthy meant by this, we're going to retrace his steps, with his mathematical notation translated into running Common Lisp code.
Probably what Paul means is that the representation of Lisp syntax as a Lisp value is standardized and pervasive. That is, a Lisp program is just a special kind of S-expression, and it's exceptionally easy to write Lisp code that manipulates Lisp code. Writing a Lisp interpreter in Lisp is a special case, and is not so exciting as the general ability to have a unified representation for both code and data.
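As a rough illustration of "code that manipulates code", the sketch below (in Python, with nested lists standing in for S-expressions) rewrites one program into another using nothing but ordinary list operations. The `square` form and the `expand_square` helper are hypothetical names invented for this example; in a real Lisp this kind of rewrite is what a macro does.

```python
def expand_square(expr):
    """Rewrite every (square x) form into (* x x), leaving everything else alone."""
    if not isinstance(expr, list):
        return expr                               # atoms are returned unchanged
    expr = [expand_square(e) for e in expr]       # rewrite nested forms first
    if len(expr) == 2 and expr[0] == "square":
        return ["*", expr[1], expr[1]]
    return expr

# (lambda (x) (+ (square x) 1)) written as nested lists
source = ["lambda", ["x"], ["+", ["square", "x"], 1]]
rewritten = expand_square(source)
# rewritten is ['lambda', ['x'], ['+', ['*', 'x', 'x'], 1]]
```

No parser or AST library is involved: because the program already is a list, walking and rebuilding it is plain data manipulation.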
I just deleted a very long reply that is probably inappropriate here.
However consider that:
LISP has no "syntax"(1) in the sense the word has for languages like C, Java, or Pascal. There is an (initial but customizable) syntax for the Common LISP reader, but that's a different thing: the LISP Graham is talking about is not Common LISP, and a reader is not the LISP language, just a procedure. Something like "(lambda (x) (* x 2))" is not LISP code, but text that, for example, the CL standard reader can convert to LISP code.
LISP not only can be written in LISP (if you mean the "bootstrap" ability), it actually came into existence that way. The very first implementation of eval, in the late 1950s, was written in LISP on paper and then converted manually into machine language(2): LISP started as a purely theoretical idea not meant to be implemented. I know of no other computer language that followed that path. For example, C++ was conceived as a pre-processor for a C compiler and was written in C; it wasn't a C++ program later converted to C in order to run it.
There are many other aspects in which LISP is quite different, and I think the best way to get a grasp of it is to actually implement a toy LISP interpreter (it's a smaller job than one would think, especially if your "machine language" is a high-level, dynamically typed language like Python).
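To give a feel for how small such a toy can be, here is a hedged sketch in Python. It separates the two steps described above: a "reader" that turns text into nested lists, and an evaluator that runs those lists. All names (`tokenize`, `read`, `evaluate`, `GLOBAL_ENV`) are made up for this example, only integers, `quote`, `lambda`, and three arithmetic functions are supported, and there is no error handling for malformed input.

```python
import operator

def tokenize(text):
    """Split source text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """The 'reader': turn a token list into nested lists (consumes tokens)."""
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)                    # drop the closing ")"
        return form
    try:
        return int(token)
    except ValueError:
        return token                     # anything else is a symbol (a str)

GLOBAL_ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(form, env=GLOBAL_ENV):
    """Evaluate a form produced by read() in the given environment."""
    if isinstance(form, str):            # symbol: look up its value
        return env[form]
    if not isinstance(form, list):       # number: self-evaluating
        return form
    head = form[0]
    if head == "quote":                  # (quote x) returns x as data
        return form[1]
    if head == "lambda":                 # (lambda (params...) body) builds a closure
        _, params, body = form
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    func = evaluate(head, env)           # otherwise: a function call
    return func(*[evaluate(arg, env) for arg in form[1:]])

# The text is not code until the reader has converted it into lists.
code = read(tokenize("((lambda (x) (* x 2)) 21)"))
value = evaluate(code)                   # 42
```

Here `code` is the nested list `[['lambda', ['x'], ['*', 'x', 2]], 21]`, which is the thing the evaluator actually operates on; the string it came from plays no further role.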
(1) Actually there are two predefined syntax levels in LISP: the first is the syntax of the reader, i.e. the rules for how source characters are translated into s-expressions; the second defines how s-expressions are understood by the compiler when generating actual machine code. But if there are two syntax levels in LISP, why is it correct to say that LISP has no syntax? The reason is that neither of them is fixed. The first level, for example, is handled by the standard Common Lisp reader using "read tables" that can be customized to have your code executed when a certain character is found in the source code. The second level can be customized using macros, symbol macros and compiler macros, and this allows the definition of new syntax constructs. In other words, LISP has no fixed syntax, and it's possible to write a LISP program that begins as standard LISP and after a while becomes identical to Python (I am not making this up: there is (was) a cl-python implementation that supported exactly this as mixed source mode, where anything starting with an open parenthesis was treated as LISP syntax and anything else as Python syntax).
(2) In http://www-formal.stanford.edu/jmc/history/lisp/node3.html you can find McCarthy's description of how eval[e, a] was first found on paper as an interesting theoretical result (a "universal function" implementation neater than a universal Turing machine), at a time when only the data structures and elementary native functions had been laid out for the Lisp language the group was building. This hand-written function was then translated by hand into machine code by S.R. Russell and started serving them as the first Lisp interpreter.