How to compile the source code of TeX
You are linking there to plain.tex, which is a file written in TeX, not the source of tex-the-program (which is tex.web).
These days, if you want to compile from source, it is probably best to start with a full download of the TeX Live build sources.
The sources are at
http://www.tug.org/texlive/svn/
and that page has hints about where to start if you want to compile. See in particular:
http://www.tug.org/texlive/build.html
Good luck:-)
David Carlisle explains how to compile the sources for the modern versions of TeX that are the basis for TeX Live (pdfTeX, XeTeX, and LuaTeX, among others). These derive from Knuth's source code by way of Karl Berry's Web2c, which mechanically translates WEB sources into C code that can be compiled just about anywhere.
If you want to compile sources that are closer to what Knuth wrote (and documents in The TeXbook), take a look at:
http://www.ctan.org/tex-archive/systems/unix/tex-gpc/
This project allows you to compile Pascal WEB sources directly, using GNU Pascal. This apparently wasn't trivial; as the author, Wolfgang Helbig, writes:
I was somewhat intrigued while building TeX from its sources, since some of these depend on others to be built and installed. Knuth wrote these programs in the WEB language (WEB is only remotely related to the last W from CERN's WWW). WEB programs are converted to Pascal sources by tangle and to a TeX input file by weave. Of course, tangle and weave are WEB programs as well. So one needs tangle to build tangle---and weave and TeX to read a beautifully typeset WEB program. But don't despair, I cut this indefinite recursion and provided tangle.p, the Pascal source of tangle, and tex.pdf. It shows what, why and how I changed Knuth's program.
His tex.pdf documents these changes in minute detail.
The original sources for TeX (and friends), as written by Knuth, can be found on CTAN. All the sources have very comprehensive documentation, but trying to compile them is still an epic task.
TeX is written in WEB, a programming language invented by Knuth. So first we're going to need WEB.
WEB is itself written in WEB. It consists of two programs: weave, which produces the TeX documentation from a WEB program, and tangle, which produces Pascal code from a WEB program. To compile the WEB system we need an implementation of tangle; you can either get one from an existing TeX package, or you can compile it from Pascal source.
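The chicken-and-egg problem with tangle can be sketched as a dependency graph. The following is only an illustration: the target names are labels I made up for the sketch, not real file paths or an actual build script, and it uses Python's standard graphlib module to show that the graph has a cycle until a ready-made tangle.p is supplied.

```python
from graphlib import TopologicalSorter, CycleError

# Each target maps to the things that must exist before it can be built.
# The names are illustrative labels only (assumptions, not real paths).
DEPS = {
    "tangle.p": {"tangle.web", "tangle (binary)"},  # tangle turns WEB into Pascal
    "tangle (binary)": {"tangle.p"},                # a Pascal compiler builds tangle
    "weave.p": {"weave.web", "tangle (binary)"},
    "weave (binary)": {"weave.p"},
    "tex.p": {"tex.web", "tangle (binary)"},
    "tex (binary)": {"tex.p"},
    "tex.tex": {"tex.web", "weave (binary)"},       # weave turns WEB into TeX input
    "typeset listing": {"tex.tex", "tex (binary)"},
}


def build_order(deps, prebuilt=()):
    """Return one valid build order, treating `prebuilt` targets as given."""
    # Dropping a target's own entry removes its prerequisites, so it
    # becomes a plain source node in the graph.
    graph = {t: set(d) for t, d in deps.items() if t not in prebuilt}
    return list(TopologicalSorter(graph).static_order())


def has_cycle(deps, prebuilt=()):
    """True if no build order exists because of a circular dependency."""
    try:
        build_order(deps, prebuilt)
        return False
    except CycleError:
        return True
```

Calling has_cycle(DEPS) reports the circularity, while build_order(DEPS, prebuilt={"tangle.p"}) yields an order in which tangle is built first and everything else follows from it.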
Note that if you want to read about the implementation of any WEB program you can use weave and TeX to produce copious documentation; this is a good starting point to Knuth's code. (For TeX and Metafont you can buy a printed version of the WEB output as Computers and Typesetting volumes B and D respectively.)
Now we need to talk a little about the dialect of Pascal that Knuth uses, which he calls Pascal H. TeX was written before Pascal was standardised; to my knowledge no native Pascal H compiler exists, and the dialect is not compatible with modern Pascal compilers. However, Knuth wrote the programs in a relatively portable way, so it's only moderately Herculean to port them. At this point you have some choices:
- Write a compiler for Knuth's Pascal H
- Port the WEB source to an existing Pascal dialect using change files
- Translate Knuth's Pascal H to another programming language
tex-gpc takes the second approach; TeX Live (and MiKTeX) take the third, via web2c.
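For anyone who hasn't seen a change file: it is a list of blocks of the form @x, old lines, @y, new lines, @z, and tangle and weave substitute the new lines for the old ones as they read the WEB source. Here is a minimal sketch of that idea. It is a simplification, not how tangle itself is implemented, and the sample lines at the bottom are invented for the illustration.

```python
def parse_changes(ch_lines):
    """Split change-file lines into a list of (old_lines, new_lines) pairs."""
    changes, old, new, state = [], [], [], None
    for line in ch_lines:
        tag = line[:2].lower()
        if tag == "@x":
            old, new, state = [], [], "old"
        elif tag == "@y" and state == "old":
            state = "new"
        elif tag == "@z" and state == "new":
            if not old:
                raise ValueError("change block has nothing to match")
            changes.append((old, new))
            state = None
        elif state == "old":
            old.append(line.rstrip())
        elif state == "new":
            new.append(line.rstrip())
        # Anything outside an @x...@z block is treated as commentary.
    return changes


def apply_changes(web_lines, changes):
    """Replace each old block with its new block, in order of appearance."""
    web = [line.rstrip() for line in web_lines]
    out, i = [], 0
    for old, new in changes:
        # Copy lines through until the old block matches at position i.
        while i < len(web) and web[i:i + len(old)] != old:
            out.append(web[i])
            i += 1
        if i >= len(web):
            raise ValueError("change did not match: %r" % old[0])
        out.extend(new)
        i += len(old)
    out.extend(web[i:])
    return out


# Invented sample input, not taken from tex.web.
web = ["@p program demo(output);", "const limit=100;", "begin writeln(limit); end."]
ch = ["Raise the limit for our compiler.", "@x", "const limit=100;", "@y",
      "const limit=5000;", "@z"]
patched = apply_changes(web, parse_changes(ch))
```

The point of doing it this way is that tex.web itself stays untouched; all the system-dependent edits live in a separate file that can be reviewed on its own.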
Now, if you can do this, the actual process of initialising and running TeX (initex, fonts, etc.) will be relatively easy; make sure you validate your build against the TRIP test (see the TeX sources). If you're feeling adventurous, do the same for Metafont, the TeX tools, the Metafont tools, and WEB.
About the design of TeX and Metafont: Knuth designed these programs in the late 1970s to be highly robust, efficient and portable. Today programmers take for granted the speed of modern processors and programming standards that allow them to write adequately functioning programs much more quickly. Much of what happens in these programs (e.g. carefully enumerating character codes, statically allocating memory at compile time, on-line error recovery) rarely happens in today's programming, and many of the modern annoyances (having to compile LaTeX twice for back references, difficulty with fonts, the intricacy of the macro language) are a result of these design goals and decisions. I wouldn't advocate Knuth's methods for most projects today involving multiple people, efficient computers, and tight deadlines. Still, TeX is among the oldest programs in use today (and it will stay in use unless LuaTeX supplants it): delicately designed, intricately implemented, pretty portable and copiously documented.
Good luck!