Rather simple, instructional, educational and illustrative web program source codes in Knuth's WEB literal programming language?
Here is a small hello.web
, that doesn't use much of the features of WEB:
@* Introduction.
This program takes an integer $n$ as input, and prints ``Hello world'' $n$ times.
@p program HELLO(input, output);
var
n: integer;
i: integer;
begin
read(n);
for i := 1 to n do
begin
writeln('Hello world');
end;
end.
@* Index. Not much to it. Everything occurs in section 1.
After weave hello.web
followed by tex hello.tex
, the resulting typeset output starts like:
For more details, read the WEB manual, available with texdoc webman
or online.
Instead of “Hello World”, how about a small-ish program that was specifically intended to illustrate WEB?
In the November 1981 issue of TUGboat, Knuth mentioned that he was developing a system called WEB. Then in the next (March 1982) issue, he published “Fixed-Point Glue Setting: An Example of WEB”, saying
I will soon be publishing a complete manual about WEB, but in the meantime I think it will be useful to have an example of a fairly short piece of code written in "web" form. Therefore I have prepared the accompanying program, which also serves another function: ...
You can read the program itself with texdoc glue
, or here. The glue.web
that you need is here.
I was able to test this glue.web
just now with another Pascal compiler (fpc), by first running tangle glue.web
to generate glue.p
, then adding {$mode ISO}
to the top of glue.p
(so that when the program says “integer”, a 32-bit type is used rather than a 16-bit type), then running fpc glue.p
to generate the glue
binary. (I tried web2js but it crashed.)
Then to test the binary, put the following input (which consists of 7 test cases—pairs of lines—followed by 0) in a file (or type it in at the terminal). This input is the same as that used by Knuth.
200000
30000 40000 50000 60000 0
2000
30000 40000 50000 60000 0
1000000000
8000000 -9000000 8000000 4000 7000000 0
100
8000000 -9000000 8000000 4000 7000000 0
1000000000
800 -900 800 400 700 0
1000000000
800 -900 800 400 -700 0
65555
-200 199 0
1
60000 -59999 90000 0
0
The output should be (matching that published in TUGboat):
Test data set number 1:
Glue ratio is 1.1111 (0,14,18205)
30000 33334
40000 44445
50000 55557
60000 66668
Totals 180000 200004 (versus 200000)
Test data set number 2:
Glue ratio is 0.0111 (0,21,23302)
30000 333
40000 444
50000 555
60000 666
Totals 180000 1998 (versus 2000)
Test data set number 3:
Glue ratio is 71.4101 (8,0,18281)
8000000 571281250
-9000000 -642686836
8000000 571281250
4000 274215
7000000 499857383
Totals 14004000 1000007262 (versus 1000000000)
Test data set number 4:
Glue ratio is 0.0000 (8,24,30670)
8000000 57
-9000000 -64
8000000 57
4000 0
7000000 49
Totals 14004000 99 (versus 100)
Test data set number 5:
Glue ratio is 2x2x2x2x2x2x8681.0000 (-6,1,17362)
800 444467200
-900 -500025600
800 444467200
400 222233600
700 388908800
Totals 1800 1000051200 (versus 1000000000)
Test data set number 6:
! Excessive glue.
Glue ratio is 2x2x2x2x2x2x2x0.0000 (-6,0,0)
800 0
-900 0
800 0
400 0
-700 0
Totals 400 0 (versus 1000000000)
Test data set number 7:
Invalid data (nonpositive sum); this set rejected.
Test data set number 8:
Glue ratio is 0.0000 (1,30,23861)
60000 0
-59999 0
90000 1
Totals 90001 1 (versus 1)
To illustrate WEB a bit better, here is a rather elaborate version of the hello-world program that uses most of the main features of WEB: named and unnamed sections; simple, numeric, and parametric macros; some formatting and indexing controls; and, most exotic of all, the string pool.
All that the program does is print “Hello world” N times after reading a number N (so there's nothing mathematically or algorithmically non-trivial to add difficulty, unlike the GLUE or PRIMES programs), but it is split up into modules, and uses double-quoted strings that (TANGLE writes and) the program reads from the string pool file, maintaining the strings in an array of characters as TeX and other Knuth programs do.
Here is the program (also uploaded here):
% A Hello-world program.
\def\title{Hello}
% Hack to change link colours (if used with pdfwebmac)
\def\BlueGreen{\pdfsetcolor{\cmykRed}}
@* Introduction.
This program takes an integer $n$ as input and prints ``Hello world'' $n$ times.
(There is no Pascal code in this section.)
@ We give part of the program here, and it will continue later.
@p
@<Compiler directives@>
program HELLO(input, output);
var
@<global variables@>
@ What global variables do we need? For one thing, we need the $n$.
@<global...@>=
@!n: integer;
@ For |integer| variables to be treated as 32-bit by the Pascal compiler,
on FPC we need a special compiler directive.
@<Compiler di...@> = @{@&$mode iso@}
@^system dependencies@>
@* The string pool and file I/O.
The WEB feature of string pools was designed at a time when Pascal compilers
did not have good support for strings. Now it may be no longer necessary, but to
illustrate the feature we will maintain a string pool.
More specifically, we will maintain a large array of characters, named |str|.
All characters of all strings from the string pool go into this array: the $n$th
string occupies the positions from |str_start[n]| to |str_start[n+1] - 1|
(inclusive) in this array, where |str_start| is an auxiliary array of integers.
Also, the number of strings currently in the string pool is stored in an integer
variable called |str_count|.
By convention, the first $256$ strings are the one-character (one-byte) strings.
For this program we don't need too many additional strings. In fact we need just
a few strings, but we'll support $10$ strings with a total of $1000$ characters.
@d max_strings = 256 + 10
@d max_total_string_length = 1000
@<global var...@>=
@!str: array[0..max_total_string_length-1] of char;
@!str_start: array[0..max_strings-1] of integer;
@!str_count: integer;
@ To use this string pool, we have a procedure that reads out characters from it
one-by-one. Specfically, |print(k)| prints the $k$th string, and |println| and
|printnl| are convenience macros.
@d println(#) == begin print(#); writeln; end
@d printnl(#) == begin writeln; print(#); end
@p
procedure print(n: integer);
var
i: integer;
begin
@{ writeln('For ', n, ' will print characters from ', str_start[n], ' to ', str_start[n + 1] - 1); @}
for i := str_start[n] to str_start[n + 1] - 1 do
begin
write(str[i]);
end;
end;
@ We'll have a procedure to populate this array by reading from the pool file,
but unfortunately that means we need to figure out file input. How this is done
depends on the Pascal compiler. In FPC, a file of characters can be declared as
a variable of type |TextFile|, initialized with |Assign| and |Reset|, then read
with |read|.
@^system dependencies@>
@p
procedure initialize_str_array;
var
pool_file: TextFile;
x, y: char; { for the first two digits on each line}
@!length: integer;
i: integer;
begin
str_count := 0;
str_start[0] := 0;
for i := 0 to 255 do
begin
str[i] := chr(i);
str_start[i + 1] := str_start[i] + 1;
str_count := str_count + 1;
end;
Assign(pool_file, 'hello.pool');
Reset(pool_file);
while not eof(pool_file) do
begin
read(pool_file, x, y);
if x = '*' then @<check pool checksum@>
else begin
length := 10 * (ord(x) - "0") + ord(y) - "0";
str_start[str_count + 1] := str_start[str_count] + length;
for i := str_start[str_count] to str_start[str_count + 1] - 1 do
begin
read(pool_file, str[i]);
end;
readln(pool_file);
str_count := str_count + 1;
end
end;
end;
@ To ensure that the pool file hasn't been modified since tangle was run, we can
use the @@\$ (= |@t\AT!\$@>| = at-sign, dollar-sign) feature. We can reuse
(abuse?) the |y| and |length| variables for reading characters and maintaining
the checksum read from the file.
@<check pool...@> =
begin
length := ord(y) - "0";
while not eof(pool_file) do
begin
read(pool_file, y);
if ("0" <= ord(y)) and (ord(y) <= "9") then
length := length * 10 + (ord(y) - "0");
end;
if length <> @$ then
begin
writeln('Corrupted pool file: got length: ', length : 1, '; rerun tangle and recompile.');
Halt(1);
end
end
@* Main program.
Apart from |n|, we also need an |i| to loop over.
@<glob...@> =
i: integer;
@ Here finally is the ``main'' block of the program.
@p
begin
initialize_str_array;
print("How many times should I say hello? ");
read(n);
printnl("OK, here are your "); write(n : 1); println(" hellos: ");
for i := 1 to n do
begin
println("Hello, world!");
end;
print("There, said hello "); write(n : 1); println(" times.");
end.
@* Index. If you're reading the woven output, you'll see the index here.
Running tangle
(to get hello.p
and hello.pool
) and then a Pascal compiler shows the program working correctly:
% tangle hello.web
This is TANGLE, Version 4.5 (TeX Live 2018)
*1*5*9*11
Writing the output file
Done.
6 strings written to string pool file.
(No errors were found.)
% cat hello.pool
35How many times should I say hello?
18OK, here are your
09 hellos:
13Hello, world!
18There, said hello
07 times.
*332216284
% fpc hello.p
Free Pascal Compiler version 3.0.4 [2018/10/02] for x86_64
Copyright (c) 1993-2017 by Florian Klaempfl and others
Target OS: Darwin for x86_64
Compiling hello.p
Assembling (pipe) hello.s
Linking hello
26 lines compiled, 0.1 sec
% echo 5 | ./hello
How many times should I say hello?
OK, here are your 5 hellos:
Hello, world!
Hello, world!
Hello, world!
Hello, world!
Hello, world!
There, said hello 5 times.
And running weave
:
weave hello && sed -i=.bak "s/webmac/pdfwebmac/" hello.tex && pdftex hello.tex
results in a 7-page typeset PDF that is the version of the program “supposed” to be read usually.
Some additional sources of information.
In 1987, Knuth gave a series of lectures on mathematical writing; these also often discussed computer science writing. He devoted two lectures to literate programming, naturally using WEB
(CWEB
was also developed in 1987, but the lectures may well predate it).
While the whole series consists of twenty-one lectures, they are grouped by subject and are relatively self-contained; you do not need to watch all of them to understand the two relevant here:
Literate programming (1) - https://www.youtube.com/watch?v=U8LttJ1rvWI
Literate programming (2) - https://www.youtube.com/watch?v=ObxmXC2NCMA
He goes through a few WEB
programs written by his students and critiques them, all while discussing various features of WEB
and literate programming in general.
But that's not all. There was a different series of lectures in 1982 on ``The internal details of TeX82''; the first few of these often touch upon WEB
, but the first is the most relevant. I would recommend watching the whole series if you're interested in, well, the internal details of TeX82.
(But do note some anachronisms; while most of the details haven't changed, there is at least one big discrepancy between the TeX82 discussed in the lectures and the TeX82 we know today in the existence of a \chcode
primitive. This single primitive was used where either \catcode
or \mathcode
would be used now (see error #395 in the errorlog). Oh, and the recordings of the computer's display are largely illegible, but if you're familiar with WEB
and TeX then you should still be able to follow along.)
@ShreevatsaR pointed out in a comment that the differences between the TeX82 lectured about and the TeX82 we know are not necessarily trivial, and linked to the tex.web
version (the banner
said version -.25
) as of the time of the lectures. I went to go read some of it and realized that, even if you're somewhat familiar with WEB
and TeX, it could prove extremely difficult to follow along using the WEB
source (especially without a woven, typeset version).
So let's go through some parts of it, comparing it with TeX version 3.14159265 (henceforth "OldTeX" is version -0.25 as presented in the lectures and "TeX" is the modern form). As for line numbers, the reader should copy and paste the code from the website, and NOT save the entire page; doing the latter would require removing all of the HTML entities and markup from the code (and there's a lot of code).
First, of course, there's the first 61 lines:
COMMENT ⓧ VALID 00057 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00006 00002 % This program is copyright 1982 by D. E. Knuth all rights are reserved.
C00010 00003 @* \[1] Introduction.
C00042 00004 @* \[2] The character set.
C00056 00005 @* \[3] Input and output.
C00079 00006 @* \[4] String handling.
C00095 00007 @* \[5] On-line and off-line printing.
C00111 00008 @* \[6] Reporting errors.
C00133 00009 @* \[7] Arithmetic with scaled dimensions.
C00148 00010 @* \[8] Packed data.
C00158 00011 @* \[9] Dynamic memory allocation.
C00175 00012 @* \[10] Data structures for boxes and their friends.
C00211 00013 @* \[11] Memory layout.
C00224 00014 @* \[12] Displaying boxes.
C00243 00015 @* \[13] Destroying boxes.
C00247 00016 @* \[14] Copying boxes.
C00253 00017 @* \[15] The command codes.
C00267 00018 @* \[16] The semantic nest.
C00280 00019 @* \[17] The table of equivalents.
C00332 00020 @* \[18] The hash table.
C00351 00021 @* \[19] Saving and restoring equivalents.
C00372 00022 @* \[20] Token lists.
C00384 00023 @* \[21] Introduction to the syntactic routines.
C00391 00024 @* \[22] Input stacks and states.
C00423 00025 @* \[23] Maintaining the input stacks.
C00430 00026 @* \[24] Getting the next token.
C00459 00027 @* \[25] Expanding user macros.
C00481 00028 @* \[26] Basic scanning subroutines.
C00531 00029 @* \[27] Building token lists.
C00544 00030 @* \[28] File names.
C00572 00031 @* \[29] Font metric data.
C00628 00032 @* \[30] Device-independent file format.
C00665 00033 @* \[31] Shipping pages out.
C00721 00034 @* \[32] Packaging.
C00749 00035 @* \[33] Data structures for math mode.
C00778 00036 @* \[34] Subroutines for math mode.
C00799 00037 @* \[35] Typesetting math formulas.
C00853 00038 @* \[36] Alignment.
C00902 00039 @* \[37] Breaking paragraphs into lines.
C00963 00040 @* \[38] Breaking paragraphs into lines, continued.
C00992 00041 @* \[39] Pre-hyphenation.
C01004 00042 @* \[40] Post-hyphenation.
C01020 00043 @* \[41] Hyphenation.
C01038 00044 @* \[42] Initializing the hyphenation tables.
C01067 00045 @* \[43] Breaking vertical lists into pages.
C01084 00046 @* \[44] The page builder.
C01131 00047 @* \[45] The chief executive.
C01157 00048 @* \[46] Building boxes and lists.
C01218 00049 @* \[47] Building math lists.
C01267 00050 @* \[48] Conditional processing.
C01281 00051 @* \[49] Mode-independent processing.
C01325 00052 @* \[50] Dumping and undumping the tables.
C01350 00053 @* \[51] The main program.
C01362 00054 @* \[52] Debugging.
C01367 00055 @* \[53] Extensions.
C01389 00056 @* \[54] System-dependent changes.
C01390 00057 @* \[55] Index.
C01391 ENDMK
Cⓧ;
At SAIL, the main system text editor (and this is referenced often in the
lectures) was page-oriented; there was also a line editor with ex-like
functionality. Pages are marked with the formfeed character, which may
display as a square with FF
, and can be entered in a terminal using control-L
(which usually functions as a clear
command if typed at the shell).
Interestingly, the
GNU C coding style guide
recommends dividing source code with formfeed characters, but I've personally
never seen this in any code past 1990.
Regardless, this "header" just lists the pages of the file and their first lines. In an editor which supports pagination, one could easily jump to Part 30 by jumping to page 32. It can be ignored. You can safely replace all formfeeds with nothing and not change anything important.
Next we have the limbo section, which I'll divide into parts.
% This program is copyright 1982 by D. E. Knuth; all rights are reserved.
% Please don't make any changes to this file unless you are D. E. Knuth!
% Version 0 is fully implemented but not yet fully tested, so beware of bugs.
% Here is TeX material that gets inserted after \input webhdr
\def\hang{\hangindent 3em\ \unskip\!}
\def\textindent#1{\hangindent 2.5em\noindent\hbox to 2.5em{\hss#1 }\!}
\def\at{@@} % use for an at sign
\chcode@@=13 \def@@{\penalty999\ } % ties words together
\def\TeX{T\hbox{\hskip-.1667em\lower.424ex\hbox{E}\hskip-.125em X}}
\font b=cmr9 \def\mc{\:b} % medium caps for names like PASCAL
\def\PASCAL{{\mc PASCAL}}
\def\ph{{\mc PASCAL-H}}
\font L=manfnt % font used for the METAFONT logo
\def\MF{{\:L META}\-{\:L FONT}}
\def\<#1>{$\langle#1\rangle$}
\def\kern{\penalty100000\hskip}
For comparison (omitting the copyright comments), here is the corresponding
code in the canonical tex.web
:
% Here is TeX material that gets inserted after \input webmac
\def\hang{\hangindent 3em\noindent\ignorespaces}
\def\hangg#1 {\hang\hbox{#1 }}
\def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces}
\font\ninerm=cmr9
\let\mc=\ninerm % medium caps for names like SAIL
\def\PASCAL{Pascal}
\def\ph{\hbox{Pascal-H}}
\def\pct!{{\char`\%}} % percent sign in ordinary text
\font\logo=logo10 % font used for the METAFONT logo
\def\MF{{\logo META}\-{\logo FONT}}
\def\<#1>{$\langle#1\rangle$}
\def\section{\mathhexbox278}
Well, the first difference is that we're \input
ting webhdr
and not
webmac
. This is simple: in TeX78 and seemingly OldTeX, it appears that the
.tex
files containing the definitions for a format have filenames that end in
hdr.tex
instead of mac.tex
. Examples: manmac.tex
was manhdr.tex
, and
the corresponding file to modern-day taocpmac.tex
(an illegal filename at
SAIL because it's longer than ten characters) was acphdr.tex
. And naturally
webmac.tex
was webhdr.tex
, though for a short time.
(I suspect that this convention is related to a similar one for SAIL source
code (see TEXHDR.SAI, and it
was changed to distance TeX from SAIL and make it more independent; but this is
just a hypothesis.)
Then come the definitions of \hang
, \hangg
(absent in OldTeX), and
\textindent
. These are used for itemizing; presumably finer control was
desired later. In OldTeX and TeX78, \!
was a primitive that appears to have
been synonymous with TeX's \ignorespaces
.
I'm not sure exactly what the purpose of \ \unskip\!
is. \␣
was a "forced
space" as it is in TeX. \unskip
is listed in TeX and METAFONT, New
Directions in Typesetting as a "recent addition", and deletes the most recent
glue. The expression thus seems to mean <space><remove the space><supress
following spaces>
, which doesn't make sense to me.
The next definitions for OldTeX setup \at
to produce an at sign, and then
makes the at sign an active character (remember that any @@
in a WEB
file
is unconditionally converted to a single at sign) and defines it to produce a
non-breaking space or tie---essentially, @
did what ~
does. Note in plain
TeX, ~
is defined to be a penalty of 10000 as opposed to @
's 999. The
change of @
's category code is accomplished with \chcode
; as I stated
above, this is essentially a combination of \catcode
and \mathcode
. One
difference between OldTeX and TeX78 is shown here: in TeX78, the expression
would be \chcode@@=12
(category codes started at zero instead of one).
The reason for TeX's \def\pct!...
is that webmac.tex
defines \%
to be a
percent sign in the \tt
font. webhdr.tex
does not. \TeX
's definition is
not part of basic
, OldTeX and TeX78's default format, analogous to TeX's
plain
.
Note that the retaining of \PASCAL
even though the name eventually came to be
set in lowercase is likely just for compatibility.
Font handling (in terms of defining fonts and switching to them) was actually
surprisingly complicated in OldTeX and TeX78. TeX78 font names were single
characters, not control sequences; the actual internal font number to which
they refer is determined by taking the last five bits of the ASCII code
corresponding to that character, limiting the number of addressable fonts to
32. Later this was extended to 256. Suffice it to say that \font b=cmr9
and
then \:b
is the same as \font\b=cmr9
and then \b
. I'm not familiar with
any further details.
Now we can move on. The remainder of the limbo parts of both versions shouldn't present too many difficulties.
OldTeX
\def\(#1){} % this is used to make module names sort themselves better
\def\9#1{} % this is used for sort keys in the index via @:sort key}{entry@>
\outer\def\N#1. \[#2]#3.{\par\mark{#1}\vfill\eject % beginning of starred module
\gdef\position{\:a#2\:ux\:a\topmark} % for part numbers
\xdef\rhead{\uppercase{\!#3}}
\sendcontents{\Z{\]#2]#3}{#1}{\count1}}
\Q\noindent{\bf#1.\quad\!#3.\quad}\!}
\def\title{\TeX82}
\def\contentspagenumber{1}
\def\topofcontents{\hsize 5.5in
\topspace 0pt plus 1fil minus 1fil
\def\]##1]{\hbox to 1in{\hfil##1.\ }}
}
\def\botofcontents{\vskip 0pt plus 1fil minus 1fil\setpage\let\]=\let}
\def\lheader{\hbox to1.5em{\:a\hss\count0}\:m\qquad\rhead\hfill\title\qquad
\position} % top line on left-hand pages
\def\rheader{\position\:m\qquad\title\hfill\rhead\qquad
\hbox to1.5em{\:a\hss\count0}} % top line on right-hand pages
\setcount0 \contentspagenumber
\topofcontents
\ctrline{(replace this page by the contents page printed later)}
\botofcontents
\mark{1}\eject
TeX
\def\(#1){} % this is used to make section names sort themselves better
\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@>
\outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section
\def\rhead{PART #2:\uppercase{#3}} % define running headline
\message{*\modno} % progress report
\edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next
\ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces}
\let\?=\relax % we want to be able to \write a \?
\def\title{\TeX82}
\def\topofcontents{\hsize 5.5in
\vglue 0pt plus 1fil minus 1.5in
\def\?##1]{\hbox to 1in{\hfil##1.\ }}
}
\def\botofcontents{\vskip 0pt plus 1fil minus 1.5in}
\pageno=3
The TeX and METAFONT sources patch \N
(all WEB
versions), to modify the
running header and the table of contents (for aesthetic reasons). OldTeX and
TeX go in different directions for this.
I must admit I'm not sufficiently well-versed in WEAVE to explain the usage of
\()
and \9
. \9
is internal to WEAVE; it isn't used anywhere in tex.web
.
\()
, on the other hand, is frequently used within module names that share a
common prefix with several others (this isn't a comment on WEAVE's internal
data structures or how it searches by prefix, but just an observation).
For instance, here are four module names as they appear in order within a
single section (namely 453):
@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$...@>=
@<Scan for \(u)units that are internal dimensions;
|goto attach_sign| with |cur_val| set if found@>;
@<Scan for \(m)\.{mu} units and |goto attach_fraction|@>;
@<Scan for \(a)all other units and adjust |cur_val| and |f| accordingly;
|goto done| in the case of scaled points@>
The three that share "Scan for" have \()
, but "Scan units..." doesn't. Again,
the exact use and meaning of this are not clear yet to me; perhaps someone more
informed can chime in.
Wow; now we can actually get into some code! For every single difference
between the source for OldTeX and the source for TeX, I've made a
gist that
simply contains the output of running diff oldtex.web tex.web
. You'll see
that there are, of course, thousands of changes; most of them are typographical
corrections/changes, where "typographical" refers to both the text of the code
and the typography of the woven TeX output. There's also generally more
indexing in TeX. (I think that, unfortunately, the indexing commands often make
the code somewhat difficult to read; this may be an issue for people just
getting into WEB
.)
But here's an extremely important kind of difference, prevalent throughout the
entire source: the character set. These days, something incompatible with ASCII
is quite rare, but at the time of SAIL's installation it was all too prevalent.
This is why all WEB
programs that do any input and output and are expected to
be portable would convert the input into an internal, ASCII-like format. SAIL's
character set is an extension of ASCII, adding many mathematically useful
characters. Here's an excerpt of the source of OldTeX, after removing WEB
commands and changing the indentation to enhance clarity:
@<Accumulate the constant...@>=
loop begin
if (cur_tok<zero_token+radix)∧(cur_tok≥zero_token)∧(cur_tok≤zero_token+9) then
d←cur_tok-zero_token
else if (radix=16)∧(cur_tok≤A_token+5)∧(cur_tok≥A_token) then
d←cur_tok-A_token+10
else
goto done;
vacuous←false;
if (cur_val≥m)∧((cur_val>m)∨(d>7)∨(radix≠10)) then
begin if OK_so_far then
begin print_nl("! Number too big");
help2("I can only go up to 2147483647='17777777777=""7FFFFFFF,")
("so I'm using that number instead of yours.");
error; cur_val←infinity; OK_so_far←false;
end;
end
else cur_val←cur_val*radix+d;
get_nc_token;
end;
done:
The characters ∧
, ∨
, ≤
, ≥
, ←
, and ≠
are all not standard ASCII. To
have TeX run on as many systems as possible, and to make the porting process
as painless as possible, these evidently couldn't be used; and the modern
tex.web
is indeed pure ASCII. The TeX78 language also made heavy use of
SAIL's character set; ⊗
would be used where &
is now when aligning. This
was also done away with. Reading the source of TeX and the TeXbook, one gets
the idea that Knuth was somewhat resentful at having to limit everything to the
"inferior" standard ASCII.
To change the WEB
code from SAIL's character set to standard ASCII, a SAIL
program was written by David R. Fuchs, which automatically performs the
conversion. This program was called
UNDEK. That name is amusing
already, but even more so when you consider that the initial prototype of the
TANGLE component of the WEB
system was named
UNDOC!
At any rate, the final form of the @<Accumulate the constant...@>
module is
thus:
@<Accumulate the constant...@>=
loop begin
if (cur_tok<zero_token+radix)and(cur_tok>=zero_token)and
(cur_tok<=zero_token+9) then
d:=cur_tok-zero_token
else
if radix=16 then
if (cur_tok<=A_token+5)and(cur_tok>=A_token) then
d:=cur_tok-A_token+10
else
if (cur_tok<=other_A_token+5)and(cur_tok>=other_A_token) then
d:=cur_tok-other_A_token+10
else
goto done
else
goto done;
vacuous:=false;
if (cur_val>=m)and((cur_val>m)or(d>7)or(radix<>10)) then
begin if OK_so_far then
begin print_err("Number too big");
help2("I can only go up to 2147483647='17777777777=""7FFFFFFF,")
("so I'm using that number instead of yours.");
error; cur_val:=infinity; OK_so_far:=false;
end;
end
else cur_val:=cur_val*radix+d;
get_x_token;
end;
done:
(I really agree with Kernighans points about semicolons in Section 5 of his
article critiquing Pascal; I honestly don't know if the quadruply-nested if
statement is correctly indented.)
The biggest issue (definitely more difficult to deal with than the character
set) is the lack of a printed version of the woven source of OldTeX. Throughout
the lectures, reference is frequently made to exact module numbers. Here is an
approximation to a solution: simply search (after removing formfeeds) for an at
sign at the beginning of a line, followed by either a space or an asterisk;
this can be represented in a common notation for pattern matching as
^@( |\*)
. Then by going to the nth result you effectively go to the nth
module. It's not a very satisfactory solution, but oh well.
(Okay that was more of an information dump than a walkthrough, but now it's documented.)