How does LaTeX know the page number of a reference?
The answer to your question in the title is “LaTeX doesn't know”, at least not completely.
How does LaTeX manage cross-references?
Suppose you have
As we shall see in section~\ref{sec!main-results}, ...
...[many pages later]...
\section{Main results}\label{sec!main-results}
We are now ready to prove the most important theorems.
When the \label is seen, the page with the \ref has long been typeset and output: there's no way to “go back and fix the number”.
So the approach is not to “get all the cross-references right at once”, because that would require keeping the whole document in memory before doing any typesetting. Instead, LaTeX writes a note as soon as it outputs a page where a \label command appeared: in the .aux file you'll find something like
\newlabel{sec!main-results}{{3}{9}}
where the first number is the section number and the second one is the page number. The note is written out only when a page is shipped out, because only then is the page number really known. Remember that TeX always reads ahead and typesets whole paragraphs before deciding where to break the page.
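To try this yourself, here is a minimal complete document built around the fragment above. It is only a sketch: the article class, the hypothetical Introduction section and the \clearpage standing in for the many pages in between are assumptions, not part of the original example. After the first run both references print as ?? and the .aux file gains a \newlabel line for sec!main-results; after the second run they should read “section 2 on page 2”, the values read back from that line.

```latex
\documentclass{article}
\begin{document}
% Hypothetical opening section, so that the reference points forward
\section{Introduction}
As we shall see in section~\ref{sec!main-results}
on page~\pageref{sec!main-results}, ...
% Stands in for "many pages later": the label lands on a later page
\clearpage
\section{Main results}\label{sec!main-results}
We are now ready to prove the most important theorems.
\end{document}
```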
At the end of the job, the .aux file is closed and input again. At that time \newlabel gets a suitable definition, whose purpose is to check whether the label was already known from a previous run and, if so, whether either of the associated numbers has changed.
This is the point where you can see warnings such as
Label(s) may have changed. Rerun to get cross-references right
There were multiply-defined labels
Label `<label>' multiply defined
that should be self-explanatory.
At the start of a job, when LaTeX is processing \begin{document}, the .aux file left by the previous run is input and \newlabel gets a different definition, which allows \ref{sec!main-results} or \pageref{sec!main-results} to print the proper number. But which number? In the case above, the section number will be 3, even if you have added a whole section between runs. Only at the end of the job will LaTeX know that the number has changed, and it will then issue the first of the warnings listed above.
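A sketch of that scenario, assuming an earlier version of the document had only the Introduction and Main results sections, so that its .aux file ended up with something like \newlabel{sec!main-results}{{2}{1}}. The Background section below is a hypothetical addition made between runs. On the next run \ref still prints 2, the value read at \begin{document}, while the heading itself is numbered 3; at the end of that run LaTeX notices the mismatch and warns that labels may have changed. One more run prints 3.

```latex
\documentclass{article}
\begin{document}
\section{Introduction}
As we shall see in section~\ref{sec!main-results}, ...
% Hypothetical section added between runs: Main results moves from 2 to 3
\section{Background}
\section{Main results}\label{sec!main-results}
We are now ready to prove the most important theorems.
\end{document}
```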
If a cross-reference is unknown, just ?? will be printed and the warning about changed labels will be issued. If a \ref or \pageref command refers to a label that is not yet defined, you get the warning
There were undefined references
but also
Reference `<label>' on page <page> undefined
which may tell you that you have misspelled the label.
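For instance, in the following minimal document (the typo is a hypothetical one, not from the original) the reference drops the final s of the label. Since sec!main-result is never defined, it prints ?? on every run and the per-reference warning names it, so rerunning will not help; fixing the spelling will.

```latex
\documentclass{article}
\begin{document}
\section{Main results}\label{sec!main-results}
% Deliberate typo: "sec!main-result" lacks the final "s",
% so no \newlabel for it is ever written to the .aux file
See section~\ref{sec!main-result}.
\end{document}
```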
Should we care about the size of a cross-reference?
Should we care about the space used by the reference? Not really. Paragraphs usually have enough flexibility to allow for shrinking or stretching a line without substantially changing the output. This is not completely foolproof, and there are examples around of cleverly written documents that never stabilize: each new run of LaTeX changes the page number associated with a label, so it never settles. However, the chances that this happens in a real document are pretty small.
What about multiple runs?
Build tools such as latexmk are able to look into the .log file for warnings about changed labels or undefined references and trigger a new run to fix the output. However, it's not so important that the cross-references are correct after every single run: they will be once you get none of the warnings above.
What about the .aux file?
The .aux file is used for several other purposes as well: citations, for example, but also other administrative tasks. Packages, notably hyperref, can modify the annotations by extending the syntax of the two versions of \newlabel, but going into this would take too long. The idea is still the same.
Important note. It's clear from this description that preserving the integrity of the .aux file between runs is essential. This file should generally not be removed, unless it has become corrupt because of some fatal error. An incomplete annotation might cause an error when the file is input: interrupting the LaTeX run at this error will leave the same corrupt file in place, so the same error will reappear at the next run. In such cases, removing the .aux file is the only remedy. Not a big deal: it will cost a new run of LaTeX (maybe two). But, of course, removing a correct .aux file at the end of a run will always produce warnings about undefined references at the next run.
Finally, there is a switch that makes LaTeX not touch any of the files it writes out: if you add \nofiles in the preamble, the .aux file and the ones used for the table of contents and similar lists will only be input and not rewritten. It's a relic of the past, when even writing to a file or just keeping some files open caused delays, so when one was sure that cross-references and lists were correct, adding \nofiles saved some running time. Nowadays, the overhead is so small that such a trick is almost useless.
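A minimal sketch of where the switch goes (the rest of the document is only illustrative). It also shows a consequence of the description above: if no .aux or .toc file exists yet, the table of contents stays empty and the references stay at ?? no matter how many times you rerun, because nothing is ever written.

```latex
\documentclass{article}
% \nofiles belongs in the preamble: existing .aux and .toc files
% are still read, but LaTeX no longer rewrites them
\nofiles
\begin{document}
\tableofcontents
\section{Main results}\label{sec!main-results}
See section~\ref{sec!main-results} on page~\pageref{sec!main-results}.
\end{document}
```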
LaTeX writes the page numbers of all labels to the aux file. Note that on the first run, when there is no aux file, it prints ?? for references instead. On later runs, it first reads the aux file. Then, as it records page numbers, it compares each one to what was written to the aux file on the previous run. If any of them differ, it issues a warning at the end of the run.
It is conceivable, though, that this process never terminates. When a page number gains a digit, it is possible (barely) that the paragraph becomes a line shorter, due to the way the paragraph-building algorithm works. That could, in turn, move the referenced page one page back, shortening the page number once more. But I think you'd have to work really hard to come up with an example of this.
If you number your pages with roman numerals, it is a bit easier: now a later page can have a shorter number (ix comes after viii), which more plausibly shortens the paragraph by a line.
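A small sketch to make the point concrete: it measures the typeset widths of viii and ix, produced here with the TeX primitive \romannumeral (the length names \widthviii and \widthix are only illustrative). Page ix follows page viii, yet its number is narrower.

```latex
\documentclass{article}
% Illustrative length registers for the two page-number strings
\newlength{\widthviii}
\newlength{\widthix}
\begin{document}
% \romannumeral turns 8 into viii and 9 into ix
\settowidth{\widthviii}{\romannumeral 8}
\settowidth{\widthix}{\romannumeral 9}
Page viii is \the\widthviii\ wide, while page ix is only \the\widthix\ wide.
\end{document}
```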
This is why you need to run latex two or three times (or more). If you run latex myfile.tex and then look in myfile.aux, you will see all the cross-reference information. So on the first run references are just ?? but after that LaTeX uses the values written out the previous time. At the end of the run, LaTeX does a consistency check that the values it has written are the same as the ones read in; if they are different, it issues the "Label(s) may have changed. Rerun to get cross-references right" warning.