Exact alignment of caption beside figure

Alas, @egreg’s explanation, although captivating, is incorrect. (But I too was deceived at first, since I upvoted it! :-) Notwithstanding the fallacious diagnosis, however, the cure he suggests is equally efficacious, so I wouldn’t bother to write this answer, if it weren’t for a detail that might actually confuse the casual reader: @egreg’s answer implies that a \label command occurring as the last thing in a vertical box always smashes the depth of that box, but this is not at all the case.

The correct explanation, however, is a pretty long story, so I must ask you to be patient. Let us begin by recapping the rules by which TeX computes the depth of a vertical box it is constructing with \vbox (the rules for \vtop and \vcenter are related to these, but different), which are detailed in The TeXbook on pages 80 (bottom) and 81 (top). The general rule is quite simple and natural: the depth of the constructed \vbox is the depth of the bottommost box inside it. This rule has only three exceptions:

  1. when the constructed box contains no boxes, in which case its depth is zero (obvious);

  2. when the last box inside the \vbox has glue, or kerning, somewhere below it, in which case the “outer” depth is zero too (seems reasonable);

  3. when the user has set an explicit threshold for the depth, by setting the \boxmaxdepth parameter, in which case the threshold is honored (again, obvious).

In particular, “whatsit” nodes are “transparent” to the depth of the bottommost box, as we’ll show below with a simple experiment.
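
The general rule and its exceptions can be checked in isolation with a minimal sketch like the following. The word "gypsy" (chosen only because it has descenders), the 5cm width, the box registers used, and \write-1 as a stand-in for an arbitrary whatsit are all assumptions made for illustration; the depths are reported on the terminal with \typeout, so no page is produced.

```latex
\documentclass{article}
\begin{document}

% General rule: the depth is that of the bottommost box (the line "gypsy").
\setbox0 = \vbox{\hsize = 5cm \noindent gypsy\par}
\typeout{general rule: \the\dp0}

% Exception 1: no boxes inside, only glue, so the depth is zero.
\setbox2 = \vbox{\vskip 10pt}
\typeout{no boxes: \the\dp2}

% Exception 2: glue below the last box, even 0pt of it, gives depth zero.
\setbox4 = \vbox{\hsize = 5cm \noindent gypsy\par \vskip 0pt}
\typeout{glue below: \the\dp4}

% Exception 3: \boxmaxdepth caps the depth (here at 1pt).
\setbox6 = \vbox{\hsize = 5cm \boxmaxdepth = 1pt \noindent gypsy\par}
\typeout{boxmaxdepth: \the\dp6}

% A whatsit below the last box is transparent: same depth as \box0.
\setbox8 = \vbox{\hsize = 5cm \noindent gypsy\par \write-1{whatsit}}
\typeout{whatsit below: \the\dp8}

\end{document}
```

The second and third messages should report 0.0pt and the fourth 1.0pt, while the first and the last should report the same nonzero value, namely the depth of the descenders in the current font.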

Now, if a \caption command comes last in a \vbox that is being constructed, in normal circumstances exception 2, above, is applied to the line (or to the last one of the lines) it typesets, because that line does have glue below itself, namely, the \belowcaptionskip glue (see, for instance, the definition of \@makecaption in classes.pdf, in particular code line 1122). This means that, if \belowcaptionskip retains its default value of 0pt, in normal circumstances the caption will not have its baseline aligned with the baseline of an enclosing box, but rather its outer, “off-the-depth” contour, as in the second of the OP’s examples (yes, exception 2 applies also if the amount of glue is zero). So the question to ask is, rather, “Why doesn’t this happen in the first example?”

The point is, here, that in both cases the caption is wrapped in a minipage environment, and the code that gets executed when this environment ends (contained in the macro \endminipage, see its definition in latex.ltx), among other things, issues (aha!) an \unskip; the latter is a TeX primitive command that removes the last node from the current list if that node is glue. Thus, the \unskip removes the glue coming from \belowcaptionskip, so that exception 2 no longer applies, and the depth of the last line carries over to the enclosing box.
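
The effect of \unskip on the depth can be reproduced outside of any minipage. The following is only a sketch under assumed conditions: an explicit \vskip 0pt plays the role of the \belowcaptionskip glue, and the word "gypsy" and the 5cm width are arbitrary choices.

```latex
\documentclass{article}
\begin{document}

% Zero glue below the last line: exception 2 applies, the depth is lost.
\setbox0 = \vbox{\hsize = 5cm \noindent gypsy\par \vskip 0pt}
\typeout{with glue: \the\dp0}

% \unskip removes that glue again, so the depth of the line is restored.
\setbox0 = \vbox{\hsize = 5cm \noindent gypsy\par \vskip 0pt \unskip}
\typeout{glue removed: \the\dp0}

\end{document}
```

The first message should report 0.0pt, the second the depth of the descenders of the line.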

“OK,” you ask, “but then, why doesn’t this happen in the second example too?” The point, now, is that the \unskip command can remove the last glue node only if it is still the very last node in the current list. Here @egreg’s original explanation steps in: if a \label command follows \caption, then it appends (notice: after \caption has completed, so below the \belowcaptionskip glue!) a “whatsit” node (that will show up in diagnostic tracings as a \write node); the \unskip will then find this “whatsit”, not the glue, and will therefore have no effect at all. Thus, the glue node from \belowcaptionskip will survive, making exception 2 apply. In other words, the presence of the \label command does not directly make any of the above exceptions apply; it is only relevant in that it prevents \unskip from doing its job.
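
The way a whatsit shields the glue from \unskip can also be seen in a stripped-down setting. In this sketch, \write-1 is merely a stand-in for the \write node that \label produces, \vskip 0pt stands in for the \belowcaptionskip glue, and the text and width are arbitrary; none of this is taken from the OP's code.

```latex
\documentclass{article}
\begin{document}

% Whatsit after the glue: \unskip sees the whatsit and does nothing,
% so the glue survives and exception 2 makes the depth zero.
\setbox0 = \vbox{\hsize = 5cm \noindent gypsy\par
                 \vskip 0pt \write-1{x} \unskip}
\typeout{whatsit after glue: \the\dp0}

% Whatsit before the glue: \unskip removes the glue, and the whatsit
% that remains is transparent, so the depth of the line is kept.
\setbox0 = \vbox{\hsize = 5cm \noindent gypsy\par
                 \write-1{x} \vskip 0pt \unskip}
\typeout{whatsit before glue: \the\dp0}

\end{document}
```

The first message should report 0.0pt; the second should report the depth of the line, as if neither the glue nor the whatsit had ever been there.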

Most of the claims set forth above can be proved if you experiment with the following code

\documentclass[a4paper]{article}
\usepackage[T1]{fontenc}

% \setlength{\belowcaptionskip}{\bigskipamount}

% \makeatletter
% 
% \def\endminipage{%
%     \par
% %     \unskip
%     \ifvoid\@mpfootins\else
%       \vskip\skip\@mpfootins
%       \normalcolor
%       \footnoterule
%       \unvbox\@mpfootins
%     \fi
%     \@minipagefalse   %% added 24 May 89
%   \color@endgroup
%   \egroup
%   \expandafter\@iiiparbox\@mpargs{\unvbox\@tempboxa}}
% 
% \makeatother

\showboxbreadth = 1000
\showboxdepth = 10
% \tracingoutput = 1



\begin{document}

\begin{figure}[h]
  \begin{minipage}[b]{.6\linewidth}
    \centering
    \rule{0.99\linewidth}{60pt}
  \end{minipage}%
  \begin{minipage}[b]{.4\linewidth}
    \caption{The quick brown fox jumps over the lazy dog.}
  \end{minipage}
\end{figure}

\begin{figure}[h]
  \begin{minipage}[b]{.6\linewidth}
    \centering
    \rule{0.99\linewidth}{60pt}
  \end{minipage}%
  \begin{minipage}[b]{.4\linewidth}
    \caption{The quick brown fox jumps over the lazy dog.}
    \label{fig:fox}
  \end{minipage}
\end{figure}

% \showlists

\setbox0 = \vbox{
    \hsize = 10cm
    \prevdepth = 2pt % not actually important
    The quick brown fox jumped over the lazy dog.\par
    \label{box}
}

The depth of \verb|\box0| is \the\dp0.

% \showbox0

\unvbox0

\end{document}

and uncomment its various portions to try different alternatives. For example, uncommenting

% \setlength{\belowcaptionskip}{\bigskipamount}

will show you that \belowcaptionskip survives in the second example, but not in the first. Uncommenting

% \makeatletter
% 
% \def\endminipage{%
%     \par
% %     \unskip
%     \ifvoid\@mpfootins\else
%       \vskip\skip\@mpfootins
%       \normalcolor
%       \footnoterule
%       \unvbox\@mpfootins
%     \fi
%     \@minipagefalse   %% added 24 May 89
%   \color@endgroup
%   \egroup
%   \expandafter\@iiiparbox\@mpargs{\unvbox\@tempboxa}}
% 
% \makeatother

will prove that the \unskip at the end of the minipage environment is indeed the culprit for the glue removal (of course, one could also use the \patchcmd utility from the etoolbox package, here). Moreover, uncommenting

% \showlists

will include an enlightening diagnostic listing in your transcript file (if you know how to read it); and uncommenting

% \showbox0

will produce a similar listing which proves that the \write node produced by the \label command is indeed the last node inside \box0 (but, notwithstanding this, the depth of the inner box survives at the outer level, as the typeset output shows).

On the other hand, uncommenting

% \tracingoutput = 1

serves no purpose in our case, since all figures have got [h] as their position specifier (and the h happens to be honored by LaTeX!).
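
As for the \patchcmd route mentioned above, a sketch of it might look as follows. It assumes that \endminipage still has the definition reproduced earlier, so that the search for \unskip succeeds; the messages passed to \typeout are purely illustrative.

```latex
% In the preamble, in place of the commented-out redefinition of \endminipage
\usepackage{etoolbox}
\makeatletter
\patchcmd{\endminipage}% macro to patch
  {\unskip}%             the first occurrence of this ...
  {}%                    ... is replaced by nothing
  {\typeout{endminipage: unskip removed}}% executed on success
  {\typeout{endminipage: patch failed}}%   executed on failure
\makeatother
```

If the kernel's definition differs from the one assumed here, the failure branch will say so in the transcript, and the explicit redefinition remains available as a fallback.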

I am pretty tired now, so I will leave the explanation concerning the color package “as an exercise” (hint: remove the lua-visual-debug package and look closely… ;-) .

I did say that this was a pretty long story, but the O.P. asked for “enligthening”, and this is what I have tried to provide…


Addition: Answer to the “exercise”

As is well known, the original TeX did not deal with color at all; in order to implement it, the color package (on which all other packages that provide color facilities are based) has recourse as well to “whatsit” nodes included at appropriate places in the lists being constructed. In particular, in order to properly restore, at the end of a box, the color settings that were in force outside it, a special mechanism is used whereby a certain kind of node, containing the appropriate instructions for the rendering device, is appended after the box in question. The exact nature of this node, and how it is reported in diagnostic listings, depends on the typesetting engine being used; for instance, it is reported as \pdfcolorstack 0 pop with pdf(La)TeX, or as \special{color pop} with (La)TeX. In any case, this node inhibits the effect of \unskip exactly as a \write node does.

As a matter of fact, if the lua-visual-debug package were not loaded, this node containing the instruction for restoring colors would not be appended after the last line of the caption in the first of the OP’s examples, because the caption, in itself, does not make use of color. Indeed, consider the following variation of the code presented above:

\documentclass[a4paper]{article}
\usepackage[T1]{fontenc}
\usepackage{color}

% \setlength{\belowcaptionskip}{\bigskipamount}

% \makeatletter
% 
% \def\endminipage{%
%     \par
% %     \unskip
%     \ifvoid\@mpfootins\else
%       \vskip\skip\@mpfootins
%       \normalcolor
%       \footnoterule
%       \unvbox\@mpfootins
%     \fi
%     \@minipagefalse   %% added 24 May 89
%   \color@endgroup
%   \egroup
%   \expandafter\@iiiparbox\@mpargs{\unvbox\@tempboxa}}
% 
% \makeatother

\showboxbreadth = 1000
\showboxdepth = 10
% \tracingoutput = 1



\begin{document}

\begin{figure}[h]
  \begin{minipage}[b]{.6\linewidth}
    \centering
    \rule{0.99\linewidth}{60pt}
  \end{minipage}%
  \begin{minipage}[b]{.4\linewidth}
    \caption{The quick brown fox jumps over the lazy dog.}
  \end{minipage}
\end{figure}

\begin{figure}[h]
  \begin{minipage}[b]{.6\linewidth}
    \centering
    \rule{0.99\linewidth}{60pt}
  \end{minipage}%
  \begin{minipage}[b]{.4\linewidth}
    \caption{\color{red}The quick brown fox jumps over the lazy dog.}
  \end{minipage}
\end{figure}

\begin{figure}[h]
  \begin{minipage}[b]{.6\linewidth}
    \centering
    \rule{0.99\linewidth}{60pt}
  \end{minipage}%
  \begin{minipage}[b]{.4\linewidth}
    \caption{The quick brown fox jumps over the lazy dog.}
    \label{fig:fox}
  \end{minipage}
\end{figure}

% \showlists

\setbox0 = \vbox{
    \hsize = 10cm
    \prevdepth = 2pt % not actually important
    The quick brown fox jumped over the lazy dog.\par
    \label{box}
}

The depth of \verb|\box0| is \the\dp0.

% \showbox0

\unvbox0

\end{document}

If you compile it, you will notice that the descenders of the caption of the first figure, that neither uses color nor is followed by a \label command, do extend below the baseline, while those of the other two captions don’t; in the case of the second figure, this is due to the presence of color inside the caption itself (uncomment \showlists and see the transcript file for the full details). However, the lua-visual-debug package includes (when loaded together with color) some colored lines inside the box of the caption, thus, in that case, the first example behaves like the second one (since color is equally present inside that box).


In the first case, the last item in the caption minipage is the second line of the caption. In the second case, the last item is the whatsit that TeX places there, to be transformed later into the \write instruction to the auxiliary file.

This item has no depth, so the reference point of the last item in the second case is at the bottom of the minipage. In the first case, the reference point is at the baseline of the caption's second line.

Solution: use

\caption{The quick brown fox jumps over the lazy dog.\label{fig:fox}}

Now the whatsit belongs to the paragraph and so the reference point is computed like in the first example.