Implementing a pullquotes algorithm in LaTeX

Ok, here's my approach.

The basic idea is to typeset both columns as one box which is split afterwards. The outer form of the text is predefined as one big \parshape construct.

The algorithm in more detail:

  1. Measure the picture to be included and calculate the number of lines it will occupy, as well as the vertical position where it will be placed.
  2. From this, generate a "global" parshape construct covering the concatenation of both columns and forming the 'inverse' of the picture shape at the calculated position.
  3. Typeset the text in a \vbox. The "global" parshape is applied to every paragraph. Note that a single paragraph covering both columns is a possible fringe case.
  4. At every occurrence of \par, an appropriate prefix is stripped from the "global" parshape construct to reflect the number of lines taken up by the preceding paragraph before re-applying the \parshape to the next paragraph. \prevgraf gives the number of lines in the preceding paragraph.
  5. Split the \vbox into two columns which are then arranged with the picture.
  6. There is one more complication to be considered: For calculating the parshape, the number of lines in a column has to be known. OTOH, the number of lines cannot be known precisely without knowing the parshape, as this influences the numer of lines neccessary for typesetting the text. Hence, an iterative approach is taken, starting with a rough estimate of the line count and then successively adding to it until the proper line count is reached. This will effectively balance the columns.

The code

The following is not really implementing an immediately useful functionality, but mainly demonstrating how the goal set by the question can be achieved. For being really applicable to some concrete purpose, the code (in particular for outputting the pullquote structure produced) probably has to be extended (see also "possible extensions" below).

If the following is copied into a file pullquote.sty,

\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{pullquote}[2012/04/03 EXPERIMENTAL package for typesetting pullquote]

\RequirePackage{microtype}

%-------------------------------------------------------------------
% (1) USER INTERFACE
%-------------------------------------------------------------------

% CONFIGURATION

% Distance between columns.
\newdimen\textcoldist
\textcoldist4mm\relax

% Distance around image.
\newdimen\imgdist
\imgdist4mm\relax

% Width of text column.
\newdimen\textcolwd
\textcolwd\dimexpr.5\linewidth-.5\textcoldist\relax

% Main macro.
% \pullquote{<Text>}{<Image>} will typeset <Text> in two
% (balanced) columns, embedding <Image> in the middle such that
% text 'flows around' the image.
% <Text> should be just text interspersed with \par. No lists,
% displayed math etc. <Image> should be a singular object
% like \includegraphics, but it could as well be a \parbox. Make
% sure the size of the image and the amount of text match such
% that the image can be effectively 'flowed around'.

\newcommand\pullquote[2]
{%
  \begingroup
    % We allow widows and orphans as they would lead to glitches
    % in the paragraph shape.
    \clubpenalty=\z@
    \widowpenalty=\z@
    % Make sure both columns start at the same vertical position. 
    \splittopskip\dimexpr\baselineskip-\dp\strutbox\relax
    % Don't complain about underfull boxes at \vsplit.
    \vbadness\maxdimen
    \vfuzz\maxdimen
    % Put the image into a box which can be measured.
    \setbox\imgbox@pq
    =\hbox{%
      #2%
    }%
    % (2) The text is typeset once to get a rough estimate of the
    % required number of lines.
    \typesettext@pq{#1}{}%
    % Calculate number of lines for text and image.
    \pqlines@pq
    \numexpr
      \dimexpr\ht\textbox@pq+\dp\textbox@pq\relax
      /\baselineskip
      /\tw@
    \relax
    \imglines@pq
    \numexpr
      \dimexpr\ht\imgbox@pq+\dp\imgbox@pq+2\imgdist\relax
      /\baselineskip
    \relax
    % (2a) Column line count is only a rough estimate, not
    % considering the text extension by \parshape. So we \loop
    % until correct column line count is reached.
    \loop
      \typeout{trying \the\pqlines@pq\space lines.}%
      % Calculate number of lines above image.
      \imgtopoffset@pq
      \numexpr(\pqlines@pq-\imglines@pq)/\tw@\relax
      % Number of lines in parshape.
      \global\parshapelines@pq\numexpr2*\pqlines@pq+\@ne\relax
      % (3) Calculate "global" parshape from image size and position.
      \xdef\parshape@pq
      {%
        \number\parshapelines@pq\space
        \iterate@mkps@pq{1}{1}%
        0pt\space\the\textcolwd\space
      }%
      % (4) Re-typeset text with parshape setting.
      \typesettext@pq{#1}
      {%
        \let\o@par\par
        \let\par\par@pq
        \parshape\parshape@pq
      }%
      % (6) Split off two columns.
      \setbox\columnabox@pq=\vsplit\textbox@pq to \pqlines@pq\baselineskip
      \setbox\columnbbox@pq=\vsplit\textbox@pq to \pqlines@pq\baselineskip
      % (7) Iterate until estimation for  column line count is correct.
     \unless\ifvoid\textbox@pq
      % We need to advance line count by half the "leftover" lines
      % in \textbox@pq. To make sure we're not over-extending the
      % line count (by any strange effect of line breaking with
      % the changed parshape) which could lead to unneccessary
      % white space at the bottom of the right column, we deduce
      % one from the result. 
      \@tempcnta
      \numexpr
        \dimexpr\ht\textbox@pq+\dp\textbox@pq\relax
        /\baselineskip
        /\tw@
        -\@ne
      \relax
      % But advance line count by at least one.
      \ifnum\@tempcnta<\@ne\@tempcnta\@ne\fi
      \advance\pqlines@pq\@tempcnta
    \repeat
    % (8) Output text columns and image.
    \hbox
    {%
      \rlap
      {%
        \hskip
        \dimexpr
          \textcolwd+.5\textcoldist-.5\wd\imgbox@pq
        \relax
        \raise
        \dimexpr
          \numexpr\pqlines@pq-\imglines@pq-\imgtopoffset@pq\relax
          \baselineskip
          +.5\dimexpr\imglines@pq\baselineskip-\ht\imgbox@pq-\dp\imgbox@pq\relax
        \relax
        \box\imgbox@pq
      }%
      \box\columnabox@pq\hskip\textcoldist\box\columnbbox@pq
    }%
  \endgroup
}

%-------------------------------------------------------------------
% INTERNALS
%-------------------------------------------------------------------

% AUXILIARY REGISTERS AND CONTAINERS.

% Box for full text content.
\newbox\textbox@pq
% Box for first column.
\newbox\columnabox@pq
% Box for second column.
\newbox\columnbbox@pq
% Box for image.
\newbox\imgbox@pq
% Line count for one column.
\newcount\pqlines@pq
% Line count for "global" parshape.
\newcount\parshapelines@pq
% Line count for image.
\newcount\imglines@pq
% Vertical position of image.
\newcount\imgtopoffset@pq
% Container for "global" parshape definition.
\newcommand*\parshape@pq{}

% INTERNAL MACROS.

% Typeset text. Some precautions are made for keeping register.

\newcommand\typesettext@pq[2]%
{%
  \setbox\textbox@pq
  =\vbox{%
    \hsize\textcolwd
    \tolerance9999\relax
    \lineskiplimit-\maxdimen
    \parskip\z@
    % Here the parshape settings can be injected.
    #2%
    \strut#1%
  }%
}%

% (3) Generate parshape lines. This is a very naive definition which just
% clips out the image box, but it could easily be made much more general.

\newcommand\iterate@mkps@pq[2]%
{%
  \ifnum\numexpr#1*#2\relax>\numexpr2*\pqlines@pq\relax
   \else
    \ifnum#1>\pqlines@pq
      \iterate@mkps@pq{1}{2}%
     \else
      \ifnum#1>\imgtopoffset@pq
         \ifnum#1>\numexpr\imgtopoffset@pq+\imglines@pq\relax
           0pt\space\the\textcolwd\space
          \else
           \ifnum#2=\@ne
             0pt\space\the
             \dimexpr\textcolwd-.5\wd\imgbox@pq-\imgdist+.5\textcoldist\relax
             \space
            \else
             \the
             \dimexpr.5\wd\imgbox@pq+\imgdist-.5\textcoldist\relax
             \space
             \the
             \dimexpr\textcolwd-.5\wd\imgbox@pq-\imgdist+.5\textcoldist\relax
             \space
           \fi
         \fi
       \else
         0pt\space\the\textcolwd\space
       \fi
       \expandafter\iterate@mkps@pq\expandafter
       {\number\numexpr#1+\@ne\relax}{#2}%
     \fi
   \fi
 }

% (5) Internal definition of \par.
% Truncate "global" parshape by \prevgraf and reassign parshape.

\def\par@pq
{%
  \o@par
  \@tempcnta\prevgraf
  \ifnum\parshapelines@pq<\@tempcnta\@tempcnta\parshapelines@pq\fi
  \global\advance\parshapelines@pq-\@tempcnta
  \ifnum\parshapelines@pq=\z@
   \else
    \xdef\parshape@pq{\expandafter\gobbleparshapeprefix@pq\parshape@pq}%
    \parshape\parshape@pq
  \fi
}

% (5a) Gobble parshape lines.

\def\gobbleparshapeprefix@pq#1 #2 #3 %
{%
  \ifnum#1>\parshapelines@pq
    \expandafter\gobbleparshapeprefix@pq\number\numexpr#1-\@ne\expandafter\expandafter\expandafter\relax\expandafter\space
   \else
    #1 #2 #3 %
  \fi
}

then it can be used as follows:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lipsum,xcolor,graphicx}
\usepackage{booktabs}
\usepackage[latin]{babel}

\usepackage{pullquote}

\newcommand\alltext
{%
  \lipsum[1-3]
}

\def\addgraphic{\includegraphics[width=3cm]{mill}}
\def\addquote
{%
  \Large
  \begin{tabular}[b]{p{5cm}}
    \midrule
    \textit{Wir müssen wissen.}\\
    \textit{Wir werden wissen.}\\
    \mbox{}\hfill\large\textsc{David Hilbert}\\
    \midrule
  \end{tabular}%
}

\begin{document}
\pullquote{\alltext}{\addgraphic}
\pullquote{\lipsum[1-2]}{\addquote}
\end{document}

Explanation of the code

I commented the code, so I'll only describe the most important steps here. The following items correspond to marks I put in the comments above.

  • (1) User interface. There is only one user-level macro: \pullquote{text}{quote} will typeset text in two columns, inserting quote (which must be a rectangular object like \includegraphics, tabular or \parbox) in the middle, such that the text of the columns 'flows around' it. The dimension register \textcolwd gives the width of one column, \textcoldist gives the distance between columns, and \imgdist sets the distance on all sides of the image. Note that the vertical distance of the image can differ because the vertical size of (image+distance) has to be a multiple of \baselineskip.

  • (2) Estimating the number of lines. The whole construction depends on knowing the number of lines in a column (stored in \pqlines@pq). But this number cannot be known before actually typesetting the text with the 'window' for the image. To solve this problem, first the text is typeset withot parshape, setting \pqlines@pq to (height + depth)/\baselineskip/2. This will be too low for sure (not considering the white space left for the quote) and will be adapted in the following by a \loop construct (2a).

  • (3) Calculating the global parshape. This is done by the macro \iterate@mkps@pq{line}{col}. It will simply advance line twice from 1 to \pqlines@pq, once for col=1 and once for col=2, denoting the column produced. While col=1, the parshape represents the left part of the window, while col=2, the right part. Hence, \parshapelines@pq=\pqlines@pq*2 lines in TeXs \parshape syntax are produced, all in one sequence. Note that it is a valid fringe case that a single paragraph runs completely through both columns, so we need an appropriately "large" parshape anyway. In fact, one more parshape line is generated at the end, representing a 'full' line, just to be sure we don't get a distorted line count in case the image should go all the way to the bottom.

  • (4) Typesetting in earnest. Now, we can apply the \parshape to get the final form of the text. \par is redefined to \par@pq which will modify and re-apply the \parshape at every new paragraph. Note that at this point, both columns are 'glued together' in one \vbox which is later split with \vsplit. Note further that by the line count estimation, the box will inevitably be longer than \pqlines@pq*2, hence some text will be left over after splitting off the columns.

  • (5) Internal definition of \par. We need to renew the \parshape after every paragraph. \prevgraf gives the line count of the previous paragraph. We subtract this number from \parshapelines@pq, and truncate a corresponding prefix from the parshape construct. (5a) This is done by the macro \gobbleparshapeprefix@pq. It just successively reads (and throws away) the first line definition until the number of lines in the parshape (declared as usual by the first number in the construct) reaches the new value of \parshapelines@pq.

  • (6) Splitting off columns. We simply split the text box \textbox@pq twice with \vsplit, giving two boxes containing one column each.

  • (7) Balancing columns. Until the line count estimate in \pqlines@pq is correct, there will be some material left in \textbox@pq after splitting. We use the size of the material to estimate the new line count. But there is a risk here. If we really advance \pqlines@pq by the number of lines left over divided by 2, it could get too large. Paragraph formatting is a tricky procedure, and the total line number of the text might well be reduced simply by the window being in a different position relative to the lines of the text content. Hence, as a measure of safety, we advance \pqlines@pq by (the number of lines left over divided by 2) minus 1, but at least by 1. Afterwards, the procedure is repeated from step (2a).

  • (8) Constructing the output. As soon as \textbox@pq is void after splitting, all the text material is in the column boxes. These are combined with the appropriately raised quote in a \hbox which forms the result of the macro \pullquote.

Restrictions

  1. This is for text interspersed by \par only. No lists, no displayed math and effectively no tables.
  2. Every line has to be exactly the same height. No changes of \baselineskip or large objects inside the text.
  3. No stretchable or shrinkable vertical space; in fact vertical spaces are not considered at all.
  4. Widows and orphans are not considered, hence they are allowed here.

Possible exensions

  1. The idea of "global parshape" can support much more than just positioning one image. In principle, an arbitrary number of objects (even with non-rectangular shape) could be positioned in different places with a reasonable extension of the parshape calculation.
  2. More than one column would also be no problem, ideally in the contenxt of a generalization like the previous item.
  3. Page breaks would also be possible, which in priciple calls for some scheme of globally positioning objects on the page with text flowing around them.
  4. Once the level of the previous item is reached, all this could be integrated with the output routine, giving a rather non-standard TeX document class - much less freedom inside the vertical list and much more freedom for global page layout.

Result

pullquote example

A more advanced example

My data based publishing software DocScape more or less implements extension 3 listed above. DocScape has an 'object-oriented' page model allowing to place arbitraty (also irregularly formed) objects on the page. Internally, an explicit "design grid" is kept for every page including allocation of grid cells occupied by objects which have already been placed. It's a basic property of text frames in DocScape that text can 'flow' around allocated grid cells, using an algorithm similar to the one used above.

Here is a slightly more sophisticated example done with DocScape.

DocScape example

Examples of "real world" uses of this feature in DocScape projects are found here (for instance page 44) and also here (Warning: 7.3MB) (page 16).

Update

There is now a package pullquote which allows to create circular as well as rectangular inserts.

See Two-column text with circular insert.


Does the cutwin package help at all? It lets you put something (a graphic, a quote, a ...) in a window in a single paragraph.

Tags:

Programming