Automated Newspaper Layout (with TeX and abroad)

Not much of an answer, more a couple of loose thoughts ...

Off-hand I'm not aware of any such system, nor of any research that deals with automatic newspaper layout. As far as I know there have been only very limited attempts to approach automatic typesetting with more complex layout rules and dependencies that go beyond what is largely a linear process. You can count them on the fingers of one hand:

  • Michael Plass (under Knuth)
  • Graham Asher, around 1990 (Type & Set); not sure what happened to that
  • Anne Brüggemann-Klein in the mid-1990s
  • Richard Furuta and a few others in the 1990s
  • Stephan Wohlfeil, 1997 (PhD: On the Pagination of Complex Book-like Documents)

and, to my knowledge, nada otherwise. All of these look more at the questions arising from "book-like" documents than at newspapers or journals. But I might be very wrong, as I haven't followed that area closely over the last 10 years.

But assuming my knowledge is correct for a moment, it isn't really surprising, is it? What you have is a global optimization problem over a constraint system, where the number of possibilities you need to test grows astronomically as soon as you have more than a single column and a good number of floats with a certain set of constraints. So far, every serious attempt to do much better than the trivial way out (no floats, just linear typesetting, aka the MS Word model) or a simple greedy algorithm that never looks back (as LaTeX uses) has been defeated by the complexity of the task.
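To make the "grows astronomically" point concrete, here is a toy model. It is an illustrative assumption, not LaTeX's actual float algorithm: each float has an integer height, the page on which it is referenced, and a weight for how much a delay hurts. A greedy pass that never looks back puts each float on the earliest page with room; an exhaustive search tries all pages^floats assignments. The functions `greedy`, `exhaustive`, `feasible` and `cost` are hypothetical names.

```python
from itertools import product

# Toy model (an assumption for illustration, not LaTeX's real float algorithm):
# each float is (height, reference_page, weight); every page holds `capacity` units.

def cost(assign, floats):
    """Weighted distance between each float and the page that references it."""
    return sum(w * (page - ref) for page, (_, ref, w) in zip(assign, floats))

def feasible(assign, floats, capacity):
    """No float before its reference, no page over capacity."""
    used = {}
    for page, (h, ref, _) in zip(assign, floats):
        if page < ref:
            return False
        used[page] = used.get(page, 0) + h
        if used[page] > capacity:
            return False
    return True

def greedy(floats, pages, capacity):
    """Place each float on the earliest page with room; never revisit a decision."""
    used = [0] * pages
    assign = []
    for h, ref, _ in floats:
        for page in range(ref, pages):
            if used[page] + h <= capacity:
                used[page] += h
                assign.append(page)
                break
        else:
            return None  # this float fits nowhere
    return tuple(assign)

def exhaustive(floats, pages, capacity):
    """Try all pages ** len(floats) assignments and keep the cheapest feasible one."""
    best = None
    for assign in product(range(pages), repeat=len(floats)):
        if feasible(assign, floats, capacity) and (
                best is None or cost(assign, floats) < cost(best, floats)):
            best = assign
    return best

floats = [(5, 0, 1), (6, 0, 10)]  # a minor float, then an important one
g = greedy(floats, pages=2, capacity=10)
e = exhaustive(floats, pages=2, capacity=10)
print("greedy:", g, cost(g, floats))      # greedy: (0, 1) 10
print("exhaustive:", e, cost(e, floats))  # exhaustive: (1, 0) 1
```

The greedy pass commits the minor float to page 1 and thereby pushes the important one back (cost 10); the exhaustive search swaps them (cost 1). Here that search has only 4 candidates, but with 20 floats on 10 pages it would already have 10^20.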

Newspaper typesetting, on the one hand, adds complexity (but perhaps also freedom): there are multiple input streams of limited length that can be reordered to some extent. On the other hand, it has very different requirements regarding picture order and call-outs.

By the way, to my knowledge it is quite common in newspaper writing that authors have to write to length, and if they don't, they get edited to it. Are you thinking of taking that into account? If so, that would probably simplify the task considerably.

So I think the first task would be to understand and research the constraint system, i.e., which rules make newspapers or journals tick. Those rules will not be universal, and taken all together they will most likely contradict each other. But they form the basis of what an algorithm must be configurable for, and only once those boundaries are known can one delve deeper into designing such an algorithm. How close one can get to an ideal, I don't know. In some respects I would assume it might actually be simpler for newspapers, thanks to the flexibility of reordering stories, but in any case I believe this is an open research topic that is so far unsolved (just as "the pagination of complex book-like documents" effectively is). I'm certainly interested, and have been for more than two decades, even if I had to take a longer break after the millennium.
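One possible way to make an algorithm configurable for house rules that contradict each other is to express each rule as a penalty and let each publication choose the weights. The sketch below is a hypothetical illustration: the stories, the two rules (`importance_penalty`, `grouping_penalty`) and the weights are made up, and it brute-forces all slot orders (slot 0 being the most prominent), which only works for a handful of stories.

```python
from itertools import permutations

# Hypothetical story records: (name, priority, section); priority 1 = most important.
STORIES = [("A", 1, "news"), ("B", 2, "sport"), ("C", 3, "news")]

def importance_penalty(order):
    """Count pairs where a less important story sits in a more prominent slot."""
    return sum(1 for i in range(len(order)) for j in range(i + 1, len(order))
               if order[i][1] > order[j][1])

def grouping_penalty(order):
    """Count neighbouring slots whose stories belong to different sections."""
    return sum(1 for a, b in zip(order, order[1:]) if a[2] != b[2])

RULES = {"importance": importance_penalty, "grouping": grouping_penalty}

def score(order, weights):
    """Weighted sum of all rule penalties; the weights encode the house style."""
    return sum(weights[name] * rule(order) for name, rule in RULES.items())

def best_layout(stories, weights):
    """Exhaustively pick the cheapest slot order (first one wins on ties)."""
    best = min(permutations(stories), key=lambda order: score(order, weights))
    return "".join(name for name, _, _ in best)

print(best_layout(STORIES, {"importance": 10, "grouping": 1}))  # ABC
print(best_layout(STORIES, {"importance": 1, "grouping": 10}))  # ACB
```

With importance weighted higher the stories appear strictly by priority; with section grouping weighted higher the two news stories move next to each other. Same rules, different house style, different page.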

I don't know if Wohlfeil's PhD thesis is still easily available (it was difficult for me to get hold of back then), but a quick web search brought up a shorter paper by Brüggemann-Klein/Klein/Wohlfeil, "On the Pagination of Complex Documents", from around the same time. I also found "Pagination reconsidered" by the same authors (undated, but judging from its number probably earlier).

There are surely many other sources, but one book I think is worth looking at, for those who read German, is "Praxishandbuch Gestaltungsraster" by Andreas and Regina Maxbauer. Its focus is not the newspaper angle but the grid, which naturally covers a good number of possible rules.

By the way, a good way to do some research (though far from perfect at the moment) is to look around in Microsoft's Academic Search. For example, it gives you more background on what Anne was doing over the years and which papers she co-authored. But be aware that there is a lot of rubbish in their data and that it is horribly incomplete in parts.

Update

Upon rereading parts of Stephan's PhD thesis (which I initially mislabeled as a habilitation), I came across the work of Krista Lagus, who wrote her master's thesis on "Automated pagination of the generalized newspaper using simulated annealing". I didn't find the thesis on the web, but perhaps it is worth exploring further.
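For readers unfamiliar with the technique, here is a generic simulated-annealing sketch for a toy pagination problem. It is not Lagus's formulation (I haven't seen the thesis); the story lengths, the page capacity and the cost function (squared deviation from a full page plus a penalty for pushing important stories back) are all assumptions. The key idea is that a worse layout is sometimes accepted, with probability exp(-Δ/T), so the search can escape local minima while the temperature T slowly drops.

```python
import math
import random

# Hypothetical stories: (name, length in column-lines, priority; 1 = lead story)
STORIES = [("lead", 100, 1), ("city", 80, 2), ("sport", 90, 3),
           ("weather", 40, 4), ("letters", 60, 5), ("obits", 50, 5)]
PAGES = 3
CAPACITY = 140  # assumed column-lines per page

def cost(assign):
    """Squared deviation from a full page, plus a penalty for late important stories."""
    fill = [0] * PAGES
    lateness = 0
    for (_, length, prio), page in zip(STORIES, assign):
        fill[page] += length
        lateness += page * (6 - prio)  # higher-priority stories are weighted more
    return sum((CAPACITY - f) ** 2 for f in fill) + 10 * lateness

def anneal(seed=0, steps=20_000, t_start=500.0, alpha=0.9995):
    rng = random.Random(seed)
    current = [0] * len(STORIES)  # start with everything on the first page
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    t = t_start
    for _ in range(steps):
        candidate = current[:]
        # move: put one random story on a random page
        candidate[rng.randrange(len(candidate))] = rng.randrange(PAGES)
        delta = cost(candidate) - current_cost
        # accept improvements always, worse states with probability exp(-delta / t)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost

best, best_cost = anneal()
for page in range(PAGES):
    print(page + 1, [name for (name, _, _), p in zip(STORIES, best) if p == page])
```

For six stories on three pages brute force (3^6 = 729 assignments) would of course do; annealing only pays off when the search space is far too large to enumerate, which is exactly the newspaper case.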


I am not familiar with any literature other than some papers that concentrate on page description languages. However, I think Håkon Wium Lie's thesis on Cascading Style Sheets might be partially relevant to what you are looking for, at least from the point of view of developing a robust "templating" or "templet" system (it also has an interesting bibliography). However, as you said:

There won't be technical difficulties with page and article layout using my system DocScape. I'm asking (myself) about the basic algorithm for "geometrically" generating the page layout based on the given content stream.

The difficulty lies in defining an algorithm for nicely placing textual objects on a page, trying the various permutations etc. The answer certainly lies in the realm of AI and especially machine learning.

I would envision a system that scans thousands of editions, translates them into templates (using a system yet to be developed), and then uses this corpus to train the algorithm to produce similar designs with pattern-recognition methods.

However, the problem becomes more tractable if you rephrase it as: from a set of predetermined typographical layouts, can you automate the production of a newspaper? The answer to this is almost certainly yes, as LaTeX demonstrates by automating the production of predetermined styles for books etc. Such a system has been described by DeTreville in a PhD thesis. The dissertation is a bit dated but takes a good approach to abstracting layouts.
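Under that rephrasing, one core step becomes a selection problem: given the stories of the day, pick the best-fitting predetermined layout. A minimal sketch, assuming each template is just a list of slot sizes (the template names and sizes are hypothetical) and that a template's misfit is the total absolute difference between slot sizes and story lengths:

```python
# Hypothetical page templates: slot sizes in column-centimetres.
TEMPLATES = {
    "one-lead": [300, 100, 100, 100],
    "two-leads": [200, 200, 100, 100],
    "grid": [150, 150, 150, 150],
}

def misfit(slots, lengths):
    """Total absolute size mismatch when slots and stories are paired in sorted order.

    Pairing both lists in sorted order minimises the sum of absolute differences,
    so this is the best possible assignment for this simple cost.
    """
    return sum(abs(s - n) for s, n in zip(sorted(slots), sorted(lengths)))

def choose_template(lengths):
    """Return (name, misfit) of the best template with one slot per story."""
    scores = {name: misfit(slots, lengths)
              for name, slots in TEMPLATES.items() if len(slots) == len(lengths)}
    if not scores:
        raise ValueError("no template has a slot for every story")
    name = min(scores, key=scores.get)
    return name, scores[name]

print(choose_template([280, 110, 90, 120]))  # ('one-lead', 60)
print(choose_template([190, 210, 95, 105]))  # ('two-leads', 30)
```

A real system would also need rules for pictures, headlines and reading order; this only captures the "does it fit" part.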

I have tried hard, on and off, to define an algorithm that produces art-book-like output from a set of figures and text. So far I have a collection of about 100 different designs. How to choose one over another still eludes me, and that problem is three orders of magnitude easier.

But please don't let me discourage you. I think this is a great area to develop and research, or even to build a start-up around.


CSS Paged Media

First of all, LaTeX is not really intended for unattended typesetting of large documents. ConTeXt may lend itself marginally better to this, but it will still not be on par with the requirements of automatically typesetting a newspaper.

By contrast, the combination of HTML and CSS performs much better at automatically positioning and resizing content whilst not requiring much effort from the designer. This is hardly surprising, as it is exactly what we expect fluid web pages to do: adapt content to unknown screen dimensions.

This is why the commercial software Prince XML deserves mentioning here as a bridge from HTML to printed media. On the product's website there are several examples of entire magazines automatically typeset from HTML & CSS.

Recently, this technique received a generic name: CSS paged media. It is specified by the W3C in the CSS Paged Media Module.

Under certain conditions, such as non-commercial use, Prince XML can be used free of charge.

I myself use the non-commercially licensed Prince XML in my automated workflow from Pandoc Markdown via HTML & CSS to Letter- and A4-sized PDFs. Check out my website for examples and the makefile.
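A minimal Python sketch of such a Markdown-to-HTML-to-PDF pipeline follows. It is not the author's makefile, and the stylesheet values are assumptions; it only relies on the standard `pandoc --standalone --css=... -o` and `prince input.html -o output.pdf` invocations. The `@page` rule is where CSS paged media comes in: page size, margins, and margin boxes such as a running page number. Changing `size: A4` to `size: letter` gives the Letter variant.

```python
import subprocess
from pathlib import Path

# Assumed CSS Paged Media stylesheet: page size, margins, running page number.
PAGED_CSS = """\
@page {
  size: A4;
  margin: 20mm 15mm;
  @bottom-center { content: counter(page); }
}
body { columns: 2; column-gap: 6mm; }
"""

def build_commands(markdown: Path, css: Path, pdf: Path) -> list[list[str]]:
    """Commands for Markdown -> HTML (pandoc) -> PDF (Prince).

    The stylesheet is linked by file name only, so it must sit next to the HTML.
    """
    html = markdown.with_suffix(".html")
    return [
        ["pandoc", str(markdown), "--standalone", f"--css={css.name}", "-o", str(html)],
        ["prince", str(html), "-o", str(pdf)],
    ]

def typeset(markdown: Path, pdf: Path) -> None:
    """Write the stylesheet next to the source, then run pandoc and Prince."""
    css = markdown.with_name("paged.css")
    css.write_text(PAGED_CSS)
    for cmd in build_commands(markdown, css, pdf):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `typeset(Path("issue.md"), Path("issue.pdf"))` (a hypothetical file name) requires both `pandoc` and `prince` to be on the `PATH`.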

Even though I have a good amount of experience with TeX, I was unable to achieve such nice-looking automatically generated results with either LaTeX or ConTeXt.

Moreover, the HTML, CSS & Prince XML combo is extremely fast. Whereas ConTeXt would typically require at least 3 seconds for a couple of pages, Prince XML does the same job, with better results, in a fraction of a second. So server-side on-demand typesetting with the commercially licensed Prince XML is certainly a workable possibility.