What are the pros and cons of using hard wrapping, and why is it so common with LaTeX?
Many text-comparison tools, such as diff, compare files line by line, which reflects their origin as programmers' tools. When lines are short enough, these tools work well with TeX sources, especially when combined with version control systems. Of course, there are also tools like latexdiff (highly recommended!), which do not depend on line length.
Also, since TeX comments start at % and continue to the end of the line, short lines make it easier to comment out parts of the source.
Last but not least, many TeX authors use programmers' editors like Emacs or vi, which traditionally hard-wrap lines. This makes the code more readable, and TeX source is primarily code.
Anyway, TeX itself, of course, does not care about your line length.
There are no pros of hard-wrapping and no cons of soft-wrapping. It's just habit that makes us hard-wrap, that and not knowing about M-x longlines-mode. TeX doesn't care about single newline characters[1], treating them as normal spaces. So as far as the document is concerned, there's no argument for one over the other.
But wrapping is useful for us, the authors. Suppose we stick to using double newlines for paragraphs (thus in practice avoiding the input-buffer issue). A typical paragraph runs over 80 characters (a usual screen width for editors) and thus either disappears off the end (not very helpful) or gets wrapped by the editor. The editor can hard-wrap or soft-wrap it. If it hard-wraps (i.e. inserts literal newline characters) then there's a problem with editing the paragraph later: one generally wants the paragraph to reflow on editing, otherwise it ends up looking awful.
But reflowing a TeX document is quite an art. A real danger is reflowing comments: if a line ends in a comment %something like this one
then it is extremely important that the wrapping not remove that newline.
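To make the danger concrete, here is a minimal Python sketch. It assumes a simplified model of TeX comments (any % not preceded by a backslash starts a comment that runs to the end of the line); the helper names `visible_text` and `naive_reflow` are made up for illustration, and real TeX has more subtleties (catcode changes, `\\%`, and so on).

```python
import re

# Simplified model: a % not preceded by a backslash starts a comment
# that runs to the end of the line.
COMMENT = re.compile(r"(?<!\\)%.*")

def visible_text(source: str) -> str:
    """Return the source with comments removed, line by line."""
    return "\n".join(COMMENT.sub("", line) for line in source.split("\n"))

def naive_reflow(source: str) -> str:
    """Join all lines with spaces, paying no attention to comments (the bug)."""
    return " ".join(source.split("\n"))

paragraph = "The result holds. % TODO: check this\nThe proof is easy."

# Before reflowing, the second sentence survives comment removal.
print(visible_text(paragraph))

# After a comment-unaware reflow, it has become part of the comment.
print(visible_text(naive_reflow(paragraph)))
```

Once the newline after the comment is gone, everything that followed it on the next line is commented out, which is exactly what a careless reflow would do to the document.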
A situation less drastic for TeX but highly irritating for an author is the wrapping of environments and certain other commands. It's quite useful to be able to see easily where an \item starts, for example, and this is easier if it is at the start of a line (possibly indented) than somewhere in the middle.
Several others have also mentioned diff. This is another case where one wants to be able to control the newlines carefully. If I have a paragraph that is hard-wrapped to 80 columns (or 72 or whatever) and I change one small bit at the start, and then reflow the text, that might change every line in that paragraph. Running diff on the resulting file will report every line as changed, when all I really want to know is the change at the start. So you really do want some newlines in the file, but since this is the main reason for having them, they should be chosen to answer the question "When I run diff on this file and there's a change here, what extra information do I want to see?". For me, that's usually just the sentence that the change is in, so I put a newline after every sentence. I also tend to separate out large chunks of inline maths and a few other things.
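The effect is easy to reproduce. The following is only a rough sketch using Python's standard difflib and textwrap modules as stand-ins for diff and for an editor's reflow command; the sample sentences, the width of 40 and the helper `changed_lines` are invented for the illustration.

```python
import difflib
import textwrap

def changed_lines(old: str, new: str) -> int:
    """Count lines that a line-based diff reports as removed or added."""
    diff = difflib.ndiff(old.split("\n"), new.split("\n"))
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))

sentences = [
    "We begin with a short remark about the notation used throughout.",
    "Every space considered here is assumed to be compact and Hausdorff.",
    "The maps between such spaces are taken to be continuous.",
    "None of the later arguments depends on the choice of base point.",
]
# The only edit: the first sentence is shortened.
edited = ["We begin with a remark about notation."] + sentences[1:]

# Layout 1: hard-wrapped to a fixed width and reflowed after the edit.
wrapped_old = textwrap.fill(" ".join(sentences), width=40)
wrapped_new = textwrap.fill(" ".join(edited), width=40)

# Layout 2: one sentence per line, so no reflow is needed.
per_sentence_old = "\n".join(sentences)
per_sentence_new = "\n".join(edited)

print("hard-wrapped:", changed_lines(wrapped_old, wrapped_new))
print("sentence per line:", changed_lines(per_sentence_old, per_sentence_new))
```

With one sentence per line, only the old and the new version of the edited sentence show up. With the reflowed layout, the edit shifts the line breaks further down the paragraph, so lines that were not really edited are reported as changed too.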
So what one wants is a system whereby it is possible to specify certain newlines as immutable and others as flowable. A fair number of the immutable newlines can be predicted (comments, certain commands) but a useful system will make it easy for an author to specify them as they go along. Emacs does this with its longlines-mode. Author-inserted newlines are respected, auto-inserted newlines are not. Best of both worlds. Any respectable editor will have similar functionality. If not, your editor is forcing you to adapt to its idiosyncrasies whereas it should be the other way around.
(I encountered this first when I switched to using a VCS. I wrote up my experiences on my website.)
[1] There are three cases that I know of where newlines matter:
- A newline marks the end of a comment.
- It is possible to force TeX to interpret newlines as \par tokens; the verbatim environment does this, as does \obeylines.
- If everything were on one line then, as TeX reads the document a line at a time, it may be possible to overwhelm the input buffer.
Another answer is that TeX has a fixed maximum size of input lines, that is, if a non-wrapped line exceeds a certain length, there will be an error message.
This limit is quite large nowadays. You'll find it in your texmf.cnf:
buf_size = 200000
But it used to be smaller in the past, and anyway, who wants to think about something like this while writing text?
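If you do want to check, a few lines of Python are enough. This is only a sketch: the function name `overlong_lines` is made up, the limit is the value quoted above (your texmf.cnf may differ), and comparing the UTF-8 byte length with `>=` is my assumption about how the limit applies rather than something taken from the TeX documentation.

```python
BUF_SIZE = 200000  # value quoted above; check your own texmf.cnf

def overlong_lines(source: str, limit: int = BUF_SIZE) -> list[int]:
    """Return 1-based numbers of lines whose UTF-8 length reaches the limit."""
    return [
        number
        for number, line in enumerate(source.split("\n"), start=1)
        if len(line.encode("utf-8")) >= limit
    ]

# A soft-wrapped "paragraph" of 250000 characters on a single line.
sample = "short line\n" + "x" * 250000 + "\nanother short line"
print(overlong_lines(sample))
```

For a real document you would read the .tex file into a string first and pass that in.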