Is it possible to automatically enumerate sentences in LaTeX?
As others have pointed out, doing this completely automatically is probably quite difficult. If you want to use the \label-\ref mechanism, you would have to insert labels anyway. Let's pick some character which is usually not used in input, such as the vertical bar |. Ten minutes of hacking and we end up with:
\documentclass{article}
\newcounter{sentence}
\newcounter{para}
\makeatletter
\@addtoreset{sentence}{para}
\@addtoreset{para}{section}
\catcode`\|=\active
\def|{\@ifnextchar[%] to keep my editor happy
\start@label\start@nolabel}
\def\start@label[#1]{\ifvmode \start@para@label[#1]\else \start@sent@label[#1]\fi}
\def\start@nolabel{\ifvmode \start@para@nolabel\else \start@sent@nolabel\fi}
\def\start@para@label[#1]{%
\refstepcounter{para}%
\label{#1}\leavevmode}
\def\start@sent@label[#1]{%
\refstepcounter{sentence}%
\label{#1}%
\thesentence~}
\def\start@para@nolabel{%
\stepcounter{para}\leavevmode}
\def\start@sent@nolabel{%
\stepcounter{sentence}%
\thesentence~}
\makeatother
\renewcommand{\thepara}{\thesection.\arabic{para}}
\renewcommand{\thesentence}{\thepara.\arabic{sentence}}
\begin{document}
\parindent=0pt
\parskip=1em
||These rules must be followed. |The end of a paragraph is indicated
as usual with a blank line.

||[parstart]A new paragraph must start with a vertical
bar. |[sentstart]Each sentence must also start with a vertical
bar. |It follows from~\ref{parstart} and~\ref{sentstart} that a new
paragraph actually starts with two vertical bars.

Without vertical bars, nothing special happens. This might be useful
to comment on the formal rules above or below.

|[parref]|Each vertical bar takes an optional argument. |If
given, it is used as a label. |[sentref]For example, this is
sentence~\ref{sentref} of paragraph~\ref{parref}.
\end{document}
I use the fact that we are in vertical mode after a \par to distinguish the two uses of |. But then we need an explicit \leavevmode, since the bar that begins a paragraph does not itself insert any material that would switch us to horizontal mode. If you want to be able to reference both whole paragraphs and single sentences, you need the double || at the start of each paragraph (each bar may take an optional argument). If you never need to reference whole paragraphs, it's easy to change the syntax so that only a single | is needed at the start of a paragraph; in fact, it would be much simpler, since | could be implemented as a single macro with an optional argument (which, when in \ifvmode, steps the paragraph counter before doing the other stuff).
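A minimal sketch of that single-bar variant, under the assumption that only sentences (not whole paragraphs) need labels. The helper name \sentencebar is made up for this illustration, and the sentence counter is tied to the paragraph counter with \newcounter's optional argument instead of \@addtoreset:

```latex
\documentclass{article}
\newcounter{para}
\newcounter{sentence}[para]% sentence numbers restart in every paragraph
\renewcommand{\thesentence}{\arabic{para}.\arabic{sentence}}
% \sentencebar is a hypothetical helper name, not part of any package.
\newcommand{\sentencebar}[1][]{%
  \ifvmode
    \stepcounter{para}% a bar in vertical mode opens a new paragraph
    \leavevmode
  \fi
  \refstepcounter{sentence}%
  % Only set a label if an optional argument was actually given.
  \if\relax\detokenize{#1}\relax\else\label{#1}\fi
  \thesentence~}
\catcode`\|=\active
\let|\sentencebar
\begin{document}
|First sentence. |[second]Second sentence.

|This paragraph refers to sentence~\ref{second}.
\end{document}
```

As with the two-bar version, making | active for the whole document will clash with anything else that uses the character (for instance vertical rules in tabular column specifications), so this is only suitable for documents that do not need it otherwise.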
Added: I avoided using \everypar since it's not very reliable if lots of other things are happening. However, wrapping stuff in an environment might allow one to use \everypar and provide a simpler syntax. The biggest problem is really to allow the use of labels; we have to tell LaTeX when and how to look for a label.
Not easily[1] with TeX or LaTeX or anything else.
The problem is what is generally known in natural language processing as sentence boundary disambiguation:
Sentence boundary identification is difficult because punctuation marks are often ambiguous. A period may denote an abbreviation, decimal point, an ellipsis, or an email address - not the end of a sentence. In addition sentences can end by exclamation marks or question marks.
A better approach is to pre-process the file outside TeX. NLTK, the Natural Language Toolkit for Python, might be a starting point.
[1] "Not easily" meaning: if you devote a good chunk of your time, you may be able to define a parser in TeX that captures 95% of cases.
Obviously Yiannis is right.
However, if you can live with a trade-off then you can perhaps just redefine the macros \\ and \par (which is implicitly inserted whenever you leave an empty line) and write your sentences like this:
First sentence.\\
Second sentence.\\

Third sentence.
And end up with:
1.1 First sentence. 1.2 Second sentence.
2.1 Third sentence.
This requires two counters, one counting the sentences and one counting the paragraphs.
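One possible way to set this up is sketched below. It is a variant, not a literal implementation of the idea: \\ is redefined, but paragraph starts are detected with \everypar inside a dedicated environment instead of by redefining \par, since a redefined \par tends to interfere with LaTeX internals. The environment name numberedsentences and the helper \sentencenumber are made up for this illustration.

```latex
\documentclass{article}
\newcounter{para}
\newcounter{sentence}[para]% sentence numbers restart in every paragraph
\renewcommand{\thesentence}{\arabic{para}.\arabic{sentence}}
\makeatletter
% Hypothetical helper: step the sentence counter and print its number.
\newcommand{\sentencenumber}{\stepcounter{sentence}\thesentence~}
\newenvironment{numberedsentences}{%
  \par
  % \\ starts the next numbered sentence, unless a paragraph break
  % follows directly (then there is no next sentence to number).
  \def\\{\@ifnextchar\par{}{\unskip\space\sentencenumber}}%
  % Every new paragraph steps the paragraph counter and numbers
  % its first sentence. This overrides any pending \everypar locally.
  \everypar{\stepcounter{para}\sentencenumber}%
}{\par}
\makeatother
\begin{document}
\begin{numberedsentences}
First sentence.\\
Second sentence.\\

Third sentence.
\end{numberedsentences}
\end{document}
```

Because both redefinitions are local to the environment, \\ keeps its usual meaning elsewhere in the document. Constructs that set \everypar themselves (section headings, lists) are not handled by this sketch, and a \\ placed directly before \end{numberedsentences} would leave a stray number.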