Static analysis of LaTeX documents?

The short answer is that it's not possible.

There are some tools that do some things, but they cannot really analyse a LaTeX document, so any advice they give should be taken only as hints: it may be wrong.

The big difference between LaTeX and the languages that you mention, like C and Java, is that the syntax of LaTeX cannot be analysed statically: even the basic lexical analysis and tokenisation of the input depends on run-time behaviour.

\section[abc}

This looks like a syntax error that you might expect a static analyser to pick up, but the document might be

\documentclass{article}

\ifodd\time\catcode`[1\fi
\begin{document}

\section[abc}

aa
\end{document}

which means that it is or is not a valid document depending on the number of minutes since midnight. This is obviously an extreme case, but not as extreme as you may think: lots of packages do similar things that change the analysis of the document; think of babel shorthands, for example. The fact that babel has been loaded can be statically detected by inspecting the preamble, but determining which language is in force at any point really requires running a full LaTeX interpreter.
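To make the babel point concrete, here is a minimal sketch (assuming the ngerman option, under which the " character becomes an active shorthand):

```latex
\documentclass{article}
\usepackage[ngerman]{babel}
\begin{document}
% With ngerman shorthands active, " is an active character:
% "a typesets a-umlaut and "s typesets eszett, so " can no longer
% be tokenised as an ordinary character by a static scanner.
"a und "s
\end{document}
```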

Even if it were possible, I'd question whether some of your items really should be flagged.

  • unreachable code: \if\else\fi constructions where one or more paths can never be reached?

The difficulty here is determining which tokens are in fact tests. Mostly you do not see TeX primitives such as \if, but rather tokens defined via \newif, which are harder for a checker to recognise. It could perhaps assume that every token starting \if... is a conditional in this sense, but, for example, LaTeX's \ifthenelse starts with \if... yet has a very different syntax.
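As a sketch of the difficulty (the command names here are illustrative): \newif defines user-level conditionals that behave like TeX primitives, whereas \ifthenelse only looks like one.

```latex
% A real conditional: \newif\ifdraftmode defines \ifdraftmode,
% \draftmodetrue and \draftmodefalse. \ifdraftmode then needs a
% matching \else/\fi, just like a primitive \if.
\newif\ifdraftmode
\draftmodetrue
\ifdraftmode draft \else final \fi

% By contrast, \ifthenelse (from the ifthen package) starts with the
% letters \if... but is an ordinary macro taking three brace-delimited
% arguments, with no matching \fi:
% \ifthenelse{\equal{a}{b}}{yes}{no}
```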

  • inefficient loops, like \foreach's that could be simplified?

\foreach is simply a macro, so almost by definition any particular use of it can be simplified by expanding out the macro. But that may not be seen as a simplification...
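For example, with pgffor loaded, any \foreach could in principle be "simplified" by expansion, but whether the result is simpler is debatable:

```latex
% requires \usepackage{pgffor} (or tikz, which loads it)
\foreach \x in {1,2,3}{Item \x. }
% Fully expanded, this is equivalent to typing out:
% Item 1. Item 2. Item 3.
% which is what the loop produces, but arguably less clear to maintain.
```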

  • unused macro definitions?

LaTeX and all its packages are themselves macro definitions, and most documents don't use most of the commands defined, so there are typically thousands of unused macros in any given document.

  • suggesting the use of \newcommand* instead of \newcommand where appropriate?

I'm not sure how this could be done unless you record every use of the macro in a given document and note that its arguments never contain a \par; only in that case could the *-form be suggested.
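A sketch of the difference such a check would have to detect (command names illustrative):

```latex
\newcommand{\quotelong}[1]{``#1''}   % long form: the argument may contain
                                     % \par (i.e. blank lines)
\newcommand*{\quoteshort}[1]{``#1''} % short form: a \par in the argument
                                     % raises a runaway-argument error
```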

  • suspicious lack of possible brackets or whitespaces? Like:
    1. a^b c - clear
    2. a^{bc} - clear
    3. a^bc - suspicious: renders like 1. but maybe 2. was intended?

I'd disagree with this check. 2. is the standard LaTeX syntax. If you decide to allow 1., then you should allow 3. as well without comment. It's a central part of the design of TeX's math mode syntax that white space is not significant other than for terminating command names.

  • suspicious empty lines (paragraphs)? For example between text and equation?

TeX goes to some trouble to distinguish whether the text following a display is or is not a new paragraph, and LaTeX emulates this behaviour for all its list environments. So unless the static analyser is interpreting the sentences and deciding that the text should not start a new paragraph, it should not be commenting on blank lines.
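A sketch of why the blank line matters, and why it can be intentional:

```latex
Text that continues around the display
\[ a = b \]
in the same paragraph: no blank lines, one paragraph.

Text that ends a paragraph here.

\[ a = b \]
% The blank line above starts a new paragraph before the display,
% changing the spacing and indentation; it may be a mistake, or exactly
% what the author intended. Only the meaning of the sentences can tell.
```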

  • suspicious or missing [end line comments][1]?

Yes, so long as it can recognise the start of LaTeX3 syntax (or similar packages that change the rules and mean that a % at the end of a line is not necessary).
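The classic case such a check would look for is a definition where an unwanted end-of-line space would otherwise sneak in:

```latex
% Without the trailing % signs, each line ending would contribute a
% space token to the definition, producing spurious spaces in the output:
\newcommand{\wrap}[1]{%
  \textbf{#1}%
}
% In code using the expl3 programming layer (\ExplSyntaxOn ... \ExplSyntaxOff)
% spaces and line endings are ignored, so the trailing % is not needed there.
```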

  • whatever else you can think of that is very often wrong or inefficient or unclear to the human eye?

Getting a human to proofread the document is a good idea; human eyes are still better at this than machines :-)


While the accepted answer makes a number of good points, software to perform static analysis of LaTeX does exist. As expected, such tools are not nearly as comprehensive as linters for a language like Python.

The most notable such linter is ChkTeX. It is also on CTAN and is part of TeX Live (since 2010).
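As an illustration, here are a few constructs of the kind ChkTeX commonly flags; the exact set of warnings depends on its configuration:

```latex
a range such as 1-2 in text     % dash-length check: 1--2 probably intended
an ellipsis written as ...      % \dots or \ldots suggested instead
display math as $$ x = y $$     % \[ ... \] preferred over $$ in LaTeX
```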