Detecting harmful LaTeX code
web2c-based TeX distributions offer quite a lot of customisation to control this. By a well-known theorem of Turing, it's not possible to detect all possible infinite loops in any non-trivial programming language, so if the TeX code is \def\x{\x}\x
it will loop forever. However, any web hosting setup should allow you to specify time limits for forked processes, so that isn't really a problem: you can always kill the job after whatever time limit you want to set.
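If your hosting setup doesn't give you process limits directly, the same effect can be had from a small wrapper. This is only a minimal sketch in Python: the function name run_with_time_limit and the choice of return value are my own, not anything TeX provides.

```python
import subprocess


def run_with_time_limit(cmd, seconds, cwd=None):
    """Run cmd and return (finished, returncode).

    If the process is still running after `seconds`, it is killed and
    (False, None) is returned.
    """
    try:
        proc = subprocess.run(
            cmd,
            cwd=cwd,
            stdin=subprocess.DEVNULL,  # never let TeX wait for terminal input
            stdout=subprocess.PIPE,    # capture the log output instead of printing it
            stderr=subprocess.STDOUT,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child itself before raising this exception.
        return False, None
    return True, proc.returncode
```

You would call it with something like run_with_time_limit(["pdflatex", "job.tex"], seconds=30), where job.tex is a placeholder file name and 30 seconds is an arbitrary limit you would tune to your own jobs.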
Running scripts is not allowed by default, so your second concern is only an issue if you allow TeX to run arbitrary user-specified commands. So don't do that :-)
You may also want to clamp down on the ability to read files outside the input tree, for example by banning reading of /etc/passwd and the like (writing such files is again prevented by default).
The texmf.cnf controlling your TeX installation will contain:
% Do we allow TeX \input or \openin (openin_any), or \openout
% (openout_any) on filenames starting with `.' (e.g., .rhosts) or
% outside the current tree (e.g., /etc/passwd)?
% a (any) : any file can be opened.
% r (restricted) : disallow opening dot files
% p (paranoid) : as `r' and disallow going to parent directories, and
% restrict absolute paths to be under $TEXMFOUTPUT.
openin_any = a
openout_any = p
You may want to set openin_any to p as well.
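If you'd rather not edit the system-wide texmf.cnf, kpathsea also lets an environment variable of the same name override a texmf.cnf setting, so a wrapper can tighten things per job. A minimal sketch, assuming the TeX job is launched from Python; the helper name paranoid_tex_env is made up for illustration.

```python
import os


def paranoid_tex_env(base_env=None):
    """Return a copy of the environment with TeX file access set to paranoid."""
    env = dict(os.environ if base_env is None else base_env)
    env["openin_any"] = "p"   # also restrict \input and \openin
    env["openout_any"] = "p"  # already the default; set explicitly for clarity
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the compiler. Running kpsewhich -var-value=openin_any in the same environment should show which value is actually in effect.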
Other than that, TeX is as safe as anything else you can run: with these settings it cannot spawn any new commands, it cannot write anywhere other than the directory it is started from (and subdirectories of that), and it cannot read any files outside the specified input path.
\endinput% this file is anti-social if this line is removed
\makeatletter
\ProvidesFile{xxx}[\noexpand\ver@xxx]
\ProvidesFile{xxx}[\ver@xxx]
\documentclass{article}
\begin{document}
\end{document}
Detecting anything starting with a backslash is probably going too far. Without knowing the content of your documents, \emph{}, \textsuperscript{} and $\mu$m may all be reasonable.
You should certainly disable shell-escape to prevent arbitrary commands being run.
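One way to make sure shell escape stays off is to build the command line in one place and always pass -no-shell-escape explicitly. This is just a sketch: the helper name latex_command is hypothetical, and the check on the file name is my own precaution, not something the engine requires.

```python
def latex_command(tex_file, engine="pdflatex"):
    """Build an argument list that compiles tex_file with shell escape disabled."""
    if tex_file.startswith("-"):
        # A user-supplied name such as "-shell-escape" would be read as an option.
        raise ValueError("file name must not start with '-'")
    return [
        engine,
        "-no-shell-escape",          # never run external commands via \write18
        "-interaction=nonstopmode",  # do not stop to ask the user on errors
        "-halt-on-error",            # give up at the first error
        tex_file,
    ]
```

Passing the result as a list to subprocess.run (rather than as a shell string) also avoids any shell interpretation of the file name.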
You should probably run the compiler in a sandbox of some kind (this is heavily dependent on your host system, so I couldn't give details even if I were an expert). You can also have a watchdog to kill the process if it runs unreasonably long (it sounds like you have a good idea of the job structure and could predict the runtime). Typesetting text is quick, so an abnormally large input shouldn't increase the time by much.
Most attempts to hang a LaTeX compiler would be more likely to cause it to abort, possibly with a "TeX capacity exceeded" error. But of course it might take some time to do that. So a reasonable code-validating step might be to check for and block \def, \newcommand and equivalents. It would annoy some users (like many people here) but would make it a little harder to hang the compiler (deliberately or otherwise) through things like uncontrolled recursion. There are ways round this using \begin{def}, so it would probably be a good idea to whitelist the environments that users can \begin.
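A validating step along these lines could be sketched as below. Both sets are assumptions for illustration and need adjusting to your own documents. It is a plain text scan, not a TeX parser, so treat it as a first filter rather than a guarantee.

```python
import re

# Assumed policy, for illustration only.
BLOCKED_COMMANDS = {
    "def", "gdef", "edef", "xdef", "let", "csname",
    "newcommand", "renewcommand", "providecommand",
    "newenvironment", "renewenvironment",
}
ALLOWED_ENVIRONMENTS = {"document", "itemize", "enumerate", "center", "equation"}

CONTROL_WORD = re.compile(r"\\([A-Za-z@]+)")        # \name
BEGIN_ENV = re.compile(r"\\begin\s*\{([^}]*)\}")    # \begin{name}


def find_violations(source):
    """Return a list of human-readable problems found in the LaTeX source."""
    problems = []
    for match in CONTROL_WORD.finditer(source):
        name = match.group(1)
        if name in BLOCKED_COMMANDS:
            problems.append("blocked command: \\" + name)
    for match in BEGIN_ENV.finditer(source):
        env = match.group(1).strip()
        if env not in ALLOWED_ENVIRONMENTS:
            problems.append("environment not whitelisted: " + env)
    return problems
```

With this policy, \emph{} and $\mu$m pass untouched, \def\x{\x}\x is reported because of \def, and \begin{def} is reported because def is not a whitelisted environment. I included \csname in the blocked set because it can build a command name from pieces, which would otherwise slip past a name-based check.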