Source Code Indentation
Yes, it is possible. Under the usual settings, spaces at the beginning of source lines are ignored by TeX. However, I disagree that it is more readable for a normal document. Having several paragraphs of, say, a \subsubsection indented by many spaces is not really readable to me.
Instead I normally add separator lines before the sectioning commands, such as a line of 80 to 120 % characters. You can use two or more of them for higher-level sectioning commands. Multiple chapters and parts should not be in one source file anyway, IMHO, but should instead be split over several files, which are then included in the main document using \include or \input.
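A minimal sketch of the separator-line convention described above. The section titles and the commented-out file name chapter-two are hypothetical and only serve as illustration; the length of the rules is arbitrary.

```latex
\documentclass{article}
\begin{document}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{First section}
Text of the first section, not indented.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{A subsection}
Text of the subsection, also not indented.

% Larger units could live in their own files and be pulled in here, e.g.:
% \input{chapter-two}   % hypothetical file name, commented out on purpose

\end{document}
```

Two rules mark the higher-level heading and one rule the lower-level one, so the structure stays visible without indenting the text itself.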
If your question is actually about how to do this automatically, then note that this is editor dependent. I don't know of any editor that does this for sectioning commands. It might be possible to configure the more advanced ones, but for that we would need to know which one you are using.
Whether indenting is possible depends on the current category-code-régime.
Under the usual category-code-régime, indenting by means of spaces and/or horizontal tabs is possible: indenting affects neither the look of the text in the output file (.pdf file) nor, e.g., the gathering of tokens for a macro argument, for a ⟨balanced text⟩, or the like.
Under unusual category-code-régimes, indenting may not be possible: it may affect the look of the text in the output file and/or the gathering of tokens for a macro argument, for a ⟨balanced text⟩, or the like.
E.g., within the body of a verbatim or verbatim* environment, indenting by means of spaces does affect the look of the text in the output file. The environments verbatim and verbatim* switch to an unusual category-code-régime.
The story is as follows:
TeX's reading- and tokenizing-apparatus does read/process input line by line.
When preparing a line of input for tokenization, TeX will first convert the single characters of that line to have these characters fit its internal character representation scheme. With traditional TeX engines, the internal character representation scheme is the ASCII-encoding (American Standard Code for Information Interchange). With LuaTeX-engines and XeTeX-engines, the internal character representation scheme is Unicode/utf8.
This yields that a character which in the input might be encoded in whatsoever way specified by the underlying computer-platform in use and which also has a representation in the TeX-engine's internal character representation scheme will TeX-internally be represented by the number of its code-point in the TeX-engine's internal character representation scheme.
Then any space character at the end of that line will be removed. (At this stage, spaces will be represented by the TeX-internal character code 32, as 32 is the number of the code-point of the space-character in ASCII/in Unicode/in the TeX-engine's internal character representation scheme.)
Then a character will be attached at the end of the line whose code-point-number in the TeX-engine's internal character code equals the current value of the integer-parameter \endlinechar
.
(Usually the value of \endlinechar
is 13(dec) and thus denotes the return-character as the return-character has code-point number 13 in ASCII/in Unicode/in the TeX-engine's internal character representation scheme.
Usually the category code assigned to TeX-internal character 13 (return) is 5 (end of line). )
Then TeX will start tokenizing the line.
I.e., TeX will "look" at the characters of the line of input in question and hereby take the content of the line of input in question for a set of instructions for putting so-called tokens into the token-stream.
Tokens can be control sequence tokens—they come in two flavors: control word tokens and control symbol tokens—or explicit character tokens.
(A control word token is a control sequence token whose name either consists only of characters of category code 11 (letter) or consists of several characters. [Control word tokens whose names mix characters of category code 11 (letter) with characters of other category codes cannot be obtained by reading and tokenizing input but can only be obtained by creating them from other tokens via \csname..\endcsname / \ifcsname..\endcsname.]
A control symbol token is a control sequence token whose name consists of a single character whose category code is not 11(letter).
An explicit character token has two properties: A character code and a category code. In TeX-jargon the character code of an explicit character token denotes the number of the code-point of the character in the TeX-engine's internal character representation scheme, which with traditional engines is ASCII and with XeTeX/LuaTeX-engines is Unicode whereof ASCII is a subset. In TeX-jargon the category code of an explicit character token influences what the TeX-engine does when further processing that explicit character token: beginning/ending a group; entering/leaving math-mode; denoting an alignment-tab; denoting a macro-argument; denoting subscript or superscript; denoting a space; denoting some glyph of a font that shall occur in the output-file;...
E.g., when TeX is not about to gather the name of a control sequence token and in the input finds the character A
(which at the time of preparing the line for tokenization was transformed to the TeX-internal character representation-scheme, either ASCII or Unicode, while, according to the ASCII/Unicode-standard, the codepoint of the character A
has the number 65), it will insert an explicit character token into the token stream whose TeX-internal character code is 65 and— as usually catcode 11 (letter) is assigned to character code 65—whose category code is 11 (letter).
I used the term "explicit character token". There are implicit character tokens as well: This is when you do, e.g., \let\foobar=A
. \foobar
will be a control word token with the same meaning as the explicit character token whose TeX-internal character code is 65 and whose category code is 11 (letter). You can use \foobar
both for \if
-comparisons and for \ifcat
-comparisons as if it were the explicit character token A
of TeX-internal character code 65 and category code 11 (letter). But you, e.g., cannot use \foobar
as an alphabetic constant.)
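The difference between explicit and implicit character tokens can be tried out with a small test file. This is only a sketch; the name \foobar is taken from the text above, everything else is illustrative.

```latex
\documentclass{article}
\begin{document}
\let\foobar=A% \foobar is now an implicit character token (65, letter)

% \if compares character codes; \foobar behaves like the explicit A:
\if\foobar A%
  same character code%
\else
  different character code%
\fi

% \ifcat compares category codes; again \foobar behaves like A:
\ifcat\foobar A%
  same category code%
\else
  different category code%
\fi

% An explicit character token works as an alphabetic constant; this prints 65:
\number`A

% The implicit token does not: uncommenting the next line raises
% an "Improper alphabetic constant" error.
% \number`\foobar
\end{document}
```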
When tokenizing the line of input, the reading- and tokenizing-apparatus of TeX can be in one of three states:
State S: Skipping blanks.
After tokenizing a control word token/after tokenizing a control sequence token whose name consists only of characters of category code 11 (letter), the reading- and tokenizing-apparatus is switched to state S (skipping blanks).
Both after tokenizing an explicit space-token (TeX-internal character code 32, category code 10(space)—space has number 32 in ASCII/Unicode/in the TeX-engine's internal character representation scheme) and after tokenizing a control symbol token whose name consists of a character of category code 10(space) (e.g., after tokenizing a control-space, i.e., \
, i.e., a control symbol whose name consists of a single character of TeX-internal character code 32), the reading apparatus will be switched to state S.
When the reading- and tokenizing-apparatus is in state S (skipping blanks), TeX will not take characters of category code 10 (space) from the input for directives for placing whatsoever tokens into the token stream but TeX will simply drop these characters, leaving the reading- and tokenizing apparatus in state S (skipping blanks).
When encountering a character of category code 5 (end of line) while in state S (skipping blanks), TeX will not insert any token into the token stream but will simply drop that character and will start processing the next line of input if present, thus dropping any information remaining on the current line of input and switching the reading- and tokenizing apparatus to state N (new line).
State M: Middle of line.
Both after tokenizing an explicit character token other than an explicit space-character-token (TeX-internal character code 32, category code 10) and after tokenizing a control symbol token (a control sequence token whose name consists only of a single character of a category code differing from 11) other than control-space, the reading- and tokenizing-apparatus is switched to state M (middle of line).
When the reading- and tokenizing-apparatus is in state M (middle of line), TeX will take any character of category code 10 (space) from the input for the directive to place an explicit character token of TeX-internal character code 32 and category code 10 (space) into the token stream. Then it will switch the reading- and tokenizing-apparatus to state S (skipping blanks).
Under usual category-code-régime both the space-character (code-point number 32 in ASCII/in Unicode/in the TeX-engine's internal character representation scheme) and the horizontal-tab-character (code-point number 9 in ASCII/in Unicode/in the TeX-engine's internal character representation scheme) are of category code 10 (space). Thus under usual category-code-régime spaces and horizontal tabs are interchangeable/are treated in the same way.
When encountering a character of category code 5 (end of line) while in state M (middle of line), TeX will insert a space token, i.e., an explicit character token of TeX-internal character code 32 and category code 10 (space) into the token stream and will start processing the next line of input, thus dropping any information remaining on the current line of input and switching the reading- and tokenizing apparatus to state N (new line).
State N: New line.
When TeX is about to start reading and tokenizing another line of input, the reading- and tokenizing-apparatus is switched to state N.
When the reading- and tokenizing-apparatus is in state N (new line), TeX will not take characters of category code 10 (space) from the input for directives for placing whatsoever tokens into the token stream but TeX will simply drop these characters, leaving the reading- and tokenizing apparatus in state N (new line).
When encountering a character of category code 5 (end of line) while in state N (new line), TeX will insert the control sequence token \par
into the token stream and will start processing the next line of input if present, thus dropping any information remaining on the current line of input and switching the reading- and tokenizing apparatus to state N (new line).
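The three states can be observed with a few lines of ordinary input. The following is a sketch under the usual category-code-régime; the sample words are arbitrary.

```latex
\documentclass{article}
\begin{document}
% State S: after the control word \TeX the following spaces are skipped,
% so the logo is set directly next to "nical", without a gap.
\TeX     nical

% State M, then state S: the first space after "A" yields one space token,
% the remaining spaces are dropped, so "A" and "B" are one space apart.
A      B

% State N: the spaces at the start of the next line are dropped,
% so the source indentation leaves no trace in the output.
        C
\end{document}
```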
What will be the effect of two consecutive line breaks in the TeX-input?
Due to the \endlinechar
-thingie at the time of preparing the line of input in question for tokenization, there usually will be a character of TeX-internal character code 13 (13 = Return in ASCII/Unicode/the TeX-engine's internal character representation scheme) at the ending of each line of input while usually the category code assigned to internal character code 13 is 5 (end of line).
Two consecutive linebreaks usually means inserting TeX-internal character 13 (13=return in ASCII/Unicode/the TeX-engine's internal character representation scheme) both when encountering the first and when encountering the second line break while TeX-internal character 13 usually has category code 5 (end of line).
When processing the first of these two characters, the reading- and tokenizing-apparatus might be either in state S (skipping blanks) or in state M (middle of line). In the former case this character will, when the line in question gets tokenized, not yield insertion of a token into the token stream. In the latter case this character will, when the line in question gets tokenized, yield insertion of an explicit space token (TeX-internal character code 32, category code 10) into the token stream. In any case the state of the reading-apparatus will be switched to state N after processing this character.
When processing the second of these two characters, the reading- and tokenizing-apparatus will in any case be in state N (new line). In this state this character yields insertion of the control word token \par
into the token stream.
(As long as \par is not redefined, processing \par in turn yields, among other things, cancellation of any horizontal glue at the end of the paragraph in question. Glue inserted due to an explicit space character token, which in turn might have come into being due to the \endlinechar insertion in state M, will be cancelled as well.)
This is how inserting two consecutive line breaks into the input usually yields ending the current paragraph.
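The role of \endlinechar can be made visible by switching it off locally. The following sketch assumes the usual category codes; setting \endlinechar to -1 is an illustration only, not something the text above recommends for normal documents.

```latex
\documentclass{article}
\begin{document}
% Usual setting: \endlinechar=13 and character 13 has category code 5.
% The line end after "first" is met in state M and becomes a space token;
% the empty line that follows is met in state N and becomes \par.
first
second

% With \endlinechar=-1 no character is appended to the lines, so inside
% the group neither the space nor the \par is produced: the output of
% the next lines is the single word "fifthsixth".
\begingroup
\endlinechar=-1
fifth

sixth
\endgroup
\end{document}
```

The line containing the assignment itself was already prepared with the old setting, so its line end still yields the space that terminates the number -1.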
Summa summarum:
Space characters at the ends of lines will be removed by TeX at the time of preparing the lines of input for tokenization.
Space characters and horizontal-tab-characters at the beginnings of lines have no effect under usual category-code-régime:
When TeX is about to tokenize another line of input, the reading- and tokenizing apparatus is in state N (new line).
TeX converts spaces from the input so that they internally are represented by the TeX-internal character code 32.
TeX converts horizontal-tabs from the input so that they internally are represented by the TeX-internal character code 9.
Usually the category code 10 (space) is assigned to the TeX-internal character codes 32 and 9.
When TeX while being in state N (new line) during tokenization processes a character of category code 10 (space), it will simply drop that character, i.e., it will not place any token into the token stream and will remain in state N which determines the same treatment for consecutive spaces occurring in the input.
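The category codes mentioned in this summary can be checked directly. A minimal sketch, to be run outside of any environment that changes category codes:

```latex
\documentclass{article}
\begin{document}
% \catcode<number> is the category code currently assigned to that
% character code; \the prints its value.
Space: \the\catcode32,
tab: \the\catcode9,
return: \the\catcode13.
\end{document}
```

With an unmodified LaTeX format this prints 10 for the space, 10 for the horizontal tab and 5 for the return character.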
Within the environments verbatim
and verbatim*
a category code other than 10 (space) is assigned to the space character.
Within the verbatim
-environment the category code 13 (active) is assigned to the space character and therefore in that environment tokenizing a space yields an explicit character token of TeX-internal character code 32 and category code 13 (active). Character tokens of category code 13 (active) can in many ways be used like control sequence tokens. I.e., within the verbatim
-environment the space character will act as a macro. That macro in turn within the verbatim
-environment (via \let
-assignment) is defined to be equal to the macro \@xobeysp
which in turn via \leavevmode
ensures horizontal mode, via \nobreak
prevents a line break and delivers \
, the control-space, which in horizontal mode leads to inserting whitespace/horizontal glue in the width of the glue inserted due to an explicit space token when spacefactor is 1000.
Within the verbatim*
-environment with older LaTeX-releases (I looked into a copy of "The LaTeX 2e Sources 2003/12/01" when writing the initial release of this answer) the category code 12 (other) will be assigned to the space character (ASCII 32) and therefore in that environment tokenizing a space yields an explicit character token of TeX-internal character code 32 and category code 12 (other).
Like with other character tokens of category code 11 and 12, TeX will insert glyphs of a font when processing an explicit character token of character code 32 and category code 12 (other). The font used in verbatim
- and verbatim*
-environments is \ttfamily
—typewriter-family, specified via the control-sequence \verbatim@font
, usually "Computer Modern Typewriter" in OT1-font-encoding—while in the font "Computer Modern Typewriter" in OT1-font-encoding the glyph in slot 32 connected to the character token of TeX-internal character code 32 and category code 12 (other) is this nice underscore-thingie ␣
.
Within the verbatim*
-environment with newer LaTeX-releases (I looked into a copy of "The LaTeX 2e Sources 2019-10-01 Patch level 2" when editing this answer) the category code 13 (active) will be assigned to the space character (ASCII 32) and the active space still is let equal to \@xobeysp
but \@xobeysp
is redefined either (traditional tex/pdftex-engines) to be equal to \asciispace
, a macro which delivers the glyph in slot 32 of the current font or (xetex/luatex-engines) to ensure horizontal mode and deliver the copy of a horizontal box of the width of the letter x in the current font (which should be the font for typesetting things verbatim, specified via \verbatim@font
), containing the character from slot 32 of the "Computer Modern Typewriter"-font in OT1-font-encoding. With xetex/luatex-engines it is ensured that redefining \verbatim@font
for using another typewriter-font for verbatim-output does not break the visible-spaces-mechanism as here the box containing the things for a visible space is created by selecting the glyph in slot 32 after explicitly switching to "Computer Modern Typewriter"-font while it can be taken for ensured that the "Computer Modern Typewriter"-font contains the desired underscore-like visible-space glyph in slot 32.
Within the environments verbatim
and verbatim*
the category code of the horizontal-tab-character still is 10(space), thus within the environments verbatim
and verbatim*
you can use the horizontal-tab-character (↹) both for indenting the source-code in a way which does not affect the look of the output and for obtaining an invisible space of the width of horizontal-glue created by an explicit space-token:
\documentclass{article}
\begin{document}
\begin{verbatim}
ABC
DEF
\end{verbatim}
\begin{verbatim}
ABC
DEF
\end{verbatim}
\begin{verbatim*}
A B C
D E F
\end{verbatim*}
\begin{verbatim*}
A B C
D E F
\end{verbatim*}
\begin{verbatim*}
A B C
D E F
\end{verbatim*}
\begin{verbatim*}
A B C
D E F
\end{verbatim*}
\end{document}
Of course my explanation of verbatim
and verbatim*
does just roughly outline how things work. In case you are interested in subtleties and details, feel free to study The LaTeX 2ε Sources, section 53.3 Verbatim (File y: ltmiscen.dtx)
Therefore under usual category-code-régime you can indent your code by means of spaces and/or horizontal-tabs as you like.
Within environments like verbatim
/verbatim*
that do switch to some unusual category-code-régime, you cannot indent your code by means of spaces as with these environments spaces at the beginnings of lines of input will be transformed into tokens that in turn yield visible output instead of just being skipped. Within verbatim
/verbatim*
indenting without visual effect is only possible by means of horizontal-tabs.
By the way:
Due to the removal of space-characters at line-endings at the time of preparing lines of input for tokenization, it is even with environments like verbatim
and verbatim*
(as defined in the LaTeX 2ε-kernel) not possible to produce output with visible spaces at the ends of lines.
I.e., the input (␣ here denotes a space character)
\begin{verbatim*}
␣␣␣\TeX␣is␣funny␣␣␣
␣␣␣\TeX␣is␣funny␣␣␣
\end{verbatim*}
does not yield the output (␣ here denotes the open-box-glyph-thingie often used for providing a visible representation of spaces)
␣␣␣\TeX␣is␣funny␣␣␣
␣␣␣\TeX␣is␣funny␣␣␣
but does yield the output (␣ here denotes the open-box-glyph-thingie often used for providing a visible representation of spaces)
␣␣␣\TeX␣is␣funny
␣␣␣\TeX␣is␣funny
An example exhibiting the aspects connected to deviating from the usual category-code-régime—all indenting is done by means of spaces only:
\documentclass{article}
\begin{document}
\section{This is a section}
\subsection{This is a subsection}
This is text that gets read and tokenized under normal category-code-r\'egime.
This is text that gets read and tokenized under normal category-code-r\'egime.
\begin{verbatim*}
This is text that gets read and tokenized under verbatim's category-code-regime.
This is text that gets read and tokenized under verbatim's category-code-regime.
\end{verbatim*}
\begin{itemize}
\item This is an item that gets read and tokenized
under normal category-code-r\'egime.
\item \begin{verbatim*}
This is an item that gets read and tokenized
under verbatim's category-code-regime.
\end{verbatim*}
\item This is another item that gets read and tokenized
under normal category-code-r\'egime.
\end{itemize}
\end{document}
Is there a way to make the editor/front end do this by default, because indenting every single line is a bit tedious; or maybe there is a front end/editor that supports this?
latexindent.pl
can perform this indentation for you.
Let's say that you start with the following files, myfile.tex
and sam.yaml
:
myfile.tex
\part
part text
part text
part text
\chapter
chapter text
chapter text
chapter text
\section
section text
section text
section text
\subsection
subsection text
subsection text
subsection text
sam.yaml
indentRules:
    part: "\t"
    chapter: "\t"
    section: "\t"
    subsection: "\t"
indentAfterHeadings:
    part:
        indentAfterThisHeading: 1
        level: 1
    chapter:
        indentAfterThisHeading: 1
        level: 2
    section:
        indentAfterThisHeading: 1
        level: 3
    subsection:
        indentAfterThisHeading: 1
        level: 4
Upon running the following command
latexindent.pl myfile.tex -l=sam.yaml -o=output.tex
then you will receive the following file:
output.tex
\part
	part text
	part text
	part text
	\chapter
		chapter text
		chapter text
		chapter text
		\section
			section text
			section text
			section text
			\subsection
				subsection text
				subsection text
				subsection text
This, and many other options, are detailed in the documentation, and are customizable through the YAML interface.