Use xindy to sort by bible book instead of ABC – or: How to add custom letter groups?
The easiest solution is to use Klingon: change the line
\makeindex[options = -M index-style -C utf8]
in your TeX file to
\makeindex[options = -M index-style -C utf8 -L klingon]
This produces a PDF file whose index page is typeset as:
which I believe is what you wanted.
Long answer:
This one was, for me, a good lesson in software testing: I had to treat the xindy
program almost as a black-box, because its documentation is terrible and even actively misleading. In fact, of all the time I spent trying to figure this out, the biggest leap towards a solution was the moment I decided to completely stop trusting the documentation.
The following is not exactly how I arrived at the solution/understanding, but a lightly “fictionalized account” of how one might have.
To recap the question: After saving the LaTeX source from the question as question.tex
, changing filecontents
to filecontents*
(so that it doesn't write %%
comments to the file), and running xelatex -shell-escape question.tex
(note: shell-escape
is in general dangerous; you should not use that option without being absolutely sure the file is safe to run), we get a typeset PDF whose index page looks like the following:
So the problem is that it seems to have completely ignored the (define-letter-groups ...
in the index style.
If you look in the log file (question.log
), it mentions the program that was invoked:
runsystem(texindy -M index-style -C utf8 question.idx)...executed.
And you can verify for yourself that it is this command (texindy -M index-style -C utf8 question.idx
) that turns question.idx
:
\indexentry{Ps 1}{1}
\indexentry{Ps 10}{1}
\indexentry{Ps 2}{1}
\indexentry{Ps 1}{1}
\indexentry{Ps 3}{1}
\indexentry{Ps 56}{1}
\indexentry{Ps 5}{1}
\indexentry{Ps 34,1--2}{1}
\indexentry{Ps 34,7}{1}
\indexentry{Ps 34,1}{1}
\indexentry{Ps 34|full}{1}
\indexentry{Ps 34}{1}
\indexentry{Dtn 3,7.9}{1}
\indexentry{Dtn 3,5}{1}
\indexentry{Dtn 3,8}{1}
\indexentry{Num 122,3}{1}
\indexentry{Num 121}{1}
into question.ind
:
\begin{theindex}
\providecommand*\lettergroupDefault[1]{}
\providecommand*\lettergroup[1]{%
\par\textbf{#1}\par
\nopagebreak
}
\lettergroup{D}
\item Dtn 3,5: 1
\item Dtn 3,7.9: 1
\item Dtn 3,8: 1
\indexspace
\lettergroup{N}
\item Num 121: 1
\item Num 122,3: 1
\indexspace
\lettergroup{P}
\item Ps 1: 1
\item Ps 2: 1
\item Ps 3: 1
\item Ps 5: 1
\item Ps 10: 1
\item Ps 34: 1
\item Ps 34,1: 1
\item Ps 34,1--2: 1
\item Ps 34,7: 1
\item Ps 56: 1
\end{theindex}
The bug already manifests here. So at this point we can forget about the TeX stuff and focus on getting the desired question.ind
out of the question.idx
using the texindy
(or another) command.
The texindy
command has some debug options. For example, adding --debug script
shows that texindy
is essentially calling xindy
as xindy -d script -L general -C utf8 -M tex/inputenc/utf8 -M texindy -M page-ranges -M word-order -M index-style.xdy -I latex question.idx
. We can run that directly, and see that it produces the same output (the “bad” question.ind
).
This list of modules automatically included by texindy
also suggests that some of the lines of our index-style.xdy
are redundant: we can boil it down to just
(markup-locclass-list :open ": ")
(define-letter-groups ("Num" "Dtn" "Ps"))
and keep the output the same. We can go further and also reduce the xindy
command to xindy -M texindy -M index-style question.idx
. Now we can run xindy
directly, instead of texindy
.
To continue exploring the debug options, the first thing is to have xindy
log to a file instead of throwing it away (to /dev/null
). This is done with -t question.ilg
. This log file has output like
Forming letter-groups:
Letter-group: "?? 0000001" -> "P"
Letter-group: "?? 0000010" -> "P"
and so on, which clearly correspond one-to-one with the lines from the input (question.idx
). (??
is what it displays as in my terminal, but I can open it up in a editor to see the actual bytes: e.g. less
shows the second line as Letter-group: "<C8><D0> 0000010" -> "P"
with those two characters higlighted.)
So, for some reason, it has converted input entry like \indexentry{Ps 10}{1}
into "<C8><D0> 0000010"
, which then gets mapped to letter group P
.
The --debug level=1
(or higher) option to xindy
has something relevant in the log: lines like
Add sort rule to run 0: #<ordrule: '<E4>' => '<80>' :again NIL>
Add sort rule to run 0: #<ordrule: 'A' => '<80>' :again NIL>
...
Add sort rule to run 0: #<ordrule: 'p' => '<C8>' :again NIL>
Add sort rule to run 0: #<ordrule: 'P' => '<C8>' :again NIL>
and so on. But this still doesn't explain where those sort rules are coming from. With the remaining debug options --debug script --debug keep_tmpfiles
though, we can find out: First, the command uses a filter called tex2xindy
to convert our .idx
file into a file that has lines like:
(indexentry :tkey (("Ps 1")) :locref "1")
(indexentry :tkey (("Ps 10")) :locref "1")
(entirely in Lisp syntax). This file, we will soon learn, is called the "rawindex". Then, the “core” of xindy runs with a command like:
xindy.run -M /Library/TeX/texbin/xindy.mem -E iso-8859-1 -x (progn
(searchpath ".:/usr/local/texlive/2015/texmf-dist/xindy/modules:/usr/local/texlive/2015/texmf-dist/xindy/modules/base")
(xindy:startup
:idxstyle "<tmp file 1>"
:rawindex "<above tmp file>"
:output "./question.ind"
:logfile "question.ilg"
)
(exit))
We can run this command directly, and see that it (still) produces the same .ind
file. We're getting close: looking at the tmp file above (the attribute :idxstyle
), it looks like:
(require "lang/general/latin9-lang.xdy")
(require "texindy.xdy")
(require "index-style.xdy")
The last two (of the three) lines were specified manually by our commandline xindy -M texindy -M index-style
; the first one was inserted automatically by xindy. (The latin9
does not matter here; if we had used -C utf8
we'd get lang/general/utf8-lang.xdy
here in which the problems are the same.) Looking inside that file (texmf-dist/xindy/modules/lang/general/latin9-lang.xdy
), it starts with
(require "lang/general/latin9.xdy")
and that file starts with lines like:
(define-letter-group "A" :prefixes ("<80>"))
(define-letter-group "B" :after "A" :prefixes ("<84>"))
(define-letter-group "C" :after "B" :prefixes ("<86>"))
(define-letter-group "D" :after "C" :prefixes ("<8D>"))
which give a hint as to what's going on: already by the time define-letter-group
is involved, the characters have already been transformed into bytes like "<80>"
(recall the log lines like Add sort rule to run 0: #<ordrule: 'A' => '<80>' :again NIL>
that we saw earlier). So these define-letter-group
commands are operating not on the raw input in the index file, but on these transformed bytes.
At this point, we can just use the same byte replacements as in this file, to get Solution 1: Create the file index-style.xdy
whose contents are lightly modified from your original, using the following Python script:
s = '''
(markup-locclass-list :open ": ")
(define-letter-group "Num" :prefixes ("\xbc\xe0\xbb"))
(define-letter-group "Dtn" :prefixes ("\x8d\xda\xbc") :after "Num")
(define-letter-group "Ps" :prefixes ("\xc8\xd0") :after "Dtn")
'''
open('index-style.xdy', 'w').write(s)
(This Python script just assigns the file contents as a string to the variable s
and writes out that variable to the file index-style.xdy
: you can use any language or even a shell script; the goal is just to get binary data into the file.)
Then using this index-style.xdy
gives the index you desired (and which is in the screenshot above).
The explanation, for why these transformations happen, is lower in the same file (now I'm using utf8.xdy
instead of latin9.xdy
):
(define-rule-set "xy-alphabetize"
:rules (("À" "<80>" :string)
("Ă" "<80>" :string)
("â" "<80>" :string)
...
("A" "<80>" :string)
...
))
and this rule-set is used in utf8-lang.xdy
:
(use-rule-set :run 0
:rule-set ("xy-alphabetize" "xy-ignore-special"))
because of which in the very first (0th) run, all the characters are transformed already, which is why our (define-letter-groups ("Num" "Dtn" "Ps"))
did not work. The documentation is correct about (define-letter-groups ...)
, in the sense that the feature really exists and the core of xindy
supports it. However, the default “general” configuration (which by the way, is documented in the file comments as “A general sorting order for Western European languages”), explicitly does some stuff that gets in the way, and the documentation says nothing about dealing with this.
If they had provided instead of “general” a “bare” “language” that is even more minimal than “general” (and does not do the hackery with translating characters to bytes outside the ascii range), then using (define-letter-groups
with english-alphabet characters would have just worked (as simple as giving -L bare
or whatever).
To confirm that this is the case, we can pursue an alternative approach (Solution 2), of giving an explicit file (without the first line), instead of the temporary file for idxstyle
. In this file we remove the first (require "lang/general/latin9-lang.xdy")
line and keep only the requires of texindy
and our index-style.xdy
. If we do that, e.g. I put the two lines into a file called forced-idxstyle
and changed the xindy.run
command to just
/Library/TeX/texbin/xindy.run -M /Library/TeX/texbin/xindy.mem -E iso-8859-1 -x '(progn (searchpath ".:/usr/local/texlive/2015/texmf-dist/xindy/modules:/usr/local/texlive/2015/texmf-dist/xindy/modules/base") (xindy:startup :idxstyle "./forced-idxstyle" :rawindex "<same tmp file as before>" :output "./question.ind" :logfile "question.ilg") (exit))'
it worked: our plain index-style.xdy
file which had just the two lines
(markup-locclass-list :open ": ")
(define-letter-groups ("Num" "Dtn" "Ps"))
now starts giving the exact question.ind
file we want (and we can run xelatex
and we'll get the desired typeset output in the index).
This confirms that define-letter-groups
works as documented, and would have been usable by the end user (us) if not for all the “helpful” transformations done by all of xindy's default languages, for treating accented letters the same as unaccented letters.
Actually not all: among the languages that come with xindy
there are a few that don't work with Latin-script characters, and in particular don't transform "A" (for example) into anything else:
belarusian bulgarian georgian greek hebrew klingon korean macedonian russian serbian ukrainian
Of these, if you really want a language that will never get in the way when using English, then “korean” is the most promising, because the others (even Klingon which isn't even in Unicode and uses private-area characters!) do transform some “special” characters like hyphens and semicolons. So this gives Solution 3 that I put at the top of this answer: keep your original index style, keep your original TeX file, just append -L korean
to the indexing command, and you get the index you desired. (The other languages like Klingon etc. also work.)
I solved this problem for my work without using xindy. My alternative was to use sort keys. Thus: \index{19@Psalms!0901 @9:1}
. 19
is Psalms' order in the Hebrew Bible, so this number before the @
sign sorts the books in correct order. The exclamation point (!
) makes the book itself, Psalms, the item and verses will be indexed as subitems below it. You can just remove this if you want the book title before every citation. 0901
is the sort code for 9:1 with zeroes preventing chapters from 10 (example: 11:1) being listed first. The space prevents a verse range from appearing before single verses, so 0901_
will sort ahead of 0901-07
. (This is a genius insight of Barbara Beeton's from another post).
If you have \index{05@Deuteronomy!1212 @12:12} \index{02@Exodus!0101-12@1:1-12} \index{19@Psalms!0901 @9:1}
this should now result in correct biblical order. So:
Exodus
1:1-12
Deuteronomy
12:12
Psalms
9:1
I preferred this solution myself. You may also want to check out the bibleref pacakage too. I haven't used it, but I believe it can work with indexes.