How can I use \dtlsort with different alphabets?
This problem might be similar to that of sorting index entries. When you want to force a specific sort placement (indirectly, a sort order), you have to provide the sorting algorithm with some help.
In your application, I would supply a regular English alphabet replacement or Norsk English that will be used to sort entries on. To that end, create an additional field in your database where you replace each æ
with za
, each ø
with zb
and each å
with zc
(say), in order to have them sorted sequentially after z
.
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{datatool,xparse}
\NewDocumentCommand{\vocab}{m o m m}{%\parskip=0pt \textbf{#1} (#2) #3~\par%
\DTLnewrow{vocabsort}%
\DTLnewdbentry{vocabsort}{norsk}{#1}% Norsk
\IfNoValueF{#2}{\DTLnewdbentry{vocabsort}{norsksort}{#2}}% Norsk sort
\DTLnewdbentry{vocabsort}{class}{#3}% Class
\DTLnewdbentry{vocabsort}{english}{#4}% English
}
\newenvironment{vocabsection}[1]{\noindent \textbf{#1} \par \medskip
\DTLifdbexists{vocabsort}{\DTLcleardb{vocabsort}}{\DTLnewdb{vocabsort}}%
}%
{ \dtlsort[norsk]{norsksort}{vocabsort}{\dtlicompare}%
\displaysorteddb
}
\newcommand{\displaysorteddb}{%
\begin{DTLenvforeach*}{vocabsort}{\norsk=norsk, \class=class, \english=english}
\textbf{\norsk} (\class) \english~\par
\end{DTLenvforeach*}
}
\begin{document}
\begin{vocabsection}{Body Vocab}
\vocab{hør}[hzbr]{n.}{hair} % ø > zb
\vocab{hode}{n.}{head}
\vocab{øye}[zbye]{n.}{eyes} % ø > zb
\vocab{øre}[zbre]{n.}{ear} % ø > zb
\vocab{hake}{n.}{chin}
\vocab{nakke}{n.}{neck}
\vocab{hånd}[hzcnd]{n.}{hand}% å > zc
\vocab{finger}{n.}{finger}
\vocab{arm}{n.}{arm}
\end{vocabsection}
\end{document}
The sort order is supplied as an optional second argument using the interface
\vocab{<norsk>}[<norsk sort>]{<class>}{<english>}
The addition of <norsk sort>
into the database is conditional on its existence. The macro
\dtlsort[norsk]{norsksort}{vocabsort}{\dtlicompare}
uses the field norsksort
as sorting preference in the database vocabsort
(using a case i
nsensitive compare
), and norsk
only if norsksort
is absent (null or \DTLstringnull
).
As mentioned, this is no different than using the a@b
suggestion with \index
; that is, place b
in the index, but sort it as a
(see section 2.2 The Basics/p 4 of the makeindex
documentation). The typical usage would involve something like \index{alpha@$\alpha$}
. One could, of course, also change the above \vocab
interface to take a similar input, if needed.
(You'll need version datatool
v2.25 for this to work. I've only just uploaded it to CTAN, so it make take a day or so to reach the mirrors and TeX distributions.)
The comparison handlers provided by datatool
(actually defined in datatool-base.sty
) compare two strings by assigning a numerical code to each element of the strings. The code is assigned as follows:
- If empty, the code is set to -1.
- If the character (or macro) is identified as a word break, the code is set to the same as the character code for the space character (32).
- If the "character" is actually a macro (command), the code is set to 0. As from v2.24, if this macro is identified as the first octet of a UTF-8 character (and the UTF-8 option has been enabled), then the code is instead obtained using
\dtlsetUTFviiicharcode{octets}{register}
(for case-sensitive comparisons) or\dtlsetUTFviiilccharcode{octets}{register}
(for case-insensitive comparisons). - If none of the above apply, the code is set using
\dtlsetcharcode{character}{register}
(for case-sensitive comparisons) or\dtlsetlccharcode{character}{register}
(for case-insensitive comparisons). These commands were introduced in v2.24. Earlier versions simply set the character code using TeX's backtick method (with\lccode
for case-insensitive comparisons).
(Note that although these new commands were introduced in v2.24, the more recent v2.25 fixes a bug that affects this answer.)
There are therefore two approaches that depend on the document's input encoding.
Latin-1
(This should also work with other encodings where each character is viewed as a single TeX token.)
The two commands that are relevant in this case are \dtlsetcharcode
and \dtlsetlccharcode
. The default definitions are:
\newcommand*{\dtlsetcharcode}[2]{#2=`#1\relax}
\newcommand*{\dtlsetlccharcode}[2]{#2=\lccode`#1\relax}
(The first argument is the character and the second is a count register in which to store the character code.) Using this method, the codes for the Norwegian characters are as follows:
Å 197
Æ 198
Ø 216
å 229
æ 230
ø 248
(For reference A
-> 65, Z
-> 90, a
-> 97, z
-> 122.) So this puts all the Norwegian characters after z
in the above order. If you want to change this order, you need to redefine \dtlsetcharcode
and \dtlsetlccharcode
. For example, for the Norwegian alphabet:
\renewcommand*{\dtlsetcharcode}[2]{%
\ifstrequal{#1}{æ}%
{%
#2=123\relax
}%
{%
\ifstrequal{#1}{ø}%
{%
#2=124\relax
}%
{%
\ifstrequal{#1}{å}%
{%
#2=125\relax
}%
{%
\ifstrequal{#1}{Æ}%
{%
#2=91\relax
}%
{%
\ifstrequal{#1}{Ø}%
{%
#2=92\relax
}%
{%
\ifstrequal{#1}{Å}%
{%
#2=93\relax
}%
{%
#2=`#1\relax
}%
}%
}%
}%
}%
}%
}
\renewcommand*{\dtlsetlccharcode}[2]{%
\ifstrequal{#1}{æ}%
{%
#2=123\relax
}%
{%
\ifstrequal{#1}{ø}%
{%
#2=124\relax
}%
{%
\ifstrequal{#1}{å}%
{%
#2=125\relax
}%
{%
\ifstrequal{#1}{Æ}%
{%
#2=123\relax
}%
{%
\ifstrequal{#1}{Ø}%
{%
#2=124\relax
}%
{%
\ifstrequal{#1}{Å}%
{%
#2=125\relax
}%
{%
#2=\lccode`#1\relax
}%
}%
}%
}%
}%
}%
}
These use the etoolbox
conditional \ifstrequal
for clarity. There are more efficient ways of doing this using TeX's conditionals, such as \ifnum
. For example:
\renewcommand*{\dtlsetcharcode}[2]{%
#2=`#1\relax
\ifnum#2=230\relax
#2=123\relax
\else
\ifnum#2=248\relax
#2=124\relax
\else
\ifnum#2=229\relax
#2=125\relax
\else
\ifnum#2=198\relax
#2=91\relax
\else
\ifnum#2=216\relax
#2=92\relax
\else
\ifnum#2=197\relax
#2=93\relax
\fi
\fi
\fi
\fi
\fi
\fi
}
\renewcommand*{\dtlsetlccharcode}[2]{%
#2=\lccode`#1\relax
\ifnum#2=230\relax
#2=123\relax
\else
\ifnum#2=248\relax
#2=124\relax
\else
\ifnum#2=229\relax
#2=125\relax
\fi
\fi
\fi
}
These redefinitions can be added to your example:
\documentclass[11pt]{memoir}
\usepackage{multicol}
\usepackage[latin1]{inputenc}
\usepackage{datatool}
\renewcommand*{\dtlsetcharcode}[2]{%
#2=`#1\relax
\ifnum#2=230\relax
#2=123\relax
\else
\ifnum#2=248\relax
#2=124\relax
\else
\ifnum#2=229\relax
#2=125\relax
\else
\ifnum#2=198\relax
#2=91\relax
\else
\ifnum#2=216\relax
#2=92\relax
\else
\ifnum#2=197\relax
#2=93\relax
\fi
\fi
\fi
\fi
\fi
\fi
}
\renewcommand*{\dtlsetlccharcode}[2]{%
#2=\lccode`#1\relax
\ifnum#2=230\relax
#2=123\relax
\else
\ifnum#2=248\relax
#2=124\relax
\else
\ifnum#2=229\relax
#2=125\relax
\fi
\fi
\fi
}
\newcommand{\vocab}[3]{%
\DTLnewrow{vocabsort}%
\DTLnewdbentry{vocabsort}{norsk}{#1}%
\DTLnewdbentry{vocabsort}{class}{#2}%
\DTLnewdbentry{vocabsort}{english}{#3}%
}
\newenvironment{vocabsection}[1]{\noindent \hspace{1em} \textbf{#1} %
\DTLifdbexists{vocabsort}{\DTLcleardb{vocabsort}}{\DTLnewdb{vocabsort}}%
}%
{ \dtlsort{norsk}{vocabsort}{\dtlicompare}%
\displaysorteddb
}
\newcommand{\displaysorteddb}{%
\begin{multicols}{2}
\begin{hangparas}{0.5em}{1}
\begin{DTLenvforeach*}{vocabsort}{\norsk=norsk, \class=class, \english=english}
\textbf{\norsk} (\class) \english~\par
\end{DTLenvforeach*}
\end{hangparas}
\end{multicols}
}
\begin{document}
\subsection{Foo}
\begin{vocabsection}{Body Vocab}
\vocab{hør}{n.}{hair}
\vocab{hode}{n.}{head}
\vocab{øye}{n.}{eyes}
\vocab{øre}{n.}{ear}
\vocab{hake}{n.}{chin}
\vocab{nakke}{n.}{neck}
\vocab{hånd}{n.}{hand}
\vocab{finger}{n.}{finger}
\vocab{arm}{n.}{arm}
\end{vocabsection}
\end{document}\newcommand{\displaysorteddb}{%
\begin{multicols}{2}
\begin{hangparas}{0.5em}{1}
\begin{DTLenvforeach*}{vocabsort}{\norsk=norsk, \class=class, \english=english}
\textbf{\norsk} (\class) \english~\par
\end{DTLenvforeach*}
\end{hangparas}
\end{multicols}
}
\begin{document}
\subsection{Foo}
\begin{vocabsection}{Body Vocab}
\vocab{hør}{n.}{hair}
\vocab{hode}{n.}{head}
\vocab{øye}{n.}{eyes}
\vocab{øre}{n.}{ear}
\vocab{hake}{n.}{chin}
\vocab{nakke}{n.}{neck}
\vocab{hånd}{n.}{hand}
\vocab{finger}{n.}{finger}
\vocab{arm}{n.}{arm}
\end{vocabsection}
\end{document}
This produces:
which has the desired ordering: hake, hode, hør, hånd.
UTF-8
UTF-8 support is enabled by loading inputenc
with the utf8
option before loading datatool-base
(which is automatically loaded by datatool
). You can also enable or disable it later with \dtlenableUTFviii
or \dtldisableUTFviii
.
In this case, \dtlsetUTFviiicharcode
and \dtlsetUTFviiilccharcode
are used to set the character code where the first argument consists if the two octet tokens that make up the UTF-8 character. The default definitions of these commands are:
\newcommand*\dtlsetUTFviiicharcode[2]{\dtlsetdefaultUTFviiicharcode{#1}{#2}}
\newcommand*\dtlsetUTFviiilccharcode[2]{\dtlsetdefaultUTFviiilccharcode{#1}{#2}}
These default commands set the code for some common supplemental accented Latin characters to be the same as the code for the unaccented version. (For example, é
is given the same character code as e
. So a UTF-8 version of your example document is:
\documentclass[11pt]{memoir}
\usepackage{multicol}
\usepackage[utf8]{inputenc}
\usepackage{datatool}
\renewcommand*{\dtlsetUTFviiicharcode}[2]{%
\ifstrequal{#1}{Æ}%
{%
#2=91\relax
}%
{%
\ifstrequal{#1}{Ø}%
{%
#2=92\relax
}%
{%
\ifstrequal{#1}{Å}%
{%
#2=93\relax
}%
{%
\ifstrequal{#1}{æ}%
{%
#2=123\relax
}%
{%
\ifstrequal{#1}{ø}%
{%
#2=124\relax
}%
{%
\ifstrequal{#1}{å}%
{%
#2=125\relax
}%
{%
\dtlsetdefaultUTFviiicharcode{#1}{#2}%
}%
}%
}%
}%
}%
}%
}
\renewcommand*{\dtlsetUTFviiilccharcode}[2]{%
\ifstrequal{#1}{Æ}%
{%
#2=123\relax
}%
{%
\ifstrequal{#1}{Ø}%
{%
#2=124\relax
}%
{%
\ifstrequal{#1}{Å}%
{%
#2=125\relax
}%
{%
\ifstrequal{#1}{æ}%
{%
#2=123\relax
}%
{%
\ifstrequal{#1}{ø}%
{%
#2=124\relax
}%
{%
\ifstrequal{#1}{å}%
{%
#2=125\relax
}%
{%
\dtlsetdefaultUTFviiilccharcode{#1}{#2}%
}%
}%
}%
}%
}%
}%
}
\newcommand{\vocab}[3]{%
\DTLnewrow{vocabsort}%
\DTLnewdbentry{vocabsort}{norsk}{#1}%
\DTLnewdbentry{vocabsort}{class}{#2}%
\DTLnewdbentry{vocabsort}{english}{#3}%
}
\newenvironment{vocabsection}[1]{\noindent \hspace{1em} \textbf{#1} %
\DTLifdbexists{vocabsort}{\DTLcleardb{vocabsort}}{\DTLnewdb{vocabsort}}%
}%
{ \dtlsort{norsk}{vocabsort}{\dtlicompare}%
\displaysorteddb
}
\newcommand{\displaysorteddb}{%
\begin{multicols}{2}
\begin{hangparas}{0.5em}{1}
\begin{DTLenvforeach*}{vocabsort}{\norsk=norsk, \class=class, \english=english}
\textbf{\norsk} (\class) \english~\par
\end{DTLenvforeach*}
\end{hangparas}
\end{multicols}
}
\begin{document}
\subsection{Foo}
\begin{vocabsection}{Body Vocab}
\vocab{hør}{n.}{hair}
\vocab{hode}{n.}{head}
\vocab{øye}{n.}{eyes}
\vocab{øre}{n.}{ear}
\vocab{hake}{n.}{chin}
\vocab{nakke}{n.}{neck}
\vocab{hånd}{n.}{hand}
\vocab{finger}{n.}{finger}
\vocab{arm}{n.}{arm}
\end{vocabsection}
\end{document}
This produces the same result: