xindex - sorting local characters (ÆØÅæøå)
I've just created a new package, Lua-UCA, that adds support for the Unicode collation algorithm to LuaTeX. I've already added support for a few languages, like Czech, German or Norwegian. We can use it instead of Xindex's built-in sorting mechanism.
Try the following version of xindex-norsk.lua:
-- FILE: xindex-norsk.lua
-- DESCRIPTION: configuration file for xindex.lua
-- AUTHOR: Herbert Voß
-- MODIFIED: Sveinung Heggen (2020-01-02)
if not modules then modules = { } end modules ['xindex-cfg'] = {
  version = 0.20,
  comment = "configuration to xindex.lua",
  author = "Herbert Voss",
  copyright = "Herbert Voss",
  license = "LPPL 1.3"
}
local ducet = require "lua-uca.lua-uca-ducet"
local collator = require "lua-uca.lua-uca-collator"
local languages = require "lua-uca.lua-uca-languages"

-- create the collator from the default Unicode collation table (DUCET)
local collator_obj = collator.new(ducet)
local language = "en" -- default language

-- language specified on the command line doesn't seem to be available
-- in the config file, so we just try to find it ourselves
for i, a in ipairs(arg) do
  if a == "-l" or a == "--language" then
    language = arg[i+1]
  end
end

-- apply the language-specific tailoring, if Lua-UCA knows the language
if languages[language] then
  print("[Lua-UCA] Loading language: " .. language)
  collator_obj = languages[language](collator_obj)
end

local upper = unicode.utf8.upper
escape_chars = { -- by default " is the escape char
  {'""', "\\escapedquote", '\"{}' },
  {'"@', "\\escapedat", "@" },
  {'"|', "\\escapedvert", "|" },
  {'"!', "\\escapedexcl", "!" },
  {'"(', "\\escapedparenleft", "(" },
  {'")', "\\escapedparenright", ")" }
}
itemPageDelimiter = "," -- Hello, 14
compressPages = true -- something like 12--15, instead of 12,13,14,15. the |( ... |) syntax is still valid
fCompress = true -- 3f -> page 3, 4 and 3ff -> page 3, 4, 5
minCompress = 3 -- 14--17 or
numericPage = true -- for non-numerical page numbers, like "VI-17"
sublabels = {"", "---\\,", "--\\,", "-\\,"} -- for the (sub(sub(sub-items first one is for item
pageNoPrefixDel = "" -- a delimiter for page numbers like "VI-17"
indexOpening = "" -- commands after \begin{theindex}
rangeSymbol = "--"
idxnewletter = "\\textbf" -- Only valid if -n is not set
folium = {
  de = {"f.", "ff."},
  en = {"f.", "ff."},
  fr = {"\\,sq","\\,sqq"},
  no = {"\\,f.","\\,ff."},
}
-- compare two index entries by their sort keys using the Lua-UCA collator
function UTFCompare(a,b)
  local A = a["SortKey"]
  local B = b["SortKey"]
  return collator_obj:compare_strings(A,B)
end

function SORTendhook(list)
  -- get the headers for letter groups
  for k,v in ipairs(list) do
    -- collator:get_lowest_char returns the character at the given
    -- position. It will be lowercase and without accents.
    local codepoints = collator_obj:string_to_codepoints(v.Entry)
    local codes = collator_obj:get_lowest_char(codepoints, 1)
    local sort_char = utf8.char(table.unpack(codes))
    v.sortChar = upper(sort_char) -- use unicode.utf8.upper to make the char uppercase
  end
  return list
end
-- Each character's position in this array-like table determines its 'priority'.
-- Several characters in the same slot have the same 'priority'.
alphabet_lower = { -- for sorting
  { ' ' }, -- only for internal tests
  { 'a', 'á', 'à', },
  { 'b' },
  { 'c', 'ç' },
  { 'd' },
  { 'e', 'é', 'è', 'ë', 'ê' },
  { 'f' },
  { 'g' },
  { 'h' },
  { 'i', 'í', 'ì', 'î', 'ï' },
  { 'j' },
  { 'k' },
  { 'l' },
  { 'm' },
  { 'n', 'ñ' },
  { 'o', 'ó', 'ò', 'ô' },
  { 'p' },
  { 'q' },
  { 'r' },
  { 's', 'š', 'ß' },
  { 't' },
  { 'u', 'ú', 'ù', 'û' },
  { 'v' },
  { 'w' },
  { 'x' },
  { 'y', 'ý', 'ÿ', 'ü' },
  { 'z', 'ž' },
  { 'æ', 'œ', 'ä' },
  { 'ø', 'ö' },
  { 'å' }
}
alphabet_upper = { -- for sorting
  { ' ' },
  { 'A', 'Á', 'À', 'Â'},
  { 'B' },
  { 'C', 'Ç' },
  { 'D' },
  { 'E', 'È', 'É', 'Ë', 'Ê' },
  { 'F' },
  { 'G' },
  { 'H' },
  { 'I', 'Í', 'Ì', 'Ï', 'Î' },
  { 'J' },
  { 'K' },
  { 'L' },
  { 'M' },
  { 'N', 'Ñ' },
  { 'O', 'Ó', 'Ò', 'Ô' },
  { 'P' },
  { 'Q' },
  { 'R' },
  { 'S', 'Š' },
  { 'T' },
  { 'U', 'Ú', 'Ù', 'Û' },
  { 'V' },
  { 'W' },
  { 'X' },
  { 'Y', 'Ý', 'Ÿ', 'Ü' },
  { 'Z', 'Ž' },
  { 'Æ', 'Œ', 'Ä' },
  { 'Ø', 'Ö' },
  { 'Å' }
}
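As an aside on the alphabet tables at the end of the file: the comment says that a character's slot is its priority and that characters sharing a slot rank equally. Here is a minimal sketch of that idea in plain Lua. The table `alphabet` is a shortened stand-in and `build_priority` is a hypothetical helper of mine; I have not checked how xindex itself reads these tables, so take it as an illustration only.

```lua
-- Shortened, hypothetical stand-in for alphabet_lower.
local alphabet = {
  { 'a', 'á', 'à' },
  { 'b' },
  { 'z', 'ž' },
  { 'æ', 'ä' },
  { 'ø', 'ö' },
  { 'å' },
}

-- Map every character to the index of the slot it appears in.
local function build_priority(slots)
  local priority = {}
  for slot, chars in ipairs(slots) do
    for _, ch in ipairs(chars) do
      priority[ch] = slot
    end
  end
  return priority
end

local priority = build_priority(alphabet)
print(priority['a'], priority['à'])   -- same slot, so same priority
print(priority['z'] < priority['æ'])  --> true
print(priority['ø'] < priority['å'])  --> true
```

With a lookup like this, two characters can be compared by their slot numbers alone, which is why 'æ', 'ø' and 'å' end up after 'z'.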
The relevant code is this:
local ducet = require "lua-uca.lua-uca-ducet"
local collator = require "lua-uca.lua-uca-collator"
local languages = require "lua-uca.lua-uca-languages"

-- create the collator from the default Unicode collation table (DUCET)
local collator_obj = collator.new(ducet)
local language = "en" -- default language

-- language specified on the command line doesn't seem to be available
-- in the config file, so we just try to find it ourselves
for i, a in ipairs(arg) do
  if a == "-l" or a == "--language" then
    language = arg[i+1]
  end
end

-- apply the language-specific tailoring, if Lua-UCA knows the language
if languages[language] then
  print("[Lua-UCA] Loading language: " .. language)
  collator_obj = languages[language](collator_obj)
end

local upper = unicode.utf8.upper

-- compare two index entries by their sort keys using the Lua-UCA collator
function UTFCompare(a,b)
  local A = a["SortKey"]
  local B = b["SortKey"]
  return collator_obj:compare_strings(A,B)
end

function SORTendhook(list)
  -- get the headers for letter groups
  for k,v in ipairs(list) do
    -- collator:get_lowest_char returns the character at the given
    -- position. It will be lowercase and without accents.
    local codepoints = collator_obj:string_to_codepoints(v.Entry)
    local codes = collator_obj:get_lowest_char(codepoints, 1)
    local sort_char = utf8.char(table.unpack(codes))
    v.sortChar = upper(sort_char) -- use unicode.utf8.upper to make the char uppercase
  end
  return list
end
It loads the needed libraries, creates the sorting object and applies the Norwegian rules. Xindex uses the UTFCompare function to compare entries, so we redefine it to use our sorting function. I've found that sorting works, but there is one problem: the first letters are not handled correctly, so Xindex produced separate headings for uppercase and lowercase letters. This is handled in the SORTendhook function, which gives every entry an uppercase heading letter.
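To make the heading problem concrete, here is a small stand-alone sketch in plain Lua (5.3 or later, for the `utf8` library). It does not use Lua-UCA: `to_upper`, `heading` and `group` are hypothetical names of mine, and the tiny case map only covers the three Norwegian letters. The point is just that folding the first character to one case puts "bok" and "Bergen" under the same heading.

```lua
-- Hypothetical, shortened case map; a real setup would use a full
-- Unicode-aware upper(), such as the one the config takes from LuaTeX.
local to_upper = { ['æ'] = 'Æ', ['ø'] = 'Ø', ['å'] = 'Å' }

-- Return the heading letter of an entry: its first character in uppercase.
local function heading(entry)
  local first = utf8.char(utf8.codepoint(entry, 1))
  if #first == 1 then
    return first:upper()          -- plain ASCII letter
  end
  return to_upper[first] or first -- non-ASCII: use the small map
end

-- Collect entries under their heading, keeping the input order.
local function group(entries)
  local order, groups = {}, {}
  for _, entry in ipairs(entries) do
    local h = heading(entry)
    if not groups[h] then
      groups[h] = {}
      order[#order + 1] = h
    end
    table.insert(groups[h], entry)
  end
  return order, groups
end

local order, groups = group({ "Bergen", "bok", "ære", "Ærlig", "Øl" })
for _, h in ipairs(order) do
  print(h, table.concat(groups[h], ", "))
end
-- B   Bergen, bok
-- Æ   ære, Ærlig
-- Ø   Øl
```

Without the folding step, "bok" and "ære" would start their own lowercase groups, which is the behaviour described above.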
This is the result:
With the current xindex (version 0.23) and
xindex -u -l no -c norsk <file>
you'll get
Inserted by Sveinung 4.6.2020
Sorting order table for Nordic characters according to Norwegian rules (including Sami):
A  Á  B  C  Č  D  Ð  E  F  G  H  I  J  K  L  M  N  Ŋ  O  P  Q  R  S  Š  T  Ŧ  U  V  W  X  Y  Z  Ž  Æ  Ä  Ø  Ö  Å  Aa
1  3  5  7  9  11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 75
a  á  b  c  č  d  đ  e  f  g  h  i  j  k  l  m  n  ŋ  o  p  q  r  s  š  t  ŧ  u  v  w  x  y  z  ž  æ  ä  ø  ö  å  aa
2  4  6  8  10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 76
A 1
a 2
Á 3
á 4
B 5
b 6
C 7
c 8
Č 9
č 10
D 11
d 12
Ð 13
đ 14
E 15
e 16
F 17
f 18
G 19
g 20
H 21
h 22
I 23
i 24
J 25
j 26
K 27
k 28
L 29
l 30
M 31
m 32
N 33
n 34
Ŋ 35
ŋ 36
O 37
o 38
P 39
p 40
Q 41
q 42
R 43
r 44
S 45
s 46
Š 47
š 48
T 49
t 50
Ŧ 51
ŧ 52
U 53
u 54
V 55
v 56
W 57
w 58
X 59
x 60
Y 61
y 62
Z 63
z 64
Ž 65
ž 66
Æ 67
æ 68
Ä 69
ä 70
Ø 71
ø 72
Ö 73
ö 74
Å 75
Aa 75
å 76
aa 76
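The table above can be turned into a simple comparison function. The following is only a sketch in plain Lua (5.3 or later), under my own assumptions: `rank`, `sort_key` and `norwegian_less` are hypothetical names, every "Aa"/"aa" is treated as "Å"/"å", and words are compared character by character on a single level, which is much simpler than what the Unicode collation algorithm does.

```lua
-- Letters in the order of the table; uppercase gets the odd ranks,
-- lowercase the even ones, as in the table.
local upper = "A Á B C Č D Ð E F G H I J K L M N Ŋ O P Q R S Š T Ŧ U V W X Y Z Ž Æ Ä Ø Ö Å"
local lower = "a á b c č d đ e f g h i j k l m n ŋ o p q r s š t ŧ u v w x y z ž æ ä ø ö å"

local rank = {}
local n = 0
for ch in upper:gmatch("[^ ]+") do n = n + 1; rank[ch] = 2 * n - 1 end
n = 0
for ch in lower:gmatch("[^ ]+") do n = n + 1; rank[ch] = 2 * n end

-- Turn a word into a list of ranks; "Aa"/"aa" count as "Å"/"å".
local function sort_key(word)
  word = word:gsub("Aa", "Å"):gsub("aa", "å")
  local key = {}
  for _, cp in utf8.codes(word) do
    key[#key + 1] = rank[utf8.char(cp)] or 0 -- unknown characters sort first
  end
  return key
end

-- Lexicographic comparison of the rank lists.
local function norwegian_less(a, b)
  local ka, kb = sort_key(a), sort_key(b)
  for i = 1, math.min(#ka, #kb) do
    if ka[i] ~= kb[i] then return ka[i] < kb[i] end
  end
  return #ka < #kb
end

local words = { "åre", "zebra", "aaland", "ære", "øl", "abc" }
table.sort(words, norwegian_less)
print(table.concat(words, " ")) --> abc zebra ære øl aaland åre
```

Note that "aaland" sorts among the "å" words but keeps its original spelling, because only the sort key is rewritten.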