How to letter space (German) abbreviations automatically
(I rewrote this answer rather significantly after discussions with OP and after receiving very significant coding help from @EgorSkriptunoff)
Here's a solution that doesn't pre-specify a list of all abbreviations for which thinspace should be inserted after interior periods (aka "full stops"). Instead, it sets up a pattern matching function to capture u.a.
, u.a.m.
, u.v.a.m.
, z.Zt.
, Bem.d.Red.
and many more such cases. (See the function insert_thinspaces
in the code below for the exact pattern matches that are performed.)
Observe also the use of unicode.utf8.gsub
instead of string.gsub
inside the Lua function insert_thinspaces
. This lets the code deal correctly with non-ASCII-encoded letters, such as ä
and Ä
, which may occur in abbreviations.
On the downside (potentially), this solution method doesn't capture abbreviations if they occur at the start of a sentence, e.g., Z.T.
or U.U.
; for what it's worth, your parallel answer currently doesn't catch such cases either, right?
The Lua function is assigned to the process_input_buffer
callback via a LaTeX macro called \ExpandAbbrOn
. If, for any reason, you need to suspend operation of the Lua function, simply execute the instruction \ExpandAbbrOff
.
The code checks if the string to be processed lies inside a verbatim-like environment such as verbatim
, Verbatim
, and lstlisting
; if that's the case, no processing is performed. And, with the latest iteration, the code now also ignores material that's in the arguments of inline-verbatim-like macros, such as \verb
, \Verb
, \lstinline
, and \url
. For sure, the contents of URL strings should never be processed by the Lua function, right?
% !TeX program = lualatex
\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{fancyvrb} % for "Verbatim" env.
\usepackage[obeyspaces]{url} % for "\url" macro
\usepackage{listings} % for "\lstinline" macro
%% Lua-side code:
\usepackage{luacode} % for 'luacode*' environment
\begin{luacode*}
-- Names of verbatim-like environments
local verbatim_env = { "[vV]erbatim" , "lstlisting" }
-- By default, we're *not* in a verbatim-like env.:
local in_verbatim_env = false
-- Specify number of parameters for every macro; use neg. numbers
-- for macros that support matching pair of curly braces {}
local all_macros = {
verb = 1,
Verb = 1,
lstinline = -1,
url = -1
}
-- List all poss. delimiters
local all_delimiters = [[!"#$%&'*+,-./:;<=>?^_`|~()[]{}0123456789]]
-- Quick check if "s" contains an inline-verbatim-like macro:
function quick_check ( s )
if s:find("\\[vV]erb") or s:find("\\url") or s:find("\\lstinline") then
return true
else
return false
end
end
-- Function to process the part of string "s" that
-- does *not* contain inline-verbatim-like macros
local function insert_thinspaces ( s )
s = unicode.utf8.gsub ( s ,
"(%l%.)(%a%l?%l?%.)(%a%l?%l?%.)(%a%l?%l?%.)",
"%1\\,%2\\,%3\\,%4" ) -- e.g. "u.v.a.m.", "w.z.b.w."
s = unicode.utf8.gsub ( s ,
"(%l%.)(%a%l?%l?%.)(%a%l?%l?%.)",
"%1\\,%2\\,%3" ) -- e.g., "a.d.Gr."
s = unicode.utf8.gsub ( s ,
"(%u%l%l?%.)(%a%l?%l?%.)(%a%l?%l?%.)",
"%1\\,%2\\,%3" ) -- e.g., "Anm.d.Red."
s = unicode.utf8.gsub ( s ,
"(%l%.)(%a%l?%l?%.)",
"%1\\,%2" ) -- e.g., "z.T.", "z.Zt.", "v.Chr."
return s
end
-- Finally, the main Lua function:
function expand_abbr ( s )
-- Check if we're entering or exiting a verbatim-like env.;
-- if so, reset the 'in_verbatim_env' "flag" and break.
for i,p in ipairs ( verbatim_env ) do
if s:find( "\\begin{" .. p .. "}" ) then
in_verbatim_env = true
break
elseif s:find( "\\end{" .. p .. "}" ) then
in_verbatim_env = false
break
end
end
-- Potentially modify "s" only if *not* in a verbatim-like env.:
if not in_verbatim_env then
-- Quick check if "s" contains one or more inlike-verbatim-like macros:
if quick_check ( s ) then
-- See https://stackoverflow.com/a/45688711/1014365 for the source
-- of the following code. Many many thanks, @EgorSkriptunoff!!
s = s:gsub("\\([%a@]+)",
function(macro_name)
if all_macros[macro_name] then
return
"\1\\"..macro_name
..(all_macros[macro_name] < 0 and "\2" or "\3")
:rep(math.abs(all_macros[macro_name]) + 1)
end
end
)
repeat
local old_length = #s
repeat
local old_length = #s
s = s:gsub("\2(\2+)(%b{})", "%2%1")
until old_length == #s
s = s:gsub("[\2\3]([\2\3]+)((["..all_delimiters:gsub("%p", "%%%0").."])(.-)%3)", "%2%1")
until old_length == #s
s = ("\2"..s.."\1"):gsub("[\2\3]+([^\2\3]-)\1", insert_thinspaces):gsub("[\1\2\3]", "")
else
-- Since no inline-verbatim-like macro found in "s", invoke
-- the Lua function 'insert_thinspaces' directly.
s = insert_thinspaces ( s )
end
end
return(s)
end
\end{luacode*}
%% LaTeX-side code: Macros to assign 'expand_abbr'
%% to LuaTeX's 'process_input_buffer' callback.
\newcommand\ExpandAbbrOn{\directlua{%
luatexbase.add_to_callback("process_input_buffer",
expand_abbr, "expand_abbreviations")}}
\newcommand\ExpandAbbrOff{\directlua{%
luatexbase.remove_from_callback("process_input_buffer",
"expand_abbreviations")}}
\AtBeginDocument{\ExpandAbbrOn} % enabled by default
%% Just for this example:
\setlength\parindent{0pt}
\obeylines
\begin{document}
Dies ist u.U. ein Test.
\begin{Verbatim}
Dies ist u.U. ein Test.
\end{Verbatim}
z.B. u.a. u.Ä. u.ä. u.U. a.a.O. d.h. i.e. v.a.
i.e.S. z.T. m.E. i.d.F. z.Z. u.v.m. z.Zt.
u.v.a.m. b.z.b.w. v.Chr. a.d.Gr. Anm.d.Red.
\begin{verbatim}
z.B. u.a. u.Ä. u.ä. u.U. a.a.O. d.h. i.e. v.a.
i.e.S. z.T. m.E. i.d.F. z.Z. u.v.m. z.Zt.
u.v.a.m. b.z.b.w. v.Chr. a.d.Gr. Anm.d.Red.
\end{verbatim}
U.S.A. U.K. % should *not* be processed
\lstinline|u.a. u.Ä. u.ä.|; \Verb$u.a. u.Ä. u.ä.$
% nested verbatim-like macros
\Verb+u.U.\lstinline|u.U.|u.U.+ \lstinline+u.U.\[email protected][email protected].+
% 2 URL strings
u.U. \url{u.U.aaaa.z.T.bbb_u.v.a.m.com} u.U.
u.U. \url?u.U.aaaa.z.T.bbb_u.v.a.m.com? u.U.
\end{document}
Here's a simple solution to the problem. It uses the Lua callback process_input_buffer
to scan each input line for one of the given abbreviations and insert a small space in there.
For that action you only need to specify the abbreviation (table key) you want to replace with a spaced version (table value). That mechanism can, of course, be used to simply replace any content (table key) with some other (table value).
This solution also enables you to use verbatim input, but you have to make the verbatim environment known to the script. If the script wouldn't check for verbatim environments those changes would be visible to the reader.
You should note that the function works, but may not be the fastest as it always checks every line whether it is a verbatim start/end for every verbatim environment known to the script.
Update: I have simplified user input (dicitonary) by generating the required spaces automatically. The most problematic part mentioned in the discussions are abbreviations at sentence start and in inline verbatim. The first one may be handled with this version (check for a dot, a space and then the abbreviation with capitalized letter) but the second one may be very hard to detect and change.
\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{luacode}
\begin{luacode}
local tabbr = {"z.B.","u.a.","u.Ä.","u.ä.","u.U.","a.a.O.","d.h.","i.e.",
"i.e.S.","v.a.","z.T.","m.E.","i.d.F.","z.Z.","u.v.m.","u.v.a.m.",
"z.Hd."}
local verbenv = {"[vV]erbatim","lstlisting"}
local tsub = {}
local inverb = false
function createsubstitutes()
for i,p in ipairs(tabbr) do
tsub[p] = unicode.utf8.gsub(p:sub(1,p:len()-1), "%.", ".\\,") .. "."
end
end
function expandabbr(s)
for i,p in ipairs(verbenv) do
if s:find("\\begin{" .. p .. "}") then
inverb = true
break
end
if s:find("\\end{" .. p .. "}") then
inverb = false
break
end
end
if not inverb then
for k,v in pairs(tsub) do
s = unicode.utf8.gsub(s, k:gsub("(%.)","%%%1"), v)
end
end
return(s)
end
\end{luacode}
\AtBeginDocument{%
\luaexec{createsubstitutes()}
\luaexec{luatexbase.add_to_callback("process_input_buffer", expandabbr, "expand_abbreviations")}%
}
\begin{document}
Dies ist z.B. ein Test.\\
In dieser Zeile gibt es z.B. \verb+z.B.+
\begin{verbatim}
Test z.B.
\end{verbatim}
\end{document}
Just a minor tweak to @Mico's answer. I put the replacement in a loop to match abbreviations of unknown length. The drawback is, that each chunk of a abbreviation must meet one of the defined patterns.
For example o.B.d.A.
would get evaluated by the second gsub
pattern "(%l.)(%a)"
and replaced by o.\,B.d.\,A.
. In the next loop the pattern wouldn't match, because the chunk B.d.
doesn't. I guess the combination of the two patterns should match almost every abbreviation but doesn't create too many false-positives.
Another tweak I made was to check only the closing verbatim-name which matches the first opening. With this a construct of several nested verbatim environments is evaluated correctly. Inline verbatim is still missing though as well as other exceptions like \url
.
EDIT: I also wrote a function, that parses inline verbatim correctly, but only if there is only one kind of inline verbatim command per line and if it ends in the same line.
\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{fancyvrb} % for "Verbatim" environment
\usepackage{luacode}
\usepackage{url}
%% Lua-side code:
\begin{luacode}
local verbatim_env = { "verbatim" , "Verbatim" , "lstlisting" }
local verbatim_inl = { "verb" , "lstinline" }
-- by default, *not* in a verbatim-like env.:
local cur_verbatim_env = nil
local cur_verbatim_inl = nil
function replace_abbrs ( s )
local rep_rep1 = 1
local rep_rep2 = 1
while rep_rep1 ~= 0 or rep_rep2 ~= 0 do
s,rep_rep1 = unicode.utf8.gsub ( s , "(%l%.)(%a%.)(%a)", "%1\\,%2\\,%3" )
s,rep_rep2 = unicode.utf8.gsub ( s , "(%l%.)(%a)", "%1\\,%2" )
end
return(s)
end
function expand_inline_verb ( s , p )
local r = ""
while string.len(s) > 0 do
local spos,epos = s:find( p.."%A" )
if spos ~= nil then
r = r .. replace_abbrs(s:sub(0,spos-1))
r = r .. s:sub(spos,epos)
local delim = s:sub(epos,epos)
s = s:sub(epos+1 , string.len(s))
local verb_end = s:find( delim )
r = r .. s:sub(0,verb_end)
s = s:sub(verb_end+1 , string.len(s))
else
r = r .. replace_abbrs(s)
break
end
end
return(r)
end
function expandabbr ( s )
if cur_verbatim_env == nil then
for i,p in ipairs ( verbatim_env ) do
if s:find( "\\begin{" .. p .. "}" ) then
cur_verbatim_env = verbatim_env[i]
break
end
end
elseif s:find( "\\end{" .. cur_verbatim_env .. "}" ) then
cur_verbatim_env = nil
end
if cur_verbatim_env == nil and cur_verbatim_inl == nil then
for i,p in ipairs ( verbatim_inl ) do
pos = s:find( "\\" .. p )
if pos ~= nil then
cur_verbatim_inl = s[pos+string.len(p)+1]
break
end
end
elseif cur_veratim_inl ~= nil then
if s:find( cur_veratim_inl ) then
cur_verbatim_inl = nil
end
end
if cur_verbatim_env == nil then
for i,p in ipairs ( verbatim_inl ) do
if s:find( p ) then
return(expand_inline_verb( s , p ))
end
end
s = replace_abbrs ( s )
end
return(s)
end
\end{luacode}
%% LaTeX-side code:
\newcommand\ExpandAbbrOn{\directlua{%
luatexbase.add_to_callback("process_input_buffer",
expandabbr, "expand_abbreviations")}}
\newcommand\ExpandAbbrOff{\directlua{%
luatexbase.remove_from_callback("process_input_buffer",
"expand_abbreviations")}}
\AtBeginDocument{\ExpandAbbrOn} % enabled by default
%% Just for this example:
\setlength\parindent{0pt}
\obeylines
\begin{document}
Dies ist u.U. ein Test.
\begin{Verbatim}
Dies ist u.U. ein Test.
\end{Verbatim}
\begin{Verbatim}
\begin{verbatim}
\end{verbatim}
Dies ist u.U. ein schwierigerer Test.
\end{Verbatim}
z.B. u.a. u.Ä. u.ä. u.U. a.a.O. d.h. i.e.
i.e.S. v.a. z.T. m.E. i.d.F. z.Z. u.v.m.
z.Zt. o.B.d.A. a.d.Gr. n.Chr. Anm.d.Red.
\verb|z.Zt. o.B.d.A. a.d.Gr. n.Chr. Anm.d.Red.|
\begin{verbatim}
z.B. u.a. u.Ä. u.ä. u.U. a.a.O. d.h. i.e.
i.e.S. v.a. z.T. m.E. i.d.F. z.Z. u.v.m.
\end{verbatim}
U.S.A. U.K.
\ExpandAbbrOff % turn off abbreviation expansion
A tricky URL: \url{u.U.aaaa.z.T.bbb}
\end{document}