How to letter space (German) abbreviations automatically

(I rewrote this answer rather significantly after discussions with OP and after receiving very significant coding help from @EgorSkriptunoff)

Here's a solution that doesn't pre-specify a list of all abbreviations for which thinspace should be inserted after interior periods (aka "full stops"). Instead, it sets up a pattern matching function to capture u.a., u.a.m., u.v.a.m., z.Zt., Bem.d.Red. and many more such cases. (See the function insert_thinspaces in the code below for the exact pattern matches that are performed.)

Observe also the use of unicode.utf8.gsub instead of string.gsub inside the Lua function insert_thinspaces. This lets the code deal correctly with non-ASCII-encoded letters, such as ä and Ä, which may occur in abbreviations.

On the downside (potentially), this solution method doesn't capture abbreviations if they occur at the start of a sentence, e.g., Z.T. or U.U.; for what it's worth, your parallel answer currently doesn't catch such cases either, right?

The Lua function is assigned to the process_input_buffer callback via a LaTeX macro called \ExpandAbbrOn. If, for any reason, you need to suspend operation of the Lua function, simply execute the instruction \ExpandAbbrOff.

The code checks if the string to be processed lies inside a verbatim-like environment such as verbatim, Verbatim, and lstlisting; if that's the case, no processing is performed. And, with the latest iteration, the code now also ignores material that's in the arguments of inline-verbatim-like macros, such as \verb, \Verb, \lstinline, and \url. For sure, the contents of URL strings should never be processed by the Lua function, right?

enter image description here

% !TeX program = lualatex
\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{fancyvrb}        % for "Verbatim" env.
\usepackage[obeyspaces]{url} % for "\url" macro
\usepackage{listings}        % for "\lstinline" macro

%% Lua-side code:
\usepackage{luacode} % for 'luacode*' environment
\begin{luacode*}

-- Names of verbatim-like environments
local verbatim_env = { "[vV]erbatim" , "lstlisting" }

-- By default, we're *not* in a verbatim-like env.:
local in_verbatim_env = false 

-- Specify number of parameters for every macro; use neg. numbers 
-- for macros that support matching pair of curly braces {} 
local all_macros = {
          verb = 1,
          Verb = 1,
          lstinline = -1,
          url = -1
}

-- List all poss. delimiters
local all_delimiters = [[!"#$%&'*+,-./:;<=>?^_`|~()[]{}0123456789]]

-- Quick check if "s" contains an inline-verbatim-like macro:
function quick_check ( s )
  if s:find("\\[vV]erb") or s:find("\\url") or s:find("\\lstinline") then
    return true
  else
    return false
  end
end

-- Function to process the part of string "s" that 
-- does *not* contain inline-verbatim-like macros
local function insert_thinspaces ( s ) 
  s = unicode.utf8.gsub ( s , 
      "(%l%.)(%a%l?%l?%.)(%a%l?%l?%.)(%a%l?%l?%.)", 
      "%1\\,%2\\,%3\\,%4" ) -- e.g. "u.v.a.m.", "w.z.b.w."
  s = unicode.utf8.gsub ( s , 
      "(%l%.)(%a%l?%l?%.)(%a%l?%l?%.)", 
      "%1\\,%2\\,%3" ) -- e.g., "a.d.Gr." 
  s = unicode.utf8.gsub ( s , 
      "(%u%l%l?%.)(%a%l?%l?%.)(%a%l?%l?%.)", 
      "%1\\,%2\\,%3" ) -- e.g., "Anm.d.Red."
  s = unicode.utf8.gsub ( s , 
      "(%l%.)(%a%l?%l?%.)", 
      "%1\\,%2" ) -- e.g., "z.T.", "z.Zt.", "v.Chr."
  return s
end

-- Finally, the main Lua function:
function expand_abbr ( s )
  -- Check if we're entering or exiting a verbatim-like env.;
  -- if so, reset the 'in_verbatim_env' "flag" and break.
  for i,p in ipairs ( verbatim_env ) do
    if s:find( "\\begin{" .. p .. "}" ) then
      in_verbatim_env = true
      break
    elseif s:find( "\\end{" .. p .. "}" ) then
      in_verbatim_env = false
      break
    end
  end
  -- Potentially modify "s" only if *not* in a verbatim-like env.:
  if not in_verbatim_env then
    -- Quick check if "s" contains one or more inlike-verbatim-like macros:
    if quick_check ( s ) then
      -- See https://stackoverflow.com/a/45688711/1014365 for the source
      -- of the following code. Many many thanks, @EgorSkriptunoff!!
      s = s:gsub("\\([%a@]+)",
        function(macro_name)
          if all_macros[macro_name] then
            return
              "\1\\"..macro_name
              ..(all_macros[macro_name] < 0 and "\2" or "\3")
              :rep(math.abs(all_macros[macro_name]) + 1)
          end
        end
        )
      repeat
        local old_length = #s
        repeat
          local old_length = #s
          s = s:gsub("\2(\2+)(%b{})", "%2%1")
        until old_length == #s
        s = s:gsub("[\2\3]([\2\3]+)((["..all_delimiters:gsub("%p", "%%%0").."])(.-)%3)", "%2%1")
      until old_length == #s
      s = ("\2"..s.."\1"):gsub("[\2\3]+([^\2\3]-)\1", insert_thinspaces):gsub("[\1\2\3]", "")
    else
      -- Since no inline-verbatim-like macro found in "s", invoke 
      -- the Lua function 'insert_thinspaces' directly.
      s = insert_thinspaces ( s )
    end
  end
  return(s)
end

\end{luacode*}

%% LaTeX-side code: Macros to assign 'expand_abbr' 
%% to LuaTeX's 'process_input_buffer' callback.
\newcommand\ExpandAbbrOn{\directlua{%
  luatexbase.add_to_callback("process_input_buffer", 
  expand_abbr, "expand_abbreviations")}}
\newcommand\ExpandAbbrOff{\directlua{%
  luatexbase.remove_from_callback("process_input_buffer", 
  "expand_abbreviations")}}
\AtBeginDocument{\ExpandAbbrOn} % enabled by default

%% Just for this example:
\setlength\parindent{0pt}
\obeylines

\begin{document}

Dies ist u.U. ein Test.
\begin{Verbatim}
Dies ist u.U. ein Test.
\end{Verbatim}

z.B. u.a. u.Ä. u.ä. u.U. a.a.O. d.h. i.e. v.a.
i.e.S. z.T. m.E. i.d.F. z.Z. u.v.m. z.Zt.
u.v.a.m. b.z.b.w. v.Chr. a.d.Gr. Anm.d.Red.

\begin{verbatim}
z.B. u.a. u.Ä. u.ä. u.U. a.a.O. d.h. i.e. v.a.
i.e.S. z.T. m.E. i.d.F. z.Z. u.v.m. z.Zt.
u.v.a.m. b.z.b.w. v.Chr. a.d.Gr. Anm.d.Red.
\end{verbatim}

U.S.A. U.K. % should *not* be processed

\lstinline|u.a. u.Ä. u.ä.|; \Verb$u.a. u.Ä. u.ä.$

% nested verbatim-like macros
\Verb+u.U.\lstinline|u.U.|u.U.+  \lstinline+u.U.\[email protected][email protected].+

% 2 URL strings
u.U. \url{u.U.aaaa.z.T.bbb_u.v.a.m.com} u.U.
u.U. \url?u.U.aaaa.z.T.bbb_u.v.a.m.com? u.U.
\end{document}

Here's a simple solution to the problem. It uses the Lua callback process_input_buffer to scan each input line for one of the given abbreviations and insert a small space in there.

For that action you only need to specify the abbreviation (table key) you want to replace with a spaced version (table value). That mechanism can, of course, be used to simply replace any content (table key) with some other (table value).

This solution also enables you to use verbatim input, but you have to make the verbatim environment known to the script. If the script wouldn't check for verbatim environments those changes would be visible to the reader.

You should note that the function works, but may not be the fastest as it always checks every line whether it is a verbatim start/end for every verbatim environment known to the script.

Update: I have simplified user input (dicitonary) by generating the required spaces automatically. The most problematic part mentioned in the discussions are abbreviations at sentence start and in inline verbatim. The first one may be handled with this version (check for a dot, a space and then the abbreviation with capitalized letter) but the second one may be very hard to detect and change.

\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{luacode}
\begin{luacode}

local tabbr = {"z.B.","u.a.","u.Ä.","u.ä.","u.U.","a.a.O.","d.h.","i.e.",
  "i.e.S.","v.a.","z.T.","m.E.","i.d.F.","z.Z.","u.v.m.","u.v.a.m.",
  "z.Hd."}
local verbenv = {"[vV]erbatim","lstlisting"}
local tsub = {}
local inverb = false

function createsubstitutes()
  for i,p in ipairs(tabbr) do
    tsub[p] = unicode.utf8.gsub(p:sub(1,p:len()-1), "%.", ".\\,") .. "."
  end
end

function expandabbr(s)
    for i,p in ipairs(verbenv) do
      if s:find("\\begin{" .. p .. "}") then
        inverb = true
        break
      end
      if s:find("\\end{" .. p .. "}") then
        inverb = false
        break
      end
    end
    if not inverb then
      for k,v in pairs(tsub) do
        s = unicode.utf8.gsub(s, k:gsub("(%.)","%%%1"), v)
      end
    end
  return(s)
end

\end{luacode}
\AtBeginDocument{%
  \luaexec{createsubstitutes()}
  \luaexec{luatexbase.add_to_callback("process_input_buffer", expandabbr, "expand_abbreviations")}%
}
\begin{document}
Dies ist z.B. ein Test.\\
In dieser Zeile gibt es z.B. \verb+z.B.+
\begin{verbatim}
Test z.B.
\end{verbatim}
\end{document}

Just a minor tweak to @Mico's answer. I put the replacement in a loop to match abbreviations of unknown length. The drawback is, that each chunk of a abbreviation must meet one of the defined patterns.

For example o.B.d.A. would get evaluated by the second gsub pattern "(%l.)(%a)" and replaced by o.\,B.d.\,A.. In the next loop the pattern wouldn't match, because the chunk B.d. doesn't. I guess the combination of the two patterns should match almost every abbreviation but doesn't create too many false-positives.

Another tweak I made was to check only the closing verbatim-name which matches the first opening. With this a construct of several nested verbatim environments is evaluated correctly. Inline verbatim is still missing though as well as other exceptions like \url.

EDIT: I also wrote a function, that parses inline verbatim correctly, but only if there is only one kind of inline verbatim command per line and if it ends in the same line.

\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage{fancyvrb} % for "Verbatim" environment
\usepackage{luacode}
\usepackage{url}

%% Lua-side code:
\begin{luacode}
local verbatim_env = { "verbatim" , "Verbatim" , "lstlisting" }
local verbatim_inl = { "verb" , "lstinline" }
-- by default, *not* in a verbatim-like env.:
local cur_verbatim_env = nil
local cur_verbatim_inl = nil
function replace_abbrs ( s )
    local rep_rep1 = 1
    local rep_rep2 = 1
    while rep_rep1 ~= 0 or rep_rep2 ~= 0 do
        s,rep_rep1 = unicode.utf8.gsub ( s , "(%l%.)(%a%.)(%a)", "%1\\,%2\\,%3" )
        s,rep_rep2 = unicode.utf8.gsub ( s , "(%l%.)(%a)",     "%1\\,%2" )
    end
    return(s)
end
function expand_inline_verb ( s , p )
    local r = ""
    while string.len(s) > 0 do
        local spos,epos = s:find( p.."%A" )
        if spos ~= nil then
            r = r .. replace_abbrs(s:sub(0,spos-1))
            r = r .. s:sub(spos,epos)
            local delim = s:sub(epos,epos)
            s  = s:sub(epos+1 , string.len(s))
            local verb_end = s:find( delim )
            r = r .. s:sub(0,verb_end)
            s = s:sub(verb_end+1 , string.len(s))
        else
            r = r .. replace_abbrs(s)
            break
        end
    end
    return(r)
end
function expandabbr ( s )
    if cur_verbatim_env == nil then
        for i,p in ipairs ( verbatim_env ) do
            if s:find( "\\begin{" .. p .. "}" ) then
                cur_verbatim_env = verbatim_env[i]
                break
            end
        end
    elseif s:find( "\\end{" .. cur_verbatim_env .. "}" ) then
        cur_verbatim_env = nil
    end
    if cur_verbatim_env == nil and cur_verbatim_inl == nil then
        for i,p in ipairs ( verbatim_inl ) do
            pos = s:find( "\\" .. p )
            if pos ~= nil then
                cur_verbatim_inl = s[pos+string.len(p)+1]
                break
            end
        end
    elseif cur_veratim_inl ~= nil then
        if s:find( cur_veratim_inl ) then
            cur_verbatim_inl = nil
        end
    end
    if cur_verbatim_env == nil then
        for i,p in ipairs ( verbatim_inl ) do
            if s:find( p ) then
                return(expand_inline_verb( s , p ))
            end
        end
        s = replace_abbrs ( s )
    end
    return(s)
end
\end{luacode}

%% LaTeX-side code:
\newcommand\ExpandAbbrOn{\directlua{%
  luatexbase.add_to_callback("process_input_buffer", 
  expandabbr, "expand_abbreviations")}}
\newcommand\ExpandAbbrOff{\directlua{%
  luatexbase.remove_from_callback("process_input_buffer", 
  "expand_abbreviations")}}
\AtBeginDocument{\ExpandAbbrOn} % enabled by default

%% Just for this example:
\setlength\parindent{0pt}
\obeylines

\begin{document}
Dies ist u.U. ein Test.
\begin{Verbatim}
Dies ist u.U. ein Test.
\end{Verbatim}

\begin{Verbatim}
\begin{verbatim}
\end{verbatim}
Dies ist u.U. ein schwierigerer Test.
\end{Verbatim}

z.B. u.a. u.Ä. u.ä. u.U. a.a.O. d.h. i.e. 
i.e.S. v.a. z.T. m.E. i.d.F. z.Z. u.v.m.
z.Zt. o.B.d.A. a.d.Gr. n.Chr. Anm.d.Red.
\verb|z.Zt. o.B.d.A. a.d.Gr. n.Chr. Anm.d.Red.|
\begin{verbatim}
z.B. u.a. u.Ä. u.ä. u.U. a.a.O. d.h. i.e. 
i.e.S. v.a. z.T. m.E. i.d.F. z.Z. u.v.m.
\end{verbatim}
U.S.A. U.K.
\ExpandAbbrOff % turn off abbreviation expansion
A tricky URL: \url{u.U.aaaa.z.T.bbb}
\end{document}

How to letter space (German) abbreviations automatically

Tags:

German

Luatex

Letterspacing

Related

Recent Posts