Highlight every occurrence of a list of words?
Solution using LuaTeX callbacks. The library luacolor.lua from luacolor is also used.
First, the package luahighlight.sty:
\ProvidesPackage{luahighlight}
%\RequirePackage{luacolor}
\@ifpackageloaded{xcolor}{}{\RequirePackage{xcolor}}
\RequirePackage{luatexbase}
\RequirePackage{luacode}
\newluatexattribute\luahighlight
\begin{luacode*}
highlight = require "highlight"
luatexbase.add_to_callback("pre_linebreak_filter", highlight.callback, "highlight")
\end{luacode*}
\newcommand\highlight[2][red]{
\bgroup
\color{#1}
\luaexec{highlight.add_word("\luatexluaescapestring{\current@color}","\luatexluaescapestring{#2}")}
\egroup
}
% save default document color
\luaexec{highlight.default_color("\luatexluaescapestring{\current@color}")}
% stolen from luacolor.sty
\def\luacolorProcessBox#1{%
\luaexec{%
oberdiek.luacolor.process(\number#1)%
}%
}
% process a page box
\RequirePackage{atbegshi}[2011/01/30]
\AtBeginShipout{%
\luacolorProcessBox\AtBeginShipoutBox
}
\endinput
The package provides the command \highlight with one required and one optional parameter: the required parameter is the word to highlight, the optional one is its color (red by default). In the pre_linebreak_filter callback, words are collected and, when one matches, color information is inserted.
The Lua module, highlight.lua:
local M = {}
require "luacolor"
local words = {}
local chars = {}
-- get attribute allocation number and register it in luacolor
local attribute = luatexbase.attributes.luahighlight
-- local attribute = oberdiek.luacolor.getattribute
oberdiek.luacolor.setattribute(attribute)
-- make local version of luacolor.get
local get_color = oberdiek.luacolor.getvalue
-- we must save default color
local default_color
function M.default_color(color)
default_color = get_color(color)
end
local utflower = unicode.utf8.lower
function M.add_word(color,w)
local w = utflower(w)
words[w] = color
end
local utfchar = unicode.utf8.char
-- we don't want to include punctuation
local stop = {}
for _, x in ipairs {".",",","!","“","”","?"} do stop[x] = true end
function M.callback(head)
local curr_text = {}
local curr_nodes = {}
for n in node.traverse(head) do
if n.id == node.id("glyph") then -- glyph node; the id was 37 in older LuaTeX
local char = utfchar(n.char)
-- exclude punctuation
if not stop[char] then
local lchar = chars[char] or utflower(char)
chars[char] = lchar
curr_text[#curr_text+1] = lchar
curr_nodes[#curr_nodes+1] = n
end
-- set default color
local current_color = node.has_attribute(n,attribute) or default_color
node.set_attribute(n, attribute,current_color)
elseif n.id == node.id("glue") then -- glue node; the id was 10 in older LuaTeX
local word = table.concat(curr_text)
curr_text = {}
local color = words[word]
if color then
print(word)
local colornumber = get_color(color)
for _, x in ipairs(curr_nodes) do
node.set_attribute(x,attribute,colornumber)
end
end
curr_nodes = {}
end
end
return head
end
return M
We use the pre_linebreak_filter callback to traverse the node list. We collect the glyph nodes (id 37 at the time of writing; node ids differ between LuaTeX versions, and node.id("glyph") looks the number up by name) in a table, and when we find a glue node (id 10 at the time of writing, mainly spaces), we construct a word from the collected glyphs. Some characters, such as punctuation, are excluded from the word. All characters are lowercased, so words at the beginning of sentences are detected as well.
When a word is matched, we set the attribute field of its glyphs to the value under which the corresponding color is saved in the luacolor library. Attributes are a new concept in LuaTeX: they store information in nodes, which can be processed later. That is what happens here: at shipout time, all pages are processed by the luacolor library and nodes are colored according to their luahighlight attribute.
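To see how an attribute travels with the nodes, here is a minimal sketch that is independent of the package above. It assumes a LaTeX format recent enough to provide \newattribute and \setattribute under LuaTeX; the attribute name \demoattr and the function name report are hypothetical and only serve this illustration. The callback writes one line to the log for every glyph that carries a value in the attribute.

```latex
% Compile with lualatex. A sketch only; \demoattr and report are made-up names.
\documentclass{article}
\usepackage{luacode}
\newattribute\demoattr   % allocate an attribute register (LuaTeX only)
\begin{luacode*}
-- Look up the attribute number allocated above and the glyph node id by name.
local attr = luatexbase.attributes.demoattr
local glyph_id = node.id("glyph")

-- Report every top-level glyph that carries a value in our attribute.
local function report(head)
  for n in node.traverse_id(glyph_id, head) do
    local value = node.has_attribute(n, attr) -- nil when the attribute is unset
    if value then
      texio.write_nl("glyph " .. n.char .. " has attribute value " .. value)
    end
  end
  return head
end

luatexbase.add_to_callback("pre_linebreak_filter", report, "demo attribute report")
\end{luacode*}
\begin{document}
% Only the glyphs typeset inside the group receive the value 7.
plain {\setattribute\demoattr{7}marked} plain
\end{document}
```

The glyphs of the two occurrences of "plain" are skipped because the attribute is unset there. The package above relies on the same mechanism, except that the stored value is a color number that luacolor interprets at shipout.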
\documentclass{article}
\usepackage[pdftex]{xcolor}
\usepackage{luahighlight}
\usepackage{lipsum}
\highlight[red]{Lorem}
\highlight[green]{dolor}
\highlight[orange]{world}
\highlight[blue]{Curabitur}
\highlight[brown]{elit}
\begin{document}
\def\world{earth}
\section{Hello world}
Hello world, world? world! \textcolor{purple}{but normal colors works} too\footnote{And also footnotes, for instance. World WORLD wOrld}. Hello \world.
\lipsum[1-12]
\end{document}
Here's another with l3regex.
\documentclass{scrartcl}
\usepackage{xcolor,xparse,l3regex}
\ExplSyntaxOn
\NewDocumentCommand \texthighlight { +m } { \david_texthighlight:n { #1 } }
\cs_new_protected:Npn \david_texthighlight:n #1
{
\group_begin:
\tl_set:Nn \l_tmpa_tl { #1 }
\seq_map_inline:Nn \g_david_highlight_colors_seq
{
\clist_map_inline:cn { g_david_highlight_##1_clist }
{
\regex_replace_all:nnN { (\W)####1(\W) }
{ \1\c{textcolor}\cB\{##1\cE\}\cB\{####1\cE\}\2 } \l_tmpa_tl
}
}
\tl_use:N \l_tmpa_tl
\group_end:
}
\seq_new:N \g_david_highlight_colors_seq
\NewDocumentCommand \addhighlighting { O{red} m }
{
\seq_if_in:NnF \g_david_highlight_colors_seq { #1 }
{ \seq_gput_right:Nn \g_david_highlight_colors_seq { #1 } }
\clist_if_exist:cF { g_david_highlight_#1_clist }
{ \clist_new:c { g_david_highlight_#1_clist } }
\clist_gput_right:cn { g_david_highlight_#1_clist } { #2 }
}
\ExplSyntaxOff
\addhighlighting{amet,Mauris,ut,et,leo}
\addhighlighting[blue]{Phasellus,vestibulum}
\begin{document}
\texthighlight{Lorem ipsum dolor foo sit amet, bar consectetuer adipiscing
elit. Ut purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, vulputate a, magna. Donec vehicula augue eu
neque. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus
rhoncus sem. Nulla et lectus foo vestibulum urna fringilla ultrices.
Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien
est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem
vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla,
malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper
nulla. Donec varius orci eget risus. Duis nibh mi, congue eu,
accumsan eleifend, bar sagittis quis, diam. Duis eget orci sit amet orci
dignissim rutrum.
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur
dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, foo vulputate a, magna. Donec vehicula augue eu
neque. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus
rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices.
Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien
est, iaculis in, pretium quis, viverra ac, bar nunc. Praesent eget sem
vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla,
malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper
nulla. Donec varius orci eget risus. Duis nibh mi, congue eu,
accumsan eleifend, sagittis quis, diam. Duis eget orci sit amet orci
dignissim rutrum.}
\end{document}
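The core of \david_texthighlight:n is the \regex_replace_all:nnN call. As a minimal sketch, assuming a recent LaTeX release in which the regex module is part of expl3 (so l3regex does not need to be loaded), the same replacement can be tried on a short string with one fixed word and one fixed color. The names \l_demo_text_tl and \showdemo are hypothetical and only serve this illustration.

```latex
% A sketch only; \l_demo_text_tl and \showdemo are made-up names.
\documentclass{article}
\usepackage{expl3}   % harmless on recent kernels, needed on older ones
\usepackage{xcolor}
\ExplSyntaxOn
\tl_new:N \l_demo_text_tl
% Inside \ExplSyntaxOn, ~ stands for an ordinary space.
\tl_set:Nn \l_demo_text_tl { a~foo~b~(foo)~foobar }
% (\W) captures one non-word character on each side of the word.
% \c{textcolor} inserts the control sequence \textcolor, and
% \cB\{ and \cE\} insert real begin-group and end-group braces.
\regex_replace_all:nnN { (\W)foo(\W) }
  { \1\c{textcolor}\cB\{red\cE\}\cB\{foo\cE\}\2 } \l_demo_text_tl
\newcommand \showdemo { \tl_use:N \l_demo_text_tl }
\ExplSyntaxOff
\begin{document}
\showdemo
\end{document}
```

Because the pattern requires a non-word character on both sides, foobar is left alone. For the same reason, a word at the very start or end of the token list has no neighboring character for (\W) to match and would not be colored.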
Strongly based on my answer at How to insert a symbol to the beginning of a line for which a word appears?. However, I had to extend the logic to handle multiple color assignments. The syntax is multiple invocations of \WordsToNote{space separated list}{color}, followed by \NoteWords{multiple paragraph input}. Macros in the input are limited to style (e.g., \textit) and size (e.g., \small) changes. Otherwise, only plain text is accepted.
As detailed in the referenced answer, I adapt my titlecaps package, which normally capitalizes the first letter of each word in its argument, with a user-specified list of exceptions. Here, instead of capitalizing the words, I leave them intact. However, I trap the user-specified word exceptions and use them to set a different color.
In this extension of that method, I had to revise two titlecaps macros: \titlecap and \seek@lcwords.
The method cannot handle word subsets, but it can ignore punctuation.
EDITED to fix a bug when a flagged word appears with punctuation, and an issue with the first word of paragraphs.
\documentclass{article}
\usepackage{titlecaps}
\makeatletter
\renewcommand\titlecap[2][P]{%
\digest@sizes%
\if T\converttilde\def~{ }\fi%
\redefine@tertius%
\get@argsC{#2}%
\seek@lcwords{#1}%
\if P#1%
\redefine@primus%
\get@argsC{#2}%
\protected@edef\primus@argi{\argi}%
\else%
\fi%
\setcounter{word@count}{0}%
\redefine@secundus%
\def\@thestring{}%
\get@argsC{#2}%
\if P#1\protected@edef\argi{\primus@argi}\fi%
\whiledo{\value{word@count} < \narg}{%
\addtocounter{word@count}{1}%
\if F\csname found@word\roman{word@count}\endcsname%
\notitle@word{\csname arg\roman{word@count}\endcsname}%
\expandafter\protected@edef\csname%
arg\roman{word@count}\endcsname{\@thestring}%
\else
\notitle@word{\csname arg\roman{word@count}\endcsname}%
\expandafter\protected@edef\csname%
arg\roman{word@count}\endcsname{\color{%
\csname color\romannumeral\value{word@count}\endcsname}%
\@thestring\color{black}{}}%
\fi%
}%
\def\@thestring{}%
\setcounter{word@count}{0}%
\whiledo{\value{word@count} < \narg}{%
\addtocounter{word@count}{1}%
\ifthenelse{\value{word@count} = 1}%
{}{\add@space}%
\protected@edef\@thestring{\@thestring%
\csname arg\roman{word@count}\endcsname}%
}%
\let~\SaveHardspace%
\@thestring%
\restore@sizes%
\un@define}
% SEARCH TERTIUS CONVERTED ARGUMENT FOR LOWERCASE WORDS, SET FLAG
% FOR EACH WORD (T = FOUND IN LIST, F= NOT FOUND IN LIST)
\renewcommand\seek@lcwords[1]{%
\kill@punct%
\setcounter{word@count}{0}%
\whiledo{\value{word@count} < \narg}{%
\addtocounter{word@count}{1}%
\protected@edef\current@word{%
\csname arg\romannumeral\value{word@count}\endcsname}%
\def\found@word{F}%
\setcounter{lcword@index}{0}%
\expandafter\def\csname%
found@word\romannumeral\value{word@count}\endcsname{F}%
\whiledo{\value{lcword@index} < \value{lc@words}}{%
\addtocounter{lcword@index}{1}%
\protected@edef\current@lcword{%
\csname lcword\romannumeral\value{lcword@index}\endcsname}%
%% THE FOLLOWING THREE LINES ARE FROM DAVID CARLISLE
\protected@edef\tmp{\noexpand\scantokens{\def\noexpand\tmp%
{\noexpand\ifthenelse{\noexpand\equal{\current@word}{\current@lcword}}}}}%
\tmp\ifhmode\unskip\fi\tmp
%%
{\expandafter\def\csname%
found@word\romannumeral\value{word@count}\endcsname{T}%
\expandafter\protected@edef\csname color\romannumeral\value{word@count}\endcsname{%
\csname CoLoR\csname lcword\romannumeral\value{lcword@index}\endcsname\endcsname}%
\setcounter{lcword@index}{\value{lc@words}}%
}%
{}%
}%
}%
\if P#1\def\found@wordi{F}\fi%
\restore@punct%
}
\makeatother
\usepackage{xcolor}
\newcommand\WordsToNote[2]{\Addlcwords{#1}\edef\assignedcolor{#2}%
\assigncolor#1 \relax\relax}
\def\assigncolor#1 #2\relax{%
\expandafter\edef\csname CoLoR#1\endcsname{\assignedcolor}%
\ifx\relax#2\else\assigncolor#2\relax\fi%
}
\newcommand\NoteWords[1]{\NoteWordsHelp#1\par\relax}
\long\def\NoteWordsHelp#1\par#2\relax{%
\titlecap[p]{#1}%
\ifx\relax#2\else\par\NoteWordsHelp#2\relax\fi%
}
\begin{document}
\WordsToNote{foo bar at}{red}
\WordsToNote{Nulla dolor nulla}{cyan}
\WordsToNote{amet est et}{orange}
\WordsToNote{Lorem Ut ut felis}{green}
\NoteWords{
\textbf{Lorem ipsum dolor foo sit amet, bar consectetuer adipiscing elit}. Ut
purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur
dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, vulputate a, magna. Donec vehicula augue eu
neque. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus
rhoncus sem. \textit{Nulla et lectus foo} vestibulum urna fringilla ultrices.
Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien
est, iaculis in, pretium quis, viverra ac, nunc. Praesent eget sem
vel leo ultrices bibendum. \scshape Aenean faucibus. Morbi dolor nulla,
malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper
nulla. Donec varius orci eget risus. \upshape Duis nibh mi, congue eu,
accumsan eleifend, bar sagittis quis, diam. Duis eget orci sit amet orci
dignissim rutrum.
\textsf{Lorem ipsum dolor sit amet}, consectetuer adipiscing elit. Ut
purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis. Curabitur
dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, foo vulputate a, magna. Donec vehicula augue eu
neque. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Mauris ut leo. Cras viverra metus
rhoncus sem. Nulla et lectus vestibulum urna fringilla ultrices.
Phasellus eu tellus sit amet tortor gravida placerat. Integer sapien
est, iaculis in, pretium quis, viverra ac, bar nunc. Praesent eget sem
vel leo ultrices bibendum. Aenean faucibus. Morbi dolor nulla,
malesuada eu, pulvinar at, mollis ac, nulla. Curabitur auctor semper
nulla. Donec varius orci eget risus. Duis nibh mi, congue eu,
accumsan eleifend, sagittis quis, diam. \Large Duis eget orci sit amet orci
dignissim rutrum.\normalsize
}
\end{document}