Extract first word in a string
You're almost there; you just need to remove the trailing comma:
\documentclass{article}
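\makeatletter
% append a space so that \@firstword always finds its delimiter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
% make sure the internal names are not already in use
\newcommand\@firstword{}%
\newcommand\@removecomma{}%
% #1 is everything before the first space; append a comma and pass it on
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
% #1 is everything before the first comma, i.e. the word without it
\def\@removecomma#1,#2\@nil{#1}
\makeatother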
\begin{document}
X\FirstWord{John, Paul, George and Ringo}X
X\FirstWord{John}X
X\FirstWord{John and Paul}X
X\FirstWord{{John, Paul}, George and Ringo}X
\end{document}
You can add further steps to remove other trailing delimiters, such as periods and semicolons:
\documentclass{article}
\makeatletter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
% keep everything before the first space
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
% then strip a trailing comma, period and semicolon in turn
\def\@removecomma#1,#2\@nil{\@removeperiod#1.\@nil}
\def\@removeperiod#1.#2\@nil{\@removesemicolon#1;\@nil}
\def\@removesemicolon#1;#2\@nil{#1}
\makeatother
\begin{document}
X\FirstWord{John; Paul; George; Ringo}X
X\FirstWord{John. Paul. George. Ringo}X
X\FirstWord{John}X
X\FirstWord{John and Paul}X
X\FirstWord{{John. Paul}. George. Ringo}X
\end{document}
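Because both versions above are built only from delimited-argument \def's, \FirstWord is expandable, which is exactly the property the l3regex approach below gives up. A minimal check, assuming the first (comma-only) definition, stores the result with \edef and compares it with \ifx; the macro names \first and \expected are hypothetical, chosen only for this test:

```latex
\documentclass{article}
\makeatletter
\newcommand\FirstWord[1]{\@firstword#1 \@nil}%
\def\@firstword#1 #2\@nil{\@removecomma#1,\@nil}%
\def\@removecomma#1,#2\@nil{#1}
\makeatother
% \edef expands its body completely, so \first stores the result "John",
% not the unexpanded call to \FirstWord
\edef\first{\FirstWord{John, Paul, George and Ringo}}
\def\expected{John}
\begin{document}
% \ifx compares the two macro definitions token by token
\ifx\first\expected first word OK\else unexpected result\fi
\end{document}
```

If everything expands as described, the document prints "first word OK". The same test should work with the second (comma, period, semicolon) definition, since it only adds more delimited-argument steps.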
If you don't need expandability, you can use l3regex:
\documentclass{article}
\usepackage{xparse,l3regex}
\ExplSyntaxOn
\NewDocumentCommand{\FirstWord}{m}
{
% split the argument at spaces
\seq_set_split:Nnn \l_tmpa_seq { ~ } { #1 }
% get the first item
\tl_set:Nx \l_tmpa_tl { \seq_item:Nn \l_tmpa_seq { 1 } }
% remove a trailing period, semicolon or comma (\Z matches the end)
\regex_replace_once:nnN { [.;,]\Z } { } \l_tmpa_tl
% output the result
\tl_use:N \l_tmpa_tl
}
\ExplSyntaxOff
\begin{document}
X\FirstWord{John, Paul, George and Ringo}X
X\FirstWord{John; Paul; George; Ringo}X
X\FirstWord{John. Paul. George. Ringo}X
X\FirstWord{John}X
X\FirstWord{John and Paul}X
X\FirstWord{{John, Paul}, George and Ringo}X
X\FirstWord{{John. Paul}. George. Ringo}X
\end{document}
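A variant that extracts the first word directly, instead of splitting at spaces and then stripping punctuation, can be sketched with \regex_extract_once:nnNTF. This assumes a LaTeX kernel from October 2020 or later, where expl3 (including l3regex) and \NewDocumentCommand are built in; \FirstWordRegex is a hypothetical name. As far as I know, \w in l3regex only covers ASCII letters, digits and the underscore, and the regex does nothing special with braced groups, so this is not a drop-in replacement for the version above:

```latex
\documentclass{article}
% assumption: recent LaTeX kernel with expl3 and \NewDocumentCommand built in;
% on older systems, load xparse and l3regex as in the example above
\ExplSyntaxOn
\NewDocumentCommand{\FirstWordRegex}{m}
 {
  % store the first run of word characters in \l_tmpa_seq (item 1 = whole match)
  \regex_extract_once:nnNTF { \w+ } { #1 } \l_tmpa_seq
   { \seq_item:Nn \l_tmpa_seq { 1 } } % found: print the match
   { }                                % no word characters: print nothing
 }
\ExplSyntaxOff
\begin{document}
X\FirstWordRegex{John, Paul, George and Ringo}X
X\FirstWordRegex{John; Paul; George; Ringo}X
X\FirstWordRegex{John}X
\end{document}
```

Since the regex never matches punctuation in the first place, no trailing-delimiter cleanup is needed afterwards.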
Thanks to a comment by Mico, here is the suggested pattern. PS: I am not a pattern-matching expert and do not play one on TV, but the nice thing about LuaLaTeX is that one can employ sophisticated pattern-matching procedures when they are needed.
\documentclass{article}
\usepackage{luacode} % for '\luaexec' and '\luastring' macros
% pattern: a word, optionally followed by a hyphen and a second part (e.g. Marie-Claire)
\newcommand{\FirstWord}[1]{\luaexec{tex.print(string.match(\luastring{#1}, '\%w+\%-?\%w*'))}}
\begin{document}
\def\lst{John, Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}
\def\lst{-John, Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}
\def\lst{Marie-Claire, Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}
\end{document}
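One caveat with string.match: Lua's string library works on bytes, so %w will generally not match the UTF-8 bytes of letters such as ò or ü, and a name like "Nicolò" would most likely come out as "Nicol". A sketch of the same macro using LuaTeX's unicode.utf8.match (the function the later answer below relies on), with an `or ''` fallback added as a precaution so that nothing is printed when there is no match:

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode} % for '\luaexec' and '\luastring' macros
% same pattern as above, but matched with LuaTeX's UTF-8-aware unicode.utf8.match;
% the "or ''" fallback prints an empty string if nothing matches
\newcommand{\FirstWord}[1]{\luaexec{tex.print(unicode.utf8.match(\luastring{#1}, '\%w+\%-?\%w*') or '')}}
\begin{document}
\def\lst{Nicolò, Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}
\end{document}
```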
Earlier version
LuaLaTeX solution
Updated with another variation of the call, just for illustration.
\documentclass{article}
\usepackage{luacode}
\newcommand{\FirstWord}[1]{\luaexec{tex.print(string.match('#1', '([^,]+)'))}}
\begin{document}
\def\lst{John, Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}
\end{document}
The above does not handle special cases such as {{John, Paul}, George and Ringo}; it will still return John for that input.
Original answer
\documentclass{article}
\usepackage{luacode}
\begin{luacode*}
function FirstWord(arg)
tex.print(string.match(arg, '([^,]+)'))
end
\end{luacode*}
\newcommand{\FirstWord}[1]{\directlua{FirstWord("#1")}}
\begin{document}
\def\lst{John, Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}
\end{document}
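Passing the text as "#1" breaks as soon as it contains a double quote, because the result is no longer a valid Lua string literal. A sketch of the same approach that instead uses \luastring from the luacode package (as the answer below does) to hand Lua a properly escaped string; the test string with quotes is made up for illustration:

```latex
\documentclass{article}
\usepackage{luacode} % for 'luacode*' environment and '\luastring' macro
\begin{luacode*}
function FirstWord(arg)
  -- everything up to (not including) the first comma
  tex.print(string.match(arg, '([^,]+)'))
end
\end{luacode*}
% \luastring expands its argument and turns it into an escaped, quoted Lua string
\newcommand{\FirstWord}[1]{\directlua{FirstWord(\luastring{#1})}}
\begin{document}
\def\lst{"John", Paul, George and Ringo}
\textbf{\FirstWord{\lst}} is the first word in \{\lst\}
\end{document}
```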
Admittedly a bit late to the game, but here's a second LuaLaTeX-based solution, which generalizes the earlier answer by @Nasser. This answer's pattern search algorithm satisfies the following criteria:
If the string to be searched starts with a substring delimited by matching curly braces, the contents of that substring (without the braces) are returned.
Otherwise, the first word is returned. Here, a "word" is taken to be either a run of alphanumeric characters, e.g., "John" or "Nicolò", or a hyphenated pair of such runs, e.g., "Kröller-Müller" and "Rhys-Davies". (Put differently, a hyphenated word is taken to be two single words joined by exactly one instance of -. Because the pattern is %w+%-?%w+, an unhyphenated word must contain at least two characters, so a one-letter word such as "I" is skipped.) Any non-alphanumeric characters that precede the "word" in the full string are automatically discarded. The Lua code is Unicode-aware, i.e., the words may contain non-ASCII letters such as ö, ü, and ò.
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode} % for 'luacode' environment and '\luastring' macro
%% Lua-side code: A Lua function that does most of the work
\begin{luacode}
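function fw ( s )
  local first
  if string.find ( s , '^%b{}' ) then
    -- leading brace group: return its contents without the braces
    first = string.sub ( string.match ( s , '%b{}' ), 2, -2 )
  else
    -- first (possibly hyphenated) word, Unicode-aware
    first = unicode.utf8.match ( s , '%w+%-?%w+' )
  end
  tex.sprint ( first or '' )
end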
\end{luacode}
%% TeX-side code: A macro that invokes the Lua function
\newcommand{\FW}[1]{\directlua{fw(\luastring{#1})}}
\begin{document}
\def\lst{{John and Paul} but not George or Ringo}
\FW{\lst}
\def\lst{'{Bay- Day} Hay}
\FW{\lst}
\def\lst{Kröller-Müller and Schwassmann-Wassmann}
\FW{\lst}
\end{document}
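To exercise the criteria listed above, here is a sketch of a small test document. The fw function is the one from this answer, with first made local and a fallback to an empty string when nothing matches (both are my additions, not part of the original); the three test strings are made up for illustration, and the comments give the results I would expect from the pattern %w+%-?%w+:

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode} % for 'luacode' environment and '\luastring' macro
\begin{luacode}
function fw ( s )
  local first
  if string.find ( s , '^%b{}' ) then
    -- leading brace group: return its contents without the braces
    first = string.sub ( string.match ( s , '%b{}' ), 2, -2 )
  else
    -- first word of at least two characters, optionally hyphenated
    first = unicode.utf8.match ( s , '%w+%-?%w+' )
  end
  tex.sprint ( first or '' ) -- print nothing if there is no match
end
\end{luacode}
\newcommand{\FW}[1]{\directlua{fw(\luastring{#1})}}
\begin{document}
\FW{-- Nicolò and friends}\par        % expected: Nicolò (leading dashes skipped)
\FW{Rhys-Davies, Smith and Jones}\par % expected: Rhys-Davies (hyphenated pair kept)
\FW{I met Ringo}                      % expected: met (one-letter "I" skipped)
\end{document}
```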