How can I map HTML named entities to LaTeX commands?
In ConTeXt, the char-ent.lua
file contains the list of all HTML entities. You can access them using the characters.entities
table. For example, the following code prints all the entities and their values to screen.
\starttext
\startluacode
local entities = characters.entities
for name, value in next, entities do
print(name,value)
end
\stopluacode
\stoptext
The entities are not translated to the corresponding TeX command, rather they are translated to the corresponding unicode symbol. If you want the corresponding TeX name for the symbol, you can search the characters.data
table (defined in char-def.lua
)
\starttext
\startluacode
local data = characters.data
local function context_name(value)
value = data[value]
if value then
if value.contextname then
return value.contextname
elseif value.mathname then
return value.mathname
elseif value.mathspec then
return value.mathspec[1].name
else
return "not defined"
end
else
return "not defined"
end
end
local entities = characters.entities
for name, value in next, entities do
print(name,value, context_name(value))
end
\stopluacode
\stoptext
The resulting list is here. I don't know if LaTeX follows the same naming convention for all commands.
Just to provide closure on this, my original question was for mapping HTML 4.0 entities to LaTeX. Aditya's response provides the ability to map a far wider range of named character entities.
I've used Aditya's data set to produce the mappings just for HTML 4.0 as below. If it is not on the list, there is no mapping (or I've messed up the data set reduction).
Please see comments -- this table is only of much value in ConTeXt.
HTML 4.0 / LaTeX
Aacute / Aacute
aacute / aacute
Acirc / Acircumflex
acirc / acircumflex
acute / textacute
AElig / AEligature
aelig / aeligature
Agrave / Agrave
agrave / agrave
alefsym / aleph
Alpha / greekAlpha
alpha / greekalpha
and / wedge
ang / angle
Aring / Aring
aring / aring
asymp / approx
Atilde / Atilde
atilde / atilde
Auml / Adiaeresis
auml / adiaeresis
bdquo / quotedblbase
Beta / greekBeta
beta / greekbeta
brvbar / textbrokenbar
bull / textbullet
cap / cap
Ccedil / Ccedilla
ccedil / ccedilla
cedil / textcedilla
cent / textcent
Chi / greekChi
chi / greekchi
circ / textcircumflex
clubs / clubsuit
cong / approxEq
copy / copyright
crarr / carriagereturn
cup / cup
curren / textcurrency
dagger / textdag
Dagger / textddag
darr / downarrow
dArr / Downarrow
deg / textdegree
Delta / greekDelta
delta / greekdelta
diams / blacklozenge
divide / textdiv
Eacute / Eacute
eacute / eacute
Ecirc / Ecircumflex
ecirc / ecircumflex
Egrave / Egrave
egrave / egrave
empty / emptyset
emsp / emspace
ensp / enspace
Epsilon / greekEpsilon
epsilon / greekepsilon
equiv / equiv
Eta / greekEta
eta / greeketa
ETH / Eth
eth / eth
Euml / Ediaeresis
euml / ediaeresis
exist / exists
fnof / fhook
forall / forall
frac12 / onehalf
frac14 / onequarter
frac34 / threequarter
frasl / textfraction
Gamma / greekGamma
gamma / greekgamma
ge / geq
gt / gt
harr / leftrightarrow
hArr / Leftrightarrow
hellip / textellipsis
Iacute / Iacute
iacute / iacute
Icirc / Icircumflex
icirc / icircumflex
iexcl / exclamdown
Igrave / Igrave
igrave / igrave
image / Im
infin / infty
int / intop
Iota / greekIota
iota / greekiota
iquest / questiondown
isin / in
Iuml / Idiaeresis
iuml / idiaeresis
Kappa / greekKappa
kappa / greekkappa
Lambda / greekLambda
lambda / greeklambda
lang / langle
laquo / leftguillemot
larr / leftarrow
lArr / Leftarrow
lceil / lceiling
ldquo / quotedblleft
le / leq
lfloor / lfloor
lowast / ast
loz / lozenge
lsaquo / guilsingleleft
lsquo / quoteleft
lt / lt
macr / textmacron
mdash / emdash
micro / textmu
middot / periodcentered
Mu / greekMu
mu / greekmu
nbsp / nobreakspace
ndash / endash
ne / neq
ni / ni
not / textlognot
notin / nin
nsub / nsubset
Ntilde / Ntilde
ntilde / ntilde
Nu / greekNu
nu / greeknu
Oacute / Oacute
oacute / oacute
Ocirc / Ocircumflex
ocirc / ocircumflex
OElig / OEligature
oelig / oeligature
Ograve / Ograve
ograve / ograve
Omega / greekOmega
omega / greekomega
Omicron / greekOmicron
omicron / greekomicron
oplus / oplus
or / vee
ordf / ordfeminine
ordm / ordmasculine
Oslash / Ostroke
oslash / ostroke
Otilde / Otilde
otilde / otilde
otimes / otimes
Ouml / Odiaeresis
ouml / odiaeresis
para / paragraphmark
part / partial
permil / perthousand
perp / bot
Phi / greekPhi
phi / greekphi
Pi / greekPi
pi / greekpi
piv / greekpialt
plusmn / textpm
pound / textsterling
prime / prime
Prime / doubleprime
prod / prod
prop / propto
Psi / greekPsi
psi / greekpsi
radic / surd
rang / rangle
raquo / rightguillemot
rarr / rightarrow
rArr / Rightarrow
rceil / rceiling
rdquo / quotedblright
real / Re
reg / registered
rfloor / rfloor
Rho / greekRho
rho / greekrho
rsaquo / guilsingleright
rsquo / quoteright
sbquo / quotesinglebase
Scaron / Scaron
scaron / scaron
sdot / cdot
sect / sectionmark
shy / softhyphen
Sigma / greekSigma
sigma / greeksigma
sigmaf / greekfinalsigma
sim / sim
spades / spadesuit
sub / subset
sube / subseteq
sum / sum
sup / supset
sup1 / onesuperior
sup2 / twosuperior
sup3 / threesuperior
supe / supseteq
szlig / ssharp
Tau / greekTau
tau / greektau
there4 / therefore
Theta / greekTheta
theta / greektheta
thetasym / greekthetaalt
thinsp / breakablethinspace
THORN / Thorn
thorn / thorn
tilde / texttilde
times / textmultiply
trade / trademark
Uacute / Uacute
uacute / uacute
uarr / uparrow
uArr / Uparrow
Ucirc / Ucircumflex
ucirc / ucircumflex
Ugrave / Ugrave
ugrave / ugrave
uml / textdiaeresis
Upsilon / greekUpsilon
upsilon / greekupsilon
Uuml / Udiaeresis
uuml / udiaeresis
weierp / wp
Xi / greekXi
xi / greekxi
Yacute / Yacute
yacute / yacute
yen / textyen
yuml / ydiaeresis
Yuml / Ydiaeresis
Zeta / greekZeta
zeta / greekzeta
zwj / zwj
zwnj / zwnj
There's a chart here: ISO character entities and their LATEX equivalents by Vidar Bronken Gundersen and Rune Mathisen. Source material is here: http://www.bitjungle.com/isoent/ - including a big XML file with mappings between various formats.
The same folks have turned that into a Perl program, available here: http://llg.cubic.org/docs/ent2latex.html (from this answer to How to look up a symbol or identify a math symbol or character?.)
And here's another list: http://www.w3.org/Math/characters/unicode.xml, and the same data compiled into python: https://gist.github.com/piquadrat/798549