What language is this word?
Retina, 51 bytes
.*[aeio]$
1
A`en$|ch|ei|au
^$
2
A`[jkz]|gy|m$
\D+
4
I came up with the regexes and @MartinBüttner did the conversion to/golfing in Retina so ... hurray for team effort?
The mapping is 1 -> Italian, 2 -> German, (empty) -> Hungarian, 4 -> English, with the number of words classified in each category being 4506 + 1852 + 2092 + 3560 = 12010.
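To make the mapping concrete, here is a small sketch that turns the program's raw output into a language name and re-adds the quoted counts. The names `LANGUAGE_BY_OUTPUT` and `language_name` are made up for illustration and are not part of the submission.

```python
# Output-to-language mapping as stated in the answer.
# The dictionary and function names are hypothetical helpers.
LANGUAGE_BY_OUTPUT = {
    "1": "Italian",
    "2": "German",
    "": "Hungarian",   # the empty output stands for Hungarian
    "4": "English",
}

def language_name(output: str) -> str:
    """Translate the raw program output into a language name."""
    return LANGUAGE_BY_OUTPUT[output]

# Per-category counts quoted in the answer, in the order they are given.
counts = [4506, 1852, 2092, 3560]
total = sum(counts)

print(language_name(""), total)
```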
Try it online! | Modified multiline version
Explanation
First off, the equivalent Python is something like this:
    import re

    def f(s):
        if re.search("[aeio]$", s):       # Italian
            return 1
        if re.search("en$|ch|ei|au", s):  # German
            return 2
        if re.search("[jkz]|gy|m$", s):   # Hungarian
            return ""
        return 4                          # English
Let me just say that o$ is an excellent indicator of Italian.
The Retina version is similar, with pairs of lines forming replacement stages. For example, the first two lines
.*[aeio]$
1
replace matches of the first line with the contents of the second. The leading .* makes the match cover the whole word, so the entire line becomes 1.
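As a rough Python analogue of this replacement stage (a sketch only, not how Retina is implemented), `re.sub` with the same regex swaps the whole word for `1`. The function name `replace_stage` and the sample words are assumptions for illustration.

```python
import re

def replace_stage(line: str) -> str:
    # Analogue of the stage `.*[aeio]$` -> `1`:
    # a line ending in a, e, i or o is replaced wholesale by "1";
    # any other line passes through unchanged.
    return re.sub(r".*[aeio]$", "1", line)

print(replace_stage("pizza"))  # ends in "a", so the whole word is replaced
print(replace_stage("hund"))   # no match, left as-is
```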
The next three lines do the same, but using Retina's anti-grep mode: anti-grep (specified with A`) removes the line if it matches the given regex, and the following two lines are a replacement from an empty line to the desired output.
A`en$|ch|ei|au
^$
2
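The anti-grep step followed by the empty-line replacement can be modelled in Python roughly as below. This is a sketch assuming a single-word input, where dropping the line leaves an empty string; `anti_grep` and `german_stage` are hypothetical names.

```python
import re

def anti_grep(line: str, pattern: str) -> str:
    # Model of A`: a line matching the pattern is dropped,
    # which for a one-line input leaves the empty string.
    return "" if re.search(pattern, line) else line

def german_stage(line: str) -> str:
    line = anti_grep(line, r"en$|ch|ei|au")
    # Model of the `^$` -> `2` replacement: only an empty line becomes "2".
    return re.sub(r"^$", "2", line)

print(german_stage("machen"))  # matches "ch" and "en$"
print(german_stage("1"))       # an earlier result is left untouched
```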
The following line uses anti-grep again, but doesn't replace the empty line, giving the fixed output for Hungarian.
A`[jkz]|gy|m$
Finally, the last two lines
\D+
4
replace a non-empty non-digit line with 4. Each substitution can only happen if no previous one fired: an earlier stage leaves behind either a digit or an empty line, which none of the later regexes can match. This simulates an if/else if chain.
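One way to convince yourself of the if/else if behaviour is to model every stage in Python and compare the result with the function `f` from the explanation. This is a sketch under the assumption of a single-word input; `retina_like` and the sample words are made up for illustration, and the real Retina program is not being executed here.

```python
import re

def f(s):
    # The Python version given in the explanation.
    if re.search("[aeio]$", s):
        return 1
    if re.search("en$|ch|ei|au", s):
        return 2
    if re.search("[jkz]|gy|m$", s):
        return ""
    return 4

def retina_like(s: str) -> str:
    # Stage 1: replace a word ending in a/e/i/o with "1".
    s = re.sub(r".*[aeio]$", "1", s)
    # Stage 2: anti-grep, then turn the resulting empty line into "2".
    if re.search(r"en$|ch|ei|au", s):
        s = ""
    s = re.sub(r"^$", "2", s)
    # Stage 3: anti-grep only; the empty line is the Hungarian output.
    if re.search(r"[jkz]|gy|m$", s):
        s = ""
    # Stage 4: anything still made of non-digits becomes "4".
    return re.sub(r"\D+", "4", s)

# Illustrative sample words, not taken from the challenge's word lists.
for word in ["gatto", "machen", "magyar", "word"]:
    print(word, repr(retina_like(word)), str(f(word)) == retina_like(word))
```

Every stage runs unconditionally, yet only one of them can change the word, because the digits and the empty line produced by earlier stages are invisible to the later regexes.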