Unicode-aware string functions?
See if this helps:
Needs["JLink`"];
ClearAll[toUpperCase];
toUpperCase[s_String] :=
JavaBlock[JavaNew["java.lang.String", s]@toUpperCase[]];
I had started working on a homegrown solution to this issue that downloads the Unicode data directly from the source. I'll post it here, as it could be extended to other functions where Java might not come and save the day!
unicodeData = StringSplit[#, ";"] & /@
StringSplit[Import["ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt"], "\n"];
upperCaseData = (FromDigits[#, 16] & /@ # &) /@
Select[unicodeData, (Length[#] > 12) && (StringLength[#[[13]]] > 0) &][[;; , {1, 13}]];
unichar[s_] := FromCharacterCode[FromDigits[s, 16], "Unicode"];
upperCaseChar[i_] := Module[{r},
r = Select[upperCaseData, #[[1]] == i &];
If[Length[r] > 0, FromCharacterCode[r[[1, 2]]], FromCharacterCode[i]]
]
upperCase[s_] := StringJoin[upperCaseChar /@ ToCharacterCode[s, "Unicode"]];
Which works:
In[127]:= upperCase["foéàçÿœÆijķnjđӽծÿ"]
Out[127]= "FOÉÀÇŸŒÆIJĶNJĐӼԾŸ"
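As a rough sketch of how the same idea might be extended to other functions such as lowercasing, the parsed data can be turned into a lookup table instead of being searched with Select for every character. The names caseTable and mapCase are hypothetical helpers, not built-ins. The sketch assumes version 10 or later (for Association) and relies on the UnicodeData.txt layout, where field 13 holds the simple uppercase mapping and field 14 the simple lowercase mapping. A few sample lines are inlined so that it runs without a download; in practice unicodeData from above would take the place of sampleData.

```wolfram
(* A few lines in UnicodeData.txt format, inlined so the sketch runs offline. *)
sampleLines = {
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
  "00C9;LATIN CAPITAL LETTER E WITH ACUTE;Lu;0;L;0045 0301;;;;N;LATIN CAPITAL LETTER E ACUTE;;;00E9;",
  "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9"
};
sampleData = StringSplit[#, ";"] & /@ sampleLines;

(* Hypothetical helper: association from code point to mapped code point,
   read from column `field` (13 = simple uppercase, 14 = simple lowercase).
   Rows that are too short or have an empty field are skipped. *)
caseTable[data_List, field_Integer] := Association[
  (FromDigits[#[[1]], 16] -> FromDigits[#[[field]], 16]) & /@
   Select[data, Length[#] >= field && StringLength[#[[field]]] > 0 &]];

(* Hypothetical helper: map each code point through the table,
   leaving characters without an entry unchanged. *)
mapCase[table_Association, s_String] :=
  FromCharacterCode[Lookup[table, #, #] & /@ ToCharacterCode[s]];

upperTable = caseTable[sampleData, 13];
lowerTable = caseTable[sampleData, 14];

mapCase[upperTable, "aé!"]
mapCase[lowerTable, "AÉ!"]
```

With the sample data, the two calls should give "AÉ!" and "aé!". Only the one-to-one mappings in UnicodeData.txt are covered this way; multi-character cases such as ß to SS are listed in a separate file (SpecialCasing.txt) and would need extra handling.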
It's on my list of things to tackle as part of the Incremental Language Development project.
In the meantime, there is a fairly fast built-in solution:
Needs["MachineLearning`"];
ToUpperCaseUnicode[{"éàÇœßþσς", "ijķnjđӽծ", "ÿ"}]
{"ÉÀÇŒSSÞΣΣ", "IJĶNJĐӼԾ", "Ÿ"}
(I'm not sure how to copy-paste the sigma, etc., without it turning into \σ.)