Check if letter is emoji
You could use the emoji4j library. The following should solve the issue (text is the input String, and the emojis found are collected in emojis):
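import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import com.kcthota.emoji.EmojiUtils; // emoji4j

List<String> emojis = new ArrayList<>();
String htmlifiedText = EmojiUtils.htmlify(text);
// regex to identify HTML entities (e.g. "&#128512;") in the htmlified text
Pattern htmlEntityPattern = Pattern.compile("&#\\d+;");
Matcher matcher = htmlEntityPattern.matcher(htmlifiedText);
while (matcher.find()) {
    String emojiCode = matcher.group();
    if (EmojiUtils.isEmoji(emojiCode)) {
        emojis.add(EmojiUtils.getEmoji(emojiCode).getEmoji());
    }
}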
This function I created checks whether the given String consists only of emojis. In other words, if the String contains any character not covered by the regex, it returns false.
private static boolean isEmoji(String message) {
    return message.matches("(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|" +
            "[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|" +
            "[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|" +
            "[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|" +
            "[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|" +
            "[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|" +
            "[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|" +
            "[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|" +
            "[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|" +
            "[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|" +
            "[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)+");
}
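Why the ranges above can be written as surrogate pairs: Java's regex engine reads a high/low surrogate pair in the pattern as a single code point, so a class such as [\uD83D\uDE00-\uD83D\uDE4F] means U+1F600 to U+1F6FF's Emoticons sub-range U+1F600 to U+1F64F, and the trailing + is what makes the match "only emojis". The small sketch below checks that behaviour with just that one range; the class and constant names are illustrative only.

```java
// Illustrative demo: one surrogate-pair range in the same style as isEmoji.
class SurrogateRangeDemo {
    // The regex engine treats each pair as a code point, so this is U+1F600..U+1F64F (Emoticons).
    static final String EMOTICONS_ONLY = "[\uD83D\uDE00-\uD83D\uDE4F]+";

    public static void main(String[] args) {
        System.out.println("\uD83D\uDE03".matches(EMOTICONS_ONLY));             // true: one emoji
        System.out.println("\uD83D\uDE03\uD83D\uDE00".matches(EMOTICONS_ONLY)); // true: only emojis
        System.out.println("hi \uD83D\uDE03".matches(EMOTICONS_ONLY));          // false: extra characters
        System.out.println("\uD83D".matches(EMOTICONS_ONLY));                   // false: lone high surrogate
    }
}
```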
Example usage:
public static int detectEmojis(String message) {
    // only count if the given String consists of emojis only
    if (!isEmoji(message)) {
        return 0;
    }
    int len = message.length();
    int numEmoji = 0;
    for (int i = 0; i < len; i++) {
        if (isEmoji(String.valueOf(message.charAt(i)))) {
            // the char at i is an emoji by itself
            numEmoji++;
        } else if (i < len - 1 && Character.isSurrogatePair(message.charAt(i), message.charAt(i + 1))) {
            // some emojis are two chars long in Java, e.g. the rocket emoji is "\uD83D\uDE80"
            i++; // also skip the second char of the emoji
            numEmoji++;
        }
    }
    return numEmoji;
}
The function above takes a String (of only emojis) and returns the number of emojis in it; if the String contains anything else, it returns 0. (I wrote it with the help of other answers I found here on Stack Overflow.)
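For comparison, here is a sketch of the same counting idea that walks the String by code point with String.codePoints(), so surrogate pairs never have to be paired up by hand. This is not the answer's code: isEmojiCodePoint is a hypothetical, deliberately incomplete predicate covering only a few of the blocks from the regex above, and U+FE0F (the emoji variation selector) is skipped rather than counted.

```java
// Sketch only: a code-point-based counter with a hypothetical, partial emoji test.
class EmojiCounter {
    static final int VARIATION_SELECTOR_16 = 0xFE0F;

    // Hypothetical subset of the ranges used in isEmoji; NOT a complete emoji definition.
    static boolean isEmojiCodePoint(int cp) {
        return (cp >= 0x1F300 && cp <= 0x1F5FF)  // Misc Symbols and Pictographs
            || (cp >= 0x1F600 && cp <= 0x1F64F)  // Emoticons
            || (cp >= 0x1F680 && cp <= 0x1F6FF)  // Transport and Map Symbols
            || (cp >= 0x1F900 && cp <= 0x1F9FF)  // Supplemental Symbols and Pictographs
            || (cp >= 0x2600 && cp <= 0x27BF);   // Misc Symbols and Dingbats
    }

    // Counts emoji code points; like detectEmojis, returns 0 if anything else is present.
    static int countEmojis(String message) {
        int count = 0;
        for (int cp : message.codePoints().toArray()) {
            if (isEmojiCodePoint(cp)) {
                count++;
            } else if (cp != VARIATION_SELECTOR_16) {
                return 0; // not an emoji-only String
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEmojis("\uD83D\uDE80\u2600\uFE0F\uD83D\uDE00")); // 3
        System.out.println(countEmojis("hi\uD83D\uDE80"));                      // 0
    }
}
```

Like detectEmojis, this counts code points rather than what a reader sees as one emoji, so multi-code-point sequences (flags, keycaps, ZWJ sequences) are not treated as a single emoji.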
It seems those emojis are two characters (a surrogate pair) long, but with split("") you are splitting between every single character, so none of the resulting pieces can be the emoji you are looking for.
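If you do want to look at the text one symbol at a time, one option (not from the original answer) is to split by code point instead of by char, so each surrogate pair stays together and every piece could then be tested against your regex. A minimal sketch; splitCodePoints is a hypothetical helper name:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: split a String into one element per Unicode code point.
class CodePointSplit {
    // Surrogate pairs such as the rocket emoji stay together, unlike split("").
    static List<String> splitCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String rocket = "\uD83D\uDE80";
        System.out.println(rocket.length());                            // 2 chars
        System.out.println(rocket.codePointCount(0, rocket.length()));  // 1 code point
        System.out.println(splitCodePoints("a" + rocket + "b").size()); // 3 pieces
    }
}
```

Note that emojis built from several code points (flags, keycaps, emojis followed by U+FE0F) will still be split into more than one piece this way.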
Instead, you could try splitting between words:
for (String word : sentence.split(" ")) {
    if (word.matches(emo_regex)) {
        System.out.println(word);
    }
}
But of course this will miss emojis that are attached to a word or to punctuation.
Alternatively, you could just use a Matcher to find any group in the sentence that matches the regex.
Matcher matcher = Pattern.compile(emo_regex).matcher(sentence);
while (matcher.find()) {
    System.out.println(matcher.group());
}
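A self-contained version of that loop, collecting the matches into a list instead of printing them. The question's emo_regex is not shown, so the pattern here is a hypothetical stand-in that covers only a few emoji blocks (written with Java's \x{...} code point escapes) plus an optional U+FE0F; substitute your own regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: find emojis anywhere in a sentence, even when glued to words or punctuation.
class FindEmojis {
    // Hypothetical stand-in for emo_regex: a few emoji blocks, optionally followed by U+FE0F.
    private static final Pattern EMOJI = Pattern.compile(
            "[\\x{1F300}-\\x{1F5FF}\\x{1F600}-\\x{1F64F}\\x{1F680}-\\x{1F6FF}\\x{2600}-\\x{27BF}]\\x{FE0F}?");

    static List<String> findEmojis(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMOJI.matcher(sentence);
        while (matcher.find()) {
            found.add(matcher.group()); // each match is one emoji (plus its variation selector, if any)
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findEmojis("Launch\uD83D\uDE80 now, sunny\u2600!"));
    }
}
```

Compiling the Pattern once into a static field avoids recompiling the regex on every call, which matters if this runs over many sentences.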