How to do word counts for a mixture of English and Chinese in JavaScript
Try a regex like this:
/[\u00ff-\uffff]|\S+/g
For example, "I am a 香港人".match(/[\u00ff-\uffff]|\S+/g)
gives:
["I", "am", "a", "香", "港", "人"]
The word count is then the length of the resulting array. Note that match returns null, not an empty array, when nothing matches.
The \u00ff-\uffff part of the regex is a Unicode character range, and each character in it is matched as its own word. You probably want to narrow this down to just the characters you want to count that way. For example, the CJK Unified Ideographs block is \u4e00-\u9fff.
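As a rough sketch of that narrowing, the variant below swaps in the CJK Unified Ideographs range \u4e00-\u9fff. The function name countCjkWords is hypothetical and not part of the original answer. One reason to narrow: the broad \u00ff-\uffff range also covers Cyrillic, Greek and other alphabetic scripts, so every letter of a word like "привет" would be counted as a separate word.

```javascript
// Hypothetical helper: counts each CJK Unified ideograph as one word
// and every other run of non-whitespace characters as one word.
function countCjkWords(str) {
  // First alternative: a single ideograph in U+4E00..U+9FFF.
  // Second alternative: a greedy run of non-whitespace characters.
  var matches = str.match(/[\u4e00-\u9fff]|\S+/g);
  // match returns null when there is nothing to match.
  return matches ? matches.length : 0;
}

console.log(countCjkWords("I am a 香港人"));  // 6
console.log(countCjkWords("привет 香港"));    // 3
```

With the original broad range, the second call would count 8, because each of the six Cyrillic letters is matched individually.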
Wrapped in a function:
function countWords(str) {
    var matches = str.match(/[\u00ff-\uffff]|\S+/g);
    // match returns null when nothing matches, so fall back to 0
    return matches ? matches.length : 0;
}
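One caveat worth checking: \S+ is greedy, so when Latin text runs directly into Chinese with no space in between (for example "Hello世界"), the whole run is matched as a single word. A possible workaround, assuming the CJK Unified Ideographs block (\u4e00-\u9fff) is the only set of characters to count individually, is to exclude that range from the second alternative. The name countMixedWords is hypothetical.

```javascript
// Hypothetical helper: like countWords, but a run of non-CJK text
// stops as soon as it reaches a CJK Unified ideograph.
function countMixedWords(str) {
  // First alternative: a single ideograph in U+4E00..U+9FFF.
  // Second alternative: a run of characters that are neither
  // whitespace nor ideographs from that range.
  var matches = str.match(/[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+/g);
  return matches ? matches.length : 0;
}

console.log(countMixedWords("Hello世界"));     // 3
console.log(countMixedWords("I am a 香港人")); // 6
```

Punctuation is still counted as part of, or as, a word under this sketch, so strip it first if that matters for your counts.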