Detect UTF-8 encoding (How does MS IDE do it)?
If encoding is UTF-8, the first character you see over 0x7F must be the start of a UTF-8 sequence. So test it for that. Here is the code we use for that:
unc ::IsUTF8(unc *cpt)
{
if (!cpt)
return 0;
if ((*cpt & 0xF8) == 0xF0) { // start of 4-byte sequence
if (((*(cpt + 1) & 0xC0) == 0x80)
&& ((*(cpt + 2) & 0xC0) == 0x80)
&& ((*(cpt + 3) & 0xC0) == 0x80))
return 4;
}
else if ((*cpt & 0xF0) == 0xE0) { // start of 3-byte sequence
if (((*(cpt + 1) & 0xC0) == 0x80)
&& ((*(cpt + 2) & 0xC0) == 0x80))
return 3;
}
else if ((*cpt & 0xE0) == 0xC0) { // start of 2-byte sequence
if ((*(cpt + 1) & 0xC0) == 0x80)
return 2;
}
return 0;
}
If you get a return of 0, it is not valid UTF-8. Else skip the number of chars returned and continue checking the next one over 0x7F.