Most efficient way to find if a string is mixedCase
If you know the character encoding that's going to be used (I've used ISO/IEC 8859-15 in the code example), a look-up table may be the fastest solution. This also allows you to decide which characters from the extended character set, such as µ or ß, you'll count as upper case, lower case or non-alphabetical.
char test_case(const char *s) {
    // One entry per ISO/IEC 8859-15 code point:
    // 0 = not a letter, 1 = upper case, 2 = lower case
    static const char alphabet[256] = {
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, //  ABCDEFGHIJKLMNO
        1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0, // PQRSTUVWXYZ
        0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, //  abcdefghijklmno
        2,2,2,2,2,2,2,2,2,2,2,0,0,0,0,0, // pqrstuvwxyz
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,1,0,2,0,2,0,0,0,0,0, // Š (0xA6), š (0xA8), ª (0xAA)
        0,0,0,0,1,2,0,0,2,0,2,0,1,2,1,0, // Ž µ (0xB4-0xB5), ž (0xB8), º (0xBA), Œ œ Ÿ (0xBC-0xBE)
        1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
        1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,2, // ÐÑÒÓÔÕÖ ØÙÚÛÜÝÞß (ß counted as lower case)
        2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, // àáâãäåæçèéêëìíîï
        2,2,2,2,2,2,2,0,2,2,2,2,2,2,2,2}; // ðñòóôõö øùúûüýþÿ
    char cases = 0;
    // Stop at the terminator, or as soon as both cases have been seen
    while (*s && cases != 3) {
        cases |= alphabet[(unsigned char) *s++];
    }
    return cases; // 0 = none, 1 = upper, 2 = lower, 3 = mixed
}
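As suggested in a comment by chux, you can set the value of alphabet[0] to 4, and then you need only one condition, cases < 3, in the while loop. The terminator then sets the extra bit and ends the loop, so the return value has to be masked with 3 to drop it.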
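A minimal sketch of that sentinel idea follows. The names case_table, init_case_table and test_case_sentinel are hypothetical. To keep the example short, the table is filled at run time and covers only the ASCII letters, which assumes A-Z and a-z are contiguous; a full ISO/IEC 8859-15 table would work the same way.

```c
#include <stddef.h>

/* Hypothetical names, not part of the answer above.
   0 = not a letter, 1 = upper case, 2 = lower case, 4 = string terminator. */
static unsigned char case_table[256];

/* Fill the table for ASCII letters only (assumes contiguous A-Z and a-z). */
static void init_case_table(void) {
    for (int c = 'A'; c <= 'Z'; c++) case_table[c] = 1;
    for (int c = 'a'; c <= 'z'; c++) case_table[c] = 2;
    case_table[0] = 4; /* sentinel: the terminator pushes cases to 4 or more */
}

/* Returns 0 = none, 1 = upper, 2 = lower, 3 = mixed. */
static int test_case_sentinel(const char *s) {
    unsigned cases = 0;
    /* Single condition: the loop ends on mixed case (3) or once the
       terminator has set bit 4. The terminator is read at most once. */
    while (cases < 3) {
        cases |= case_table[(unsigned char) *s++];
    }
    return (int) (cases & 3); /* drop the sentinel bit */
}
```

Call init_case_table() once before the first test_case_sentinel() call. With a static initialiser, as in the original table, that step disappears.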
This should be fairly efficient - it checks the minimum number of characters necessary. This assumes a bias towards lower-case characters, so checking for lower-case first should be slightly more efficient:
#include <ctype.h>

int ismixed( const unsigned char *str )
{
    int hasUpper = 0;
    int hasLower = 0;

    while ( *str )
    {
        // can't be both upper and lower case
        // but it can be neither
        if ( islower( *str ) )
        {
            hasLower = 1;
        }
        else if ( isupper( *str ) )
        {
            hasUpper = 1;
        }

        // return true as soon as we hit
        // both upper and lower case
        if ( hasLower && hasUpper )
        {
            return( 1 );
        }

        str++;
    }

    return( 0 );
}
Depending on whether your input is biased to lower or upper case, checking isupper() first might be better.
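For input that is mostly upper case, the same loop with the two tests swapped might look like the sketch below. The name ismixed_upper_first is hypothetical, and whether the reordering pays off depends on the data and on the C library's ctype implementation, so it is worth measuring.

```c
#include <ctype.h>

/* Hypothetical variant of ismixed() that tests isupper() first. */
int ismixed_upper_first(const unsigned char *str)
{
    int hasUpper = 0;
    int hasLower = 0;

    for (; *str; str++) {
        /* A character is upper case, lower case, or neither. */
        if (isupper(*str)) {
            hasUpper = 1;
        } else if (islower(*str)) {
            hasLower = 1;
        }
        /* Stop as soon as both cases have been seen. */
        if (hasUpper && hasLower) {
            return 1;
        }
    }
    return 0;
}
```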
If we assume ASCII and assume every character is alphabetic, then the code only needs to count the "case" bits. Is the sum 0, the same as the string length, or something in between?
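#include <stddef.h>
#include <stdio.h>

void test_case(const char *s) {
    const char *start = s;
    size_t sum = 0;
    size_t mask = 'A' ^ 'a';  // 0x20 in ASCII: set for lower case, clear for upper case
    while (*s) {
        sum += (unsigned char) *s++ & mask;
    }
    size_t len = (size_t) (s - start);
    sum /= mask;  // number of characters with the case bit set
    if (len == 0) puts("Empty string");
    else if (sum == 0) puts("All UC");
    else if (sum == len) puts("All LC");
    else puts("Mixed");
}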
Note: with slight modifications, this will work for EBCDIC too.
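One possible form of those modifications is sketched below; it is an illustration, not the answerer's code. In ASCII the bit 'A' ^ 'a' is set for lower-case letters, while in EBCDIC (where 'A' is 0xC1 and 'a' is 0x81) it is set for upper-case letters, so the sketch checks which way round the bit works at run time. The names case_kind and classify_alpha are hypothetical, and the all-alphabetic assumption still applies.

```c
#include <stddef.h>

enum case_kind { CASE_EMPTY, CASE_UPPER, CASE_LOWER, CASE_MIXED };

/* Hypothetical helper: classifies a string that contains letters only. */
static enum case_kind classify_alpha(const char *s) {
    const unsigned mask = 'A' ^ 'a';               /* the single "case" bit */
    const int lower_sets_bit = ('a' & mask) != 0;  /* true for ASCII, false for EBCDIC */
    size_t len = 0;
    size_t with_bit = 0;

    /* Count the characters that have the case bit set. */
    for (; s[len] != '\0'; len++) {
        with_bit += ((unsigned char) s[len] & mask) != 0;
    }

    if (len == 0) return CASE_EMPTY;
    if (with_bit == 0) return lower_sets_bit ? CASE_UPPER : CASE_LOWER;
    if (with_bit == len) return lower_sets_bit ? CASE_LOWER : CASE_UPPER;
    return CASE_MIXED;
}
```

Returning an enum instead of printing keeps the classification reusable. Digits, spaces or punctuation in the input would be miscounted, exactly as in the original.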