Case sensitivity in square-bracket globbing
Bash 4.3 and later have a shopt option called globasciiranges.
According to the shopt builtin section of the GNU Bash manual:
globasciiranges
If set, range expressions used in pattern matching bracket expressions (see Pattern Matching) behave as if in the traditional C locale when performing comparisons. That is, the current locale’s collating sequence is not taken into account, so ‘b’ will not collate between ‘A’ and ‘B’, and upper-case and lower-case ASCII characters will collate together.
As a result, you can enable the option and match only names that start with an uppercase ASCII letter:
$ shopt -s globasciiranges
$ echo [A-Z]*
Use shopt -u globasciiranges to disable the option again.
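To see the effect in isolation, here is a minimal sketch that works in a scratch directory. The directory and the file names (Alpha, Bravo, charlie, delta) are made up for the demonstration, and it assumes bash 4.3 or later so that the option exists.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical file names, so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch Alpha Bravo charlie delta

shopt -s globasciiranges   # bracket ranges now compare as in the C locale
matches=( [A-Z]* )         # only names starting with an uppercase ASCII letter
printf '%s\n' "${matches[@]}"
shopt -u globasciiranges   # back to locale-aware ranges

# Leave the scratch directory and clean up.
cd - >/dev/null && rm -r "$demo_dir"
```

With the option set, charlie and delta should be left out of the result whatever the current locale is.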
Another way is to change locale to C. You can do this temporarily using a subshell:
$ ( LC_ALL=C ; printf '%s\n' [A-Z]*; )
You will get the results you need, and when the subshell exits, the locale of your main shell is left exactly as it was before.
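The following sketch checks both claims: the glob inside the subshell sees the C locale, and the assignment does not leak out. It uses a command substitution, which is itself a subshell, so that the result can be captured. The scratch directory and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical file names.
demo_dir=$(mktemp -d)
touch "$demo_dir"/Alpha "$demo_dir"/Bravo "$demo_dir"/charlie

before=${LC_ALL-unset}     # remember the parent shell's setting

# $( ... ) runs in a subshell: the cd and the LC_ALL assignment stay inside it.
upper_only=$(cd "$demo_dir" && LC_ALL=C && printf '%s\n' [A-Z]*)

after=${LC_ALL-unset}      # should be identical to $before

printf '%s\n' "$upper_only"
rm -r "$demo_dir"
```

Comparing $before and $after afterwards is a quick way to confirm that the parent shell's locale was not modified.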
Another alternative is to use the brace expansion {A..Z} instead of [A-Z], together with the nullglob shopt option.
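When nullglob is enabled, a pattern that matches nothing during pathname expansion expands to a null string instead of being left as the pattern itself. This matters here because {A..Z}* expands to 26 separate patterns (A*, B*, and so on up to Z*), most of which will usually match nothing.
As a result, this one will work as expected: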
$ shopt -s nullglob;printf '%s\n' {A..Z}*
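A small sketch of why nullglob is needed with this approach, again in a scratch directory with made-up file names. It counts the words produced by {A..Z}* with and without the option.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch Alpha Bravo charlie

shopt -s nullglob
with_nullglob=( {A..Z}* )      # A* B* ... Z*; patterns without a match vanish
shopt -u nullglob
without_nullglob=( {A..Z}* )   # patterns without a match stay as literal text

echo "${#with_nullglob[@]}"      # 2: Alpha and Bravo
echo "${#without_nullglob[@]}"   # 26: Alpha, Bravo, plus the literals C* to Z*

cd - >/dev/null && rm -r "$demo_dir"
```

Note that the results come out grouped by initial letter (everything matching A*, then B*, and so on) rather than as one sorted list.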
You can also write out all the uppercase letters explicitly:
[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*
or you can use the named character class [:upper:] to represent all uppercase letters in your current locale:
[[:upper:]]*
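As a quick check, the sketch below splits a few hypothetical file names by case using the named classes. It relies only on the standard POSIX classes [:upper:] and [:lower:], so it should not depend on the collation order.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch Alpha Bravo charlie delta

upper=( [[:upper:]]* )   # names whose first character is an uppercase letter
lower=( [[:lower:]]* )   # names whose first character is a lowercase letter

printf 'upper: %s\n' "${upper[*]}"
printf 'lower: %s\n' "${lower[*]}"

cd - >/dev/null && rm -r "$demo_dir"
```

Unlike [A-Z], the class also covers non-ASCII uppercase letters when the current locale defines them.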
As you have noticed, when you use a range like [B-C], the uppercase and lowercase forms of the same letter are arranged adjacently (according to the collation order of the locale), so the range can also cover a lowercase letter that sorts between its endpoints.
Character ranges pick up “unintuitive” characters, such as lowercase letters in a range whose boundaries are uppercase letters, because of the LC_COLLATE locale setting. LC_COLLATE is supposed to indicate sorting order, but it does a poor job of it (sorting strings is more complex than what locales can do), and you're better off without it. I recommend removing LC_COLLATE from your locale settings. If you're setting LANG or LANGUAGE, don't do that; set only the categories you need: LC_CTYPE, LC_MESSAGES, LC_TIME.
For more background about locales, see "What should I set my locale to and what are the implications of doing so?" and "set LC_* but not LC_ALL".
To get reliable results in a script regardless of the user's settings, set LC_ALL=C.
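In a script, that could look like the sketch below. The helper count_uppercase_names and the scratch directory are hypothetical and exist only for illustration; the relevant part is the export at the top, which pins the locale for the script and for every command it starts.

```shell
#!/usr/bin/env bash
# Pin the locale for the whole script, independent of the caller's settings.
export LC_ALL=C

# Hypothetical helper: count the entries in a directory whose names begin
# with an uppercase ASCII letter.
count_uppercase_names() {
  local dir=$1 names
  shopt -s nullglob            # an empty directory should yield 0, not 1
  names=( "$dir"/[A-Z]* )
  shopt -u nullglob
  echo "${#names[@]}"
}

# Demonstration on a scratch directory with made-up file names.
demo_dir=$(mktemp -d)
touch "$demo_dir"/Alpha "$demo_dir"/Bravo "$demo_dir"/charlie
count=$(count_uppercase_names "$demo_dir")
echo "$count"
rm -r "$demo_dir"
```

Because the change is made inside the script's own process, the user's interactive shell keeps its locale.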