History of Bash globbing
bash
was initially designed in the late 80s as a partial clone of ksh
with some interactive features from csh/tcsh.
The origins of its globbing are to be found in the earlier shells it builds upon.
ksh
itself is an extension of the Bourne shell. The Bourne shell (first released in 1979 with Unix V7) was a clean implementation written from scratch, but it did not depart completely from the Thompson shell (the shell of Unix V1 to V6), and it incorporated features from the Mashey shell.
In particular, command arguments were still separated by blanks; |
was now the pipe operator, but ^
was still supported as an alternative (which also explains why you write [!a-z]
rather than [^a-z]
to negate a set in a glob); $1
was still the first argument to a script; and backslash was still the escape character. As a result, many of the regexp operators (^\|$
) have a special meaning of their own in the shell.
The Thompson shell relied on an external utility for globbing. When sh
found unquoted *
, [
or ?
s in the command, it would run the command through glob
.
rm *.txt
would end up running glob as:
["glob", "rm", "*.txt"]
and glob would end up running rm
with the list of files matching that pattern.
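As a rough illustration of that division of labour, here is a stand-in written in modern bash. It is not the historical program: old_glob is a made-up name, and the sketch only mimics the idea of a helper that receives the command and its unexpanded arguments, expands the wildcards, and then runs the command.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the old external glob utility (illustrative name).
old_glob() {
  local cmd=$1; shift
  local IFS=              # empty IFS: no word splitting, only pathname expansion
  local -a expanded=()
  local arg
  for arg in "$@"; do
    # Left unquoted on purpose so the shell expands any wildcards in $arg.
    expanded+=( $arg )
  done
  "$cmd" "${expanded[@]}"
}

# Demo in a scratch directory (in a subshell, so the caller's directory is untouched).
(
  dir=$(mktemp -d) && cd "$dir" || exit 1
  touch a.txt b.txt c.log
  old_glob echo '*.txt'   # prints: a.txt b.txt
  cd / && rm -rf "$dir"
)
```

The caller quotes the pattern, so it is the helper, not the calling shell, that turns it into a list of files.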
grep a.\*b *.txt
would run glob
as:
["glob", "grep", "a.\252b", "*.txt"]
The *
above has been quoted by setting the 8th bit on that character, preventing glob
from treating it as a wildcard. glob
would then remove that bit before calling grep
.
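The \252 in that listing is plain arithmetic on the character code, which can be checked in bash. The helper names below are made up for this sketch, and it works on numeric codes rather than on raw bytes.

```bash
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names) operating on character codes.

# Octal code of a character with the 8th bit (0200) set, i.e. marked as quoted.
quote_mark() {
  local code
  code=$(printf '%d' "'$1")         # numeric value of the character
  printf '%o\n' $(( code | 0200 ))
}

# Octal code with the 8th bit cleared again.
quote_strip() {
  printf '%o\n' $(( 0$1 & 0177 ))   # the leading 0 makes bash read $1 as octal
}

quote_mark '*'     # prints 252
quote_strip 252    # prints 52, the octal code of a plain *
```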
To do the equivalent with regexps (using a hypothetical regexp utility in place of glob), you would have had to write:
regexp rm '\.txt$'
Or:
regexp rm '^[^.].*\.txt$'
to exclude dot-files.
Regexps are a poor fit for matching filenames and are complicated for a beginner: their operators need escaping because they double as shell special characters, and .
, which is common in filenames, is itself a regexp operator. In most cases, all you need are wildcards that can replace either one (?
) or any number (*
) of characters.
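To make the comparison concrete in today's bash, the two functions below list the same files, once with a wildcard and once with the regexp from above. The function names are only illustrative, and the regexp version loops over every directory entry because the shell has no regexp-based filename expansion of its own.

```bash
#!/usr/bin/env bash
# Both functions list the non-hidden .txt files of the current directory.

list_txt_glob() {
  local f
  for f in *.txt; do
    # Skip the literal pattern left behind when nothing matches.
    if [ -e "$f" ]; then printf '%s\n' "$f"; fi
  done
}

list_txt_regex() {
  local f re='^[^.].*\.txt$'
  for f in * .*; do       # every entry, hidden or not
    if [[ -e $f && $f =~ $re ]]; then printf '%s\n' "$f"; fi
  done
}

(
  dir=$(mktemp -d) && cd "$dir" || exit 1
  touch a.txt notes.txt .hidden.txt b.log
  list_txt_glob           # prints a.txt and notes.txt
  list_txt_regex          # prints the same two names
  cd / && rm -rf "$dir"
)
```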
Since then, different shells have added different globbing operators. Nowadays, the ksh and zsh globs (and to some extent bash -O extglob
which implements a subset of ksh globs) are functionally equivalent to regexps, with a syntax that is less cumbersome to use with filenames and the existing shell syntax. For instance, in zsh
(with the extendedglob option enabled), you can do:
echo a#.txt
if you want (unlikely) to match filenames that consist of sequences of a
followed by .txt
. That is easier than echo (^a*\.txt$)
(here using parentheses as a way to isolate the regexp operators from the shell operators, which is one way shells could have dealt with the problem).
echo (foo|bar|<1-20>).(#i)mpg
That matches mpg files (extension compared case-insensitively) whose name before the extension is foo, bar, or a decimal number from 1 to 20.
ksh93
can now also incorporate regexps (basic, extended, Perl-like or "augmented") in its globs (though the feature is quite buggy) and even provides a way to convert between glob and regexp (printf %R
, printf %P
):
echo ~(Ei:.*\.txt)
to match (non-hidden) txt files case-insensitively using an extended regular expression.
Regular languages were introduced by Kleene in 1956. The seminal paper didn't have the full modern notation for regular expressions, but it did introduce the “Kleene star”: A*
meaning “any number of repetitions of A
”. In the next decade, some more or less standard notations emerged, in particular .
for an arbitrary character and ?
to mean that the previous character is optional.
Bash's globbing notation stems from the glob
command introduced all the way back in Unix v1 in 1971. At the time, globbing was performed by a separate program; it was later moved into the shell. The early glob
command used ?
to mean “any one character” and *
to mean “any sequence of characters”. I don't know why the characters were chosen; ?
is pretty intuitive, and *
may have been inspired by the star in regular expressions.
Globbing wasn't intended to be as general as regular expressions, and regular expressions were not very widespread at the time, so there was no call to unify the concepts. From the start, there were syntactic incompatibilities, with ?
, .
and *
meaning different things in file name patterns and in regular expressions.
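Those differences are easy to see by testing the same text once as a shell pattern and once as an extended regexp. The sketch below uses bash's [[ ]] for both; the function names are made up for the illustration, and the regexp is anchored so that, like a glob, it has to match the whole string.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: test one string against a shell pattern or a regexp.
glob_match()  { [[ $1 == $2 ]]; }     # $2 unquoted: treated as a shell pattern
regex_match() {
  local re="^($2)\$"                  # anchor at both ends, as globs are
  [[ $1 =~ $re ]]
}

# compare STRING PATTERN: report how each notation judges the string.
compare() {
  local g=no r=no
  if glob_match  "$1" "$2"; then g=yes; fi
  if regex_match "$1" "$2"; then r=yes; fi
  printf '%s %s glob=%s regexp=%s\n' "$1" "$2" "$g" "$r"
}

compare abc 'a?c'   # glob=yes regexp=no   (? is "any one character" in a glob)
compare ac  'a?c'   # glob=no  regexp=yes  (? makes the a optional in a regexp)
compare abc 'a.c'   # glob=no  regexp=yes  (. is literal in a glob)
compare ab  'a*'    # glob=yes regexp=no   (* repeats the a in a regexp)
```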
Modern shells such as bash expand on glob patterns, but this was a gradual evolution that maintained backward compatibility. Ksh88 (the 1988 version of the Korn shell) introduced an extended syntax for shell patterns, which could not be the same syntax as usual regular expressions but was strongly inspired by them: *(PATTERN)
to mean any number of repetitions of PATTERN
, @(PATTERN1|PATTERN2)
to mean “PATTERN1
or PATTERN2
”, etc.
Modern versions of bash (since 2.02) support ksh88's extended patterns, if you issue shopt -s extglob
first.
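Here is a small sketch of those two operators in bash. The helper name expand_pattern is made up; it takes the pattern as a string and expands it at run time, which sidesteps the requirement that extglob already be enabled when bash parses a line containing a literal extended pattern.

```bash
#!/usr/bin/env bash
shopt -s extglob        # enable the ksh-style extended patterns

# Hypothetical helper: print the files in the current directory matching PATTERN.
expand_pattern() {
  local IFS=            # no word splitting; only pathname expansion happens
  local f
  for f in $1; do       # unquoted on purpose so the pattern is expanded
    if [ -e "$f" ]; then printf '%s\n' "$f"; fi
  done
}

(
  dir=$(mktemp -d) && cd "$dir" || exit 1
  touch a.txt aaa.txt b.txt foo.mpg bar.mpg baz.mpg
  expand_pattern '*(a).txt'         # a.txt and aaa.txt (order depends on the locale)
  expand_pattern '@(foo|bar).mpg'   # bar.mpg and foo.mpg
  cd / && rm -rf "$dir"
)
```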