dirname and basename vs parameter expansion
Both have their quirks, unfortunately.
Both are required by POSIX, so the difference between them isn't a portability concern¹.
The plain way to use the utilities is
base=$(basename -- "$filename")
dir=$(dirname -- "$filename")
Note the double quotes around variable substitutions, as always, and also the -- after the command, in case the file name begins with a dash (otherwise the commands would interpret the file name as an option). This still fails in one edge case, which is rare but might be forced by a malicious user²: command substitution removes trailing newlines. So if the file name is foo/bar followed by a newline, then base will be set to bar instead of bar followed by a newline. A workaround is to add a non-newline character after the output and strip it after the command substitution, together with the newline that the utility itself appends to its output:
base=$(basename -- "$filename"; echo .); base=${base%??}
dir=$(dirname -- "$filename"; echo .); dir=${dir%??}
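As a quick check of the trailing-newline problem, here is a minimal sketch that compares a plain capture with a guarded one. The variable names naive and safe are made up for this illustration, and no file needs to exist. Because the utility ends its own output with a newline, the guarded form removes two characters (that newline and the dot).

```shell
# A file name whose last component ends in a newline.
filename='foo/bar
'

# Plain capture: command substitution drops every trailing newline.
naive=$(basename -- "$filename")

# Guarded capture: the trailing "." protects the newlines; afterwards remove
# the "." and the single newline that basename appended to its output.
safe=$(basename -- "$filename"; echo .); safe=${safe%??}

printf 'naive has %d characters, safe has %d\n' "${#naive}" "${#safe}"
```

The naive result has lost the newline that belongs to the name, while the guarded result is one character longer because it keeps it.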
With parameter expansion, you don't run into edge cases related to expansion of weird characters, but there are a number of difficulties with the slash character. One thing that is not an edge case at all is that computing the directory part requires different code for the case where there is no /.
base="${filename##*/}"
case "$filename" in
*/*) dir="${filename%/*}";;
*) dir=".";;
esac
The edge case is when there's a trailing slash (including the case of the root directory, which is all slashes). The basename and dirname commands strip off trailing slashes before they do their job. There's no way to strip the trailing slashes in one go if you stick to POSIX constructs, but you can do it in two steps. You need to take care of the case when the input consists of nothing but slashes.
case "$filename" in
*/*[!/]*)
trail=${filename##*[!/]}; filename=${filename%%"$trail"}
base=${filename##*/}
dir=${filename%/*}; dir=${dir:-/};; # /foo has the directory /, not the empty string
*[!/]*)
trail=${filename##*[!/]}
base=${filename%%"$trail"}
dir=".";;
*) base="/"; dir="/";;
esac
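To see how this holds up, here is a hedged sketch that wraps the same logic in a function and prints its results next to those of the utilities. The name split_path and the list of sample paths are made up for this illustration. The wrapper also falls back to / when the directory part comes out empty, as it does for /foo.

```shell
# split_path is a hypothetical wrapper; it leaves its results in the
# global variables base and dir.
split_path() {
    filename=$1
    case "$filename" in
        */*[!/]*)
            # Strip trailing slashes, then split at the last slash.
            trail=${filename##*[!/]}; filename=${filename%%"$trail"}
            base=${filename##*/}
            dir=${filename%/*}; dir=${dir:-/};;
        *[!/]*)
            # No directory part: only trailing slashes to remove.
            trail=${filename##*[!/]}
            base=${filename%%"$trail"}
            dir=".";;
        *) base="/"; dir="/";;
    esac
}

for p in foo foo/ foo/bar foo/bar/// /foo /; do
    split_path "$p"
    printf '%-10s base=%-4s dir=%-4s (utilities: %s, %s)\n' \
        "$p" "$base" "$dir" "$(basename -- "$p")" "$(dirname -- "$p")"
done
```

Inputs with repeated slashes in the middle (foo//bar) or an empty string are not covered by this sketch.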
If you happen to know that you aren't in an edge case (e.g. a find result other than the starting point always contains a directory part and has no trailing /) then parameter expansion string manipulation is straightforward. If you need to cope with all the edge cases, the utilities are easier to use (but slower).
Sometimes, you may want to treat foo/ like foo/. rather than like foo. If you're acting on a directory entry then foo/ is supposed to be equivalent to foo/., not foo; this makes a difference when foo is a symbolic link to a directory: foo means the symbolic link, foo/ means the target directory. In that case, the basename of a path with a trailing slash is advantageously . (a single dot), and the path can be its own dirname.
case "$filename" in
*/) base="."; dir="$filename";;
*/*) base="${filename##*/}"; dir="${filename%"$base"}";;
*) base="$filename"; dir=".";;
esac
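The difference between foo and foo/ for a symbolic link can be observed directly. This is a minimal sketch under a few assumptions: mktemp -d is available (it is common but not required by POSIX), and the names real, link, plain and slashed are made up for the demonstration.

```shell
# Scratch directory holding a real directory and a symbolic link to it.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/real"
ln -s real "$tmp/link"

# Without a trailing slash, the path names the symbolic link itself.
if [ -L "$tmp/link" ]; then plain=symlink; else plain=other; fi

# With a trailing slash, the path is resolved like link/. and names the
# target directory, so the symbolic-link test no longer succeeds.
if [ -L "$tmp/link/" ]; then slashed=symlink; else slashed=other; fi

printf 'link: %s\nlink/: %s\n' "$plain" "$slashed"

rm -rf "$tmp"
```

This is why collapsing link/ to link before acting on it can change which object a command operates on.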
The fast and reliable method is to use zsh with its history modifiers (this first strips trailing slashes, like the utilities):
dir=$filename:h base=$filename:t
¹ Unless you're using a pre-POSIX shell such as /bin/sh on Solaris 10 and older, which lacked the parameter expansion string manipulation features and was still found on machines in production. Those installations always include a POSIX shell called sh, but it is /usr/xpg4/bin/sh, not /bin/sh.
² For example: submit a file called foo followed by a newline to a file upload service that doesn't protect against this, then delete it and cause foo to be deleted instead.
Both are in POSIX, so portability "should" be of no concern. The shell substitutions should be presumed to run faster.
However, it depends on what you mean by portable. Some systems (not necessarily old ones) did not implement those features in their /bin/sh (Solaris 10 and older come to mind), while on the other hand, a while back, developers were cautioned that dirname was not as portable as basename.
For reference:
- basename - return non-directory portion of a pathname (POSIX)
- dirname - return the directory portion of a pathname (POSIX)
  The dirname utility originated in System III. It has evolved through the System V releases to a version that matches the requirements specified in this description in System V Release 3. 4.3 BSD and earlier versions did not include dirname.
- sh manual page on Solaris 10 (Oracle)
  The manual page does not mention ## or %/.
In considering portability, I would have to take into account all of the systems where I maintain programs. Not all are POSIX, so there are tradeoffs. Your tradeoffs may differ.
There is also:
mkdir '
'; dir=$(basename ./'
'); echo "${#dir}"
0
Weird stuff like that happens because there's a lot of interpreting and parsing and the rest that needs to happen when two processes talk. Command substitutions will strip trailing newlines. And NULs (though that's obviously not relevant here). The output of basename and dirname will lose trailing newlines in any case, because how else do you talk to them? I know, trailing newlines in a filename are kind of anathema anyway, but you never know. And it doesn't make sense to go the possibly flawed way when you could do otherwise.
Still... ${pathname##*/} != basename and likewise ${pathname%/*} != dirname. Those commands are specified to carry out a mostly well-defined sequence of steps to arrive at their specified results.
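To make the inequality concrete, this small sketch prints both expansions next to the output of the utilities for a few sample paths (the sample list is made up for the illustration):

```shell
# Compare the bare expansions with the utilities on a few inputs.
for pathname in foo/bar foo/bar/ foo /; do
    printf '%-9s ##*/ gives "%s", basename gives "%s"; %%/* gives "%s", dirname gives "%s"\n' \
        "$pathname" \
        "${pathname##*/}" "$(basename -- "$pathname")" \
        "${pathname%/*}" "$(dirname -- "$pathname")"
done
```

Only foo/bar agrees on both counts. With a trailing slash the expansion for the base name is empty, without any slash the expansion for the directory returns the whole name instead of a dot, and for / both expansions are empty.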
The spec is below, but first here's a terser version:
basename()
case $1 in
(*[!/]*/) basename "${1%"${1##*[!/]}"}" ${2+"$2"} ;;
(*/[!/]*) basename "${1##*/}" ${2+"$2"} ;;
(${2:+?*}"$2") printf %s%b\\n "${1%"$2"}" "${1:+\n\c}." ;;
(*) printf %s%c\\n "${1##///*}" "${1#${1#///}}" ;;
esac
That's a fully POSIX compliant basename in simple sh. It's not difficult to do. I merged a couple of the branches I use below because I could without affecting results.
Here's the spec:
basename()
case $1 in
("") # 1. If string is a null string, it is
# unspecified whether the resulting string
# is '.' or a null string. In either case,
# skip steps 2 through 6.
echo .
;; # I feel like I should flip a coin or something.
(//) # 2. If string is "//", it is implementation-
# defined whether steps 3 to 6 are skipped or
# processed.
# Great. What should I do then?
echo //
;; # I guess it's *my* implementation after all.
(*[!/]*/) # 3. If string consists entirely of <slash>
# characters, string shall be set to a single
# <slash> character. In this case, skip
# steps 4 to 6.
# 4. If there are any trailing <slash> characters
# in string, they shall be removed.
basename "${1%"${1##*[!/]}"}" ${2+"$2"}
;; # Fair enough, I guess.
(*/) echo /
;; # For step three.
(*/*) # 5. If there are any <slash> characters remaining
# in string, the prefix of string up to and
# including the last <slash> character in
# string shall be removed.
basename "${1##*/}" ${2+"$2"}
;; # == ${pathname##*/}
("$2"|\
"${1%"$2"}") # 6. If the suffix operand is present, is not
# identical to the characters remaining
# in string, and is identical to a suffix of
# the characters remaining in string, the
# suffix suffix shall be removed from
# string. Otherwise, string is not modified
# by this step. It shall not be
# considered an error if suffix is not
# found in string.
printf %s\\n "$1"
;; # So far so good for parameter substitution.
(*) printf %s\\n "${1%"$2"}"
esac # I probably won't do dirname.
...maybe the comments are distracting....
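For comparison, dirname can be sketched with the same kind of parameter expansions. This is not from the answer above: the function name posix_dirname is made up, the step numbers in the comments refer to the POSIX description of dirname, and the implementation-defined treatment of // is resolved by collapsing it to a single slash, which is an assumption.

```shell
# Hypothetical sketch of dirname in plain sh; uses the global variable p.
posix_dirname() {
    p=$1
    case $p in
        '')     printf '.\n'; return ;;  # empty operand: no slash, so "."
        *[!/]*) ;;                       # has a non-slash character: go on
        *)      printf '/\n'; return ;;  # step 2: nothing but slashes
    esac
    p=${p%"${p##*[!/]}"}                 # step 3: remove trailing slashes
    case $p in
        */*) ;;
        *)   printf '.\n'; return ;;     # step 4: no slash left
    esac
    p=${p%/*}                            # step 5: drop the last component
    p=${p%"${p##*[!/]}"}                 # step 7: remove trailing slashes
    printf '%s\n' "${p:-/}"              # step 8: an empty result means "/"
}

for arg in foo foo/bar foo//bar/ /foo /; do
    printf '%-10s -> %s\n' "$arg" "$(posix_dirname "$arg")"
done
```

As with the utility, a result captured through command substitution still loses trailing newlines.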