Sort an array of pathnames of files by their basenames
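sort in GNU coreutils allows a custom field separator and key. Set / as the field separator and sort on the second field to order by basename instead of the entire path. Because -k2 covers everything from the second field to the end of the line, this matches the basename only when every path has exactly one directory component.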
printf "%s\n" "${filearray[@]}" | sort -t/ -k2
will produce
dir2/0003.pdf
dir1/0010.pdf
dir3/0040.pdf
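To keep the sorted result in an array rather than just printing it, the same pipeline can feed readarray. This is a minimal sketch, assuming bash 4 or later and the sample array from the output above; the array name sorted and the LC_ALL=C setting (used only to make the ordering byte-wise and reproducible) are illustrative choices, not part of the original answer.

```shell
# Sample input (assumed): one directory level per path.
filearray=("dir1/0010.pdf" "dir2/0003.pdf" "dir3/0040.pdf")

# Sort on everything after the first "/" and read the lines back into an array.
# LC_ALL=C forces byte-wise comparison so the result does not depend on the locale.
readarray -t sorted < <(printf '%s\n' "${filearray[@]}" | LC_ALL=C sort -t/ -k2)

printf '%s\n' "${sorted[@]}"
```

As with the plain pipeline, this assumes that no path contains a newline.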
Sorting with a gawk expression (the result is collected with bash's readarray):
Sample array of filenames containing whitespace:
filearray=("dir1/name 0010.pdf" "dir2/name 0003.pdf" "dir3/name 0040.pdf")
readarray -t sortedfilearr < <(printf '%s\n' "${filearray[@]}" | gawk -F'/' '
BEGIN{PROCINFO["sorted_in"]="@val_num_asc"}
{ a[$0]=$NF }
END{ for(i in a) print i}')
The output:
echo "${sortedfilearr[*]}"
dir2/name 0003.pdf dir1/name 0010.pdf dir3/name 0040.pdf
Accessing single item:
echo "${sortedfilearr[1]}"
dir1/name 0010.pdf
That assumes that no file path contains newline characters. Note that the numerical sorting in @val_num_asc only applies to the leading numerical part of each value (none in this example), with fallback to lexical comparison (based on strcmp(), not the locale's sorting order) for ties.
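# Split the command substitution on newlines only.
oldIFS="$IFS"; IFS=$'\n'
# Disable globbing for the unquoted expansion, remembering whether we changed it.
if [[ -o noglob ]]; then
    setglob=0
else
    setglob=1; set -o noglob
fi
sorted=( $(printf '%s\n' "${filearray[@]}" |
    awk '{ print $NF, $0 }' FS='/' OFS='/' |
    sort | cut -d'/' -f2- ) )
IFS="$oldIFS"; unset oldIFS
# Re-enable globbing only if it was enabled before.
(( setglob == 1 )) && set +o noglob
unset setglob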
File names that contain newlines will cause issues at the sort step.
The code generates a /-delimited list with awk that contains the basename in the first column and the complete path in the remaining columns:
0003.pdf/dir2/0003.pdf
0010.pdf/dir1/0010.pdf
0040.pdf/dir3/0040.pdf
This is what is sorted, and cut is used to remove the first /-delimited column. The result is turned into a new bash array.
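The newline limitation noted above can be worked around by using NUL instead of newline as the record terminator. The following is a sketch under stated assumptions, not the answer's own method: it assumes bash 4.4 or later (for readarray -d ''), GNU sort and GNU cut with the -z option, and it swaps awk for bash's ${path##*/} expansion to extract the basename. The sample array and the variable names are hypothetical.

```shell
# Sample input (assumed); the last path has a newline inside its basename.
filearray=("dir1/0010.pdf" "dir2/0003.pdf" $'dir3/00\n40.pdf')

# Decorate each path as "basename/path", terminate records with NUL,
# sort the records, then strip the first "/"-delimited column again.
readarray -d '' -t sorted < <(
    for path in "${filearray[@]}"; do
        printf '%s/%s\0' "${path##*/}" "$path"
    done | LC_ALL=C sort -z | cut -z -d/ -f2-
)

# %q shows embedded newlines unambiguously.
printf '%q\n' "${sorted[@]}"
```

Since the result is read with readarray rather than through an unquoted command substitution, neither IFS nor the noglob option has to be changed.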