Bash sort array according to length of elements?
If the strings don't contain newlines, the following should work. It sorts the indices of the array by the length, using the strings themselves as the secondary sort criterion.
#!/bin/bash
array=(
"tiny string"
"the longest string in the list"
"middle string"
"medium string"
"also a medium string"
"short string"
)
expected=(
"the longest string in the list"
"also a medium string"
"medium string"
"middle string"
"short string"
"tiny string"
)
indexes=( $(
for i in "${!array[@]}" ; do
printf '%s %s %s\n' "$i" "${#array[i]}" "${array[i]}"
done | sort -k2,2nr -k3 | cut -f1 -d' '
))
for i in "${indexes[@]}" ; do
sorted+=("${array[i]}")
done
diff <(echo "${expected[@]}") \
<(echo "${sorted[@]}")
Note that moving to a real programming language can greatly simplify the solution. In Perl, for example, you can just write:
sort { length $b <=> length $a or $a cmp $b } @array
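If the data has to stay in a bash array, the same Perl comparator can be called from the script. This is only a minimal sketch: it assumes perl is installed and that the strings contain no newlines, and the array name sorted is just illustrative.

```bash
#!/bin/bash
array=(
    "tiny string"
    "the longest string in the list"
    "middle string"
    "medium string"
    "also a medium string"
    "short string"
)

# Hand the elements to Perl one per line, let it sort by descending length
# (ties broken alphabetically), and read the result back into a new array.
readarray -t sorted < <(
    printf '%s\n' "${array[@]}" |
        perl -e 'print sort { length $b <=> length $a or $a cmp $b } <>'
)

printf '%s\n' "${sorted[@]}"
```

Every line still carries its trailing newline inside Perl, so all lengths are off by one in the same direction and the ordering is unaffected.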
readarray -t array < <(
for str in "${array[@]}"; do
printf '%d\t%s\n' "${#str}" "$str"
done | sort -k 1,1nr -k 2 | cut -f 2- )
This reads the values of the sorted array from a process substitution.
The process substitution contains a loop. The loop outputs each element of the array, preceded by the element's length and a tab character.
The output of the loop is sorted numerically from largest to smallest (and alphabetically if the lengths are the same; use -k 2r in place of -k 2 to reverse the alphabetical order), and the result is sent to cut, which deletes the column with the string lengths.
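As an illustration of the -k 2r variant mentioned above, here is a small sketch that keeps the longest-first order but reverses the alphabetical tie-break. The array name by_length_desc is made up for this example, and LC_ALL=C is my own addition to make the alphabetical comparison independent of the locale.

```bash
#!/bin/bash
array=(
    "tiny string"
    "the longest string in the list"
    "middle string"
    "medium string"
    "also a medium string"
    "short string"
)

# Longest first; equal lengths in reverse alphabetical order (-k 2r).
readarray -t by_length_desc < <(
    for str in "${array[@]}"; do
        printf '%d\t%s\n' "${#str}" "$str"
    done | LC_ALL=C sort -k 1,1nr -k 2r | cut -f 2- )

printf '%s\n' "${by_length_desc[@]}"
```

With this input, the only visible difference from the -k 2 version is that "middle string" now comes before "medium string".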
A short test script, followed by a test run:
array=(
"tiny string"
"the longest string in the list"
"middle string"
"medium string"
"also a medium string"
"short string"
)
readarray -t array < <(
for str in "${array[@]}"; do
printf '%d\t%s\n' "${#str}" "$str"
done | sort -k 1,1nr -k 2 | cut -f 2- )
printf '%s\n' "${array[@]}"
$ bash script.sh
the longest string in the list
also a medium string
medium string
middle string
short string
tiny string
This assumes that the strings do not contain newlines. On GNU systems with a recent bash (4.4 or later, for readarray -d), you can support embedded newlines in the data by using the nul character as the record separator instead of newline:
readarray -d '' -t array < <(
for str in "${array[@]}"; do
printf '%d\t%s\0' "${#str}" "$str"
done | sort -z -k 1,1nr -k 2 | cut -z -f 2- )
Here, the data is printed with a trailing \0 in the loop instead of a newline, sort and cut read nul-delimited records through their -z GNU options, and readarray finally reads the nul-delimited data with -d ''.
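To see the nul-delimited variant cope with an embedded newline, a tiny test along these lines can be used. It is a sketch that assumes GNU sort and cut plus bash 4.4 or later; the sample strings and the array name sorted are invented for the example.

```bash
#!/bin/bash
# One element contains a newline; lengths are 9, 1 and 5 characters.
array=( $'two\nlines' "a" "three" )

readarray -d '' -t sorted < <(
    for str in "${array[@]}"; do
        printf '%d\t%s\0' "${#str}" "$str"
    done | sort -z -k 1,1nr -k 2 | cut -z -f 2- )

# declare -p shows the embedded newline unambiguously.
declare -p sorted
```

The two-line string should stay a single element and sort first, since it is the longest.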
I won't completely repeat what I've already said about sorting in bash, just that you can sort within bash, but maybe you shouldn't. Below is a bash-only implementation of a simple exchange sort (nested loops that swap out-of-order pairs), which is O(n^2), and so is only tolerable for small arrays. It sorts the array elements in-place by their length, in decreasing order. It does not do a secondary alphabetical sort.
array=(
"tiny string"
"the longest string in the list"
"middle string"
"medium string"
"also a medium string"
"short string"
)
function sort_inplace {
local i j tmp
for ((i=0; i <= ${#array[@]} - 2; i++))
do
for ((j=i + 1; j <= ${#array[@]} - 1; j++))
do
local ivalue jvalue
ivalue=${#array[i]}
jvalue=${#array[j]}
if (( ivalue < jvalue ))   # arithmetic comparison; [[ ... < ... ]] would compare as strings
then
tmp=${array[i]}
array[i]=${array[j]}
array[j]=$tmp
fi
done
done
}
echo Initial:
declare -p array
sort_inplace
echo Sorted:
declare -p array
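A secondary alphabetical sort can be bolted on by extending the comparison. The following is a rough sketch rather than a tuned implementation: sort_inplace_alpha is a made-up name, and like the function above it assumes a non-sparse global array called array.

```bash
#!/bin/bash
array=(
    "tiny string"
    "the longest string in the list"
    "middle string"
    "medium string"
    "also a medium string"
    "short string"
)

sort_inplace_alpha() {
    local i j tmp n=${#array[@]}
    for ((i = 0; i < n - 1; i++)); do
        for ((j = i + 1; j < n; j++)); do
            # Swap when element j is longer, or equally long but
            # alphabetically earlier ([[ < ]] compares strings).
            if (( ${#array[j]} > ${#array[i]} )) ||
               { (( ${#array[j]} == ${#array[i]} )) &&
                 [[ ${array[j]} < ${array[i]} ]]; }; then
                tmp=${array[i]}
                array[i]=${array[j]}
                array[j]=$tmp
            fi
        done
    done
}

sort_inplace_alpha
printf '%s\n' "${array[@]}"
```

The lengths are compared arithmetically with (( )), and only the tie-break uses the string comparison of [[ ]]. It is still quadratic, so the same caveat about small arrays applies.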
As evidence that this is a specialized solution, consider the timings of the existing three answers on arrays of various sizes:
# 6 elements
Choroba: 0m0.004s
Kusalananda: 0m0.004s
Jeff: 0m0.018s ## already 4 times slower!
# 1000 elements
Choroba: 0m0.004s
Kusalananda: 0m0.004s
Jeff: 0m0.021s ## up to 5 times slower, now!
# 5000 elements
Choroba: 0m0.004s
Kusalananda: 0m0.004s
Jeff: 0m0.019s
# 10000 elements
Choroba: 0m0.004s
Kusalananda: 0m0.006s
Jeff: 0m0.020s
# 99000 elements
Choroba: 0m0.015s
Kusalananda: 0m0.012s
Jeff: 0m0.119s
Choroba and Kusalananda have the right idea: compute the lengths once and use dedicated utilities for sorting and text processing.
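The exact benchmark setup is not shown here, so the following is only one way such a comparison might be reproduced. It fills an array with strings of pseudo-random length and times the utility-based approach. The helper name sort_with_utilities, the default size of 1000, and the 1 to 40 character length range are all assumptions of mine.

```bash
#!/bin/bash
# Number of elements; pass a different size as the first argument.
n=${1:-1000}

# Build n strings of 1..40 'x' characters each.
array=()
for ((i = 0; i < n; i++)); do
    printf -v pad '%*s' "$((RANDOM % 40 + 1))" ''
    array+=("${pad// /x}")
done

# Compute each length once, then let sort and cut do the work.
sort_with_utilities() {
    readarray -t sorted < <(
        for str in "${array[@]}"; do
            printf '%d\t%s\n' "${#str}" "$str"
        done | sort -k 1,1nr -k 2 | cut -f 2- )
}

# time is a bash keyword, so it can time a shell function directly.
time sort_with_utilities
```

Swapping another implementation into the timed function and raising n should show how the approaches scale, although absolute numbers will depend on the machine.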