Bash sort array according to length of elements?

If the strings don't contain newlines, the following should work. It sorts the indices of the array by the length, using the strings themselves as the secondary sort criterion.

#!/bin/bash
array=(
    "tiny string"
    "the longest string in the list"
    "middle string"
    "medium string"
    "also a medium string"
    "short string"
)
expected=(
    "the longest string in the list"
    "also a medium string"
    "medium string"
    "middle string"
    "short string"
    "tiny string"
)

indexes=( $(
    for i in "${!array[@]}" ; do
        printf '%s %s %s\n' $i "${#array[i]}" "${array[i]}"
    done | sort -nrk2,2 -rk3 | cut -f1 -d' '
))

for i in "${indexes[@]}" ; do
    sorted+=("${array[i]}")
done

diff <(echo "${expected[@]}") \
     <(echo "${sorted[@]}")

Note that moving to a real programming language can greatly simplify the solution, e.g. in Perl, you can just

sort { length $b <=> length $a or $a cmp $b } @array

readarray -t array < <(
for str in "${array[@]}"; do
    printf '%d\t%s\n' "${#str}" "$str"
done | sort -k 1,1nr -k 2 | cut -f 2- )

This reads the values of the sorted array from a process substitution.

The process substitution contains a loop. The loop output each element of the array prepended by the element's length and a tab character in-between.

The output of the loop is sorted numerically from largest to smallest (and alphabetically if the lengths are the same; use -k 2r in place of -k 2 to reverse the alphabetical order) and the result of that is sent to cut which deletes the column with the string lengths.

Sort test script followed by a test run:

array=(
    "tiny string"
    "the longest string in the list"
    "middle string"
    "medium string"
    "also a medium string"
    "short string"
)

readarray -t array < <(
for str in "${array[@]}"; do
    printf '%d\t%s\n' "${#str}" "$str"
done | sort -k 1,1nr -k 2 | cut -f 2- )

printf '%s\n' "${array[@]}"

$ bash script.sh
the longest string in the list
also a medium string
medium string
middle string
short string
tiny string

This assumes that the strings do not contain newlines. On GNU systems with a recent bash, you can support embedded newlines in the data by using the nul-character as the record separator instead of newline:

readarray -d '' -t array < <(
for str in "${array[@]}"; do
    printf '%d\t%s\0' "${#str}" "$str"
done | sort -z -k 1,1nr -k 2 | cut -z -f 2- )

Here, the data is printed with trailing \0 in the loop instead of newlines, the sort and cut reads nul-delimited lines through their -z GNU options and readarray finally reads the nul-delimited data with -d ''.

I won't completely repeat what I've already said about sorting in bash, just you can sort within bash, but maybe you shouldn't. Below is a bash-only implementation of an insertion sort, which is O(n²), and so is only tolerable for small arrays. It sorts the array elements in-place by their length, in decreasing order. It does not do a secondary alphabetical sort.

array=(
    "tiny string"
    "the longest string in the list"
    "middle string"
    "medium string"
    "also a medium string"
    "short string"
    )

function sort_inplace {
  local i j tmp
  for ((i=0; i <= ${#array[@]} - 2; i++))
  do
    for ((j=i + 1; j <= ${#array[@]} - 1; j++))
    do
      local ivalue jvalue
        ivalue=${#array[i]}
        jvalue=${#array[j]}
        if [[ $ivalue < $jvalue ]]
        then
                tmp=${array[i]}
                array[i]=${array[j]}
                array[j]=$tmp
        fi
    done
  done
}

echo Initial:
declare -p array

sort_inplace

echo Sorted:
declare -p array

As evidence that this is a specialized solution, consider the timings of the existing three answers on various size arrays:

# 6 elements
Choroba: 0m0.004s
Kusalananda: 0m0.004s
Jeff: 0m0.018s         ## already 4 times slower!

# 1000 elements
Choroba: 0m0.004s
Kusalananda: 0m0.004s
Jeff: 0m0.021s        ## up to 5 times slower, now!

5000 elements
Choroba: 0m0.004s
Kusalananda: 0m0.004s
Jeff: 0m0.019s

# 10000 elements
Choroba: 0m0.004s
Kusalananda: 0m0.006s
Jeff: 0m0.020s

# 99000 elements
Choroba: 0m0.015s
Kusalananda: 0m0.012s
Jeff: 0m0.119s

Choroba and Kusalananda have the right idea: compute the lengths once and use dedicated utilities for sorting and text processing.

Bash sort array according to length of elements?

Tags:

Bash

Sort

Array

Shell Script

Related

Recent Posts