How to return an array in bash without using globals?

What's wrong with globals?

Returning arrays is really not practical. There are lots of pitfalls.

That said, here's one technique that works if it's OK for the variable to have the same name in the caller and the callee:

$ f () { local a; a=(abc 'def ghi' jkl); declare -p a; }
$ g () { local a; eval "$(f)"; declare -p a; }
$ f; declare -p a; echo; g; declare -p a
declare -a a='([0]="abc" [1]="def ghi" [2]="jkl")'
-bash: declare: a: not found

declare -a a='([0]="abc" [1]="def ghi" [2]="jkl")'
-bash: declare: a: not found

The declare -p commands (except for the one in f()) are there only to display the state of the array for demonstration purposes. The one in f() is the mechanism that returns the array: it prints a declaration that the caller can eval.

If you need the array to have a different name, you can do something like this:

$ g () { local b r; r=$(f); r="declare -a b=${r#*=}"; eval "$r"; declare -p a; declare -p b; }
$ f; declare -p a; echo; g; declare -p a
declare -a a='([0]="abc" [1]="def ghi" [2]="jkl")'
-bash: declare: a: not found

-bash: declare: a: not found
declare -a b='([0]="abc" [1]="def ghi" [2]="jkl")'
-bash: declare: a: not found
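
The rename step above hinges on the expansion ${r#*=}, which removes the shortest prefix ending at the first "=" and so strips "declare -a a=" from the declare -p output. Here is a minimal sketch of just that step, outside any function; the names src, serialized, payload and copy are illustrative and not from the original answer:

```shell
#!/usr/bin/env bash

src=(abc 'def ghi' jkl)

# declare -p prints a declaration that recreates the array.
serialized=$(declare -p src)

# Drop everything up to and including the first '=',
# leaving only the parenthesized element list.
payload=${serialized#*=}

# Rebuild the same elements under a different name.
eval "declare -a copy=$payload"

declare -p copy
```

This relies on the declaration prefix never containing an "=" itself, which holds for plain variable names. As with any eval, only feed it output that you generated yourself.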

With Bash version 4.3 and above, you can use a nameref: the caller passes in the name of the array, and the callee declares a nameref to that name and populates the array indirectly.

#!/usr/bin/env bash

create_array() {
    local -n arr=$1             # use nameref for indirection
    arr=(one "two three" four)
}

use_array() {
    local my_array
    create_array my_array       # call function to populate the array
    echo "inside use_array"
    declare -p my_array         # test the array
}

use_array                       # call the main function

Produces the output:

inside use_array
declare -a my_array=([0]="one" [1]="two three" [2]="four")

You could make the function update an existing array as well:

update_array() {
    local -n arr=$1             # use nameref for indirection
    arr+=("two three" four)     # update the array
}

use_array() {
    local my_array=(one)
    update_array my_array       # call function to update the array
}

This approach is cleaner and more efficient, since it does not need command substitution $() (and the subshell that comes with it) to capture the standard output of the function being called. It also helps when the function has more than one output: simply use one nameref per output.
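
As a sketch of the multiple-output case, the function below fills two caller-named arrays through two namerefs. It assumes Bash 4.3 or later; split_by_length, short_ref, long_ref and main are hypothetical names chosen for illustration:

```shell
#!/usr/bin/env bash
# Requires Bash 4.3+ for namerefs.

# Hypothetical helper: sorts its remaining arguments into two
# arrays whose names are given as the first two arguments.
split_by_length() {
    local -n short_ref=$1 long_ref=$2   # one nameref per output
    shift 2
    short_ref=()
    long_ref=()
    local word
    for word in "$@"; do
        if (( ${#word} <= 3 )); then
            short_ref+=("$word")
        else
            long_ref+=("$word")
        fi
    done
}

main() {
    local short_words long_words
    split_by_length short_words long_words one "two three" four
    declare -p short_words long_words
}

main
```

Here short_words ends up holding "one", and long_words holds "two three" and "four". One pitfall to keep in mind: the caller's array must not share a name with the nameref or any other local inside the function (short_ref, long_ref, word), otherwise the name resolves to the function's own variable or Bash reports a circular name reference.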

Here is what the Bash Manual says about nameref:

A variable can be assigned the nameref attribute using the -n option to the declare or local builtin commands (see Bash Builtins) to create a nameref, or a reference to another variable. This allows variables to be manipulated indirectly. Whenever the nameref variable is referenced, assigned to, unset, or has its attributes modified (other than using or changing the nameref attribute itself), the operation is actually performed on the variable specified by the nameref variable’s value. A nameref is commonly used within shell functions to refer to a variable whose name is passed as an argument to the function. For instance, if a variable name is passed to a shell function as its first argument, running

declare -n ref=$1

inside the function creates a nameref variable ref whose value is the variable name passed as the first argument. References and assignments to ref, and changes to its attributes, are treated as references, assignments, and attribute modifications to the variable whose name was passed as $1.

Bash can't pass around data structures as return values. A return value must be a numeric exit status in the range 0 to 255. However, you can certainly use command or process substitution to capture a function's output and pass it to an eval statement if you're so inclined.
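
If you want process substitution without eval, one alternative (a different technique from the eval approach mentioned above) is to have the function print its elements NUL-delimited and read them back with mapfile. This is only a sketch, assuming Bash 4.4 or later for mapfile -d; emit_items and consume_items are hypothetical names:

```shell
#!/usr/bin/env bash
# Requires Bash 4.4+ for mapfile -d.

# Hypothetical producer: writes each element followed by a NUL byte,
# so embedded spaces and newlines survive the trip.
emit_items() {
    local items=(abc 'def ghi' $'two\nlines')
    printf '%s\0' "${items[@]}"
}

consume_items() {
    local received
    # Process substitution keeps mapfile in the current shell,
    # so 'received' is still set after the read completes.
    mapfile -d '' -t received < <(emit_items)
    declare -p received
}

consume_items
```

Piping instead (emit_items | mapfile ...) would not work by default, because the mapfile side of the pipeline runs in a subshell and the array would be lost when it exits.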

This is rarely worth the trouble, IMHO. If you must pass data structures around in Bash, use a global variable--that's what they're for. If you don't want to do that for some reason, though, think in terms of positional parameters.

Your example could easily be rewritten to use positional parameters instead of global variables:

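use_array () {
    local item
    # Each array element arrives as a separate positional parameter.
    for item in "$@"; do
        echo "$item"
    done
}

create_array () {
    local array=("a" "b" "c")
    # Quote the expansion so elements containing spaces are not split.
    use_array "${array[@]}"
}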

This all creates a certain amount of unnecessary complexity, though. Bash functions generally work best when you treat them more like procedures with side effects, and call them in sequence.

# Gather values and store them in FOO.
get_values_for_array () { :; }

# Do something with the values in FOO.
process_global_array_variable () { :; }

# Call your functions.
get_values_for_array
process_global_array_variable

If all you're worried about is polluting your global namespace, you can also use the unset builtin to remove a global variable after you're done with it. Using your original example, let my_list be global (by removing the local keyword) and add unset my_list to the end of my_algorithm to clean up after yourself.
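
A rough sketch of that cleanup follows. The names my_list and my_algorithm come from the question; fill_my_list and the loop body are hypothetical stand-ins, since the original code is not reproduced here:

```shell
#!/usr/bin/env bash

# Hypothetical helper: assigns to my_list without 'local',
# so the array is global.
fill_my_list() {
    my_list=(one "two three" four)
}

my_algorithm() {
    fill_my_list
    local item count=0
    for item in "${my_list[@]}"; do
        count=$((count + 1))
    done
    echo "processed $count items"
    unset my_list     # remove the global once it is no longer needed
}

my_algorithm
declare -p my_list 2>/dev/null || echo "my_list is gone"
```

After my_algorithm returns, my_list no longer exists in the global namespace, so later code cannot accidentally pick up stale values.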
