Calculating Mean of arrays with different lengths

The below function also works by adding columns of arrays of different lengths:

def avgNestedLists(nested_vals):
    """
    Averages a 2-D array and returns a 1-D array of all of the columns
    averaged together, regardless of their dimensions.
    """
    output = []
    maximum = 0
    for lst in nested_vals:
        if len(lst) > maximum:
            maximum = len(lst)
    for index in range(maximum): # Go through each index of longest list
        temp = []
        for lst in nested_vals: # Go through each list
            if index < len(lst): # If not an index error
                temp.append(lst[index])
        output.append(np.nanmean(temp))
    return output

Going off of your first example:

avgNestedLists([[1, 2, 3, 4, 8], [5, 6, 7, 8, 7, 8], [1, 2, 3, 4]])

Outputs:

[2.3333333333333335,
 3.3333333333333335,
 4.333333333333333,
 5.333333333333333,
 7.5,
 8.0]

The reason np.amax(nested_lst) or np.max(nested_lst) was not used in the beginning to find the max value is because it will return an array if the nested lists are of different sizes.


numpy.ma.mean allows you to compute the mean of non-masked array elements. However, to use numpy.ma.mean, you have to first combine your three numpy arrays into one masked array:

import numpy as np
x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 2, 3], [3, 4, 5]])
z = np.array([[7], [8]])

arr = np.ma.empty((2,3,3))
arr.mask = True
arr[:x.shape[0],:x.shape[1],0] = x
arr[:y.shape[0],:y.shape[1],1] = y
arr[:z.shape[0],:z.shape[1],2] = z
print(arr.mean(axis = 2))

yields

[[3.0 2.0 3.0]
 [4.66666666667 4.0 5.0]]

I often needed this for plotting mean of performance curves with different lengths.

Plot of multiple curves with different lengths

Solved it with simple function (based on answer of @unutbu):

def tolerant_mean(arrs):
    lens = [len(i) for i in arrs]
    arr = np.ma.empty((np.max(lens),len(arrs)))
    arr.mask = True
    for idx, l in enumerate(arrs):
        arr[:len(l),idx] = l
    return arr.mean(axis = -1), arr.std(axis=-1)

y, error = tolerant_mean(list_of_ys_diff_len)
ax.plot(np.arange(len(y))+1, y, color='green')

So applying that function to the list of above-plotted curves yields the following:

Plot of mean and std of different curves with different lengths