Numpy:zero mean data and standardization
This is also called zscore
.
SciPy has a utility for it:
>>> from scipy import stats
>>> stats.zscore([ 0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
... 0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
array([ 1.1273, -1.247 , -0.0552, 1.0923, 1.1664, -0.8559, 0.5786,
0.6748, -1.1488, -1.3324])
Follow the comments in the code below
import numpy as np
# create x
x = np.asarray([1,2,3,4], dtype=np.float64)
np.mean(x) # calculates the mean of the array x
x-np.mean(x) # this is euivalent to subtracting the mean of x from each value in x
x-=np.mean(x) # the -= means can be read as x = x- np.mean(x)
np.std(x) # this calcualtes the standard deviation of the array
x/=np.std(x) # the /= means can be read as x = x/np.std(x)
From the given syntax you have I conclude, that your array is multidimensional. Hence I will first discuss the case where your x is just a linear array:
np.mean(x)
will compute the mean, by broadcasting x-np.mean(x)
the mean of x
will be subtracted form all the entries. x -=np.mean(x,axis = 0)
is equivalent to x = x-np.mean(x,axis = 0). Similar for
x/np.std(x)`.
In the case of multidimensional arrays the same thing happens, but instead of computing the mean over the entire array, you just compute the mean over the first "axis". Axis is the numpy
word for dimension. So if your x
is two dimensional, then np.mean(x,axis =0) = [np.mean(x[:,0], np.mean(x[:,1])...]
. Broadcasting again will ensure, that this is done to all elements.
Note, that this only works with the first dimension, otherwise the shapes will not match for broadcasting. If you want to normalize wrt another axis you need to do something like:
x -= np.expand_dims(np.mean(x,axis = n),n)