Correlation between two vectors?

To perform a linear regression between two vectors x and y follow these steps:

[p,err] = polyfit(x,y,1);   % First order polynomial
y_fit = polyval(p,x,err);   % Values on a line
y_dif = y - y_fit;          % y value difference (residuals)
SSdif = sum(y_dif.^2);      % Sum square of difference
SStot = (length(y)-1)*var(y);   % Sum square of y taken from variance
rsq = 1-SSdif/SStot;        % Correlation 'r' value. If 1.0 the correlelation is perfect

For x=[10;200;7;150] and y=[0.001;0.45;0.0007;0.2] I get rsq = 0.9181.

Reference URL: http://www.mathworks.com/help/matlab/data_analysis/linear-regression.html


For correlations you can just use the corr function (statistics toolbox)

corr(A_1(:), A_2(:))

Note that you can also just use

corr(A_1, A_2)

But the linear indexing guarantees that your vectors don't need to be transposed.


Try xcorr, it's a built-in function in MATLAB for cross-correlation:

c = xcorr(A_1, A_2);

However, note that it requires the Signal Processing Toolbox installed. If not, you can look into the corrcoef command instead.


Given:

A_1 = [10 200 7 150]';
A_2 = [0.001 0.450 0.007 0.200]';

(As others have already pointed out) There are tools to simply compute correlation, most obviously corr:

corr(A_1, A_2);  %Returns 0.956766573975184  (Requires stats toolbox)

You can also use base Matlab's corrcoef function, like this:

M = corrcoef([A_1 A_2]):  %Returns [1 0.956766573975185; 0.956766573975185 1];
M(2,1);  %Returns 0.956766573975184 

Which is closely related to the cov function:

cov([condition(A_1) condition(A_2)]);

As you almost get to in your original question, you can scale and adjust the vectors yourself if you want, which gives a slightly better understanding of what is going on. First create a condition function which subtracts the mean, and divides by the standard deviation:

condition = @(x) (x-mean(x))./std(x);  %Function to subtract mean AND normalize standard deviation

Then the correlation appears to be (A_1 * A_2)/(A_1^2), like this:

(condition(A_1)' * condition(A_2)) / sum(condition(A_1).^2);  %Returns 0.956766573975185

By symmetry, this should also work

(condition(A_1)' * condition(A_2)) / sum(condition(A_2).^2); %Returns 0.956766573975185

And it does.

I believe, but don't have the energy to confirm right now, that the same math can be used to compute correlation and cross correlation terms when dealing with multi-dimensiotnal inputs, so long as care is taken when handling the dimensions and orientations of the input arrays.