pandas.DataFrame corrwith() method

corrwith defined as DataFrame.corrwith(other, axis=0, drop=False), so the axis=0 per default - i.e. Compute pairwise correlation between columns of two **DataFrame** objects

So the column names / labels must be the same in both DFs:

In [134]: frame.drop(labels='a', axis=1).corrwith(frame[['a']].rename(columns={'a':'b'}))
Out[134]:
b   -1.0
c    NaN
dtype: float64

NaN- means (in this case) there is nothing to compare / correlate with, because there is NO column named c in other DF

if you pass a series as other it will be translated (from the link, you've posted in comment) into:

In [142]: frame.drop(labels='a', axis=1).apply(frame.a.corr)
Out[142]:
b   -1.0
c    0.0
dtype: float64

What I think you're looking for:

Let's say your frame is:

frame = pd.DataFrame(np.random.rand(10, 6), columns=['cost', 'amount', 'day', 'month', 'is_sale', 'hour'])

You want the 'cost' and 'amount' columns to be correlated with all other columns in every combination.

focus_cols = ['cost', 'amount']
frame.corr().filter(focus_cols).drop(focus_cols)

enter image description here

Answering what you asked:

Compute pairwise correlation between rows or columns of two DataFrame objects.

Parameters:

other : DataFrame

axis : {0 or ‘index’, 1 or ‘columns’},

default 0 0 or ‘index’ to compute column-wise, 1 or ‘columns’ for row-wise drop : boolean, default False Drop missing indices from result, default returns union of all Returns: correls : Series

corrwith is behaving similarly to add, sub, mul, div in that it expects to find a DataFrame or a Series being passed in other despite the documentation saying just DataFrame.

When other is a Series it broadcast that series and matches along the axis specified by axis, default is 0. This is why the following worked:

frame.drop(labels='a', axis=1).corrwith(frame.a)

b   -1.0
c    0.0
dtype: float64

When other is a DataFrame it will match the axis specified by axis and correlate each pair identified by the other axis. If we did:

frame.drop('a', axis=1).corrwith(frame.drop('b', axis=1))

a    NaN
b    NaN
c    1.0
dtype: float64

Only c was in common and only c had its correlation calculated.

In the case you specified:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

frame[['a']] is a DataFrame because of the [['a']] and now plays by the DataFrame rules in which its columns must match up with what its being correlated with. But you explicitly drop a from the first frame then correlate with a DataFrame with nothing but a. The result is NaN for every column.

pandas.DataFrame corrwith() method

What I think you're looking for:

Answering what you asked:

Tags:

Python

Pandas

Dataframe

Related

Recent Posts