pandas.DataFrame corrwith() method
corrwith defined as DataFrame.corrwith(other, axis=0, drop=False)
, so the axis=0
per default - i.e. Compute pairwise correlation between columns of two **DataFrame** objects
So the column names / labels must be the same in both DFs:
In [134]: frame.drop(labels='a', axis=1).corrwith(frame[['a']].rename(columns={'a':'b'}))
Out[134]:
b -1.0
c NaN
dtype: float64
NaN
- means (in this case) there is nothing to compare / correlate with, because there is NO column named c
in other
DF
if you pass a series as other
it will be translated (from the link, you've posted in comment) into:
In [142]: frame.drop(labels='a', axis=1).apply(frame.a.corr)
Out[142]:
b -1.0
c 0.0
dtype: float64
What I think you're looking for:
Let's say your frame is:
frame = pd.DataFrame(np.random.rand(10, 6), columns=['cost', 'amount', 'day', 'month', 'is_sale', 'hour'])
You want the 'cost'
and 'amount'
columns to be correlated with all other columns in every combination.
focus_cols = ['cost', 'amount']
frame.corr().filter(focus_cols).drop(focus_cols)
Answering what you asked:
Compute pairwise correlation between rows or columns of two DataFrame objects.
Parameters:
other : DataFrame
axis : {0 or ‘index’, 1 or ‘columns’},
default 0 0 or ‘index’ to compute column-wise, 1 or ‘columns’ for row-wise drop : boolean, default False Drop missing indices from result, default returns union of all Returns: correls : Series
corrwith
is behaving similarly to add
, sub
, mul
, div
in that it expects to find a DataFrame
or a Series
being passed in other
despite the documentation saying just DataFrame
.
When other
is a Series
it broadcast that series and matches along the axis specified by axis
, default is 0. This is why the following worked:
frame.drop(labels='a', axis=1).corrwith(frame.a)
b -1.0
c 0.0
dtype: float64
When other
is a DataFrame
it will match the axis specified by axis
and correlate each pair identified by the other axis. If we did:
frame.drop('a', axis=1).corrwith(frame.drop('b', axis=1))
a NaN
b NaN
c 1.0
dtype: float64
Only c
was in common and only c
had its correlation calculated.
In the case you specified:
frame.drop(labels='a', axis=1).corrwith(frame[['a']])
frame[['a']]
is a DataFrame
because of the [['a']]
and now plays by the DataFrame
rules in which its columns must match up with what its being correlated with. But you explicitly drop a
from the first frame then correlate with a DataFrame
with nothing but a
. The result is NaN
for every column.