Checking whether data frame is copy or view in Pandas
I've elaborated on this example with pandas 1.0.1. There's not only a boolean _is_view attribute, but also _is_copy, which can be None or a reference to the original DataFrame:
import pandas as pd
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=['row1', 'row2'],
                  columns=['a', 'b', 'c', 'd'])
df2 = df.iloc[0:2, :]
df3 = df.loc[df['a'] == 1, :]
# df is neither copy nor view
df._is_view, df._is_copy
Out[1]: (False, None)
# df2 is a view AND a copy
df2._is_view, df2._is_copy
Out[2]: (True, <weakref at 0x00000236635C2228; to 'DataFrame' at 0x00000236635DAA58>)
# df3 is not a view, but a copy
df3._is_view, df3._is_copy
Out[3]: (False, <weakref at 0x00000236635C2228; to 'DataFrame' at 0x00000236635DAA58>)
So checking these two attributes should tell you not only whether you're dealing with a view, but also whether you have a copy or an "original" DataFrame.
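As a rough sketch, the two checks can be wrapped in a small helper. The function name copy_view_status and the dictionary keys are made up for illustration. Since _is_view and _is_copy are private, the helper reads them with getattr so that it still runs on a pandas version where one of them is missing; the values it reports may differ from the 1.0.1 output shown above.

```python
import pandas as pd


def copy_view_status(frame):
    """Report pandas' private copy/view flags for a DataFrame.

    Hypothetical helper: _is_view and _is_copy are internal attributes,
    so they are read defensively and may be absent in some versions.
    """
    is_view = getattr(frame, "_is_view", None)
    ref = getattr(frame, "_is_copy", None)
    # _is_copy is either None or a weak reference to the parent frame.
    return {"is_view": is_view, "tracks_parent": ref is not None}


df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=["row1", "row2"], columns=["a", "b", "c", "d"])
df2 = df.iloc[0:2, :]
df3 = df.loc[df["a"] == 1, :]

for name, frame in [("df", df), ("df2", df2), ("df3", df3)]:
    print(name, copy_view_status(frame))
```

An is_view of None here means the attribute was not found at all, which is different from False.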
See also this thread for a discussion explaining why you can't always predict whether your code will return a view or not.
Answers from HYRY and Marius in comments!
One can check either by:
- testing equivalence of the values.base attribute rather than the values attribute, as in: df.values.base is df2.values.base instead of df.values is df2.values.
- or using the (admittedly internal) _is_view attribute (df2._is_view is True).
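Here is a minimal sketch of the values.base comparison. The helper name same_base is hypothetical, and the extra "is not None" guard is my own addition: if both arrays own their data, both bases are None and a bare "is" comparison would wrongly report a match. What the final line prints depends on the pandas version and on the dtypes involved, so no expected output is given.

```python
import pandas as pd


def same_base(a, b):
    """Return True if a.values and b.values are views on the same base array.

    Hypothetical helper. Guards against the case where both bases are None,
    since None is None would otherwise look like shared data.
    """
    base_a = a.values.base
    base_b = b.values.base
    return base_a is not None and base_a is base_b


df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=["row1", "row2"], columns=["a", "b", "c", "d"])
df2 = df.iloc[0:2, :]

print(same_base(df, df2))
```

Note that sharing a base does not by itself prove the two arrays overlap in memory; numpy's np.shares_memory is a stricter test if that distinction matters.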
Thanks everyone!