Pandas DataFrame mutability

Great question, thanks. I ended up playing around a bit after reading the other answers, so I want to share what I found.

Here is some code to experiment with:

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print('start', df, sep='\n', end='\n\n')

def testAddCol(df):
    # Experiment with this line. Alternatives to try: df = df.copy(),
    # df = df.iloc[:2, :2], df = df.iloc[:2, :2].copy(), or no reassignment at all.
    df = pd.DataFrame(df, copy=True)
    df['newCol'] = 11    # add a column
    df.iloc[0, 0] = 100  # overwrite one existing value
    return df

df2 = testAddCol(df)
print('df', df, sep='\n', end='\n\n')
print('df2', df2, sep='\n', end='\n\n')

Output (with the copy=True line as written above):

start
   0  1  2
0  1  2  3
1  4  5  6

df
   0  1  2
0  1  2  3
1  4  5  6

df2
     0  1  2  newCol
0  100  2  3      11
1    4  5  6      11
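
To try several of the variants from the comment in one run, something like the following sketch may help. The helper name original_changed and the strategies dictionary are made up for illustration and are not part of pandas. The un-copied slice variant (df.iloc[:2, :2]) is left out on purpose, because its behaviour depends on the pandas version and its copy-on-write setting.

```python
import pandas as pd

def original_changed(make_working_frame):
    """Return True if mutating the working frame also changed the original."""
    original = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
    snapshot = original.copy()  # independent copy kept only for comparison
    working = make_working_frame(original)
    working["newCol"] = 11      # same mutations as in testAddCol
    working.iloc[0, 0] = 100
    return not original.equals(snapshot)

# Hypothetical labels mapped to the ways of obtaining the working frame.
strategies = {
    "pd.DataFrame(df, copy=True)": lambda df: pd.DataFrame(df, copy=True),
    "df.copy()": lambda df: df.copy(),
    "df.iloc[:2, :2].copy()": lambda df: df.iloc[:2, :2].copy(),
    "nothing (same object)": lambda df: df,
}

results = {name: original_changed(make) for name, make in strategies.items()}
for name, changed in results.items():
    print(f"{name}: original changed = {changed}")
```

The three explicit copies leave the original alone, while working on the same object changes it.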

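This line:

df2 = pd.DataFrame(df1)

constructs a new DataFrame object. The constructor has a copy parameter whose default is False in older pandas versions (in newer versions the default is None, which behaves like False for DataFrame input). According to the documentation, it means: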

> Copy data from inputs. Only affects DataFrame / 2d ndarray input

So data will be shared between df2 and df1 by default. If you want there to be no sharing, but rather a complete copy, do this:

df2 = pd.DataFrame(df1, copy=True)

Or more concisely and idiomatically:

df2 = df1.copy()

If you do this:

df2 = df1.iloc[2:3,1:2].copy()

You will again get an independent copy. But if you do this:

df2 = pd.DataFrame(df1.iloc[2:3,1:2])

It will probably share the data, but this style makes it unclear whether you intend modifications of df2 to affect df1, so I suggest not writing such code. Instead, if you do not want a copy, just write:

df2 = df1.iloc[2:3,1:2]

In summary: if you want a reference to existing data, do not call pd.DataFrame() or any other method at all. If you want an independent copy, call .copy().
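
If you would rather check than guess, here is a rough sketch for inspecting whether two frames are backed by the same memory. The helper shares_data is a made-up name, and the approach assumes a frame with a single dtype, where to_numpy() can usually return a view instead of a copy. Only the .copy() result is certain to be False; the other two answers depend on the pandas version and its copy-on-write setting, so no expected output is given for them.

```python
import numpy as np
import pandas as pd

def shares_data(a, b):
    """Best-effort check: do the two frames' underlying arrays overlap in memory?

    Assumes single-dtype frames, where to_numpy() typically returns a view.
    """
    return bool(np.shares_memory(a.to_numpy(), b.to_numpy()))

df1 = pd.DataFrame(np.arange(12).reshape(4, 3))

wrapped = pd.DataFrame(df1)       # constructor without copy=True
sliced = df1.iloc[2:3, 1:2]       # plain slice, no copy
copied = df1.iloc[2:3, 1:2].copy()

print("pd.DataFrame(df1) shares data:", shares_data(df1, wrapped))
print("iloc slice shares data:       ", shares_data(df1, sliced))
print("iloc slice .copy() shares data:", shares_data(df1, copied))  # False
```

Note that sharing memory does not by itself mean writes propagate: with copy-on-write enabled, pandas may share the buffer until one of the frames is modified.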


> It will probably share the data, but this style is pretty unclear if you intend to modify df, so I suggest not writing such code. Instead, if you want no copy, just say this:
>
> df2 = df1.iloc[2:3,1:2]
>
> In summary: if you want a reference to existing data, do not call pd.DataFrame() or any other method at all. If you want an independent copy, call .copy().

I do not agree. The slice above still returns a reference to the sliced section of the original DataFrame, so changes made to df2 can be reflected in df1 (whether they actually propagate depends on the pandas version and its copy-on-write setting).

If you want df2 to be independent, use .copy() instead:

df2 = df1.iloc[2:3,1:2].copy()
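
A minimal check of that, assuming a 4x3 example frame (my own choice, so that the slice iloc[2:3, 1:2] is not empty):

```python
import numpy as np
import pandas as pd

# 4x3 frame so that iloc[2:3, 1:2] selects exactly one cell (value 7).
df1 = pd.DataFrame(np.arange(12).reshape(4, 3))

df2 = df1.iloc[2:3, 1:2].copy()  # independent copy of the slice
df2.iloc[0, 0] = -1              # modify only the copy

print(df1.iloc[2, 1])  # 7: the original is untouched
print(df2.iloc[0, 0])  # -1
```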