Why use DataFrame.assign rather than simply initializing a new column?
The premise of assign is that it returns:
A new DataFrame with the new columns in addition to all the existing columns.
It also cannot do anything in-place to change the original DataFrame:
The callable must not change input DataFrame (though pandas doesn't check it).
On the other hand, df['ln_A'] = np.log(df['A']) does things in-place.
So is there a reason I should stop using my old method in favour of df.assign?
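To make the comparison concrete, here is a minimal sketch of the two approaches side by side. The three-row frame and the variable names (with_log, cols_after_assign, cols_after_setitem) are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values are an assumption, not from the question).
df = pd.DataFrame({"A": [1.0, 2.0, 4.0]})

# assign returns a new frame; the callable receives the frame and only reads from it.
with_log = df.assign(ln_A=lambda d: np.log(d["A"]))
cols_after_assign = list(df.columns)    # df still has only column "A"

# Bracket assignment adds the column to df itself.
df["ln_A"] = np.log(df["A"])
cols_after_setitem = list(df.columns)   # df now has "A" and "ln_A"
```

Both routes compute the same values; they differ only in which object ends up holding the new column.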
I think you can try df.assign, but if you are doing memory-intensive work, it is better to stick with what you did before, or to use operations with inplace=True.
The difference concerns whether you wish to modify an existing frame, or create a new frame while maintaining the original frame as it was.
In particular, DataFrame.assign returns a new object that has a copy of the original data with the requested changes ... the original frame remains unchanged.
In your particular case:
>>> import numpy as np; import pandas as pd
>>> df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)})
Now suppose you wish to create a new frame in which A is everywhere 1 without destroying df. Then you could use .assign:
>>> new_df = df.assign(A=1)
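As a quick check of that claim, the following sketch rebuilds the same frame and inspects both objects after the call. The helper names original_A, replaced_A and same_object are only there for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(1, 11), "B": np.random.randn(10)})
new_df = df.assign(A=1)

original_A = df["A"].tolist()       # df keeps its original values in A
replaced_A = new_df["A"].tolist()   # new_df has A overwritten with 1 in every row
same_object = new_df is df          # assign hands back a different object
```

Column B is carried over to new_df unchanged, so only A differs between the two frames.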
If you do not wish to maintain the original values, then clearly df["A"] = 1 will be more appropriate. This also explains the speed difference: by necessity, .assign must copy the data, while [...] assignment does not.
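If you want to measure the difference on your own machine, a rough sketch with timeit could look like this. The frame size, the repeat count and the function names are arbitrary assumptions, and the numbers will depend on your pandas version and settings (for example, whether Copy-on-Write is active), so no expected timings are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical frame size, chosen only for illustration.
n = 100_000
base = pd.DataFrame({"A": np.arange(1, n + 1), "B": np.random.randn(n)})

def with_assign():
    # Builds and returns a new frame on every call; base is not modified.
    return base.assign(A=1)

def with_setitem():
    # Overwrites column A on base itself.
    base["A"] = 1
    return base

t_assign = timeit.timeit(with_assign, number=50)
t_setitem = timeit.timeit(with_setitem, number=50)
print(f"assign:  {t_assign:.4f} s for 50 calls")
print(f"setitem: {t_setitem:.4f} s for 50 calls")
```

Note that with_setitem changes base, so run with_assign first if you also want to inspect the original values afterwards.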