What is the best way to remove columns in pandas?
According to the docs:
DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
And pandas.DataFrame.drop:
Drop specified labels from rows or columns.
So, I think we should stick with df.drop. Why? I think the pros are:
It gives us more control of the remove action:
# This returns a NEW DataFrame object and leaves the original `df` untouched.
df.drop('a', axis=1)
# This modifies `df` in place. **And returns `None`**.
df.drop('a', axis=1, inplace=True)
It can handle more complicated cases with its arguments. E.g. with level, we can handle MultiIndex deletion, and with errors, we can choose whether a missing label raises a KeyError or is ignored, which can prevent some bugs.
It's a more unified and object-oriented way.
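To make the level and errors arguments concrete, here is a minimal sketch. The sample frames and the names dropped_inner, unchanged and raised are made up for illustration and are not from the answer above.

```python
import pandas as pd

# Hypothetical sample data: the columns form a two-level MultiIndex.
columns = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "b"), ("y", "a")], names=["outer", "inner"]
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# `level` picks the MultiIndex level the label is matched against:
# this drops every column whose "inner" label is "a".
dropped_inner = df.drop("a", axis=1, level="inner")

# `errors="ignore"` skips labels that do not exist instead of raising.
flat = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
unchanged = flat.drop("missing", axis=1, errors="ignore")

# With the default `errors="raise"`, a missing label raises KeyError,
# which surfaces typos in column names early.
try:
    flat.drop("missing", axis=1)
    raised = False
except KeyError:
    raised = True
```

Whether to raise or ignore is a judgment call: raising is usually safer in pipelines where a missing column signals a problem upstream.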
And just like @jezrael noted in his answer:
Option 1: Using the keyword del is a limited way.
Option 3: df=df[['b','c']] isn't even a deletion in essence. It first selects data by indexing with the [] syntax, then unbinds the name df from the original DataFrame and binds it to the new one (i.e. df[['b','c']]).
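The difference between deleting and rebinding can be seen by keeping a second reference to the same object. This is only an illustrative sketch; the name alias and the sample data are assumptions, not part of the original answers.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
alias = df  # a second name bound to the same DataFrame object

# `del` mutates the object itself, so the change is visible through `alias`.
del df["a"]
alias_after_del = list(alias.columns)

# Indexing builds a new DataFrame and rebinds only the name `df`;
# the object that `alias` refers to keeps its remaining columns.
df = df[["c"]]
same_object = df is alias
alias_after_select = list(alias.columns)
```

After the last step, df and alias refer to different objects, which is why the selection form is better described as creating a subset than as deleting a column.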
The recommended way to delete a column or row in pandas dataframes is using drop.
To delete a column,
df.drop('column_name', axis=1, inplace=True)
To delete a row,
df.drop('row_index', axis=0, inplace=True)
You can refer to this post for a detailed discussion of column deletion approaches.
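As a variation on the two calls above, drop also accepts columns= and index= keywords in place of a label plus axis. The sketch below uses a made-up frame and returns new objects instead of using inplace=True.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)

# Equivalent to df.drop(["a", "b"], axis=1); returns a new DataFrame.
without_cols = df.drop(columns=["a", "b"])

# Equivalent to df.drop("r2", axis=0).
without_row = df.drop(index="r2")

# Rows and columns can be dropped in a single call.
both = df.drop(index="r1", columns="c")
```

Since none of these calls use inplace=True, the original df still has all three rows and columns afterwards.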
From a speed perspective, option 1 seems to be the best. Obviously, based on the other answers, that doesn't mean it's actually the best option.
In [52]: import timeit
In [53]: s1 = """
...: import pandas as pd
...: df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
...: del df['a']
...: """
In [54]: s2 = """
...: import pandas as pd
...: df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
...: df=df.drop('a',axis=1)
...: """
In [55]: s3 = """
...: import pandas as pd
...: df=pd.DataFrame({'a':[1,2,3,4,5],'b':[6,7,8,9,10],'c':[11,12,13,14,15]})
...: df=df[['b','c']]
...: """
In [56]: timeit.timeit(stmt=s1, number=100000)
Out[56]: 53.37321400642395
In [57]: timeit.timeit(stmt=s2, number=100000)
Out[57]: 79.68139410018921
In [58]: timeit.timeit(stmt=s3, number=100000)
Out[58]: 76.25269913673401
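Note that each timed statement above also rebuilds the DataFrame, so construction cost is included in all three numbers. A possible way to narrow the measurement is sketched below: the frame is built once and each function works on a copy (needed because del mutates its input), with a copy-only baseline for comparison. The function names and the iteration count are assumptions, and results will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Built once, outside the timed code.
base = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                     "b": [6, 7, 8, 9, 10],
                     "c": [11, 12, 13, 14, 15]})


def copy_only():
    # Baseline: the cost of the copy that every variant pays.
    return base.copy()


def with_del():
    df = base.copy()
    del df["a"]
    return df


def with_drop():
    df = base.copy()
    return df.drop("a", axis=1)


def with_select():
    df = base.copy()
    return df[["b", "c"]]


# timeit accepts a zero-argument callable as the statement.
timings = {
    func.__name__: timeit.timeit(func, number=1000)
    for func in (copy_only, with_del, with_drop, with_select)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Subtracting the copy_only time from the others gives a rough estimate of the cost of each removal approach on its own.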