How to remove a subset of a data frame in Python?
As you seem to be unable to post a representative example, I will demonstrate one approach using merge with the param indicator=True:
So generate some data:
In [116]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df
Out[116]:
          a         b         c
0 -0.134933 -0.664799 -1.611790
1  1.457741  0.652709 -1.154430
2  0.534560 -0.781352  1.978084
3  0.844243 -0.234208 -2.415347
4 -0.118761 -0.287092  1.179237
Take a subset:
In [118]:
df_subset=df.iloc[2:3]
df_subset
Out[118]:
         a         b         c
2  0.53456 -0.781352  1.978084
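Now perform a left merge with the param indicator=True. Since no on= columns are given, merge matches rows on all the columns the two frames share (a, b and c here). The indicator adds a _merge column which indicates whether each row is left_only, both or right_only (the latter won't appear in this example), and we filter the merged df to keep only the left_only rows: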
In [121]:
df_new = df.merge(df_subset, how='left', indicator=True)
df_new = df_new[df_new['_merge'] == 'left_only']
df_new
Out[121]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
Here is the merged df before filtering:
In [122]:
df.merge(df_subset, how='left', indicator=True)
Out[122]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
2  0.534560 -0.781352  1.978084       both
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
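A minimal, self-contained sketch of the same merge-based approach, using small fixed integers instead of random floats so the result is reproducible. The helper name remove_rows is hypothetical; it simply wraps the merge, the left_only filter and the removal of the helper _merge column.

```python
import pandas as pd


def remove_rows(df: pd.DataFrame, subset: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of df that do not appear in subset (hypothetical helper).

    Rows are matched on all columns the two frames share, because no on=
    argument is passed to merge.
    """
    merged = df.merge(
        subset,
        how="left",      # keep every row of df
        indicator=True,  # adds '_merge': 'left_only', 'both' or 'right_only'
    )
    # Keep rows found only in df, then drop the helper column.
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})
df_subset = df.iloc[2:3]

print(remove_rows(df, df_subset))
```

Because merge joins on columns, the result gets a fresh default index; the labels 0, 1, 3, 4 line up with df's here only because df already uses the default index. Matching is also exact on every shared column, which suits a subset sliced from df but may miss rows whose float values were recomputed.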
The pandas cheat sheet also suggests the following technique:
adf[~adf.x1.isin(bdf.x1)]
where x1 is the column being compared and adf is the DataFrame from which the rows whose x1 value also appears in bdf are removed.
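A small runnable illustration of the isin filter; the frames adf and bdf and their values are made up for the example. Only the x1 column is compared, so a row of adf is removed whenever its x1 value appears anywhere in bdf.x1, regardless of the other columns.

```python
import pandas as pd

# Hypothetical data: adf is the frame to filter, bdf supplies the x1 values to remove.
adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": [True, False, True]})

# isin gives True for rows of adf whose x1 appears in bdf.x1;
# ~ negates the mask, so only rows with x1 NOT in bdf are kept.
result = adf[~adf.x1.isin(bdf.x1)]
print(result)
```

Unlike the merge approach, boolean indexing keeps adf's original index labels (only label 2 remains here).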
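The particular question asked by the OP can also be solved by dropping the subset's index labels from df (assuming df1 is the OP's subset, taken from df so that it shares df's index labels):
new_df = df.drop(df1.index)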
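A minimal sketch of the index-based version, assuming that df1 was sliced from df (so its index labels refer to the same rows) and that df's index labels are unique. The sample values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# Assumption: df1 is a slice of df, so it carries df's own index labels.
df1 = df.iloc[2:3]

# Drop every row of df whose index label occurs in df1.index.
new_df = df.drop(df1.index)
print(new_df)
```

This compares labels, not values: if df1 was built independently with its own index, drop would remove whichever rows of df happen to carry those labels, or raise a KeyError for labels df does not have.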