Deleting rows which sum to zero in 1 column but are otherwise duplicates in pandas

I put some comments in the code, so hopefully my line of thought is clear:

import numpy as np

# temporary column holding the absolute value of Val2
cond = df.assign(temp=df.Val2.abs())
# sorting on it places values that differ only by their sign
# next to each other
cond = cond.sort_values(["ID", "Val1", "temp"])

# cumsum should yield a zero for numbers that are different
# only by their sign
cond["check"] = cond.groupby(["ID", "temp"]).Val2.cumsum()
cond["check"] = np.where(cond.check != 0, np.nan, cond.check)

# the backward fill here allows us to assign an identifier
# to the two values that summed to zero
cond["check"] = cond["check"].bfill(limit=1)

# this is where we implement your other condition:
# drop rows that are duplicates on ID and Val1
# and belong to a pair whose Val2 values sum to zero
cond.loc[
    ~(cond.duplicated(["ID", "Val1"], keep=False) & (cond.check == 0)),
    ["ID", "Val1", "Val2"],
]



     ID  Val1  Val2
2  A001     2    50
3  A002     5   -40
4  A002     6    40
6  A003     8   -50
5  A003     8    60
9  A004     3    10
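
If you want to try this end to end, here is a minimal sketch that wraps the same steps in a function. The sample `df` is an assumption on my part (the input frame is not shown here); I picked values that should give the same rows as the result above. The name `drop_cancelling_duplicates` is hypothetical as well, it is just a convenient wrapper.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the original df is not shown,
# so these values are assumed for illustration only.
df = pd.DataFrame(
    {
        "ID": ["A001", "A001", "A001", "A002", "A002",
               "A003", "A003", "A004", "A004", "A004"],
        "Val1": [2, 2, 2, 5, 6, 8, 8, 3, 3, 3],
        "Val2": [-40, 40, 50, -40, 40, 60, -50, 10, -10, 10],
    }
)


def drop_cancelling_duplicates(frame):
    """Hypothetical helper: same steps as in the answer, wrapped in a function."""
    # absolute value lets +x and -x sort next to each other
    cond = frame.assign(temp=frame["Val2"].abs())
    cond = cond.sort_values(["ID", "Val1", "temp"])

    # running sum per (ID, temp); it hits 0 when a pair cancels out
    running = cond.groupby(["ID", "temp"])["Val2"].cumsum()
    cond["check"] = np.where(running != 0, np.nan, running)

    # mark the row just before each 0 as part of the same pair
    cond["check"] = cond["check"].bfill(limit=1)

    # drop rows that are duplicated on (ID, Val1) AND part of a cancelling pair
    mask = cond.duplicated(["ID", "Val1"], keep=False) & (cond["check"] == 0)
    return cond.loc[~mask, ["ID", "Val1", "Val2"]]


result = drop_cancelling_duplicates(df)
print(result)
```

Note that rows 3 and 4 (A002) also cancel each other, but they survive because their Val1 values differ, so they are not duplicates on ID and Val1.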

