Pandas - Groupby with conditional formula

Use only one condition if never values in columns SibSp and Parch are less as 0:

m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)

df = df.groupby(np.where(m1, 'Has Family', 'No Family'))['Survived'].mean()
print (df)
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64

If is impossible use first use both conditions:

m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
m2 = (df['SibSp'] == 0) & (df['Parch'] == 0)
a = np.where(m1, 'Has Family', 
    np.where(m2, 'No Family', 'Not'))

df = df.groupby(a)['Survived'].mean()
print (df)
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64

An easy way to group that is to use the sum of those two columns. If either of them is positive, the result will be greater than 1. And groupby accepts an arbitrary array as long as the length is the same as the DataFrame's length so you don't need to add a new column.

family = np.where((df['SibSp'] + df['Parch']) >= 1 , 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out: 
Has Family    0.5
No Family     1.0
Name: Survived, dtype: float64

You could define your conditions in a list and use the function group_by_condition below to create a filtered list for each condition. Afterwards you can select the resulting items using pattern matching:

df = [
  {"Survived": 0, "SibSp": 1, "Parch": 0},
  {"Survived": 1, "SibSp": 1, "Parch": 0},
  {"Survived": 1, "SibSp": 0, "Parch": 0}]

conditions = [
  lambda x: (x['SibSp'] > 0) or (x['Parch'] > 0),  # has family
  lambda x: (x['SibSp'] == 0) and (x['Parch'] == 0)  # no family
]

def group_by_condition(l, conditions):
    return [[item for item in l if condition(item)] for condition in conditions]

[has_family, no_family] = group_by_condition(df, conditions)

Pandas - Groupby with conditional formula

Tags:

Python

Pandas

Conditional Statements

Dataframe

Pandas Groupby

Related

Recent Posts