Keep indices in Pandas DataFrame with a certain number of non-NaN entires
Let us try filter
out = df.groupby(level=0).filter(lambda x : x.isna().sum()<=1)
X
b 1.0
b 1.0
b NaN
c 1.0
c 1.0
c 1.0
Or we do isin
df[df.index.isin(df.isna().sum(level=0).loc[lambda x : x['X']<=1].index)]
X
b 1.0
b 1.0
b NaN
c 1.0
c 1.0
c 1.0
As another option, let's try filtering via GroupBy.transform
and boolean indexing:
df1[df1['X'].isna().groupby(df1.index).transform('sum') <= 1]
X
b 1.0
b 1.0
b NaN
c 1.0
c 1.0
c 1.0
Or, almost the same way,
df1[df1.assign(X=df1['X'].isna()).groupby(level=0)['X'].transform('sum') <= 1]
X
b 1.0
b 1.0
b NaN
c 1.0
c 1.0
c 1.0
You might have a good shot at getting this to work with Dask too.
I am new to dask , looked at some examples and docs , however the following seems to work;
from dask import dataframe as dd
sd = dd.from_pandas(df1, npartitions=3)
#converts X to boolean checking for isna() and the groupby on index and sum
s = sd.X.isna().groupby(sd.index).sum().compute()
#using the above we can boolean index to check if sum is less than 2 , then use loc
out_dd = sd.loc[list(s[s<2].index)]
out_dd.head(6,npartitions=-1)
X
b 1.0
b 1.0
b NaN
c 1.0
c 1.0
c 1.0