A function to return the frequency counts of all or specific columns
Based on your comment, you just want to return a list of DataFrames:
def count_all_columns_freq(df):
    return [df.groupby(column).size().reset_index(name="total")
            for column in df]
You can select columns in many ways in pandas, e.g. by slicing or by passing a list of columns like in df[['colA', 'colB']]. You don't need to change the function for that.
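As a minimal sketch of that idea, the example below calls the same function on a subset of columns. The DataFrame and its column names (colA, colB, colC) are made up for illustration; only the function itself comes from the answer above.

```python
import pandas as pd


def count_all_columns_freq(df):
    # One frequency table (value, total) per column of the frame passed in
    return [df.groupby(column).size().reset_index(name="total")
            for column in df]


# Hypothetical example data, not from the original question
df = pd.DataFrame({
    "colA": ["x", "y", "x"],
    "colB": [1, 1, 2],
    "colC": ["p", "q", "r"],
})

# Select the columns first, then pass the result to the unchanged function
selected = count_all_columns_freq(df[["colA", "colB"]])

# Label-based slicing works the same way (both end points are included)
sliced = count_all_columns_freq(df.loc[:, "colA":"colB"])

for table in selected:
    print(table)
```

The selection happens before the call, so the function stays generic and only ever sees the columns you care about.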
Personally, I would return a dictionary instead:
def frequency_dict(df):
    return {column: df.groupby(column).size()
            for column in df}

# so that I could use it like this:
freq = frequency_dict(df)
freq['someColumn'].loc[value]
EDIT: "What if I want to count the number of NaN?"
In that case, you can pass dropna=False to groupby (this works for pandas >= 1.1.0):
def count_all_columns_freq(df):
    return [df.groupby(column, dropna=False).size().reset_index(name="total")
            for column in df]
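To see the effect of dropna=False, here is a small sketch on made-up data (the column name colA and its values are assumptions for illustration). It needs pandas >= 1.1.0, as noted above.

```python
import numpy as np
import pandas as pd


def count_all_columns_freq(df):
    # dropna=False keeps NaN as its own group instead of discarding it
    return [df.groupby(column, dropna=False).size().reset_index(name="total")
            for column in df]


# Hypothetical example data with two missing values
df = pd.DataFrame({"colA": ["x", np.nan, "x", np.nan, "y"]})

counts = count_all_columns_freq(df)[0]
print(counts)

# Pick out the NaN row with isna(), since NaN never compares equal to itself
nan_total = counts.loc[counts["colA"].isna(), "total"].iloc[0]
print(nan_total)
```

With the NaN group included, the totals add up to the number of rows in the frame, which is a quick sanity check.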