impute missing values in python code example
Example 1: replace missing values, encoded as np.nan, using the mean value of the columns
# Univariate feature imputation
import numpy as np
from sklearn.impute import SimpleImputer
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit([[1, 2], [np.nan, 3], [7, 6]])
# SimpleImputer()
X = [[np.nan, 2], [6, np.nan], [7, 6]]
print(imp.transform(X))
# [[4. 2. ]
# [6. 3.666...]
# [7. 6. ]]
# SimpleImputer class also supports categorical data
import pandas as pd
df = pd.DataFrame([["a", "x"],
[np.nan, "y"],
["a", np.nan],
["b", "y"]], dtype="category")
imp = SimpleImputer(strategy="most_frequent")
print(imp.fit_transform(df))
# [['a' 'x']
# ['a' 'y']
# ['a' 'y']
# ['b' 'y']]
Example 2: how to check missing values in python
# Total missing values for each featureprint df.isnull().sum()Out:ST_NUM 2ST_NAME 0OWN_OCCUPIED 2NUM_BEDROOMS 4