Displaying pair plot in Pandas data frame
This code worked for me using Python 3.5.2:
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter/IPython magic that renders plots inline; omit it in a plain .py script
%matplotlib inline
from sklearn import datasets
iris_dataset = datasets.load_iris()
X = iris_dataset.data
Y = iris_dataset.target
iris_dataframe = pd.DataFrame(X, columns=iris_dataset.feature_names)
# Create a scatter matrix from the dataframe, colored by the class labels in Y
grr = pd.plotting.scatter_matrix(iris_dataframe, c=Y, figsize=(15, 15), marker='o',
                                 hist_kwds={'bins': 20}, s=60, alpha=.8)
For pandas versions earlier than 0.20.0, call the function as pd.scatter_matrix instead.
Thanks to michael-szczepaniak for pointing out that this older API has since been deprecated.
grr = pd.scatter_matrix(iris_dataframe, c=Y, figsize=(15, 15), marker='o',
                        hist_kwds={'bins': 20}, s=60, alpha=.8)
I just had to remove the cmap=mglearn.cm3 argument, because I could not get mglearn to work: there is a version mismatch issue with sklearn.
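If you still want distinct colors per class without mglearn, one option is to pass a matplotlib colormap of your own. This is only a sketch: the three colors and the name three_colors are my own picks for illustration, not mglearn's palette. It relies on scatter_matrix forwarding extra keyword arguments such as cmap to the underlying scatter call.

```python
import pandas as pd
from matplotlib.colors import ListedColormap
from sklearn import datasets

iris_dataset = datasets.load_iris()
Y = iris_dataset.target
iris_dataframe = pd.DataFrame(iris_dataset.data, columns=iris_dataset.feature_names)

# Hypothetical three-color map, one color per iris class (not mglearn.cm3)
three_colors = ListedColormap(['#1f77b4', '#d62728', '#2ca02c'])

# cmap is passed through to the scatter plots; c=Y selects the color per point
grr = pd.plotting.scatter_matrix(iris_dataframe, c=Y, cmap=three_colors,
                                 figsize=(15, 15), marker='o',
                                 hist_kwds={'bins': 20}, s=60, alpha=.8)
```

Any built-in colormap name (for example cmap='viridis') should work the same way.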
To save the image directly to a file instead of displaying it, call this after creating the plot:
plt.savefig('foo.png')
Also remove (or comment out) the inline magic:
# %matplotlib inline
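Put together as a plain script, that could look roughly like the sketch below. Selecting the non-interactive Agg backend is my own addition, on the assumption that the script may run on a machine with no display; the file name foo.png is taken from above, while the dpi and bbox_inches values are just illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: nothing is shown on screen
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

iris_dataset = datasets.load_iris()
iris_dataframe = pd.DataFrame(iris_dataset.data, columns=iris_dataset.feature_names)

axes = pd.plotting.scatter_matrix(iris_dataframe, c=iris_dataset.target,
                                  figsize=(15, 15), marker='o',
                                  hist_kwds={'bins': 20}, s=60, alpha=.8)

# All subplots belong to one figure; save that figure, then free its memory
fig = axes[0, 0].get_figure()
fig.savefig('foo.png', dpi=100, bbox_inches='tight')
plt.close(fig)
```

Grabbing the figure from one of the returned axes avoids depending on whichever figure pyplot currently considers active.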
Just an update to Vikash's excellent answer. The last two lines should now be:
grr = pd.plotting.scatter_matrix(iris_dataframe, c=Y, figsize=(15, 15), marker='o',
                                 hist_kwds={'bins': 20}, s=60, alpha=.8)
The scatter_matrix function has been moved to the pandas.plotting module, so the original answer, while correct, is now deprecated.
So the complete code would now be:
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter/IPython magic that renders plots inline; omit it in a plain .py script
%matplotlib inline
from sklearn import datasets
iris_dataset = datasets.load_iris()
X = iris_dataset.data
Y = iris_dataset.target
iris_dataframe = pd.DataFrame(X, columns=iris_dataset.feature_names)
# Create a scatter matrix from the dataframe, colored by the class labels in Y
grr = pd.plotting.scatter_matrix(iris_dataframe, c=Y, figsize=(15, 15), marker='o',
                                 hist_kwds={'bins': 20}, s=60, alpha=.8)
This is also possible using seaborn:
import seaborn as sns
df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")