Pairwise Scatter Plots with Histograms and Correlations
trellisPlot[data, DataTicks -> Automatic, DataSpacing->.1, DataLabels -> labels,
ImageSize -> 500, PlotStyle -> colors]
The picture above is produced in two steps. First, the function pwScatterPlot produces a matrix of scatter plots with histograms on the diagonal; then the function addFitsAndCorrelations adds linear fits and correlations (both functions are defined below). The function trellisPlot is the composition of addFitsAndCorrelations and pwScatterPlot.
We combine PairwiseScatterPlot from the "StatisticalPlots`" package with some post-processing steps to get a function that takes the same arguments and options as PairwiseScatterPlot and puts histograms on the diagonal of the panel.
The setting for the PlotStyle option should be an n-by-n matrix of directives (where n is the number of series in the input data) whose diagonal entries differ from all off-diagonal ones, because the post-processing step identifies the diagonal panels by their directives.
ClearAll[addHistograms, pwScatterPlot]
Needs["StatisticalPlots`"]
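(* addHistograms: collect the directive of every {directive, points} group, pick the diagonal ones, and replace those groups with an inset histogram of the x-coordinates *)
addHistograms = Module[{diag = Diagonal[Partition[Cases[#, {dir_, ___Point} :> dir, All],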
Round @ PlotRange[#][[1, 2]]]]},
# /. {c : Alternatives@@diag, p__Point} :> Module[{xy = Cases[{p}, Point[x_] :> x]},
Inset[Histogram[xy[[All, 1]], ChartStyle->c, AspectRatio->1/GoldenRatio, Axes->False],
Min /@ Transpose @ xy, {Left, Bottom}, {.9, .9}]]] &;
pwScatterPlot = addHistograms @* PairwiseScatterPlot;
Example:
SeedRandom[1]
data = RandomVariate[NormalDistribution[10, 5], {500, 5}].RandomReal[{-2, 2}, {5, 5}];
labels = StringTake[RandomWord["Noun", 5], 4];
One possible way to produce a matrix of styles with diagonal entries different from off-diagonal ones:
mat = # + Transpose[UpperTriangularize[#, 1]] & @
PadLeft[TakeList[Range[# + (# - 1) #/2], Reverse@Range[#]]] &;
colors = Map[ColorData[97], mat[Dimensions[data][[2]]], {-1}];
pwScatterPlot[data, DataTicks -> Automatic,
DataLabels -> labels, ImageSize -> 500, PlotStyle -> colors]
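To see what the style-index matrix looks like, here is mat evaluated for three series. This is only a small check; the definition of mat is repeated from above so that the snippet runs on its own.

```wolfram
(* same definition as above *)
mat = # + Transpose[UpperTriangularize[#, 1]] & @
   PadLeft[TakeList[Range[# + (# - 1) #/2], Reverse@Range[#]]] &;
mat[3]
(* {{1, 2, 3}, {2, 4, 5}, {3, 5, 6}} *)
```

The matrix is symmetric, and the diagonal indices (1, 4, 6) do not occur off the diagonal. After mapping ColorData[97] over it, the diagonal colors therefore differ from the off-diagonal ones, as long as the color scheme returns distinct colors for those indices.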
With another layer of post-processing we can add linear fit lines and correlation labels:
ClearAll[addFitsAndCorrelations, trellisPlot]
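(* for every {directive, points} group: keep the points, add the rounded correlation as a text label, and add the fitted line from LinearModelFit *)
addFitsAndCorrelations = # /. {dir_, p__Point} :>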
Module[{xy = Cases[{p}, Point[x_] :> x]}, {dir, p,
Text[Style[Round[Correlation @@ Transpose[xy], .001], Black, FontSize -> Scaled[.025]],
Offset[{5, -15}, Floor[Min /@ Transpose[xy] + {0, 1}]], Left],
First @ Plot[Evaluate @ LinearModelFit[xy, t, t][t],
{t, Min[xy[[All, 1]]], Max[xy[[All, 1]]]}, PlotStyle -> Directive[Thick, Black]]}] &;
trellisPlot = addFitsAndCorrelations @* pwScatterPlot;
trellisPlot[data, DataTicks -> Automatic, DataLabels -> labels,
ImageSize -> 500, PlotStyle -> colors]
The result is the picture at the top.
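One way to sanity-check the correlation labels, assuming each off-diagonal panel shows one pair of columns of data, is to compute the full correlation matrix directly. Correlation applied to a matrix returns the correlations between its columns. The name corr is introduced here only for illustration, and the data definition is repeated from the example above.

```wolfram
(* same data as in the example above *)
SeedRandom[1]
data = RandomVariate[NormalDistribution[10, 5], {500, 5}].RandomReal[{-2, 2}, {5, 5}];

(* pairwise column correlations, rounded the same way as the panel labels *)
corr = Round[Correlation[data], .001];
MatrixForm[corr]
```

The off-diagonal entries of corr can then be compared with the numbers printed in the corresponding panels.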
You can use the function VariableDependenceGrid from the package "MathematicaForPredictionUtilities.m".
?VariableDependenceGrid
"VariableDependenceGrid[data_?MatrixQ,columnNames,opts] makes a grid with variable dependence plots."
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MathematicaForPredictionUtilities.m"]
wineData = (Flatten@*List) @@@ ExampleData[{"MachineLearning", "WineQuality"}, "Data"];
wineColumnNames = (Flatten@*List) @@ ExampleData[{"MachineLearning", "WineQuality"}, "VariableDescriptions"];
VariableDependenceGrid[RandomSample[wineData, 200][[All, 1 ;; -1 ;; 3]], wineColumnNames[[1 ;; -1 ;; 3]]]
The function VariableDependenceGrid also plots dependencies that involve categorical variables, and it works with Dataset objects. Here is an example:
titanicData = (Flatten@*List) @@@ ExampleData[{"MachineLearning", "Titanic"}, "Data"];
titanicColumnNames = (Flatten@*List) @@ ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"];
dsTitanic = Dataset[titanicData][All, AssociationThread[titanicColumnNames -> #] &];
VariableDependenceGrid[dsTitanic]