Pairwise Scatter Plots with Histograms and Correlations

trellisPlot[data, DataTicks -> Automatic, DataSpacing->.1, DataLabels -> labels, 
      ImageSize -> 500, PlotStyle -> colors]

[Image: pairwise scatter plot matrix with histograms on the diagonal, linear fit lines, and correlation values]

The picture above is produced in two steps. First, the function pwScatterPlot produces a pairwise scatter plot with histograms on the diagonal; then the function addFitsAndCorrelations adds linear fits and correlations (both functions are defined below). The function trellisPlot is the composition of addFitsAndCorrelations and pwScatterPlot.

We combine PairwiseScatterPlot from the "StatisticalPlots`" package with some post-processing steps to get a function that takes the same arguments and options as PairwiseScatterPlot and puts histograms on the diagonal of the panel grid.

The setting for the PlotStyle option should be an n-by-n matrix of directives (where n is the number of series in the input data) whose diagonal entries differ from all off-diagonal ones. This matters because addHistograms identifies the diagonal panels by their style directives.

ClearAll[addHistograms, pwScatterPlot] 
Needs["StatisticalPlots`"]      
addHistograms = Module[{diag = Diagonal[Partition[Cases[#, {dir_, ___Point} :> dir, All],  
      Round @ PlotRange[#][[1, 2]]]]}, 
  # /. {c : Alternatives@@diag, p__Point} :> Module[{xy = Cases[{p}, Point[x_] :> x]},
   Inset[Histogram[xy[[All, 1]], ChartStyle->c, AspectRatio->1/GoldenRatio, Axes->False],
     Min /@ Transpose @ xy, {Left, Bottom}, {.9, .9}]]] &; 

pwScatterPlot = addHistograms @* PairwiseScatterPlot;
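
Because pwScatterPlot is defined with @* (Composition), every argument and option passed to it goes to PairwiseScatterPlot first, and addHistograms then receives the resulting graphics. A minimal illustration, using undefined placeholder symbols f and g (hypothetical names, not part of the code above):

```mathematica
(* f, g, x and opt are undefined placeholder symbols used only for illustration *)
ClearAll[f, g, x, opt];
(f @* g)[x, opt -> 1]
(* f[g[x, opt -> 1]] *)
```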

Example:

SeedRandom[1]
data = RandomVariate[NormalDistribution[10, 5], {500, 5}].RandomReal[{-2, 2}, {5, 5}];
labels = StringTake[RandomWord["Noun", 5], 4];

One possible way to produce a matrix of styles with diagonal entries different from off-diagonal ones:

mat = # + Transpose[UpperTriangularize[#, 1]] & @ 
      PadLeft[TakeList[Range[# + (# - 1) #/2], Reverse@Range[#]]] &;
colors = Map[ColorData[97], mat[Dimensions[data][[2]]], {-1}];
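
To see what the index matrix looks like, here is the same construction evaluated for three series. The name idxMatrix is only an illustrative stand-in for mat, so that the snippet can be run on its own:

```mathematica
(* Same construction as mat above, under the illustrative name idxMatrix *)
ClearAll[idxMatrix];
idxMatrix = # + Transpose[UpperTriangularize[#, 1]] & @ 
    PadLeft[TakeList[Range[# + (# - 1) #/2], Reverse @ Range[#]]] &;

idxMatrix[3]
(* {{1, 2, 3}, {2, 4, 5}, {3, 5, 6}} *)
```

The result is symmetric, so panels (i, j) and (j, i) share a color index, and the diagonal indices (1, 4, 6) do not occur off the diagonal.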

pwScatterPlot[data, DataTicks -> Automatic, 
   DataLabels -> labels, ImageSize -> 500, PlotStyle -> colors]

[Image: pairwise scatter plot matrix with histograms on the diagonal]

With another layer of post-processing we can add linear fit lines and the correlation values as text labels:

ClearAll[addFitsAndCorrelations, trellisPlot]
addFitsAndCorrelations = # /. {dir_, p__Point} :> 
  Module[{xy = Cases[{p}, Point[x_] :> x]}, {dir, p, 
   Text[Style[Round[Correlation @@ Transpose[xy], .001], Black, FontSize -> Scaled[.025]], 
       Offset[{5, -15}, Floor[Min /@ Transpose[xy] + {0, 1}]], Left], 
   First @ Plot[Evaluate @ LinearModelFit[xy, t, t][t],
     {t, Min[xy[[All, 1]]], Max[xy[[All, 1]]]}, PlotStyle -> Directive[Thick, Black]]}] &;

trellisPlot = addFitsAndCorrelations @* pwScatterPlot;

trellisPlot[data, DataTicks -> Automatic, DataLabels -> labels, 
  ImageSize -> 500, PlotStyle -> colors]

This reproduces the picture at the top.
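
For a single pair of variables, the two quantities that addFitsAndCorrelations adds to each panel can be computed directly. The following is a minimal sketch on synthetic data; the data and the names xs, xy, r and lm are assumptions made only for this illustration:

```mathematica
SeedRandom[1];
(* synthetic data, assumed only for this illustration *)
xs = RandomReal[{0, 10}, 50];
xy = Transpose[{xs, 2 xs + RandomReal[{-1, 1}, 50]}];

(* correlation rounded to three decimals, as in the text labels *)
r = Round[Correlation @@ Transpose[xy], .001];

(* straight-line fit y = a + b t *)
lm = LinearModelFit[xy, t, t];

(* scatter plot with the fitted line and the correlation as a label *)
Show[
 ListPlot[xy],
 Plot[lm[t], {t, Min[xs], Max[xs]}, PlotStyle -> Directive[Thick, Black]],
 PlotLabel -> Row[{"correlation: ", r}]]
```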


You can use the function VariableDependenceGrid from the package "MathematicaForPredictionUtilities.m".

?VariableDependenceGrid

"VariableDependenceGrid[data_?MatrixQ,columnNames,opts] makes a grid with variable dependence plots."

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MathematicaForPredictionUtilities.m"]

wineData = (Flatten@*List) @@@ ExampleData[{"MachineLearning", "WineQuality"}, "Data"];
wineColumnNames = (Flatten@*List) @@ ExampleData[{"MachineLearning", "WineQuality"}, "VariableDescriptions"];

VariableDependenceGrid[RandomSample[wineData, 200][[All, 1 ;; -1 ;; 3]], wineColumnNames[[1 ;; -1 ;; 3]]]

[Image: output of VariableDependenceGrid for the wine quality sample]

The function VariableDependenceGrid also plots dependencies that involve categorical variables, and it works on Dataset objects. Here is an example:

titanicData = (Flatten@*List) @@@ ExampleData[{"MachineLearning", "Titanic"}, "Data"];
titanicColumnNames = (Flatten@*List) @@ ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"];

dsTitanic = Dataset[titanicData][All, AssociationThread[titanicColumnNames -> #] &];

VariableDependenceGrid[dsTitanic]

[Image: output of VariableDependenceGrid for the Titanic dataset]