How to use Classify with LogisticRegression when the decision boundary is nonlinear?
Here is one approach.
(* Import Andrew Ng's ex2data2 set: two features and a 0/1 label *)
dataraw =
  Import["https://raw.githubusercontent.com/anirudhjayaraman/Machine-Learning/master/Andrew%20Ng%20Stanford%20Coursera/Week%2003/ex2/ex2data2.txt", "CSV"];
X = dataraw[[All, 1 ;; 2]];
y = dataraw[[All, 3]];
data = Flatten[#] & /@ Transpose[{X, y}];

(* Fit a logistic model with quadratic terms so the boundary can curve *)
lr = LogitModelFit[data, {x1, x1^2, x2, x2^2, x1 x2}, {x1, x2}];

(* Split by class for plotting *)
data1 = Select[data, #[[3]] == 1 &];
data0 = Select[data, #[[3]] == 0 &];

(* Decision contour (drawn at the class-1 prior fraction) over the data *)
Show[
 ContourPlot[lr[x1, x2], {x1, -1, 1.3}, {x2, -1, 1.3},
  Contours -> {Length[data1]/(Length[data1] + Length[data0])},
  ContourShading -> None, ContourStyle -> Thick],
 ListPlot[{data1[[All, {1, 2}]], data0[[All, {1, 2}]]},
  PlotStyle -> {Green, Red}, PlotLegends -> {"1", "0"}]]
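If you want a quick check on how well that quadratic LogitModelFit does on the training data, something like the sketch below should work; predicted is just an illustrative name, and it thresholds the fitted probability at 0.5 (the plot above uses the class-1 prior fraction instead).

lr["ParameterTable"]  (* fitted coefficients and p-values *)

(* training accuracy: call a point class 1 when the fitted probability exceeds 0.5 *)
predicted = Boole[lr[#[[1]], #[[2]]] > 0.5] & /@ data;
N[Mean[MapThread[Boole[#1 == #2] &, {predicted, data[[All, 3]]}]]]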
Here's how to do it using Classify:
(* reuse the same raw data; the pieces are rebuilt so this snippet is self-contained *)
X = dataraw[[All, 1 ;; 2]];
y = dataraw[[All, 3]];
data = Flatten[#] & /@ Transpose[{X, y}];
data1 = Select[data, #[[3]] == 1 &];
data0 = Select[data, #[[3]] == 0 &];

(* hand-built feature vector: {x1, Exp[x1^2], x2, Exp[x2^2], x1 x2} *)
XX = {#[[1]], Exp[#[[1]]^2], #[[2]],
    Exp[#[[2]]^2], #[[1]] #[[2]]} & /@ X;
data2 = Thread[XX -> y];

cflogistic = Classify[data2, Method -> {"LogisticRegression"}];

(* the 0.5 contour of the class-1 probability is the decision boundary *)
decisionboundarylogistic =
  ContourPlot[
   cflogistic[{x1, Exp[x1^2], x2, Exp[x2^2], x1 x2},
    "Probability" -> 1], {x1, -1, 1}, {x2, -1, 1},
   Contours -> {0, 0.5, 1}, ContourShading -> False,
   PlotLegends -> {"LogisticRegression"}];

Show[decisionboundarylogistic,
 ListPlot[{data1[[All, {1, 2}]], data0[[All, {1, 2}]]},
  PlotMarkers -> {"\[HappySmiley]", "\[SadSmiley]"}]]
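For a quick sanity check on the Classify model, ClassifierMeasurements can report accuracy and a confusion matrix; the sketch below evaluates it on the same rules used for training, so treat the number as optimistic (cm is just an illustrative name).

cm = ClassifierMeasurements[cflogistic, data2];
cm["Accuracy"]
cm["ConfusionMatrixPlot"]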
It appears that one must exponentiate each even-numbered predictor (the squared terms) to get the expected logistic regression boundary. (If I have some time in the near future, I'll see if I can pin that down. I don't know whether that is a bug or an undocumented feature.)
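One way to start pinning it down empirically, assuming both models were trained as above, is to compare the two models' class-1 probabilities at the training points; if the exponentiated features really reproduce the logistic fit, the points should lie close to the diagonal (probsLogit and probsClassify are illustrative names).

probsLogit = lr[#[[1]], #[[2]]] & /@ data;
probsClassify =
  cflogistic[{#[[1]], Exp[#[[1]]^2], #[[2]], Exp[#[[2]]^2], #[[1]] #[[2]]},
    "Probability" -> 1] & /@ data;
ListPlot[Transpose[{probsLogit, probsClassify}],
 AxesLabel -> {"LogitModelFit", "Classify"}, PlotRange -> {{0, 1}, {0, 1}}]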