Regression tree in R
This can also happen if the column names are integers (1:N), even though they are stored as characters.
Turn your learn matrix into a data frame.
Example:
load("exa.Rda")
library(rpart)
learn <- data.frame(learn)
rt.model <- rpart(razlika ~ ., learn)
rt.model
yields:
n= 226

node), split, n, deviance, yval
      * denotes terminal node

1) root 226 31417.5100 3.3849560
  2) B.reb>=40.80799 117 12661.2300 0.4871795
    4) B.ft>=0.7666193 31 2685.4190 -5.7741940
      8) A.fg2< 0.4645683 22 1846.7730 -8.3181820
        16) A.ft< 0.7464692 7 365.4286 -14.2857100 *
        17) A.ft>=0.7464692 15 1115.7330 -5.5333330 *
      9) A.fg2>=0.4645683 9 348.2222 0.4444444 *
    5) B.ft< 0.7666193 86 8322.3720 2.7441860
      10) B.avg.conceded.< 98.19592 76 7255.6320 1.7105260
        20) A.reb< 39.29941 19 1520.6320 -3.5789470 *
        21) A.reb>=39.29941 57 5026.2110 3.4736840
          42) A.3pt< 0.3945418 35 2500.1710 0.7714286
            84) A.ft< 0.7460665 17 1270.2350 -2.4705880 *
            85) A.ft>=0.7460665 18 882.5000 3.8333330 *
          43) A.3pt>=0.3945418 22 1863.8640 7.7727270
            86) B.ft>=0.7214165 13 718.9231 4.0769230 *
            87) B.ft< 0.7214165 9 710.8889 13.1111100 *
      11) B.avg.conceded.>=98.19592 10 368.4000 10.6000000 *
  3) B.reb< 40.80799 109 16719.2500 6.4954130
    6) A.fouls>=24.51786 23 2349.9130 -2.2173910
      12) A.fg2< 0.4551468 16 1266.0000 -5.5000000 *
      13) A.fg2>=0.4551468 7 517.4286 5.2857140 *
    7) A.fouls< 24.51786 86 12156.3800 8.8255810
      14) B.fouls< 22.80863 24 3271.9580 2.5416670
        28) A.3pt< 0.3738479 9 626.0000 -6.0000000 *
        29) A.3pt>=0.3738479 15 1595.3330 7.6666670 *
      15) B.fouls>=22.80863 62 7569.8710 11.2580600
        30) A.fouls< 22.32999 18 1650.5000 5.5000000 *
        31) A.fouls>=22.32999 44 5078.4320 13.6136400
          62) A.ft.drawn>=29.18849 7 208.8571 3.8571430 *
          63) A.ft.drawn< 29.18849 37 4077.1890 15.4594600
            126) A.fg2< 0.4588535 18 1696.5000 11.5000000 *
            127) A.fg2>=0.4588535 19 1831.1580 19.2105300 *
I don't believe the problem is that you have a matrix rather than a data frame. When I download and then load your data set, I get a data frame, not a matrix.
The problem is that the column names contain characters that are not valid in R variable names. Use gsub to remove the characters "-", " ", "(" and ")" from the column names. Or you can simply redefine the column names entirely yourself using colnames.
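As a rough sketch of the gsub approach, the snippet below strips those four characters from the names of a small toy data frame. The data frame and its column names are made up for illustration and are not taken from the original data set.

```r
# Toy data frame with hypothetical column names containing the problem characters
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
colnames(df) <- c("A-reb", "B ft", "A(3pt)")

# Remove "-", " ", "(" and ")" from every column name.
# The hyphen comes first inside the brackets so it is treated literally.
colnames(df) <- gsub("[-() ]", "", colnames(df))

colnames(df)
# "Areb" "Bft" "A3pt"
```

If removing characters would make two names collide, assigning a fresh vector with colnames(df) <- c(...) is the safer route.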
Or do as ulvund does and simply call data.frame, which by default forces R to do the column name cleaning for you.
When I do this, rpart runs just fine.
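To see what that default cleaning does, here is a small sketch on a toy matrix. The matrix and its column names are hypothetical; the point is only that data.frame (with its default check.names = TRUE) turns names into syntactically valid ones, the same result make.names gives.

```r
# Toy matrix with hypothetical column names that are not valid R identifiers,
# including an integer-like name stored as a character string
m <- matrix(1:6, nrow = 2,
            dimnames = list(NULL, c("1", "A-reb", "B ft")))

# data.frame() checks names by default (check.names = TRUE)
cleaned <- data.frame(m)
colnames(cleaned)
# "X1" "A.reb" "B.ft"

# make.names() applies the same kind of repair to a character vector
make.names(c("1", "A-reb", "B ft"))
# "X1" "A.reb" "B.ft"
```

This is why names such as B.avg.conceded. show up in the tree output above: invalid characters are replaced with dots, and names that start with a digit get an X prefix.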