Using CUT and Quartile to generate breaks in R function
There is also cut2 in the venerable Hmisc package. It does quantile cuts.
From the help:
Function like cut but left endpoints are inclusive and labels are of the form [lower, upper), except that last interval is [lower,upper]. If cuts are given, will by default make sure that cuts include entire range of x. Also, if cuts are not given, will cut x into quantile groups (g given) or groups with a given minimum number of observations (m). Whereas cut creates a category object, cut2 creates a factor object.
You can very easily accomplish this automatically with the content
method in the bin
function in the OneR package:
library(OneR)
set.seed(700)
clientID <- round(runif(200, min = 2000, max = 3000), 0)
orders <- round(runif(200, min = 1, max = 50), 0)
df <- data.frame(cbind(clientID, orders))
df$Quintiles <- bin(df$orders, method = "content")
table(df$Quintile)
##
## (0.952,9.8] (9.8,19] (19,31.4] (31.4,38.2] (38.2,49]
## 40 41 39 40 40
(Full disclosure: I am the author of this package)
Try the following:
set.seed(700)
clientID <- round(runif(200,min=2000, max=3000),0)
orders <- round(runif(200,min=1, max=50),0)
df <- df <- data.frame(cbind(clientID,orders))
ApplyQuintiles <- function(x) {
cut(x, breaks=c(quantile(df$orders, probs = seq(0, 1, by = 0.20))),
labels=c("0-20","20-40","40-60","60-80","80-100"), include.lowest=TRUE)
}
df$Quintile <- sapply(df$orders, ApplyQuintiles)
table(df$Quintile)
0-20 20-40 40-60 60-80 80-100
40 41 39 40 40
I included include.lowest=TRUE
in your cut function, which seems to make it work. See ?cut
for more details.