Using cut() and quantile() to generate breaks in an R function

There is also cut2 in the venerable Hmisc package. It does quantile cuts.

From the help:

Function like cut but left endpoints are inclusive and labels are of the form [lower, upper), except that last interval is [lower,upper]. If cuts are given, will by default make sure that cuts include entire range of x. Also, if cuts are not given, will cut x into quantile groups (g given) or groups with a given minimum number of observations (m). Whereas cut creates a category object, cut2 creates a factor object.
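As a minimal sketch of the quantile-group mode described in that help text (assuming Hmisc is installed; the data and variable names are only for illustration), passing g = 5 asks cut2 for five groups of roughly equal size:

```r
library(Hmisc)

# Illustrative data: 200 simulated order counts between 1 and 50
set.seed(700)
clientID <- round(runif(200, min = 2000, max = 3000), 0)
orders <- round(runif(200, min = 1, max = 50), 0)

# g = 5 requests five quantile groups; the result is a factor
quintile <- cut2(orders, g = 5)
table(quintile)
```

Because cut2 places its cut points on values that occur in the data, the group sizes can differ slightly when there are ties.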


You can very easily accomplish this automatically with the content method in the bin function in the OneR package:

library(OneR)
set.seed(700)

clientID <- round(runif(200, min = 2000, max = 3000), 0)
orders <- round(runif(200, min = 1, max = 50), 0)
df <- data.frame(cbind(clientID, orders))

df$Quintiles <- bin(df$orders, method = "content")
table(df$Quintiles)
## 
## (0.952,9.8]    (9.8,19]   (19,31.4] (31.4,38.2]   (38.2,49] 
##          40          41          39          40          40

(Full disclosure: I am the author of this package)


Try the following:

set.seed(700)

clientID <- round(runif(200,min=2000, max=3000),0)
orders <- round(runif(200,min=1, max=50),0)

df <- data.frame(cbind(clientID,orders))

# cut() is vectorized, so the function can take the whole column at once
ApplyQuintiles <- function(x) {
  cut(x, breaks = quantile(x, probs = seq(0, 1, by = 0.20)),
      labels = c("0-20", "20-40", "40-60", "60-80", "80-100"), include.lowest = TRUE)
}
df$Quintile <- ApplyQuintiles(df$orders)
table(df$Quintile)

0-20  20-40  40-60  60-80 80-100 
  40     41     39     40     40 

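I added include.lowest=TRUE to your cut call, which makes it work. By default the intervals are open on the left, so the minimum value, which equals the lowest break, falls outside every interval and becomes NA. See ?cut for more details.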
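To see the effect of include.lowest directly, here is a small base R sketch using the same simulated data (the names breaks, without_lowest and with_lowest are just illustrative). It counts the NA values produced with and without the argument:

```r
set.seed(700)
clientID <- round(runif(200, min = 2000, max = 3000), 0)
orders <- round(runif(200, min = 1, max = 50), 0)

# Quintile break points; the first break equals min(orders)
breaks <- quantile(orders, probs = seq(0, 1, by = 0.20))

# Default: intervals are (a, b], so values equal to the lowest break get NA
without_lowest <- cut(orders, breaks = breaks)

# include.lowest = TRUE closes the first interval on the left: [a, b]
with_lowest <- cut(orders, breaks = breaks, include.lowest = TRUE)

# Number of NA values in each case
sum(is.na(without_lowest))
sum(is.na(with_lowest))

# The NAs in the default case are exactly the observations at the minimum
sum(orders == min(orders))
```

The first count should match the number of observations equal to the minimum, and the second should be zero.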

Tags: R, cut