Output a numeric value from 'cut()' in R

Much of what cut does is build the factor labels, which you are not interested in here. You are probably better off with findInterval or .bincode, which return the interval index directly.

Start with the data:

set.seed(17)
df <- data.frame(x=300 * runif(100))

Then set the breaks and find the intervals:

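# The last break is 300 so that every x in [0, 300) has an upper bound;
# with 299, values in [299, 300) would get end = NA.
breaks <- c(0, 25, 75, 125, 175, 225, 300)
df$interval <- findInterval(df$x, breaks)   # index of the interval containing x
df$start <- breaks[df$interval]             # lower bound of that interval
df$end <- breaks[df$interval + 1]           # upper bound of that interval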
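
For comparison, here is a minimal sketch of the same lookup with .bincode. The fixed x values and the final break of 300 are assumptions chosen for illustration. With right = FALSE the intervals are closed on the left, which matches findInterval's default, but unlike findInterval, .bincode returns NA for values outside the breaks.

```r
# Illustrative values, not taken from the data above
breaks <- c(0, 25, 75, 125, 175, 225, 300)
x <- c(10, 25, 60, 130, 299.5)

# right = FALSE gives intervals of the form [a, b)
idx <- .bincode(x, breaks, right = FALSE)

# Look up the bounds of each interval by index
bins <- data.frame(x = x, start = breaks[idx], end = breaks[idx + 1])
bins
```

On in-range data like this, idx should agree with findInterval(x, breaks).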

I'm guessing at what you want, since if you wanted the "original numbers", you could just use df$x. I presume you are after a number that identifies the group? If so, what about the following?

## Generate some example data
x = runif(5, 0, 300)
## Specify the labels
labels = c(0,25,75,125,175,225)
## Use cut as before
y = cut(x, 
    breaks = c(0,25,75,125,175,225,300),
    labels = labels,
    right = TRUE)

When we convert y to a numeric, this gives the index of the label. Hence,

labels[as.numeric(y)]

or, more simply, since a factor used as an index is treated as its integer codes:

labels[y]
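
To make the lookup concrete, here is a small sketch with fixed x values (chosen for illustration, not from the question). The variable names codes, lower and lower_from_text are hypothetical. It also shows a third route, converting the level text back to a number, which works only because the labels here are numeric.

```r
labels <- c(0, 25, 75, 125, 175, 225)
x <- c(10, 25, 80, 299)   # fixed values for illustration
y <- cut(x,
         breaks = c(0, 25, 75, 125, 175, 225, 300),
         labels = labels,
         right = TRUE)

codes <- as.integer(y)                         # level index: 1 1 3 6
lower <- labels[y]                             # label per value: 0 0 75 225
lower_from_text <- as.numeric(as.character(y)) # same values, parsed from the level text
```

Note that as.numeric(y) on its own returns the codes, not the label values, which is why the extra lookup step is needed.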

I would use a regex, since all the information is already in the labels that cut produces.

cut_borders <- function(x){
  pattern <- "(\\(|\\[)(-*[0-9]+\\.*[0-9]*),(-*[0-9]+\\.*[0-9]*)(\\)|\\])"

  start <- as.numeric(gsub(pattern,"\\2", x))
  end <- as.numeric(gsub(pattern,"\\3", x))

  data.frame(start, end)
}

The pattern in words:

  • Group 1: either a ( or a [, so we use (\\(|\\[).

  • Group 2: the number might be negative, so we allow a minus sign (-*); then we need at least one digit ([0-9]+), optionally followed by a decimal point (\\.*) and the digits after it ([0-9]*).

  • Next there is a comma (,)

  • Group 3: same as group 2.

  • Group 4: analogous to group 1, we expect either a ) or a ].
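
The groups described above can be checked on a couple of hand-written labels. This is only a sketch: the strings in lab are made up to look like cut output, and the names lab, lower, upper and bracket are hypothetical.

```r
pattern <- "(\\(|\\[)(-*[0-9]+\\.*[0-9]*),(-*[0-9]+\\.*[0-9]*)(\\)|\\])"

# Hand-written labels in the style of cut() output
lab <- c("(25,75]", "[-1.5,0.25]")

bracket <- gsub(pattern, "\\1", lab)             # group 1: opening bracket
lower <- as.numeric(gsub(pattern, "\\2", lab))   # group 2: lower border
upper <- as.numeric(gsub(pattern, "\\3", lab))   # group 3: upper border
```

One caveat, as far as I can tell: the pattern only covers plain decimal notation, so a label that cut prints in scientific notation (such as 1e+03) would not match and would come back as NA.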

Here is a random variable cut at its quartiles; cut_borders returns the interval borders we are looking for:

x <- rnorm(10)

x_groups <- cut(x, quantile(x, 0:4/4), include.lowest= TRUE)

cut_borders(x_groups)
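
If the breaks vector is still available, the borders can also be recovered from the factor codes without parsing the labels. The sketch below uses fixed, made-up values rather than rnorm so the result is predictable. It may be worth comparing, because the labels are rounded to a few significant digits (see the dig.lab argument of cut), whereas the breaks keep full precision.

```r
x <- c(-1.2, -0.3, 0.1, 0.8, 2.5)   # fixed values for illustration
breaks <- c(-2, -1, 0, 1, 3)        # assumed breaks, not quantiles

g <- cut(x, breaks, include.lowest = TRUE)

# The integer code of each level is the index of its lower break
idx <- as.integer(g)
borders <- data.frame(start = breaks[idx], end = breaks[idx + 1])
borders
```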
