Format labels produced by cut() as percentages

Multiply your original data by 100, then use gsub() with a regular expression to append a percent sign to every number in the labels:

gsub("([0-9.]+)","\\1%",levels(cut(x*100,breaks=10)))
 [1] "(0.449%,10.4%]" "(10.4%,20.3%]"  "(20.3%,30.2%]"  "(30.2%,40.2%]"  "(40.2%,50.1%]"  "(50.1%,60%]"    "(60%,69.9%]"    "(69.9%,79.9%]"  "(79.9%,89.8%]"  "(89.8%,99.7%]"
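To keep the percentage labels on the factor itself, rather than only printing them, the same substitution can be assigned back to the levels. This is a minimal sketch, assuming x holds proportions between 0 and 1; the sample data below is made up for illustration.

```r
set.seed(1)
x <- runif(100)                     # assumed sample data: proportions in [0, 1]

f <- cut(x * 100, breaks = 10)      # bin on the percentage scale

# Append "%" to every run of digits and dots in each label.
# Caveat: a label in scientific notation (e.g. "1e+02") would be mangled.
levels(f) <- gsub("([0-9.]+)", "\\1%", levels(f))

levels(f)
```

Because only the level names change, the bin assignments are identical to those of a plain cut(x * 100, breaks = 10).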

Why not copy the code for cut.default and create your own version with modified levels? See this gist.

Two lines were changed:

Line 22: ch.br <- formatC(breaks, digits = dig, width = 1) changed to ch.br <- formatC(breaks*100, digits = dig, width = 1).

Line 29: else "[", ch.br[-nb], ",", ch.br[-1L], if (right) changed to else "[", ch.br[-nb], "%, ", ch.br[-1L], "%", if (right)

The rest is the same. And here it is in action:

library(devtools)
source_gist(4593967)

set.seed(1)
x <- runif(100)
levels(cut2(x, breaks=10))
#  [1] "(1.24%, 11%]"   "(11%, 20.9%]"   "(20.9%, 30.7%]" "(30.7%, 40.5%]" "(40.5%, 50.3%]"
#  [6] "(50.3%, 60.1%]" "(60.1%, 69.9%]" "(69.9%, 79.7%]" "(79.7%, 89.5%]" "(89.5%, 99.3%]"
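
If you would rather not copy cut.default() at all, a similar result can be had by building the labels by hand and passing them through cut()'s labels argument. The following is only a sketch: cut_pct is a hypothetical name, it requires explicit break points rather than a bin count, and it ignores include.lowest.

```r
# Hypothetical helper: percentage labels via cut()'s own `labels` argument.
cut_pct <- function(x, breaks, digits = 3, right = TRUE) {
  stopifnot(is.numeric(breaks), length(breaks) > 1L)
  breaks <- sort(breaks)
  nb <- length(breaks)

  # Same formatting idea as the modified line 22: scale by 100 before formatC()
  ch.br <- paste0(formatC(breaks * 100, digits = digits, width = 1), "%")

  # Same shape as the modified line 29: "(lo%, hi%]" or "[lo%, hi%)"
  labels <- paste0(if (right) "(" else "[",
                   ch.br[-nb], ", ", ch.br[-1L],
                   if (right) "]" else ")")

  cut(x, breaks = breaks, labels = labels, right = right)
}

res <- cut_pct(c(0.1, 0.3, 0.7), breaks = seq(0, 1, by = 0.25))
levels(res)
```

Unlike cut.default(), this sketch does not increase the number of digits when two adjacent breaks format to the same string, so duplicate labels are possible with closely spaced breaks.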

I have implemented cut_format() in version 0.2-3 of my kimisc package; version 0.3 is now on CRAN.

# devtools::install_github("krlmlr/kimisc")
x <- seq(0.1, 0.9, by = 0.2)

breaks <- seq(0, 1, by = 0.25)

cut(x, breaks)
## [1] (0,0.25]   (0.25,0.5] (0.25,0.5] (0.5,0.75] (0.75,1]  
## Levels: (0,0.25] (0.25,0.5] (0.5,0.75] (0.75,1]

cut_format(x, breaks, format_fun = scales::percent)
## [1] (0%, 25%]   (25%, 50%]  (25%, 50%]  (50%, 75%]  (75%, 100%]
## Levels: (0%, 25%] (25%, 50%] (50%, 75%] (75%, 100%]

It's still not perfect: passing the number of breaks (as in the original example) doesn't work yet.
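
One possible workaround for the missing bin-count support is to compute the break points yourself and pass those instead. The sketch below uses a hypothetical helper, breaks_from_count, which only approximates how cut.default() derives breaks from a count (equal-width bins over the data range, widened by 0.1% at each end); the exact rule may differ between R versions.

```r
# Hypothetical helper: turn a bin count into explicit break points.
breaks_from_count <- function(x, n) {
  rx <- range(x, na.rm = TRUE)
  dx <- diff(rx)
  stopifnot(n >= 1, dx > 0)               # constant data is not handled here
  pts <- seq(rx[1L], rx[2L], length.out = n + 1L)
  pts[1L]     <- rx[1L] - dx / 1000       # widen so the minimum is included
  pts[n + 1L] <- rx[2L] + dx / 1000       # widen so the maximum is included
  pts
}

set.seed(1)
x <- runif(100)                           # assumed sample data
pts <- breaks_from_count(x, 10)
f <- cut(x, breaks = pts)                 # base cut() accepts the points directly

# Assumption: the same points can be passed to cut_format(), e.g.
# kimisc::cut_format(x, pts, format_fun = scales::percent)
nlevels(f)
```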
