Collapse consecutive runs of numbers to a string of ranges
Adding another alternative, you could use a deparse
ing approach. For example:
deparse(c(1L, 2L, 3L))
#[1] "1:3"
Taking advantage of as.character
"deparse"ing a given "list" as input, we could use:
as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
#[1] "1:3" "5" "7:12"
toString(gsub(":", "-", .Last.value))
#[1] "1-3, 5, 7-12"
I assume that the vector is sorted as in the example. If not use vec <- sort(vec)
beforehand.
Edit note: @DavidArenburg spotted a mistake in my original answer where c(min(x), x)
should actually be c(0, x)
. Since we know now that we always need to add a 0
in the first place, we can omit the first step of creating x
and do it "on the fly". The original answer and additional options are now edited to reflect that (you can check the edit history for the original post). Thanks David!
A note on calls to unname
: I used unname(sapply(...))
to ensure that the resulting vector is not named, otherwise it would be named 0:(n-1) where n equals the length of new_vec
. As @Tensibai noted correctly in the comments, this doesn't matter if the final aim is to generate a length-1 character vector as produced by running toString(new_vec)
since vector names will be omitted by toString
anyway.
One option (possibly not the shortest) would be:
new_vec <- unname(sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
if(length(y) == 1) y else paste0(head(y, 1), "-", tail(y, 1))
}))
Result:
new_vec
#[1] "1-3" "5" "7-12"
toString(new_vec)
#[1] "1-3, 5, 7-12"
Thanks to @Zelazny7 it can be shortened by using the range
function:
new_vec <- unname(sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
paste(unique(range(y)), collapse='-')
}))
Thanks to @DavidArenburg it can be further shortened by using tapply
instead of sapply
+ split
:
new_vec <- unname(tapply(vec, c(0, cumsum(diff(vec) > 1)), function(y) {
paste(unique(range(y)), collapse = "-")
}))
EDITS: I sped up docendo's code quite a bit by sorting the vector first, so now they are actually on equal footing.
I also added alexis' approach.
readable_integers <- function(integers)
{
integers <- sort(unique(integers))
group <- cumsum(c(0, diff(integers)) != 1)
paste0(vapply(split(integers, group),
function(x){
if (length(x) == 1) as.character(x)
else paste0(range(x), collapse = "-")
},
character(1)),
collapse = "; ")
}
library(microbenchmark)
vec = c(1, 2, 3, 5, 7, 8, 9, 10, 11, 12)
microbenchmark(
docendo = {vec <- sort(vec)
x <- cumsum(diff(vec) > 1)
toString(tapply(vec, c(min(x), x), function(y) paste(unique(range(y)), )collapse = "-"))
},
Benjamin = readable_integers(vec),
alexis = {vec <- sort(vec)
as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
toString(gsub(":", "-", .Last.value))}
)
Unit: microseconds
expr min lq mean median uq max neval
docendo 205.273 220.3755 230.3134 228.293 235.4780 467.142 100
Benjamin 121.991 128.4420 135.5302 133.574 143.3980 161.286 100
alexis 121.698 128.0030 137.0374 136.507 143.3975 169.790 100
set.seed(pi)
vec = sample(1:1000, 900)
set.seed(pi)
vec = sample(1:1000, 900)
microbenchmark(
docendo = {vec <- sort(vec)
x <- cumsum(diff(vec) > 1)
toString(tapply(sort(vec), c(min(x), x), function(y) paste(unique(range(y)), collapse = "-")))
},
Benjamin = readable_integers(vec),
alexis = {vec <- sort(vec)
as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
toString(gsub(":", "-", .Last.value))}
)
Unit: microseconds
expr min lq mean median uq max neval
docendo 1307.294 1353.7735 1420.3088 1379.7265 1427.8190 2554.473 100
Benjamin 615.525 626.8155 661.2513 638.8385 665.3765 1676.493 100
alexis 799.684 808.3355 866.1516 820.0650 833.2615 1974.138 100