R: how to sample without replacement AND without consecutive same values
Maybe using replicate() with a repeat loop is faster. Here is an example with 3 sequences. Extrapolating from the timing below, 300 sequences would take approx. 1490 seconds (not tested).
set.seed(42)
seqc <- rep(1:4, each=12) # starting sequence
system.time(
  res <- replicate(3, {
    repeat {
      seqcs <- sample(seqc, 48, replace=FALSE)
      if (!any(diff(seqcs) == 0)) break
    }
    seqcs
  })
)
#    user  system elapsed
#   14.88    0.00   14.90
res[1:10, ]
#       [,1] [,2] [,3]
#  [1,]    4    2    3
#  [2,]    1    1    4
#  [3,]    3    2    1
#  [4,]    1    1    4
#  [5,]    2    3    1
#  [6,]    4    1    2
#  [7,]    3    4    4
#  [8,]    2    1    1
#  [9,]    3    4    4
# [10,]    4    3    2
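A rough back-of-the-envelope sketch of why the rejection loop above is slow. It assumes, purely for illustration, that the 47 adjacent pairs behave independently (they do not), so the result is only an order-of-magnitude guide. The names `n_vals`, `n_each`, `p_pair` and `p_ok` are made up for this sketch.

```r
# Heuristic estimate only: treats the 47 adjacent pairs as independent,
# which they are not, so read the result as an order of magnitude.
n_vals  <- 4                      # distinct values
n_each  <- 12                     # copies of each value
n_total <- n_vals * n_each        # 48 positions

# Chance that a neighbour repeats the value before it:
# 11 of the remaining 47 elements carry the same value
p_pair <- (n_each - 1) / (n_total - 1)

# Approximate chance that none of the 47 adjacent pairs is a repeat
p_ok <- (1 - p_pair)^(n_total - 1)

p_ok        # approximate acceptance probability per shuffle
1 / p_ok    # approximate number of shuffles per accepted sequence
```

Under this approximation the loop needs on the order of hundreds of thousands of shuffles per accepted sequence, which is in line with the several seconds per sequence timed above.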
You can take out the consecutive values and place them where they are not consecutive.
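unConsecutive <- function(x) {
  repeat {
    tt <- c(FALSE, diff(x) == 0)  # TRUE where a value repeats its predecessor
    if (any(tt)) {
      y <- x[which(tt)]   # the repeated values, taken out
      x <- x[which(!tt)]  # what remains has no consecutive duplicates
      i <- x != y[1]      # positions whose value differs from y[1]
      # gaps (1 = front, length(x) + 1 = end) whose neighbours all differ from y[1]
      i <- which(c(c(TRUE, diff(i) == 0) & i, FALSE)
                 | c(FALSE, c(diff(i) == 0, TRUE) & i))
      if (length(i) > 0) {
        i <- i[1] - 1  # number of elements kept in front of the insertion point
        x <- c(x[seq_len(i)], y, x[i + seq_len(length(x) - i)])
      } else {
        x <- c(x, y)   # no suitable gap: append and stop
        break
      }
    } else {
      break
    }
  }
  x
}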
unConsecutive(c(1,1,2))
#[1] 1 2 1
unConsecutive(c(1,1,1))
#[1] 1 1 1
set.seed(7)
system.time(
  res <- replicate(300, unConsecutive(sample(rep(1:4,12))))
)
#    user  system elapsed
#   0.058   0.011   0.069
all(apply(res, 2, table) == 12)
#[1] TRUE
all(apply(res, 2, diff) != 0)
#[1] TRUE
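To see what a single pass of the loop does, here is a hand-unrolled trace for the input c(1, 1, 2). It only mirrors the steps inside unConsecutive() above; the names `ok`, `slots`, `pos` and `x_new` are introduced for this sketch and are not part of the original function.

```r
x  <- c(1, 1, 2)

tt <- c(FALSE, diff(x) == 0)   # FALSE TRUE FALSE: the second 1 repeats the first
y  <- x[tt]                    # 1    (values taken out)
x  <- x[!tt]                   # 1 2  (what is left)

ok <- x != y[1]                # FALSE TRUE: only the 2 differs from y[1]

# Gaps are numbered 1 (front) to length(x) + 1 (end); a gap qualifies when
# every neighbour it has differs from y[1]
slots <- which(c(c(TRUE, diff(ok) == 0) & ok, FALSE) |
               c(FALSE, c(diff(ok) == 0, TRUE) & ok))
slots                          # 3: only the gap at the end qualifies

pos   <- slots[1] - 1          # keep 2 elements in front of the insertion
x_new <- c(x[seq_len(pos)], y, x[pos + seq_len(length(x) - pos)])
x_new                          # 1 2 1
```

The result matches the unConsecutive(c(1,1,2)) output shown above.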
Another option is to use a Markov chain Monte Carlo method: swap 2 numbers at random and move to the new sample only when 1) we are not swapping the same number and 2) no 2 identical numbers are adjacent. Because consecutive samples from the chain are correlated, we can generate a lot of samples and then randomly select 300 of them:
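v <- rep(1:4, 12)   # valid starting state: 1 2 3 4 1 2 3 4 ...
l <- 48
nr <- 3e5
m <- matrix(0, nrow=nr, ncol=l)
count <- 0
while (count < nr) {
  i <- sample(l, 2)   # two distinct positions
  # skip swaps of two equal values (they would leave v unchanged)
  if (v[i[1L]] != v[i[2L]]) {
    v[i] <- v[i[2:1]]            # swap
    if (!any(diff(v) == 0)) {
      count <- count + 1
      m[count, ] <- v            # accept and record the new state
    } else {
      v[i] <- v[i[2:1]]          # reject: swap back
    }
  }
}
a <- m[sample(nr, 300), ]
a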
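A lighter-weight variation on the same idea, sketched under stated assumptions: instead of storing every accepted state and subsampling afterwards, keep only every `thin`-th accepted state (thinning). This is a swapped-in technique, not what the code above does, and `n_keep`, `thin`, `out`, `accepted` and `kept` are names and values chosen only for illustration.

```r
set.seed(123)
v      <- rep(1:4, 12)   # valid starting state: 1 2 3 4 1 2 3 4 ...
n_keep <- 50L            # sequences to keep (assumed value)
thin   <- 100L           # keep every 100th accepted swap (assumed value)

out      <- matrix(0L, nrow = n_keep, ncol = length(v))
accepted <- 0L
kept     <- 0L

while (kept < n_keep) {
  i <- sample(length(v), 2)          # two distinct positions
  if (v[i[1]] == v[i[2]]) next       # equal values: swap would change nothing
  v[i] <- v[i[2:1]]                  # propose the swap
  if (any(diff(v) == 0)) {
    v[i] <- v[i[2:1]]                # adjacent duplicates: undo
    next
  }
  accepted <- accepted + 1L
  if (accepted %% thin == 0L) {      # record only every thin-th accepted state
    kept <- kept + 1L
    out[kept, ] <- v
  }
}

# Sanity checks: each row holds 12 of each value and has no adjacent repeats
all(apply(out, 1, function(r) all(table(r) == 12)))
all(apply(out, 1, function(r) all(diff(r) != 0)))
```

Thinning avoids allocating the large 3e5 x 48 matrix; how large `thin` needs to be for the kept sequences to be close to independent is not established here and would need checking.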