Increment by 1 for every change in column
These look like a run-length encoding (rle)
x = c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")
r = rle(x)
with
> rle(x)
Run Length Encoding
lengths: int [1:6] 2 1 1 3 1 2
values : chr [1:6] "a" "1" "0" "b" "c" "1"
This says that the first value ("a") occurred 2 times in a row, then "1" occurred once, and so on. What you're after is a sequence along the 'lengths', with each element of that sequence repeated as many times as its run is long, so
> rep(seq_along(r$lengths), r$lengths)
[1] 1 1 2 3 4 4 4 5 6 6
The other answers are slightly misleading, since they rely on the column being a factor; they fail when the column is actually a character vector.
> diff(x)
Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] :
non-numeric argument to binary operator
A work-around would be to map the characters to integers, along the lines of
> diff(match(x, x))
[1] 0 2 1 1 0 0 3 -5 0
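To get from those differences to the running counter, one possible sketch is to flag the non-zero entries and take a cumulative sum. It reuses x from above; the names codes and id are only illustrative.

```r
x <- c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")

# match(x, x) maps each string to the index of its first occurrence,
# so equal neighbours differ by 0 and a change gives a non-zero difference
codes <- match(x, x)

# start at 1, then add 1 at every position where the value changed
id <- cumsum(c(1, diff(codes) != 0))
id
```

This should give the same counter as the rep(seq_along(...)) approach, without requiring the column to be a factor.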
Hmm, but having said that, I find that rle() doesn't work on factors! You have to convert the factor to a plain vector first.
> f = factor(x)
> rle(f)
Error in rle(f) : 'x' must be a vector of an atomic type
> rle(as.vector(f))
Run Length Encoding
lengths: int [1:6] 2 1 1 3 1 2
values : chr [1:6] "a" "1" "0" "b" "c" "1"
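If the column might be either a character vector or a factor, a small wrapper can do the conversion for you. This is only a sketch; run_id is a made-up helper name, not something from base R.

```r
# run_id() is a hypothetical helper name, not part of base R
run_id <- function(v) {
  # rle() rejects factors, so fall back to their character labels
  if (is.factor(v)) v <- as.character(v)
  r <- rle(v)
  # one id per run, repeated for the length of that run
  rep(seq_along(r$lengths), r$lengths)
}

x <- c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")
run_id(x)          # character input
run_id(factor(x))  # factor input, same ids
```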
How about using diff() and cumsum()? For example:
df$var2 <- cumsum(c(1,diff(df$var1)!=0))
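As a quick illustration, here is that line applied to a small made-up data frame. The values are hypothetical, and var1 is assumed to be numeric so that diff() can be applied to it directly.

```r
# hypothetical data; var1 is numeric here so diff() applies directly
df <- data.frame(var1 = c(5, 5, 7, 7, 7, 2, 5))

# diff() is non-zero wherever the value changes from one row to the next;
# the leading 1 makes the counter start at 1
df$var2 <- cumsum(c(1, diff(df$var1) != 0))
df
```

Note that the last 5 gets a new id rather than reusing the first one, because the counter tracks changes between consecutive rows, not distinct values.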
Building on Mr Flick's answer:
df$var2 <- cumsum(c(0,as.numeric(diff(df$var1))!=0))
But if you don't want to use diff(), you can still use:
df$var2 <- c(0,cumsum(as.numeric(with(df,var1[1:(length(var1)-1)] != var1[2:length(var1)]))))
It starts at 0 rather than 1, but I'm sure you can see how to change that if you want to.
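For what it's worth, here is one way the comparison-based version could be written so that it starts at 1. It is a sketch on made-up data, with var1 assumed to be a character column. Negative indexing is used instead of 1:(length(var1)-1), which should also behave more sensibly when the column has a single row.

```r
# hypothetical data with a character column
df <- data.frame(var1 = c("a", "a", "b", "c", "c"), stringsAsFactors = FALSE)
n <- nrow(df)

# compare each element with the one before it; != works on strings
changed <- df$var1[-1] != df$var1[-n]

# the first row is always id 1; later rows add 1 for every change so far
df$var2 <- c(1, 1 + cumsum(changed))
df
```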