How to subset consecutive rows if they meet a condition
An approach with data.table which is slightly different from @jlhoward's approach (using the same data):
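library(data.table)
setDT(df)  # convert df to a data.table by reference
# hotday: 1 if the day meets both thresholds, 0 otherwise
# hw.length: length of the run of consecutive hot days the row belongs to
# (reset to 0 for days that are not hot)
df[, hotday := +(MAX>=44.5 & MIN>=24.5)
  ][, hw.length := with(rle(hotday), rep(lengths,lengths))
  ][hotday == 0, hw.length := 0]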
This produces a data.table with a heat wave length variable (hw.length) instead of a TRUE/FALSE variable for a specific heat wave length:
> df
YEAR MONTH DAY MAX MIN hotday hw.length
1: 1989 7 18 45.0 23.5 0 0
2: 1989 7 19 44.2 26.1 0 0
3: 1989 7 20 44.7 24.4 0 0
4: 1989 7 21 44.6 29.5 1 1
5: 1989 7 22 44.4 31.6 0 0
6: 1989 7 23 44.2 26.7 0 0
7: 1989 7 24 44.5 25.0 1 3
8: 1989 7 25 44.8 26.0 1 3
9: 1989 7 26 44.8 24.6 1 3
10: 1989 7 27 45.0 24.3 0 0
11: 1989 7 28 44.8 26.0 1 1
12: 1989 7 29 44.4 24.0 0 0
13: 1989 7 30 45.2 25.0 1 1
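Because hw.length stores the run length on every row, the subsetting step can be left until the end. The following is a minimal sketch, not part of the original answer: it rebuilds the same example data so that it runs on its own, and the three-day threshold (min.days) is an assumed value chosen to match the other answer.

```r
library(data.table)

# Same example data as used in the other answer
df <- data.table(YEAR = 1989, MONTH = 7, DAY = 18:30,
                 MAX = c(45, 44.2, 44.7, 44.6, 44.4, 44.2, 44.5, 44.8, 44.8, 45, 44.8, 44.4, 45.2),
                 MIN = c(23.5, 26.1, 24.4, 29.5, 31.6, 26.7, 25, 26, 24.6, 24.3, 26, 24, 25))

# Flag hot days, then attach the length of the run each day belongs to
df[, hotday := +(MAX >= 44.5 & MIN >= 24.5)
  ][, hw.length := with(rle(hotday), rep(lengths, lengths))
  ][hotday == 0, hw.length := 0]

min.days <- 3                     # assumed minimum heat wave length
hw <- df[hw.length >= min.days]   # keep only rows inside a long enough run
hw$DAY
```

Changing min.days is enough to select longer or shorter heat waves without recomputing the run lengths.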
I may be missing something here, but I don't see the point of subsetting beforehand. If you have data for every day, in chronological order, you can use run length encoding (see the docs on the rle(...) function).
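As a quick illustration of what rle(...) returns, here is a small sketch on a made-up logical vector (the vector hot and the threshold of 3 are illustrative assumptions, not part of the data set used in this answer):

```r
# Toy vector: TRUE marks a day that meets the condition
hot <- c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)

runs <- rle(hot)
runs$lengths   # 1 3 1 1  (length of each run of identical values)
runs$values    # FALSE TRUE FALSE TRUE  (the value repeated in each run)

# Expand the run lengths back to one entry per element
run.len <- rep(runs$lengths, runs$lengths)   # 1 3 3 3 1 1

# An element is part of a long run if it is TRUE and its run has length >= 3
in.run <- hot & run.len >= 3                 # FALSE TRUE TRUE TRUE FALSE FALSE
```

The rep(lengths, lengths) step is what lines the run lengths up with the original rows, so they can be compared row by row.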
In the example below we create an artificial data set, define a hot day as MAX >= 44.5 and MIN >= 24.5, and treat a run of more than two consecutive hot days as a heat wave. Then:
# example data set
df <- data.frame(YEAR=1989, MONTH=7, DAY=18:30,
MAX=c(45, 44.2, 44.7, 44.6, 44.4, 44.2, 44.5, 44.8, 44.8, 45, 44.8, 44.4, 45.2),
MIN=c(23.5, 26.1, 24.4, 29.5, 31.6, 26.7, 25, 26, 24.6, 24.3, 26, 24, 25))
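# TRUE for each day that meets both temperature thresholds
hot <- with(df, MAX>=44.5 & MIN>=24.5)
# length of the run (of identical values) that each day belongs to
r <- with(rle(hot), rep(lengths,lengths))
# heat wave: a hot day inside a run of more than 2 consecutive hot days
df$heat.wave <- hot & (r>2)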
df
# YEAR MONTH DAY MAX MIN heat.wave
# 1 1989 7 18 45.0 23.5 FALSE
# 2 1989 7 19 44.2 26.1 FALSE
# 3 1989 7 20 44.7 24.4 FALSE
# 4 1989 7 21 44.6 29.5 FALSE
# 5 1989 7 22 44.4 31.6 FALSE
# 6 1989 7 23 44.2 26.7 FALSE
# 7 1989 7 24 44.5 25.0 TRUE
# 8 1989 7 25 44.8 26.0 TRUE
# 9 1989 7 26 44.8 24.6 TRUE
# 10 1989 7 27 45.0 24.3 FALSE
# 11 1989 7 28 44.8 26.0 FALSE
# 12 1989 7 29 44.4 24.0 FALSE
# 13 1989 7 30 45.2 25.0 FALSE
This creates a column, heat.wave, which is TRUE if there was a heat wave on that day. If you need to extract only the heat wave days, use
df[df$heat.wave,]
# YEAR MONTH DAY MAX MIN heat.wave
# 7 1989 7 24 44.5 25.0 TRUE
# 8 1989 7 25 44.8 26.0 TRUE
# 9 1989 7 26 44.8 24.6 TRUE
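If the same logic is needed for several conditions or thresholds, it could be wrapped in a small helper. This is only a sketch: flag_runs and its min.len argument are hypothetical names introduced here, and treating NA as "condition not met" is an assumption about how missing readings should be handled.

```r
# Hypothetical helper: TRUE for elements that sit inside a run of at least
# min.len consecutive TRUE values in cond
flag_runs <- function(cond, min.len) {
  cond <- !is.na(cond) & cond               # assumption: NA breaks a run
  runs <- rle(cond)
  cond & rep(runs$lengths, runs$lengths) >= min.len
}

# Same example data as above
df <- data.frame(YEAR = 1989, MONTH = 7, DAY = 18:30,
                 MAX = c(45, 44.2, 44.7, 44.6, 44.4, 44.2, 44.5, 44.8, 44.8, 45, 44.8, 44.4, 45.2),
                 MIN = c(23.5, 26.1, 24.4, 29.5, 31.6, 26.7, 25, 26, 24.6, 24.3, 26, 24, 25))

df$heat.wave <- flag_runs(df$MAX >= 44.5 & df$MIN >= 24.5, min.len = 3)
df[df$heat.wave, ]   # rows for days 24 to 26, as in the output above
```

Using min.len = 3 is equivalent to the r>2 test above, since run lengths are whole numbers.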