grepl in R to find matches to any of a list of character strings
Not sure what you tried but this seems to work:
data$keep <- ifelse(grepl(paste(matches, collapse = "|"), data$animal), "Keep","Discard")
Similar to the answer you linked to.
The trick is using the paste:
paste(matches, collapse = "|")
#[1] "cat|dog"
So it builds a regular expression that matches either "cat" OR "dog", and it also works with a long list of patterns without typing each one by hand.
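A minimal, self-contained sketch of the same idea. The matches vector and the toy data frame below are assumptions for illustration, since the original data is not shown here:

```r
# Hypothetical inputs; the question's real data is not reproduced here.
matches <- c("cat", "dog")
data <- data.frame(animal = c("cat", "dog", "bird", "dogfish", "horse"),
                   stringsAsFactors = FALSE)

# Collapse the patterns into a single alternation: "cat|dog"
pattern <- paste(matches, collapse = "|")

# grepl() returns one TRUE/FALSE per element of data$animal
data$keep <- ifelse(grepl(pattern, data$animal), "Keep", "Discard")
data$keep
#[1] "Keep"    "Keep"    "Discard" "Keep"    "Discard"
```

Note that "dogfish" is kept as well, because grepl looks for the pattern anywhere inside the string.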
Edit:
If you are doing this in order to subset the data.frame by the "Keep" and "Discard" entries later on, you could do it more directly using:
data[grepl(paste(matches, collapse = "|"), data$animal),]
This way, the results of grepl, which are TRUE or FALSE, are used for the subset.
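As a rough sketch of the direct subsetting, assuming a made-up two-column data frame (the id column and the values are hypothetical):

```r
# Hypothetical data; only the animal column is named in the question.
data <- data.frame(id = 1:5,
                   animal = c("cat", "dog", "bird", "dogfish", "horse"),
                   stringsAsFactors = FALSE)
matches <- c("cat", "dog")

# Logical vector: TRUE where any of the patterns occurs
hits <- grepl(paste(matches, collapse = "|"), data$animal)

kept      <- data[hits, ]   # rows that match
discarded <- data[!hits, ]  # rows that do not match

kept$id
#[1] 1 2 4
```

Storing the logical vector once makes it easy to get both the matching and the non-matching rows without running the regular expression twice.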
You can use an "or" (|) operator inside the regular expression passed to grepl.
ifelse(grepl("dog|cat", data$animal), "keep", "discard")
# [1] "keep" "keep" "discard" "keep" "keep" "keep" "keep" "discard"
# [9] "keep" "keep" "keep" "keep" "keep" "keep" "discard" "keep"
#[17] "discard" "keep" "keep" "discard" "keep" "keep" "discard" "keep"
#[25] "keep" "keep" "keep" "keep" "keep" "keep" "keep" "keep"
#[33] "keep" "discard" "keep" "discard" "keep" "discard" "keep" "keep"
#[41] "keep" "keep" "keep" "keep" "keep" "keep" "keep" "keep"
#[49] "keep" "discard"
The regular expression dog|cat tells the regular expression engine to look for either "dog" or "cat", so grepl returns TRUE for every element that contains one or the other.
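A short sketch of what the alternation does and does not match, using a made-up vector x. The anchored pattern and ignore.case are optional variations, not something the question asked for:

```r
# Hypothetical test strings
x <- c("dog", "cat", "dogfish", "Cat", "bird")

# Plain alternation: matches the pattern anywhere in the string, case-sensitive
plain <- grepl("dog|cat", x)
plain
#[1]  TRUE  TRUE  TRUE FALSE FALSE

# Anchored alternation: the whole string must be exactly "dog" or "cat"
exact <- grepl("^(dog|cat)$", x)
exact
#[1]  TRUE  TRUE FALSE FALSE FALSE

# Case-insensitive matching via the ignore.case argument of grepl()
nocase <- grepl("dog|cat", x, ignore.case = TRUE)
nocase
#[1]  TRUE  TRUE  TRUE  TRUE FALSE
```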
Try to avoid ifelse as much as possible. This, for example, works nicely:
c("Discard", "Keep")[grepl("(dog|cat)", data$animal) + 1]
With a seed of 123 you will get:
## [1] "Keep" "Keep" "Discard" "Keep" "Keep" "Keep" "Discard" "Keep"
## [9] "Discard" "Discard" "Keep" "Discard" "Keep" "Discard" "Keep" "Keep"
## [17] "Keep" "Keep" "Keep" "Keep" "Keep" "Keep" "Keep" "Keep"
## [25] "Keep" "Keep" "Discard" "Discard" "Keep" "Keep" "Keep" "Keep"
## [33] "Keep" "Keep" "Keep" "Discard" "Keep" "Keep" "Keep" "Keep"
## [41] "Keep" "Discard" "Discard" "Keep" "Keep" "Keep" "Keep" "Discard"
## [49] "Keep" "Keep"
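Why the indexing trick works, as a small sketch with a hand-written logical vector (hits is a stand-in for the grepl result):

```r
# Stand-in for the output of grepl()
hits <- c(TRUE, FALSE, TRUE)

# Arithmetic coerces FALSE to 0 and TRUE to 1, so adding 1 gives positions 1 and 2
idx <- hits + 1
idx
#[1] 2 1 2

# Position 1 is "Discard" (no match), position 2 is "Keep" (match)
labels <- c("Discard", "Keep")[idx]
labels
#[1] "Keep"    "Discard" "Keep"
```

The order of the two labels matters: the value for FALSE has to come first.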