Getting the unique count of strings from a text string
You could use str_extract_all() from stringr and then take the length of the unique matches.
Input:
A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')
fruits <- "apples|pineapples|grapes|bananas"
Result:
length(unique(c(stringr::str_extract_all(A, fruits, simplify = TRUE))))
# [1] 3
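The one-liner above works for a single string. With several strings, simplify = TRUE pads the shorter rows with "" and c() pools every row together, so you get one global count (with "" possibly counted as an element). A minimal sketch of a per-string count, assuming a made-up input vector called texts:

```r
library(stringr)

# Hypothetical input: several strings instead of one
texts <- c(
  "pineapples, apples and more apples",
  "grapes and grapes",
  "no fruit here"
)
fruits <- "apples|pineapples|grapes|bananas"

# Without simplify = TRUE, str_extract_all() returns a list
# holding one character vector of matches per input string
matches <- str_extract_all(texts, fruits)

# Count the distinct matches within each string
unique_counts <- vapply(matches, function(m) length(unique(m)), integer(1))
unique_counts
# [1] 2 1 0
```

vapply() is used here instead of sapply() so the result is always an integer vector, even when a string has no matches.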
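Not exactly elegant, but you could use str_detect like this, where df$A holds the text. Each call returns TRUE or FALSE, and sum() counts the TRUEs.
library(stringr)
sum(str_detect(df$A, "apples"),
    str_detect(df$A, "pineapples"),
    str_detect(df$A, "grapes"),
    str_detect(df$A, "bananas"))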
Or, as suggested in the comments, put all the terms in their own vector and use sapply():
fruits <- c("apples", "pineapples", "grapes", "bananas")
sum(sapply(fruits, function(x) str_detect(df$A, x)))
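One thing to watch with the str_detect approach: the pattern "apples" also matches inside "pineapples", so a text that only mentions pineapples is still counted as containing apples. A rough sketch of one way around this, assuming whole-word matching is what you want (the text below is made up for illustration):

```r
library(stringr)

# Hypothetical text that mentions pineapples but never plain apples
text <- "A basket of pineapples and grapes"
fruits <- c("apples", "pineapples", "grapes", "bananas")

# Plain patterns: "apples" is also found inside "pineapples"
plain <- sum(sapply(fruits, function(x) str_detect(text, x)))

# Word-boundary patterns: each term has to match a whole word
bounded <- sum(sapply(fruits, function(x) {
  str_detect(text, paste0("\\b", x, "\\b"))
}))

plain    # 3
bounded  # 2
```

The str_extract_all approach is less affected in this particular case, because the regex engine consumes "pineapples" as a single match before it can see the "apples" inside it.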
A base R possibility:
length(unique(unlist(regmatches(A, gregexpr("apples|pineapples|grapes|bananas", A, perl = TRUE)))))
# [1] 3
Or in a shortened form:
fruits <- "apples|pineapples|grapes|bananas"
length(unique(unlist(regmatches(A, gregexpr(fruits, A, perl = TRUE)))))
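If the terms live in a character vector rather than a hand-written pattern, the alternation can be built with paste(). This is only a sketch in base R; the names pattern and found are introduced here for illustration:

```r
A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')
fruits <- c("apples", "pineapples", "grapes", "bananas")

# Collapse the terms into a single alternation: "apples|pineapples|grapes|bananas"
pattern <- paste(fruits, collapse = "|")

# Extract every match, then keep the distinct ones
found <- unique(unlist(regmatches(A, gregexpr(pattern, A, perl = TRUE))))

length(found)
# [1] 3
```

Keeping the terms as a vector also makes it easy to see which ones never occur: setdiff(fruits, found) returns "bananas" for this input.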