Getting the unique count of strings from a text string

You could use stringr::str_extract_all to extract every match and then take the length of the unique matches.

Input:

A <- c('I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes')
fruits <- "apples|pineapples|grapes|bananas"

Result:

length(unique(c(stringr::str_extract_all(A, fruits, simplify = TRUE))))
# [1] 3
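With more than one string, simplify = TRUE returns a single matrix padded to the longest row. That mixes the per-string matches together, and the padding can end up counted as an extra "unique" value. The sketch below keeps the default list output instead. It assumes a second, made-up sentence, and it adds word boundaries (\\b), which are not part of the original answer, so that each term matches only as a whole word:

```r
library(stringr)

# The sentence from the question plus a second, made-up string
A <- c("I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes",
       "Only bananas here, and more bananas")
fruits <- c("apples", "pineapples", "grapes", "bananas")

# Word boundaries (\\b) make each term match only as a whole word
pattern <- paste0("\\b(", paste(fruits, collapse = "|"), ")\\b")

# Without simplify, str_extract_all() returns one character vector per string
matches <- str_extract_all(A, pattern)

# Number of distinct terms found in each string
counts <- vapply(matches, function(m) length(unique(m)), integer(1))
counts
# [1] 3 1
```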

Not exactly elegant, but you could also use str_detect to count how many of the terms appear in the string:

library(stringr)
sum(str_detect(A, "apples"),
    str_detect(A, "pineapples"),
    str_detect(A, "grapes"),
    str_detect(A, "bananas"))
# [1] 3
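One caveat: str_detect tests for substrings, so "apples" is also detected when only "pineapples" appears. In the question's sentence both words occur, so the count above is unaffected, but with other text it can be off. The following minimal sketch uses a hypothetical sentence to show that \\b word boundaries avoid this:

```r
library(stringr)

# A made-up sentence that mentions only pineapples
A <- "I only like pineapples"

# Plain substring search: "apples" is found inside "pineapples"
plain <- sum(str_detect(A, "apples"), str_detect(A, "pineapples"))

# \\b word boundaries: "apples" no longer matches inside "pineapples"
whole <- sum(str_detect(A, "\\bapples\\b"), str_detect(A, "\\bpineapples\\b"))

plain
# [1] 2
whole
# [1] 1
```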

Alternatively, if you put the terms in their own vector, you can use an apply function:

fruits <- c("apples", "pineapples", "grapes", "bananas")
sum(sapply(fruits, function(x) str_detect(A, x)))
# [1] 3
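Because stringr functions are vectorised over the pattern as well as the string, the sapply() call is not strictly needed for a single string. A sketch using the question's sentence (the same substring caveat still applies):

```r
library(stringr)

A <- "I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes"
fruits <- c("apples", "pineapples", "grapes", "bananas")

# str_detect() is vectorised over `pattern`: the single string is recycled
# against each fruit, giving one TRUE/FALSE per fruit
found <- str_detect(A, fruits)
sum(found)
# [1] 3
```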

One base R possibility:

length(unique(unlist(regmatches(A, gregexpr("apples|pineapples|grapes|bananas", A, perl = TRUE)))))
# [1] 3

Or, storing the pattern in a variable first:

fruits <- "apples|pineapples|grapes|bananas"
length(unique(unlist(regmatches(A, gregexpr(fruits, A, perl = TRUE)))))
# [1] 3
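If the terms are already in a character vector, the pattern can be built with paste(collapse = "|") instead of being typed out. Below is a base R sketch. The longest-first sort is an extra step that is not part of the original answer. It only matters when one term is a prefix of another (e.g. "apple" and "apples"), because with perl = TRUE the regex alternation takes the first alternative that matches:

```r
A <- "I have a lot of pineapples, apples and grapes. One day the pineapples person gave the apples person two baskets of grapes"
fruits <- c("apples", "pineapples", "grapes", "bananas")

# Build the alternation from the vector, longest terms first
pattern <- paste(fruits[order(nchar(fruits), decreasing = TRUE)], collapse = "|")

hits <- regmatches(A, gregexpr(pattern, A, perl = TRUE))[[1]]
n_unique <- length(unique(hits))
n_unique
# [1] 3

# Why the order matters: with perl = TRUE the first alternative that matches wins
prefix_demo <- regmatches("apples", regexpr("apple|apples", "apples", perl = TRUE))
prefix_demo
# [1] "apple"
```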

Tags: R, stringr, dplyr, tm