Count the number of pattern matches in a string
You can use gregexpr to find the positions of "CG" in vec. gregexpr returns -1 when there is no match, so we have to exclude that value. The function sum then counts the number of matches.
> vec <- "AAAAAAACGAAAAAACGAAADGCGEDCG"
> sum(gregexpr("CG", vec)[[1]] != -1)
[1] 4
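To see why the check against -1 matters, here is a small sketch of what gregexpr returns when the pattern is absent. The input string "AAAA" and the variable name no_match are made up for illustration.

```r
# gregexpr returns a single -1 (not an empty vector) when nothing matches
no_match <- gregexpr("CG", "AAAA")[[1]]

as.integer(no_match)   # -1
length(no_match)       # 1, so length() alone would wrongly report one match
sum(no_match != -1)    # 0, the correct count
```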
If you have a vector of strings, you can use sapply:
> vec <- c("ACACACACA", "GGAGGAGGAG", "AACAACAACAAC", "GGCCCGCCGC", "TTTTGTT", "AGAGAGA")
> sapply(gregexpr("CG", vec), function(x) sum(x != -1))
[1] 0 0 0 2 0 0
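A possible base R variant, shown here only as a sketch, extracts the matched substrings with regmatches and counts them with lengths. Since an element without matches yields an empty character vector, no comparison against -1 is needed. The variable name matches is an assumption, and lengths requires R 3.2.0 or later.

```r
vec <- c("ACACACACA", "GGAGGAGGAG", "AACAACAACAAC", "GGCCCGCCGC", "TTTTGTT", "AGAGAGA")

# One character vector of matched substrings per input string
matches <- regmatches(vec, gregexpr("CG", vec))

# Number of matches per string
counts <- lengths(matches)
counts
# [1] 0 0 0 2 0 0
```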
If you have a list of strings, you can use unlist(vec) and then use the solution above.
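As a minimal sketch of the list case, assuming every list element is a single string (the names vec_list and counts are made up for illustration):

```r
vec_list <- list("ACACACACA", "GGCCCGCCGC", "TTTTGTT")

# Flatten the list into a character vector, then count as before
vec <- unlist(vec_list)
counts <- sapply(gregexpr("CG", vec), function(x) sum(x != -1))
counts
# [1] 0 2 0
```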
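The Bioconductor package Biostrings has a matchPattern function, which returns the matches; take the length of the result to get the count.
GCmatches <- matchPattern("GC", DNAString_object)
length(GCmatches)
Note that DNAString_object is a single sequence, for example one element of the FASTA sequences read in using the Biostrings function readDNAStringSet or readAAStringSet.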
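If only the counts are needed, Biostrings also provides countPattern for a single sequence and vcountPattern for a set of sequences. The sketch below assumes Biostrings is installed from Bioconductor; the sequences are made up here rather than read from a FASTA file.

```r
library(Biostrings)

# A single sequence: count occurrences of "CG" directly
dna <- DNAString("AAACGAAACGTTCG")
n_single <- countPattern("CG", dna)
n_single
# [1] 3

# A set of sequences (the type returned by readDNAStringSet): one count per sequence
dna_set <- DNAStringSet(c("ACACACACA", "GGCCCGCCGC", "TTTTGTT"))
n_set <- vcountPattern("CG", dna_set)
n_set
# [1] 0 2 0
```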
Use str_count from stringr. It's simple to remember and read, though not a base function.
library(stringr)
str_count("AAAAAAACGAAAAAACGAAADGCGEDCG", "CG")
# [1] 4
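str_count is vectorized over its input, so a character vector needs no sapply. The sketch below also wraps the pattern in fixed(), which treats it as a literal string rather than a regular expression. For "CG" the result is the same either way. The variable names are made up for illustration.

```r
library(stringr)

vec <- c("ACACACACA", "GGAGGAGGAG", "AACAACAACAAC", "GGCCCGCCGC", "TTTTGTT", "AGAGAGA")

# One count per element of vec
counts <- str_count(vec, "CG")
counts
# [1] 0 0 0 2 0 0

# Same counts, with the pattern matched literally
counts_fixed <- str_count(vec, fixed("CG"))
```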