Count values separated by a comma in a character string
These two approaches are each short, work on vectors of strings, do not involve the expense of explicitly constructing the split string and do not use any packages. Here d
is a vector of strings such as d <- c("1,2,3", "5,2")
:
1) count.fields
count.fields(textConnection(d), sep = ",")
2) gregexpr
lengths(gregexpr(",", d)) + 1
Here is a possibility
> as.numeric(unlist(strsplit("30,3", ",")))
# 30 3
You could also try stringi
package stri_count_*
funcitons (should be very effcient)
library(stringi)
stri_count_regex(d, "\\d+")
## [1] 2
stri_count_fixed(d, ",") + 1
## [1] 2
stringr
package has a similar functionality
library(stringr)
str_count(d, "\\d+")
## [1] 2
Update:
If you want to subset your data set by length 2 vectors, could try
df[stri_count_regex(df$d, "\\d+") == 2,, drop = FALSE]
# d
# 2 30,5
Or simpler
subset(df, stri_count_regex(d, "\\d+") == 2)
# d
# 2 30,5
Update #2
Here's a benchmark that illustrates why one should consider using external packages (@rengis answer wasn't included because it doesn't answer the question)
library(microbenchmark)
library(stringi)
d <- rep("30,3", 1e4)
microbenchmark( akrun = nchar(gsub('[^,]', '', d))+1,
GG1 = count.fields(textConnection(d), sep = ","),
GG2 = sapply(gregexpr(",", d), length) + 1,
DA1 = stri_count_regex(d, "\\d+"),
DA2 = stri_count_fixed(d, ",") + 1)
# Unit: microseconds
# expr min lq mean median uq max neval
# akrun 8817.950 9479.9485 11489.7282 10642.4895 12480.845 46538.39 100
# GG1 55451.474 61906.2460 72324.0820 68783.9935 78980.216 150673.72 100
# GG2 33026.455 43349.5900 60960.8762 51825.6845 72293.923 203126.27 100
# DA1 4730.302 5120.5145 6206.8297 5550.7930 7179.536 10507.09 100
# DA2 380.147 418.2395 534.6911 448.2405 597.259 2278.11 100
You could use scan
.
v1 <- scan(text=d, sep=',', what=numeric(), quiet=TRUE)
v1
#[1] 30 3
Or using stri_split
from stringi
. This should take both character
and factor
class without converting explicitly to character using as.character
library(stringi)
v2 <- as.numeric(unlist(stri_split(d,fixed=',')))
v2
#[1] 30 3
You can do the count
using base R
by
length(v1)
#[1] 2
Or
nchar(gsub('[^,]', '', d))+1
#[1] 2
Visualize the regex
[^,]
Debuggex Demo
Update
If d
is a column in a dataset df
and want to subset rows with number of digits equals 2
d<-c("30,3,5","30,5")
df <- data.frame(d,stringsAsFactors=FALSE)
df[nchar(gsub('[^,]', '',df$d))+1==2,,drop=FALSE]
# d
#2 30,5
Just to test
df[nchar(gsub('[^,]', '',df$d))+1==10,,drop=FALSE]
#[1] d
#<0 rows> (or 0-length row.names)