Using regexp to select rows in R dataframe

Here you go.

First recreate your data:

dat <- read.table(text="
aName   bName   pName   call  alleles   logRatio    strength
AX-11086564 F08_ADN103  2011-02-10_R10  AB  CG  0.363371    10.184215
AX-11086564 A01_CD1919  2011-02-24_R11  BB  GG  -1.352707   9.54909
AX-11086564 B05_CD2920  2011-01-27_R6   AB  CG  -0.183802   9.766334
AX-11086564 D04_CD5950  2011-02-09_R9   AB  CG  0.162586    10.165051
AX-11086564 D07_CD6025  2011-02-10_R10  AB  CG  -0.397097   9.940238
AX-11086564 B05_CD3630  2011-02-02_R7   AA  CC  2.349906    9.153076
AX-11086564 D04_ADN103  2011-02-10_R2   BB  GG  -1.898088   9.872966
AX-11086564 A01_CD2588  2011-01-27_R5   BB  GG  -1.208094   9.239801
", header=TRUE)

Next, use grepl to construct a logical index of matches:

index1 <- with(dat, grepl("ADN", bName))
index2 <- with(dat, grepl("2011-02-10_R2", pName))
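One thing to keep in mind: grepl() returns one TRUE/FALSE per element and matches the pattern anywhere in the string, so an unanchored pattern such as "2011-02-10_R2" would also match a longer name that merely contains it. A minimal sketch of the difference, using a made-up vector (pNames is hypothetical and not part of the question's data):

```r
# Hypothetical plate names, for illustration only
pNames <- c("2011-02-10_R2", "2011-02-10_R20", "2011-02-10_R10")

# Unanchored: the pattern may match anywhere in the string
loose <- grepl("2011-02-10_R2", pNames)

# Anchored with ^ and $: the whole string has to match
exact <- grepl("^2011-02-10_R2$", pNames)

print(loose)  # TRUE TRUE FALSE
print(exact)  # TRUE FALSE FALSE
```

With the data above both patterns select the same row, so anchoring only matters if names like the second one can occur.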

Now subset using the & operator:

dat[index1 & index2, ]
        aName      bName         pName call alleles  logRatio strength
7 AX-11086564 D04_ADN103 2011-02-10_R2   BB      GG -1.898088 9.872966

Corrected according to Andrie's advice; this should work:

dat[grepl("ADN", dat$bName), ]
dat[grepl("ADN", dat$bName) & dat$pName == "2011-02-10_R2", ]

subset(dat, grepl("ADN", bName) & pName == "2011-02-10_R2")

Note the use of "&" (not "&&", which is not vectorized) and of "==" (not "=", which is assignment).
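To make the point about "&" concrete, here is a minimal sketch with two made-up logical vectors (a and b are illustrative names only). "&" works element by element, which is what row selection needs:

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# Element-wise AND: one result per position
both <- a & b
print(both)  # TRUE FALSE FALSE

# && is meant for single TRUE/FALSE values, e.g. inside if () conditions
single <- a[1] && b[1]
print(single)  # TRUE
```

Depending on the R version, calling "&&" on vectors longer than one either looks only at the first elements or raises an error, so it is not suitable for building a row index.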

The same selection can also be written with [ and with():

dat[with(dat, grepl("ADN", bName) & pName == "2011-02-10_R2"), ]

This form may be preferable inside functions. However, unlike subset(), it returns a row of NA values for every row where the condition evaluates to NA, which happens when dat$pName is NA and bName matches. That behavior (which some regard as a feature) can be avoided by adding & !is.na(dat$pName) to the logical expression.
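A small sketch of the NA behavior, using made-up data with one missing pName (the data frame d and its values are hypothetical, chosen only to trigger the case):

```r
# Made-up data with one missing pName, for illustration only
d <- data.frame(
  bName = c("A01_ADN1", "B02_ADN2", "C03_CD3"),
  pName = c("2011-02-10_R2", NA, "2011-02-10_R2"),
  stringsAsFactors = FALSE
)

keep <- with(d, grepl("ADN", bName) & pName == "2011-02-10_R2")
print(keep)  # TRUE NA FALSE

# `[` turns the NA index into a row filled with NA
with_na <- d[keep, ]

# Guarding with !is.na() drops that row
guarded <- d[keep & !is.na(d$pName), ]

# subset() treats NA in the condition as FALSE
via_subset <- subset(d, grepl("ADN", bName) & pName == "2011-02-10_R2")

print(nrow(with_na))     # 2
print(nrow(guarded))     # 1
print(nrow(via_subset))  # 1
```

Wrapping the index in which(), as in d[which(keep), ], is another common way to drop the NA entries.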
