How does "gsub" handle spaces?
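`[\sb,\sc]` is a character class. It means "one character among: whitespace, `b`, `,`, whitespace, `c`", so it matches exactly one character that is whitespace, `b`, a comma or `c` (the second `\s` is redundant).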
You probably want something like `(\sb|\sc)`, which means "whitespace followed by `b`, or whitespace followed by `c`", or `\s[bc]`, which means "whitespace followed by `b` or `c`".
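To check that these two patterns are interchangeable, you can inspect the substrings each one matches. This is a small sketch using base R's `regmatches()` and `gregexpr()`; the variable names are only for illustration.

```r
s <- "ab b cde"

# Both patterns consume a whitespace character plus the letter after it,
# so every match is two characters long
alt_matches <- regmatches(s, gregexpr("(\\sb|\\sc)", s, perl = TRUE))[[1]]
set_matches <- regmatches(s, gregexpr("\\s[bc]", s, perl = TRUE))[[1]]
print(alt_matches)
print(set_matches)

# The "b" inside "ab" is not matched because no whitespace precedes it
stopifnot(identical(alt_matches, set_matches))
```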
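s <- "ab b cde"
# Each of these calls returns "ab  de" (two spaces)
gsub("(\\sb|\\sc)", " ", s, perl = TRUE)      # alternation of two sequences
gsub("\\s[bc]", " ", s, perl = TRUE)          # whitespace, then one of b or c
gsub("[[:space:]][bc]", " ", s, perl = TRUE)  # POSIX class, no backslashes needed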
To remove a run of repeated letters (as in your second example), put a `+` after the letter to be removed; `+` means "one or more".
s2 <- "akui i ii"
gsub("\\si+", " ", s2)  # returns "akui  "
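The effect of the `+` is easiest to see side by side. This sketch runs the same replacement with and without the quantifier on the sample string above (the variable names are illustrative).

```r
s2 <- "akui i ii"

# Without "+", each match is whitespace plus ONE "i",
# so the second "i" of "ii" survives
without_plus <- gsub("\\si", " ", s2)

# With "+", the match extends over the whole run of "i"s
with_plus <- gsub("\\si+", " ", s2)

print(without_plus)
print(with_plus)
```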
There is a simple solution to this: follow `\s` with a character class.
gsub("\\s[bc]", " ", "ab b cde", perl = TRUE)
This replaces each whitespace-plus-`b` or whitespace-plus-`c` pair with a single space.
You can use lookbehind matching like this:
gsub("(?<=\\s)i+", "", "akui i ii", perl = TRUE)  # the space is kept; returns "akui  "
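A lookbehind is a zero-width assertion, so the whitespace it checks for is not part of the match. The sketch below, using base R's `regmatches()`, `gregexpr()` and `gsub()` with illustrative variable names, lists the matched substrings and compares deleting them against the `\\si+` replacement shown earlier.

```r
s2 <- "akui i ii"

# (?<=\s) requires whitespace just before the match
# but does not include that whitespace in the match itself
lb_matches <- regmatches(s2, gregexpr("(?<=\\s)i+", s2, perl = TRUE))[[1]]
print(lb_matches)

# Because the space is left in place, deleting the matches (replacement "")
# gives the same string as consuming the space and writing it back
lb_removed <- gsub("(?<=\\s)i+", "", s2, perl = TRUE)
consumed   <- gsub("\\si+", " ", s2)
print(lb_removed)
stopifnot(identical(lb_removed, consumed))
```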
Edit: lookbehind is still the way to go, demonstrated here with another example from your original post. Hope this helps.