How does "gsub" handle spaces?

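[\sb,\sc] is a character class, so it matches exactly one character: whitespace, b, a comma, or c (listing \s twice inside the brackets adds nothing). You probably want an alternation such as (\sb|\sc), which means "whitespace followed by b, or whitespace followed by c", or \s[bc], which means "whitespace followed by b or c". Inside an R string the backslash has to be doubled, so \s is written "\\s":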

    s <- "ab b cde"
    gsub( "(\\sb|\\sc)",     "  ", s, perl=TRUE )
    gsub( "\\s[bc]",         "  ", s, perl=TRUE )
    gsub( "[[:space:]][bc]", "  ", s, perl=TRUE )  # POSIX class, no backslashes needed
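To see the difference between the character class and the intended pattern, here is a quick check that replaces every match with an underscore (the underscore is just an illustrative marker). It uses perl=TRUE so that \s is recognised inside the brackets:

```r
s <- "ab b cde"

# Character class: each single character that is whitespace, "b", "," or "c"
# is replaced on its own, so five separate characters become underscores.
gsub("[\\sb,\\sc]", "_", s, perl = TRUE)  # "a_____de"

# Whitespace followed by b or c: two two-character matches, " b" and " c".
gsub("\\s[bc]", "_", s, perl = TRUE)      # "ab__de"
```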

To remove a run of the same letter (as in your second example), put a + after that letter: i+ matches one or more consecutive i's.

    s2 <- "akui i ii"
    gsub("\\si+", " ", s2)
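The effect of + is easiest to see by dropping it. This sketch uses R's default regex engine, which accepts \s as an extension:

```r
s2 <- "akui i ii"

# Without +: each match is whitespace plus exactly one "i",
# so the second "i" of "ii" survives.
gsub("\\si", " ", s2)   # "akui  i"

# With +: whitespace plus one or more "i", so the whole run is replaced.
gsub("\\si+", " ", s2)  # "akui  "
```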

A single character class after \s is enough:

    gsub("\\s[bc]", " ", "ab b cde", perl=TRUE)

This replaces each whitespace character that is followed by b or c with a single space, giving "ab  de".


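You can use a lookbehind assertion, which matches the letters only when they are preceded by whitespace, without making the whitespace part of the match. Lookarounds require perl=TRUE:

    gsub("(?<=\\s)i+", " ", "akui i ii", perl=TRUE)

Edit: lookbehind is still the way to go, demonstrated here with another example from your original post. Hope this helps.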
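One way to see what the lookbehind changes is to compare the matched text using regmatches() and gregexpr(). The whitespace is checked but not consumed, so it is never replaced: a replacement of " " is added next to the existing space, while a replacement of "" removes only the letters.

```r
s2 <- "akui i ii"

# Consuming pattern: the whitespace is part of each match.
regmatches(s2, gregexpr("\\si+", s2, perl = TRUE))[[1]]       # " i"  " ii"

# Lookbehind: whitespace must precede the match but is not included in it.
regmatches(s2, gregexpr("(?<=\\s)i+", s2, perl = TRUE))[[1]]  # "i"  "ii"

# So deleting the matches keeps the spaces with the lookbehind...
gsub("(?<=\\s)i+", "", s2, perl = TRUE)  # "akui  "
# ...and drops them with the consuming pattern.
gsub("\\si+", "", s2, perl = TRUE)       # "akui"
```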

Tags:

Regex

R