R: Capitalizing everything after a certain character
You were very close:
gsub("(_.*)","\\U\\1",x,perl=TRUE)
seems to work. You just needed to use _.* (underscore followed by zero or more other characters) rather than _* (zero or more underscores) ...
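As a quick check, here is a minimal sketch of the call on a sample vector. The contents of x are an assumption (the original input is not shown in this answer); the values below were chosen to be consistent with the output printed further down.

```r
# Hypothetical input: the real x is not shown, this is an assumed stand-in
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# Capture the first underscore and everything after it, then
# re-insert the captured text in upper case
result <- gsub("(_.*)", "\\U\\1", x, perl = TRUE)
print(result)
# [1] "NYC_23DF"  "BOS_3_RB"  "mgh_3_3_F"
```

Because .* is greedy, the match starts at the first underscore and runs to the end of each string, so later underscores are simply part of the captured text.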
To take this apart a bit more:
- _.* gives a regular expression pattern that matches an underscore _ followed by any number (including 0) of additional characters; . denotes "any character" and * denotes "zero or more repeats of the previous element"
- surrounding this regular expression with parentheses () denotes that it is a pattern we want to store
- \\1 in the replacement string says "insert the contents of the first matched pattern", i.e. whatever matched _.*
- \\U, in conjunction with perl=TRUE, says "put what follows in upper case" (uppercasing _ has no effect; if we wanted to capitalize everything after (for example) a lower-case g, we would need to exclude the g from the stored pattern and include it in the replacement pattern: gsub("g(.*)","g\\U\\1",x,perl=TRUE))
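The lower-case g case can be sketched as follows. The vector y is a made-up example, not data from the question; it only illustrates why the g has to sit outside the parentheses.

```r
# Hypothetical example strings (not from the original question)
y <- c("bigdata", "logfile")

# g inside the capture group: the g itself is upper-cased too
g_inside <- gsub("(g.*)", "\\U\\1", y, perl = TRUE)

# g outside the capture group and repeated literally in the
# replacement: only the text after the g is upper-cased
g_outside <- gsub("g(.*)", "g\\U\\1", y, perl = TRUE)

print(g_inside)   # [1] "biGDATA" "loGFILE"
print(g_outside)  # [1] "bigDATA" "logFILE"
```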
For more details, search for "replacement" and "capitalizing" in ?gsub (and ?regexp for general information about regular expressions).
gsubfn in the gsubfn package is like gsub except the replacement string can be a function. Here we match _ and everything afterwards, feeding the match through toupper:
> library(gsubfn)
>
> gsubfn("_.*", toupper, x)
[1] "NYC_23DF" "BOS_3_RB" "mgh_3_3_F"
Note that this approach involves a particularly simple regular expression.
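If installing gsubfn is not an option, the same "match, then pass the match through a function" idea can be approximated in base R with regexpr and regmatches. This is a different technique from the gsubfn call above, shown only as a sketch; the input vector is again an assumed stand-in for the unseen x.

```r
# Hypothetical input: assumed stand-in for the x used above
x <- c("NYC_23df", "BOS_3_rb", "mgh_3_3_f")

# Locate the first underscore and everything after it in each string
m <- regexpr("_.*", x)

# Replace each matched piece with its upper-case version
upper_tail <- x
regmatches(upper_tail, m) <- toupper(regmatches(upper_tail, m))
print(upper_tail)
# [1] "NYC_23DF"  "BOS_3_RB"  "mgh_3_3_F"
```

Strings with no underscore would have no match and are left unchanged by the assignment.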