Dictionary style replace multiple items
map = setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))
foo[] <- map[unlist(foo)]
assuming that map covers all the cases in foo (any value missing from map becomes NA). This would feel less like a 'hack' and be more efficient in both space and time if foo were a matrix (of character()); then:
matrix(map[foo], nrow=nrow(foo), dimnames=dimnames(foo))
Both matrix and data frame variants run afoul of R's 2^31-1 limit on vector size when there are millions of SNPs and thousands of samples.
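As a minimal sketch of a variant that recodes one column at a time and leaves values without a dictionary entry untouched: the sample foo below is hypothetical (the original post does not show the data), and recode_column is a helper name introduced only for illustration. Working column by column avoids building a single vector out of every cell, which may help with the size limit mentioned above.

```r
# Hypothetical example data; the original post does not show `foo`.
foo <- data.frame(
  snp1 = c("AA", "AG", "AA", "AA"),
  snp2 = c("AA", "AT", "AG", "AA"),
  snp3 = c(NA, "GG", "GG", "GC"),
  stringsAsFactors = FALSE
)

map <- setNames(c("0101", "0102", "0103"), c("AA", "AC", "AG"))

# Illustrative helper: recode one character vector with a named-vector
# dictionary, keeping any value (including NA) that `map` does not cover.
recode_column <- function(x, map) {
  hit <- match(x, names(map))              # NA where x has no dictionary entry
  ifelse(is.na(hit), x, unname(map[hit]))  # fall back to the original value
}

# Apply the lookup column by column; foo[] keeps the data frame structure.
foo[] <- lapply(foo, recode_column, map = map)
```

Here "AT", "GG", "GC" and the NA survive unchanged, whereas map[unlist(foo)] would turn every uncovered value into NA.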
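Here is a quick solution:
dict = list(AA = '0101', AC = '0102', AG = '0103')
foo2 = foo
# loop over every dictionary entry; names(dict)[i] is the key, dict[[i]] the bare value
for (i in seq_along(dict)) {foo2 <- replace(foo2, foo2 == names(dict)[i], dict[[i]])}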
One of the most readable ways to replace values in a string or a vector of strings with a dictionary is stringr::str_replace_all, from the stringr package. Beware: this method is based on regex, so each dictionary key is interpreted as a regular expression and can match inside longer values. The pattern needed by str_replace_all can be a dictionary, expressed as a named character vector: c("regex" = "desired value").
# 1. Make your dictionary
dictio_replace <- c("AA" = "0101",
                    "AC" = "0102",
                    "AG" = "0103") # short example of a dictionary
# 2. Replace all patterns according to the dictionary values (works on a single string or a single vector of strings)
foo$snp1 <- stringr::str_replace_all(string = foo$snp1,
                                     pattern = dictio_replace) # only 'pattern' is used here; 'replacement' is not needed because the dictionary supplies the replacements
Repeat step 2 with foo$snp2 and foo$snp3. If you have more vectors to transform, it is a good idea to wrap the call in another function and apply it to each column of the data frame, so that you do not repeat yourself.
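One possible way to cover every column at once, sketched under a few assumptions: the sample foo is hypothetical, and anchoring the keys with ^ and $ is an addition not present in the answer above, included because the regex matching would otherwise also rewrite part of a longer value such as "AAC". The anchored vector is an illustrative name.

```r
library(stringr)

# Hypothetical example data; the original post does not show `foo`.
foo <- data.frame(
  snp1 = c("AA", "AG", "AAC"),
  snp2 = c("AC", "AA", "GG"),
  stringsAsFactors = FALSE
)

dictio_replace <- c("AA" = "0101", "AC" = "0102", "AG" = "0103")

# Anchor each key so that it only matches a whole value, not a substring.
# These keys are plain letters; keys containing regex metacharacters
# would need escaping first.
anchored <- setNames(dictio_replace, paste0("^", names(dictio_replace), "$"))

# Apply the same replacement to every column without repeating the call.
foo[] <- lapply(foo, str_replace_all, pattern = anchored)
```

With the unanchored dictionary, "AAC" would come out as "0101C"; with the anchored keys it is left as is.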
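If you're open to using packages, plyr is a very popular one and has a handy mapvalues() function that does just what you're looking for. mapvalues() expects an atomic vector or a factor rather than a whole data frame, so apply it to each column:
foo[] <- lapply(foo, plyr::mapvalues, from=c("AA", "AC", "AG"), to=c("0101", "0102", "0103"))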
Note that it works for data types of all kinds, not just strings.
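To illustrate the point about data types, here is a small sketch with a numeric vector and a factor. The vectors codes and genotypes are made-up examples, and warn_missing = FALSE is passed only to silence the message plyr prints when a from value does not occur in the input.

```r
library(plyr)

# Numeric input: 1 and 2 are recoded, 3 has no mapping and is kept.
codes <- c(1, 2, 3, 2)
recoded_codes <- mapvalues(codes, from = c(1, 2), to = c(10, 20))

# Factor input: the levels are recoded, so the result is still a factor.
genotypes <- factor(c("AA", "AG", "AA"))
recoded_genotypes <- mapvalues(
  genotypes,
  from = c("AA", "AC", "AG"),
  to = c("0101", "0102", "0103"),
  warn_missing = FALSE  # "AC" is absent from the data
)
```

Values without an entry in from are returned unchanged rather than becoming NA.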