Convert comma separated string to numeric columns

Old question, but I came across it from another recent question (which seems unrelated).

Both existing answers are appropriate, but I wanted to share one based on a package I wrote, "splitstackshape", which is fast and has a straightforward syntax.

Here's some sample data:

set.seed(1)
dat = data.frame(
  reaction_time = apply(matrix(round(
    runif(24, 1, 2000)), 6, 4), 1, paste, collapse = ","))

This is the splitting:

library(splitstackshape)
cSplit(dat, "reaction_time", ",")
#    reaction_time_1 reaction_time_2 reaction_time_3 reaction_time_4
# 1:             532            1889            1374             761
# 2:             745            1322             769            1555
# 3:            1146            1259            1540            1869
# 4:            1817             125             996             425
# 5:             404             413            1436            1304
# 6:            1797             354            1984             252

And, optionally, if you need to take the rowMeans:

rowMeans(cSplit(dat, "reaction_time", ","))
# [1] 1139.00 1097.75 1453.50  840.75  889.25 1096.75

I think you are looking for the strsplit() function:

a = "2000,1450,1800,2200"
strsplit(a, ",")
[[1]]
[1] "2000" "1450" "1800" "2200"

Notice that strsplit returns a list, in this case with only one element. This is because strsplit takes a vector as input and returns one list element per input string. You can therefore pass it a whole column of such single-cell strings and get back a list with one split vector per row. In a more relevant example, this looks like:

# Create some example data
dat = data.frame(reaction_time = 
       apply(matrix(round(runif(100, 1, 2000)), 
                     25, 4), 1, paste, collapse = ","),
                     stringsAsFactors=FALSE)
splitdat = do.call("rbind", strsplit(dat$reaction_time, ","))
splitdat = data.frame(apply(splitdat, 2, as.numeric))
names(splitdat) = paste("trial", 1:4, sep = "")
head(splitdat)
  trial1 trial2 trial3 trial4
1    597   1071   1430    997
2    614    322   1242   1140
3   1522   1679     51   1120
4    225   1988   1938   1068
5    621    623   1174     55
6   1918   1828    136   1816

Finally, to calculate the mean per person:

apply(splitdat, 1, mean)
[1] 1187.50  361.25  963.75 1017.00  916.25 1409.50  730.00 1310.75 1133.75
[10]  851.25  914.75  881.25  889.00 1014.75  676.75  850.50  805.00 1460.00
[19]  901.00 1443.50  507.25  691.50 1090.00  833.25  669.25
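
To make the point about strsplit taking a vector more concrete, here is a minimal sketch on a tiny input. The vector x and its values are made up for illustration; only base R is used.

```r
# Hypothetical input: two comma-separated strings (values are made up)
x <- c("2000,1450,1800,2200", "10,20,30,40")

# strsplit() is vectorised: it returns one list element per input string
parts <- strsplit(x, ",", fixed = TRUE)
length(parts)

# Convert each element to numeric and take the mean of each string's values
means <- vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
means
```

Using vapply here avoids building the intermediate matrix, which can be handy if the rows do not all contain the same number of values.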

A nifty, if rather heavy-handed, way is to use read.csv in conjunction with textConnection. Assuming your data is in a data frame, df:

# header = FALSE so the first row is read as data, not as column names
x <- read.csv(textConnection(df[["reaction times"]]), header = FALSE)
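
As a self-contained sketch of the same idea, read.csv also accepts a text argument, which opens the text connection for you. The data frame and the column name reaction_time below are assumptions for illustration, and header = FALSE is passed so that the first row is kept as data.

```r
# Hypothetical data frame; the column name and values are assumptions
df <- data.frame(reaction_time = c("2000,1450,1800,2200", "10,20,30,40"),
                 stringsAsFactors = FALSE)

# Each element of the character vector is parsed as one CSV line;
# header = FALSE keeps the first row as data rather than column names
x <- read.csv(text = df$reaction_time, header = FALSE)
str(x)

# The columns come back numeric, so row means can be taken directly
rowMeans(x)
```

Because read.csv does the type conversion itself, no explicit as.numeric step is needed.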
