less clunky reshaping of anscombe data
If you are really dealing with the "anscombe" dataset, then I would say @Thela's reshape
solution is very direct.
However, here are a few other options to consider:
Option 1: Base R
You can write your own "reshape" function, perhaps something like this:
myReshape <- function(indf = anscombe, stubs = c("x", "y")) {
  temp <- sapply(stubs, function(x) {
    unlist(indf[grep(x, names(indf))], use.names = FALSE)
  })
  s <- rep(seq_along(grep(stubs[1], names(indf))), each = nrow(indf))
  data.frame(s, temp)
}
Notes:
- I'm not sure that this is necessarily less clunky than what you're already doing
- This approach will not work if the data are "unbalanced" (for example, more "x" columns than "y" columns).
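To make the second note concrete, here is a small check. It is my own sketch rather than part of the function: it repeats myReshape() so the snippet runs on its own, then tries it on a hypothetical unbalanced copy, anscombe2, which drops "y4".

```r
# Same function as above, repeated so this snippet is self-contained
myReshape <- function(indf = anscombe, stubs = c("x", "y")) {
  temp <- sapply(stubs, function(x) {
    unlist(indf[grep(x, names(indf))], use.names = FALSE)
  })
  s <- rep(seq_along(grep(stubs[1], names(indf))), each = nrow(indf))
  data.frame(s, temp)
}

# Balanced case: 4 sets x 11 observations, in columns s, x, y
balanced <- myReshape()
dim(balanced)

# Hypothetical unbalanced input: four "x" columns but only three "y" columns
anscombe2 <- anscombe[1:7]

# The stacked "x" and "y" vectors now differ in length, so data.frame() fails
unbalanced <- try(myReshape(anscombe2), silent = TRUE)
inherits(unbalanced, "try-error")
```

With unequal stub counts, sapply() can no longer simplify to a matrix, which is why the last step errors out instead of padding with NA.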
Option 2: "dplyr" + "tidyr"
Since pipes are the rage these days, you can also try:
library(dplyr)
library(tidyr)
anscombe %>%
  gather(var, val, everything()) %>%
  extract(var, into = c("variable", "s"), "(.)(.)") %>%
  group_by(variable, s) %>%
  mutate(ind = sequence(n())) %>%
  spread(variable, val)
Notes:
- I'm not sure that this is necessarily less clunky than what you're already doing, but some people like the pipe approach.
- This approach should be able to handle unbalanced data.
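If you have a newer "tidyr" (1.0.0 or later, which is an assumption about your setup), the gather/extract/spread chain above can probably be collapsed into a single pivot_longer() call. This is a swapped-in alternative, not the approach shown above; the name "long" is just for illustration.

```r
library(tidyr)

# ".value" tells pivot_longer() that the first regex group ("x" or "y")
# names the output column, while the second group (the digit) becomes "s"
long <- pivot_longer(
  anscombe,
  cols = everything(),
  names_to = c(".value", "s"),
  names_pattern = "(.)(.)"
)

head(long)
```

Note that "s" comes back as a character column here, so wrap it in as.integer() if you need numbers.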
Option 3: "splitstackshape"
Before @Arun went and did all that fantastic work on melt.data.table, I had written merged.stack in my "splitstackshape" package. With that, the approach would be:
library(splitstackshape)
setnames(
  merged.stack(
    data.table(anscombe, keep.rownames = TRUE),
    var.stubs = c("x", "y"), sep = "var.stubs"),
  ".time_1", "s")[]
A few notes:
- merged.stack needs something to treat as an "id", hence the need for data.table(anscombe, keep.rownames = TRUE), which adds a column named "rn" with the row numbers.
- The sep = "var.stubs" basically means that we don't really have a separator variable, so we'll just strip out the stub and use whatever remains for the "time" variable.
- merged.stack will work if the data are unbalanced. For instance, try using it with anscombe2 <- anscombe[1:7] as your dataset instead of "anscombe".
- The same package also has a function called Reshape that builds upon reshape to let it reshape unbalanced data. But it's slower and less flexible than merged.stack. The basic approach would be Reshape(data.table(anscombe, keep.rownames = TRUE), var.stubs = c("x", "y"), sep = "") and then rename the "time" variable using setnames.
Option 4: melt.data.table
This was mentioned in the comments above, but hasn't been shared as an answer. Outside of base R's reshape, this is a very direct approach that handles column renaming from within the function itself:
library(data.table)
melt(as.data.table(anscombe),
     measure.vars = patterns(c("x", "y")),
     value.name = c("x", "y"),
     variable.name = "s")
Notes:
- Will be insanely fast.
- Much better supported than "splitstackshape" or reshape ;-)
- Handles unbalanced data just fine.
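As a quick illustration of the "unbalanced" note, here is a sketch using the same anscombe[1:7] idea as in Option 3 (so "y4" is missing). The anchored patterns "^x" and "^y" and the names dt and long are my own choices for the example, not something the function requires.

```r
library(data.table)

# Hypothetical unbalanced input: x1..x4 but only y1..y3
dt <- as.data.table(anscombe[1:7])

long <- melt(dt,
             measure.vars = patterns("^x", "^y"),
             value.name = c("x", "y"),
             variable.name = "s")

# "s" comes back as a factor; convert it if you want plain integers
long[, s := as.integer(as.character(s))]

# The rows for the fourth set keep their x values and get NA for y
long[s == 4]
```

One caveat: the sets are matched by position, so this lines up cleanly because the missing column is the last one.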
I think this meets the criteria of being 1) short 2) comprehensible and 3) no hardcoded column numbers. And it doesn't require any other packages.
reshape(anscombe, varying=TRUE, sep="", direction="long", timevar="s")
# s x y id
#1.1 1 10 8.04 1
#...
#11.1 1 5 5.68 11
#1.2 2 10 9.14 1
#...
#11.2 2 5 4.74 11
#1.3 3 10 7.46 1
#...
#11.3 3 5 5.73 11
#1.4 4 8 6.58 1
#...
#11.4 4 8 6.89 11
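If the composite row names and the extra "id" column get in the way, a small follow-up (just a sketch; the name "long" is mine) strips them so that only s, x and y remain:

```r
long <- reshape(anscombe, varying = TRUE, sep = "",
                direction = "long", timevar = "s")

# Replace row names such as "1.1" with the default 1..n sequence
rownames(long) <- NULL

# Drop the "id" column that reshape() adds to track the original rows
long$id <- NULL

head(long)
```

Still base R only, at the cost of two extra lines.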
I don't know if a non-reshape solution would be acceptable, but here you go:
library(data.table)
# positions of the x columns (x1..x4); the matching y columns
# sit length(pattern) positions further along
pattern <- 1:4
# use Map to build one data.frame per set (set number, x column, y column)
# and rbindlist to stack the list produced by Map
lists <- rbindlist(Map(data.frame,
                       pattern,
                       anscombe[pattern],
                       anscombe[pattern + length(pattern)]))
# set the correct names
setnames(lists, names(lists), c('s', 'x', 'y'))
Output:
> lists
    s  x     y
 1: 1 10  8.04
 2: 1  8  6.95
 3: 1 13  7.58
 4: 1  9  8.81
 5: 1 11  8.33
 6: 1 14  9.96
 7: 1  6  7.24
 8: 1  4  4.26
 9: 1 12 10.84
10: 1  7  4.82
....
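The same idea also works without "data.table". This is a base R variation of my own rather than the code above, and like the original it assumes the x columns come first, followed by the y columns in the same order. The names n_sets, pieces and long are only for illustration.

```r
# Number of x/y pairs, assuming half the columns are x and half are y
n_sets <- ncol(anscombe) / 2

# Build one data.frame per set, naming the columns up front
pieces <- lapply(seq_len(n_sets), function(i) {
  data.frame(s = i, x = anscombe[[i]], y = anscombe[[i + n_sets]])
})

# Stack the pieces into a single long data.frame
long <- do.call(rbind, pieces)

head(long)
```

Naming the columns inside data.frame() means there is no separate renaming step afterwards.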