How to create an empty datatable with columns names and then append datatables to it?
To create an empty data.table
use (assuming all columns are numeric):
library(data.table)
data <- data.table(va=numeric(), vb=numeric(), vc=numeric())
data
which results in:
> data
Empty data.table (0 rows) of 3 cols: va,vb,vc
To do a self join over all columns use (even though the result is the same ;-):
data2 <- data.table(va=c(-1,0,1), vb=c(-1,0,1), vc=c(-1,0,1))
data2
merge(data2, data2,by=names(data2))
The reason why you have to specify the by
parameter is the documented semantics of merge
:
by:
A vector of shared column names in x and y to merge on. This defaults to the shared key columns between the two tables. If y has no key columns, this defaults to the key of x.
Since you don't have set any keys the "join" columns to merge the data tables are unclear.
There is no implicit "use all column" semantics if you omit the by
parameter (as cited above the shared key columns are taken).
To append all rows of a data.table to another one you use rbind
("row bind") instead of merge
:
data3 <- rbind(data2, data2)
data3
Which results in:
> data3
va vb vc
1: -1 -1 -1
2: 0 0 0
3: 1 1 1
4: -1 -1 -1
5: 0 0 0
6: 1 1 1
To create an empty data.table, you can start from an empty matrix:
library(data.table)
data <- setNames(data.table(matrix(nrow = 0, ncol = 3)), c("va", "vb", "vc"))
data
Empty data.table (0 rows) of 3 cols: va,vb,vc
Then you can use rbindlist
to append new data.table to it:
data2=data.table(va=c(-1,0,1), vb=c(-1,0,1), vc=c(-1,0,1))
data2
va vb vc
1: -1 -1 -1
2: 0 0 0
3: 1 1 1
rbindlist(list(data, data2))
va vb vc
1: -1 -1 -1
2: 0 0 0
3: 1 1 1
Or even simpler, the following also works:
data <- data.table()
data <- rbindlist(list(data, data2))
data
va vb vc
1: -1 -1 -1
2: 0 0 0
3: 1 1 1
Another way to create an empty data.table with defined column names but without having to define data types:
data <- data.table(1)[,`:=`(c("va", "vb", "vc"),NA)][,V1:=NULL][.0]
This does the following
data.table(1)
: Create a non-NULL
data.table to which you can add columns- Has a one column
V1
with one row. Value1
- You can use any value (other than
NULL
) in the place of1
- Has a one column
[,`:=`(c("va", "vb", "vc"),NA)]
: Add columnsva
,vb
,vc
- Now has four columns (starting with
V1
) and one row. value1,NA,NA,NA
- Any non-
NULL
value can be substituted for NA
- Now has four columns (starting with
[,V1:=NULL]
: Remove theV1
column[.0]
: Return a blank row- You can actually use [.n] where n is any integer.
If you don't like the black magic of [.0]
you can also use
data <- data.table(1)[,`:=`(c("va", "vb", "vc"),NA)][,V1:=NULL][!is.na(va)]