Storing R Objects in a relational database
Use the serialization feature to turn any R object into a (raw or character) string, then store that string. See help(serialize)
.
Reverse this for retrieval: get the string, then unserialize()
into a R object.
For sqlite
(and possibly others):
CREATE TABLE data (blob BLOB);
Now in R
:
RSQLite::dbGetQuery(db.conn, 'INSERT INTO data VALUES (:blob)', params = list(blob = list(serialize(some_object)))
Note the list
wrapper around some_object
. The output of serialize
is a raw vector. Without list
, the INSERT statement would be executed for each vector element. Wrapping it in a list allows RSQLite::dbGetQuery
to see it as one element.
To get the object back from the database:
some_object <- unserialize(RSQLite::dbGetQuery(db.conn, 'SELECT blob FROM data LIMIT 1')$blob[[1]])
What happens here is you take the field blob
(which is a list since RSQLite doesn't know how many rows will be returned by the query). Since LIMIT 1
assures only 1 row is returned, we take it with [[1]]
, which is the original raw vector. Then you need to unserialize
the raw vector to get your object.
An example R variable, that's fairly complex:
library(nlme)
model <- lme(uptake ~ conc + Treatment, CO2, random = ~ 1 | Plant / Type)
The best storage database method for R variables depends upon how you want to use it.
I need to do in-database analytics on the values
In this case, you need to break the object down into values that the database can handle natively. This usually means converting it into one or more data frames. The easiest way to do this is to use the broom
package.
library(broom)
coefficients_etc <- tidy(model)
model_level_stats <- glance(model)
row_level_stats <- augment(model)
I just want storage
In this case you want to serialize your R variables. That is, converting them to be a string or a binary blob. There are several methods for this.
My data has to be accessible by programs other than R, and needs to be human-readable
You should store your data in a cross-platform text format; probably JSON or YAML. JSON doesn't support some important concepts like Inf
; YAML is more general but the support in R isn't as mature. XML is also possible, but is too verbose to be useful for storing large arrays.
library(RJSONIO)
model_as_json <- toJSON(model)
nchar(model_as_json) # 17916
library(yaml)
# yaml package doesn't yet support conversion of language objects,
# so preprocessing is needed
model2 <- within(
model,
{
call <- as.character(call)
terms <- as.character(terms)
}
)
model_as_yaml <- as.yaml(model2)
nchar(model_as_yaml) # 14493
My data has to be accessible by programs other than R, and doesn't need to be human-readable
You could write your data to an open, cross-platform binary format like HFD5. Currently support for HFD5 files (via rhdf5
) is limited, so complex objects are not supported. (You'll probably need to unclass
everything.)
library(rhdf5)
h5save(rapply(model2, unclass, how = "replace"), file = "model.h5")
bin_h5 <- readBin("model.h5", "raw", 1e6)
length(bin_h5) # 88291 not very efficient in this case
The feather
package let's you save data frames in a format readable by both R and Python. To use this, you would first have to convert the model object into data frames, as described in the broom section earlier in the answer.
library(feather)
library(broom)
write_feather(augment(model), "co2_row.feather") # 5474 bytes
write_feather(tidy(model), "co2_coeff.feather") # 2093 bytes
write_feather(glance(model), "co2_model.feather") # 562 bytes
Another alternative is to save a text version of the variable (see previous section) to a zipped file and store its bytes in the database.
writeLines(model_as_json)
tar("model.tar.bz", "model.txt", compression = "bzip2")
bin_bzip <- readBin("model.tar.bz", "raw", 1e6)
length(bin_bzip) # only 42 bytes!
My data only needs to be accessible by R, and needs to be human-readable
There are two options for turning a variable into a string: serialize
and deparse
.
p <- function(x)
{
paste0(x, collapse = "\n")
}
serialize
needs to be sent to a text connection, and rather than writing to file, you can write to the console and capture it.
model_serialized <- p(capture.output(serialize(model, stdout())))
nchar(model_serialized) # 23830
Use deparse
with control = "all"
to maximise the reversibility when re-parsing later.
model_deparsed <- p(deparse(model, control = "all"))
nchar(model_deparsed) # 22036
My data only needs to be accessible by R, and doesn't need to be human-readable
The same sorts of techniques shown in the previous sections can be applied here. You can zip a serialized or deparsed variable and re-read it as a raw vector.
serialize
can also write variables in a binary format. In this case, it is most easily used with its wrapper saveRDS
.
saveRDS(model, "model.rds")
bin_rds <- readBin("model.rds", "raw", 1e6)
length(bin_rds) # 6350