Cannot save and load an xml_document generated from rvest in R

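I have found a workaround. It is not very efficient, but it does the job.

The idea is to save the xml_document as a character string and parse it again with read_html after loading. This is needed because an xml_document only holds external pointers to data in memory, and those pointers are no longer valid after save() and load().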

library(rvest)
library(magrittr)
doc <- read_html("http://www.example.com/")

# convert the document to a character string
doc <- as.character(doc)

# save the string, then remove it from the workspace
save(doc, file = "example.RData")
rm(doc)

# load the string and parse it back into an xml_document
load(file = "example.RData")
doc %>% read_html() %>% html_node("h1") %>% html_text()
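
A related option, not taken from the answer above, is to write the markup straight to an HTML file with xml2::write_html and parse that file later. The sketch below assumes the xml2 and rvest packages are installed; the inline HTML string and the temporary file path are placeholders so that the example runs without network access.

```r
library(xml2)
library(rvest)

# Placeholder document parsed from a literal string (no network needed)
doc <- read_html("<html><body><h1>Hello</h1></body></html>")

# Write the markup to a file instead of saving the R object itself
path <- tempfile(fileext = ".html")
write_html(doc, path)
rm(doc)

# Later: parse the file again to get a fresh, valid xml_document
doc <- read_html(path)
heading <- html_text(html_node(doc, "h1"))
```

This avoids the intermediate character string in the workspace, at the cost of one file per document.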

I wrote some ad hoc functions for this task. They improve slightly on the previous answer: they also work for lists of rvest objects, and they use RDS instead of RData files, so the loaded object can be assigned to any name.

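library(magrittr) # %<>%
library(purrr)    # map
library(readr)    # write_rds, read_rds
library(xml2)     # read_html

write_rvest = function(x, path, ...) {
  #convert to character
  #is list? (a single document is itself a list internally, so rule that out)
  if (is.list(x) && !inherits(x, "xml_node")) {
    x %<>% map(as.character)
  } else {
    x %<>% as.character
  }

  #save
  write_rds(x, path, ...)
}

read_rvest = function(path) {
  #load from file
  x = read_rds(path)

  #parse the stored strings back into documents
  if (is.list(x)) {
    x %<>% map(read_html)
  } else {
    x %<>% read_html
  }

  x
}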
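
As a usage illustration, the same round trip for a list of documents can be sketched with base R only. This is a simplified stand-in for write_rvest() and read_rvest(), not a copy of them: saveRDS/readRDS and lapply replace write_rds/read_rds and map, and the inline HTML strings and the tempfile path are placeholders.

```r
library(xml2)
library(rvest)

# Placeholder documents parsed from literal strings (no network needed)
docs <- list(
  a = read_html("<html><body><h1>First</h1></body></html>"),
  b = read_html("<html><body><h1>Second</h1></body></html>")
)

path <- tempfile(fileext = ".rds")

# Convert each document to a string, then store the list of strings
saveRDS(lapply(docs, as.character), path)

# Reading back: parse each stored string into a fresh xml_document
restored <- lapply(readRDS(path), read_html)

# Check that the restored documents are usable
titles <- vapply(
  restored,
  function(d) html_text(html_node(d, "h1")),
  character(1)
)
```

Because readRDS returns the object rather than restoring it under its saved name, the result can be assigned to whatever name is convenient.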

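Tests for equality pass, but tests for identity fail, even though the objects work and have the same size in bytes. The likely reason is that each parsed document holds external pointers to its own data in memory, and identical() compares such pointers by address, so a re-parsed document is never identical to the original.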
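
The difference between equality and identity can be checked with a small sketch like the following. It assumes xml2 and rvest are installed, and the inline HTML string is a placeholder; comparing the extracted heading text is just one possible notion of "equal content".

```r
library(xml2)
library(rvest)

original <- read_html("<html><body><h1>Hello</h1></body></html>")

# Round trip through a character string, as in the workaround
restored <- read_html(as.character(original))

# Same content: the extracted heading text matches
same_text <- html_text(html_node(original, "h1")) ==
  html_text(html_node(restored, "h1"))

# Not the same object: each document wraps its own external pointers
same_object <- identical(original, restored)
```

Here same_text should be TRUE while same_object should be FALSE, since the two documents live at different addresses.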

Tags: Xml, R, Rvest