write a gzip file from data frame

writeLines expects a character vector, not a data frame. The simplest way to write a data frame to a gzip file is:

df1 <- data.frame(id = seq(1,10,1), var1 = runif(10), var2 = runif(10))
gz1 <- gzfile("df1.gz", "w")
write.csv(df1, gz1)
close(gz1)

This writes df1 as a gzipped CSV. See also write.table and write.csv2 for other ways of writing the file out.
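To check the result, the file can be read straight back in. This is a minimal sketch using base R only; the temporary path and row.names = FALSE are my own choices for the illustration, not part of the answer above.

```r
# Round trip: write a data frame as gzipped CSV, then read it back.
df1 <- data.frame(id = seq(1, 10, 1), var1 = runif(10), var2 = runif(10))

# Assumption: a temp path, so the example does not touch the working directory.
path <- file.path(tempdir(), "df1.csv.gz")

gz1 <- gzfile(path, "w")
write.csv(df1, gz1, row.names = FALSE)  # row.names = FALSE drops the extra index column
close(gz1)

# read.csv() opens the path with file(), which detects gzip compression when reading.
df2 <- read.csv(path)

# Values survive the round trip up to the precision write.csv() prints.
all.equal(df1$var1, df2$var1)
```

Note that id comes back as an integer column, because read.csv guesses column types from the text.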

EDIT: Based on the updates to the question about the desired format, I wrote the following helper (quickly thrown together; it can probably be simplified a lot):

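# Serialize each row of a data frame as "rowIndex | name : value ..."
myser <- function(df) {
    rowCount <- nrow(df)
    dfNames <- names(df)
    dfNamesIndex <- length(dfNames)
    sapply(seq_len(rowCount), function(rowIndex) {
        paste(rowIndex, '|',
            # sapply returns a 3-row matrix (name, ':', value); paste flattens it column by column
            paste(sapply(seq_len(dfNamesIndex), function(element) {
                c(dfNames[element], ':', df[rowIndex, element])
            }), collapse=' ')
        )
    })
}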

So the output looks like

a <- data.frame(x=1:10,y=rnorm(10))
writeLines(myser(a))
# 1 | x : 1 y : -0.231340933021948
# 2 | x : 2 y : 0.896777389870928
# 3 | x : 3 y : -0.434875004781075
# 4 | x : 4 y : -0.0269824962632977
# 5 | x : 5 y : 0.67654540494899
# 6 | x : 6 y : -1.96965253674725
# 7 | x : 7 y : 0.0863177759402661
# 8 | x : 8 y : -0.130116466571162
# 9 | x : 9 y : 0.418337557610229
# 10 | x : 10 y : -1.22890714891874

All that remains is to pass the gzfile connection as the second argument to writeLines, e.g. writeLines(myser(a), gz1), to get the desired output.
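Put together, that last step might look like the sketch below. format_rows is a hypothetical compact stand-in for the helper above (same output format, but not the original code), and the temporary path and small data frame are assumptions made for the example.

```r
# Hypothetical stand-in for the helper above:
# one "rowIndex | name : value ..." string per row.
format_rows <- function(df) {
  vapply(seq_len(nrow(df)), function(i) {
    values <- unlist(lapply(df, function(col) as.character(col[i])))
    fields <- paste(names(df), ":", values)
    paste(i, "|", paste(fields, collapse = " "))
  }, character(1))
}

a <- data.frame(x = 1:3, y = c(0.5, -1.25, 2))

# Assumption: a temp path, so the example does not touch the working directory.
path <- file.path(tempdir(), "a.txt.gz")

# Write the formatted lines through a gzfile connection.
con <- gzfile(path, "w")
writeLines(format_rows(a), con)
close(con)

# Read them back through a gzfile connection to confirm the contents.
con <- gzfile(path, "r")
lines <- readLines(con)
close(con)
print(lines)
```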


To write something to a gzip file you need to "serialize" it to text. For R objects you can have a stab at that by using dput:

gz1 = gzfile("df1.gz","w")
dput(df1, gz1)
close(gz1)

However, this only writes a text representation of the data frame to the file. That will quite probably be less efficient than using save(df1,file="df1.RData") to save it to a native R data file. Ask yourself: why am I saving it as a .gz file?

In a quick test with some random numbers, the gz file was 54k and the .RData file was 34k.
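A comparison along those lines can be reproduced with something like the following sketch. The seed, the data frame size and the temporary paths are assumptions for illustration, so the sizes printed will not match the figures quoted above.

```r
# Assumption: arbitrary seed and size, only to make the sketch repeatable.
set.seed(1)
df1 <- data.frame(id = 1:1000, var1 = runif(1000), var2 = runif(1000))

gz_path    <- file.path(tempdir(), "df1.gz")
rdata_path <- file.path(tempdir(), "df1.RData")

# Text representation via dput(), gzip-compressed.
gz1 <- gzfile(gz_path, "w")
dput(df1, gz1)
close(gz1)

# Native R format (save() compresses by default).
save(df1, file = rdata_path)

# File sizes in bytes, for a side-by-side look.
sizes <- file.size(c(gz_path, rdata_path))
names(sizes) <- c("dput + gzip", "save")
print(sizes)
```

The .RData file can be restored with load(rdata_path), which brings df1 back under its original name.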


You can use the gzip function in R.utils:

library(R.utils)
library(data.table)

#Write gzip file
df <- data.table(var1='Compress me',var2=', please!')
fwrite(df,'filename.csv',sep=',')
gzip('filename.csv',destname='filename.csv.gz')

#Read gzip file
fread(cmd='gzip -dc filename.csv.gz')
          var1      var2
1: Compress me , please!

Another very simple way to do it is:

# We create the .csv file
write.csv(df1, "df1.csv")

# We compress it; gzip replaces df1.csv with df1.csv.gz
system("gzip df1.csv")

Got the idea from: http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html