read.sas7bdat unable to read compressed file

According to the sas7bdat vignette (vignette('sas7bdat')), COMPRESS=BINARY was not supported as of 2013, and that was still the current vignette on 6/16/2014 when I wrote this. COMPRESS=CHAR is supported. (In SAS, COMPRESS=YES is a synonym for COMPRESS=CHAR.)

These are SAS's internal compression routines, intended to make files smaller. They compress far less effectively than gzip or similar tools, but SAS applies them transparently, so SAS programs read and write compressed datasets with no extra code. They do change the file format significantly, which is why the package hadn't implemented them yet.
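If you're not sure which kind of compression your file uses, the following sketch looks for the signature strings that open-source SAS7BDAT readers (pandas, for instance) check in the file's metadata: SASYZCRL for COMPRESS=CHAR, which is run-length encoding (RLE), and SASYZCR2 for COMPRESS=BINARY, which is Ross Data Compression (RDC). It scans raw bytes instead of parsing the format, so treat it as a heuristic. The function name detect_sas_compression is hypothetical, and the signature values are an assumption taken from those readers.

```python
import sys

# Signature strings that open-source SAS7BDAT readers (e.g. pandas) look for in
# the column-text subheader. Treat these values as an assumption, not a spec.
SIGNATURES = {
    b"SASYZCRL": "CHAR (RLE)",    # COMPRESS=CHAR / COMPRESS=YES
    b"SASYZCR2": "BINARY (RDC)",  # COMPRESS=BINARY
}


def detect_sas_compression(path, chunk_size=1 << 20):
    """Hypothetical helper: return the compression label, or None if no
    signature is found. This is a heuristic byte scan, not a format parser."""
    overlap = max(len(sig) for sig in SIGNATURES) - 1
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return None  # reached end of file without a match
            # Prepend the end of the previous read so a signature split
            # across two chunks is still found.
            window = tail + chunk
            for sig, label in SIGNATURES.items():
                if sig in window:
                    return label
            tail = window[-overlap:]


if __name__ == "__main__":
    # Usage: python detect_compression.py file1.sas7bdat [file2.sas7bdat ...]
    for name in sys.argv[1:]:
        print(name, detect_sas_compression(name) or "no compression signature found")
```

For an uncompressed file the loop reads to the end before returning None, so it can take a moment on very large files.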

If you have SAS, write the data out to an uncompressed dataset:

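options compress=no;                 /* datasets created from here on are not compressed */
libname lib '//drive/path/to/files'; /* folder that holds the .sas7bdat files */
data lib.want;                       /* new, uncompressed copy */
set lib.have;                        /* existing compressed dataset */
run;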

That's the simplest way (of many). It assumes you have a libref named lib, defined as above, and that you change have and want to the correct names: have is usually the file name without its .sas7bdat extension, and want can be any valid SAS name (letters, digits, and underscores only, not starting with a digit, 32 characters or fewer).
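To check a candidate name for want before running the step, here is a small Python helper (is_valid_sas_name is a hypothetical name) that encodes the default SAS naming rules for datasets, VALIDMEMNAME=COMPAT: 1 to 32 characters, letters, digits, and underscores only, with a letter or underscore first. SAS dataset names are not case-sensitive, so WANT and want refer to the same dataset.

```python
import re

# Default SAS naming rules for dataset names (VALIDMEMNAME=COMPAT):
# first character a letter or underscore, then letters, digits or
# underscores, 32 characters at most.
_SAS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_sas_name(name: str) -> bool:
    """Hypothetical helper: True if `name` can be used as lib.<name>."""
    return _SAS_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["want", "have_2014", "2014_have", "my-data", "x" * 33]:
        print(f"{candidate!r}: {is_valid_sas_name(candidate)}")
```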

If you don't have SAS, you'll have to ask your data provider to make the data available uncompressed or in a different format. If you're getting this from a PUDS somewhere on the web, post where you're getting it from; there may be a way to help you identify an uncompressed source.


This admittedly isn't a pure R solution, but in many situations (e.g., if you aren't on a PC and can't rewrite the SAS file yourself) the other posted solutions aren't workable.

Fortunately, Python has a module (https://pypi.python.org/pypi/sas7bdat) that supports reading compressed SAS datasets; using it is certainly easier than acquiring SAS if you don't already have it. Once you've read the file and saved it as text with Python, you can load it in R, for example with read.delim("myfile.txt"), since the script writes tab-separated text.

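from sas7bdat import SAS7BDAT
import pandas as pd  # not used directly, but to_data_frame() needs pandas installed

in_file_name = "myfile.sas7bdat"
out_file_name = "myfile.txt"

# Read the (possibly compressed) SAS dataset into a pandas DataFrame
with SAS7BDAT(in_file_name) as f:
    df = f.to_data_frame()

# Write it out as tab-separated text that R can read
df.to_csv(path_or_buf=out_file_name, sep="\t", encoding="utf-8", index=False)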

"RevoScaleR" is a good package for reading SAS datasets, compressed or uncompressed. You can use its rxImport() function, as in the example below. (RevoScaleR ships with Microsoft R Client and Machine Learning Server rather than CRAN.)

Load the package:

library(RevoScaleR)

Read the data:

R_df_name <- rxImport("fake_path/file_name.sas7bdat")

In my experience, rxImport() is far faster than haven, sas7bdat, or sas7bdat.parso. I hope this helps anyone who struggles to read SAS datasets in R.

Cheers!
