Loading .RData files into Python

People ask this sort of thing on the R-help and R-dev list and the usual answer is that the code is the documentation for the .RData file format. So any other implementation in any other language is hard++.

I think the only reasonable way is to install RPy2 and use R's load function from that, converting to appropriate python objects as you go. The .RData file can contain structured objects as well as plain tables so watch out.

Linky: http://rpy.sourceforge.net/rpy2/doc-2.4/html/

Quicky:

>>> import rpy2.robjects as robjects
>>> robjects.r['load'](".RData")

objects are now loaded into the R workspace.

>>> robjects.r['y']
<FloatVector - Python:0x24c6560 / R:0xf1f0e0>
[0.763684, 0.086314, 0.617097, ..., 0.443631, 0.281865, 0.839317]

That's a simple scalar, d is a data frame, I can subset to get columns:

>>> robjects.r['d'][0]
<IntVector - Python:0x24c9248 / R:0xbbc6c0>
[       1,        2,        3, ...,        8,        9,       10]
>>> robjects.r['d'][1]
<FloatVector - Python:0x24c93b0 / R:0xf1f230>
[0.975648, 0.597036, 0.254840, ..., 0.891975, 0.824879, 0.870136]

As an alternative for those who would prefer not having to install R in order to accomplish this task (r2py requires it), there is a new package "pyreadr" which allows reading RData and Rds files directly into python without dependencies.

It is a wrapper around the C library librdata, so it is very fast.

You can install it easily with pip:

pip install pyreadr

As an example you would do:

import pyreadr

result = pyreadr.read_r('/path/to/file.RData') # also works for Rds

# done! let's see what we got
# result is a dictionary where keys are the name of objects and the values python
# objects
print(result.keys()) # let's check what objects we got
df1 = result["df1"] # extract the pandas data frame for object df1

The repo is here: https://github.com/ofajardo/pyreadr

Disclaimer: I am the developer of this package.


Jupyter Notebook Users

If you are using Jupyter notebook, you need to do 2 steps:

Step 1: go to http://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2 and download Python interface to the R language (embedded R) in my case I will use rpy2-2.8.6-cp36-cp36m-win_amd64.whl

Put this file in the same working directory you are currently in.

Step 2: Go to your Jupyter notebook and write the following commands

# This is to install rpy2 library in Anaconda
!pip install rpy2-2.8.6-cp36-cp36m-win_amd64.whl

and then

# This is important if you will be using rpy2
import os
os.environ['R_USER'] = 'D:\Anaconda3\Lib\site-packages\rpy2'

and then

import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()

This should allow you to use R functions in python. Now you have to import the readRDS as follow

readRDS = robjects.r['readRDS']
df = readRDS('Data1.rds')
df = pandas2ri.ri2py(df)
df.head()

Congratulations! now you have the Dataframe you wanted

However, I advise you to save it in pickle file for later time usage in python as

 df.to_pickle('Data1') 

So next time you may simply use it by

df1=pd.read_pickle('Data1')

Tags:

Python

R

Rdata