How do Rpy2, pyrserve and PypeR compare?

From a developer's perspective: we used to use rpy/rpy2 to provide statistical and plotting functions in our Python-based application. This caused huge problems when delivering the application, because rpy/rpy2 must be compiled for specific combinations of Python and R, which made it infeasible for us to provide binary distributions that work out of the box unless we bundled R as well. Because rpy/rpy2 are also not particularly easy to install, we ended up replacing the relevant parts with native Python modules such as matplotlib. Had we needed to keep R, we would have switched to pyrserve, because we could start an R server locally and connect to it without worrying about the version of R.


In PypeR, I can't pass a large matrix from Python to the R instance with assign(). I don't have this issue with rpy2, though. This is just my experience.


From the paper in the Journal of Statistical Software on PypeR:

RPy presents a simple and efficient way of accessing R from Python. It is robust and very convenient for frequent interaction operations between Python and R. This package allows Python programs to pass Python objects of basic data types to R functions and return the results in Python objects. Such features make it an attractive solution for the cases in which Python and R interact frequently. However, there are still limitations of this package as listed below.
Performance:
RPy may not behave very well for large-size data sets or for computation-intensive duties. A lot of time and memory are inevitably consumed in producing the Python copy of the R data because in every round of a conversation RPy converts the returned value of an R expression into a Python object of basic types or NumPy array. RPy2, a recently developed branch of RPy, uses Python objects to refer to R objects instead of copying them back into Python objects. This strategy avoids frequent data conversions and improves speed. However, memory consumption remains a problem. [...] When we were implementing WebArray (Xia et al. 2005), an online platform for microarray data analysis, a job consumed roughly one quarter more computational time if running R through RPy instead of through R's command-line user interface. Therefore, we decided to run R in Python through pipes in subsequent developments, e.g., WebArrayDB (Xia et al. 2009), which retained the same performance as achieved when running R independently. We do not know the exact reason for such a difference in performance, but we noticed that RPy directly uses the shared library of R to run R scripts. In contrast, running R through pipes means running the R interpreter directly.
Memory:
R has been denounced for its uneconomical use of memory. The memory used by large-size R objects is rarely released after these objects are deleted. Sometimes the only way to release memory from R is to quit R. RPy module wraps R in a Python object. However, the R library will stay in memory even if the Python object is deleted. In other words, memory used by R cannot be released until the host Python script is terminated.
Portability:
As a module with extensions written in C, the RPy source package has to be compiled with a specific R version on POSIX (Portable Operating System Interface for Unix) systems, and the R must be compiled with the shared library enabled. Also, the binary distributions for Windows are bound to specific combinations of different versions of Python/R, so it is quite frequent that a user has difficulty in finding a distribution that fits the user's software environment.


I know one of the 3 better than the others, but in the order given in the question:

rpy2:

  • C-level interface between Python and R (R running as an embedded process)
  • R objects exposed to Python without the need to copy the data over
  • Conversely, Python's numpy arrays can be exposed to R without making a copy
  • Low-level interface (close to the R C-API) and high-level interface (for convenience)
  • In-place modification for vectors and arrays possible
  • R callback functions can be implemented in Python
  • Possible to have anonymous R objects with a Python label
  • Python pickling possible
  • Full customization of R's behavior with its console (so possible to implement a full R GUI)
  • Limited support on MS Windows
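
To make a few of the points above concrete, here is a minimal sketch using rpy2's high-level interface. It assumes rpy2 and a matching R are installed; the function name replace_first_and_mean is made up for illustration. The data is held in an R vector, modified in place from Python, and then passed to R's mean without going through a string of R code.

```python
def replace_first_and_mean(values, new_first):
    """Build an R numeric vector, change its first element in place,
    and return mean(vector) as a Python float.

    Hypothetical helper for illustration; requires rpy2 and R.
    """
    # Imported lazily so the module still loads where rpy2 is not installed.
    import rpy2.robjects as ro

    vec = ro.FloatVector(values)  # the data now lives in an R vector
    vec[0] = new_first            # in-place modification (Python indexing is 0-based)
    r_mean = ro.r["mean"]         # look up R's mean() function
    return r_mean(vec)[0]         # R returns a length-1 vector; take its element
```

Calling replace_first_and_mean([1.0, 2.0, 3.0], 4.0) should give the mean of 4, 2 and 3.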

pyrserve:

  • native Python code (will/should/may work with CPython, Jython, IronPython)
  • uses R's Rserve
  • advantages and inconveniences linked to remote computation and to RServe
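
As a rough sketch of what the client/server setup looks like, assuming the pyRserve package is installed and an Rserve instance is already listening (for example started with `R CMD Rserve`) on the default port 6311. The function name mean_via_rserve is made up for illustration.

```python
def mean_via_rserve(values, host="localhost", port=6311):
    """Send a numeric vector to a running Rserve and return mean(x).

    Hypothetical helper for illustration; requires pyRserve, numpy and
    an Rserve process reachable at host:port.
    """
    # Imported lazily so the module still loads where pyRserve is missing.
    import numpy
    import pyRserve

    conn = pyRserve.connect(host=host, port=port)
    try:
        # Assign the data to the variable x in the remote R session ...
        conn.r.x = numpy.array(values, dtype=float)
        # ... and evaluate an R expression there, fetching the result.
        return conn.eval("mean(x)")
    finally:
        conn.close()
```

Since Python only talks to R over a socket, the Python side does not need to be compiled against a particular R version, which is the deployment advantage mentioned in the first answer. The cost is that every object has to be sent over the connection.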

pyper:

  • native Python code (will/should/may work with CPython, Jython, IronPython)
  • use of pipes to have Python communicate with R (with the advantages and inconveniences linked to it)
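
The pipe approach can be sketched with nothing but the standard library. This is not PypeR's actual code, only an illustration of the technique: the script is written to the child's stdin and the output is read back from its stdout. The helper name run_through_pipe and the R_COMMAND invocation are assumptions (R on the PATH, started with `--vanilla --slave` so it reads commands from stdin without echoing them). Unlike PypeR, which keeps one R process alive between calls, this sketch starts a fresh process per call.

```python
import subprocess


def run_through_pipe(command, script, timeout=60):
    """Start `command`, feed it `script` on stdin, return its stdout as text.

    Hypothetical helper for illustration. Raises CalledProcessError if
    the child process exits with a non-zero status.
    """
    completed = subprocess.run(
        command,
        input=script,          # sent to the child through a pipe
        capture_output=True,   # stdout/stderr come back through pipes
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout.strip()


# Assumed way to start R so that it reads commands from stdin.
R_COMMAND = ["R", "--vanilla", "--slave"]

# Example (needs R installed):
#   run_through_pipe(R_COMMAND, "cat(mean(c(1, 2, 3)))")
```

Everything crosses the pipe as text, so there is no dependency on R's shared library and R's memory is released when the child process exits. On the other hand, all data has to be serialised to text and parsed back.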

edit: Windows support for rpy2