rpy2 / R interface

Warning

Up to pandas 0.19, a pandas.rpy module existed with functionality to convert between pandas and rpy2 objects. This functionality now lives in the rpy2 project itself. See the updating section of the previous documentation for a guide to port your code from the removed pandas.rpy to rpy2 functions.

rpy2 is an interface to R running embedded in a Python process, and it includes functionality for working with pandas DataFrames. Converting data frames back and forth between rpy2 and pandas should be largely automatic: most rpy2 functions convert on the fly, so explicit conversion is rarely needed. To convert explicitly, use pandas2ri.py2ri() (pandas to R) and pandas2ri.ri2py() (R to pandas).
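The explicit conversion functions named above have gone by different names across rpy2 releases. The following is a minimal sketch, not part of rpy2 or pandas, of a helper that looks up whichever pair is available. It assumes that older releases expose py2ri/ri2py (the names used on this page) and that rpy2 3.x exposes py2rpy/rpy2py instead; resolve_converters and the stand-in module are hypothetical names introduced for illustration.

```python
from types import SimpleNamespace


def resolve_converters(module):
    """Return (to_r, to_pandas) callables from a pandas2ri-like module.

    Assumption: the module exposes either py2ri/ri2py (older rpy2)
    or py2rpy/rpy2py (rpy2 3.x).
    """
    def first_available(*names):
        # Return the first attribute that exists and is callable.
        for name in names:
            func = getattr(module, name, None)
            if callable(func):
                return func
        raise AttributeError(
            "none of %s found on %r" % (", ".join(names), module))

    to_r = first_available("py2ri", "py2rpy")
    to_pandas = first_available("ri2py", "rpy2py")
    return to_r, to_pandas


# Stand-in for the real pandas2ri module, so the sketch runs without R.
fake_pandas2ri = SimpleNamespace(
    py2rpy=lambda obj: ("to R", obj),
    rpy2py=lambda obj: ("to pandas", obj),
)
to_r, to_pandas = resolve_converters(fake_pandas2ri)
```

With rpy2 installed, the intended use would be resolve_converters(pandas2ri) on the real module; the stand-in only keeps the sketch runnable without R.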

See also the documentation of the rpy2 project: https://rpy2.readthedocs.io.

In the remainder of this page, a few examples of explicit conversion are given. The pandas conversion of rpy2 first needs to be activated:

In [1]: from rpy2.robjects import r, pandas2ri

In [2]: pandas2ri.activate()

Transferring R data sets into Python

Once the pandas conversion is activated (pandas2ri.activate()), many conversions of R to pandas objects will be done automatically. For example, to obtain the ‘iris’ dataset as a pandas DataFrame:

In [3]: r.data('iris')

In [4]: r['iris'].head()

If the pandas conversion has not been activated, the same result can be obtained by converting explicitly with the pandas2ri.ri2py function (pandas2ri.ri2py(r['iris'])).

Converting DataFrames into R objects

The pandas2ri.py2ri function supports the reverse operation, converting DataFrames into the equivalent R object (that is, a data.frame):

In [5]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C':[7,8,9]},
   ...:                   index=["one", "two", "three"])
   ...: 

In [6]: r_dataframe = pandas2ri.py2ri(df)

In [7]: print(type(r_dataframe))

In [8]: print(r_dataframe)

The DataFrame’s index is stored as the rownames attribute of the data.frame instance.
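Because the index travels as row names rather than as a column, R code that only looks at columns will not see it. One possible workaround, sketched here on the pandas side only, is to move the index into an ordinary column before converting. The column name "label" is a hypothetical choice, and the sketch does not call rpy2 at all.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  index=["one", "two", "three"])

# Name the index, then turn it into a regular column.
# "label" is an illustrative name, not something rpy2 requires.
flat = df.rename_axis("label").reset_index()

print(flat.columns.tolist())
```

The resulting frame has a default integer index, so after conversion the original labels would be available to R as a regular column as well.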
