Trellis plotting interface

We import the rplot API:

In [1482]: import pandas.tools.rplot as rplot

Examples

RPlot is a flexible API for producing Trellis plots. These plots arrange data in a rectangular grid according to the values of chosen attributes, with one plot drawn in each grid cell.

In [1483]: plt.figure()
Out[1483]: <matplotlib.figure.Figure at 0x9ea7450>

In [1484]: plot = rplot.RPlot(tips_data, x='totbill', y='tip')

In [1485]: plot.add(rplot.TrellisGrid(['sex', 'smoker']))

In [1486]: plot.add(rplot.GeomHistogram())

In [1487]: plot.render(plt.gcf())
Out[1487]: <matplotlib.figure.Figure at 0x9ea7450>
[Figure: _images/rplot1_tips.png]

In the example above, data from the tips data set is arranged by the attributes ‘sex’ and ‘smoker’. Since each of those attributes takes one of two values, the resulting grid has two rows and two columns, and a histogram is drawn in each cell.
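To make the grid layout concrete, the plain-Python sketch below partitions a handful of made-up rows by ‘sex’ and ‘smoker’, with the first attribute selecting the grid row and the second the grid column. The helper `trellis_cells` and the sample rows are hypothetical; this illustrates the idea only and is not rplot's implementation.

```python
from collections import defaultdict


def trellis_cells(records, row_attr, col_attr):
    """Group records into cells keyed by (row value, column value).

    Hypothetical helper illustrating the idea behind TrellisGrid; not rplot code.
    """
    row_values = sorted({r[row_attr] for r in records})
    col_values = sorted({r[col_attr] for r in records})
    cells = defaultdict(list)
    for r in records:
        cells[(r[row_attr], r[col_attr])].append(r)
    return row_values, col_values, dict(cells)


# Made-up rows shaped like the tips data (not real observations).
tips = [
    {"totbill": 10.0, "tip": 1.0, "sex": "Female", "smoker": "No"},
    {"totbill": 20.0, "tip": 2.0, "sex": "Male", "smoker": "No"},
    {"totbill": 30.0, "tip": 3.0, "sex": "Male", "smoker": "Yes"},
    {"totbill": 40.0, "tip": 4.0, "sex": "Female", "smoker": "Yes"},
    {"totbill": 50.0, "tip": 5.0, "sex": "Female", "smoker": "No"},
]

rows, cols, cells = trellis_cells(tips, "sex", "smoker")
print(rows, cols)  # ['Female', 'Male'] ['No', 'Yes']
print(len(cells))  # 4
```

Each cell's list of rows is what a geom such as GeomHistogram would then draw into that cell.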

In [1488]: plt.figure()
Out[1488]: <matplotlib.figure.Figure at 0x9ea7150>

In [1489]: plot = rplot.RPlot(tips_data, x='totbill', y='tip')

In [1490]: plot.add(rplot.TrellisGrid(['sex', 'smoker']))

In [1491]: plot.add(rplot.GeomDensity())

In [1492]: plot.render(plt.gcf())
Out[1492]: <matplotlib.figure.Figure at 0x9ea7150>
[Figure: _images/rplot2_tips.png]

The example above is the same as the previous one, except that each cell shows a kernel density estimate instead of a histogram. Only the geom changes, which shows how easily different kinds of plots can be drawn on the same Trellis structure.
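For reference, a kernel density estimate places a smooth kernel on every observation and averages them. The sketch below is a plain-Python Gaussian KDE with a fixed, hand-picked bandwidth. It illustrates the technique only; the kernel and bandwidth selection that GeomDensity actually uses are not described here, and the sample values are made up.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x.

    Each sample contributes a Gaussian bump with standard deviation
    `bandwidth`; the bumps are averaged so the estimate integrates to 1.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Made-up total-bill values for one grid cell.
bills = [10.0, 12.5, 15.0, 18.0, 25.0]
f = gaussian_kde(bills, bandwidth=2.0)

# Crude check: a Riemann sum over a wide interval should be close to 1.
step = 0.01
n_steps = 8000  # covers [-20, 60)
area = sum(f(-20 + i * step) * step for i in range(n_steps))
print(round(area, 3))
```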

In [1493]: plt.figure()
Out[1493]: <matplotlib.figure.Figure at 0x10e164d0>

In [1494]: plot = rplot.RPlot(tips_data, x='totbill', y='tip')

In [1495]: plot.add(rplot.TrellisGrid(['sex', 'smoker']))

In [1496]: plot.add(rplot.GeomScatter())

In [1497]: plot.add(rplot.GeomPolyFit(degree=2))

In [1498]: plot.render(plt.gcf())
Out[1498]: <matplotlib.figure.Figure at 0x10e164d0>
[Figure: _images/rplot3_tips.png]

The plot above shows that two or more plots of the same data can be drawn in the same Trellis grid cell: here, a scatter plot with a degree-2 polynomial fit overlaid.
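GeomPolyFit(degree=2) overlays a fitted polynomial on the scatter points in each cell. Assuming the fit is an ordinary least-squares fit (the usual choice, though the text above does not say so), the sketch below computes the coefficients by solving the normal equations with Gaussian elimination. The function `least_squares_polyfit` is hypothetical and is neither rplot nor NumPy code.

```python
def least_squares_polyfit(xs, ys, degree):
    """Return [c0, c1, ..., cd] minimizing sum((y - sum(c_k * x**k))**2)."""
    m = degree + 1
    # Normal equations A c = b, with A[j][k] = sum x^(j+k) and b[j] = sum y * x^j.
    a = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]

    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for k in range(col, m):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = b[r] - sum(a[r][k] * coeffs[k] for k in range(r + 1, m))
        coeffs[r] = s / a[r][r]
    return coeffs


# Points lying exactly on y = 1 + 0.5x + 0.1x^2 should be recovered.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1 + 0.5 * x + 0.1 * x ** 2 for x in xs]
c0, c1, c2 = least_squares_polyfit(xs, ys, degree=2)
```

In a Trellis plot, a fit like this would be computed separately from the rows in each cell and drawn on top of that cell's scatter plot.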

In [1499]: plt.figure()
Out[1499]: <matplotlib.figure.Figure at 0x10c77810>

In [1500]: plot = rplot.RPlot(tips_data, x='totbill', y='tip')

In [1501]: plot.add(rplot.TrellisGrid(['sex', 'smoker']))

In [1502]: plot.add(rplot.GeomScatter())

In [1503]: plot.add(rplot.GeomDensity2D())

In [1504]: plot.render(plt.gcf())
Out[1504]: <matplotlib.figure.Figure at 0x10c77810>
[Figure: _images/rplot4_tips.png]

The plot above is similar, but with a 2D kernel density estimate superimposed on the scatter plot.
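A 2D kernel density estimate extends the same idea to pairs of values and is typically drawn as contour lines, which are level sets of the estimated density. The sketch below uses a product of two Gaussian kernels with hand-picked bandwidths. It is an illustration of the technique under those assumptions, not GeomDensity2D's code, and the sample pairs are made up.

```python
import math


def gaussian_kde_2d(points, bw_x, bw_y):
    """Return a function estimating the density at (x, y).

    Each point contributes a product of two 1D Gaussian kernels, with
    standard deviations bw_x and bw_y; the contributions are averaged.
    """
    n = len(points)
    norm = 1.0 / (n * 2 * math.pi * bw_x * bw_y)

    def density(x, y):
        return norm * sum(
            math.exp(-0.5 * (((x - px) / bw_x) ** 2 + ((y - py) / bw_y) ** 2))
            for px, py in points
        )

    return density


# Made-up (totbill, tip) pairs for one grid cell.
pairs = [(10.0, 1.5), (12.0, 2.0), (20.0, 3.0), (22.0, 3.5)]
f2 = gaussian_kde_2d(pairs, bw_x=2.0, bw_y=0.5)

# Evaluate the density on a coarse grid; a contouring routine would
# draw lines of equal value through a table like this one.
grid = [[f2(x, y) for x in range(0, 31)] for y in (0.0, 1.0, 2.0, 3.0, 4.0)]
```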

In [1505]: plt.figure()
Out[1505]: <matplotlib.figure.Figure at 0x11a1a190>

In [1506]: plot = rplot.RPlot(tips_data, x='totbill', y='tip')

In [1507]: plot.add(rplot.TrellisGrid(['sex', '.']))

In [1508]: plot.add(rplot.GeomHistogram())

In [1509]: plot.render(plt.gcf())
Out[1509]: <matplotlib.figure.Figure at 0x11a1a190>
[Figure: _images/rplot5_tips.png]

It is also possible to group the data by a single attribute. The example above groups only by ‘sex’ and passes '.' in place of the second attribute, so the plots are arranged in a single column.

In [1510]: plt.figure()
Out[1510]: <matplotlib.figure.Figure at 0x11f7f450>

In [1511]: plot = rplot.RPlot(tips_data, x='totbill', y='tip')

In [1512]: plot.add(rplot.TrellisGrid(['.', 'smoker']))

In [1513]: plot.add(rplot.GeomHistogram())

In [1514]: plot.render(plt.gcf())
Out[1514]: <matplotlib.figure.Figure at 0x11f7f450>
[Figure: _images/rplot6_tips.png]

If '.' is passed in place of the first attribute instead, the plots are arranged in a single row.
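The grid dimensions for these three cases can be summarized with a small sketch. The function `grid_shape` below is hypothetical and not part of rplot. It counts the distinct values of each grouping attribute and treats '.' as a single row or column.

```python
def grid_shape(records, by):
    """Return (n_rows, n_cols) for a TrellisGrid-style spec such as ['sex', '.'].

    '.' means "no grouping along this axis", which yields a single row or column.
    Hypothetical helper; not part of rplot.
    """
    row_attr, col_attr = by
    n_rows = 1 if row_attr == "." else len({r[row_attr] for r in records})
    n_cols = 1 if col_attr == "." else len({r[col_attr] for r in records})
    return n_rows, n_cols


records = [
    {"sex": "Female", "smoker": "No"},
    {"sex": "Male", "smoker": "No"},
    {"sex": "Male", "smoker": "Yes"},
]
print(grid_shape(records, ["sex", "."]))       # (2, 1): a single column
print(grid_shape(records, [".", "smoker"]))    # (1, 2): a single row
print(grid_shape(records, ["sex", "smoker"]))  # (2, 2)
```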

In [1515]: plt.figure()
Out[1515]: <matplotlib.figure.Figure at 0x121ced90>

In [1519]: plot = rplot.RPlot(tips_data, x='tip', y='totbill')

In [1520]: plot.add(rplot.TrellisGrid(['sex', 'smoker']))

In [1521]: plot.add(rplot.GeomPoint(size=80.0, colour=rplot.ScaleRandomColour('day'), shape=rplot.ScaleShape('size'), alpha=1.0))

In [1522]: plot.render(plt.gcf())
Out[1522]: <matplotlib.figure.Figure at 0x121ced90>
[Figure: _images/rplot7_tips.png]

As shown above, scatter plots are also possible. Scatter plots let you map data attributes to graphical properties of the plot: in the example above, the colour and shape of each point are mapped to the ‘day’ and ‘size’ attributes, respectively. These mappings are specified with scale objects. The scale classes are listed below with their initialization arguments for quick reference.
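In the GeomPoint call above, some properties are constants (size=80.0, alpha=1.0) while others are scale objects (colour, shape). One way to picture this, sketched below under that assumption, is that a constant is used as-is for every point while a scale is evaluated against each row. The `ColumnLookup` class and `resolve` function are hypothetical stand-ins, not rplot's implementation.

```python
class ColumnLookup:
    """Toy scale: maps a row's value in `column` through a fixed dict."""

    def __init__(self, column, mapping):
        self.column = column
        self.mapping = mapping

    def __call__(self, row):
        return self.mapping[row[self.column]]


def resolve(prop, row):
    """Return a property value for one row: call scales, pass constants through."""
    return prop(row) if callable(prop) else prop


row = {"tip": 3.5, "totbill": 21.0, "day": "Sun", "size": 3}
size = 80.0
colour = ColumnLookup("day", {"Sun": "red", "Sat": "blue"})
print(resolve(size, row), resolve(colour, row))  # 80.0 red
```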

Scales

ScaleGradient(column, colour1, colour2)

Maps the value of an attribute (given by the parameter column) to the colour of a graphical object. The larger the value of the attribute, the closer the colour is to colour2; the smaller the value, the closer it is to colour1.
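A minimal sketch of this mapping, assuming colours are RGB triples with components in [0, 1] and that each value is first normalized to [0, 1] using the column's minimum and maximum. Neither detail is stated above, and `gradient_colour` is a hypothetical function.

```python
def gradient_colour(value, vmin, vmax, colour1, colour2):
    """Linearly interpolate from colour1 (at vmin) to colour2 (at vmax)."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    return tuple(c1 + t * (c2 - c1) for c1, c2 in zip(colour1, colour2))


tips = [1.0, 2.5, 4.0]
lo, hi = min(tips), max(tips)
blue, red = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
mapped = [gradient_colour(t, lo, hi, blue, red) for t in tips]
print(mapped)  # [(0.0, 0.0, 1.0), (0.5, 0.0, 0.5), (1.0, 0.0, 0.0)]
```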

ScaleGradient2(column, colour1, colour2, colour3)

The same as ScaleGradient, but interpolates linearly between three colours instead of two.
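A sketch of three-colour interpolation, assuming colour2 sits at the midpoint of the value range so that the lower half blends colour1 into colour2 and the upper half blends colour2 into colour3. The midpoint placement is an assumption, and `gradient2_colour` is hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between two colour tuples."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def gradient2_colour(value, vmin, vmax, colour1, colour2, colour3):
    """Piecewise-linear interpolation through three colours.

    Assumes colour2 is reached at the midpoint of [vmin, vmax].
    """
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    if t <= 0.5:
        return lerp(colour1, colour2, t * 2)
    return lerp(colour2, colour3, (t - 0.5) * 2)


blue, white, red = (0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)
print(gradient2_colour(0.0, 0.0, 10.0, blue, white, red))   # (0.0, 0.0, 1.0)
print(gradient2_colour(2.5, 0.0, 10.0, blue, white, red))   # (0.5, 0.5, 1.0)
print(gradient2_colour(5.0, 0.0, 10.0, blue, white, red))   # (1.0, 1.0, 1.0)
print(gradient2_colour(10.0, 0.0, 10.0, blue, white, red))  # (1.0, 0.0, 0.0)
```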

ScaleSize(column, min_size, max_size, transform)

Maps an attribute value to the size of the graphical object. The parameter min_size (default 5.0) is the minimum size of the graphical object, max_size (default 100.0) is the maximum size, and transform is a one-argument function applied to the attribute value before it is mapped (default lambda x: x).
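A sketch of this mapping, assuming the transformed values are rescaled linearly so that the smallest gets min_size and the largest gets max_size. The rescaling rule is an assumption, and `scale_size` is a hypothetical function that only mirrors the parameters listed above.

```python
import math


def scale_size(values, min_size=5.0, max_size=100.0, transform=lambda x: x):
    """Map each value to a marker size between min_size and max_size.

    Values are transformed first, then rescaled linearly so the smallest
    transformed value gets min_size and the largest gets max_size.
    """
    tv = [transform(v) for v in values]
    lo, hi = min(tv), max(tv)
    span = hi - lo
    if span == 0:
        return [min_size for _ in tv]
    return [min_size + (t - lo) * (max_size - min_size) / span for t in tv]


party_sizes = [1, 2, 4, 6]
print(scale_size(party_sizes))  # [5.0, 24.0, 62.0, 100.0]
log_sizes = scale_size(party_sizes, transform=math.log)
```

Passing a transform such as math.log compresses large values, so the difference between parties of 4 and 6 shrinks relative to the difference between 1 and 2.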

ScaleShape(column)

Maps an attribute value to the shape of the graphical object. The attribute must be categorical.
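A sketch of a categorical shape mapping, assuming each distinct value receives a marker symbol and symbols are reused cyclically when there are more categories than markers. The marker list uses standard matplotlib marker codes, but which markers rplot actually uses, and in what order, is not stated above; `shape_mapping` is hypothetical.

```python
from itertools import cycle


def shape_mapping(categories, markers=("o", "s", "^", "v", "D", "*")):
    """Assign a marker symbol to each distinct category, in sorted order.

    Markers are reused cyclically if there are more categories than symbols.
    """
    distinct = sorted(set(categories))
    return dict(zip(distinct, cycle(markers)))


sizes = [2, 3, 2, 4, 1, 2]
print(shape_mapping(sizes))  # {1: 'o', 2: 's', 3: '^', 4: 'v'}
```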

ScaleRandomColour(column)

Assigns a random colour to each value of the categorical attribute given by column.
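A sketch of the idea, assuming one random RGB colour is drawn per distinct value so that all rows sharing a value share a colour. The optional seed is an illustrative addition that makes the colours repeatable; it is not a documented ScaleRandomColour argument, and `random_colour_mapping` is hypothetical.

```python
import random


def random_colour_mapping(categories, seed=None):
    """Give each distinct category a random RGB colour (components in [0, 1)).

    Rows sharing a value share a colour; a fixed seed makes the choice repeatable.
    """
    rng = random.Random(seed)
    return {
        value: (rng.random(), rng.random(), rng.random())
        for value in sorted(set(categories))
    }


days = ["Sun", "Sat", "Sun", "Thur", "Fri"]
colours = random_colour_mapping(days, seed=0)
row_colours = [colours[d] for d in days]
```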