.. _10min_tut_07_reshape: {{ header }} .. ipython:: python import pandas as pd .. raw:: html
Data used for this tutorial:
How to reshape the layout of tables ----------------------------------- Sort table rows ~~~~~~~~~~~~~~~ .. raw:: html .. raw:: html .. raw:: html
To user guide More details about sorting of tables is provided in the user guide section on :ref:`sorting data `. .. raw:: html
Long to wide table format ~~~~~~~~~~~~~~~~~~~~~~~~~ Let’s use a small subset of the air quality data set. We focus on :math:`NO_2` data and only use the first two measurements of each location (i.e. the head of each group). The subset of data will be called ``no2_subset``. .. ipython:: python # filter for no2 data only no2 = air_quality[air_quality["parameter"] == "no2"] .. ipython:: python # use 2 measurements (head) for each location (groupby) no2_subset = no2.sort_index().groupby(["location"]).head(2) no2_subset .. image:: ../../_static/schemas/07_pivot.svg :align: center .. raw:: html As pandas supports plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from *long* to *wide* table format enables the plotting of the different time series at the same time: .. ipython:: python no2.head() .. ipython:: python @savefig 7_reshape_columns.png no2.pivot(columns="location", values="value").plot() .. note:: When the ``index`` parameter is not defined, the existing index (row labels) is used. .. raw:: html
To user guide For more information about :meth:`~DataFrame.pivot`, see the user guide section on :ref:`pivoting DataFrame objects `. .. raw:: html
Pivot table ~~~~~~~~~~~ .. image:: ../../_static/schemas/07_pivot_table.svg :align: center .. raw:: html Pivot table is a well known concept in spreadsheet software. When interested in the row/column margins (subtotals) for each variable, set the ``margins`` parameter to ``True``: .. ipython:: python air_quality.pivot_table( values="value", index="location", columns="parameter", aggfunc="mean", margins=True, ) .. raw:: html
To user guide For more information about :meth:`~DataFrame.pivot_table`, see the user guide section on :ref:`pivot tables `. .. raw:: html
.. note:: In case you are wondering, :meth:`~DataFrame.pivot_table` is indeed directly linked to :meth:`~DataFrame.groupby`. The same result can be derived by grouping on both ``parameter`` and ``location``: :: air_quality.groupby(["parameter", "location"]).mean() .. raw:: html
To user guide Have a look at :meth:`~DataFrame.groupby` in combination with :meth:`~DataFrame.unstack` at the user guide section on :ref:`combining stats and groupby `. .. raw:: html
Wide to long format ~~~~~~~~~~~~~~~~~~~ Starting again from the wide format table created in the previous section, we add a new index to the ``DataFrame`` with :meth:`~DataFrame.reset_index`. .. ipython:: python no2_pivoted = no2.pivot(columns="location", values="value").reset_index() no2_pivoted.head() .. image:: ../../_static/schemas/07_melt.svg :align: center .. raw:: html The solution is the short version on how to apply :func:`pandas.melt`. The method will *melt* all columns NOT mentioned in ``id_vars`` together into two columns: A column with the column header names and a column with the values itself. The latter column gets by default the name ``value``. The parameters passed to :func:`pandas.melt` can be defined in more detail: .. ipython:: python no_2 = no2_pivoted.melt( id_vars="date.utc", value_vars=["BETR801", "FR04014", "London Westminster"], value_name="NO_2", var_name="id_location", ) no_2.head() The additional parameters have the following effects: - ``value_vars`` defines which columns to *melt* together - ``value_name`` provides a custom column name for the values column instead of the default column name ``value`` - ``var_name`` provides a custom column name for the column collecting the column header names. Otherwise it takes the index name or a default ``variable`` Hence, the arguments ``value_name`` and ``var_name`` are just user-defined names for the two generated columns. The columns to melt are defined by ``id_vars`` and ``value_vars``. .. raw:: html
To user guide Conversion from wide to long format with :func:`pandas.melt` is explained in the user guide section on :ref:`reshaping by melt `. .. raw:: html
.. raw:: html

REMEMBER

- Sorting by one or more columns is supported by ``sort_values``. - The ``pivot`` function is purely restructuring of the data, ``pivot_table`` supports aggregations. - The reverse of ``pivot`` (long to wide format) is ``melt`` (wide to long format). .. raw:: html
.. raw:: html
To user guide A full overview is available in the user guide on the pages about :ref:`reshaping and pivoting `. .. raw:: html