.. _10min_tut_08_combine: {{ header }} .. ipython:: python import pandas as pd .. raw:: html
Data used for this tutorial:
How to combine data from multiple tables ---------------------------------------- Concatenating objects ~~~~~~~~~~~~~~~~~~~~~ .. image:: ../../_static/schemas/08_concat_row.svg :align: center .. raw:: html By default concatenation is along axis 0, so the resulting table combines the rows of the input tables. Let’s check the shape of the original and the concatenated tables to verify the operation: .. ipython:: python print('Shape of the ``air_quality_pm25`` table: ', air_quality_pm25.shape) print('Shape of the ``air_quality_no2`` table: ', air_quality_no2.shape) print('Shape of the resulting ``air_quality`` table: ', air_quality.shape) Hence, the resulting table has 3178 = 1110 + 2068 rows. .. note:: The **axis** argument will return in a number of pandas methods that can be applied **along an axis**. A ``DataFrame`` has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). Most operations like concatenation or summary statistics are by default across rows (axis 0), but can be applied across columns as well. Sorting the table on the datetime information illustrates also the combination of both tables, with the ``parameter`` column defining the origin of the table (either ``no2`` from table ``air_quality_no2`` or ``pm25`` from table ``air_quality_pm25``): .. ipython:: python air_quality = air_quality.sort_values("date.utc") air_quality.head() In this specific example, the ``parameter`` column provided by the data ensures that each of the original tables can be identified. This is not always the case. The ``concat`` function provides a convenient solution with the ``keys`` argument, adding an additional (hierarchical) row index. For example: .. ipython:: python air_quality_ = pd.concat([air_quality_pm25, air_quality_no2], keys=["PM25", "NO2"]) air_quality_.head() .. note:: The existence of multiple row/column indices at the same time has not been mentioned within these tutorials. *Hierarchical indexing* or *MultiIndex* is an advanced and powerful pandas feature to analyze higher dimensional data. Multi-indexing is out of scope for this pandas introduction. For the moment, remember that the function ``reset_index`` can be used to convert any level of an index to a column, e.g. ``air_quality.reset_index(level=0)`` .. raw:: html
To user guide Feel free to dive into the world of multi-indexing at the user guide section on :ref:`advanced indexing `. .. raw:: html
.. raw:: html
To user guide More options on table concatenation (row and column wise) and how ``concat`` can be used to define the logic (union or intersection) of the indexes on the other axes is provided at the section on :ref:`object concatenation `. .. raw:: html
Join tables using a common identifier ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. image:: ../../_static/schemas/08_merge_left.svg :align: center .. raw:: html .. raw:: html .. raw:: html
To user guide pandas supports also inner, outer, and right joins. More information on join/merge of tables is provided in the user guide section on :ref:`database style merging of tables `. Or have a look at the :ref:`comparison with SQL` page. .. raw:: html
.. raw:: html

REMEMBER

- Multiple tables can be concatenated both column-wise and row-wise using the ``concat`` function. - For database-like merging/joining of tables, use the ``merge`` function. .. raw:: html
.. raw:: html
To user guide See the user guide for a full description of the various :ref:`facilities to combine data tables `. .. raw:: html