pandas.api.types.union_categoricals#

pandas.api.types.union_categoricals(to_union, sort_categories=False, ignore_order=False)[source]#

Combine list-like of Categorical-like, unioning categories.

All categories must have the same dtype.

Parameters:
to_unionlist-like

Categorical, CategoricalIndex, or Series with dtype=’category’.

sort_categoriesbool, default False

If true, resulting categories will be lexsorted, otherwise they will be ordered as they appear in the data.

ignore_orderbool, default False

If true, the ordered attribute of the Categoricals will be ignored. Results in an unordered categorical.

Returns:
Categorical

The union of categories being combined.

Raises:
TypeError
  • all inputs do not have the same dtype

  • all inputs do not have the same ordered property

  • all inputs are ordered and their categories are not identical

  • sort_categories=True and Categoricals are ordered

ValueError

Empty list of categoricals passed

See also

CategoricalDtype

Type for categorical data with the categories and orderedness.

Categorical

Represent a categorical variable in classic R / S-plus fashion.

Notes

To learn more about categories, see link

Examples

If you want to combine categoricals that do not necessarily have the same categories, union_categoricals will combine a list-like of categoricals. The new categories will be the union of the categories being combined.

>>> a = pd.Categorical(["b", "c"])
>>> b = pd.Categorical(["a", "b"])
>>> pd.api.types.union_categoricals([a, b])
['b', 'c', 'a', 'b']
Categories (3, object): ['b', 'c', 'a']

By default, the resulting categories will be ordered as they appear in the categories of the data. If you want the categories to be lexsorted, use sort_categories=True argument.

>>> pd.api.types.union_categoricals([a, b], sort_categories=True)
['b', 'c', 'a', 'b']
Categories (3, object): ['a', 'b', 'c']

union_categoricals also works with the case of combining two categoricals of the same categories and order information (e.g. what you could also append for).

>>> a = pd.Categorical(["a", "b"], ordered=True)
>>> b = pd.Categorical(["a", "b", "a"], ordered=True)
>>> pd.api.types.union_categoricals([a, b])
['a', 'b', 'a', 'b', 'a']
Categories (2, object): ['a' < 'b']

Raises TypeError because the categories are ordered and not identical.

>>> a = pd.Categorical(["a", "b"], ordered=True)
>>> b = pd.Categorical(["a", "b", "c"], ordered=True)
>>> pd.api.types.union_categoricals([a, b])
Traceback (most recent call last):
    ...
TypeError: to union ordered Categoricals, all categories must be the same

Ordered categoricals with different categories or orderings can be combined by using the ignore_ordered=True argument.

>>> a = pd.Categorical(["a", "b", "c"], ordered=True)
>>> b = pd.Categorical(["c", "b", "a"], ordered=True)
>>> pd.api.types.union_categoricals([a, b], ignore_order=True)
['a', 'b', 'c', 'c', 'b', 'a']
Categories (3, object): ['a', 'b', 'c']

union_categoricals also works with a CategoricalIndex, or Series containing categorical data, but note that the resulting array will always be a plain Categorical

>>> a = pd.Series(["b", "c"], dtype="category")
>>> b = pd.Series(["a", "b"], dtype="category")
>>> pd.api.types.union_categoricals([a, b])
['b', 'c', 'a', 'b']
Categories (3, object): ['b', 'c', 'a']