pandas.api.types.
union_categoricals
Combine list-like of Categorical-like, unioning categories.
All categories must have the same dtype.
Categorical, CategoricalIndex, or Series with dtype=’category’.
If true, resulting categories will be lexsorted, otherwise they will be ordered as they appear in the data.
If true, the ordered attribute of the Categoricals will be ignored. Results in an unordered categorical.
all inputs do not have the same dtype
all inputs do not have the same ordered property
all inputs are ordered and their categories are not identical
sort_categories=True and Categoricals are ordered
Empty list of categoricals passed
Notes
To learn more about categories, see link
Examples
>>> from pandas.api.types import union_categoricals
If you want to combine categoricals that do not necessarily have the same categories, union_categoricals will combine a list-like of categoricals. The new categories will be the union of the categories being combined.
>>> a = pd.Categorical(["b", "c"]) >>> b = pd.Categorical(["a", "b"]) >>> union_categoricals([a, b]) ['b', 'c', 'a', 'b'] Categories (3, object): ['b', 'c', 'a']
By default, the resulting categories will be ordered as they appear in the categories of the data. If you want the categories to be lexsorted, use sort_categories=True argument.
>>> union_categoricals([a, b], sort_categories=True) ['b', 'c', 'a', 'b'] Categories (3, object): ['a', 'b', 'c']
union_categoricals also works with the case of combining two categoricals of the same categories and order information (e.g. what you could also append for).
>>> a = pd.Categorical(["a", "b"], ordered=True) >>> b = pd.Categorical(["a", "b", "a"], ordered=True) >>> union_categoricals([a, b]) ['a', 'b', 'a', 'b', 'a'] Categories (2, object): ['a' < 'b']
Raises TypeError because the categories are ordered and not identical.
>>> a = pd.Categorical(["a", "b"], ordered=True) >>> b = pd.Categorical(["a", "b", "c"], ordered=True) >>> union_categoricals([a, b]) Traceback (most recent call last): ... TypeError: to union ordered Categoricals, all categories must be the same
New in version 0.20.0
Ordered categoricals with different categories or orderings can be combined by using the ignore_ordered=True argument.
>>> a = pd.Categorical(["a", "b", "c"], ordered=True) >>> b = pd.Categorical(["c", "b", "a"], ordered=True) >>> union_categoricals([a, b], ignore_order=True) ['a', 'b', 'c', 'c', 'b', 'a'] Categories (3, object): ['a', 'b', 'c']
union_categoricals also works with a CategoricalIndex, or Series containing categorical data, but note that the resulting array will always be a plain Categorical
>>> a = pd.Series(["b", "c"], dtype='category') >>> b = pd.Series(["a", "b"], dtype='category') >>> union_categoricals([a, b]) ['b', 'c', 'a', 'b'] Categories (3, object): ['b', 'c', 'a']