pandas.Categorical#

class pandas.Categorical(values, categories=None, ordered=None, dtype=None, copy=True)[source]#

Represent a categorical variable in classic R / S-plus fashion.

Categoricals can only take on a limited, and usually fixed, number of possible values (categories). In contrast to statistical categorical variables, a Categorical might have an order, but numerical operations (additions, divisions, …) are not possible.

All values of the Categorical are either in categories or np.nan. Assigning values outside of categories will raise a ValueError. Order is defined by the order of the categories, not lexical order of the values.

Parameters:
valueslist-like

The values of the categorical. If categories are given, values not in categories will be replaced with NaN.

categoriesIndex-like (unique), optional

The unique categories for this categorical. If not given, the categories are assumed to be the unique values of values (sorted, if possible, otherwise in the order in which they appear).

orderedbool, default False

Whether or not this categorical is treated as a ordered categorical. If True, the resulting categorical will be ordered. An ordered categorical respects, when sorted, the order of its categories attribute (which in turn is the categories argument, if provided).

dtypeCategoricalDtype

An instance of CategoricalDtype to use for this categorical.

copybool, default True

Whether to copy if the codes are unchanged.

Raises:
ValueError

If the categories do not validate.

TypeError

If an explicit ordered=True is given but no categories and the values are not sortable.

See also

CategoricalDtype

Type for categorical data.

CategoricalIndex

An Index with an underlying Categorical.

Notes

See the user guide for more.

Examples

>>> pd.Categorical([1, 2, 3, 1, 2, 3])
[1, 2, 3, 1, 2, 3]
Categories (3, int64): [1, 2, 3]
>>> pd.Categorical(["a", "b", "c", "a", "b", "c"])
['a', 'b', 'c', 'a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']

Missing values are not included as a category.

>>> c = pd.Categorical([1, 2, 3, 1, 2, 3, np.nan])
>>> c
[1, 2, 3, 1, 2, 3, NaN]
Categories (3, int64): [1, 2, 3]

However, their presence is indicated in the codes attribute by code -1.

>>> c.codes
array([ 0,  1,  2,  0,  1,  2, -1], dtype=int8)

Ordered Categoricals can be sorted according to the custom order of the categories and can have a min and max value.

>>> c = pd.Categorical(
...     ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
... )
>>> c
['a', 'b', 'c', 'a', 'b', 'c']
Categories (3, object): ['c' < 'b' < 'a']
>>> c.min()
'c'

Attributes

categories

The categories of this categorical.

codes

The category codes of this categorical index.

ordered

Whether the categories have an ordered relationship.

dtype

The CategoricalDtype for this instance.

Methods

from_codes(codes[, categories, ordered, ...])

Make a Categorical type from codes and categories or dtype.

as_ordered()

Set the Categorical to be ordered.

as_unordered()

Set the Categorical to be unordered.

set_categories(new_categories[, ordered, rename])

Set the categories to the specified new categories.

rename_categories(new_categories)

Rename categories.

reorder_categories(new_categories[, ordered])

Reorder categories as specified in new_categories.

add_categories(new_categories)

Add new categories.

remove_categories(removals)

Remove the specified categories.

remove_unused_categories()

Remove categories which are not used.

map(mapper[, na_action])

Map categories using an input mapping or function.

__array__([dtype, copy])

The numpy array interface.