pandas.api.extensions.ExtensionDtype#

class pandas.api.extensions.ExtensionDtype[source]#

A custom data type, to be paired with an ExtensionArray.

This enables support for third-party and custom dtypes within the pandas ecosystem. By implementing this interface and pairing it with a custom ExtensionArray, users can create rich data types that integrate cleanly with pandas operations, such as grouping, joining, or aggregation.

Attributes

kind

A character code (one of 'biufcmMOSUV'), default 'O'

na_value

Default NA value to use for this type.

name

A string identifying the data type.

names

Ordered list of field names, or None if there are no fields.

type

The scalar type for the array, e.g. int.

Methods

construct_array_type()

Return the array type associated with this dtype.

construct_from_string(string)

Construct this type from a string.

is_dtype(dtype)

Check if we match 'dtype'.

See also

extensions.register_extension_dtype

Register an ExtensionType with pandas as class decorator.

extensions.ExtensionArray

Abstract base class for custom 1-D array types.

Notes

The interface includes the following abstract methods that must be implemented by subclasses:

  • type

  • name

  • construct_array_type

The following attributes and methods influence the behavior of the dtype in pandas operations

  • _is_numeric

  • _is_boolean

  • _get_common_dtype

The na_value class attribute can be used to set the default NA value for this type. numpy.nan is used by default.

ExtensionDtypes are required to be hashable. The base class provides a default implementation, which relies on the _metadata class attribute. _metadata should be a tuple containing the strings that define your data type. For example, with PeriodDtype that’s the freq attribute.

If you have a parametrized dtype you should set the ``_metadata`` class property.

Ideally, the attributes in _metadata will match the parameters to your ExtensionDtype.__init__ (if any). If any of the attributes in _metadata don’t implement the standard __eq__ or __hash__, the default implementations here will not work.

Examples

For interaction with Apache Arrow (pyarrow), a __from_arrow__ method can be implemented: this method receives a pyarrow Array or ChunkedArray as only argument and is expected to return the appropriate pandas ExtensionArray for this dtype and the passed values:

>>> import pyarrow
>>> from pandas.api.extensions import ExtensionArray
>>> class ExtensionDtype:
...     def __from_arrow__(
...         self, array: pyarrow.Array | pyarrow.ChunkedArray
...     ) -> ExtensionArray: ...

This class does not inherit from ‘abc.ABCMeta’ for performance reasons. Methods and properties required by the interface raise pandas.errors.AbstractMethodError and no register method is provided for registering virtual subclasses.