Internals#
This section provides a look into some of pandas' internals. It’s primarily intended for developers of pandas itself.
Indexing#
pandas implements a few objects that can serve as valid containers for the axis labels:
Index
: the generic “ordered set” object, an ndarray of object dtype assuming nothing about its contents. The labels must be hashable (and likely immutable) and unique. Populates a dict of label to location in Cython to do O(1) lookups.
MultiIndex
: the standard hierarchical index object
DatetimeIndex
: an Index object with Timestamp boxed elements (impl are the int64 values)
TimedeltaIndex
: an Index object with Timedelta boxed elements (impl are the int64 values)
PeriodIndex
: an Index object with Period elements
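As a rough illustration of the list above, the sketch below does a label-to-position lookup on a plain Index and shows the boxed versus raw views of a DatetimeIndex. The variable names are illustrative only, and the unit of the underlying int64 values is not shown because it can depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Label -> integer position lookup on a unique Index.
idx = pd.Index(["a", "b", "c"])
pos = idx.get_loc("b")

# Element access boxes each value as a Timestamp ...
dti = pd.DatetimeIndex(["2024-01-01", "2024-01-02"])
first = dti[0]

# ... while the backing data is a plain int64 ndarray.
raw = dti.asi8

# The same pattern holds for TimedeltaIndex and Timedelta.
tdi = pd.to_timedelta(["1 days", "2 days"])
delta = tdi[0]

print(pos, type(first).__name__, raw.dtype, type(delta).__name__)
```

The int64 view is what makes vectorized datetime arithmetic cheap; boxing into Timestamp or Timedelta happens only when individual elements are accessed.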
There are functions that make the creation of a regular index easy:
date_range()
: fixed frequency date range generated from a time rule or DateOffset. Returns a DatetimeIndex of Timestamp elements
period_range()
: fixed frequency date range generated from a time rule or DateOffset. Returns a PeriodIndex of Period objects, representing timespans
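A minimal sketch of both helpers follows. The start dates, lengths, and frequencies are arbitrary values chosen for illustration; in current pandas the results are a DatetimeIndex and a PeriodIndex respectively.

```python
import pandas as pd

# Three consecutive calendar days.
dates = pd.date_range("2024-01-01", periods=3, freq="D")

# Three consecutive monthly periods, each representing a timespan.
periods = pd.period_range("2024-01", periods=3, freq="M")

print(type(dates).__name__, dates[1])
print(type(periods).__name__, periods[0])
```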
Warning
Custom Index subclasses are not supported; custom behavior should be implemented using the ExtensionArray interface instead.
MultiIndex#
Internally, the MultiIndex consists of a few things: the levels, the
integer codes, and the level names:
In [1]: index = pd.MultiIndex.from_product(
   ...:     [range(3), ["one", "two"]], names=["first", "second"]
   ...: )
   ...:
In [2]: index
Out[2]:
MultiIndex([(0, 'one'),
            (0, 'two'),
            (1, 'one'),
            (1, 'two'),
            (2, 'one'),
            (2, 'two')],
           names=['first', 'second'])
In [3]: index.levels
Out[3]: FrozenList([[0, 1, 2], ['one', 'two']])
In [4]: index.codes
Out[4]: FrozenList([[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
In [5]: index.names
Out[5]: FrozenList(['first', 'second'])
You can probably guess that the codes determine which unique element is
identified with that location at each layer of the index. It’s important to
note that sortedness is determined solely from the integer codes and does
not check (or care) whether the levels themselves are sorted. Fortunately, the
constructors from_tuples() and from_arrays() ensure
that this is true, but if you compute the levels and codes yourself, please be careful.
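To make the relationship between levels and codes concrete, the sketch below rebuilds each level's labels from its codes, then contrasts a hand-built MultiIndex whose levels are deliberately unsorted with one created by from_arrays(). The variable names (manual, auto) and the sample labels are illustrative assumptions.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [range(3), ["one", "two"]], names=["first", "second"]
)

# Each code is a position into the corresponding level.
for i, (level, codes) in enumerate(zip(index.levels, index.codes)):
    rebuilt = level.take(codes)
    assert rebuilt.equals(index.get_level_values(i))

# Hand-built: the codes ascend, but level 0 is stored as ["b", "a"],
# so the labels come out with "b" before "a".
manual = pd.MultiIndex(
    levels=[["b", "a"], [1, 2]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
)
print(manual.tolist())

# The same labels through from_arrays(): the levels are sorted
# and the codes are adjusted to match.
auto = pd.MultiIndex.from_arrays([["b", "b", "a", "a"], [1, 2, 1, 2]])
print(list(auto.levels[0]), list(auto.codes[0]))
```

Both objects hold the same tuples in the same order; only the internal levels and codes differ, which is why code that inspects the codes alone can be misled by the hand-built version.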
Values#
pandas extends NumPy’s type system with custom types, like Categorical or
datetimes with a timezone, so we have multiple notions of “values”. For 1-D
containers (Index classes and Series) we have the following convention:
cls._values refers to the “best possible” array. This could be an ndarray or ExtensionArray.
So, for example, Series[category]._values is a Categorical.
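The convention can be checked with a short sketch like the one below. Note that _values is a private attribute, so this is for exploring internals rather than for use in application code; the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Categorical data: the "best possible" array is a Categorical.
cat = pd.Series(["a", "b", "a"], dtype="category")
cat_values = cat._values

# Plain integers: a NumPy ndarray is sufficient.
nums = pd.Series([1, 2, 3])
num_values = nums._values

# Timezone-aware datetimes: NumPy has no such dtype, so an ExtensionArray is used.
tz = pd.Series(pd.date_range("2024-01-01", periods=2, tz="UTC"))
tz_values = tz._values

print(type(cat_values).__name__, type(num_values).__name__, type(tz_values).__name__)
```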
Subclassing pandas data structures#
This section has been moved to Subclassing pandas data structures.