# Internals#

This section will provide a look into some of pandas internals. It’s primarily intended for developers of pandas itself.

## Indexing#

In pandas there are a few objects implemented which can serve as valid containers for the axis labels:

`Index`

: the generic “ordered set” object, an ndarray of object dtype assuming nothing about its contents. The labels must be hashable (and likely immutable) and unique. Populates a dict of label to location in Cython to do`O(1)`

lookups.`MultiIndex`

: the standard hierarchical index object`DatetimeIndex`

: An Index object with`Timestamp`

boxed elements (impl are the int64 values)`TimedeltaIndex`

: An Index object with`Timedelta`

boxed elements (impl are the in64 values)`PeriodIndex`

: An Index object with Period elements

There are functions that make the creation of a regular index easy:

`date_range()`

: fixed frequency date range generated from a time rule or DateOffset. An ndarray of Python datetime objects`period_range()`

: fixed frequency date range generated from a time rule or DateOffset. An ndarray of`Period`

objects, representing timespans

The motivation for having an `Index`

class in the first place was to enable
different implementations of indexing. This means that it’s possible for you,
the user, to implement a custom `Index`

subclass that may be better suited to
a particular application than the ones provided in pandas.

From an internal implementation point of view, the relevant methods that an
`Index`

must define are one or more of the following (depending on how
incompatible the new object internals are with the `Index`

functions):

`get_loc()`

: returns an “indexer” (an integer, or in some cases a slice object) for a label`slice_locs()`

: returns the “range” to slice between two labels`get_indexer()`

: Computes the indexing vector for reindexing / data alignment purposes. See the source / docstrings for more on this`get_indexer_non_unique()`

: Computes the indexing vector for reindexing / data alignment purposes when the index is non-unique. See the source / docstrings for more on this`reindex()`

: Does any pre-conversion of the input index then calls`get_indexer`

`union()`

,`intersection()`

: computes the union or intersection of two Index objects`insert()`

: Inserts a new label into an Index, yielding a new object`delete()`

: Delete a label, yielding a new object`drop()`

: Deletes a set of labels`take()`

: Analogous to ndarray.take

### MultiIndex#

Internally, the `MultiIndex`

consists of a few things: the **levels**, the
integer **codes**, and the level **names**:

```
In [1]: index = pd.MultiIndex.from_product(
...: [range(3), ["one", "two"]], names=["first", "second"]
...: )
...:
In [2]: index
Out[2]:
MultiIndex([(0, 'one'),
(0, 'two'),
(1, 'one'),
(1, 'two'),
(2, 'one'),
(2, 'two')],
names=['first', 'second'])
In [3]: index.levels
Out[3]: FrozenList([[0, 1, 2], ['one', 'two']])
In [4]: index.codes
Out[4]: FrozenList([[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
In [5]: index.names
Out[5]: FrozenList(['first', 'second'])
```

You can probably guess that the codes determine which unique element is
identified with that location at each layer of the index. It’s important to
note that sortedness is determined **solely** from the integer codes and does
not check (or care) whether the levels themselves are sorted. Fortunately, the
constructors `from_tuples()`

and `from_arrays()`

ensure
that this is true, but if you compute the levels and codes yourself, please be careful.

### Values#

pandas extends NumPy’s type system with custom types, like `Categorical`

or
datetimes with a timezone, so we have multiple notions of “values”. For 1-D
containers (`Index`

classes and `Series`

) we have the following convention:

`cls._values`

refers is the “best possible” array. This could be an`ndarray`

or`ExtensionArray`

.

So, for example, `Series[category]._values`

is a `Categorical`

.

## Subclassing pandas data structures#

This section has been moved to Subclassing pandas data structures.