.. currentmodule:: pandas

.. ipython:: python
   :suppress:

   import pandas as pd
   import numpy as np

.. _boolean:

**************************
Nullable Boolean data type
**************************

.. note::

   BooleanArray is currently experimental. Its API or implementation may
   change without warning.

.. _boolean.indexing:

Indexing with NA values
-----------------------

pandas allows indexing with ``NA`` values in a boolean array, which are treated as ``False``.

.. ipython:: python
   :okexcept:

   s = pd.Series([1, 2, 3])
   mask = pd.array([True, False, pd.NA], dtype="boolean")
   s[mask]

If you would prefer to keep the ``NA`` values, you can manually fill them with ``fillna(True)``.

.. ipython:: python

   s[mask.fillna(True)]

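
A minimal sketch of the behavior described above, assuming a pandas version that treats ``NA`` in a boolean mask as ``False``: selecting with ``mask`` gives the same result as selecting with ``mask.fillna(False)``, while ``mask.fillna(True)`` keeps the element at the ``NA`` position. The variable names ``dropped``, ``explicit`` and ``kept`` are only illustrative.

.. code-block:: python

   import pandas as pd

   s = pd.Series([1, 2, 3])
   mask = pd.array([True, False, pd.NA], dtype="boolean")

   # NA positions in the mask are treated as False, so these selections match
   dropped = s[mask]
   explicit = s[mask.fillna(False)]

   # Filling NA with True keeps the third element instead
   kept = s[mask.fillna(True)]

   print(dropped.tolist())  # [1]
   print(kept.tolist())     # [1, 3]
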
If you create a column of ``NA`` values (for example, to fill them later)
with ``df['new_col'] = pd.NA``, the ``dtype`` of the new column is set to
``object``. Operations on an ``object`` column are slower than on a column
with the appropriate dtype, so it is better to use
``df['new_col'] = pd.Series(pd.NA, dtype="boolean")``
(or another ``dtype`` that supports ``NA``).

.. ipython:: python

   df = pd.DataFrame()
   df['objects'] = pd.NA
   df.dtypes

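
As a sketch of the recommended approach (the column names ``objects`` and ``flags`` are only illustrative), passing ``index=df.index`` to the ``Series`` constructor is an assumption added here to make the alignment with an existing frame explicit. The new column keeps the nullable ``boolean`` dtype, with every value starting as ``NA``:

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"a": [1, 2, 3]})

   # Assigning the bare scalar produces an object-dtype column
   df["objects"] = pd.NA

   # An explicit nullable dtype keeps the column typed; every value starts as NA
   df["flags"] = pd.Series(pd.NA, index=df.index, dtype="boolean")

   print(df.dtypes)
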
.. _boolean.kleene:

Kleene logical operations
-------------------------

:class:`arrays.BooleanArray` implements `Kleene Logic`_ (sometimes called three-valued logic) for
logical operations like ``&`` (and), ``|`` (or) and ``^`` (exclusive-or).

This table shows the result for every combination of operands. These operations are symmetric,
so swapping the left- and right-hand sides does not change the result.

================= =========
Expression        Result
================= =========
``True & True``   ``True``
``True & False``  ``False``
``True & NA``     ``NA``
``False & False`` ``False``
``False & NA``    ``False``
``NA & NA``       ``NA``
``True | True``   ``True``
``True | False``  ``True``
``True | NA``     ``True``
``False | False`` ``False``
``False | NA``    ``NA``
``NA | NA``       ``NA``
``True ^ True``   ``False``
``True ^ False``  ``True``
``True ^ NA``     ``NA``
``False ^ False`` ``False``
``False ^ NA``    ``NA``
``NA ^ NA``       ``NA``
================= =========

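
The table can also be reproduced directly from :class:`arrays.BooleanArray`. The sketch below evaluates every pair of operands in both orders, which also checks the symmetry claim above. The helper ``label`` is hypothetical and exists only to print results the way the table shows them.

.. code-block:: python

   import itertools
   import operator

   import pandas as pd

   values = [True, False, pd.NA]
   ops = {"&": operator.and_, "|": operator.or_, "^": operator.xor}


   def label(x):
       """Hypothetical display helper: render a scalar as it appears in the table."""
       return "NA" if x is pd.NA else str(bool(x))


   results = {}
   for symbol, op in ops.items():
       for a, b in itertools.combinations_with_replacement(values, 2):
           left = pd.array([a], dtype="boolean")
           right = pd.array([b], dtype="boolean")
           forward = label(op(left, right)[0])
           backward = label(op(right, left)[0])
           # Swapping the operands must not change the result
           assert forward == backward
           results[f"{label(a)} {symbol} {label(b)}"] = forward
           print(f"{label(a)} {symbol} {label(b)} -> {forward}")
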
When an ``NA`` is present in an operation, the output value is ``NA`` only if
the result cannot be determined solely based on the other input. For example,
``True | NA`` is ``True``, because both ``True | True`` and ``True | False``
are ``True``. In that case, we don't actually need to consider the value
of the ``NA``.

On the other hand, ``True & NA`` is ``NA``. The result depends on whether
the ``NA`` really is ``True`` or ``False``, since ``True & True`` is ``True``,
but ``True & False`` is ``False``, so we can't determine the output.

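
The rule in the two paragraphs above can be checked mechanically. The sketch below uses a hypothetical helper ``kleene`` (not part of pandas, and not how pandas implements these operations internally) that substitutes both ``True`` and ``False`` for each ``NA`` operand: if every substitution gives the same answer, that answer is the result; otherwise the result is ``NA``. It then compares the helper against :class:`arrays.BooleanArray` for every combination of operands.

.. code-block:: python

   import itertools
   import operator

   import pandas as pd


   def kleene(op, a, b):
       """Hypothetical helper: evaluate op(a, b), treating NA as an unknown bool."""
       candidates_a = [True, False] if a is pd.NA else [a]
       candidates_b = [True, False] if b is pd.NA else [b]
       # Collect the result of every possible substitution for the NA operands
       outcomes = {op(x, y) for x in candidates_a for y in candidates_b}
       # A single outcome means the value of the NA did not matter
       return outcomes.pop() if len(outcomes) == 1 else pd.NA


   values = [True, False, pd.NA]
   mismatches = []
   for op in (operator.and_, operator.or_, operator.xor):
       for a, b in itertools.product(values, repeat=2):
           expected = kleene(op, a, b)
           actual = op(pd.array([a], dtype="boolean"), pd.array([b], dtype="boolean"))[0]
           if expected is pd.NA or actual is pd.NA:
               same = expected is actual
           else:
               same = bool(actual) == expected
           if not same:
               mismatches.append((op.__name__, a, b))

   print(mismatches)  # []
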
This differs from how ``np.nan`` behaves in logical operations. With ``object``
dtype, pandas treats ``np.nan`` as *always false in the output*.

With ``|`` (or):

.. ipython:: python

   pd.Series([True, False, np.nan], dtype="object") | True
   pd.Series([True, False, np.nan], dtype="boolean") | True

With ``&`` (and):

.. ipython:: python

   pd.Series([True, False, np.nan], dtype="object") & True
   pd.Series([True, False, np.nan], dtype="boolean") & True

.. _Kleene Logic: https://en.wikipedia.org/wiki/Three-valued_logic#Kleene_and_Priest_logics