Nullable Boolean Data Type
New in version 1.0.0.
Indexing with NA values
pandas does not allow indexing with NA values. Attempting to do so will raise a ValueError.
In [1]: s = pd.Series([1, 2, 3])
In [2]: mask = pd.array([True, False, pd.NA], dtype="boolean")
In [3]: s[mask]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-b9628bf46992> in <module>
----> 1 s[mask]
/pandas/pandas/core/series.py in __getitem__(self, key)
905 key = list(key)
906
--> 907 if com.is_bool_indexer(key):
908 key = check_bool_indexer(self.index, key)
909
/pandas/pandas/core/common.py in is_bool_indexer(key)
142 if is_extension_array_dtype(key.dtype):
143 if np.any(key.isna()):
--> 144 raise ValueError(na_msg)
145 return True
146 elif isinstance(key, list):
ValueError: cannot mask with array containing NA / NaN values
The missing values will need to be explicitly filled with True or False prior to using the array as a mask.
In [4]: s[mask.fillna(False)]
Out[4]:
0 1
dtype: int64
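Which fill value to use depends on whether rows with an unknown mask entry should be dropped or kept. The following is a minimal sketch of both choices, reusing the s and mask objects from above; the variable names dropped and kept are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
mask = pd.array([True, False, pd.NA], dtype="boolean")

# Treat unknown mask entries as "do not select".
dropped = s[mask.fillna(False)]

# Treat unknown mask entries as "select".
kept = s[mask.fillna(True)]

print(dropped.tolist())  # [1]
print(kept.tolist())     # [1, 3]
```

Filling with False is usually the conservative choice, since it selects only the rows whose mask is known to be True.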
Kleene Logical Operations
arrays.BooleanArray implements Kleene Logic (sometimes called three-value logic) for logical operations like & (and), | (or) and ^ (exclusive-or).
This table demonstrates the results for every combination. These operations are symmetrical, so flipping the left- and right-hand side makes no difference in the result.
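Expression | Result |
---|---|
True & True | True |
True & False | False |
True & NA | NA |
False & False | False |
False & NA | False |
NA & NA | NA |
True \| True | True |
True \| False | True |
True \| NA | True |
False \| False | False |
False \| NA | NA |
NA \| NA | NA |
True ^ True | False |
True ^ False | True |
True ^ NA | NA |
False ^ False | False |
False ^ NA | NA |
NA ^ NA | NA |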
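The combinations can also be computed directly with pandas. The sketch below builds two boolean arrays covering every unordered pair of True, False and NA, then applies the three operators elementwise. The names values, pairs, left, right and table are illustrative and not part of the pandas API.

```python
import itertools

import pandas as pd

values = [True, False, pd.NA]

# Every unordered pair, since the operations are symmetrical:
# (T, T), (T, F), (T, NA), (F, F), (F, NA), (NA, NA)
pairs = list(itertools.combinations_with_replacement(values, 2))

left = pd.array([a for a, _ in pairs], dtype="boolean")
right = pd.array([b for _, b in pairs], dtype="boolean")

# Each operator is applied elementwise using Kleene logic.
table = pd.DataFrame(
    {
        "left": left,
        "right": right,
        "and": left & right,
        "or": left | right,
        "xor": left ^ right,
    }
)
print(table)
```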
When an NA is present in an operation, the output value is NA only if the result cannot be determined solely based on the other input. For example, True | NA is True, because both True | True and True | False are True. In that case, we don't actually need to consider the value of the NA.
On the other hand, True & NA is NA. The result depends on whether the NA really is True or False, since True & True is True, but True & False is False, so we can't determine the output.
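This reasoning can be written out as a small helper: substitute both True and False for each NA, and return NA only when the outcomes disagree. The function kleene below is a hypothetical illustration of the rule, not how pandas implements it internally.

```python
import operator

import pandas as pd


def kleene(op, a, b):
    """Derive a three-valued result by trying both values for each NA."""
    candidates_a = [True, False] if a is pd.NA else [a]
    candidates_b = [True, False] if b is pd.NA else [b]
    outcomes = {op(x, y) for x in candidates_a for y in candidates_b}
    # A single possible outcome means the NA did not matter.
    return outcomes.pop() if len(outcomes) == 1 else pd.NA


print(kleene(operator.or_, True, pd.NA))    # True
print(kleene(operator.and_, True, pd.NA))   # <NA>
print(kleene(operator.and_, False, pd.NA))  # False
```

The same results come from applying the operators to the pd.NA scalar directly, for example True | pd.NA or True & pd.NA.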
This differs from how np.nan behaves in logical operations. pandas treats np.nan as always false in the output.
In an or operation:
In [5]: pd.Series([True, False, np.nan], dtype="object") | True
Out[5]:
0 True
1 True
2 False
dtype: bool
In [6]: pd.Series([True, False, np.nan], dtype="boolean") | True
Out[6]:
0 True
1 True
2 True
dtype: boolean
In an and operation:
In [7]: pd.Series([True, False, np.nan], dtype="object") & True
Out[7]:
0 True
1 False
2 False
dtype: bool
In [8]: pd.Series([True, False, np.nan], dtype="boolean") & True
Out[8]:
0 True
1 False
2 <NA>
dtype: boolean