Nullable Boolean data type

Note

BooleanArray is currently experimental. Its API or implementation may change without warning.

Indexing with NA values

pandas allows indexing with NA values in a boolean array, which are treated as False.

In [1]: s = pd.Series([1, 2, 3])

In [2]: mask = pd.array([True, False, pd.NA], dtype="boolean")

In [3]: s[mask]
Out[3]: 
0    1
dtype: int64

If you would prefer to keep the rows where the mask is NA, fill the missing values first with fillna(True).

In [4]: s[mask.fillna(True)]
Out[4]: 
0    1
2    3
dtype: int64

If you create a column of NA values (for example, to fill them in later) with df['new_col'] = pd.NA, the new column gets the object dtype, and operations on it will be slower than with an appropriate dtype. It is better to use df['new_col'] = pd.Series(pd.NA, dtype="boolean") (or another dtype that supports NA).

In [5]: df = pd.DataFrame()

In [6]: df['objects'] = pd.NA

In [7]: df.dtypes
Out[7]: 
objects    object
dtype: object
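For contrast, the recommended form can be sketched as follows. This is a minimal illustration, not part of the original example: the frame contents and the column names id and flag are hypothetical, and passing index=df.index is an assumption made here so that the placeholder Series covers every existing row.

```python
import pandas as pd

# Hypothetical three-row frame; the column names are illustrative only.
df = pd.DataFrame({"id": [10, 20, 30]})

# Passing the frame's index makes the all-NA placeholder span every row,
# and dtype="boolean" avoids the object dtype.
df["flag"] = pd.Series(pd.NA, index=df.index, dtype="boolean")
print(df.dtypes)

# Filling a value in later keeps the nullable boolean dtype.
df.loc[0, "flag"] = True
print(df)
```

The column reports the boolean dtype both before and after the assignment, so later logical operations on it follow the NA-aware rules rather than falling back to object behavior.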

Kleene logical operations

arrays.BooleanArray implements Kleene logic (sometimes called three-valued logic) for logical operations like & (and), | (or) and ^ (exclusive-or).

This table demonstrates the results for every combination. These operations are symmetrical, so flipping the left- and right-hand side makes no difference in the result.

==============  ======
Expression      Result
==============  ======
True & True     True
True & False    False
True & NA       NA
False & False   False
False & NA      False
NA & NA         NA
True | True     True
True | False    True
True | NA       True
False | False   False
False | NA      NA
NA | NA         NA
True ^ True     False
True ^ False    True
True ^ NA       NA
False ^ False   False
False ^ NA      NA
NA ^ NA         NA
==============  ======
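The combinations listed above can be reproduced element-wise with two boolean arrays. The sketch below is illustrative only; the variable names are chosen here, and each array position corresponds to one of the six operand pairs listed per operator.

```python
import pandas as pd

# One element per operand pair: (T, T), (T, F), (T, NA), (F, F), (F, NA), (NA, NA).
left = pd.array([True, True, True, False, False, pd.NA], dtype="boolean")
right = pd.array([True, False, pd.NA, False, pd.NA, pd.NA], dtype="boolean")

# Logical operators on BooleanArray are applied element by element.
and_result = left & right
or_result = left | right
xor_result = left ^ right

# Collect everything in one frame for side-by-side comparison.
table = pd.DataFrame(
    {"left": left, "right": right, "and": and_result, "or": or_result, "xor": xor_result}
)
print(table)
```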

When an NA is present in an operation, the output value is NA only if the result cannot be determined solely based on the other input. For example, True | NA is True, because both True | True and True | False are True. In that case, we don’t actually need to consider the value of the NA.

On the other hand, True & NA is NA. The result depends on whether the NA really is True or False, since True & True is True, but True & False is False, so we can’t determine the output.
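The reasoning above amounts to a simple rule: substitute both True and False for each NA, and report a definite value only if every substitution agrees. The helper below, named kleene here purely for illustration, is a sketch of that rule and not how pandas implements it; it is then compared against the scalar pd.NA operators.

```python
import operator

import pandas as pd


def kleene(op, left, right):
    """Return op(left, right), trying both truth values for every NA operand."""
    left_options = [True, False] if left is pd.NA else [left]
    right_options = [True, False] if right is pd.NA else [right]
    outcomes = {op(a, b) for a in left_options for b in right_options}
    # A single possible outcome means the NA did not matter.
    return outcomes.pop() if len(outcomes) == 1 else pd.NA


# Compare the substitution rule with pandas' own scalar behavior.
mismatches = []
for op in (operator.and_, operator.or_, operator.xor):
    for left in (True, False, pd.NA):
        for right in (True, False, pd.NA):
            # True, False and pd.NA are singletons, so identity comparison is safe.
            if kleene(op, left, right) is not op(left, right):
                mismatches.append((op.__name__, left, right))

print(mismatches)
```

An empty list indicates that the substitution rule and the built-in operators agree on all 27 combinations.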

This differs from how np.nan behaves in logical operations. pandas treats np.nan as always False in the output.

For | (or):

In [8]: pd.Series([True, False, np.nan], dtype="object") | True
Out[8]: 
0     True
1     True
2    False
dtype: bool

In [9]: pd.Series([True, False, np.nan], dtype="boolean") | True
Out[9]: 
0    True
1    True
2    True
dtype: boolean

For & (and):

In [10]: pd.Series([True, False, np.nan], dtype="object") & True
Out[10]: 
0     True
1    False
2    False
dtype: bool

In [11]: pd.Series([True, False, np.nan], dtype="boolean") & True
Out[11]: 
0     True
1    False
2     <NA>
dtype: boolean