pandas.DataFrame.select_dtypes

DataFrame.select_dtypes(include=None, exclude=None)

Return a subset of the DataFrame’s columns based on the column dtypes.

This method allows for filtering columns based on their data types. It is useful when working with heterogeneous DataFrames where operations need to be performed on a specific subset of data types.
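The sketch below illustrates that use case. It applies an arithmetic operation only to the numeric columns of a mixed frame and writes the result back. The column names and the mean-centering step are made up for illustration. The only pandas calls involved are select_dtypes, DataFrame.mean, and ordinary column assignment.

```python
import pandas as pd

# Hypothetical mixed-type frame: one text column and two numeric columns.
df = pd.DataFrame(
    {"name": ["x", "y", "z"], "score": [1.0, 2.0, 3.0], "count": [10, 20, 30]}
)

# Keep only the numeric columns; "name" is left out of the selection.
numeric = df.select_dtypes(include="number")

# Center each numeric column on its own mean and write the result back,
# leaving the non-numeric column untouched.
df[numeric.columns] = numeric - numeric.mean()

print(df)
```

Because the selection is driven by dtype rather than by column names, the same two lines keep working when numeric columns are added or renamed.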

Parameters:

include, exclude: scalar or list-like. A selection of dtypes or strings to be included or excluded, respectively. At least one of these parameters must be supplied.
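Here is a short sketch on a made-up frame. It shows that a scalar and a one-element list select the same columns, and that include and exclude can be combined in one call. For exclude it passes the NumPy abstract class np.floating, which covers every floating-point width.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [True, False], "c": [1.0, 2.0]})

# A scalar spec and a one-element list select the same columns.
scalar = df.select_dtypes(include="number")
listed = df.select_dtypes(include=["number"])

# Combine both: keep numeric columns, then drop the floating-point ones.
ints_only = df.select_dtypes(include="number", exclude=np.floating)

print(list(scalar.columns), list(listed.columns), list(ints_only.columns))
```

The boolean column "b" is not matched by "number".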

Returns:

DataFrame: The subset of the frame including the dtypes in include and excluding the dtypes in exclude.
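To make the return value concrete, here is a small sketch with a made-up frame and index labels. The result is a new DataFrame that keeps the original row index and the original relative column order. A selection that matches nothing still returns a DataFrame with the same rows and zero columns.

```python
import pandas as pd

df = pd.DataFrame(
    {"c": [1.0, 2.0], "a": [1, 2], "b": [True, False]}, index=["r1", "r2"]
)

# Matching columns come back in their original order, with the original index.
numeric = df.select_dtypes(include="number")
print(list(numeric.columns), list(numeric.index))

# There are no datetime columns, so the result has the same rows but no columns.
empty = df.select_dtypes(include="datetime")
print(empty.shape)
```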

Raises:

ValueError

If both include and exclude are empty
If include and exclude have overlapping elements

TypeError

If any kind of string dtype is passed in.
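The two ValueError cases listed above can be reproduced with a sketch like the following, which uses a made-up frame. The exact error messages may vary between pandas versions, so only the exception type is recorded. The TypeError case is left out because the treatment of string dtypes depends on the pandas version and on pd.options.future.infer_string.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "c": [1.0, 2.0]})

errors = []

# Neither include nor exclude is supplied.
try:
    df.select_dtypes()
except ValueError as exc:
    errors.append(("empty", type(exc).__name__))

# The same dtype family appears in both include and exclude.
try:
    df.select_dtypes(include="number", exclude="number")
except ValueError as exc:
    errors.append(("overlap", type(exc).__name__))

print(errors)
```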

See also

DataFrame.dtypes: Return Series with the data type of each column.

Notes

To select all numeric types, use np.number or 'number'
To select strings you must use the object dtype, but note that this will return all object dtype columns. With pd.options.future.infer_string enabled, using "str" will work to select all string columns.
See the numpy dtype hierarchy
A dtype instance (e.g. np.dtype("int32") or pd.CategoricalDtype(["a", "b"])) selects only columns with exactly that dtype, whereas a class or string selects a family of dtypes. Under-specified instances, such as a unitless np.dtype("datetime64"), a bare pd.CategoricalDtype(), or a pd.IntervalDtype("int64") with no closed value, select their whole family
To select datetimes, use np.datetime64, 'datetime' or 'datetime64'
To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'
To select datetimes or timedeltas of a specific resolution, pass a unit-qualified dtype such as 'datetime64[us]' or 'timedelta64[ms]'; this matches only columns with exactly that resolution, whereas an unqualified spec matches every resolution
To select pandas categorical dtypes, use 'category'
To select pandas datetimetz dtypes, use 'datetimetz' or 'datetime64[ns, tz]'
An ExtensionDtype subclass matches every instance of that subclass regardless of parametrization, e.g. pd.ArrowDtype selects all pyarrow-backed columns and pd.CategoricalDtype selects all categorical columns
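The following minimal sketch exercises several of the selectors listed above. It uses a hypothetical frame with one column of each kind. It sticks to long-standing selectors: 'number', 'bool', 'datetime', 'datetimetz' and 'category'. String columns are omitted because their dtype depends on pd.options.future.infer_string. Timedelta columns are also omitted, because NumPy places np.timedelta64 under np.signedinteger in its type hierarchy, which makes 'number' selections less obvious.

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-01", "2024-01-02"])

df = pd.DataFrame(
    {
        "n": [1, 2],                          # int64
        "f": [0.5, 1.5],                      # float64
        "flag": [True, False],                # bool
        "when": dates,                        # naive datetime64
        "when_tz": dates.tz_localize("UTC"),  # tz-aware datetime
        "cat": pd.Categorical(["x", "y"]),    # pandas categorical
    }
)

# Each selector string picks out one family of columns.
selected = {
    spec: list(df.select_dtypes(include=spec).columns)
    for spec in ["number", "bool", "datetime", "datetimetz", "category"]
}
print(selected)
```

Note that 'datetime' matches only the naive column, while the tz-aware column needs 'datetimetz'.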

Examples

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {"a": [1, 2] * 3, "b": [True, False] * 3, "c": [1.0, 2.0] * 3}
... )
>>> df
   a      b    c
0  1   True  1.0
1  2  False  2.0
2  1   True  1.0
3  2  False  2.0
4  1   True  1.0
5  2  False  2.0

>>> df.select_dtypes(include="bool")
       b
0   True
1  False
2   True
3  False
4   True
5  False

>>> df.select_dtypes(include=["float64"])
     c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0

>>> df.select_dtypes(exclude=["int64"])
       b    c
0   True  1.0
1  False  2.0
2   True  1.0
3  False  2.0
4   True  1.0
5  False  2.0