What’s new in 2.3.1 (July 7, 2025)
These are the changes in pandas 2.3.1. See Release notes for a full changelog including other versions of pandas.
Improvements and fixes for the StringDtype
Most changes in this release are related to StringDtype, which will
become the default string dtype in pandas 3.0. See
Upcoming changes in pandas 3.0 for more details.
Comparisons between different string dtypes
In previous versions, comparing Series of different string dtypes (e.g. pd.StringDtype("pyarrow", na_value=pd.NA) against pd.StringDtype("python", na_value=np.nan)) could produce an inconsistent result dtype or incorrectly raise an error (GH 60639). pandas will now use the hierarchy
object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
to determine the result dtype when the operands being compared have different string dtypes. Some examples:
- When pd.StringDtype("pyarrow", na_value=pd.NA) is compared against any other string dtype, the result will always be boolean[pyarrow].
- When pd.StringDtype("python", na_value=pd.NA) is compared against pd.StringDtype("pyarrow", na_value=np.nan), the result will be boolean, the NumPy-backed nullable extension array.
- When pd.StringDtype("python", na_value=pd.NA) is compared against pd.StringDtype("python", na_value=np.nan), the result will be boolean, the NumPy-backed nullable extension array.
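One way to read the hierarchy is that the higher-ranked operand decides the result dtype. The sketch below models that reading in plain Python. It is an illustration only, not how pandas implements the rule: the names HIERARCHY, RESULT_DTYPE, and comparison_result_dtype are hypothetical, and the "bool" entries for object and the NaN variants are an assumption, since the text above only states the results for the two NA variants.

```python
# Illustrative model only: this is NOT how pandas implements the rule.
# Entries follow the hierarchy stated above, from lowest to highest rank.
HIERARCHY = [
    "object",
    ("python", "NaN"),
    ("pyarrow", "NaN"),
    ("python", "NA"),
    ("pyarrow", "NA"),
]

# Result dtypes for the two NA variants come from the examples above.
# The "bool" entries are an assumption; the release notes do not state them.
RESULT_DTYPE = {
    "object": "bool",
    ("python", "NaN"): "bool",
    ("pyarrow", "NaN"): "bool",
    ("python", "NA"): "boolean",
    ("pyarrow", "NA"): "boolean[pyarrow]",
}


def comparison_result_dtype(left, right):
    """Return the modelled result dtype: the higher-ranked operand decides."""
    winner = max(left, right, key=HIERARCHY.index)
    return RESULT_DTYPE[winner]


# The three cases listed above:
print(comparison_result_dtype(("pyarrow", "NA"), "object"))           # boolean[pyarrow]
print(comparison_result_dtype(("python", "NA"), ("pyarrow", "NaN")))  # boolean
print(comparison_result_dtype(("python", "NA"), ("python", "NaN")))   # boolean
```

The model is symmetric in its arguments, which matches the idea that the result dtype should not depend on which operand is on the left.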
Index set operations ignore empty RangeIndex and object dtype Index
When the future.infer_string option is enabled, Index set operations (like
union or intersection) will now ignore the dtype of an empty RangeIndex or
empty Index with object dtype when determining the dtype of the resulting
Index (GH 60797).
This ensures that combining such an empty Index with an Index of strings infers the string dtype
correctly, rather than defaulting to object dtype. For example:
>>> import pandas as pd
>>> pd.options.future.infer_string = True
>>> df = pd.DataFrame()
>>> df.columns.dtype
dtype('int64')               # default RangeIndex for empty columns
>>> df["a"] = [1, 2, 3]
>>> df.columns.dtype
<StringDtype(na_value=nan)>  # new columns use string dtype instead of object dtype
Bug fixes
- Bug in DataFrameGroupBy.min(), DataFrameGroupBy.max(), Resampler.min(), Resampler.max() where all NA values of string dtype would return float instead of string dtype (GH 60810)
- Bug in DataFrame.join() incorrectly downcasting object-dtype indexes (GH 61771)
- Bug in DataFrame.sum() with axis=1, DataFrameGroupBy.sum() or SeriesGroupBy.sum() with skipna=True, and Resampler.sum() where all NA values of StringDtype resulted in 0 instead of the empty string "" (GH 60229)
- Fixed bug in DataFrame.explode() and Series.explode() where the methods would fail with dtype="str" (GH 61623)
- Fixed bug in unpickling objects pickled in pandas versions pre-2.3.0 that used StringDtype (GH 61763)
Contributors
A total of 10 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- David Krych 
- Irv Lustig 
- Joris Van den Bossche 
- Lumberbot (aka Jack) 
- Marc Garcia 
- Matthew Roeschke 
- Pandas Development Team 
- Ralf Gommers 
- Richard Shadrach 
- jbrockmendel