What’s new in 3.1.0 (Month XX, 2026)#

These are the changes in pandas 3.1.0. See Release notes for a full changelog including other versions of pandas.

Enhancements#

enhancement1#

enhancement2#

Other enhancements#

  • DataFrameGroupBy.agg() now allows for the provided func to return a NumPy array (GH 63957)

  • Added ExtensionArray.count() (GH 64450)

  • Display formatting for float sequences in DataFrame cells now respects the display.precision option (GH 60503).

  • Improved the precision of float parsing in read_csv() (GH 64395)

  • Improved the string repr of pd.core.arrays.SparseArray (GH 64547)
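As a minimal sketch of the display.precision option mentioned above, the following shows how the option controls the rendering of ordinary float columns. The column name and values are made up for illustration. The float-sequence-in-a-cell case from the note is deliberately not asserted here, since its output depends on the installed pandas version.

```python
import pandas as pd

# Illustrative data; the column name and values are arbitrary.
df = pd.DataFrame({"value": [3.14159265, 2.71828183]})

# Temporarily reduce the number of decimal places used when rendering floats.
with pd.option_context("display.precision", 2):
    rendered = df.to_string()

print(rendered)
```

Using option_context keeps the change local to the with block, so the global display settings are left untouched.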

Notable bug fixes#

These are bug fixes that might have notable behavior changes.

notable_bug_fix1#

notable_bug_fix2#

Backwards incompatible API changes#

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package    Minimum Version    Required    Changed
                              X           X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package    Minimum Version    Changed
                              X

See Dependencies and Optional dependencies for more.

Other API changes#

  • APIs that accept an engine="numba" parameter with engine_kwargs will no longer pass through a nopython argument to numba.jit. This argument has had no effect since numba 0.59.0 (GH 64483).

Deprecations#

Performance improvements#

  • Performance improvement in casting integer and boolean dtypes to string[pyarrow] by using PyArrow’s native cast instead of element-wise conversion (GH 56505)

  • Performance improvement in DataFrame.__getitem__() when selecting a single column by label on a DataFrame with duplicate column names (GH 64126)

  • Performance improvement in Series.is_monotonic_increasing and Series.is_monotonic_decreasing for ArrowDtype and masked dtypes by dispatching to the ExtensionArray (GH 56619)

  • Performance improvement in GroupBy reductions and transformations for SparseDtype columns, which now use Cython instead of falling back to slow Python aggregation (GH 36123)

  • Performance improvement in bdate_range() and date_range() with freq="B" or freq="C" (business day frequencies) (GH 16463)

  • Performance improvement in infer_freq() (GH 64463)

  • Performance improvement in merge() and DataFrame.join() for many-to-many joins with sort=False (GH 56564)

  • Performance improvement in merge() with how="cross" (GH 38082)

  • Performance improvement in merge() with how="left" (GH 64370)

  • Performance improvement in read_csv() with engine="c" when reading from binary file-like objects (e.g. PyArrow S3 file handles) by avoiding unnecessary TextIOWrapper wrapping (GH 46823)

  • Performance improvement in read_html() and the Python CSV parser when thousands is set, fixing catastrophic regex backtracking on cells with many comma-separated digit groups followed by non-numeric text (GH 52619)

  • Performance improvement in read_sas() for compressed SAS7BDAT files by reusing the decompression buffer instead of allocating per row (GH 47339)

  • Performance improvement in util.hash_pandas_object() for PyArrow-backed string and binary types by using PyArrow’s dictionary_encode instead of converting to NumPy for factorization (GH 48964)

  • Performance improvement in DataFrame.insert() when the number of blocks is small (GH 57641)

  • Performance improvement in DataFrame.loc with a non-unique masked index (GH 56759)

  • Performance improvement in DataFrame.query() and DataFrame.eval() when the DataFrame contains PeriodDtype or IntervalDtype columns (GH 35247)

  • Performance improvement in DataFrame.to_stata() when writing object-dtype datetime columns with date formats that require year/month extraction (GH 64555)

  • Performance improvement in GroupBy.any() and GroupBy.all() for boolean-dtype columns (GH 37850)

  • Performance improvement in GroupBy.first() and GroupBy.last() for Extension Array dtypes, which no longer fall back to a slow apply-based implementation (GH 57591)

  • Performance improvement in GroupBy.quantile() (GH 64330)

  • Performance improvement in Index.get_indexer() for large monotonic indexes, which now uses binary search instead of building a hash table when the number of targets is small (GH 14273)

  • Performance improvement in NDFrame.__finalize__(), Series.to_numpy(), DataFrame.dtypes, and DataFrame.__getitem__() by reducing overhead from metadata propagation, memory sharing checks, and attribute setting (GH 57431)

  • Performance improvement in arrays.SparseArray.isna() by avoiding a dense-then-resparsify round-trip (GH 41023)

  • Performance improvement in datetime/timedelta unit conversion (e.g. datetime64[s] to datetime64[ns]) (GH 35025)

  • Performance improvement in indexing a DataFrame with a CategoricalIndex of Interval categories (GH 61928)

  • Performance improvement in indexing a MultiIndex with a list-like indexer (GH 55786)

  • Performance improvement in partial-string indexing on a monotonic decreasing DatetimeIndex or PeriodIndex (GH 64811)

  • Performance improvement in plotting DatetimeIndex with multiplied frequencies (e.g. "1000ms", "100s") (GH 50355)

  • Performance improvement in reading zip-compressed files (e.g. read_pickle(), read_csv()) on Python < 3.12 (GH 59279)

  • Performance improvement in repr of Series and DataFrame containing third-party array-like objects (e.g. xarray DataArray) in object dtype columns (GH 61809)

  • Performance improvement when setting values through DataFrame.loc and DataFrame.iloc with a 2D list-of-lists value, by avoiding a round-trip through an intermediate object array (GH 64229)
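For orientation, here is a small sketch of two of the operations named in the list above: a cross join via merge() with how="cross", and a lookup of a few targets in a large monotonic index via Index.get_indexer(). The frames and values are invented for illustration. The snippet only shows what the calls return and makes no claim about timings.

```python
import numpy as np
import pandas as pd

# Hypothetical example frames.
left = pd.DataFrame({"color": ["red", "green", "blue"]})
right = pd.DataFrame({"size": ["S", "L"]})

# A cross join pairs every row of `left` with every row of `right`.
cross = pd.merge(left, right, how="cross")
print(len(cross))  # 3 * 2 rows

# A monotonic increasing index with only a handful of lookup targets.
index = pd.Index(np.arange(0, 1_000, 2))
positions = index.get_indexer([4, 5, 998])

# Targets that are missing from the index are reported as -1.
print(positions)
```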

Bug fixes#

Categorical#

Datetimelike#

  • Bug in Timestamp constructor where passing np.str_ objects would fail in Cython string parsing (GH 48974)

  • Bug in Timestamp constructor, Timedelta constructor, to_datetime(), and to_timedelta() with non-round float input and unit failing to raise when the value is just outside the representable bounds (GH 57366)

  • Bug in date_range() where the inclusive parameter failed to filter endpoints when only start and periods or end and periods were specified (GH 46331)

  • Bug in date_range() where calendar-based offsets (e.g. MS, ME, QS, YS) could exclude the last offset boundary when end’s time-of-day was earlier than start’s (GH 35342)

  • Bug in to_datetime() and to_timedelta() on ARM platforms where round float values outside the int64 domain (e.g. float(2**63)) could silently produce incorrect results instead of raising (GH 64619)

  • Bug in to_datetime() and to_timedelta() where uint64 values greater than int64 max silently overflowed instead of raising OutOfBoundsDatetime or OutOfBoundsTimedelta (GH 60677)

  • Bug in DatetimeArray.isin() and TimedeltaArray.isin() where mismatched resolutions could silently truncate finer-resolution values, leading to false matches (GH 64545)

  • Bug in adding non-vectorized offsets (e.g. CustomBusinessDay, CustomBusinessMonthEnd) that have a sub-unit offset parameter to a non-nanosecond DatetimeIndex, which incorrectly truncated the result or raised AttributeError (GH 56586)
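To illustrate what the inclusive parameter of date_range() is meant to do, here is a minimal sketch using the start/end form. The dates are arbitrary. The start/periods and end/periods forms covered by the fix above are not shown, because their result depends on the installed pandas version.

```python
import pandas as pd

start, end = "2024-01-01", "2024-01-05"

# Default behaviour: both endpoints are kept.
both = pd.date_range(start, end, freq="D")

# Drop both endpoints.
neither = pd.date_range(start, end, freq="D", inclusive="neither")

# Keep the left endpoint only.
left_only = pd.date_range(start, end, freq="D", inclusive="left")

print(len(both), len(neither), len(left_only))
```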

Timedelta#

  • Bug in DateOffset where DateOffset(1) and DateOffset(days=1) returned different results near daylight saving time transitions (GH 61862)

  • Bug in to_timedelta() where passing np.str_ objects would fail in Cython string parsing (GH 48974)
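As a baseline for the DateOffset entry above, the sketch below compares DateOffset(1) and DateOffset(days=1) on a timezone-naive timestamp, where no daylight saving transition is involved and the two spellings agree. The timestamp is arbitrary. The behavior near DST transitions is not shown, as it depends on the pandas version.

```python
import pandas as pd

# A timezone-naive timestamp, so no DST transition can occur.
ts = pd.Timestamp("2024-01-15 09:30")

by_count = ts + pd.DateOffset(1)         # positional n=1
by_keyword = ts + pd.DateOffset(days=1)  # explicit days=1

print(by_count, by_keyword)
```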

Timezones#

  • Bug in DatetimeIndex addition with a DateOffset that has only timedelta components (e.g. DateOffset(hours=-2)) raising ValueError near DST transitions, while scalar Timestamp addition worked correctly (GH 28610)

Numeric#

Conversion#

Strings#

Interval#

Indexing#

Missing#

MultiIndex#

I/O#

  • Fixed bug in read_csv() with the c engine where an embedded \r followed by a space in an unquoted field could cause an infinite re-parsing loop, producing spurious rows or a buffer overflow (GH 51141)

  • Fixed bug in read_excel() where usage of skiprows could lead to an infinite loop (GH 64027)

  • Fixed bug in HDFStore.put() where string extension dtype columns raised errors when using compression (GH 64180)

  • Fixed read_json() with lines=True and chunksize to respect nrows when the requested row count is not a multiple of the chunk size (GH 64025)

  • Fixed bug in DataFrame.to_stata() raising KeyError when column names require renaming and convert_dates is specified for a different column (GH 60536)

  • Fixed read_json() with lines=True and nrows=0 to return an empty DataFrame (GH 64025)

  • Fixed bug in HDFStore.select() where passing where as a list of conditions referencing caller-scope variables failed on Python 3.12+ due to PEP 709 inlining list comprehension stack frames (GH 64881)
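The read_json() entries above concern line-delimited JSON. As a minimal sketch with made-up in-memory data, this shows the two parameters involved: nrows limits how many lines are read, and chunksize returns an iterator of DataFrames. The combination of nrows with chunksize, which is what the fix addresses, is left out because its result depends on the pandas version.

```python
import io

import pandas as pd

# Three records in JSON Lines format (illustrative data).
json_lines = (
    '{"a": 1, "b": "x"}\n'
    '{"a": 2, "b": "y"}\n'
    '{"a": 3, "b": "z"}\n'
)

# Read only the first two lines.
first_two = pd.read_json(io.StringIO(json_lines), lines=True, nrows=2)

# Read everything in chunks of at most two lines each.
with pd.read_json(io.StringIO(json_lines), lines=True, chunksize=2) as reader:
    chunk_sizes = [len(chunk) for chunk in reader]

print(first_two)
print(chunk_sizes)
```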

Period#

  • Bug in Period constructor where passing np.str_ objects would fail in Cython string parsing (GH 48974)

Plotting#

Groupby/resample/rolling#

Reshaping#

  • Bug in merge() where merging on a MultiIndex containing NaN values mapped NaN keys to the last level value instead of NaN (GH 64492)

  • pivot_table() with an empty values argument now computes the aggregation on a Series of all NA values (GH 46475)

Sparse#

  • Bug in indexing a SparseArray with an integer equal to the length of the array (one past the last valid position), which returned the fill value instead of raising IndexError (GH 64183)

ExtensionArray#

  • Fixed bug in Series.apply() and Series.map() where nullable integer dtypes were converted to float, causing precision loss for large integers; now the nullable dtype will be preserved (GH 63903).
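To show why a conversion to float matters for large integers, the sketch below casts a nullable Int64 Series to float64 explicitly and back. The value 2**53 + 1 is chosen because it is the smallest positive integer that float64 cannot represent exactly. The explicit astype call stands in for the implicit conversion described above. It does not reproduce the internals of Series.apply() or Series.map().

```python
import pandas as pd

# 2**53 + 1 cannot be represented exactly as a 64-bit float.
big = 2**53 + 1
nullable = pd.Series([big], dtype="Int64")

# An explicit cast to float64 rounds the value to the nearest representable float.
as_float = nullable.astype("float64")
round_tripped = int(as_float.iloc[0])

print(int(nullable.iloc[0]))  # original integer, stored exactly
print(round_tripped)          # value after the float round-trip
```

The nullable integer keeps the exact value, while the float round-trip is off by one.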

Styler#

Other#

Contributors#