What’s new in 3.1.0 (Month XX, 2026)#

These are the changes in pandas 3.1.0. See Release notes for a full changelog including other versions of pandas.

Enhancements#

enhancement1#

enhancement2#

Other enhancements#

Period now supports f-string formatting via __format__, e.g. f"{period:%Y-%m}" (GH 48536)
DataFrameGroupBy.agg() now allows for the provided func to return a NumPy array (GH 63957)
DataFrameGroupBy.transform() now accepts list-like and dict arguments similar to GroupBy.agg(), and supports NamedFunc (GH 58318)
Series.to_json() now supports serializing custom ExtensionArrays (by correctly using the _values_for_json method of an ExtensionArray) (GH 65047)
Timestamp.round(), Timestamp.floor(), and Timestamp.ceil() now officially accept Timedelta arguments (GH 63687)
Added NamedFunc, an alias to NamedAgg for a more semantically accurate name when used with non-aggregation functions; either can accept arbitrary functions (GH 65164)
ExtensionArray.map() now calls ExtensionArray._cast_pointwise_result() to retain the dtype backend, e.g. Arrow-backed arrays now preserve their Arrow dtype through map (GH 57189, GH 62164)
read_csv() now supports dtype="complex64" and dtype="complex128" with the C engine, enabling round-tripping of complex-number columns written by DataFrame.to_csv() (GH 9379)
Added ExtensionArray.count() (GH 64450)
Added ExtensionArray.sort() for in-place sorting of ExtensionArray (GH 64977)
Added Index.replace() method to support value replacement functionality similar to Series.replace() (GH 19495)
Added reduction methods as public API on pandas-implemented extension arrays where applicable (GH 63512)
Display formatting for float sequences in DataFrame cells now respects the display.precision option (GH 60503).
Improved the precision of float parsing in read_csv() (GH 64395)
Improved the string repr of pd.core.arrays.SparseArray (GH 64547)
Improved type inference of comparison and arithmetic operators on Series and DataFrame for static type checkers (e.g. ser == "a" is now inferred as Series instead of Any) (GH 40762)
MSVC is no longer required to build on Windows, and build errors when using the MinGW compiler have been fixed (GH 63160)
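
The Period formatting entry above can be sketched as follows. This is a minimal illustration, not a specification of the new behavior: the fallback branch assumes that releases without this enhancement reject a non-empty format spec with TypeError, and it uses the existing Period.strftime() to produce the same text.

```python
import pandas as pd

period = pd.Period("2026-03", freq="M")

try:
    # Per the entry above, 3.1.0 accepts strftime-style specs in f-strings.
    formatted = f"{period:%Y-%m}"
except TypeError:
    # Assumption: older releases raise TypeError for a non-empty format spec,
    # so fall back to the long-standing Period.strftime().
    formatted = period.strftime("%Y-%m")

print(formatted)  # 2026-03
```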

Notable bug fixes#

These are bug fixes that might have notable behavior changes.

notable_bug_fix1#

notable_bug_fix2#

Backwards incompatible API changes#

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed
		X	X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed
		X

See Dependencies and Optional dependencies for more.

Other API changes#

DataFrameGroupBy.sum() and DataFrameGroupBy.mean() on float dtypes with sorted grouping keys may differ from prior versions in the last few floating-point digits, due to a faster summation algorithm that does not use Kahan compensation (GH 65103)
DataFrame.to_hdf() no longer writes a pandas_version attribute into the HDF5 file. The value was hardcoded to "0.15.2" and was only used internally to detect files written by pandas versions older than 0.10.1 (GH 62792)
APIs that accept an engine="numba" parameter with engine_kwargs will no longer pass through a nopython argument to numba.jit. This argument has had no effect since numba 0.59.0 (GH 64483).
Removed the freq and freqstr attributes from DatetimeArray and TimedeltaArray. Frequency is now stored only on DatetimeIndex and TimedeltaIndex; access Series.dt.freq or wrap the array in an Index to retrieve a frequency. The check_freq keyword on testing.assert_extension_array_equal() for these array types has also been removed (GH 24566).
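
To illustrate why results can differ in the last few digits once Kahan compensation is dropped (GH 65103), the sketch below compares plain and compensated summation in pure Python. It is not the pandas groupby kernel; the function names and the sample data are hypothetical and chosen only to make the rounding effect visible.

```python
import math


def naive_sum(values):
    """Plain left-to-right summation; rounding error accumulates."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Kahan (compensated) summation; tracks the low-order bits lost per step."""
    total = 0.0
    compensation = 0.0
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; the rest was lost.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


values = [0.1] * 100_000
reference = math.fsum(values)  # correctly rounded sum, used as the baseline

naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)
print(naive_error, kahan_error)
```

Both results are valid floating-point sums; they differ only in how much rounding error they carry, which is why comparisons against values computed by earlier versions are better done with a tolerance than with exact equality.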

Deprecations#

Deprecated Timestamp.dayofweek, Timestamp.dayofyear, Timestamp.daysinmonth in favor of Timestamp.day_of_week, Timestamp.day_of_year, Timestamp.days_in_month, respectively. The same deprecation applies to the corresponding attributes on Period, DatetimeIndex, PeriodIndex, and Series.dt (GH 46768)
Deprecated PeriodIndex and PeriodArray inferring the frequency from a Series of datetime64 data when freq is not provided. Pass freq explicitly instead (GH 64241)
Deprecated infer_freq(), DatetimeIndex.inferred_freq, and TimedeltaIndex.inferred_freq returning a string; in a future version these will return a BaseOffset instead. Use pd.set_option('future.infer_freq_returns_offset', True) to opt in to the future behavior (GH 55504)
Deprecated set_eng_float_format(). Use pd.set_option("display.precision", N) to control decimal precision, or pass a custom callable to pd.set_option("display.float_format", func) (GH 64460)
Deprecated DataFrameGroupBy.agg() and Resampler.agg() unpacking a scalar when the provided func returns a Series or array of length 1; in a future version the Series or array itself will be kept in the result. Users should unpack the scalar in func itself (GH 64014)
Deprecated ExcelFile.parse(), use read_excel() instead (GH 58247)
Deprecated engine="fastparquet" and engine="auto" in read_parquet() and DataFrame.to_parquet(). The fastparquet library has been retired; use engine="pyarrow" or do not pass engine to use the default (GH 64597)
Deprecated arithmetic operations between pandas objects (DataFrame, Series, Index, and pandas-implemented ExtensionArray subclasses) and list-likes other than list, np.ndarray, ExtensionArray, Index, Series, DataFrame. For other list-likes such as tuple or range, explicitly cast to a supported object instead. In a future version, these will be treated as scalar-like for pointwise operation (GH 62423)
Deprecated automatic dtype promotion when reindexing with a fill_value that cannot be held by the original dtype. Explicitly cast to a common dtype instead (GH 53910)
Deprecated grouping by an index level name when the name matches multiple levels of a MultiIndex. Use the level number instead (GH 49434)
Deprecated implicit conversion of datetime.date objects to Timestamp when indexing or joining a DatetimeIndex. Use to_datetime() to explicitly convert to DatetimeIndex instead (GH 62158)
Deprecated passing a dict to DataFrame.from_records(), use the DataFrame constructor or DataFrame.from_dict() instead (GH 22025)
Deprecated passing a non-dict (e.g. a list of dicts) to DataFrame.from_dict(). Use the DataFrame constructor instead (GH 58862)
Deprecated passing unnecessary *args and **kwargs to GroupBy.cumsum(), GroupBy.cumprod(), GroupBy.cummin(), GroupBy.cummax(), SeriesGroupBy.skew(), DataFrameGroupBy.skew(), SeriesGroupBy.take(), and DataFrameGroupBy.take(). The skipna parameter for the cum* methods is now an explicit keyword argument (GH 50407)
Deprecated setting values with DataFrame.at() and Series.at() when the key does not exist in the index, which previously expanded the object. Use .loc instead (GH 48323)
Deprecated the %n directive in Period.strftime() for nanoseconds; use %N instead. %n is a newline directive in C strftime (and Python’s time.strftime / datetime.strftime) (GH 65432)
Deprecated the .name property of offset objects (e.g., Day, Hour). Use .rule_code instead (GH 64207)
Deprecated the dropna keyword in DataFrame.to_hdf(), HDFStore.put(), HDFStore.append(), and HDFStore.append_to_multiple(), and the io.hdf.dropna_table option. Use DataFrame.dropna() before writing instead (GH 32038)
Deprecated the float_precision argument in read_csv(), read_table(), and read_fwf(). All float precision modes now use the same converter (GH 64395)
Deprecated the include and exclude arguments of Series.describe(). They had no effect on a Series; filter dtypes upstream of the call instead (GH 54193)
Deprecated the weekday property on DatetimeIndex, DatetimeArray, PeriodIndex, PeriodArray, and Period. Use day_of_week instead. Timestamp.weekday() remains a method consistent with datetime.datetime.weekday() (GH 12816)
Deprecated the xlrd and pyxlsb engines in read_excel(). Use engine="calamine" instead (GH 56542)
Deprecated the default value of exact in assert_index_equal(); in a future version this will default to True instead of “equiv” (GH 57436)
Deprecated the default value of track_times in HDFStore.put(). In a future version, the default will change from True to False so that HDF5 files are deterministic by default (GH 51456)
Deprecated the keyword by_blocks in testing.assert_frame_equal() (GH 65911)
The default date_format="epoch" deprecation warning in DataFrame.to_json() and Series.to_json() is now also emitted when the datetime-like values are in the index or column labels rather than in the column values (GH 65868)
Deprecated passing a non-boolean value for numeric_only to DataFrame.mean(), DataFrame.min(), DataFrame.max(), DataFrame.median(), DataFrame.skew(), DataFrame.kurt(), DataFrame.std(), DataFrame.var(), DataFrame.sem(), DataFrame.sum(), DataFrame.prod() (and their Series equivalents); this will raise in a future version of pandas (GH 53098)
Deprecated the inference of datetime64 dtype from data containing datetime.date objects when used in comparisons or Index.equals() with DatetimeIndex. Use pandas.to_datetime() to explicitly convert to datetime64 instead (GH 65056)
Deprecated inplace keyword for DataFrame.rename() and DataFrame.drop(). This keyword will be removed in a future version (GH 63207); see also PDEP-8
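
A few of the replacements recommended above can be sketched together. This is a minimal migration sketch using only long-standing pandas APIs; the sample data and variable names are hypothetical, and it does not cover every deprecation in the list.

```python
import datetime as dt

import pandas as pd

# Underscore-separated names replace dayofweek / dayofyear / daysinmonth.
ts = pd.Timestamp("2026-03-02")
weekday_number = ts.day_of_week
day_number = ts.day_of_year
month_length = ts.days_in_month

# Build from a list of dicts with the constructor rather than DataFrame.from_dict().
df = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# Reassign the result instead of passing inplace=True.
df = df.rename(columns={"a": "alpha"})
df = df.drop(columns=["b"])

# Enlarge with .loc rather than relying on .at to add a missing key.
ser = pd.Series([1, 2], index=["x", "y"])
ser.loc["z"] = 3

# Convert a datetime.date explicitly before looking it up in a DatetimeIndex.
idx = pd.date_range("2026-01-01", periods=3, freq="D")
position = idx.get_loc(pd.to_datetime(dt.date(2026, 1, 2)))
```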

Performance improvements#

Performance improvement in DataFrameGroupBy aggregations (sum, mean, min, max, prod) when the grouping keys are already sorted (GH 65103)
Performance improvement in casting integer and boolean dtypes to string[pyarrow] by using PyArrow’s native cast instead of element-wise conversion (GH 56505)
Performance improvement in DataFrame.__getitem__() when selecting a single column by label on a DataFrame with duplicate column names (GH 64126)
Performance improvement in Series.is_monotonic_increasing and Series.is_monotonic_decreasing for ArrowDtype and masked dtypes by dispatching to the ExtensionArray (GH 56619)
Performance improvement in DataFrame repr by avoiding redundant formatting when columns exceed terminal width (GH 64863)
Performance improvement in GroupBy reductions and transformations for SparseDtype columns (GH 36123)
Performance improvement in bdate_range() and date_range() with freq="B" or freq="C" (business day frequencies) (GH 16463)
Performance improvement in concat() and DataFrame.astype() to extension dtypes (GH 65672)
Performance improvement in infer_freq() (GH 64463)
Performance improvement in merge() and DataFrame.join() for many-to-many joins with sort=False (GH 56564)
Performance improvement in merge() for many-to-one joins with unique right keys (GH 38418)
Performance improvement in merge() with how="cross" (GH 38082)
Performance improvement in merge() with how="left" (GH 64370)
Performance improvement in merge() with how="left" and sort=False when joining on a right index with unique keys (GH 65160)
Performance improvement in merge() with sort=False for single-key how="left"/how="right" joins when the opposite join key is sorted, unique, and range-like (GH 64146)
Performance improvement in read_csv() with engine="c" when parsing integer columns (GH 65347)
Performance improvement in read_csv() with engine="c" when reading from binary file-like objects (e.g. PyArrow S3 file handles) by avoiding unnecessary TextIOWrapper wrapping (GH 46823)
Performance improvement in read_hdf() for the fixed (default) format, especially on large frames (GH 47726)
Performance improvement in read_html() and the Python CSV parser when thousands is set, fixing catastrophic regex backtracking on cells with many comma-separated digit groups followed by non-numeric text (GH 52619)
Performance improvement in read_sas() by reading page header fields directly in Cython instead of falling back to Python (GH 47339)
Performance improvement in read_sas() for SAS7BDAT files by pre-computing date/datetime column classification once during metadata parsing instead of per chunk (GH 47339)
Performance improvement in read_sas() for SAS7BDAT files with full-precision (8-byte) numeric columns, with up to ~2x speedup on bulk reads (GH 47339)
Performance improvement in read_sas() for compressed SAS7BDAT files by reusing the decompression buffer instead of allocating per row (GH 47339)
Performance improvement in read_sas() when decoding strings (GH 47339)
Performance improvement in read_sql() with ADBC connections by requesting only table metadata when checking whether an input string names a table (GH 65652)
Performance improvement in to_datetime() with the default cache=True for inputs that are already datetime-typed or use a unit (GH 65380)
Performance improvement in tseries.frequencies.to_offset() parsing of frequency strings, especially for tick-resolution offsets (e.g. "h", "5min", "3s") and compound expressions (e.g. "1D1h") (GH 65395)
Performance improvement in util.hash_pandas_object() for PyArrow-backed string and binary types by using PyArrow’s dictionary_encode instead of converting to NumPy for factorization (GH 48964)
Performance improvement in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with user-defined functions (GH 46505)
Performance improvement in DataFrame.apply() with axis=1 when the DataFrame has ExtensionDtype columns (e.g. ArrowDtype) (GH 61747)
Performance improvement in DataFrame.corr() and DataFrame.cov() when data contains no NaN values (GH 64857)
Performance improvement in DataFrame.equals(), DataFrame.select_dtypes(), and other operations performing shallow column slicing on Arrow-backed columns (GH 58966)
Performance improvement in DataFrame.fillna() and Series.fillna() with scalar fill value for float, object, nullable, and datetime-like dtypes (GH 42147)
Performance improvement in DataFrame.from_records() when passing a 2D numpy.ndarray (GH 22025)
Performance improvement in DataFrame.insert() when the number of blocks is small (GH 57641)
Performance improvement in DataFrame.loc() with non-unique masked index (GH 56759)
Performance improvement in DataFrame.query() and DataFrame.eval() when the DataFrame contains PeriodDtype or IntervalDtype columns (GH 35247)
Performance improvement in DataFrame.rank() and Series.rank() for non-nullable numeric dtypes (GH 65054)
Performance improvement in DataFrame.sort_index() and Series.sort_index() with the level parameter when the index is already sorted and not a MultiIndex (GH 64883)
Performance improvement in DataFrame.sort_values() with multiple numeric columns by avoiding unnecessary Categorical conversion (GH 15389)
Performance improvement in DataFrame.sum(), DataFrame.prod(), DataFrame.min(), DataFrame.max(), DataFrame.mean(), DataFrame.any(), and DataFrame.all() with axis=1 for multi-block DataFrames by avoiding a transpose (GH 51474)
Performance improvement in DataFrame.take(), Series.take(), DataFrame.reindex(), Series.reindex(), and boolean-array indexing for NumPy-backed dtypes (GH 65295)
Performance improvement in DataFrame.to_excel() with the openpyxl engine when using engine_kwargs={"write_only": True}, reducing memory consumption (GH 41681)
Performance improvement in DataFrame.to_hdf() and HDFStore.append() for table format when appending to an existing wide table with many data_columns (GH 25839)
Performance improvement in DataFrame.to_stata() when writing object-dtype datetime columns with date formats that require year/month extraction (GH 64555)
Performance improvement in DataFrame.unstack() and Series.unstack() when the MultiIndex is already sorted and the unstacked level is the last level (GH 65107)
Performance improvement in DataFrame.xs() and Series.xs() with a partial key on a MultiIndex (GH 38650)
Performance improvement in DatetimeIndex.month_name() and DatetimeIndex.day_name() when using the default string dtype by using PyArrow compute instead of going through an intermediate object array (GH 65104)
Performance improvement in DatetimeIndex.strftime() and Series.dt.strftime() for formats composed of common directives (%Y, %m, %d, %H, %M, %S, %f) (GH 44764)
Performance improvement in GroupBy.any() and GroupBy.all() for boolean-dtype columns (GH 37850)
Performance improvement in GroupBy.first() and GroupBy.last() for Extension Array dtypes, which no longer fall back to a slow apply-based implementation (GH 57591)
Performance improvement in GroupBy.quantile() (GH 64330)
Performance improvement in GroupBy.size() (GH 51750)
Performance improvement in HDFStore.select_as_multiple() when no where clause is given, by avoiding a coordinate-based read (GH 26771)
Performance improvement in Index.get_indexer() for large monotonic indexes, which now uses binary search instead of building a hash table when the number of targets is small (GH 14273)
Performance improvement in Index.join() and Index.union() for RangeIndex by avoiding unnecessary memory allocation in the libjoin fastpath (GH 54646)
Performance improvement in IntervalIndex.get_indexer() for monotonic non-overlapping indexes, which now uses binary search instead of the interval tree (GH 47614)
Performance improvement in NDFrame.__finalize__(), Series.to_numpy(), DataFrame.dtypes, and DataFrame.__getitem__() (GH 57431)
Performance improvement in Timedelta.total_seconds() (GH 65388)
Performance improvement in arrays.SparseArray.isna() by avoiding a dense-then-resparsify round-trip (GH 41023)
Performance improvement in datetime/timedelta unit conversion (e.g. datetime64[s] to datetime64[ns]) (GH 35025)
Performance improvement in indexing a DataFrame with a CategoricalIndex of Interval categories (GH 61928)
Performance improvement in indexing a MultiIndex with a list-like indexer (GH 55786)
Performance improvement in partial-string indexing on a monotonic decreasing DatetimeIndex or PeriodIndex (GH 64811)
Performance improvement in plotting DatetimeIndex with multiplied frequencies (e.g. "1000ms", "100s") (GH 50355)
Performance improvement in reading zip-compressed files (e.g. read_pickle(), read_csv()) on Python < 3.12 (GH 59279)
Performance improvement in reductions along axis=1 and other operations on DataFrames produced by DataFrame.copy() (GH 60469)
Performance improvement in repr of Series and DataFrame containing third-party array-like objects (e.g. xarray DataArray) in object dtype columns (GH 61809)
Performance improvement in DataFrame.loc() and DataFrame.iloc() setitem with a 2D list-of-lists value by avoiding a wasteful round-trip through an intermediate object array (GH 64229).
Performance improvement in Series.reindex() and DataFrame.reindex() for non-nanosecond datetime64 and timedelta64 dtypes (GH 24566)
Performance improvement in Series.iloc() and DataFrame.iloc() when setting datetimelike values into object-dtype data with list-like indexers (GH 64250).
Performance improvement in the Series.dt duration component accessors (days, seconds, microseconds, nanoseconds, components, etc.) for ArrowDtype durations by using PyArrow compute instead of converting to TimedeltaArray (GH 63470)
Performance improvement in tab completion and DataFrame.__dir__() for DataFrame and Series with a large string-valued index or large number of columns (GH 18587).
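
The Index.get_indexer() entry above (GH 14273) trades a hash-table build for binary search when the index is monotonic and the number of targets is small. The idea can be sketched in pure Python as below. This is not the pandas implementation; both helper functions are hypothetical and assume a sorted list of unique values.

```python
from bisect import bisect_left


def get_indexer_sorted(index_values, targets):
    """Locate each target by binary search; -1 marks a missing target.

    Costs O(len(targets) * log(len(index_values))) with no setup work.
    """
    positions = []
    for target in targets:
        pos = bisect_left(index_values, target)
        if pos < len(index_values) and index_values[pos] == target:
            positions.append(pos)
        else:
            positions.append(-1)
    return positions


def get_indexer_hashed(index_values, targets):
    """Locate each target through a dict; building it costs O(len(index_values))."""
    lookup = {value: pos for pos, value in enumerate(index_values)}
    return [lookup.get(target, -1) for target in targets]


index_values = list(range(0, 1_000_000, 2))  # sorted, unique even numbers
targets = [10, 11, 999_998]

print(get_indexer_sorted(index_values, targets))  # [5, -1, 499999]
```

Both helpers return the same positions; the sorted variant simply avoids touching every index value when only a handful of targets are requested.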

Bug fixes#

Fixed bug in Index repr where attributes were not wrapped to respect display.width (GH 11552)
Fixed bug in to_timedelta() and Timedelta not accepting Day offsets (GH 64240)

Categorical#

Bug in Categorical.__repr__() where the values and categories lines could exceed display.width (GH 12066)
Bug in Categorical.map() where mapping with a defaultdict and na_action=None would bypass the default factory by using dict.get, causing NA values to be replaced with NaN instead of the mapper’s default value (GH 62710)
Bug in CategoricalIndex.union() and CategoricalIndex.intersection() giving incorrect results when the two indexes have the same unordered categories in different orders (GH 55335)
Bug in Index.fillna() raising TypeError when filling with a tuple value (e.g. on object-dtype or CategoricalIndex with tuple categories) (GH 37681)

Datetimelike#

Bug in ArrowExtensionArray where adding a DateOffset to a date32[pyarrow] or date64[pyarrow] Series raised an ArrowTypeError (GH 57168)
Bug in DatetimeIndex constructor raising ValueError when passing equivalent but not equal frequencies (e.g. QS-FEB vs QS-MAY) (GH 61086)
Bug in DatetimeIndex raising AttributeError when comparing against Arrow date types (date32, date64) (GH 62051)
Bug in Timestamp constructor where passing np.str_ objects would fail in Cython string parsing (GH 48974)
Bug in Timestamp constructor where strings with a negative year of fewer than 4 digits (e.g. "-111-01-01") silently dropped the leading "-" and were parsed as a positive year; BC dates with 1-4 digit years now parse correctly, matching numpy.datetime64 (GH 55954)
Bug in Timestamp constructor, Timedelta constructor, to_datetime(), and to_timedelta() with non-round float input and unit failing to raise when the value is just outside the representable bounds (GH 57366)
Bug in api.types.infer_dtype() returning "date" or "mixed" instead of "datetime" / "timedelta" for lists of Timestamp/Timedelta values mixed with pd.NA (GH 53023)
Bug in date_range() where inclusive="left" and inclusive="right" returned a single-element result instead of empty when start equals end (GH 55293)
Bug in date_range() where inclusive parameter failed to filter endpoints when only start and periods or end and periods were specified (GH 46331)
Bug in date_range() where periods=1 with offsets that disallow n=0 (e.g. offsets.LastWeekOfMonth, offsets.FY5253) raised ValueError (GH 41563)
Bug in date_range() where calendar-based offsets (e.g. MS, ME, QS, YS) could exclude the last offset boundary when end’s time-of-day was earlier than start’s (GH 35342)
Bug in to_datetime() and to_timedelta() on ARM platforms where round float values outside the int64 domain (e.g. float(2**63)) could silently produce incorrect results instead of raising (GH 64619)
Bug in to_datetime() and to_timedelta() where uint64 values greater than int64 max silently overflowed instead of raising OutOfBoundsDatetime or OutOfBoundsTimedelta (GH 60677)
Bug in to_datetime() where a higher-resolution origin was silently truncated when combined with a lower-resolution unit (e.g. unit="D" with a microsecond-precision origin); the resolution of origin is now preserved (GH 63419)
Bug in DataFrame.replace() and Series.replace() raising AssertionError instead of OutOfBoundsDatetime when replacing with a datetime value outside the datetime64[ns] range (GH 61671)
Bug in DataFrame.to_string() and Series.to_string() where na_rep was ignored for datetime and timedelta columns, always displaying NaT (GH 55426)
Bug in DatetimeArray.isin() and TimedeltaArray.isin() where mismatched resolutions could silently truncate finer-resolution values, leading to false matches (GH 64545)
Bug in Series.dt.isocalendar() with a pyarrow-backed datetime or date dtype not preserving the original index, resetting it to a default RangeIndex (GH 65894)
Bug in adding non-nano DatetimeIndex with non-vectorized offsets (e.g. CustomBusinessDay, CustomBusinessMonthEnd) having a sub-unit offset parameter incorrectly truncating the result or raising AttributeError (GH 56586)
Bug in subtracting BusinessHour (or CustomBusinessHour) from a Timestamp giving incorrect results when the subtraction would land exactly on the business-hour opening time (GH 33682)

Timedelta#

Bug in TimedeltaIndex.resolution raising when the index has no frequency (GH 65186)
Bug in DateOffset where DateOffset(1) and DateOffset(days=1) returned different results near daylight saving time transitions (GH 61862)
Bug in Timedelta constructor where keyword arguments (e.g. days=365000) that exceeded nanosecond int64 bounds raised OutOfBoundsTimedelta instead of falling back to a coarser resolution (GH 46587)
Bug in to_timedelta() where passing np.str_ objects would fail in Cython string parsing (GH 48974)
Bug in Series.sum() on an overflowing timedelta64 series raising a plain ValueError instead of OutOfBoundsTimedelta (GH 43178)
Bug in Series.dt.seconds and Series.dt.microseconds with ArrowDtype durations returning the Series.dt.components field values (e.g. 0-59 for seconds, or 0 for microseconds on a "ms" unit) instead of the totals within each day and second respectively, inconsistent with NumPy-backed timedeltas (GH 63470)
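
The distinction behind the Series.dt.seconds and Series.dt.microseconds entry above (GH 63470) can be shown with the standard library's datetime.timedelta, whose seconds and microseconds attributes are totals within the day and within the second. The component-style values below are derived by hand for comparison; this is an illustration only and does not exercise the Arrow-backed code path.

```python
from datetime import timedelta

td = timedelta(days=1, hours=2, minutes=3, seconds=4, milliseconds=500)

# Totals: seconds elapsed within the day, microseconds elapsed within the second.
seconds_within_day = td.seconds                 # 2*3600 + 3*60 + 4
microseconds_within_second = td.microseconds    # 500 ms expressed in microseconds

# Component-style fields: each unit reported separately, as in a components table.
seconds_component = td.seconds % 60
milliseconds_component = td.microseconds // 1000
microseconds_component = td.microseconds % 1000

print(seconds_within_day, seconds_component)                  # 7384 4
print(microseconds_within_second, microseconds_component)     # 500000 0
```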

Timezones#

Bug in DatetimeIndex addition with a DateOffset that has only timedelta components (e.g. DateOffset(hours=-2)) raising ValueError near DST transitions, while scalar Timestamp addition worked correctly (GH 28610)
Bug in Timestamp.to_julian_date() and DatetimeIndex.to_julian_date() returning the Julian date of the local wall clock for timezone-aware inputs instead of the underlying UTC instant (GH 54763)
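
The difference between the two Julian dates described in the to_julian_date() entry above (GH 54763) can be sketched with the standard library. The helpers below are hypothetical and are not the pandas implementation; they rely only on the fact that the Unix epoch, 1970-01-01T00:00:00Z, corresponds to Julian date 2440587.5.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH_JD = 2440587.5  # Julian date of 1970-01-01T00:00:00Z
SECONDS_PER_DAY = 86400


def julian_date_of_instant(aware_dt):
    """Julian date of the UTC instant that an aware datetime represents."""
    return aware_dt.timestamp() / SECONDS_PER_DAY + UNIX_EPOCH_JD


def julian_date_of_wall_clock(aware_dt):
    """Julian date obtained by reading the local wall-clock fields as if they were UTC."""
    as_if_utc = aware_dt.replace(tzinfo=timezone.utc)
    return as_if_utc.timestamp() / SECONDS_PER_DAY + UNIX_EPOCH_JD


utc_minus_5 = timezone(timedelta(hours=-5))
local_morning = datetime(2000, 1, 1, 7, 0, tzinfo=utc_minus_5)  # 12:00 UTC

jd_instant = julian_date_of_instant(local_morning)
jd_wall = julian_date_of_wall_clock(local_morning)

print(jd_instant)            # 2451545.0
print(jd_instant - jd_wall)  # about 5/24 of a day
```

The gap between the two values equals the UTC offset expressed in days, so the wall-clock reading is only correct for UTC itself.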

Numeric#

Fixed bug in read_excel() where a column containing a mixture of numeric and boolean values had its values cast to whichever data type appeared first, since 1 == True and 0 == False (GH 60088)
Fixed bug in Series.clip() where passing a scalar numpy array (e.g. np.array(0)) would raise a TypeError (GH 59053)
Fixed bug in Series.mean() and Series.sum() (and their DataFrame counterparts) overflowing for float16 dtypes instead of upcasting to float64 (GH 43929)
Fixed bug in Series.skew() and Series.kurt() (and their DataFrame counterparts) returning 0.0 for degenerate distributions; these now return NaN (GH 62864)
Fixed bug in complex-dtype Series.duplicated() and Series.unique() (and related hashtable-backed methods) raising TypeError when backed by a NumpyExtensionArray (GH 54761)
Fixed bug where DataFrame arithmetic operations with Series did not support the fill_value parameter (GH 61581)
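
The equality mentioned in the read_excel() entry above (1 == True and 0 == False) is plain Python behavior, and it makes any equality-based lookup keep whichever value appears first. The sketch below shows that effect with a dict. It does not reproduce the read_excel() code path, and the assumption that a similar lookup was involved is ours, not stated in the entry.

```python
mixed_column = [1, True, 0, False, 2.0]

# bool is a subclass of int, so these compare and hash as equal.
same_value = (1 == True) and (0 == False)
same_hash = hash(1) == hash(True)

# Deduplicating by equality keeps the first appearance and drops the booleans.
distinct_by_equality = list(dict.fromkeys(mixed_column))

# Keying on (type, value) keeps integers and booleans apart.
distinct_by_type = [
    value for _, value in dict.fromkeys((type(v), v) for v in mixed_column)
]

print(distinct_by_equality)  # [1, 0, 2.0]
print(distinct_by_type)      # [1, True, 0, False, 2.0]
```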

Conversion#

Bug in DataFrame constructor where NaT in a TimedeltaIndex row was incorrectly inferred as datetime64 instead of timedelta64 (GH 23985)
Bug in DataFrame constructor where constructing from a list of uniform-dtype arrays (e.g. pyarrow, CategoricalDtype, nullable dtypes) lost the dtype (GH 49593)
Bug in pd.array() silently converting NaN to a nonsensical integer when given float data containing NaN and a NumPy integer dtype (GH 41724)
Fixed pandas.array() to preserve mask information when converting NumPy masked arrays, converting masked values to missing values (GH 63879)
Fixed bug in DataFrame constructor where mutating the result could corrupt the source Series or Index when built with dtype="str" and infer_string=False (GH 63936)
Fixed bug in DataFrame.from_records() where exclude was ignored when data was an iterator and nrows=0 (GH 63774)

Strings#

Bug in DataFrame.replace() with regex=True mutating the underlying StringArray when the replacement value was not a string (GH 57733)
Bug in Series.memory_usage() with deep=True raising TypeError on PyPy for str dtype with Python storage (GH 46176)
Bug in Series.str.rsplit() and Index.str.rsplit() silently accepting a compiled regex and returning incorrect results (GH 29633)

Interval#

Bug in IntervalArray and IntervalIndex constructors unnecessarily upcasting sub-64-bit numeric dtypes (e.g. float32, int32) to 64-bit (GH 45412)
Bug in cut() and other operations building an IntervalIndex engine raising TypeError on 32-bit platforms when there were more than 100 intervals (GH 44075, GH 23440)

Indexing#

Bug in DataFrame.loc() and Series.loc() replacing the index name with the key’s name when indexing with an Index (GH 17110)
Bug in DataFrame.loc() raising ValueError when setting a row on a DataFrame with no columns and the label is not in the index (GH 17895)
Bug in DataFrame.loc() returning incorrect dtype when the column key is a slice (GH 63071)
Bug in Index.get_indexer() where method="pad", "backfill", or "nearest" returned incorrect results instead of -1 when the target contained NaT or NaN (GH 32572)
Bugs in setitem-with-expansion when adding new rows failing to keep the original dtype in some cases (GH 32346, GH 15231, GH 47503, GH 6485, GH 25383, GH 52235, GH 17026, GH 56010)
Bug in DataFrame.__getitem__() raising InvalidIndexError when indexing with a tuple containing a slice on a DataFrame with MultiIndex columns (e.g., df[:, "t1"]) (GH 26511)
Bug in DataFrame.at() raising TypeError when accessing a MultiIndex with a partial date string on a DatetimeIndex level (GH 43395)
Bug in DataFrame.duplicated() returning an empty Series without the DataFrame’s index when the DataFrame had no columns (GH 61191)
Bug in DataFrame.iloc() setitem raising AttributeError when assigning a Series or Index with a nullable EA dtype (e.g. Int64, Float64, boolean) into a column with a NumPy dtype (GH 47776)
Bug in DataFrame.loc() raising ValueError when assigning a list of tuples to an object-dtype column with a boolean mask on a mixed-dtype DataFrame (GH 37629)
Bug in DataFrame.loc() raising ValueError when setting a row with a list-like value on a single-column DataFrame with ExtensionArray dtype (GH 44103)
Bug in DataFrame.loc() setitem-with-expansion writing the truncated string "n" (or raising TypeError) into rows outside the indexer when adding a new column from a list of strings or booleans (GH 42099)
Bug in DataFrame.loc() with a MultiIndex returning wrong results instead of raising KeyError when passing string keys for numeric index levels (GH 60104)
Bug in DataFrame.mask() with inplace=True where incorrect values were produced when other was a Series with ExtensionArray values (GH 64635)
Bug in DataFrame.rename() and Series.rename() not preserving nullable extension dtype (e.g. Int64, Float64) when relabeling index or column labels (GH 65315)
Bug in DataFrame.where() and DataFrame.mask() raising TypeError when cond is a Series and axis=1 (GH 58190)
Bug in DataFrame.xs() where drop_level=False was ignored for fully specified MultiIndex keys when level was not explicitly provided (GH 6507)
Bug in Index.get_indexer_non_unique() raising ZeroDivisionError instead of returning an all-missing result when called on an empty index with a non-empty target (GH 54746)
Bug in Index.get_level_values() mishandling boolean, NA-like (np.nan, pd.NA, pd.NaT) and integer index names (GH 62169)
Bug in Index.get_loc() raising KeyError when looking up a tuple in an object-dtype Index with duplicates (GH 37800)
Bug in Index.insert() silently casting booleans to numeric when used with nullable numeric dtypes like Float64 or Int64 (GH 61709)
Bug in Index.take() where fill_value was silently ignored and integer-dtype indexes raised ValueError instead of filling with the provided value. Passing fill_value=None now fills -1 entries with the Index’s NA value (matching the ExtensionArray convention); omit fill_value to retain the previous behavior where negative indices wrap (GH 65210)
Bug in Index.where() and Index.putmask() preserving numpy.datetime64 / numpy.timedelta64 NaT scalars in the object-dtype result for mismatched-dtype inputs, instead of normalizing to pandas.NaT as Series.where() does (GH 55174)
Bug in MultiIndex.get_loc() returning a slice instead of an integer for a unique key when the MultiIndex contained duplicates elsewhere, causing .loc to return a Series instead of a scalar (GH 42102)
Bug in RangeIndex.memory_usage() and RangeIndex.nbytes raising TypeError on PyPy (GH 46176)
Bug in Series.where() and Series.mask() raising ValueError when other is a tuple on object-dtype Series (GH 37681)
Fixed bug in DataFrame.loc() where assigning an iterable to a single cell in an object dtype column incorrectly raised a ValueError (GH 26333, GH 57962)
Fixed bug in DataFrame.loc() where assigning with duplicate column names and new columns corrupted unrelated columns (GH 58317)
Fixed segfault in DataFrame.loc() when repeatedly adding new rows to an object-dtype-indexed DataFrame (GH 21968)

Missing#

Bug in DataFrame.fillna() with a dict value raising RecursionError when columns are a MultiIndex with duplicate entries (GH 53498)
Bug in Series.combine_first() crashing when Series names are Timestamp objects (GH 65333)

MultiIndex#

Bug where pickling a DataFrame whose MultiIndex has a datetime64[ns] level raised NotImplementedError (GH 63078)
Bug in DataFrame.loc() with a MultiIndex where using a tuple indexer with a scalar and a list (e.g., (scalar, list)) did not drop the scalar-indexed level (GH 18631)
Bug in MultiIndex.get_loc() where looking up a tuple key containing a scalar inside an IntervalIndex level with overlapping intervals raised KeyError or returned incorrect results (GH 27456)
Bug in MultiIndex.sortlevel() not raising TypeError when sorting a level with incomparable types (e.g., Timestamp and str) (GH 21136)
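
The (scalar, list) indexing entry above can be sketched as follows. The frame, level names, and values are hypothetical. The number of index levels in `result` depends on whether the installed version includes the fix, so the sketch prints it rather than assuming it. The xs()-based spelling drops the scalar-indexed level regardless of version.

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["outer", "inner"]
)
df = pd.DataFrame({"x": [1, 2, 3]}, index=mi)

# Tuple indexer: a scalar for the first level, a list for the second.
result = df.loc[("a", [1, 2]), :]
print(result.index.nlevels)  # version-dependent: see the entry above
print(result["x"].tolist())  # [1, 2]

# Version-independent spelling: select the scalar level with xs(), which
# drops that level, then pick the remaining labels.
dropped = df.xs("a", level="outer").loc[[1, 2]]
print(dropped.index.tolist())  # [1, 2]
```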

I/O#

read_csv() with memory_map=True and an in-memory buffer (e.g. BytesIO) now raises a clear ValueError instead of a cryptic UnsupportedOperation: fileno (GH 45630)
read_sql_table() now raises an informative NotImplementedError (instead of one with no message) when passed a DBAPI connection such as sqlite3, and reading from a URI string without a usable sqlalchemy install now raises a clearer ImportError (GH 41237)
Fixed bug in read_csv() with the C engine where an embedded \r followed by a space in an unquoted field could cause an infinite re-parsing loop, producing spurious rows or a buffer overflow (GH 51141)
Fixed bug in read_excel() where usage of skiprows could lead to an infinite loop (GH 64027)
Fixed bug where read_html() parsed nested tables incorrectly when using html5lib or bs4 flavors (GH 64524)
Fixed memory leak in read_csv() (GH 19941)
Fixed segfault when instantiating the internal pandas._libs.parsers.TextReader with no arguments; it now raises TypeError (GH 53131)
Fixed read_json() with lines=True and chunksize to respect nrows when the requested row count is not a multiple of the chunk size (GH 64025)
HDFStore.put() and HDFStore.append() now support storing Series and DataFrame columns with PeriodDtype in both "fixed" and "table" formats (GH 41978)
Bug in DataFrame.__repr__() raising TypeError for a column with a NumPy structured dtype (e.g. produced by DataFrame.from_records() from a structured ndarray) (GH 55011)
Bug in DataFrame.__repr__() where horizontally truncated output could exceed the terminal width by up to 4 characters (GH 32461)
Bug in DataFrame.to_stata() raising KeyError when column names require renaming and convert_dates is specified for a different column (GH 60536)
Bug in DataFrame.to_string() where formatters dict was applied to wrong columns when output was horizontally truncated via max_cols (GH 35410)
Fixed read_json() with lines=True and nrows=0 to return an empty DataFrame (GH 64025)
DataFrame.to_hdf() now raises a clear NotImplementedError when writing a column or Index of an unsupported extension dtype (such as IntervalDtype, SparseDtype, or the nullable integer/float/boolean dtypes), instead of a low-level AttributeError or PyTables TypeError (GH 26144, GH 38305, GH 42070)
read_hdf() can again read fixed-format files written by very old pandas versions (<=0.15.x) that stored a freq attribute on non-datetimelike indexes, which previously failed with a TypeError or ValueError (GH 33186)
DataFrame.to_hdf() with format="fixed" now compresses object dtype (e.g. string) columns when complib/complevel are given; previously the compression settings were silently ignored for these columns, producing much larger files (GH 45286)
HDFStore.put(), HDFStore.append(), and DataFrame.to_hdf() now emit a UserWarning instead of silently doing nothing when writing an empty DataFrame or Series with format='table' or append=True (GH 13016)
HDFStore.select() and read_hdf() now warn when a nested where of the form "(A & B) | (C & D)" over indexed columns may return incorrect results because of an upstream PyTables bug, suggesting writing with index=False or running the OR branches as separate queries (GH 50598)
HDFStore.select() now raises a clear ValueError with a workaround, instead of an opaque too many inputs error, when a where expression has too many comparisons for a query against indexed columns (GH 39752)
HDFStore.select() now raises an informative NotImplementedError instead of a cryptic KeyError when a column selection such as where="columns=['A']" is combined with iterator=True or chunksize; pass the columns argument instead (GH 12953)
HDFStore.select() now raises an informative NotImplementedError when a where clause contains an arithmetic expression such as "(A % 3) == 0", instead of an opaque PyTables TypeError; arithmetic in where filters is not supported (GH 41100)
Bug in HDFStore.select() where a where query on a categorical data column for a value that is not one of the categories incorrectly matched rows with missing (NaN) values (GH 22977)
Fixed MemoryError in HDFStore.select() when iterating large tables with chunksize and no where filter (GH 15937)
Fixed bug in read_hdf() raising on files written by older pandas versions whose freq index attribute could not be decoded; the freq is now dropped with a warning instead of corrupting the index (GH 35917)
Fixed bug in read_hdf() where a categorical column containing a category equal to the nan_rep string (e.g. the default "nan") raised ValueError: operands could not be broadcast together instead of reading that category back as NaN (GH 21741)
Fixed bug in read_hdf() where the literal string "nan" in a string Index was incorrectly converted to NaN on read, even when a custom nan_rep was supplied (GH 9604)
Fixed bug in DataFrame.to_hdf() and HDFStore.put() where writing an object to a key silently deleted any nested keys stored beneath it (GH 17267)
Fixed bug in DataFrame.to_hdf() raising TypeError when the index had a non-tick DateOffset freq (e.g. DateOffset(years=1)) (GH 45790)
Fixed bug in DataFrame.to_hdf() with format="table" where a TimedeltaIndex was reconstructed as a PeriodIndex (when freq was set) or an integer Index (otherwise) on read-back (GH 21466)
Fixed DataFrame.to_hdf() and Series.to_hdf() to round-trip a CategoricalIndex in both "fixed" and "table" formats; previously raised AssertionError (GH 33909, GH 16118)
Bug in Series.to_json() with date_format="iso" where a timezone-aware datetime Series was serialized without the trailing Z marker, losing the timezone information that is retained for an equivalent DatetimeIndex or DataFrame column (GH 65744)
Fixed bug in DataFrame.to_parquet() (pyarrow engine) where a local file path was opened twice, once by pandas and again by pyarrow, wasting a syscall and silently truncating output to 0 bytes on filesystems that finalize a file’s contents on close (GH 65810)
Fixed bug in HDFStore.get_storer() where .shape reported a phantom row for a fixed-format Series or DataFrame stored with no rows (GH 37235)
Fixed bug in HDFStore.remove() where a where clause selecting on more than 31 values (e.g. "index in [...]") deleted every row in the table instead of only the matching rows (GH 17567)
Fixed bug in HDFStore.select() where passing where as a list of conditions referencing caller-scope variables failed on Python 3.12+ due to PEP 709 inlining list comprehension stack frames (GH 64881)
Fixed bug in HDFStore.select() with format="table" where reading a frame with a string Index could crash with a bus error on strict-alignment platforms such as 32-bit ARM (GH 54396)
Storing a DataFrame or Series with a MultiIndex level named 'index' via HDFStore.put() or HDFStore.append() with format='table' now raises a clear ValueError instead of an opaque reshape error (GH 6208)
The PerformanceWarning emitted by DataFrame.to_hdf() for object columns now names only the columns that cannot be mapped to a c-type, instead of every object column sharing the same block (GH 28460)
Writing a DataFrame with format='table' and a column named 'index' as a data_columns entry (including data_columns=True) now raises a clear ValueError instead of an opaque reshape error (GH 41437)
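
A minimal sketch of the read_csv() memory_map entry above, using a made-up two-column CSV payload. io.UnsupportedOperation is itself a subclass of ValueError, so catching ValueError covers both the older error and the clearer one described in the entry.

```python
import io

import pandas as pd

payload = b"a,b\n1,2\n"

# memory_map=True needs a real file descriptor, which BytesIO lacks.
try:
    pd.read_csv(io.BytesIO(payload), memory_map=True)
    outcome = "parsed"
except ValueError as exc:
    outcome = f"{type(exc).__name__}: {exc}"
print(outcome)

# For in-memory buffers, leave memory_map at its default.
df = pd.read_csv(io.BytesIO(payload))
print(df.shape)  # (1, 2)
```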

Period#

Bug in Period constructor where passing np.str_ objects would fail in Cython string parsing (GH 48974)
Bug in Period.strftime() where unknown format directives (e.g. "%Q") silently produced platform-dependent output and crashed the Python process on Windows; an Invalid format string ValueError is now raised on all platforms (GH 53562)

Plotting#

Bug in DataFrame.plot.hexbin() ignoring rcParams["image.cmap"] and always defaulting to "BuGn" when no colormap was specified (GH 31871)

Groupby/resample/rolling#

Bug in DataFrameGroupBy.agg() when there are no groups, multiple keys, and group_keys=False (GH 51445)
Bug in DataFrameGroupBy.agg() where the provided func operated on each group as a whole when args or kwargs were supplied; the method now operates only on each Series of the group (GH 39169)
Bug in DataFrameGroupBy.apply() with as_index=False where applying on an empty DataFrame returned inconsistent index metadata compared to non-empty results (GH 48135)
Bug in DataFrameGroupBy.cumprod(), DataFrameGroupBy.cummin(), and DataFrameGroupBy.cummax() (and Series variants) returning Float64 instead of preserving the nullable integer dtype (e.g. Int64) when the group key contains NA (GH 65550)
Bug in GroupBy.any() and GroupBy.all() returning NaN with float64 dtype for unobserved categorical groups on NumPy bool data instead of the boolean identity value with bool dtype (GH 65100)
Bug in Resampler.agg() raising ValueError with a dict of aggregations when applied to a DataFrame.groupby() with as_index=False (GH 52397)
Bug in Rolling.corr() and Rolling.cov() computing incorrect results on degenerate windows (GH 24019)
Bug in Rolling.skew() and Rolling.kurt() (and their GroupBy counterparts) returning 0.0 and -3.0 respectively for degenerate windows or groups; these now return NaN (GH 62864)
Bug in Rolling.skew() and Rolling.kurt() returning NaN for low-variance windows (GH 62946)
Bug in Rolling.sum(), Rolling.mean(), Rolling.median(), Rolling.min(), and Rolling.max() where using method="table", engine="numba", and engine_kwargs={"parallel": True} could cause a segfault (GH 40454)
Bug in SeriesGroupBy.ohlc() ignoring as_index=False (GH 65140)
Bug in DataFrame.groupby() with a Grouper with freq raising AttributeError when all grouping keys are NaT (GH 43486)
Bug in Series.resample() and DataFrame.resample() where same-frequency resampling with monthly, quarterly, or annual frequencies bypassed aggregation, returning the original values instead of the aggregation result (GH 18553)
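
To make the degenerate-window entry for Rolling.skew() and Rolling.kurt() concrete, here is a small sketch on a constant series (the data and window size are arbitrary). The values for full windows depend on the installed version, so they are printed without a stated expectation. The guard at the end is one possible version-independent way to mask zero-variance windows, not necessarily how pandas implements the fix.

```python
import pandas as pd

constant = pd.Series([1.0] * 5)
window = constant.rolling(4)

skew = window.skew()
kurt = window.kurt()

# The first three entries are NaN because the window is not yet full.
# The remaining entries cover constant (zero-variance) windows.
print(skew.tolist())
print(kurt.tolist())

# Mask any window whose standard deviation is not strictly positive.
guarded_skew = skew.where(window.std() > 0)
print(guarded_skew.isna().all())  # True
```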

Reshaping#

Bug in concat() raising InvalidIndexError when keys or the concatenated objects’ index was an overlapping IntervalIndex (GH 64825)
Bug in merge() where merging on a MultiIndex containing NaN values mapped NaN keys to the last level value instead of NaN (GH 64492)
Bug in DataFrame.melt() where var_name colliding with an id_vars column or value_name silently overwrote the affected column data instead of raising (GH 65654)
Bug in DataFrame.pivot_table() with margins=True raising TypeError when values has an ExtensionDtype that cannot hold NA (e.g. IntervalDtype with an integer subtype) and no columns were specified (GH 55484)
Bug in Index.union() where, unless sort=False was passed, the result could be unsorted when both inputs were monotonic increasing but disjoint (GH 54646)
Fixed bug in Series.sort_values() where ignore_index=True had no effect on an already-sorted Series (GH 65833)
In pivot_table(), when values is empty, the aggregation will be computed on a Series of all NA values (GH 46475)
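
For the Series.sort_values() entry above, the sketch below uses an already-sorted series with a made-up non-default index. The index of `direct` depends on whether the installed version includes the fix. Chaining reset_index(drop=True) gives a fresh 0..n-1 index on any version.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=[10, 20, 30])  # values already sorted

# With the fix, ignore_index=True relabels the result 0..n-1 even when no
# reordering is needed; earlier versions may keep the original labels.
direct = s.sort_values(ignore_index=True)
print(direct.index.tolist())

# Version-independent alternative.
portable = s.sort_values().reset_index(drop=True)
print(portable.index.tolist())  # [0, 1, 2]
```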

Sparse#

Bug in Series.mean() with skipna=False ignoring missing values for SparseDtype-backed Series (GH 65478)
Bug in SparseArray.astype() where converting a datetime64 SparseArray with NaT fill value to "Sparse[int64]" silently replaced the fill value with 0 instead of iNaT (GH 49631)
Bug in SparseArray.mean() raising a TypeError when called with the skipna argument (GH 65478)
Bug in indexing a SparseArray with an integer equal to the length of the array (one past the last valid position) returning the fill value instead of raising IndexError (GH 64183)
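
The out-of-bounds indexing entry above can be sketched as follows, with a small made-up array. The position equal to len(arr) is the case described in the entry, so the sketch handles both the pre-fix outcome (the fill value is returned) and the post-fix outcome (IndexError).

```python
import pandas as pd

arr = pd.arrays.SparseArray([0, 0, 5], fill_value=0)

# The last valid position is len(arr) - 1.
last = arr[len(arr) - 1]
print(last)  # 5

# One past the end: IndexError with the fix, the fill value without it.
try:
    outcome = f"returned {arr[len(arr)]!r}"
except IndexError as exc:
    outcome = f"IndexError: {exc}"
print(outcome)
```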

ExtensionArray#

Bug in DataFrame.any() and DataFrame.all() with skipna=False not propagating pd.NA on numpy-nullable columns (boolean, Int*, UInt*, Float*); axis=0 raised ValueError and axis=1 returned a concrete True/False (GH 65710)
Bug in numpy ufuncs like numpy.isnan() raising TypeError on Series or Index backed by PyArrow dtypes when future.distinguish_nan_and_na is True (GH 62506)
Fixed bug in Series.apply() and Series.map() where nullable integer dtypes were converted to float, causing precision loss for large integers; the nullable dtype is now preserved (GH 63903)
Fixed the is_monotonic_increasing and is_monotonic_decreasing properties to dispatch to the underlying ExtensionArray implementation (GH 65585)

Styler#

Other#

Bug in DataFrame.eval() where a duplicate column name was resolved to a single column (yielding a Series), inconsistent with pandas.eval() using resolvers=(df,) and with DataFrame.__getitem__(), which include every column with that label (GH 65588)
Bug in Series.transform() and DataFrame.transform() where passing a list of duplicate function names did not raise errors.SpecificationError (GH 54929)