What’s new in 3.1.0 (Month XX, 2026)#
These are the changes in pandas 3.1.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
enhancement1#
enhancement2#
Other enhancements#
- `Period` now supports f-string formatting via `__format__`, e.g. `f"{period:%Y-%m}"` (GH 48536)
- `DataFrameGroupBy.agg()` now allows for the provided `func` to return a NumPy array (GH 63957)
- `Timestamp.round()`, `Timestamp.floor()`, and `Timestamp.ceil()` now officially accept `Timedelta` arguments (GH 63687)
- `ExtensionArray.map()` now calls `ExtensionArray._cast_pointwise_result()` to retain the dtype backend, e.g. Arrow-backed arrays now preserve their Arrow dtype through `map` (GH 57189, GH 62164)
- `read_csv()` now supports `dtype="complex64"` and `dtype="complex128"` with the C engine, enabling round-tripping of complex-number columns written by `DataFrame.to_csv()` (GH 9379)
- Added `ExtensionArray.count()` (GH 64450)
- Added `Index.replace()` method to support value replacement functionality similar to `Series.replace()` (GH 19495)
- Display formatting for float sequences in DataFrame cells now respects the `display.precision` option (GH 60503)
- Improved the precision of float parsing in `read_csv()` (GH 64395)
- Improved the string `repr` of `pd.core.arrays.SparseArray` (GH 64547)
- Improved type inference of comparison and arithmetic operators on `Series` and `DataFrame` for static type checkers (e.g. `ser == "a"` is now inferred as `Series` instead of `Any`) (GH 40762)
- MSVC is no longer required to build on Windows, and build errors when using the MinGW compiler have been fixed (GH 63160)
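As a rough illustration of the `Timestamp` rounding entry, the sketch below passes a `Timedelta` as the resolution. The timestamp and the one-hour step are arbitrary example values, not taken from the release notes. The `Period` line uses `Period.strftime()`, which should give the same text as the new f-string form; the f-string itself is left in a comment because it is assumed to need pandas 3.1.0 or later.

```python
import pandas as pd

ts = pd.Timestamp("2026-01-01 12:34:56")
step = pd.Timedelta(hours=1)  # arbitrary example resolution

# Round, floor, and ceil to the resolution given as a Timedelta.
rounded = ts.round(step)
floored = ts.floor(step)
ceiled = ts.ceil(step)
print(rounded, floored, ceiled)

# Format a Period with strftime.
# Assumed equivalent on pandas 3.1.0+: f"{period:%Y-%m}"
period = pd.Period("2026-03", freq="M")
label = period.strftime("%Y-%m")
print(label)
```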
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
notable_bug_fix1#
notable_bug_fix2#
Backwards incompatible API changes#
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
|  |  | X | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed |
|---|---|---|
|  |  | X |
See Dependencies and Optional dependencies for more.
Other API changes#
- `DataFrameGroupBy.sum()` and `DataFrameGroupBy.mean()` on float dtypes with sorted grouping keys may differ from prior versions in the last few floating-point digits, due to a faster summation algorithm that does not use Kahan compensation (GH 65103)
- APIs that accept an `engine="numba"` parameter with `engine_kwargs` will no longer pass through a `nopython` argument to `numba.jit`. This argument has had no effect since numba 0.59.0 (GH 64483)
- Removed the `freq` and `freqstr` attributes from `DatetimeArray` and `TimedeltaArray`. Frequency is now stored only on `DatetimeIndex` and `TimedeltaIndex`; access `Series.dt.freq` or wrap the array in an Index to retrieve a frequency. The `check_freq` keyword on `testing.assert_extension_array_equal()` for these array types has also been removed (GH 24566)
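One possible way to recover a frequency after wrapping an array in an Index is sketched below. Passing `freq="infer"` to the `DatetimeIndex` constructor is an assumption about how to do this, not something the entry above prescribes, and the dates are made-up example data.

```python
import pandas as pd

# A DatetimeArray taken from parsed strings carries no frequency of its own.
arr = pd.to_datetime(["2026-01-01", "2026-01-02", "2026-01-03"]).array

# Assumption: ask the Index constructor to infer the frequency from the values.
idx = pd.DatetimeIndex(arr, freq="infer")
print(idx.freqstr)

# date_range attaches the frequency to the resulting DatetimeIndex directly.
daily = pd.date_range("2026-01-01", periods=3, freq="D")
print(daily.freqstr)
```

Keeping frequency handling on the Index means code that previously read `arr.freq` needs an Index (or `Series.dt`) in hand before asking for it.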
Deprecations#
- Deprecated `Timestamp.dayofweek`, `Timestamp.dayofyear`, `Timestamp.daysinmonth` in favor of `Timestamp.day_of_week`, `Timestamp.day_of_year`, `Timestamp.days_in_month`, respectively. The same deprecation applies to the corresponding attributes on `Period`, `DatetimeIndex`, `PeriodIndex`, and `Series.dt` (GH 46768)
- Deprecated `infer_freq()`, `DatetimeIndex.inferred_freq`, and `TimedeltaIndex.inferred_freq` returning a string; in a future version these will return a `BaseOffset` instead. Use `pd.set_option('future.infer_freq_returns_offset', True)` to opt in to the future behavior (GH 55504)
- Deprecated `set_eng_float_format()`. Use `pd.set_option("display.precision", N)` to control decimal precision, or pass a custom callable to `pd.set_option("display.float_format", func)` (GH 64460)
- Deprecated `DataFrameGroupBy.agg()` and `Resampler.agg()` unpacking a scalar when the provided `func` returns a Series or array of length 1; in the future this will result in the Series or array being in the result. Users should unpack the scalar in `func` itself (GH 64014)
- Deprecated `ExcelFile.parse()`, use `read_excel()` instead (GH 58247)
- Deprecated `engine="fastparquet"` and `engine="auto"` in `read_parquet()` and `DataFrame.to_parquet()`. The `fastparquet` library has been retired; use `engine="pyarrow"` or do not pass `engine` to use the default (GH 64597)
- Deprecated arithmetic operations between pandas objects (`DataFrame`, `Series`, `Index`, and pandas-implemented `ExtensionArray` subclasses) and list-likes other than `list`, `np.ndarray`, `ExtensionArray`, `Index`, `Series`, `DataFrame`. For other list-likes such as `tuple` or `range`, explicitly cast to a supported object instead. In a future version, these will be treated as scalar-like for pointwise operation (GH 62423)
- Deprecated automatic dtype promotion when reindexing with a `fill_value` that cannot be held by the original dtype. Explicitly cast to a common dtype instead (GH 53910)
- Deprecated grouping by an index level name when the name matches multiple levels of a `MultiIndex`. Use the level number instead (GH 49434)
- Deprecated implicit conversion of `datetime.date` objects to `Timestamp` when indexing or joining a `DatetimeIndex`. Use `to_datetime()` to explicitly convert to `DatetimeIndex` instead (GH 62158)
- Deprecated passing a `dict` to `DataFrame.from_records()`, use the `DataFrame` constructor or `DataFrame.from_dict()` instead (GH 22025)
- Deprecated passing a non-dict (e.g. a list of dicts) to `DataFrame.from_dict()`. Use the `DataFrame` constructor instead (GH 58862)
- Deprecated passing unnecessary `*args` and `**kwargs` to `GroupBy.cumsum()`, `GroupBy.cumprod()`, `GroupBy.cummin()`, `GroupBy.cummax()`, `SeriesGroupBy.skew()`, `DataFrameGroupBy.skew()`, `SeriesGroupBy.take()`, and `DataFrameGroupBy.take()`. The `skipna` parameter for the cum* methods is now an explicit keyword argument (GH 50407)
- Deprecated setting values with `DataFrame.at()` and `Series.at()` when the key does not exist in the index, which previously expanded the object. Use `.loc` instead (GH 48323)
- Deprecated the `%n` directive in `Period.strftime()` for nanoseconds; use `%N` instead. `%n` is a newline directive in C `strftime` (and Python’s `time.strftime`/`datetime.strftime`) (GH 65432)
- Deprecated the `.name` property of offset objects (e.g., `Day`, `Hour`). Use `.rule_code` instead (GH 64207)
- Deprecated the `dropna` keyword in `DataFrame.to_hdf()`, `HDFStore.put()`, `HDFStore.append()`, and `HDFStore.append_to_multiple()`, and the `io.hdf.dropna_table` option. Use `DataFrame.dropna()` before writing instead (GH 32038)
- Deprecated the `float_precision` argument in `read_csv()`, `read_table()`, and `read_fwf()`. All float precision modes now use the same converter (GH 64395)
- Deprecated the `include` and `exclude` arguments of `Series.describe()`. They had no effect on a Series; filter dtypes upstream of the call instead (GH 54193)
- Deprecated the `weekday` property on `DatetimeIndex`, `DatetimeArray`, `PeriodIndex`, `PeriodArray`, and `Period`. Use `day_of_week` instead. `Timestamp.weekday()` remains a method consistent with `datetime.datetime.weekday()` (GH 12816)
- Deprecated the `xlrd` and `pyxlsb` engines in `read_excel()`. Use `engine="calamine"` instead (GH 56542)
- Deprecated the default value of `exact` in `assert_index_equal()`; in a future version this will default to `True` instead of `"equiv"` (GH 57436)
- Deprecated the default value of `track_times` in `HDFStore.put()`. In a future version, the default will change from `True` to `False` so that HDF5 files are deterministic by default (GH 51456)
- Deprecated passing a non-boolean value for `numeric_only` to `DataFrame.mean()`, `DataFrame.min()`, `DataFrame.max()`, `DataFrame.median()`, `DataFrame.skew()`, `DataFrame.kurt()`, `DataFrame.std()`, `DataFrame.var()`, `DataFrame.sem()`, `DataFrame.sum()`, `DataFrame.prod()` (and their `Series` equivalents); this will raise in a future version of pandas (GH 53098)
- Deprecated the inference of `datetime64` dtype from data containing `datetime.date` objects when used in comparisons or `Index.equals()` with `DatetimeIndex`. Use `pandas.to_datetime()` to explicitly convert to `datetime64` instead (GH 65056)
- Deprecated the `inplace` keyword in `DataFrame.rename()`. This keyword will be removed in a future version (GH 63207); see also PDEP-8
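A few of the replacements suggested in the list above can be sketched as follows. The frame, series, and dates are invented example data, and the snippet only uses spellings that are assumed to exist already in earlier pandas releases, so it is a migration sketch rather than a demonstration of the deprecation warnings themselves.

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2026-03-15")

# Underscore spellings instead of dayofweek / dayofyear / daysinmonth.
dow, doy, dim = ts.day_of_week, ts.day_of_year, ts.days_in_month

# Build from a dict with the DataFrame constructor, not DataFrame.from_records().
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1.0, 2.0, 3.0]})

# Unpack the scalar inside func instead of returning a length-1 Series.
first = df.groupby("key").agg(lambda s: s.iloc[0])

# Enlarge with .loc rather than .at when the label is not yet in the index.
ser = pd.Series([1, 2], index=["a", "b"])
ser.loc["c"] = 3

# Convert a datetime.date explicitly before indexing a DatetimeIndex.
idx = pd.date_range("2026-01-01", periods=3, freq="D")
pos = idx.get_loc(pd.to_datetime(datetime.date(2026, 1, 2)))

# Pass a real boolean for numeric_only.
means = df.mean(numeric_only=True)
```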
Performance improvements#
- Performance improvement in `DataFrameGroupBy` aggregations (`sum`, `mean`, `min`, `max`, `prod`) when the grouping keys are already sorted (GH 65103)
- Performance improvement in casting integer and boolean dtypes to `string[pyarrow]` by using PyArrow’s native cast instead of element-wise conversion (GH 56505)
- Performance improvement in `DataFrame.__getitem__()` when selecting a single column by label on a `DataFrame` with duplicate column names (GH 64126)
- Performance improvement in `Series.is_monotonic_increasing` and `Series.is_monotonic_decreasing` for `ArrowDtype` and masked dtypes by dispatching to the `ExtensionArray` (GH 56619)
- Performance improvement in `DataFrame` repr by avoiding redundant formatting when columns exceed terminal width (GH 64863)
- Performance improvement in `GroupBy` reductions and transformations for `SparseDtype` columns (GH 36123)
- Performance improvement in `bdate_range()` and `date_range()` with `freq="B"` or `freq="C"` (business day frequencies) (GH 16463)
- Performance improvement in `concat()` and `DataFrame.astype()` to extension dtypes (GH 65672)
- Performance improvement in `infer_freq()` (GH 64463)
- Performance improvement in `merge()` and `DataFrame.join()` for many-to-many joins with `sort=False` (GH 56564)
- Performance improvement in `merge()` for many-to-one joins with unique right keys (GH 38418)
- Performance improvement in `merge()` with `how="cross"` (GH 38082)
- Performance improvement in `merge()` with `how="left"` (GH 64370)
- Performance improvement in `merge()` with `how="left"` and `sort=False` when joining on a right index with unique keys (GH 65160)
- Performance improvement in `merge()` with `sort=False` for single-key `how="left"`/`how="right"` joins when the opposite join key is sorted, unique, and range-like (GH 64146)
- Performance improvement in `read_csv()` with `engine="c"` when reading from binary file-like objects (e.g. PyArrow S3 file handles) by avoiding unnecessary `TextIOWrapper` wrapping (GH 46823)
- Performance improvement in `read_html()` and the Python CSV parser when `thousands` is set, fixing catastrophic regex backtracking on cells with many comma-separated digit groups followed by non-numeric text (GH 52619)
- Performance improvement in `read_sas()` by reading page header fields directly in Cython instead of falling back to Python (GH 47339)
- Performance improvement in `read_sas()` for SAS7BDAT files by pre-computing date/datetime column classification once during metadata parsing instead of per chunk (GH 47339)
- Performance improvement in `read_sas()` for SAS7BDAT files with full-precision (8-byte) numeric columns, with up to ~2x speedup on bulk reads (GH 47339)
- Performance improvement in `read_sas()` for compressed SAS7BDAT files by reusing the decompression buffer instead of allocating per row (GH 47339)
- Performance improvement in `read_sas()` when decoding strings (GH 47339)
- Performance improvement in `tseries.frequencies.to_offset()` parsing of frequency strings, especially for tick-resolution offsets (e.g. `"h"`, `"5min"`, `"3s"`) and compound expressions (e.g. `"1D1h"`) (GH 65395)
- Performance improvement in `util.hash_pandas_object()` for PyArrow-backed string and binary types by using PyArrow’s `dictionary_encode` instead of converting to NumPy for factorization (GH 48964)
- Performance improvement in `DataFrameGroupBy.agg()` and `SeriesGroupBy.agg()` with user-defined functions (GH 46505)
- Performance improvement in `DataFrame.corr()` and `DataFrame.cov()` when data contains no NaN values (GH 64857)
- Performance improvement in `DataFrame.equals()`, `DataFrame.select_dtypes()`, and other operations performing shallow column slicing on Arrow-backed columns (GH 58966)
- Performance improvement in `DataFrame.fillna()` and `Series.fillna()` with scalar fill value for float, object, nullable, and datetime-like dtypes (GH 42147)
- Performance improvement in `DataFrame.from_records()` when passing a 2D `numpy.ndarray` (GH 22025)
- Performance improvement in `DataFrame.insert()` when the number of blocks is small (GH 57641)
- Performance improvement in `DataFrame.loc()` with non-unique masked index (GH 56759)
- Performance improvement in `DataFrame.query()` and `DataFrame.eval()` when the `DataFrame` contains `PeriodDtype` or `IntervalDtype` columns (GH 35247)
- Performance improvement in `DataFrame.rank()` and `Series.rank()` for non-nullable numeric dtypes (GH 65054)
- Performance improvement in `DataFrame.sort_index()` and `Series.sort_index()` with the `level` parameter when the index is already sorted and not a `MultiIndex` (GH 64883)
- Performance improvement in `DataFrame.sort_values()` with multiple numeric columns by avoiding unnecessary `Categorical` conversion (GH 15389)
- Performance improvement in `DataFrame.sum()`, `DataFrame.prod()`, `DataFrame.min()`, `DataFrame.max()`, `DataFrame.mean()`, `DataFrame.any()`, and `DataFrame.all()` with `axis=1` for multi-block DataFrames by avoiding a transpose (GH 51474)
- Performance improvement in `DataFrame.take()`, `Series.take()`, `DataFrame.reindex()`, `Series.reindex()`, and boolean-array indexing for NumPy-backed dtypes (GH 65295)
- Performance improvement in `DataFrame.to_excel()` with the `openpyxl` engine when using `engine_kwargs={"write_only": True}`, reducing memory consumption (GH 41681)
- Performance improvement in `DataFrame.to_stata()` when writing object-dtype datetime columns with date formats that require year/month extraction (GH 64555)
- Performance improvement in `DataFrame.unstack()` and `Series.unstack()` when the `MultiIndex` is already sorted and the unstacked level is the last level (GH 65107)
- Performance improvement in `DatetimeIndex.month_name()` and `DatetimeIndex.day_name()` when using the default string dtype by using PyArrow compute instead of going through an intermediate object array (GH 65104)
- Performance improvement in `DatetimeIndex.strftime()` and `Series.dt.strftime()` for formats composed of common directives (`%Y`, `%m`, `%d`, `%H`, `%M`, `%S`, `%f`) (GH 44764)
- Performance improvement in `GroupBy.any()` and `GroupBy.all()` for boolean-dtype columns (GH 37850)
- Performance improvement in `GroupBy.first()` and `GroupBy.last()` for Extension Array dtypes, which no longer fall back to a slow `apply`-based implementation (GH 57591)
- Performance improvement in `GroupBy.quantile()` (GH 64330)
- Performance improvement in `GroupBy.size()` (GH 51750)
- Performance improvement in `Index.get_indexer()` for large monotonic indexes, which now uses binary search instead of building a hash table when the number of targets is small (GH 14273)
- Performance improvement in `Index.join()` and `Index.union()` for `RangeIndex` by avoiding unnecessary memory allocation in the libjoin fastpath (GH 54646)
- Performance improvement in `IntervalIndex.get_indexer()` for monotonic non-overlapping indexes, which now uses binary search instead of the interval tree (GH 47614)
- Performance improvement in `NDFrame.__finalize__()`, `Series.to_numpy()`, `DataFrame.dtypes`, and `DataFrame.__getitem__()` (GH 57431)
- Performance improvement in `Timedelta.total_seconds()` (GH 65388)
- Performance improvement in `arrays.SparseArray.isna()` by avoiding a dense-then-resparsify round-trip (GH 41023)
- Performance improvement in datetime/timedelta unit conversion (e.g. `datetime64[s]` to `datetime64[ns]`) (GH 35025)
- Performance improvement in indexing a `DataFrame` with a `CategoricalIndex` of `Interval` categories (GH 61928)
- Performance improvement in indexing a `MultiIndex` with a list-like indexer (GH 55786)
- Performance improvement in partial-string indexing on a monotonic decreasing `DatetimeIndex` or `PeriodIndex` (GH 64811)
- Performance improvement in plotting `DatetimeIndex` with multiplied frequencies (e.g. `"1000ms"`, `"100s"`) (GH 50355)
- Performance improvement in reading zip-compressed files (e.g. `read_pickle()`, `read_csv()`) on Python < 3.12 (GH 59279)
- Performance improvement in reductions along `axis=1` and other operations on DataFrames produced by `DataFrame.copy()` (GH 60469)
- Performance improvement in repr of `Series` and `DataFrame` containing third-party array-like objects (e.g. xarray `DataArray`) in object dtype columns (GH 61809)
- Performance improvement in `DataFrame.loc()` and `DataFrame.iloc()` setitem with a 2D list-of-lists value by avoiding a wasteful round-trip through an intermediate object array (GH 64229)
- Performance improvement in `Series.reindex()` and `DataFrame.reindex()` for non-nanosecond `datetime64` and `timedelta64` dtypes (GH 24566)
- Performance improvement in `Series.iloc()` and `DataFrame.iloc()` when setting datetimelike values into object-dtype data with list-like indexers (GH 64250)
Bug fixes#
- Fixed bug in `Index` repr where attributes were not wrapped to respect `display.width` (GH 11552)
- Fixed bug in `to_timedelta()` and `Timedelta` not accepting `Day` offsets (GH 64240)
Categorical#
- Bug in `Categorical.__repr__()` where the values and categories lines could exceed `display.width` (GH 12066)
- Bug in `Categorical.map()` where mapping with a `defaultdict` and `na_action=None` would bypass the default factory by using `dict.get`, causing `NA` values to be replaced with `NaN` instead of the mapper’s default value (GH 62710)
- Bug in `CategoricalIndex.union()` and `CategoricalIndex.intersection()` giving incorrect results when the two indexes have the same unordered categories in different orders (GH 55335)
- Bug in `Index.fillna()` raising `TypeError` when filling with a tuple value (e.g. on object-dtype or `CategoricalIndex` with tuple categories) (GH 37681)
Datetimelike#
- Bug in `DatetimeIndex` constructor raising `ValueError` when passing equivalent but not equal frequencies (e.g. `QS-FEB` vs `QS-MAY`) (GH 61086)
- Bug in `DatetimeIndex` raising `AttributeError` when comparing against Arrow date types (`date32`, `date64`) (GH 62051)
- Bug in `Timestamp` constructor where passing `np.str_` objects would fail in Cython string parsing (GH 48974)
- Bug in `Timestamp` constructor where strings with a negative year of fewer than 4 digits (e.g. `"-111-01-01"`) silently dropped the leading `"-"` and were parsed as a positive year; BC dates with 1-4 digit years now parse correctly, matching `numpy.datetime64` (GH 55954)
- Bug in `Timestamp` constructor, `Timedelta` constructor, `to_datetime()`, and `to_timedelta()` with non-round `float` input and `unit` failing to raise when the value is just outside the representable bounds (GH 57366)
- Bug in `api.types.infer_dtype()` returning `"date"` or `"mixed"` instead of `"datetime"`/`"timedelta"` for lists of `Timestamp`/`Timedelta` values mixed with `pd.NA` (GH 53023)
- Bug in `date_range()` where `inclusive="left"` and `inclusive="right"` returned a single-element result instead of empty when `start` equals `end` (GH 55293)
- Bug in `date_range()` where the `inclusive` parameter failed to filter endpoints when only `start` and `periods` or `end` and `periods` were specified (GH 46331)
- Bug in `date_range()` where `periods=1` with offsets that disallow `n=0` (e.g. `offsets.LastWeekOfMonth`, `offsets.FY5253`) raised `ValueError` (GH 41563)
- Bug in `date_range()` where calendar-based offsets (e.g. `MS`, `ME`, `QS`, `YS`) could exclude the last offset boundary when `end`’s time-of-day was earlier than `start`’s (GH 35342)
- Bug in `to_datetime()` and `to_timedelta()` on ARM platforms where round `float` values outside the int64 domain (e.g. `float(2**63)`) could silently produce incorrect results instead of raising (GH 64619)
- Bug in `to_datetime()` and `to_timedelta()` where `uint64` values greater than `int64` max silently overflowed instead of raising `OutOfBoundsDatetime` or `OutOfBoundsTimedelta` (GH 60677)
- Bug in `to_datetime()` where a higher-resolution `origin` was silently truncated when using a lower-resolution `unit` (e.g. `unit="D"` with a microsecond-precision origin); the resolution of `origin` is now preserved (GH 63419)
- Bug in `DataFrame.replace()` and `Series.replace()` raising `AssertionError` instead of `OutOfBoundsDatetime` when replacing with a `datetime` value outside the `datetime64[ns]` range (GH 61671)
- Bug in `DataFrame.to_string()` and `Series.to_string()` where `na_rep` was ignored for datetime and timedelta columns, always displaying `NaT` (GH 55426)
- Bug in `DatetimeArray.isin()` and `TimedeltaArray.isin()` where mismatched resolutions could silently truncate finer-resolution values, leading to false matches (GH 64545)
- Bug in adding non-nano `DatetimeIndex` with non-vectorized offsets (e.g. `CustomBusinessDay`, `CustomBusinessMonthEnd`) having a sub-unit `offset` parameter incorrectly truncating the result or raising `AttributeError` (GH 56586)
- Bug in subtracting `BusinessHour` (or `CustomBusinessHour`) from a `Timestamp` giving incorrect results when the subtraction would land exactly on the business-hour opening time (GH 33682)
Timedelta#
- Bug in `TimedeltaIndex.resolution` raising when the index has no frequency (GH 65186)
- Bug in `DateOffset` where `DateOffset(1)` and `DateOffset(days=1)` returned different results near daylight saving time transitions (GH 61862)
- Bug in `Timedelta` constructor where keyword arguments (e.g. `days=365000`) that exceeded nanosecond int64 bounds raised `OutOfBoundsTimedelta` instead of falling back to a coarser resolution (GH 46587)
- Bug in `to_timedelta()` where passing `np.str_` objects would fail in Cython string parsing (GH 48974)
Timezones#
- Bug in `DatetimeIndex` addition with a `DateOffset` that has only timedelta components (e.g. `DateOffset(hours=-2)`) raising `ValueError` near DST transitions, while scalar `Timestamp` addition worked correctly (GH 28610)
- Bug in `Timestamp.to_julian_date()` and `DatetimeIndex.to_julian_date()` returning the Julian date of the local wall clock for timezone-aware inputs instead of the underlying UTC instant (GH 54763)
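The difference between the two Julian dates mentioned in the last entry can be made concrete with timezone-naive timestamps, which sidesteps any version-specific behavior. This is only an illustrative sketch: the fixed UTC-05:00 offset and the date are arbitrary example choices.

```python
import datetime

import pandas as pd

# Fixed UTC-05:00 offset, chosen arbitrarily for the example.
tz = datetime.timezone(datetime.timedelta(hours=-5))
ts = pd.Timestamp("2026-01-01 00:00", tz=tz)

# Julian date of the local wall-clock reading (timezone simply dropped).
jd_wall = ts.tz_localize(None).to_julian_date()

# Julian date of the underlying UTC instant (convert first, then drop the timezone).
jd_utc = ts.tz_convert("UTC").tz_localize(None).to_julian_date()

# The two differ by the UTC offset, expressed here in hours.
offset_hours = (jd_utc - jd_wall) * 24
print(jd_wall, jd_utc, offset_hours)
```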
Numeric#
- Fixed bug in `read_excel()` where a column containing a mixture of numeric and boolean values had its values cast to the type of whichever appeared first, because `1 == True` and `0 == False` (GH 60088)
- Fixed bug in `Series.clip()` where passing a scalar numpy array (e.g. `np.array(0)`) would raise a `TypeError` (GH 59053)
- Fixed bug in `Series.mean()` and `Series.sum()` (and their `DataFrame` counterparts) overflowing for `float16` dtypes instead of upcasting to `float64` (GH 43929)
- Fixed bug in `Series.skew()` and `Series.kurt()` (and their `DataFrame` counterparts) returning `0.0` for degenerate distributions; these now return `NaN` (GH 62864)
- Fixed bug in complex-dtype `Series.duplicated()` and `Series.unique()` (and related hashtable-backed methods) raising `TypeError` when backed by a `NumpyExtensionArray` (GH 54761)
- Fixed bug where `DataFrame` arithmetic operations with `Series` did not support the `fill_value` parameter (GH 61581)
Conversion#
- Bug in `DataFrame` constructor where `NaT` in a `TimedeltaIndex` row was incorrectly inferred as `datetime64` instead of `timedelta64` (GH 23985)
- Bug in `DataFrame` constructor where constructing from a list of uniform-dtype arrays (e.g. pyarrow, `CategoricalDtype`, nullable dtypes) lost the dtype (GH 49593)
- Bug in `pd.array()` silently converting NaN to a nonsensical integer when given float data containing NaN and a NumPy integer dtype (GH 41724)
- Fixed `pandas.array()` to preserve mask information when converting NumPy masked arrays, converting masked values to missing values (GH 63879)
- Fixed bug in `DataFrame.from_records()` where `exclude` was ignored when `data` was an iterator and `nrows=0` (GH 63774)
Strings#
- Bug in `DataFrame.replace()` with `regex=True` mutating the underlying `StringArray` when the replacement value was not a string (GH 57733)
Interval#
- Bug in `IntervalArray` and `IntervalIndex` constructors unnecessarily upcasting sub-64-bit numeric dtypes (e.g. `float32`, `int32`) to 64-bit (GH 45412)
Indexing#
- Bug in `DataFrame.loc()` and `Series.loc()` replacing the index name with the key’s name when indexing with an `Index` (GH 17110)
- Bug in `DataFrame.loc()` raising `ValueError` when setting a row on a `DataFrame` with no columns and the label is not in the index (GH 17895)
- Bug in `DataFrame.loc()` returning incorrect dtype when the column key is a `slice` (GH 63071)
- Bug in `Index.get_indexer()` where `method="pad"`, `"backfill"`, or `"nearest"` returned incorrect results instead of `-1` when the target contained `NaT` or `NaN` (GH 32572)
- Bugs in setitem-with-expansion when adding new rows failing to keep the original dtype in some cases (GH 32346, GH 15231, GH 47503, GH 6485, GH 25383, GH 52235, GH 17026, GH 56010)
- Bug in `DataFrame.__getitem__()` raising `InvalidIndexError` when indexing with a tuple containing a `slice` on a `DataFrame` with `MultiIndex` columns (e.g., `df[:, "t1"]`) (GH 26511)
- Bug in `DataFrame.at()` raising `TypeError` when accessing a `MultiIndex` with a partial date string on a `DatetimeIndex` level (GH 43395)
- Bug in `DataFrame.duplicated()` returning an empty `Series` without the DataFrame’s index when the DataFrame had no columns (GH 61191)
- Bug in `DataFrame.iloc()` setitem raising `AttributeError` when assigning a `Series` or `Index` with a nullable EA dtype (e.g. `Int64`, `Float64`, `boolean`) into a column with a NumPy dtype (GH 47776)
- Bug in `DataFrame.loc()` raising `ValueError` when assigning a list of tuples to an object-dtype column with a boolean mask on a mixed-dtype DataFrame (GH 37629)
- Bug in `DataFrame.loc()` raising `ValueError` when setting a row with a list-like value on a single-column `DataFrame` with `ExtensionArray` dtype (GH 44103)
- Bug in `DataFrame.loc()` setitem-with-expansion writing the truncated string `"n"` (or raising `TypeError`) into rows outside the indexer when adding a new column from a list of strings or booleans (GH 42099)
- Bug in `DataFrame.loc()` with a `MultiIndex` returning wrong results instead of raising `KeyError` when passing string keys for numeric index levels (GH 60104)
- Bug in `DataFrame.mask()` with `inplace=True` where incorrect values were produced when `other` was a `Series` with `ExtensionArray` values (GH 64635)
- Bug in `DataFrame.rename()` and `Series.rename()` not preserving nullable extension dtype (e.g. `Int64`, `Float64`) when relabeling index or column labels (GH 65315)
- Bug in `DataFrame.where()` and `DataFrame.mask()` raising `TypeError` when `cond` is a `Series` and `axis=1` (GH 58190)
- Bug in `DataFrame.xs()` where `drop_level=False` was ignored for fully specified `MultiIndex` keys when `level` was not explicitly provided (GH 6507)
- Bug in `Index.get_level_values()` mishandling boolean, NA-like (`np.nan`, `pd.NA`, `pd.NaT`) and integer index names (GH 62169)
- Bug in `Index.get_loc()` raising `KeyError` when looking up a tuple in an object-dtype `Index` with duplicates (GH 37800)
- Bug in `Index.insert()` silently casting booleans to numeric when used with nullable numeric dtypes like `Float64` or `Int64` (GH 61709)
- Bug in `Index.take()` where `fill_value` was silently ignored and integer-dtype indexes raised `ValueError` instead of filling with the provided value. Passing `fill_value=None` now fills `-1` entries with the Index’s NA value (matching the `ExtensionArray` convention); omit `fill_value` to retain the previous behavior where negative indices wrap (GH 65210)
- Bug in `Index.where()` and `Index.putmask()` preserving `numpy.datetime64`/`numpy.timedelta64` `NaT` scalars in the object-dtype result for mismatched-dtype inputs, instead of normalizing to `pandas.NaT` as `Series.where()` does (GH 55174)
- Bug in `MultiIndex.get_loc()` returning a slice instead of an integer for a unique key when the `MultiIndex` contained duplicates elsewhere, causing `.loc` to return a `Series` instead of a scalar (GH 42102)
- Bug in `Series.where()` and `Series.mask()` raising `ValueError` when `other` is a tuple on object-dtype `Series` (GH 37681)
- Fixed bug in `DataFrame.loc()` where assigning an iterable to a single cell in an `object` dtype column incorrectly raised a `ValueError` (GH 57962)
- Fixed bug in `DataFrame.loc()` where assigning with duplicate column names and new columns corrupted unrelated columns (GH 58317)
- Fixed segfault in `DataFrame.loc()` when repeatedly adding new rows to an object-dtype-indexed `DataFrame` (GH 21968)
Missing#
- Bug in `DataFrame.fillna()` with a dict value raising `RecursionError` when columns are a `MultiIndex` with duplicate entries (GH 53498)
- Bug in `Series.combine_first()` crashing when Series names are `Timestamp` objects (GH 65333)
MultiIndex#
- Bug in `DataFrame.loc()` with a `MultiIndex` where using a tuple indexer with a scalar and a list (e.g., `(scalar, list)`) did not drop the scalar-indexed level (GH 18631)
- Bug in `MultiIndex.get_loc()` where looking up a tuple key containing a scalar inside an `IntervalIndex` level with overlapping intervals raised `KeyError` or returned incorrect results (GH 27456)
- Bug in `MultiIndex.sortlevel()` not raising `TypeError` when sorting a level with incomparable types (e.g., `Timestamp` and `str`) (GH 21136)
I/O#
- `read_csv()` with `memory_map=True` and an in-memory buffer (e.g. `BytesIO`) now raises a clear `ValueError` instead of a cryptic `UnsupportedOperation: fileno` (GH 45630)
- Fixed bug in `read_csv()` with the `c` engine where an embedded `\r` followed by a space in an unquoted field could cause an infinite re-parsing loop, producing spurious rows or a buffer overflow (GH 51141)
- Fixed bug in `read_excel()` where usage of `skiprows` could lead to an infinite loop (GH 64027)
- Fixed bug where `read_html()` parsed nested tables incorrectly when using `html5lib` or `bs4` flavors (GH 64524)
- Fixed `read_json()` with `lines=True` and `chunksize` to respect `nrows` when the requested row count is not a multiple of the chunk size (GH 64025)
- `HDFStore.put()` and `HDFStore.append()` now support storing `Series` and `DataFrame` columns with `PeriodDtype` in both `"fixed"` and `"table"` formats (GH 41978)
- Bug in `DataFrame.__repr__()` raising `TypeError` for a column with a NumPy structured dtype (e.g. produced by `DataFrame.from_records()` from a structured `ndarray`) (GH 55011)
- Bug in `DataFrame.__repr__()` where horizontally truncated output could exceed the terminal width by up to 4 characters (GH 32461)
- Bug in `DataFrame.to_stata()` raising `KeyError` when column names require renaming and `convert_dates` is specified for a different column (GH 60536)
- Bug in `DataFrame.to_string()` where the `formatters` dict was applied to the wrong columns when output was horizontally truncated via `max_cols` (GH 35410)
- Fixed `read_json()` with `lines=True` and `nrows=0` to return an empty DataFrame (GH 64025)
- `DataFrame.to_hdf()` now raises a clear `NotImplementedError` when writing a column or `Index` of an unsupported extension dtype (such as `IntervalDtype`, `SparseDtype`, or the nullable integer/float/boolean dtypes), instead of a low-level `AttributeError` or PyTables `TypeError` (GH 26144, GH 38305, GH 42070)
- Fixed `MemoryError` in `HDFStore.select()` when iterating large tables with `chunksize` and no `where` filter (GH 15937)
- Fixed bug in `DataFrame.to_hdf()` with `format="table"` where a `TimedeltaIndex` was reconstructed as a `PeriodIndex` (when `freq` was set) or an integer `Index` (otherwise) on read-back (GH 21466)
- Fixed bug in `HDFStore.select()` where passing `where` as a list of conditions referencing caller-scope variables failed on Python 3.12+ due to PEP 709 inlining list comprehension stack frames (GH 64881)
- Storing a `DataFrame` or `Series` with a `MultiIndex` level named `'index'` via `HDFStore.put()` or `HDFStore.append()` with `format='table'` now raises a clear `ValueError` instead of an opaque reshape error (GH 6208)
Period#
- Bug in `Period` constructor where passing `np.str_` objects would fail in Cython string parsing (GH 48974)
- Bug in `Period.strftime()` where unknown format directives (e.g. `"%Q"`) silently produced platform-dependent output and crashed the Python process on Windows; an `Invalid format string` `ValueError` is now raised on all platforms (GH 53562)
Plotting#
- Bug in `DataFrame.plot.hexbin()` ignoring `rcParams["image.cmap"]` and always defaulting to `"BuGn"` when no colormap was specified (GH 31871)
Groupby/resample/rolling#
- Bug in `DataFrameGroupBy.agg()` when there are no groups, multiple keys, and `group_keys=False` (GH 51445)
- Bug in `DataFrameGroupBy.agg()` operating on the group as a whole when `args` or `kwargs` are supplied for the provided `func`; this method now only operates on each Series of the group (GH 39169)
- Bug in `DataFrameGroupBy.apply()` with `as_index=False` where applying on an empty `DataFrame` returned inconsistent index metadata compared to non-empty results (GH 48135)
- Bug in `DataFrameGroupBy.cumprod()`, `DataFrameGroupBy.cummin()`, and `DataFrameGroupBy.cummax()` (and Series variants) returning `Float64` instead of preserving the nullable integer dtype (e.g. `Int64`) when the group key contains `NA` (GH 65550)
- Bug in `GroupBy.any()` and `GroupBy.all()` returning `NaN` with `float64` dtype for unobserved categorical groups on NumPy `bool` data instead of the boolean identity value with `bool` dtype (GH 65100)
- Bug in `Resampler.agg()` raising `ValueError` with a dict of aggregations when applied to a `DataFrame.groupby()` with `as_index=False` (GH 52397)
- Bug in `Rolling.skew()` and `Rolling.kurt()` (and their `GroupBy` counterparts) returning `0.0` and `-3.0` respectively for degenerate windows or groups; these now return `NaN` (GH 62864)
- Bug in `Rolling.skew()` and `Rolling.kurt()` returning `NaN` for low-variance windows (GH 62946)
- Bug in `Rolling.sum()`, `Rolling.mean()`, `Rolling.median()`, `Rolling.min()`, and `Rolling.max()` where using `method="table"`, `engine="numba"`, and `engine_kwargs={"parallel": True}` could cause a segfault (GH 40454)
- Bug in `SeriesGroupBy.ohlc()` ignoring `as_index=False` (GH 65140)
- Bug in `DataFrame.groupby()` with a `Grouper` with `freq` raising `AttributeError` when all grouping keys are `NaT` (GH 43486)
- Bug in `Series.resample()` and `DataFrame.resample()` where same-frequency resampling with monthly, quarterly, or annual frequencies bypassed aggregation, returning the original values instead of the aggregation result (GH 18553)
Reshaping#
- Bug in `merge()` where merging on a `MultiIndex` containing `NaN` values mapped `NaN` keys to the last level value instead of `NaN` (GH 64492)
- Bug in `DataFrame.pivot_table()` with `margins=True` raising `TypeError` when `values` has an `ExtensionDtype` that cannot hold `NA` (e.g. `IntervalDtype` with an integer subtype) and no `columns` were specified (GH 55484)
- Bug in `Index.union()` where the result could be unsorted when both inputs were monotonic increasing but disjoint, when `sort` was not `False` (GH 54646)
- In `pivot_table()`, when `values` is empty, the aggregation will be computed on a Series of all NA values (GH 46475)
Sparse#
- Bug in `SparseArray.astype()` where converting a datetime64 `SparseArray` with `NaT` fill value to `"Sparse[int64]"` silently replaced the fill value with `0` instead of `iNaT` (GH 49631)
- Bug in indexing a `SparseArray` with an out-of-bounds integer equal to the length of the array returning the fill value instead of raising an `IndexError` (GH 64183)
ExtensionArray#
- Bug in numpy ufuncs like `numpy.isnan()` raising `TypeError` on `Series` or `Index` backed by PyArrow dtypes when `future.distinguish_nan_and_na` is `True` (GH 62506)
- Fixed bug in `Series.apply()` and `Series.map()` where nullable integer dtypes were converted to float, causing precision loss for large integers; now the nullable dtype will be preserved (GH 63903)
- Fixed the `is_monotonic_increasing` and `is_monotonic_decreasing` properties to dispatch to the underlying `ExtensionArray` implementation (GH 65585)
Styler#
Other#
- Bug in `Series.transform()` and `DataFrame.transform()` where passing a list of duplicate function names did not raise `errors.SpecificationError` (GH 54929)