What’s new in 3.1.0 (Month XX, 2026)#
These are the changes in pandas 3.1.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
enhancement1#
enhancement2#
Other enhancements#
DataFrameGroupBy.agg() now allows for the provided func to return a NumPy array (GH 63957)
Added ExtensionArray.count() (GH 64450)
Display formatting for float sequences in DataFrame cells now respects the display.precision option (GH 60503)
Improved the precision of float parsing in read_csv() (GH 64395)
Improved the string repr of pd.core.arrays.SparseArray (GH 64547)
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
notable_bug_fix1#
notable_bug_fix2#
Backwards incompatible API changes#
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| X | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed |
|---|---|---|
| X |
See Dependencies and Optional dependencies for more.
Other API changes#
APIs that accept an engine="numba" parameter with engine_kwargs will no longer pass through a nopython argument to numba.jit. This argument has had no effect since numba 0.59.0 (GH 64483).
Deprecations#
Deprecated Timestamp.dayofweek, Timestamp.dayofyear, Timestamp.daysinmonth in favor of Timestamp.day_of_week, Timestamp.day_of_year, Timestamp.days_in_month, respectively. The same deprecation applies to the corresponding attributes on Period, DatetimeIndex, PeriodIndex, and Series.dt (GH 46768)
Deprecated DataFrameGroupBy.agg() and Resampler.agg() unpacking a scalar when the provided func returns a Series or array of length 1; in the future this will result in the Series or array being in the result. Users should unpack the scalar in func itself (GH 64014)
Deprecated ExcelFile.parse(), use read_excel() instead (GH 58247)
Deprecated engine="fastparquet" and engine="auto" in read_parquet() and DataFrame.to_parquet(). The fastparquet library has been retired; use engine="pyarrow" or do not pass engine to use the default (GH 64597)
Deprecated arithmetic operations between pandas objects (DataFrame, Series, Index, and pandas-implemented ExtensionArray subclasses) and list-likes other than list, np.ndarray, ExtensionArray, Index, Series, DataFrame. For e.g. tuple or range, explicitly cast these to a supported object instead. In a future version, these will be treated as scalar-like for pointwise operation (GH 62423)
Deprecated automatic dtype promotion when reindexing with a fill_value that cannot be held by the original dtype. Explicitly cast to a common dtype instead (GH 53910)
Deprecated passing a non-dict (e.g. a list of dicts) to DataFrame.from_dict(). Use the DataFrame constructor instead (GH 58862)
Deprecated passing unnecessary *args and **kwargs to GroupBy.cumsum(), GroupBy.cumprod(), GroupBy.cummin(), GroupBy.cummax(), SeriesGroupBy.skew(), DataFrameGroupBy.skew(), SeriesGroupBy.take(), and DataFrameGroupBy.take(). The skipna parameter for the cum* methods is now an explicit keyword argument (GH 50407)
Deprecated setting values with DataFrame.at() and Series.at() when the key does not exist in the index, which previously expanded the object. Use .loc instead (GH 48323)
Deprecated the .name property of offset objects (e.g., Day, Hour). Use .rule_code instead (GH 64207)
Deprecated the dropna keyword in DataFrame.to_hdf(), HDFStore.put(), HDFStore.append(), and HDFStore.append_to_multiple(), and the io.hdf.dropna_table option. Use DataFrame.dropna() before writing instead (GH 32038)
Deprecated the float_precision argument in read_csv(), read_table(), and read_fwf(). All float precision modes now use the same converter (GH 64395)
Deprecated the weekday property on DatetimeIndex, DatetimeArray, PeriodIndex, PeriodArray, and Period. Use day_of_week instead. Timestamp.weekday() remains a method consistent with datetime.datetime.weekday() (GH 12816)
Deprecated the xlrd and pyxlsb engines in read_excel(). Use engine="calamine" instead (GH 56542)
Deprecated the default value of exact in assert_index_equal(); in a future version this will default to True instead of “equiv” (GH 57436)
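A few of the deprecations above name a direct replacement. The sketch below shows what migrated code might look like; the sample data and variable names are illustrative assumptions, and every call used here already exists in earlier pandas releases, so it should not depend on 3.1.0 itself.

```python
import numpy as np
import pandas as pd

# Underscore spellings replace dayofweek / dayofyear / daysinmonth.
ts = pd.Timestamp("2026-03-15")
day_fields = (ts.day_of_week, ts.day_of_year, ts.days_in_month)

# Build a frame from a list of dicts with the constructor,
# rather than passing a non-dict to DataFrame.from_dict().
records = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
df = pd.DataFrame(records)

# Add a new label with .loc rather than relying on .at to expand the object.
ser = pd.Series([1, 2], index=["x", "y"])
ser.loc["z"] = 3

# Cast a tuple to a supported list-like before arithmetic with a pandas object.
shifted = ser + np.asarray((10, 20, 30))

# Pass `exact` explicitly so that a future change of default
# does not silently alter what this check accepts.
pd.testing.assert_index_equal(pd.RangeIndex(3), pd.Index([0, 1, 2]), exact="equiv")
```

Spelling out exact="equiv" keeps the current behavior, in which a RangeIndex may stand in for an int64 Index; pass exact=True to opt in to the stricter comparison early.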
Performance improvements#
Performance improvement in casting integer and boolean dtypes to string[pyarrow] by using PyArrow’s native cast instead of element-wise conversion (GH 56505)
Performance improvement in DataFrame.__getitem__() when selecting a single column by label on a DataFrame with duplicate column names (GH 64126)
Performance improvement in Series.is_monotonic_increasing and Series.is_monotonic_decreasing for ArrowDtype and masked dtypes by dispatching to the ExtensionArray (GH 56619)
Performance improvement in GroupBy reductions and transformations for SparseDtype columns, which now use Cython instead of falling back to slow Python aggregation (GH 36123)
Performance improvement in bdate_range() and date_range() with freq="B" or freq="C" (business day frequencies) (GH 16463)
Performance improvement in infer_freq() (GH 64463)
Performance improvement in merge() and DataFrame.join() for many-to-many joins with sort=False (GH 56564)
Performance improvement in merge() with how="cross" (GH 38082)
Performance improvement in merge() with how="left" (GH 64370)
Performance improvement in read_csv() with engine="c" when reading from binary file-like objects (e.g. PyArrow S3 file handles) by avoiding unnecessary TextIOWrapper wrapping (GH 46823)
Performance improvement in read_html() and the Python CSV parser when thousands is set, fixing catastrophic regex backtracking on cells with many comma-separated digit groups followed by non-numeric text (GH 52619)
Performance improvement in read_sas() for compressed SAS7BDAT files by reusing the decompression buffer instead of allocating per row (GH 47339)
Performance improvement in util.hash_pandas_object() for PyArrow-backed string and binary types by using PyArrow’s dictionary_encode instead of converting to NumPy for factorization (GH 48964)
Performance improvement in DataFrame.insert() when the number of blocks is small (GH 57641)
Performance improvement in DataFrame.loc() with non-unique masked index (GH 56759)
Performance improvement in DataFrame.query() and DataFrame.eval() when the DataFrame contains PeriodDtype or IntervalDtype columns (GH 35247)
Performance improvement in DataFrame.to_stata() when writing object-dtype datetime columns with date formats that require year/month extraction (GH 64555)
Performance improvement in GroupBy.any() and GroupBy.all() for boolean-dtype columns (GH 37850)
Performance improvement in GroupBy.first() and GroupBy.last() for Extension Array dtypes, which no longer fall back to a slow apply-based implementation (GH 57591)
Performance improvement in GroupBy.quantile() (GH 64330)
Performance improvement in Index.get_indexer() for large monotonic indexes, which now uses binary search instead of building a hash table when the number of targets is small (GH 14273)
Performance improvement in NDFrame.__finalize__(), Series.to_numpy(), DataFrame.dtypes, and DataFrame.__getitem__() by reducing overhead from metadata propagation, memory sharing checks, and attribute setting (GH 57431)
Performance improvement in arrays.SparseArray.isna() by avoiding a dense-then-resparsify round-trip (GH 41023)
Performance improvement in datetime/timedelta unit conversion (e.g. datetime64[s] to datetime64[ns]) (GH 35025)
Performance improvement in indexing a DataFrame with a CategoricalIndex of Interval categories (GH 61928)
Performance improvement in indexing a MultiIndex with a list-like indexer (GH 55786)
Performance improvement in partial-string indexing on a monotonic decreasing DatetimeIndex or PeriodIndex (GH 64811)
Performance improvement in plotting DatetimeIndex with multiplied frequencies (e.g. "1000ms", "100s") (GH 50355)
Performance improvement in reading zip-compressed files (e.g. read_pickle(), read_csv()) on Python < 3.12 (GH 59279)
Performance improvement in repr of Series and DataFrame containing third-party array-like objects (e.g. xarray DataArray) in object dtype columns (GH 61809)
Performance improvement in DataFrame.loc() and DataFrame.iloc() setitem with a 2D list-of-lists value by avoiding a wasteful round-trip through an intermediate object array (GH 64229)
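To illustrate the idea behind the Index.get_indexer() entry above, here is a minimal sketch of looking up a handful of targets in a large sorted array by binary search instead of building a hash table. It is not the pandas implementation: the helper name get_indexer_sorted is hypothetical, and it assumes a non-empty, sorted array of unique values.

```python
import numpy as np


def get_indexer_sorted(values: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Return the position of each target in `values`, or -1 where absent.

    Assumes `values` is non-empty, sorted ascending, and unique.
    """
    # Binary search: O(len(targets) * log(len(values))), no hash table needed.
    pos = np.searchsorted(values, targets, side="left")
    # Targets past the end map to len(values); clip so indexing stays in range.
    clipped = np.minimum(pos, len(values) - 1)
    found = values[clipped] == targets
    return np.where(found, clipped, -1)


values = np.arange(0, 1_000_000, 2)  # large monotonic array of even numbers
targets = np.array([10, 11, 999_998, 2_000_000])
positions = get_indexer_sorted(values, targets)
```

With few targets, the cost is a few dozen comparisons per lookup, whereas a hash table must first visit every element of the large array.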
Bug fixes#
Fixed bug in to_datetime() that could give an unnecessary RuntimeWarning when converting a DataFrame containing missing values (GH 64141)
Fixed bug in to_timedelta() and Timedelta not accepting Day offsets (GH 64240)
Categorical#
Datetimelike#
Bug in Timestamp constructor where passing np.str_ objects would fail in Cython string parsing (GH 48974)
Bug in Timestamp constructor, Timedelta constructor, to_datetime(), and to_timedelta() with non-round float input and unit failing to raise when the value is just outside the representable bounds (GH 57366)
Bug in date_range() where inclusive parameter failed to filter endpoints when only start and periods or end and periods were specified (GH 46331)
Bug in date_range() where calendar-based offsets (e.g. MS, ME, QS, YS) could exclude the last offset boundary when end’s time-of-day was earlier than start’s (GH 35342)
Bug in to_datetime() and to_timedelta() on ARM platforms where round float values outside the int64 domain (e.g. float(2**63)) could silently produce incorrect results instead of raising (GH 64619)
Bug in to_datetime() and to_timedelta() where uint64 values greater than int64 max silently overflowed instead of raising OutOfBoundsDatetime or OutOfBoundsTimedelta (GH 60677)
Bug in DatetimeArray.isin() and TimedeltaArray.isin() where mismatched resolutions could silently truncate finer-resolution values, leading to false matches (GH 64545)
Bug in adding non-nano DatetimeIndex with non-vectorized offsets (e.g. CustomBusinessDay, CustomBusinessMonthEnd) having a sub-unit offset parameter incorrectly truncating the result or raising AttributeError (GH 56586)
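The isin() entry above comes down to which side of a mixed-resolution comparison gets cast. The sketch below uses plain NumPy datetime64 arrays with made-up values to show why casting the finer resolution down can produce a false match; it illustrates the general hazard and is not the pandas code path.

```python
import numpy as np

# One value with millisecond resolution, one with second resolution.
fine = np.array(["2026-01-01T00:00:00.500"], dtype="datetime64[ms]")
coarse = np.array(["2026-01-01T00:00:00"], dtype="datetime64[s]")

# Casting the finer value down to seconds drops the 500 ms,
# so two different instants compare equal.
match_after_truncation = bool((fine.astype("datetime64[s]") == coarse)[0])

# Casting the coarser value up is lossless, so the comparison stays exact.
match_after_upcast = bool((fine == coarse.astype("datetime64[ms]"))[0])
```

Comparing at the finer of the two resolutions avoids the false match.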
Timedelta#
Bug in DateOffset where DateOffset(1) and DateOffset(days=1) returned different results near daylight saving time transitions (GH 61862)
Bug in to_timedelta() where passing np.str_ objects would fail in Cython string parsing (GH 48974)
Timezones#
Bug in DatetimeIndex addition with a DateOffset that has only timedelta components (e.g. DateOffset(hours=-2)) raising ValueError near DST transitions, while scalar Timestamp addition worked correctly (GH 28610)
Numeric#
Fixed bug in read_excel() where a column with a mixture of numeric and boolean values was typecast based on the data type of the first value to appear, since 1 == True and 0 == False (GH 60088)
Fixed bug in Series.clip() where passing a scalar numpy array (e.g. np.array(0)) would raise a TypeError (GH 59053)
Fixed bug in Series.mean() and Series.sum() (and their DataFrame counterparts) overflowing for float16 dtypes instead of upcasting to float64 (GH 43929)
Fixed bug in Series.skew() and Series.kurt() (and their DataFrame counterparts) returning 0.0 for degenerate distributions; these now return NaN (GH 62864)
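The float16 entry above is easier to see with numbers. The sketch below uses plain NumPy and made-up values to show the overflow that accumulating in float16 produces, and how upcasting to float64 first avoids it; it illustrates the underlying issue and is not the pandas reduction code.

```python
import numpy as np

# The largest finite float16 is 65504, so this sum cannot be held in float16.
values = np.array([60000.0, 60000.0], dtype=np.float16)

with np.errstate(over="ignore"):  # silence the expected overflow warning
    narrow_sum = values.sum()  # accumulated and returned as float16

# Upcasting before reducing keeps the result finite.
wide_sum = values.astype(np.float64).sum()
```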
Conversion#
Fixed pandas.array() to preserve mask information when converting NumPy masked arrays, converting masked values to missing values (GH 63879)
Fixed bug in DataFrame.convert_dtypes() where values were dropped from sliced DataFrame objects with mixed dtypes when the internal block structure spanned multiple columns (GH 64702)
Fixed bug in DataFrame.from_records() where exclude was ignored when data was an iterator and nrows=0 (GH 63774)
Strings#
Interval#
Indexing#
Bugs in setitem-with-expansion when adding new rows failing to keep the original dtype in some cases (GH 32346, GH 15231, GH 47503, GH 6485, GH 25383, GH 52235, GH 17026, GH 56010)
Bug in DataFrame.iloc() setitem raising AttributeError when assigning a Series or Index with a nullable EA dtype (e.g. Int64, Float64, boolean) into a column with a NumPy dtype (GH 47776)
Bug in DataFrame.mask() with inplace=True where incorrect values were produced when other was a Series with ExtensionArray values (GH 64635)
Bug in DataFrame.where() and DataFrame.mask() raising TypeError when cond is a Series and axis=1 (GH 58190)
Bug in DataFrame.xs() where drop_level=False was ignored for fully specified MultiIndex keys when level was not explicitly provided (GH 6507)
Bug in Index.get_level_values() mishandling boolean, NA-like (np.nan, pd.NA, pd.NaT) and integer index names (GH 62169)
Bug in Index.get_loc() raising KeyError when looking up a tuple in an object-dtype Index with duplicates (GH 37800)
Bug in Index.insert() silently casting booleans to numeric when used with nullable numeric dtypes like Float64 or Int64 (GH 61709)
Missing#
Bug in DataFrame.fillna() with a dict value raising RecursionError when columns are a MultiIndex with duplicate entries (GH 53498)
MultiIndex#
Bug in MultiIndex.sortlevel() not raising TypeError when sorting a level with incomparable types (e.g., Timestamp and str) (GH 21136)
I/O#
Fixed bug in read_csv() with the c engine where an embedded \r followed by a space in an unquoted field could cause an infinite re-parsing loop, producing spurious rows or a buffer overflow (GH 51141)
Fixed bug in read_excel() where usage of skiprows could lead to an infinite loop (GH 64027)
Fixed bug in HDFStore.put() where string extension dtype columns raised errors when using compression (GH 64180)
Fixed read_json() with lines=True and chunksize to respect nrows when the requested row count is not a multiple of the chunk size (GH 64025)
Bug in DataFrame.to_stata() raising KeyError when column names require renaming and convert_dates is specified for a different column (GH 60536)
Fixed read_json() with lines=True and nrows=0 to return an empty DataFrame (GH 64025)
Fixed bug in HDFStore.select() where passing where as a list of conditions referencing caller-scope variables failed on Python 3.12+ due to PEP 709 inlining list comprehension stack frames (GH 64881)
Period#
Plotting#
Groupby/resample/rolling#
Bug in DataFrameGroupBy.agg() when there are no groups, multiple keys, and group_keys=False (GH 51445)
Bug in DataFrameGroupBy.agg() where it would operate on the group as a whole when args or kwargs are supplied for the provided func; now this method only operates on each Series of the group (GH 39169)
Bug in Rolling.skew() and Rolling.kurt() (and their GroupBy counterparts) returning 0.0 and -3.0 respectively for degenerate windows or groups; these now return NaN (GH 62864)
Bug in Rolling.skew() and Rolling.kurt() returning NaN for low-variance windows (GH 62946)
Reshaping#
Bug in merge() where merging on a MultiIndex containing NaN values mapped NaN keys to the last level value instead of NaN (GH 64492)
In pivot_table(), when values is empty, the aggregation will be computed on a Series of all NA values (GH 46475)
Sparse#
Bug in indexing a SparseArray with an out-of-bounds integer equal to the length of the array returning the fill value instead of raising an IndexError (GH 64183)
ExtensionArray#
Fixed bug in Series.apply() and Series.map() where nullable integer dtypes were converted to float, causing precision loss for large integers; now the nullable dtype will be preserved (GH 63903)
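The precision loss mentioned above is a property of float64 rather than of apply() or map(): integers above 2**53 are not all representable. A small sketch with an illustrative value shows what a round trip through float does to a nullable integer array. It demonstrates the hazard only and does not depend on the fixed behavior.

```python
import pandas as pd

big = 2**53 + 1  # smallest positive integer that float64 cannot represent
arr = pd.array([big, None], dtype="Int64")

# Converting to float64 rounds the value to the nearest representable float
# (the missing entry becomes NaN).
as_float = arr.astype("float64")
round_tripped = int(as_float[0])

# The nullable integer array itself holds the exact value.
exact = int(arr[0])
```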