What’s new in 3.0.0 (Month XX, 2024)#
These are the changes in pandas 3.0.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
Other enhancements#
- pandas.api.typing.FrozenList is available for typing the outputs of MultiIndex.names, MultiIndex.codes and MultiIndex.levels (GH 58237)
- pandas.api.typing.SASReader is available for typing the output of read_sas() (GH 55689)
- DataFrame.to_excel() now raises a UserWarning when the character count in a cell exceeds Excel’s limitation of 32767 characters (GH 56954)
- pandas.merge() now validates the how parameter input (merge type) (GH 59435)
- read_spss() now supports kwargs to be passed to pyreadstat (GH 56356)
- read_stata() now returns datetime64 resolutions better matching those natively stored in the stata format (GH 55642)
- DataFrame.agg() called with axis=1 and a func which relabels the result index now raises a NotImplementedError (GH 58807)
- Index.get_loc() now also accepts subclasses of tuple as keys (GH 57922)
- Styler.set_tooltips() provides an alternative method of storing tooltips, using the title attribute of td elements (GH 56981)
- Added missing parameter weights in DataFrame.plot.kde() for the estimation of the PDF (GH 59337)
- Allow dictionaries to be passed to pandas.Series.str.replace() via the pat parameter (GH 51748)
- Support passing a Series input to json_normalize() that retains the Series Index (GH 51452)
- Support reading value labels from Stata 108-format (Stata 6) and earlier files (GH 58154)
- Users can globally disable any PerformanceWarning by setting the option mode.performance_warnings to False (GH 56920)
- Styler.format_index_names() can now be used to format the index and column names (GH 48936 and GH 47489)
- errors.DtypeWarning improved to include column names when mixed data types are detected (GH 58174)
- Series now supports the Arrow PyCapsule Interface for export (GH 59518)
- DataFrame.to_excel() argument merge_cells now accepts a value of "columns" to only merge MultiIndex column header cells (GH 35384)
- DataFrame.corrwith() now accepts min_periods as an optional argument, as in DataFrame.corr() and Series.corr() (GH 9490)
- DataFrame.cummin(), DataFrame.cummax(), DataFrame.cumprod() and DataFrame.cumsum() methods now have a numeric_only parameter (GH 53072)
- DataFrame.ewm() now allows adjust=False when times is provided (GH 54328)
- DataFrame.fillna() and Series.fillna() can now accept value=None; for non-object dtype the corresponding NA value will be used (GH 57723)
- DataFrame.pivot_table() and pivot_table() now allow the passing of keyword arguments to aggfunc through **kwargs (GH 57884)
- Series.cummin() and Series.cummax() now support CategoricalDtype (GH 52335)
- Series.plot() now correctly handles the ylabel parameter for pie charts, allowing for explicit control over the y-axis label (GH 58239)
- DataFrame.plot.scatter() argument c now accepts a column of strings, where rows with the same string are colored identically (GH 16827 and GH 16485)
- pandas.concat() will raise a ValueError when ignore_index=True and keys is not None (GH 59274)
- str.get_dummies() now accepts a dtype parameter to specify the dtype of the resulting DataFrame (GH 47872)
- Multiplying two DateOffset objects will now raise a TypeError instead of a RecursionError (GH 59442)
- Restore support for reading Stata 104-format and enable reading 103-format dta files (GH 58554)
- Support passing an Iterable[Hashable] input to DataFrame.drop_duplicates() (GH 59237)
- Support reading Stata 102-format (Stata 1) dta files (GH 58978)
- Support reading Stata 110-format (Stata 7) dta files (GH 47176)
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
Improved behavior in groupby for observed=False#
A number of bugs have been fixed due to improved handling of unobserved groups (GH 55738). Everything in this section applies equally to SeriesGroupBy.
In previous versions of pandas, a single grouping with DataFrameGroupBy.apply() or DataFrameGroupBy.agg() would pass the unobserved groups to the provided function, resulting in 0 below.
In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "key1": pd.Categorical(list("aabb"), categories=list("abc")),
   ...:         "key2": [1, 1, 1, 2],
   ...:         "values": [1, 2, 3, 4],
   ...:     }
   ...: )
   ...:
In [2]: df
Out[2]:
  key1  key2  values
0    a     1       1
1    a     1       2
2    b     1       3
3    b     2       4
In [3]: gb = df.groupby("key1", observed=False)
In [4]: gb[["values"]].apply(lambda x: x.sum())
Out[4]:
      values
key1
a          3
b          7
c          0
However, this was not the case when using multiple groupings, resulting in NaN below.
In [1]: gb = df.groupby(["key1", "key2"], observed=False)
In [2]: gb[["values"]].apply(lambda x: x.sum())
Out[2]:
           values
key1 key2
a    1        3.0
     2        NaN
b    1        3.0
     2        4.0
c    1        NaN
     2        NaN
Now using multiple groupings will also pass the unobserved groups to the provided function.
In [5]: gb = df.groupby(["key1", "key2"], observed=False)
In [6]: gb[["values"]].apply(lambda x: x.sum())
Out[6]:
           values
key1 key2
a    1          3
     2          0
b    1          3
     2          4
c    1          0
     2          0
Similarly:
- In previous versions of pandas, the method DataFrameGroupBy.sum() would result in 0 for unobserved groups, but DataFrameGroupBy.prod(), DataFrameGroupBy.all(), and DataFrameGroupBy.any() would all result in NA values. Now these methods result in 1, True, and False respectively.
- DataFrameGroupBy.groups() did not include unobserved groups and now does.
These improvements also fixed certain bugs in groupby:
- DataFrameGroupBy.agg() would fail when there are multiple groupings, unobserved groups, and as_index=False (GH 36698)
- DataFrameGroupBy.groups() with sort=False would sort groups; they now occur in the order they are observed (GH 56966)
- DataFrameGroupBy.nunique() would fail when there are multiple groupings, unobserved groups, and as_index=False (GH 52848)
- DataFrameGroupBy.sum() would have incorrect values when there are multiple groupings, unobserved groups, and non-numeric data (GH 43891)
- DataFrameGroupBy.value_counts() would produce incorrect results when used with some categorical and some non-categorical groupings and observed=False (GH 56016)
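The following is a minimal sketch of the same effect using a plain sum on the example frame from this section. It passes observed explicitly, so the result does not depend on the default, which this release changes to True for DataFrame.groupby() and Series.groupby(). The variable names are illustrative.

```python
import pandas as pd

# Same example frame as above: category "c" never occurs in the data.
df = pd.DataFrame(
    {
        "key1": pd.Categorical(list("aabb"), categories=list("abc")),
        "key2": [1, 1, 1, 2],
        "values": [1, 2, 3, 4],
    }
)

# observed=False keeps the unobserved category "c"; its sum is 0.
with_unobserved = df.groupby("key1", observed=False)["values"].sum()

# observed=True (the new default) drops categories that never occur.
observed_only = df.groupby("key1", observed=True)["values"].sum()

print(with_unobserved)
print(observed_only)
```

Spelling out observed also keeps code that relied on the old default (False) behaving the same after upgrading.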
Backwards incompatible API changes#
Datetime resolution inference#
Converting a sequence of strings, datetime objects, or np.datetime64 objects to a datetime64 dtype now performs inference on the appropriate resolution (AKA unit) for the output dtype. This affects Series, DataFrame, Index, DatetimeIndex, and to_datetime().
Previously, these would always give nanosecond resolution:
In [1]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()
In [2]: pd.to_datetime([dt]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.Index([dt]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.DatetimeIndex([dt]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.Series([dt]).dtype
Out[5]: dtype('<M8[ns]')
This now infers the microsecond unit “us” from the pydatetime object, matching the scalar Timestamp behavior.
In [7]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()
In [8]: pd.to_datetime([dt]).dtype
Out[8]: dtype('<M8[us]')
In [9]: pd.Index([dt]).dtype
Out[9]: dtype('<M8[us]')
In [10]: pd.DatetimeIndex([dt]).dtype
Out[10]: dtype('<M8[us]')
In [11]: pd.Series([dt]).dtype
Out[11]: dtype('<M8[us]')
Similarly, when passed a sequence of np.datetime64 objects, the resolution of the passed objects will be retained (or, for lower-than-second resolution, second resolution will be used).
When passing strings, the resolution will depend on the precision of the string, again matching the Timestamp behavior. Previously:
In [2]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
Out[5]: dtype('<M8[ns]')
The inferred resolution now matches that of the input strings:
In [12]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
Out[12]: dtype('<M8[s]')
In [13]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
Out[13]: dtype('<M8[ms]')
In [14]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
Out[14]: dtype('<M8[us]')
In [15]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
Out[15]: dtype('<M8[ns]')
In cases with mixed-resolution inputs, the highest resolution is used:
In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
Out[2]: dtype('<M8[ns]')
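Code that assumes nanosecond resolution can request it explicitly rather than relying on inference. The sketch below uses DatetimeIndex.as_unit() and Series.dt.as_unit(), which are available in pandas 2.0 and later. It assumes the downstream code only needs a fixed unit, not the inferred one.

```python
import datetime as dt

import pandas as pd

# A Python datetime: pandas 3.0 infers "us" here, older versions gave "ns".
stamp = dt.datetime(2024, 3, 22, 11, 36)
idx = pd.to_datetime([stamp])
print(idx.dtype)

# Pin the resolution explicitly so the result is the same across versions.
idx_ns = idx.as_unit("ns")

# The Series accessor offers the same conversion.
ser_ns = pd.Series(idx).dt.as_unit("ns")

print(idx_ns.dtype, ser_ns.dtype)
```

Converting the unit does not change the represented instant; only the storage resolution differs.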
Increased minimum version for Python#
pandas 3.0.0 supports Python 3.10 and higher.
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
Package | Minimum Version | Required | Changed |
---|---|---|---|
numpy | 1.23.5 | X | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
Package | New Minimum Version |
---|---|
pytz | 2023.4 |
fastparquet | 2023.10.0 |
adbc-driver-postgresql | 0.10.0 |
mypy (dev) | 1.9.0 |
See Dependencies and Optional dependencies for more.
pytz is now an optional dependency#
pandas now uses zoneinfo from the standard library as the default timezone implementation when passing a timezone string to various methods. (GH 34916)
Old behavior:
In [1]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
In [2]: ts.tz
Out[2]: <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>
New behavior:
In [16]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
In [17]: ts.tz
Out[17]: zoneinfo.ZoneInfo(key='US/Pacific')
pytz timezone objects are still supported when passed directly, but they will no longer be returned by default from string inputs. Moreover, pytz is no longer a required dependency of pandas, but can be installed with the pip extra: pip install pandas[timezone].
Additionally, pandas no longer throws pytz exceptions for timezone operations leading to ambiguous or nonexistent times. These cases will now raise a ValueError.
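To pin the timezone implementation regardless of how pandas resolves strings, one option is to pass a zoneinfo.ZoneInfo object directly. The sketch below assumes the IANA database is available, either from the system or from the tzdata package. It uses the canonical "America/Los_Angeles" name rather than the "US/Pacific" alias from the example above.

```python
import datetime as dt
from zoneinfo import ZoneInfo

import pandas as pd

# An explicit tzinfo object: the implementation is exactly what was passed.
explicit = pd.Timestamp(2024, 1, 1).tz_localize(ZoneInfo("America/Los_Angeles"))

# A timezone string: zoneinfo-backed in pandas 3.0 (pytz-backed before).
from_string = pd.Timestamp(2024, 1, 1).tz_localize("America/Los_Angeles")

# Either way, January in Los Angeles is PST, i.e. UTC-8.
expected_offset = dt.timedelta(hours=-8)
print(type(explicit.tz).__name__, explicit.utcoffset() == expected_offset)
```

Both timestamps denote the same instant, so comparisons between them are unaffected by which library backs the timezone.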
Other API changes#
- 3rd party py.path objects are no longer explicitly supported in IO methods. Use pathlib.Path objects instead (GH 57091)
- read_table()’s parse_dates argument defaults to None to improve consistency with read_csv() (GH 57476)
- All classes inheriting from builtin tuple (including types created with collections.namedtuple()) are now hashed and compared as builtin tuple during indexing operations (GH 57922)
- Made dtype a required argument in ExtensionArray._from_sequence_of_strings() (GH 56519)
- Passing a Series input to json_normalize() will now retain the Series Index; previously the output had a new RangeIndex (GH 51452)
- Removed Index.sort(), which always raised a TypeError. This attribute is not defined and will raise an AttributeError (GH 59283)
- Updated DataFrame.to_excel() so that the output spreadsheet has no styling. Custom styling can still be done using Styler.to_excel() (GH 54154)
- Pickle and HDF (.h5) files created with Python 2 are no longer explicitly supported (GH 57387)
- Pickled objects from pandas versions earlier than 1.0.0 are no longer supported (GH 57155)
- When comparing the indexes in testing.assert_series_equal(), check_exact defaults to True if an Index is of integer dtype (GH 57386)
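Here is a minimal sketch of replacing py.path objects with pathlib.Path in IO calls. The temporary directory and the file name example.csv are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    # pathlib.Path objects are accepted directly by IO writers and readers.
    target = Path(tmp) / "example.csv"
    df.to_csv(target, index=False)
    roundtrip = pd.read_csv(target)

print(roundtrip)
```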
Deprecations#
Copy keyword#
The copy keyword argument in the following methods is deprecated and will be removed in a future version:
- DataFrame.merge() / pd.merge()
Copy-on-Write utilizes a lazy copy mechanism that defers copying the data until necessary. Use .copy() to trigger an eager copy. The copy keyword has no effect starting with 3.0, so it can be safely removed from your code.
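As a sketch of the migration, drop the copy argument from merge and call .copy() only where an independent object is actually needed. The frames and column names are illustrative.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2], "lval": [10, 20]})
right = pd.DataFrame({"key": [1, 2], "rval": [100, 200]})

# Before: pd.merge(left, right, on="key", copy=False)
# The copy keyword can simply be removed.
merged = pd.merge(left, right, on="key")

# Request an eager, independent copy explicitly when one is needed.
snapshot = merged.copy()
snapshot.loc[0, "lval"] = -1  # modifies only the snapshot

print(merged)
print(snapshot)
```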
Other Deprecations#
- Deprecated core.internals.api.make_block(), use public APIs instead (GH 56815)
- Deprecated DataFrameGroupBy.corrwith() (GH 57158)
- Deprecated Timestamp.utcfromtimestamp(), use Timestamp.fromtimestamp(ts, "UTC") instead (GH 56680)
- Deprecated Timestamp.utcnow(), use Timestamp.now("UTC") instead (GH 56680)
- Deprecated allowing non-keyword arguments in DataFrame.all(), DataFrame.min(), DataFrame.max(), DataFrame.sum(), DataFrame.prod(), DataFrame.mean(), DataFrame.median(), DataFrame.sem(), DataFrame.var(), DataFrame.std(), DataFrame.skew(), DataFrame.kurt(), Series.all(), Series.min(), Series.max(), Series.sum(), Series.prod(), Series.mean(), Series.median(), Series.sem(), Series.var(), Series.std(), Series.skew(), and Series.kurt(). (GH 57087)
- Deprecated allowing non-keyword arguments in Series.to_markdown() except buf. (GH 57280)
- Deprecated allowing non-keyword arguments in Series.to_string() except buf. (GH 57280)
- Deprecated behavior of DataFrameGroupBy.groups() and SeriesGroupBy.groups(): in a future version, grouping by a one-element list will return tuple keys instead of scalars. (GH 58858)
- Deprecated behavior of Series.dt.to_pytimedelta(); in a future version this will return a Series containing python datetime.timedelta objects instead of an ndarray of timedelta; this matches the behavior of other Series.dt() properties. (GH 57463)
- Deprecated lowercase strings d, b and c denoting frequencies in Day, BusinessDay and CustomBusinessDay in favour of D, B and C (GH 58998)
- Deprecated lowercase strings w, w-mon, w-tue, etc. denoting frequencies in Week in favour of W, W-MON, W-TUE, etc. (GH 58998)
- Deprecated parameter method in DataFrame.reindex_like() / Series.reindex_like() (GH 58667)
- Deprecated strings w, d, MIN, MS, US and NS denoting units in Timedelta in favour of W, D, min, ms, us and ns (GH 59051)
- Deprecated using epoch date format in DataFrame.to_json() and Series.to_json(), use iso instead. (GH 57063)
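The sketch below applies the replacements named in the list above. It uses keyword arguments for reductions, the timezone-aware Timestamp constructors, upper-case frequency strings, lower-case Timedelta units, and the iso JSON date format. The sample data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Pass reduction options by keyword: df.sum(axis=0) rather than df.sum(0).
col_sums = df.sum(axis=0)

# Timezone-aware replacements for the deprecated utc* constructors.
now_utc = pd.Timestamp.now("UTC")             # instead of Timestamp.utcnow()
epoch = pd.Timestamp.fromtimestamp(0, "UTC")  # instead of utcfromtimestamp(0)

# Upper-case frequency alias and lower-case Timedelta unit.
days = pd.date_range("2024-01-01", periods=3, freq="D")  # not "d"
delta = pd.Timedelta(5, unit="min")                      # not "MIN"

# ISO dates in JSON output instead of the epoch format.
json_text = pd.Series(days).to_json(date_format="iso")
print(json_text)
```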
Removal of prior version deprecations/changes#
Enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets#
Renamed the following offset aliases (GH 57986):
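offset | removed alias | new alias |
---|---|---|
MonthEnd | M | ME |
BusinessMonthEnd | BM | BME |
SemiMonthEnd | SM | SME |
CustomBusinessMonthEnd | CBM | CBME |
QuarterEnd | Q | QE |
BQuarterEnd | BQ | BQE |
YearEnd | Y | YE |
BYearEnd | BY | BYE |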
Other Removals#
- DataFrameGroupBy.idxmin, DataFrameGroupBy.idxmax, SeriesGroupBy.idxmin, and SeriesGroupBy.idxmax will now raise a ValueError when used with skipna=False and an NA value is encountered (GH 10694)
- concat() no longer ignores empty objects when determining output dtypes (GH 39122)
- concat() with all-NA entries no longer ignores the dtype of those entries when determining the result dtype (GH 40893)
- read_excel(), read_json(), read_html(), and read_xml() no longer accept raw string or byte representation of the data. That type of data must be wrapped in a StringIO or BytesIO (GH 53767)
- to_datetime() with a unit specified no longer parses strings into floats, instead parses them the same way as without unit (GH 50735)
- DataFrame.groupby() with as_index=False and aggregation methods will no longer exclude from the result the groupings that do not arise from the input (GH 49519)
- ExtensionArray._reduce() now requires a keepdims: bool = False parameter in the signature (GH 52788)
- Series.dt.to_pydatetime() now returns a Series of datetime.datetime objects (GH 52459)
- SeriesGroupBy.agg() no longer pins the name of the group to the input passed to the provided func (GH 51703)
- All arguments except name in Index.rename() are now keyword only (GH 56493)
- All arguments except the first path-like argument in IO writers are now keyword only (GH 54229)
- Changed behavior of Series.__getitem__() and Series.__setitem__() to always treat integer keys as labels, never as positional, consistent with DataFrame behavior (GH 50617)
- Changed behavior of Series.__getitem__(), Series.__setitem__(), DataFrame.__getitem__(), DataFrame.__setitem__() with an integer slice on objects with a floating-dtype index. This is now treated as positional indexing (GH 49612)
- Disallow a callable argument to Series.iloc() to return a tuple (GH 53769)
- Disallow logical operations (|, &, ^) between pandas objects and dtype-less sequences (e.g. list, tuple); wrap the objects in Series, Index, or np.array first instead (GH 52264)
- Disallow automatic casting to object in Series logical operations (&, ^, |) between series with mismatched indexes and dtypes other than object or bool (GH 52538)
- Disallow calling Series.replace() or DataFrame.replace() without a value and with non-dict-like to_replace (GH 33302)
- Disallow constructing an arrays.SparseArray with scalar data (GH 53039)
- Disallow indexing an Index with a boolean indexer of length zero; it now raises a ValueError (GH 55820)
- Disallow non-standard arguments (anything other than np.ndarray, Index, ExtensionArray, or Series) to isin(), unique(), and factorize() (GH 52986)
- Disallow passing a pandas type to Index.view() (GH 55709)
- Disallow units other than “s”, “ms”, “us”, “ns” for datetime64 and timedelta64 dtypes in array() (GH 53817)
- Removed “freq” keyword from PeriodArray constructor, use “dtype” instead (GH 52462)
- Removed ‘fastpath’ keyword in Categorical constructor (GH 20110)
- Removed ‘kind’ keyword in Series.resample() and DataFrame.resample() (GH 58125)
- Removed Block, DatetimeTZBlock, ExtensionBlock, create_block_manager_from_blocks from pandas.core.internals and pandas.core.internals.api (GH 55139)
- Removed alias arrays.PandasArray for arrays.NumpyExtensionArray (GH 53694)
- Removed deprecated “method” and “limit” keywords from Series.replace() and DataFrame.replace() (GH 53492)
- Removed extension test classes BaseNoReduceTests, BaseNumericReduceTests, BaseBooleanReduceTests (GH 54663)
- Removed the “closed” and “normalize” keywords in DatetimeIndex.__new__() (GH 52628)
- Removed the deprecated delim_whitespace keyword in read_csv() and read_table(), use sep=r"\s+" instead (GH 55569)
- Require SparseDtype.fill_value() to be a valid value for the SparseDtype.subtype() (GH 53043)
- Stopped automatically casting non-datetimelike values (mainly strings) in Series.isin() and Index.isin() with datetime64, timedelta64, and PeriodDtype dtypes (GH 53111)
- Stopped performing dtype inference in Index, Series and DataFrame constructors when given a pandas object (Series, Index, ExtensionArray); call .infer_objects() on the input to keep the previous behavior (GH 56012)
- Stopped performing dtype inference when setting an Index into a DataFrame (GH 56102)
- Stopped performing dtype inference in Index.insert() with object-dtype index; this often affects the index/columns that result when setting new entries into an empty Series or DataFrame (GH 51363)
- Removed the “closed” and “unit” keywords in TimedeltaIndex.__new__() (GH 52628, GH 55499)
- All arguments in Index.sort_values() are now keyword only (GH 56493)
- All arguments in Series.to_dict() are now keyword only (GH 56493)
- Changed the default value of na_action in Categorical.map() to None (GH 51645)
- Changed the default value of observed in DataFrame.groupby() and Series.groupby() to True (GH 51811)
- Enforce deprecation in testing.assert_series_equal() and testing.assert_frame_equal() with object dtype and mismatched null-like values, which are now considered not-equal (GH 18463)
- Enforce banning of upcasting in in-place setitem-like operations (GH 59007) (see PDEP6)
- Enforced deprecation of all and any reductions with datetime64, DatetimeTZDtype, and PeriodDtype dtypes (GH 58029)
- Enforced deprecation disallowing float “periods” in date_range(), period_range(), timedelta_range(), and interval_range() (GH 56036)
- Enforced deprecation disallowing parsing datetimes with mixed time zones unless user passes utc=True to to_datetime() (GH 57275)
- Enforced deprecation in Series.value_counts() and Index.value_counts() with object dtype performing dtype inference on the .index of the result (GH 56161)
- Enforced deprecation of DataFrameGroupBy.get_group() and SeriesGroupBy.get_group() allowing the name argument to be a non-tuple when grouping by a list of length 1 (GH 54155)
- Enforced deprecation of Series.interpolate() and DataFrame.interpolate() for object-dtype (GH 57820)
- Enforced deprecation of offsets.Tick.delta(), use pd.Timedelta(obj) instead (GH 55498)
- Enforced deprecation of axis=None acting the same as axis=0 in the DataFrame reductions sum, prod, std, var, and sem; passing axis=None will now reduce over both axes; this is particularly the case when doing e.g. numpy.sum(df) (GH 21597)
- Enforced deprecation of core.internals members Block, ExtensionBlock, and DatetimeTZBlock (GH 58467)
- Enforced deprecation of date_parser in read_csv(), read_table(), read_fwf(), and read_excel() in favour of date_format (GH 50601)
- Enforced deprecation of keep_date_col keyword in read_csv() (GH 55569)
- Enforced deprecation of quantile keyword in Rolling.quantile() and Expanding.quantile(), renamed to q instead. (GH 52550)
- Enforced deprecation of argument infer_datetime_format in read_csv(), as a strict version of it is now the default (GH 48621)
- Enforced deprecation of combining parsed datetime columns in read_csv() in parse_dates (GH 55569)
- Enforced deprecation of non-standard arguments (anything other than np.ndarray, ExtensionArray, Index, or Series) to api.extensions.take() (GH 52981)
- Enforced deprecation of parsing system timezone strings to tzlocal, which depended on system timezone; pass the ‘tz’ keyword instead (GH 50791)
- Enforced deprecation of passing a dictionary to SeriesGroupBy.agg() (GH 52268)
- Enforced deprecation of string AS denoting frequency in YearBegin and strings AS-DEC, AS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 57793)
- Enforced deprecation of string A denoting frequency in YearEnd and strings A-DEC, A-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 57699)
- Enforced deprecation of string BAS denoting frequency in BYearBegin and strings BAS-DEC, BAS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 57793)
- Enforced deprecation of string BA denoting frequency in BYearEnd and strings BA-DEC, BA-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 57793)
- Enforced deprecation of strings H, BH, and CBH denoting frequencies in Hour, BusinessHour, CustomBusinessHour (GH 59143)
- Enforced deprecation of strings H, BH, and CBH denoting units in Timedelta (GH 59143)
- Enforced deprecation of strings T, L, U, and N denoting frequencies in Minute, Milli, Micro, Nano (GH 57627)
- Enforced deprecation of strings T, L, U, and N denoting units in Timedelta (GH 57627)
- Enforced deprecation of the behavior of concat() when len(keys) != len(objs) would truncate to the shorter of the two. Now this raises a ValueError (GH 43485)
- Enforced deprecation of the behavior of DataFrame.replace() and Series.replace() with CategoricalDtype that would introduce new categories. (GH 58270)
- Enforced deprecation of the behavior of Series.argsort() in the presence of NA values (GH 58232)
- Enforced deprecation of values “pad”, “ffill”, “bfill”, and “backfill” for Series.interpolate() and DataFrame.interpolate() (GH 57869)
- Enforced deprecation removing Categorical.to_list(), use obj.tolist() instead (GH 51254)
- Enforced silent-downcasting deprecation for all relevant methods (GH 54710)
- In DataFrame.stack(), the default value of future_stack is now True; specifying False will raise a FutureWarning (GH 55448)
- Iterating over a DataFrameGroupBy or SeriesGroupBy will return tuples of length 1 for the groups when grouping by level a list of length 1 (GH 50064)
- Methods apply, agg, and transform will no longer replace NumPy functions (e.g. np.sum) and built-in functions (e.g. min) with the equivalent pandas implementation; use string aliases (e.g. "sum" and "min") if you desire to use the pandas implementation (GH 53974)
- Passing both freq and fill_value in DataFrame.shift(), Series.shift() and DataFrameGroupBy.shift() now raises a ValueError (GH 54818)
- Removed support for bool dtype in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() (GH 53975)
- Removed DateOffset.is_anchored() and offsets.Tick.is_anchored() (GH 56594)
- Removed DataFrame.applymap, Styler.applymap and Styler.applymap_index (GH 52364)
- Removed DataFrame.bool and Series.bool (GH 51756)
- Removed DataFrame.first and DataFrame.last (GH 53710)
- Removed DataFrame.swapaxes and Series.swapaxes (GH 51946)
- Removed DataFrameGroupBy.grouper and SeriesGroupBy.grouper (GH 56521)
- Removed DataFrameGroupBy.fillna and SeriesGroupBy.fillna (GH 55719)
- Removed Index.format, use Index.astype() with str or Index.map() with a formatter function instead (GH 55439)
- Removed Resample.fillna (GH 55719)
- Removed Series.__int__ and Series.__float__. Call int(Series.iloc[0]) or float(Series.iloc[0]) instead. (GH 51131)
- Removed Series.ravel (GH 56053)
- Removed Series.view (GH 56054)
- Removed StataReader.close (GH 49228)
- Removed _data from DataFrame, Series, arrays.ArrowExtensionArray (GH 52003)
- Removed axis argument from DataFrame.groupby(), Series.groupby(), DataFrame.rolling(), Series.rolling(), DataFrame.resample(), and Series.resample() (GH 51203)
- Removed axis argument from all groupby operations (GH 50405)
- Removed convert_dtype from Series.apply() (GH 52257)
- Removed method, limit, fill_axis, and broadcast_axis keywords from DataFrame.align() (GH 51968)
- Removed pandas.api.types.is_interval and pandas.api.types.is_period, use isinstance(obj, pd.Interval) and isinstance(obj, pd.Period) instead (GH 55264)
- Removed pandas.io.sql.execute (GH 50185)
- Removed pandas.value_counts, use Series.value_counts() instead (GH 53493)
- Removed read_gbq and DataFrame.to_gbq. Use pandas_gbq.read_gbq and pandas_gbq.to_gbq instead https://pandas-gbq.readthedocs.io/en/latest/api.html (GH 55525)
- Removed use_nullable_dtypes from read_parquet() (GH 51853)
- Removed year, month, quarter, day, hour, minute, and second keywords in the PeriodIndex constructor, use PeriodIndex.from_fields() instead (GH 55960)
- Removed argument limit from DataFrame.pct_change(), Series.pct_change(), DataFrameGroupBy.pct_change(), and SeriesGroupBy.pct_change(); the argument method must be set to None and will be removed in a future version of pandas (GH 53520)
- Removed deprecated argument obj in DataFrameGroupBy.get_group() and SeriesGroupBy.get_group() (GH 53545)
- Removed deprecated behavior of Series.agg() using Series.apply() (GH 53325)
- Removed deprecated keyword method on Series.fillna(), DataFrame.fillna() (GH 57760)
- Removed option mode.use_inf_as_na, convert inf entries to NaN before instead (GH 51684)
- Removed support for DataFrame in DataFrame.from_records() (GH 51697)
- Removed support for errors="ignore" in to_datetime(), to_timedelta() and to_numeric() (GH 55734)
- Removed support for slice in DataFrame.take() (GH 51539)
- Removed the ArrayManager (GH 55043)
- Removed the fastpath argument from the Series constructor (GH 55466)
- Removed the is_boolean, is_integer, is_floating, holds_integer, is_numeric, is_categorical, is_object, and is_interval attributes of Index (GH 50042)
- Removed the ordinal keyword in PeriodIndex, use PeriodIndex.from_ordinals() instead (GH 55960)
- Removed unused arguments *args and **kwargs in Resampler methods (GH 50977)
- Unrecognized timezones when parsing strings to datetimes now raise a ValueError (GH 51477)
- Removed the Grouper attributes ax, groups, indexer, and obj (GH 51206, GH 51182)
- Removed deprecated keyword verbose on read_csv() and read_table() (GH 56556)
- Removed the method keyword in ExtensionArray.fillna, implement ExtensionArray._pad_or_backfill instead (GH 53621)
- Removed the attribute dtypes from DataFrameGroupBy (GH 51997)
- Enforced deprecation of argmin, argmax, idxmin, and idxmax returning a result when skipna=False and an NA value is encountered or all values are NA values; these operations will now raise in such cases (GH 33941, GH 51276)
Performance improvements#
- Eliminated circular reference to the original pandas object in accessor attributes (e.g. Series.str). However, accessor instantiation is no longer cached (GH 47667, GH 41357)
- Categorical.categories returns a RangeIndex instead of an Index if the constructed values was a range. (GH 57787)
- DataFrame returns RangeIndex columns when possible when data is a dict (GH 57943)
- Series returns a RangeIndex index when possible when data is a dict (GH 58118)
- concat() returns RangeIndex columns when possible when objs contains Series and DataFrame and axis=0 (GH 58119)
- concat() returns a RangeIndex level in the MultiIndex result when keys is a range or RangeIndex (GH 57542)
- RangeIndex.append() returns a RangeIndex instead of an Index when appending values that could continue the RangeIndex (GH 57467)
- Series.str.extract() returns RangeIndex columns instead of Index columns when possible (GH 57542)
- Series.str.partition() with ArrowDtype returns RangeIndex columns instead of Index columns when possible (GH 57768)
- Performance improvement in DataFrame when data is a dict and columns is specified (GH 24368)
- Performance improvement in MultiIndex: setting MultiIndex.names no longer invalidates all cached operations (GH 59578)
- Performance improvement in DataFrame.join() for sorted but non-unique indexes (GH 56941)
- Performance improvement in DataFrame.join() when left and/or right are non-unique and how is "left", "right", or "inner" (GH 56817)
- Performance improvement in DataFrame.join() with how="left" or how="right" and sort=True (GH 56919)
- Performance improvement in DataFrame.to_csv() when index=False (GH 59312)
- Performance improvement in DataFrameGroupBy.ffill(), DataFrameGroupBy.bfill(), SeriesGroupBy.ffill(), and SeriesGroupBy.bfill() (GH 56902)
- Performance improvement in Index.join() by propagating cached attributes in cases where the result matches one of the inputs (GH 57023)
- Performance improvement in Index.take() when indices is a full range indexer from zero to length of index (GH 56806)
- Performance improvement in Index.to_frame() returning RangeIndex columns instead of Index columns when possible. (GH 58018)
- Performance improvement in MultiIndex._engine() to use smaller dtypes if possible (GH 58411)
- Performance improvement in MultiIndex.equals() for equal length indexes (GH 56990)
- Performance improvement in MultiIndex.memory_usage() to ignore the index engine when it isn’t already cached. (GH 58385)
- Performance improvement in RangeIndex.__getitem__() with a boolean mask or integers returning a RangeIndex instead of an Index when possible. (GH 57588)
- Performance improvement in RangeIndex.append() when appending the same index (GH 57252)
- Performance improvement in RangeIndex.argmin() and RangeIndex.argmax() (GH 57823)
- Performance improvement in RangeIndex.insert() returning a RangeIndex instead of an Index when the RangeIndex is empty. (GH 57833)
- Performance improvement in RangeIndex.round() returning a RangeIndex instead of an Index when possible. (GH 57824)
- Performance improvement in RangeIndex.searchsorted() (GH 58376)
- Performance improvement in RangeIndex.to_numpy() when specifying an na_value (GH 58376)
- Performance improvement in RangeIndex.value_counts() (GH 58376)
- Performance improvement in RangeIndex.join() returning a RangeIndex instead of an Index when possible. (GH 57651, GH 57752)
- Performance improvement in RangeIndex.reindex() returning a RangeIndex instead of an Index when possible. (GH 57647, GH 57752)
- Performance improvement in RangeIndex.take() returning a RangeIndex instead of an Index when possible. (GH 57445, GH 57752)
- Performance improvement in merge() if hash-join can be used (GH 57970)
- Performance improvement in CategoricalDtype.update_dtype() when dtype is a CategoricalDtype with non-None categories and ordered (GH 59647)
- Performance improvement in to_hdf() by avoiding unnecessary reopenings of the HDF5 file, speeding up data addition to files with a very large number of groups. (GH 58248)
- Performance improvement in DataFrameGroupBy.__len__ and SeriesGroupBy.__len__ (GH 57595)
- Performance improvement in indexing operations for string dtypes (GH 56997)
- Performance improvement in unary methods on a RangeIndex returning a RangeIndex instead of an Index when possible. (GH 57825)
Bug fixes#
Categorical#
Datetimelike#
Bug in is_year_start where a DatetimeIndex constructed via a date_range with frequency ‘MS’ wouldn’t have the correct year or quarter start attributes (GH 57377)
Bug in Timestamp constructor failing to raise when tz=None is explicitly specified in conjunction with timezone-aware tzinfo or data (GH 48688)
Bug in date_range() where the last valid timestamp would sometimes not be produced (GH 56134)
Bug in date_range() where using a negative frequency value would not include all points between the start and end values (GH 56147)
Bug in tseries.api.guess_datetime_format() failing to infer the time format when “%Y” == “%H%M” (GH 57452)
Bug in tseries.frequencies.to_offset() failing to parse frequency strings starting with “LWOM” (GH 59218)
Bug in DataFrame.agg() with a DataFrame containing missing values resulting in an IndexError (GH 58810)
Bug in DatetimeIndex.is_year_start() and DatetimeIndex.is_quarter_start() with custom business day frequencies larger than “1C”, which now no longer raise (GH 58664)
Bug in DatetimeIndex.is_year_start() and DatetimeIndex.is_quarter_start() returning False on double-digit frequencies (GH 58523)
Bug in DatetimeIndex.union() and DatetimeIndex.intersection() when unit was non-nanosecond (GH 59036)
Bug in Series.dt.microsecond() producing incorrect results for pyarrow-backed Series. (GH 59154)
Bug in to_datetime() not respecting dayfirst if an uncommon date string was passed. (GH 58859)
Bug in to_datetime() reporting an incorrect index in failure scenarios. (GH 58298)
Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond datetime64, timedelta64 or DatetimeTZDtype incorrectly truncating those scalars (GH 56410)
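The date_range() fix for negative frequencies (GH 56147) concerns which points fall between start and end. The sketch below uses only the standard library to spell out the expected semantics when stepping backwards: every point from start down to end, inclusive. `date_points` is a hypothetical helper. It does not reproduce pandas' offset handling, such as month ends or business days.

```python
from datetime import date, timedelta


def date_points(start, end, step):
    """Hypothetical helper: all dates from start towards end (inclusive), spaced by step."""
    if step == timedelta(0):
        raise ValueError("step must be non-zero")
    forward = step > timedelta(0)
    points = []
    current = start
    # A positive step walks forwards until past end; a negative step walks backwards.
    while (current <= end) if forward else (current >= end):
        points.append(current)
        current += step
    return points


# Stepping back one day from Jan 5 to Jan 1 should yield all five days.
backwards = date_points(date(2024, 1, 5), date(2024, 1, 1), timedelta(days=-1))
print(backwards[0], backwards[-1], len(backwards))  # 2024-01-05 2024-01-01 5
```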
Timedelta
Accuracy improvement in Timedelta.to_pytimedelta() to round microseconds consistently for large nanosecond-based Timedelta values (GH 57841)
Bug in DataFrame.cumsum() raising IndexError if the dtype is timedelta64[ns] (GH 57956)
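One way to make nanosecond-to-microsecond conversion consistent for large values is to stay in integer arithmetic, because dividing a large nanosecond count by 1000 as a float can lose precision. The sketch below rounds half to even using divmod only. The helper name is hypothetical, and the choice of round-half-even (the tie-breaker datetime.timedelta uses for fractional microseconds) is an assumption. Neither describes the actual change made in GH 57841.

```python
from datetime import timedelta


def ns_to_timedelta(ns):
    """Hypothetical helper: integer nanoseconds -> timedelta, rounding half to even.

    divmod keeps everything in integer arithmetic, so the result does not depend on
    float precision however large ns is (within timedelta's range).
    """
    micros, rem = divmod(ns, 1000)  # floor division: 0 <= rem < 1000, also for negative ns
    if rem > 500 or (rem == 500 and micros % 2 == 1):
        micros += 1
    return timedelta(microseconds=micros)


print(ns_to_timedelta(1_500), ns_to_timedelta(2_500))  # 0:00:00.000002 0:00:00.000002
```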
Timezones
Numeric
Bug in DataFrame.quantile() where the column type was not preserved when numeric_only=True with a list-like q produced an empty result (GH 59035)
Bug in np.matmul with Index inputs raising a TypeError (GH 57079)
Conversion
Bug in DataFrame.astype() not casting values for Arrow-based dictionary dtype correctly (GH 58479)
Bug in DataFrame.update() converting bool dtype to object (GH 55509)
Bug in Series.astype() potentially modifying a read-only array in place when casting to a string dtype (GH 57212)
Bug in Series.reindex() not maintaining float32 type when a reindex introduces a missing value (GH 45857)
Strings
Bug in Series.value_counts() not respecting sort=False for a Series with string dtype (GH 55224)
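For reference, value_counts() with sort=False is meant to keep values in the order they first appear in the data rather than sorting by frequency. A plain-Python sketch of that contract follows. `value_counts` here is a hypothetical stand-in that returns a dict and relies on dicts preserving insertion order (Python 3.7+). Its tie-breaking under sort=True is this sketch's own choice and may not match pandas.

```python
def value_counts(values, sort=True):
    """Hypothetical stand-in: count occurrences, keyed in order of first appearance."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    if not sort:
        return counts  # the order of the data is preserved
    # Python's sort is stable (also with reverse=True), so equal counts keep
    # their first-appearance order.
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))


data = ["a", "b", "b", "c"]
print(list(value_counts(data, sort=False).items()))  # [('a', 1), ('b', 2), ('c', 1)]
print(list(value_counts(data).items()))  # [('b', 2), ('a', 1), ('c', 1)]
```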
Interval
Index.is_monotonic_decreasing(), Index.is_monotonic_increasing(), and Index.is_unique() could incorrectly be False for an Index created from a slice of another Index. (GH 57911)
Bug in interval_range() where start and end numeric types were always cast to 64-bit (GH 57268)
Indexing
Bug in DataFrame.__getitem__() returning modified columns when called with slice in Python 3.12 (GH 57500)
Bug in DataFrame.from_records() throwing a ValueError when passed an empty list in index (GH 58594)
Missing
Bug in DataFrame.fillna() and Series.fillna() that would ignore the limit argument on ExtensionArray dtypes (GH 58001)
MultiIndex
DataFrame.loc() with axis=0 and MultiIndex adds extra columns when setting a value (GH 58116)
DataFrame.melt() would not accept multiple names in var_name when the columns were a MultiIndex (GH 58033)
MultiIndex.insert() would not insert an NA value correctly at the unified location of index -1 (GH 59003)
MultiIndex.get_level_values() accessing a DatetimeIndex does not carry the frequency attribute along (GH 58327, GH 57949)
I/O
Bug in DataFrame and Series repr of collections.abc.Mapping elements. (GH 57915)
Bug in DataFrame.to_json() when "index" was a value in DataFrame.columns and Index.name was None. This now raises a ValueError (GH 58925)
Bug in DataFrame.to_dict() raising an unnecessary UserWarning when columns are not unique and orient='tight'. (GH 58281)
Bug in DataFrame.to_excel() when writing an empty DataFrame with MultiIndex on both axes (GH 57696)
Bug in DataFrame.to_stata() when writing DataFrame and byteorder="big". (GH 58969)
Bug in DataFrame.to_string() that raised StopIteration with nested DataFrames. (GH 16098)
Bug in HDFStore.get() failing to save data of dtype datetime64[s] correctly (GH 59004)
Bug in read_csv() causing a segmentation fault when encoding_errors is not a string. (GH 59059)
Bug in read_csv() raising TypeError when index_col is specified and na_values is a dict containing the key None. (GH 57547)
Bug in read_csv() raising TypeError when nrows and iterator are specified without specifying a chunksize. (GH 59079)
Bug in read_csv() where the order of na_values caused an inconsistency when na_values is a list of non-string values. (GH 59303)
Bug in read_excel() raising ValueError when passing an array of boolean values with dtype="boolean". (GH 58159)
Bug in read_json() not validating that the typ argument is exactly "frame" or "series" (GH 59124)
Bug in read_stata() raising KeyError when the input file is stored in big-endian format and contains strL data. (GH 58638)
Bug in read_stata() where extreme value integers were incorrectly interpreted as missing for format versions 111 and prior (GH 58130)
Bug in read_stata() where the missing code for double was not recognised for format versions 105 and prior (GH 58149)
Period
Fixed error message when passing an invalid period alias to PeriodIndex.to_timestamp() (GH 58974)
Plotting
Bug in DataFrameGroupBy.boxplot() failing when there were multiple groupings (GH 14701)
Bug in DataFrame.plot.line() raising ValueError when setting both color and a dict style (GH 59461)
Bug in DataFrame.plot() causing a shift to the right when the frequency multiplier is greater than one. (GH 57587)
Bug in Series.plot() with kind="pie" with ArrowDtype (GH 59192)
Groupby/resample/rolling
Bug in DataFrameGroupBy.__len__() and SeriesGroupBy.__len__() raising when the grouping contained NA values and dropna=False (GH 58644)
Bug in DataFrameGroupBy.groups() and SeriesGroupBy.groups() not respecting the groupby argument dropna (GH 55919)
Bug in DataFrameGroupBy.median() where NaT values gave an incorrect result. (GH 57926)
Bug in DataFrameGroupBy.quantile() with interpolation="nearest" being inconsistent with DataFrame.quantile() (GH 47942)
Bug in Resampler.interpolate() on a DataFrame with non-uniform sampling and/or indices not aligning with the resulting resampled index, which resulted in wrong interpolation (GH 21351)
Bug in DataFrame.ewm() and Series.ewm() when passed times and aggregation functions other than mean (GH 51695)
Bug in DataFrameGroupBy.agg() raising AttributeError when given dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (GH 55041)
Bug in DataFrameGroupBy.apply() returning a completely empty DataFrame when all return values of func were None, instead of returning an empty DataFrame with the original columns and dtypes. (GH 57775)
Bug in DataFrameGroupBy.apply() with as_index=False returning a MultiIndex instead of an Index. (GH 58291)
Bug in DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() where the numeric_only parameter was passed indirectly through kwargs instead of directly. (GH 58811)
Bug in DataFrameGroupBy.cumsum() not returning the correct dtype when the label contained None. (GH 58811)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() with a reducer and observed=False coercing the dtype to float when there are unobserved categories. (GH 55326)
Bug in Rolling.apply() where the applied function could be called on fewer than min_periods periods if method="table". (GH 58868)
Bug in Series.resample() that could raise when the date range ended shortly before a non-existent time. (GH 58380)
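The Rolling.apply() entry (GH 58868) depends on min_periods. A window is only evaluated once it holds at least min_periods non-missing observations. Otherwise the result is NaN. The following is a minimal pure-Python sketch of that rule for trailing windows. It assumes NaN marks missing data and that min_periods defaults to the window size, as it does for integer windows in pandas. `rolling_apply` is hypothetical and ignores method="table".

```python
import math


def rolling_apply(values, window, func, min_periods=None):
    """Hypothetical trailing-window apply honoring min_periods.

    func receives the full window (NaNs included); it is only called when the
    window holds at least min_periods non-NaN values, otherwise NaN is emitted.
    """
    if min_periods is None:
        min_periods = window  # default for fixed-size windows
    results = []
    for end in range(len(values)):
        win = values[max(0, end - window + 1): end + 1]
        n_valid = sum(1 for v in win if not math.isnan(v))
        results.append(func(win) if n_valid >= min_periods else math.nan)
    return results


print(rolling_apply([1.0, 2.0, 3.0, 4.0], window=3, func=sum, min_periods=2))
# [nan, 3.0, 6.0, 9.0]
```

The first window contains a single value, which is fewer than min_periods=2. The function is therefore never called on it, which is the guarantee the fix restores for method="table".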
Reshaping
Bug in qcut() where values at the quantile boundaries could be incorrectly assigned (GH 59355)
Bug in DataFrame.join() inconsistently setting result index name (GH 55815)
Bug in DataFrame.join() raising an AssertionError when a DataFrame with a MultiIndex had MultiIndex.names containing None. (GH 58721)
Bug in DataFrame.merge() where merging on a column containing only NaN values resulted in an out-of-bounds array access (GH 59421)
Bug in DataFrame.unstack() producing incorrect results when sort=False (GH 54987, GH 55516)
Bug in DataFrame.pivot_table() incorrectly subaggregating results when called without an index argument (GH 58722)
Bug in DataFrame.unstack() producing incorrect results when manipulating an empty DataFrame with an ExtensionDtype (GH 59123)
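qcut() assigns each value to a right-closed bin (e[i], e[i+1]], with the lowest edge included in the first bin. A value sitting exactly on a quantile edge therefore belongs to the bin that edge closes. The sketch below shows that rule using bisect on precomputed edges. `assign_bins` is hypothetical and leaves out how pandas computes the quantile edges and handles duplicate edges.

```python
from bisect import bisect_left


def assign_bins(values, edges):
    """Hypothetical qcut-style binning on precomputed, strictly increasing edges.

    Bins are right-closed, (edges[i], edges[i + 1]], except that the first bin also
    includes edges[0]. Values outside [edges[0], edges[-1]] get None.
    """
    labels = []
    for v in values:
        if v < edges[0] or v > edges[-1]:
            labels.append(None)
        elif v == edges[0]:
            labels.append(0)  # the lowest edge is included in the first bin
        else:
            # bisect_left finds the first edge >= v, i.e. the edge that closes v's bin.
            labels.append(bisect_left(edges, v) - 1)
    return labels


edges = [0.0, 2.5, 5.0, 10.0]
print(assign_bins([0.0, 2.5, 3.0, 5.0, 10.0, 11.0], edges))
# [0, 0, 1, 1, 2, None]
```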
Sparse
Bug in SparseDtype equality comparison with an NA fill value. (GH 54770)
Bug in DataFrame.sparse.from_spmatrix() which hard-coded an invalid fill_value for certain subtypes. (GH 59063)
ExtensionArray
Bug in arrays.ArrowExtensionArray.__setitem__() which caused wrong behavior when using an integer array with repeated values as a key (GH 58530)
Bug in api.types.is_datetime64_any_dtype() where a custom ExtensionDtype would return False for array-likes (GH 57055)
Bug in comparisons between objects with ArrowDtype and incompatibly typed objects (e.g. string vs bool) incorrectly raising instead of returning all-False (for ==) or all-True (for !=) (GH 59505)
Bug in various DataFrame reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (GH 59234)
Styler
Other
Bug in DataFrame when passing a dict with an NA scalar and columns that would always return np.nan (GH 57205)
Bug in eval() on ExtensionArray failing with a TypeError when the expression includes division /. (GH 58748)
Bug in eval() on complex values with division / discarding the imaginary part. (GH 21374)
Bug in eval() where the names of the Series were not preserved when using engine="numexpr". (GH 10239)
Bug in unique() on Index not always returning Index (GH 57043)
Bug in DataFrame.apply() where passing engine="numba" ignored args passed to the applied function (GH 58712)
Bug in DataFrame.eval() and DataFrame.query() which caused an exception when using NumPy attributes via @ notation, e.g., df.eval("@np.floor(a)"). (GH 58041)
Bug in DataFrame.eval() and DataFrame.query() which did not allow using the tan function. (GH 55091)
Bug in DataFrame.query() which raised an exception or produced incorrect results when expressions contained backtick-quoted column names containing the hash character #, backticks, or characters that fall outside the ASCII range (U+0001..U+007F). (GH 59285, GH 49633)
Bug in DataFrame.sort_index() when passing axis="columns", ignore_index=True and ascending=False not returning RangeIndex columns (GH 57293)
Bug in DataFrame.transform() that was returning the wrong order unless the index was monotonically increasing. (GH 57069)
Bug in DataFrame.where() where using a non-bool type array in the function would raise a ValueError instead of a TypeError (GH 56330)
Bug in Index.sort_values() raising TypeError when passing a key function that turns values into tuples, e.g. key=natsort.natsort_key (GH 56081)
Bug in Series.diff() allowing non-integer values for the periods argument. (GH 56607)
Bug in Series.dt() methods in ArrowDtype that were returning incorrect values. (GH 57355)
Bug in Series.rank() not preserving missing values for nullable integers when na_option='keep'. (GH 56976)
Bug in Series.replace() and DataFrame.replace() inconsistently replacing matching instances when regex=True and missing values are present. (GH 56599)
Bug in the DataFrame Interchange Protocol implementation returning incorrect results for data buffers’ associated dtype, for string and datetime columns (GH 54781)
Bug in Series.list methods not preserving the original Index. (GH 58425)
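The DataFrame.query() fix (GH 59285, GH 49633) concerns backtick-quoted column names that contain characters such as #. A Python tokenizer would otherwise read # as the start of a comment. One way around this, sketched here, is to replace each quoted name with a placeholder identifier before tokenizing. The helper `substitute_backticks`, the `_colN` placeholder scheme, and the doubled-backtick escape are all assumptions of this sketch, not pandas internals.

```python
import re

# A backtick-quoted name: a run of non-backtick characters or doubled backticks
# (treating a doubled backtick as an escaped backtick is an assumption of this sketch).
_QUOTED = re.compile(r"`((?:[^`]|``)*)`")


def substitute_backticks(expr):
    """Replace each `quoted name` with a placeholder identifier.

    Returns the rewritten expression and a placeholder -> original-name mapping.
    """
    placeholders = {}  # original name -> placeholder

    def repl(match):
        name = match.group(1).replace("``", "`")
        if name not in placeholders:
            placeholders[name] = f"_col{len(placeholders)}"
        return placeholders[name]

    rewritten = _QUOTED.sub(repl, expr)
    return rewritten, {ph: name for name, ph in placeholders.items()}


expr, names = substitute_backticks("`price#usd` > 10 and `price#usd` < 20")
print(expr)  # _col0 > 10 and _col0 < 20

# For demonstration only: evaluate the rewritten expression against one row.
row = {"price#usd": 15}
env = {ph: row[name] for ph, name in names.items()}
print(eval(expr, {"__builtins__": {}}, env))  # True
```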