What’s new in 3.0.0 (Month XX, 2025)#
These are the changes in pandas 3.0.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
Dedicated string data type by default#
Historically, pandas represented string columns with NumPy object data type.
This representation has numerous problems: it is not specific to strings (any
Python object can be stored in an object-dtype array, not just strings) and
it is often not very efficient (in both performance and memory usage).
Starting with pandas 3.0, a dedicated string data type is enabled by default
(backed by PyArrow under the hood, if installed, otherwise falling back to being
backed by NumPy object-dtype). This means that pandas will start inferring
columns containing string data as the new str data type when creating pandas
objects, such as in constructors or IO functions.
Old behavior:
>>> pd.Series(["a", "b"])
0 a
1 b
dtype: object
New behavior:
>>> pd.Series(["a", "b"])
0 a
1 b
dtype: str
The string data type that is used in these scenarios will mostly behave as NumPy object would, including missing value semantics and general operations on these columns.
The main characteristics of the new string data type:
Inferred by default for string data (instead of object dtype).
The str dtype can only hold strings (or missing values), in contrast to object dtype (setitem with a non-string value fails).
The missing value sentinel is always NaN (np.nan) and follows the same missing value semantics as the other default dtypes.
Those intentional changes can have breaking consequences, for example when checking
for the .dtype being object dtype or checking the exact missing value sentinel.
See the Migration guide for the new string data type (pandas 3.0) for more details on the behaviour changes
and how to adapt your code to the new default.
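Code that has to run on both older and newer pandas versions can avoid comparing against object dtype or a particular missing value sentinel. The following is a minimal sketch, assuming only the long-standing public helpers pd.api.types.is_string_dtype and Series.isna; the variable names are illustrative and not part of pandas.

```python
import pandas as pd

ser = pd.Series(["a", "b"])

# Fragile: `ser.dtype == object` holds before 3.0 but not with the new str dtype.
# More portable: ask pandas whether the column holds string data.
holds_strings = pd.api.types.is_string_dtype(ser)

# Detect missing values with isna() instead of testing for a specific sentinel.
with_missing = pd.Series(["a", None])
missing_mask = with_missing.isna()
```

Both checks give the same answer whether the Series is inferred as object or as str.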
Copy-on-Write#
The new “copy-on-write” behaviour in pandas 3.0 changes how pandas operates with respect to copies and views. A summary of the changes:
The result of any indexing operation (subsetting a DataFrame or Series in any way, i.e. including accessing a DataFrame column as a Series) or any method returning a new DataFrame or Series, always behaves as if it were a copy in terms of user API.
As a consequence, if you want to modify an object (DataFrame or Series), the only way to do this is to directly modify that object itself.
The main goal of this change is to make the user API more consistent and predictable. There is now a clear rule: any subset or returned series/dataframe always behaves as a copy of the original, and thus never modifies the original (before pandas 3.0, whether a derived object would be a copy or a view depended on the exact operation performed, which was often confusing).
Because every single indexing step now behaves as a copy, this also means that
“chained assignment” (updating a DataFrame with multiple setitem steps) will
stop working. Because this now consistently never works, the
SettingWithCopyWarning is removed.
The new behavioral semantics are explained in more detail in the user guide about Copy-on-Write.
A secondary goal is to improve performance by avoiding unnecessary copies. As mentioned above, every new DataFrame or Series returned from an indexing operation or method behaves as a copy, but under the hood pandas will use views as much as possible, and only copy when needed to guarantee the “behaves as a copy” behaviour (this is the actual “copy-on-write” mechanism used as an implementation detail).
Some of the behaviour changes described above are breaking changes in pandas 3.0. When upgrading to pandas 3.0, it is recommended to first upgrade to pandas 2.3 to get deprecation warnings for a subset of those changes. The migration guide explains the upgrade process in more detail.
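The "modify the object itself" rule can be illustrated with a short sketch. The column names and values are made up for illustration; the pattern shown (a single .loc assignment instead of chained assignment) works on older pandas versions as well.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Chained assignment such as `df["b"][df["a"] > 1] = 0` goes through an
# intermediate object and no longer updates df under Copy-on-Write.
# A single .loc call modifies df directly.
df.loc[df["a"] > 1, "b"] = 0

# A derived object behaves as a copy: changing it leaves df untouched.
subset = df[df["a"] > 1]
subset["b"] = -1
```

After this runs, df holds the values written through .loc, and the change made to subset is not reflected in df.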
pd.col syntax can now be used in DataFrame.assign() and DataFrame.loc()#
You can now use pd.col to create callables for use in dataframe methods which accept them. For example, if you have a dataframe
In [1]: df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
and you want to create a new column 'c' by summing 'a' and 'b', then instead of
In [2]: df.assign(c = lambda df: df['a'] + df['b'])
Out[2]:
a b c
0 1 4 5
1 1 5 6
2 2 6 8
you can now write:
In [3]: df.assign(c = pd.col('a') + pd.col('b'))
Out[3]:
a b c
0 1 4 5
1 1 5 6
2 2 6 8
New Deprecation Policy#
pandas 3.0.0 introduces a new 3-stage deprecation policy: using DeprecationWarning initially, then switching to FutureWarning for broader visibility in the last minor version before the next major release, and then removal of the deprecated functionality in the major release. This was done to give downstream packages more time to adjust to pandas deprecations, which should reduce the number of warnings that a user gets from code that isn’t theirs. See PDEP 17 for more details.
All warnings for upcoming changes in pandas will have the base class pandas.errors.PandasChangeWarning. Users may also use the following subclasses to control warnings.
pandas.errors.Pandas4Warning: Warnings which will be enforced in pandas 4.0.
pandas.errors.Pandas5Warning: Warnings which will be enforced in pandas 5.0.
pandas.errors.PandasPendingDeprecationWarning: Base class of all warnings which emit a PendingDeprecationWarning, independent of the version they will be enforced.
pandas.errors.PandasDeprecationWarning: Base class of all warnings which emit a DeprecationWarning, independent of the version they will be enforced.
pandas.errors.PandasFutureWarning: Base class of all warnings which emit a FutureWarning, independent of the version they will be enforced.
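These classes can be used with the standard warnings module, for example to turn upcoming-change warnings into errors in a test suite. The sketch below is illustrative only: the getattr fallback to DeprecationWarning is an assumption added so that the snippet also runs on pandas versions that predate PandasChangeWarning.

```python
import warnings

import pandas as pd

# PandasChangeWarning is available from pandas 3.0; the fallback is an
# assumption made for this sketch so it runs on older versions too.
change_warning = getattr(pd.errors, "PandasChangeWarning", DeprecationWarning)

with warnings.catch_warnings():
    # Escalate pandas change warnings to errors inside this block only.
    warnings.simplefilter("error", category=change_warning)
    result = pd.Series([1, 2, 3]).sum()
```

Passing one of the subclasses instead of the base class narrows the filter to a single enforcement version or warning type.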
Other enhancements#
pandas.merge() propagates the attrs attribute to the result if all inputs have identical attrs, as has so far already been the case for pandas.concat().
pandas.api.typing.FrozenList is available for typing the outputs of MultiIndex.names, MultiIndex.codes and MultiIndex.levels (GH 58237)
pandas.api.typing.SASReader is available for typing the output of read_sas() (GH 55689)
Added Styler.to_typst() to write Styler objects to file, buffer or string in Typst format (GH 57617)
Added missing pandas.Series.info() to API reference (GH 60926)
pandas.api.typing.NoDefault is available for typing no_default
DataFrame.to_excel() now raises a UserWarning when the character count in a cell exceeds Excel’s limitation of 32767 characters (GH 56954)
pandas.merge() now validates the how parameter input (merge type) (GH 59435)
pandas.merge(), DataFrame.merge() and DataFrame.join() now support anti joins (left_anti and right_anti) in the how parameter (GH 42916)
read_spss() now supports kwargs to be passed to pyreadstat (GH 56356)
read_stata() now returns datetime64 resolutions better matching those natively stored in the stata format (GH 55642)
DataFrame.agg() called with axis=1 and a func which relabels the result index now raises a NotImplementedError (GH 58807).
Index.get_loc() now also accepts subclasses of tuple as keys (GH 57922)
Styler.set_tooltips() provides an alternative method of storing tooltips by using the title attribute of td elements. (GH 56981)
Added missing parameter weights in DataFrame.plot.kde() for the estimation of the PDF (GH 59337)
Allow dictionaries to be passed to pandas.Series.str.replace() via pat parameter (GH 51748)
Support passing a Series input to json_normalize() that retains the Series Index (GH 51452)
Support reading value labels from Stata 108-format (Stata 6) and earlier files (GH 58154)
Users can globally disable any PerformanceWarning by setting the option mode.performance_warnings to False (GH 56920)
Styler.format_index_names() can now be used to format the index and column names (GH 48936 and GH 47489)
errors.DtypeWarning improved to include column names when mixed data types are detected (GH 58174)
Rolling and Expanding now support pipe method (GH 57076)
Series now supports the Arrow PyCapsule Interface for export (GH 59518)
DataFrame.to_excel() argument merge_cells now accepts a value of "columns" to only merge MultiIndex column header cells (GH 35384)
set_option() now accepts a dictionary of options, simplifying configuration of multiple settings at once (GH 61093)
DataFrame.corrwith() now accepts min_periods as optional arguments, as in DataFrame.corr() and Series.corr() (GH 9490)
DataFrame.cummin(), DataFrame.cummax(), DataFrame.cumprod() and DataFrame.cumsum() methods now have a numeric_only parameter (GH 53072)
DataFrame.ewm() now allows adjust=False when times is provided (GH 54328)
DataFrame.fillna() and Series.fillna() can now accept value=None; for non-object dtype the corresponding NA value will be used (GH 57723)
DataFrame.pivot_table() and pivot_table() now allow the passing of keyword arguments to aggfunc through **kwargs (GH 57884)
DataFrame.to_json() now encodes Decimal as strings instead of floats (GH 60698)
Series.cummin() and Series.cummax() now support CategoricalDtype (GH 52335)
Series.plot() now correctly handles the ylabel parameter for pie charts, allowing for explicit control over the y-axis label (GH 58239)
DataFrame.plot.scatter() argument c now accepts a column of strings, where rows with the same string are colored identically (GH 16827 and GH 16485)
Series.nlargest() uses a ‘stable’ sort internally and will preserve original ordering.
ArrowDtype now supports pyarrow.JsonType (GH 60958)
DataFrameGroupBy and SeriesGroupBy methods sum, mean, median, prod, min, max, std, var and sem now accept skipna parameter (GH 15675)
Easter has gained a new constructor argument method which specifies the method used to calculate Easter, for example Orthodox Easter (GH 61665)
Holiday constructor argument days_of_week will raise a ValueError when type is something other than None or tuple (GH 61658)
Holiday has gained the constructor argument and field exclude_dates to exclude specific datetimes from a custom holiday calendar (GH 54382)
Rolling and Expanding now support nunique (GH 26958)
Rolling and Expanding now support aggregations first and last (GH 33155)
read_parquet() accepts to_pandas_kwargs which are forwarded to pyarrow.Table.to_pandas() which enables passing additional keywords to customize the conversion to pandas, such as maps_as_pydicts to read the Parquet map data type as python dictionaries (GH 56842)
to_numeric() on big integers converts to object datatype with python integers when not coercing. (GH 51295)
DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.agg(), SeriesGroupBy.agg(), SeriesGroupBy.apply(), DataFrameGroupBy.apply() now support kurt (GH 40139)
DataFrame.apply() supports using third-party execution engines like the Bodo.ai JIT compiler (GH 60668)
DataFrame.iloc() and Series.iloc() now support boolean masks in __getitem__ for more consistent indexing behavior (GH 60994)
DataFrame.to_csv() and Series.to_csv() now support Python’s new-style format strings (e.g., "{:.6f}") for the float_format parameter, in addition to old-style % format strings and callables. This allows for more flexible and modern formatting of floating point numbers when exporting to CSV. (GH 49580)
DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.agg(), SeriesGroupBy.agg(), RollingGroupby.apply(), ExpandingGroupby.apply(), Rolling.apply(), Expanding.apply(), DataFrame.apply() with engine="numba" now support positional arguments passed as kwargs (GH 58995)
Rolling.agg(), Expanding.agg() and ExponentialMovingWindow.agg() now accept NamedAgg aggregations through **kwargs (GH 28333)
Series.map() can now accept kwargs to pass on to func (GH 59814)
Series.map() now accepts an engine parameter to allow execution with a third-party execution engine (GH 61125)
Series.rank() and DataFrame.rank() with numpy-nullable dtypes preserve NA values and return UInt64 dtype where appropriate instead of casting NA to NaN with float64 dtype (GH 62043)
Series.str.get_dummies() now accepts a dtype parameter to specify the dtype of the resulting DataFrame (GH 47872)
pandas.concat() will raise a ValueError when ignore_index=True and keys is not None (GH 59274)
frozenset elements in pandas objects are now natively printed (GH 60690)
Add "delete_rows" option to if_exists argument in DataFrame.to_sql() deleting all records of the table before inserting data (GH 37210).
Added half-year offset classes HalfYearBegin, HalfYearEnd, BHalfYearBegin and BHalfYearEnd (GH 60928)
Added support for axis=1 with dict or Series arguments into DataFrame.fillna() (GH 4514)
Added support to read and write from and to Apache Iceberg tables with the new read_iceberg() and DataFrame.to_iceberg() functions (GH 61383)
Errors occurring during SQL I/O will now throw a generic DatabaseError instead of the raw Exception type from the underlying driver manager library (GH 60748)
Implemented Series.str.isascii() (GH 59091)
Improve error reporting through outputting the first few duplicates when merge() validation fails (GH 62742)
Improve the resulting dtypes in DataFrame.where() and DataFrame.mask() with ExtensionDtype other (GH 62038)
Improved deprecation message for offset aliases (GH 60820)
Many type aliases are now exposed in the new submodule pandas.api.typing.aliases (GH 55231)
Multiplying two DateOffset objects will now raise a TypeError instead of a RecursionError (GH 59442)
Restore support for reading Stata 104-format and enable reading 103-format dta files (GH 58554)
Support passing an Iterable[Hashable] input to DataFrame.drop_duplicates() (GH 59237)
Support reading Stata 102-format (Stata 1) dta files (GH 58978)
Support reading Stata 110-format (Stata 7) dta files (GH 47176)
Switched wheel upload to PyPI Trusted Publishing (OIDC) for release-tag pushes in wheels.yml. (GH 61718)
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
Improved behavior in groupby for observed=False#
A number of bugs have been fixed due to improved handling of unobserved groups (GH 55738). All remarks in this section equally impact SeriesGroupBy.
In previous versions of pandas, a single grouping with DataFrameGroupBy.apply() or DataFrameGroupBy.agg() would pass the unobserved groups to the provided function, resulting in 0 below.
In [4]: df = pd.DataFrame(
...: {
...: "key1": pd.Categorical(list("aabb"), categories=list("abc")),
...: "key2": [1, 1, 1, 2],
...: "values": [1, 2, 3, 4],
...: }
...: )
...:
In [5]: df
Out[5]:
key1 key2 values
0 a 1 1
1 a 1 2
2 b 1 3
3 b 2 4
In [6]: gb = df.groupby("key1", observed=False)
In [7]: gb[["values"]].apply(lambda x: x.sum())
Out[7]:
values
key1
a 3
b 7
c 0
However this was not the case when using multiple groupings, resulting in NaN below.
In [1]: gb = df.groupby(["key1", "key2"], observed=False)
In [2]: gb[["values"]].apply(lambda x: x.sum())
Out[2]:
values
key1 key2
a 1 3.0
2 NaN
b 1 3.0
2 4.0
c 1 NaN
2 NaN
Now using multiple groupings will also pass the unobserved groups to the provided function.
In [8]: gb = df.groupby(["key1", "key2"], observed=False)
In [9]: gb[["values"]].apply(lambda x: x.sum())
Out[9]:
values
key1 key2
a 1 3
2 0
b 1 3
2 4
c 1 0
2 0
Similarly:
In previous versions of pandas the method DataFrameGroupBy.sum() would result in 0 for unobserved groups, but DataFrameGroupBy.prod(), DataFrameGroupBy.all(), and DataFrameGroupBy.any() would all result in NA values. Now these methods result in 1, True, and False respectively.
DataFrameGroupBy.groups() did not include unobserved groups and now does.
These improvements also fixed certain bugs in groupby:
DataFrameGroupBy.agg() would fail when there are multiple groupings, unobserved groups, and as_index=False (GH 36698)
DataFrameGroupBy.groups() with sort=False would sort groups; they now occur in the order they are observed (GH 56966)
DataFrameGroupBy.nunique() would fail when there are multiple groupings, unobserved groups, and as_index=False (GH 52848)
DataFrameGroupBy.sum() would have incorrect values when there are multiple groupings, unobserved groups, and non-numeric data (GH 43891)
DataFrameGroupBy.value_counts() would produce incorrect results when used with some categorical and some non-categorical groupings and observed=False (GH 56016)
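To see what an "unobserved group" is in practice, the following small sketch compares observed=False with observed=True on a single categorical grouping. It uses a cut-down version of the frame from the example above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key1": pd.Categorical(list("aabb"), categories=list("abc")),
        "values": [1, 2, 3, 4],
    }
)

# observed=False keeps the category "c", which has no rows, in the result.
with_unobserved = df.groupby("key1", observed=False)["values"].sum()

# observed=True drops categories that do not occur in the data.
only_observed = df.groupby("key1", observed=True)["values"].sum()
```

Code that does not want entries for empty categories can pass observed=True explicitly rather than filtering the result afterwards.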
Backwards incompatible API changes#
Datetime resolution inference#
Converting a sequence of strings, datetime objects, or np.datetime64 objects to
a datetime64 dtype now performs inference on the appropriate resolution (AKA unit) for the output dtype. This affects Series, DataFrame, Index, DatetimeIndex, and to_datetime().
Previously, these would always give nanosecond resolution:
In [1]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()
In [2]: pd.to_datetime([dt]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.Index([dt]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.DatetimeIndex([dt]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.Series([dt]).dtype
Out[5]: dtype('<M8[ns]')
This now infers the microsecond unit “us” from the pydatetime object, matching the scalar Timestamp behavior.
In [10]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()
In [11]: pd.to_datetime([dt]).dtype
Out[11]: dtype('<M8[us]')
In [12]: pd.Index([dt]).dtype
Out[12]: dtype('<M8[us]')
In [13]: pd.DatetimeIndex([dt]).dtype
Out[13]: dtype('<M8[us]')
In [14]: pd.Series([dt]).dtype
Out[14]: dtype('<M8[us]')
Similarly, when passed a sequence of np.datetime64 objects, the resolution of the passed objects will be retained (or for lower-than-second resolution, second resolution will be used).
When passing strings, the resolution will depend on the precision of the string, again matching the Timestamp behavior. Previously:
In [2]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
Out[5]: dtype('<M8[ns]')
The inferred resolution now matches that of the input strings:
In [15]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
Out[15]: dtype('<M8[s]')
In [16]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
Out[16]: dtype('<M8[ms]')
In [17]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
Out[17]: dtype('<M8[us]')
In [18]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
Out[18]: dtype('<M8[ns]')
In cases with mixed-resolution inputs, the highest resolution is used:
In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
Out[2]: dtype('<M8[ns]')
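Code that depends on nanosecond resolution (for example when handing values to a consumer that expects datetime64[ns]) can request the unit explicitly instead of relying on inference. A minimal sketch, assuming the as_unit method that is available on DatetimeIndex from pandas 2.0 onwards:

```python
import pandas as pd

idx = pd.to_datetime(["2024-03-22 11:43:01"])

# The inferred unit depends on the pandas version and on the input precision;
# as_unit converts to a fixed resolution regardless of what was inferred.
idx_ns = idx.as_unit("ns")
```

The same method exists on Series via the .dt accessor (ser.dt.as_unit("ns")).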
Changed behavior in DataFrame.value_counts() and DataFrameGroupBy.value_counts() when sort=False#
In previous versions of pandas, DataFrame.value_counts() with sort=False would sort the result by row labels (as was documented). This was nonintuitive and inconsistent with Series.value_counts() which would maintain the order of the input. Now DataFrame.value_counts() will maintain the order of the input.
In [19]: df = pd.DataFrame(
....: {
....: "a": [2, 2, 2, 2, 1, 1, 1, 1],
....: "b": [2, 1, 3, 1, 2, 3, 1, 1],
....: }
....: )
....:
In [20]: df
Out[20]:
a b
0 2 2
1 2 1
2 2 3
3 2 1
4 1 2
5 1 3
6 1 1
7 1 1
Old behavior
In [3]: df.value_counts(sort=False)
Out[3]:
a b
1 1 2
2 1
3 1
2 1 2
2 1
3 1
Name: count, dtype: int64
New behavior
In [21]: df.value_counts(sort=False)
Out[21]:
a b
2 2 1
1 2
3 1
1 2 1
3 1
1 2
Name: count, dtype: int64
This change also applies to DataFrameGroupBy.value_counts(). Here, there are two options for sorting: one sort passed to DataFrame.groupby() and one passed directly to DataFrameGroupBy.value_counts(). The former will determine whether to sort the groups, the latter whether to sort the counts. All non-grouping columns will maintain the order of the input within groups.
Old behavior
In [5]: df.groupby("a", sort=True).value_counts(sort=False)
Out[5]:
a b
1 1 2
2 1
3 1
2 1 2
2 1
3 1
dtype: int64
New behavior
In [22]: df.groupby("a", sort=True).value_counts(sort=False)
Out[22]:
a b
1 2 1
3 1
1 2
2 2 1
3 1
1 2
Name: count, dtype: int64
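If the previous label-sorted result is what downstream code expects, it can be reproduced by sorting the index explicitly. A short sketch using the same frame as above (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [2, 2, 2, 2, 1, 1, 1, 1],
        "b": [2, 1, 3, 1, 2, 3, 1, 1],
    }
)

# sort=False now keeps input order; sort_index() restores ordering by row labels.
counts = df.value_counts(sort=False).sort_index()
```

This gives the same ordering on older and newer pandas versions.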
Changed behavior of pd.offsets.Day to always represent calendar-day#
In previous versions of pandas, offsets.Day represented a fixed span
of 24 hours, disregarding Daylight Saving Time transitions. It now consistently
behaves as a calendar-day, preserving time-of-day across DST transitions:
Old behavior
In [5]: ts = pd.Timestamp("2025-03-08 08:00", tz="US/Eastern")
In [6]: ts + pd.offsets.Day(1)
Out[6]: Timestamp('2025-03-09 09:00:00-0400', tz='US/Eastern')
New behavior
In [23]: ts = pd.Timestamp("2025-03-08 08:00", tz="US/Eastern")
In [24]: ts + pd.offsets.Day(1)
Out[24]: Timestamp('2025-03-09 08:00:00-0400', tz='US/Eastern')
This change fixes a long-standing bug in date_range() (GH 51716, GH 35388), but causes several
small behavior differences as collateral:
pd.offsets.Day(n) no longer compares as equal to pd.offsets.Hour(24*n)
offsets.Day no longer supports division
Timedelta no longer accepts Day objects as inputs
tseries.frequencies.to_offset() on a Timedelta object returns an offsets.Hour object in cases where it used to return a Day object.
Adding or subtracting a scalar from a timezone-aware DatetimeIndex with a Day freq no longer preserves that freq attribute.
Adding or subtracting a Day with a Timedelta is no longer supported.
Adding or subtracting a Day offset to a timezone-aware Timestamp or datetime-like may lead to an ambiguous or non-existent time, which will raise.
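Where a fixed span of exactly 24 hours is intended, a Timedelta expresses that directly and does not depend on how Day is interpreted. A minimal sketch, reusing the timestamp from the example above:

```python
import pandas as pd

ts = pd.Timestamp("2025-03-08 08:00", tz="US/Eastern")

# A Timedelta is an absolute duration: 24 elapsed hours across the
# spring-forward transition lands one wall-clock hour later.
fixed_24h = ts + pd.Timedelta(hours=24)
```

Here fixed_24h has a wall-clock time of 09:00 on 2025-03-09, which is what adding offsets.Day(1) used to produce.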
Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes#
Previously, when dealing with a nullable dtype (e.g. Float64Dtype or int64[pyarrow]), NaN was treated as interchangeable with NA in some circumstances but not others. This was done to make adoption easier, but caused some confusion (GH 32265). In 3.0, an option "mode.nan_is_na" (default True) controls whether to treat NaN as equivalent to NA.
With pd.set_option("mode.nan_is_na", True) (again, this is the default), NaN can be passed to constructors, __setitem__, __contains__ and be treated the same as NA. The only change users will see is that arithmetic and np.ufunc operations that previously introduced NaN entries produce NA entries instead:
Old behavior:
In [2]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
In [3]: ser / 0
Out[3]:
0 NaN
1 <NA>
dtype: Float64
New behavior:
In [25]: ser = pd.Series([0, None], dtype=pd.Float64Dtype())
In [26]: ser / 0
Out[26]:
0 <NA>
1 <NA>
dtype: Float64
By contrast, with pd.set_option("mode.nan_is_na", False), NaN is always treated as a floating-point value that is distinct from NA, so it cannot be used with integer dtypes:
Old behavior:
In [2]: ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
In [3]: ser[1]
Out[3]: <NA>
New behavior:
In [27]: pd.set_option("mode.nan_is_na", False)
In [28]: ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
In [29]: ser[1]
Out[29]: np.float64(nan)
If we had passed pd.Int64Dtype() or "int64[pyarrow]" for the dtype in the latter example, this would raise, as a float NaN cannot be held by an integer dtype.
With "mode.nan_is_na" set to False, ser.to_numpy() (and frame.values and np.asarray(obj)) will convert to object dtype if NA entries are present, where before they would coerce to NaN. To retain a float numpy dtype, explicitly pass na_value=np.nan to Series.to_numpy().
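The explicit conversion mentioned above can be sketched as follows. It uses the dtype and na_value parameters of Series.to_numpy(), which are also available in earlier pandas versions, so the result does not depend on the "mode.nan_is_na" setting.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.5, None], dtype="Float64")

# Request a float64 NumPy array and state how NA should be represented.
arr = ser.to_numpy(dtype="float64", na_value=np.nan)
```

The resulting array has float64 dtype, with NaN in the position of the NA entry.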
The __module__ attribute now points to public modules#
The __module__ attribute on functions and classes in the public API has been
updated to refer to the preferred public module from which to access the object,
rather than the module in which the object happens to be defined (GH 55178).
This produces more informative displays in the Python console for classes, e.g.,
instead of <class 'pandas.core.frame.DataFrame'> you now see
<class 'pandas.DataFrame'>, and in interactive tools such as IPython, e.g.,
instead of <function pandas.io.parsers.readers.read_csv(...)> you now see
<function pandas.read_csv(...)>.
This may break code that relies on the previous __module__ values (e.g.
doctests inspecting the type() of a DataFrame object).
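Checks that compare against a full private module path are the ones affected. A small sketch of alternatives that do not depend on where the class is defined (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1]})

# Fragile: type(df).__module__ == "pandas.core.frame" (this value changed in 3.0).
# Portable: use isinstance, or inspect only the top-level package name.
is_frame = isinstance(df, pd.DataFrame)
top_level_package = type(df).__module__.split(".")[0]
```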
Increased minimum version for Python#
pandas 3.0.0 supports Python 3.11 and higher.
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. The following required dependencies were updated:
| Package | New Minimum Version |
|---|---|
| numpy | 1.26.0 |
| tzdata | 2023.3 |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | New Minimum Version |
|---|---|
| adbc-driver-postgresql | 1.2.0 |
| adbc-driver-sqlite | 1.2.0 |
| mypy (dev) | 1.9.0 |
| beautifulsoup4 | 4.12.3 |
| bottleneck | 1.4.2 |
| fastparquet | 2024.11.0 |
| fsspec | 2024.10.0 |
| hypothesis | 6.116.0 |
| gcsfs | 2024.10.0 |
| Jinja2 | 3.1.5 |
| lxml | 5.3.0 |
| Jinja2 | 3.1.3 |
| matplotlib | 3.9.3 |
| numba | 0.60.0 |
| numexpr | 2.10.2 |
| qtpy | 2.4.2 |
| openpyxl | 3.1.5 |
| psycopg2 | 2.9.10 |
| pyarrow | 13.0.0 |
| pymysql | 1.1.1 |
| pyreadstat | 1.2.8 |
| pytables | 3.10.1 |
| python-calamine | 0.3.0 |
| pytz | 2024.2 |
| s3fs | 2024.10.0 |
| SciPy | 1.14.1 |
| sqlalchemy | 2.0.36 |
| xarray | 2024.10.0 |
| xlsxwriter | 3.2.0 |
| zstandard | 0.23.0 |
See Dependencies and Optional dependencies for more.
pytz now an optional dependency#
pandas now uses zoneinfo from the standard library as the default timezone implementation when passing a timezone
string to various methods. (GH 34916)
Old behavior:
In [1]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
In [2]: ts.tz
Out[2]: <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>
New behavior:
In [30]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
In [31]: ts.tz
Out[31]: zoneinfo.ZoneInfo(key='US/Pacific')
pytz timezone objects are still supported when passed directly, but they will no longer be returned by default
from string inputs. Moreover, pytz is no longer a required dependency of pandas, but can be installed
with the pip extra pip install pandas[timezone].
Additionally, pandas no longer throws pytz exceptions for timezone operations leading to ambiguous or nonexistent
times. These cases will now raise a ValueError.
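One way to avoid depending on the exception type at all is to tell pandas how to resolve such times up front. The sketch below uses the existing nonexistent parameter of tz_localize; the chosen timestamp falls in the US/Eastern spring-forward gap, and the variable names are illustrative.

```python
import pandas as pd

# 02:30 on 2025-03-09 does not exist in US/Eastern (clocks jump from 02:00 to 03:00).
naive = pd.Timestamp("2025-03-09 02:30")

# Shift forward to the first valid wall-clock time instead of raising.
shifted = naive.tz_localize("US/Eastern", nonexistent="shift_forward")
```

The ambiguous parameter plays the same role for times that occur twice when clocks are set back.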
Other API changes#
3rd party py.path objects are no longer explicitly supported in IO methods. Use pathlib.Path objects instead (GH 57091)
read_table()’s parse_dates argument defaults to None to improve consistency with read_csv() (GH 57476)
All classes inheriting from builtin tuple (including types created with collections.namedtuple()) are now hashed and compared as builtin tuple during indexing operations (GH 57922)
Made dtype a required argument in ExtensionArray._from_sequence_of_strings() (GH 56519)
Passing a Series input to json_normalize() will now retain the Series Index, previously output had a new RangeIndex (GH 51452)
Removed Index.sort() which always raised a TypeError. This attribute is not defined and will raise an AttributeError (GH 59283)
Unused dtype argument has been removed from the MultiIndex constructor (GH 60962)
Updated DataFrame.to_excel() so that the output spreadsheet has no styling. Custom styling can still be done using Styler.to_excel() (GH 54154)
pickle and HDF (.h5) files created with Python 2 are no longer explicitly supported (GH 57387)
pickled objects from pandas version less than 1.0.0 are no longer supported (GH 57155)
when comparing the indexes in testing.assert_series_equal(), check_exact defaults to True if an Index is of integer dtypes. (GH 57386)
Index set operations (like union or intersection) will now ignore the dtype of an empty RangeIndex or empty Index with object dtype when determining the dtype of the resulting Index (GH 60797)
IncompatibleFrequency now subclasses TypeError instead of ValueError. As a result, joins with mismatched frequencies now cast to object like other non-comparable joins, and arithmetic with indexes with mismatched frequencies align (GH 55782)
Series “flex” methods like Series.add() no longer allow passing a DataFrame for other; use the DataFrame reversed method instead (GH 46179)
CategoricalIndex.append() no longer attempts to cast different-dtype indexes to the caller’s dtype (GH 41626)
ExtensionDtype.construct_array_type() is now a regular method instead of a classmethod (GH 58663)
Comparison operations between Index and Series now consistently return Series regardless of which object is on the left or right (GH 36759)
Numpy functions like np.isinf that return a bool dtype when called on an Index object now return a bool-dtype Index instead of np.ndarray (GH 52676)
Deprecations#
Copy keyword#
The copy keyword argument in the following methods is deprecated and
will be removed in a future version:
DataFrame.merge()/pd.merge()
Copy-on-Write utilizes a lazy copy mechanism that defers copying the data until
necessary. Use .copy to trigger an eager copy. The copy keyword has no effect
starting with 3.0, so it can be safely removed from your code.
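In practice this means dropping copy= from the call and asking for a copy explicitly only where one is really needed. A minimal sketch with made-up frames:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2], "x": ["a", "b"]})
right = pd.DataFrame({"key": [1, 2], "y": [10, 20]})

# No copy= keyword: the result already behaves as an independent object.
merged = pd.merge(left, right, on="key")

# An eager copy can still be requested explicitly when required.
eager = merged.copy()
```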
Other Deprecations#
Deprecated core.internals.api.make_block(), use public APIs instead (GH 56815)
Deprecated DataFrameGroupby.corrwith() (GH 57158)
Deprecated Timestamp.utcfromtimestamp(), use Timestamp.fromtimestamp(ts, "UTC") instead (GH 56680)
Deprecated Timestamp.utcnow(), use Timestamp.now("UTC") instead (GH 56680)
Deprecated pd.core.internals.api.maybe_infer_ndim (GH 40226)
Deprecated allowing constructing or casting to Categorical with non-NA values that are not present in specified dtype.categories (GH 40996)
Deprecated allowing non-keyword arguments in DataFrame.all(), DataFrame.min(), DataFrame.max(), DataFrame.sum(), DataFrame.prod(), DataFrame.mean(), DataFrame.median(), DataFrame.sem(), DataFrame.var(), DataFrame.std(), DataFrame.skew(), DataFrame.kurt(), Series.all(), Series.min(), Series.max(), Series.sum(), Series.prod(), Series.mean(), Series.median(), Series.sem(), Series.var(), Series.std(), Series.skew(), and Series.kurt(). (GH 57087)
Deprecated allowing non-keyword arguments in DataFrame.groupby() and Series.groupby() except by and level. (GH 62102)
Deprecated allowing non-keyword arguments in Series.to_markdown() except buf. (GH 57280)
Deprecated allowing non-keyword arguments in Series.to_string() except buf. (GH 57280)
Deprecated behavior of DataFrameGroupBy.groups() and SeriesGroupBy.groups(), in a future version groups by one element list will return tuple instead of scalar. (GH 58858)
Deprecated behavior of Series.dt.to_pytimedelta(), in a future version this will return a Series containing python datetime.timedelta objects instead of an ndarray of timedelta; this matches the behavior of other Series.dt() properties. (GH 57463)
Deprecated converting object-dtype columns of datetime.datetime objects to datetime64 when writing to stata (GH 56536)
Deprecated lowercase strings d, b and c denoting frequencies in Day, BusinessDay and CustomBusinessDay in favour of D, B and C (GH 58998)
Deprecated lowercase strings w, w-mon, w-tue, etc. denoting frequencies in Week in favour of W, W-MON, W-TUE, etc. (GH 58998)
Deprecated parameter method in DataFrame.reindex_like() / Series.reindex_like() (GH 58667)
Deprecated strings w, d, MIN, MS, US and NS denoting units in Timedelta in favour of W, D, min, ms, us and ns (GH 59051)
Deprecated the arg parameter of Series.map; pass the added func argument instead. (GH 61260)
Deprecated using epoch date format in DataFrame.to_json() and Series.to_json(), use iso instead. (GH 57063)
Deprecated allowing fill_value that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in Series.unstack() and DataFrame.unstack() (GH 12189, GH 53868)
Deprecated allowing fill_value that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in Series.shift() and DataFrame.shift() (GH 53802)
Deprecated backward-compatibility behavior for DataFrame.select_dtypes() matching “str” dtype when np.object_ is specified (GH 61916)
Deprecated option “future.no_silent_downcasting”, as it is no longer used. In a future version accessing this option will raise (GH 59502)
Deprecated slicing on a Series or DataFrame with a DatetimeIndex using a datetime.date object, explicitly cast to Timestamp instead (GH 35830)
Deprecated the ‘inplace’ keyword from Resampler.interpolate(), as passing True raises AttributeError (GH 58690)
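A few of the replacements named in the list above, shown side by side as a sketch. Only the suggested spellings are executed; the deprecated forms appear in comments.

```python
import pandas as pd

# Instead of Timestamp.utcnow()
now_utc = pd.Timestamp.now("UTC")

# Instead of Timestamp.utcfromtimestamp(0)
epoch = pd.Timestamp.fromtimestamp(0, "UTC")

# Instead of the lowercase unit string "d" in Timedelta
one_day = pd.Timedelta(1, unit="D")
```

Unlike the deprecated methods, both Timestamp replacements return timezone-aware objects.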
Removal of prior version deprecations/changes#
Enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets#
Renamed the following offset aliases (GH 57986):
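| offset | removed alias | new alias |
|---|---|---|
| MonthEnd | M | ME |
| BusinessMonthEnd | BM | BME |
| SemiMonthEnd | SM | SME |
| CustomBusinessMonthEnd | CBM | CBME |
| QuarterEnd | Q | QE |
| BQuarterEnd | BQ | BQE |
| YearEnd | Y | YE |
| BYearEnd | BY | BYE |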
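When many frequency strings are scattered through a code base, a small translation helper can ease the rename. The sketch below is a hypothetical, pure-Python helper (migrate_alias is not a pandas function). The M, Q and Y entries come from the heading above; the business, semi-month and custom-business entries are assumed to follow the same pattern and should be checked against the pandas offset alias documentation.

```python
import re

# Removed alias -> new alias. Only M, Q and Y are named in the heading;
# the remaining entries are assumptions following the same pattern.
RENAMED_ALIASES = {
    "M": "ME",
    "BM": "BME",
    "SM": "SME",
    "CBM": "CBME",
    "Q": "QE",
    "BQ": "BQE",
    "Y": "YE",
    "BY": "BYE",
}

# Optional multiple, the alias itself, and an optional anchor such as "-DEC".
_FREQ_PATTERN = re.compile(
    r"^(?P<mult>-?\d*)(?P<alias>[A-Z]+)(?P<suffix>-[A-Z]{3})?$"
)


def migrate_alias(freq: str) -> str:
    """Return freq with a removed end-of-period alias replaced by its new name."""
    match = _FREQ_PATTERN.match(freq)
    if match is None:
        # Not an upper-case alias of the expected shape; leave it untouched.
        return freq
    alias = match["alias"]
    new_alias = RENAMED_ALIASES.get(alias, alias)
    return f"{match['mult']}{new_alias}{match['suffix'] or ''}"
```

For example, migrate_alias("2Q-DEC") gives "2QE-DEC", while strings that are already valid, such as "ME", "MS" or "15min", pass through unchanged.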
Other Removals#
DataFrameGroupBy.idxmin, DataFrameGroupBy.idxmax, SeriesGroupBy.idxmin, and SeriesGroupBy.idxmax will now raise a ValueError when a group has all NA values, or when used with skipna=False and any NA value is encountered (GH 10694, GH 57745)
concat() no longer ignores empty objects when determining output dtypes (GH 39122)
concat() with all-NA entries no longer ignores the dtype of those entries when determining the result dtype (GH 40893)
read_excel(), read_json(), read_html(), and read_xml() no longer accept raw string or byte representation of the data. That type of data must be wrapped in a StringIO or BytesIO (GH 53767)
to_datetime() with a unit specified no longer parses strings into floats, instead parses them the same way as without unit (GH 50735)
DataFrame.groupby() with as_index=False and aggregation methods will no longer exclude from the result the groupings that do not arise from the input (GH 49519)
ExtensionArray._reduce() now requires a keepdims: bool = False parameter in the signature (GH 52788)
Series.dt.to_pydatetime() now returns a Series of datetime.datetime objects (GH 52459)
SeriesGroupBy.agg() no longer pins the name of the group to the input passed to the provided func (GH 51703)
All arguments except name in Index.rename() are now keyword only (GH 56493)
All arguments except the first path-like argument in IO writers are now keyword only (GH 54229)
Changed behavior of Series.__getitem__() and Series.__setitem__() to always treat integer keys as labels, never as positional, consistent with DataFrame behavior (GH 50617)
Changed behavior of Series.__getitem__(), Series.__setitem__(), DataFrame.__getitem__(), DataFrame.__setitem__() with an integer slice on objects with a floating-dtype index. This is now treated as positional indexing (GH 49612)
Disallow a callable argument to Series.iloc() to return a tuple (GH 53769)
Disallow logical operations (|, &, ^) between pandas objects and dtype-less sequences (e.g. list, tuple); wrap the objects in Series, Index, or np.array first instead (GH 52264)
Disallow automatic casting to object in Series logical operations (&, ^, |) between series with mismatched indexes and dtypes other than object or bool (GH 52538)
Disallow calling Series.replace() or DataFrame.replace() without a value and with non-dict-like to_replace (GH 33302)
Disallow constructing an arrays.SparseArray with scalar data (GH 53039)
Disallow indexing an Index with a boolean indexer of length zero, it now raises ValueError (GH 55820)
Disallow passing non-standard inputs (anything other than np.ndarray, Index, ExtensionArray, or Series) to isin(), unique(), factorize() (GH 52986)
Disallow passing a pandas type to Index.view() (GH 55709)
Disallow units other than “s”, “ms”, “us”, “ns” for datetime64 and timedelta64 dtypes in array() (GH 53817)
Removed “freq” keyword from PeriodArray constructor, use “dtype” instead (GH 52462)
Removed ‘fastpath’ keyword in Categorical constructor (GH 20110)
Removed ‘kind’ keyword in Series.resample() and DataFrame.resample() (GH 58125)
Removed Block, DatetimeTZBlock, ExtensionBlock, create_block_manager_from_blocks from pandas.core.internals and pandas.core.internals.api (GH 55139)
Removed alias arrays.PandasArray for arrays.NumpyExtensionArray (GH 53694)
Removed deprecated “method” and “limit” keywords from Series.replace() and DataFrame.replace() (GH 53492)
Removed extension test classes BaseNoReduceTests, BaseNumericReduceTests, BaseBooleanReduceTests (GH 54663)
Removed the “closed” and “normalize” keywords in DatetimeIndex.__new__() (GH 52628)
Removed the deprecated delim_whitespace keyword in read_csv() and read_table(), use sep=r"\s+" instead (GH 55569)
Require SparseDtype.fill_value() to be a valid value for the SparseDtype.subtype() (GH 53043)
Stopped automatically casting non-datetimelike values (mainly strings) in Series.isin() and Index.isin() with datetime64, timedelta64, and PeriodDtype dtypes (GH 53111)
Stopped performing dtype inference in Index, Series and DataFrame constructors when given a pandas object (Series, Index, ExtensionArray), call .infer_objects on the input to keep the current behavior (GH 56012)
Stopped performing dtype inference when setting an Index into a DataFrame (GH 56102)
Stopped performing dtype inference in Index.insert() with object-dtype index; this often affects the index/columns that result when setting new entries into an empty Series or DataFrame (GH 51363)
Removed the “closed” and “unit” keywords in TimedeltaIndex.__new__() (GH 52628, GH 55499)
All arguments in Index.sort_values() are now keyword only (GH 56493)
All arguments in Series.to_dict() are now keyword only (GH 56493)
Changed the default value of na_action in Categorical.map() to None (GH 51645)
Changed the default value of observed in DataFrame.groupby() and Series.groupby() to True (GH 51811)
Enforce deprecation in testing.assert_series_equal() and testing.assert_frame_equal() with object dtype and mismatched null-like values, which are now considered not-equal (GH 18463)
Enforce banning of upcasting in in-place setitem-like operations (GH 59007) (see PDEP6)
Enforced deprecation of all and any reductions with datetime64, DatetimeTZDtype, and PeriodDtype dtypes (GH 58029)
Enforced deprecation of allowing non-bool, non-NA values for na in str.contains(), str.startswith(), and str.endswith() (GH 59615)
Enforced deprecation disallowing float “periods” in date_range(), period_range(), timedelta_range(), and interval_range() (GH 56036)
Enforced deprecation disallowing parsing datetimes with mixed time zones unless user passes utc=True to to_datetime() (GH 57275)
Enforced deprecation in Series.value_counts() and Index.value_counts() with object dtype performing dtype inference on the .index of the result (GH 56161)
Enforced deprecation of DataFrameGroupBy.get_group() and SeriesGroupBy.get_group() allowing the name argument to be a non-tuple when grouping by a list of length 1 (GH 54155)
Enforced deprecation of Series.interpolate() and DataFrame.interpolate() for object-dtype (GH 57820)
Enforced deprecation of offsets.Tick.delta(), use pd.Timedelta(obj) instead (GH 55498)
Enforced deprecation of axis=None acting the same as axis=0 in the DataFrame reductions sum, prod, std, var, and sem; passing axis=None will now reduce over both axes. This is particularly the case when doing e.g. numpy.sum(df) (GH 21597)
Enforced deprecation of core.internals member DatetimeTZBlock (GH 58467)
Enforced deprecation of date_parser in read_csv(), read_table(), read_fwf(), and read_excel() in favour of date_format (GH 50601)
Enforced deprecation of keep_date_col keyword in read_csv() (GH 55569)
Enforced deprecation of quantile keyword in Rolling.quantile() and Expanding.quantile(), renamed to q instead. (GH 52550)
Enforced deprecation of argument infer_datetime_format in read_csv(), as a strict version of it is now the default (GH 48621)
Enforced deprecation of combining parsed datetime columns in read_csv() in parse_dates (GH 55569)
Enforced deprecation of passing a non-standard argument (anything other than np.ndarray, ExtensionArray, Index, or Series) to api.extensions.take() (GH 52981)
Enforced deprecation of parsing system timezone strings to tzlocal, which depended on system timezone; pass the ‘tz’ keyword instead (GH 50791)
Enforced deprecation of passing a dictionary to SeriesGroupBy.agg() (GH 52268)
Enforced deprecation of string AS denoting frequency in YearBegin and strings AS-DEC, AS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 57793)
Enforced deprecation of string A denoting frequency in YearEnd and strings A-DEC, A-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 57699)
Enforced deprecation of string BAS denoting frequency in BYearBegin and strings BAS-DEC, BAS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 57793)
Enforced deprecation of string BA denoting frequency in BYearEnd and strings BA-DEC, BA-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 57793)
Enforced deprecation of strings H, BH, and CBH denoting frequencies in Hour, BusinessHour, CustomBusinessHour (GH 59143)
Enforced deprecation of strings H, BH, and CBH denoting units in Timedelta (GH 59143)
Enforced deprecation of strings T, L, U, and N denoting frequencies in Minute, Milli, Micro, Nano (GH 57627)
Enforced deprecation of strings T, L, U, and N denoting units in Timedelta (GH 57627)
Enforced deprecation of the behavior of concat() when len(keys) != len(objs) would truncate to the shorter of the two. Now this raises a ValueError (GH 43485)
Enforced deprecation of the behavior of DataFrame.replace() and Series.replace() with CategoricalDtype that would introduce new categories. (GH 58270)
Enforced deprecation of the behavior of Series.argsort() in the presence of NA values (GH 58232)
Enforced deprecation of values “pad”, “ffill”, “bfill”, and “backfill” for Series.interpolate() and DataFrame.interpolate() (GH 57869)
Enforced deprecation removing Categorical.to_list(), use obj.tolist() instead (GH 51254)
Enforced silent-downcasting deprecation for all relevant methods (GH 54710)
In DataFrame.stack(), the default value of future_stack is now True; specifying False will raise a FutureWarning (GH 55448)
Iterating over a DataFrameGroupBy or SeriesGroupBy will return tuples of length 1 for the groups when grouping by level with a list of length 1 (GH 50064)
Methods apply, agg, and transform will no longer replace NumPy functions (e.g. np.sum) and built-in functions (e.g. min) with the equivalent pandas implementation; use string aliases (e.g. "sum" and "min") if you desire to use the pandas implementation (GH 53974)
Passing both freq and fill_value in DataFrame.shift() and Series.shift() and DataFrameGroupBy.shift() now raises a ValueError (GH 54818)
Removed DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() supporting bool dtype (GH 53975)
Removed DateOffset.is_anchored() and offsets.Tick.is_anchored() (GH 56594)
Removed DataFrame.applymap, Styler.applymap and Styler.applymap_index (GH 52364)
Removed DataFrame.bool and Series.bool (GH 51756)
Removed DataFrame.first and DataFrame.last (GH 53710)
Removed DataFrame.swapaxes and Series.swapaxes (GH 51946)
Removed DataFrameGroupBy.grouper and SeriesGroupBy.grouper (GH 56521)
Removed DataFrameGroupBy.fillna and SeriesGroupBy.fillna (GH 55719)
Removed Index.format, use Index.astype() with str or Index.map() with a formatter function instead (GH 55439)
Removed Resampler.fillna (GH 55719)
Removed Series.__int__ and Series.__float__. Call int(Series.iloc[0]) or float(Series.iloc[0]) instead. (GH 51131)
Removed Series.ravel (GH 56053)
Removed Series.view (GH 56054)
Removed StataReader.close (GH 49228)
Removed _data from DataFrame, Series, arrays.ArrowExtensionArray (GH 52003)
Removed axis argument from DataFrame.groupby(), Series.groupby(), DataFrame.rolling(), Series.rolling(), DataFrame.resample(), and Series.resample() (GH 51203)
Removed axis argument from all groupby operations (GH 50405)
Removed convert_dtype from Series.apply() (GH 52257)
Removed method, limit, fill_axis and broadcast_axis keywords from DataFrame.align() (GH 51968)
Removed pandas.api.types.is_interval and pandas.api.types.is_period, use isinstance(obj, pd.Interval) and isinstance(obj, pd.Period) instead (GH 55264)
Removed pandas.io.sql.execute (GH 50185)
Removed pandas.value_counts, use Series.value_counts() instead (GH 53493)
Removed read_gbq and DataFrame.to_gbq. Use pandas_gbq.read_gbq and pandas_gbq.to_gbq instead https://pandas-gbq.readthedocs.io/en/latest/api.html (GH 55525)
Removed use_nullable_dtypes from read_parquet() (GH 51853)
Removed year, month, quarter, day, hour, minute, and second keywords in the PeriodIndex constructor, use PeriodIndex.from_fields() instead (GH 55960)
Removed argument limit from DataFrame.pct_change(), Series.pct_change(), DataFrameGroupBy.pct_change(), and SeriesGroupBy.pct_change(); the argument method must be set to None and will be removed in a future version of pandas (GH 53520)
Removed deprecated argument obj in DataFrameGroupBy.get_group() and SeriesGroupBy.get_group() (GH 53545)
Removed deprecated behavior of Series.agg() using Series.apply() (GH 53325)
Removed deprecated keyword method on Series.fillna(), DataFrame.fillna() (GH 57760)
Removed option mode.use_inf_as_na, convert inf entries to NaN before instead (GH 51684)
Removed support for DataFrame in DataFrame.from_records() (GH 51697)
Removed support for errors="ignore" in to_datetime(), to_timedelta() and to_numeric() (GH 55734)
Removed support for slice in DataFrame.take() (GH 51539)
Removed the ArrayManager (GH 55043)
Removed the fastpath argument from the Series constructor (GH 55466)
Removed the is_boolean, is_integer, is_floating, holds_integer, is_numeric, is_categorical, is_object, and is_interval attributes of Index (GH 50042)
Removed the ordinal keyword in PeriodIndex, use PeriodIndex.from_ordinals() instead (GH 55960)
Removed unused arguments *args and **kwargs in Resampler methods (GH 50977)
Unrecognized timezones when parsing strings to datetimes now raise a ValueError (GH 51477)
Removed the Grouper attributes ax, groups, indexer, and obj (GH 51206, GH 51182)
Removed deprecated keyword verbose on read_csv() and read_table() (GH 56556)
Removed the method keyword in ExtensionArray.fillna, implement ExtensionArray._pad_or_backfill instead (GH 53621)
Removed the attribute dtypes from DataFrameGroupBy (GH 51997)
Enforced deprecation of argmin, argmax, idxmin, and idxmax returning a result when skipna=False and an NA value is encountered or all values are NA values; these operations will now raise in such cases (GH 33941, GH 51276)
Enforced deprecation of storage option “pyarrow_numpy” for StringDtype (GH 60152)
Removed specifying include_groups=True in DataFrameGroupBy.apply and Resampler.apply (GH 7155)
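Several of the removals above have direct replacements named in the notes. The following sketch shows a handful of them side by side; the sample data and variable names are illustrative, and the calls used are ones that earlier pandas releases also accept, so the snippet is meant as a migration aid rather than a demonstration of 3.0-only behaviour.

```python
from io import StringIO

import pandas as pd

# Raw JSON text must be wrapped in StringIO before being passed to read_json().
raw_json = '[{"a": 1, "b": 3}, {"a": 2, "b": 4}]'
df = pd.read_json(StringIO(raw_json))

# delim_whitespace was removed; use a regular-expression separator instead.
raw_text = "x   y\n1   2\n3   4\n"
spaced = pd.read_csv(StringIO(raw_text), sep=r"\s+")

# Pass a string alias to request the pandas implementation of a reduction.
totals = df.groupby("a").agg("sum")

# Categorical.to_list() was removed; tolist() is the replacement.
labels = pd.Categorical(["u", "v", "u"]).tolist()

# Integer keys on a Series are labels, so use iloc for positional access,
# and convert a single-element Series through iloc[0].
ser = pd.Series([10, 20], index=["a", "b"])
first_by_position = ser.iloc[0]
as_int = int(pd.Series([7]).iloc[0])
```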
Performance improvements#
Eliminated circular reference to the original pandas object in accessor attributes (e.g. Series.str). However, accessor instantiation is no longer cached (GH 47667, GH 41357)
Categorical.categories returns a RangeIndex instead of an Index if the constructed values was a range. (GH 57787)
DataFrame returns RangeIndex columns when possible when data is a dict (GH 57943)
Series returns a RangeIndex index when possible when data is a dict (GH 58118)
concat() returns RangeIndex columns when possible when objs contains Series and DataFrame and axis=0 (GH 58119)
concat() returns a RangeIndex level in the MultiIndex result when keys is a range or RangeIndex (GH 57542)
RangeIndex.append() returns a RangeIndex instead of an Index when appending values that could continue the RangeIndex (GH 57467)
Series.nlargest() has improved performance when there are duplicate values in the index (GH 55767)
Series.str.extract() returns RangeIndex columns instead of Index columns when possible (GH 57542)
Series.str.partition() with ArrowDtype returns RangeIndex columns instead of Index columns when possible (GH 57768)
Performance improvement in DataFrame when data is a dict and columns is specified (GH 24368)
Performance improvement in MultiIndex when setting MultiIndex.names, which no longer invalidates all cached operations (GH 59578)
Performance improvement in DataFrame.join() for sorted but non-unique indexes (GH 56941)
Performance improvement in DataFrame.join() when left and/or right are non-unique and how is "left", "right", or "inner" (GH 56817)
Performance improvement in DataFrame.join() with how="left" or how="right" and sort=True (GH 56919)
Performance improvement in DataFrame.to_csv() when index=False (GH 59312)
Performance improvement in DataFrameGroupBy.ffill(), DataFrameGroupBy.bfill(), SeriesGroupBy.ffill(), and SeriesGroupBy.bfill() (GH 56902)
Performance improvement in Index.join() by propagating cached attributes in cases where the result matches one of the inputs (GH 57023)
Performance improvement in Index.take() when indices is a full range indexer from zero to length of index (GH 56806)
Performance improvement in Index.to_frame() returning RangeIndex columns instead of Index columns when possible. (GH 58018)
Performance improvement in MultiIndex._engine() to use smaller dtypes if possible (GH 58411)
Performance improvement in MultiIndex.equals() for equal length indexes (GH 56990)
Performance improvement in MultiIndex.memory_usage() to ignore the index engine when it isn’t already cached. (GH 58385)
Performance improvement in RangeIndex.__getitem__() with a boolean mask or integers returning a RangeIndex instead of an Index when possible. (GH 57588)
Performance improvement in RangeIndex.append() when appending the same index (GH 57252)
Performance improvement in RangeIndex.argmin() and RangeIndex.argmax() (GH 57823)
Performance improvement in RangeIndex.insert() returning a RangeIndex instead of an Index when the RangeIndex is empty. (GH 57833)
Performance improvement in RangeIndex.round() returning a RangeIndex instead of an Index when possible. (GH 57824)
Performance improvement in RangeIndex.searchsorted() (GH 58376)
Performance improvement in RangeIndex.to_numpy() when specifying an na_value (GH 58376)
Performance improvement in RangeIndex.value_counts() (GH 58376)
Performance improvement in RangeIndex.join() returning a RangeIndex instead of an Index when possible. (GH 57651, GH 57752)
Performance improvement in RangeIndex.reindex() returning a RangeIndex instead of an Index when possible. (GH 57647, GH 57752)
Performance improvement in RangeIndex.take() returning a RangeIndex instead of an Index when possible. (GH 57445, GH 57752)
Performance improvement in merge() if hash-join can be used (GH 57970)
Performance improvement in merge() when join keys have different dtypes and need to be upcast (GH 62902)
Performance improvement in CategoricalDtype.update_dtype() when dtype is a CategoricalDtype with non-None categories and ordered (GH 59647)
Performance improvement in DataFrame.__getitem__() when key is a DataFrame with many columns (GH 61010)
Performance improvement in DataFrame.astype() when converting to extension floating dtypes, e.g. “Float64” (GH 60066)
Performance improvement in DataFrame.stack() when using future_stack=True and the DataFrame does not have a MultiIndex (GH 58391)
Performance improvement in DataFrame.where() when cond is a DataFrame with many columns (GH 61010)
Performance improvement in to_hdf() by avoiding unnecessary reopenings of the HDF5 file, which speeds up data addition to files with a very large number of groups. (GH 58248)
Performance improvement in DataFrameGroupBy.__len__ and SeriesGroupBy.__len__ (GH 57595)
Performance improvement in indexing operations for string dtypes (GH 56997)
Performance improvement in unary methods on a RangeIndex returning a RangeIndex instead of an Index when possible. (GH 57825)
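Many of the entries above amount to keeping a RangeIndex where a materialized Index was previously returned. One way to see why that can matter is to compare the memory reported for the two representations of the same labels. This is an illustrative measurement only, not a benchmark of any specific change listed above.

```python
import numpy as np
import pandas as pd

n = 1_000_000

# A RangeIndex only records start, stop and step.
lazy = pd.RangeIndex(n)

# An Index built from an array stores every label explicitly.
dense = pd.Index(np.arange(n))

lazy_bytes = lazy.memory_usage()
dense_bytes = dense.memory_usage()

print(f"RangeIndex: {lazy_bytes} bytes, materialized Index: {dense_bytes} bytes")
```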
Bug fixes#
Categorical#
Bug in Categorical where constructing from a pandas Series or Index with dtype='object' did not preserve the categories’ dtype as object; now the categories.dtype is preserved as object for these cases, while numpy arrays and Python sequences with dtype='object' continue to infer the most specific dtype (for example, str if all elements are strings) (GH 61778)
Bug in Series.apply() where nan was ignored for CategoricalDtype (GH 59938)
Bug in testing.assert_index_equal() raising TypeError instead of AssertionError for incomparable CategoricalIndex when check_categorical=True and exact=False (GH 61935)
Bug in Categorical.astype() where copy=False would still trigger a copy of the codes (GH 62000)
Bug in DataFrame.pivot() and DataFrame.set_index() raising an ArrowNotImplementedError for columns with pyarrow dictionary dtype (GH 53051)
Bug in Series.convert_dtypes() with dtype_backend="pyarrow" where empty CategoricalDtype Series raised an error or got converted to null[pyarrow] (GH 59934)
Datetimelike#
Bug in is_year_start where a DatetimeIndex constructed via a date_range with frequency ‘MS’ wouldn’t have the correct year or quarter start attributes (GH 57377)
Bug in DataFrame raising ValueError when dtype is timedelta64 and data is a list containing None (GH 60064)
Bug in Timestamp constructor failing to raise when tz=None is explicitly specified in conjunction with timezone-aware tzinfo or data (GH 48688)
Bug in Timestamp constructor failing to raise when given a np.datetime64 object with non-standard unit (GH 25611)
Bug in date_range() where the last valid timestamp would sometimes not be produced (GH 56134)
Bug in date_range() where using a negative frequency value would not include all points between the start and end values (GH 56147)
Bug in to_datetime() where passing an lxml.etree._ElementUnicodeResult together with format raised TypeError. Now subclasses of str are handled. (GH 60933)
Bug in tseries.api.guess_datetime_format() failing to infer time format when “%Y” == “%H%M” (GH 57452)
Bug in tseries.frequencies.to_offset() failing to parse frequency strings starting with “LWOM” (GH 59218)
Bug in DataFrame.fillna() raising an AssertionError instead of OutOfBoundsDatetime when filling a datetime64[ns] column with an out-of-bounds timestamp. Now correctly raises OutOfBoundsDatetime. (GH 61208)
Bug in DataFrame.min() and DataFrame.max() casting datetime64 and timedelta64 columns to float64 and losing precision (GH 60850)
Bug in DataFrame.agg() raising IndexError on a DataFrame with missing values (GH 58810)
Bug in DateOffset.rollback() (and subclass methods) with normalize=True rolling back one offset too long (GH 32616)
Bug in DatetimeIndex.is_year_start() and DatetimeIndex.is_quarter_start() not raising on custom business day frequencies bigger than “1C” (GH 58664)
Bug in DatetimeIndex.is_year_start() and DatetimeIndex.is_quarter_start() returning False on double-digit frequencies (GH 58523)
Bug in DatetimeIndex.union() and DatetimeIndex.intersection() when unit was non-nanosecond (GH 59036)
Bug in DatetimeIndex.where() and TimedeltaIndex.where() failing to set freq=None in some cases (GH 24555)
Bug in Index.union() with a pyarrow timestamp dtype incorrectly returning object dtype (GH 58421)
Bug in Series.dt.microsecond() producing incorrect results for pyarrow backed Series. (GH 59154)
Bug in Timestamp.normalize() and DatetimeArray.normalize() returning incorrect results instead of raising on integer overflow for very small (distant past) values (GH 60583)
Bug in Timestamp.replace() failing to update unit attribute when replacement introduces non-zero nanosecond or microsecond (GH 57749)
Bug in to_datetime() not respecting dayfirst if an uncommon date string was passed. (GH 58859)
Bug in to_datetime() on float array with missing values throwing FloatingPointError (GH 58419)
Bug in to_datetime() on a float32 DataFrame with year, month, day, etc. columns leading to precision issues and incorrect results. (GH 60506)
Bug in to_datetime() reporting an incorrect index when parsing fails. (GH 58298)
Bug in to_datetime() with format="ISO8601" and utc=True where naive timestamps incorrectly inherited timezone offset from previous timestamps in a series. (GH 61389)
Bug in to_datetime() converting incorrectly when arg is a np.datetime64 object with unit of ps. (GH 60341)
Bug in comparison between objects with np.datetime64 dtype and timestamp[pyarrow] dtypes incorrectly raising TypeError (GH 60937)
Bug in comparison between objects with pyarrow date dtype and timestamp[pyarrow] or np.datetime64 dtype failing to consider these as non-comparable (GH 62157)
Bug in constructing arrays with ArrowDtype with timestamp type incorrectly allowing Decimal("NaN") (GH 61773)
Bug in constructing arrays with a timezone-aware ArrowDtype from timezone-naive datetime objects incorrectly treating those as UTC times instead of wall times like DatetimeTZDtype (GH 61775)
Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond datetime64, timedelta64 or DatetimeTZDtype incorrectly truncating those scalars (GH 56410)
Timedelta#
Accuracy improvement in Timedelta.to_pytimedelta() to round microseconds consistently for large nanosecond based Timedelta (GH 57841)
Bug in Timedelta constructor failing to raise when passed an invalid keyword (GH 53801)
Bug in DataFrame.cumsum() which was raising IndexError if dtype is timedelta64[ns] (GH 57956)
Bug in multiplication operations with timedelta64 dtype failing to raise TypeError when multiplying by bool objects or dtypes (GH 58054)
Timezones#
Bug in DatetimeIndex.union(), DatetimeIndex.intersection(), and DatetimeIndex.symmetric_difference() changing timezone to UTC when merging two DatetimeIndex objects with the same timezone but different units (GH 60080)
Bug in Series.dt.tz_localize() with a timezone-aware ArrowDtype incorrectly converting to UTC when tz=None (GH 61780)
Fixed bug in date_range() where tz-aware endpoints with calendar offsets (e.g. "MS") failed on DST fall-back. These now respect ambiguous/nonexistent. (GH 52908)
Numeric#
Bug in api.types.infer_dtype() returning “mixed” for complex and pd.NA mix (GH 61976)
Bug in api.types.infer_dtype() returning “mixed-integer-float” for float and pd.NA mix (GH 61621)
Bug in DataFrame.combine_first() where Int64 and UInt64 integers with absolute value greater than 2**53 would lose precision after the operation. (GH 60128)
Bug in DataFrame.corr() where numerical precision errors resulted in correlations above 1.0 (GH 61120)
DataFrame.cov() now raises a TypeError instead of returning potentially incorrect results or raising other errors (GH 53115)
Bug in DataFrame.quantile() where the column type was not preserved when numeric_only=True with a list-like q produced an empty result (GH 59035)
Bug in Series.dot() returning object dtype for ArrowDtype and nullable-dtype data (GH 61375)
Bug in Series.std() and Series.var() when using complex-valued data (GH 61645)
Bug in np.matmul with Index inputs raising a TypeError (GH 57079)
Bug in arithmetic operations between objects with numpy-nullable dtype and ArrowDtype incorrectly raising (GH 58602)
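The 2**53 threshold in the DataFrame.combine_first() entry is the point beyond which a 64-bit float can no longer represent every integer. The plain-Python sketch below shows that effect in isolation. Treating a detour through float64 as the cause of the pandas bug is an assumption here; the note itself does not describe the mechanism.

```python
# Up to this bound, every integer has an exact float64 representation.
EXACT_LIMIT = 2**53

big = EXACT_LIMIT + 1          # still well within the int64 range
via_float = int(float(big))    # round trip through a 64-bit float

drift = big - via_float        # non-zero once precision has been lost
print(big, via_float, drift)
```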
Conversion#
Bug in DataFrame.astype() not casting values for Arrow-based dictionary dtype correctly (GH 58479)
Bug in DataFrame.update() where bool dtype was converted to object (GH 55509)
Bug in Series.astype() where a read-only array might be modified in place when casting to a string dtype (GH 57212)
Bug in Series.convert_dtypes() and DataFrame.convert_dtypes() removing timezone information for objects with ArrowDtype (GH 60237)
Bug in Series.reindex() not maintaining float32 type when a reindex introduces a missing value (GH 45857)
Bug in to_datetime() and to_timedelta() with input None returning None instead of NaT, inconsistent with other conversion methods (GH 23055)
Strings#
Bug in Series.str.replace() raising an error on valid group references (\1, \2, etc.) on series converted to PyArrow backend dtype (GH 62653)
Bug in Series.str.zfill() raising AttributeError for ArrowDtype (GH 61485)
Bug in Series.value_counts() not respecting sort=False for series having string dtype (GH 55224)
Bug in multiplication with a StringDtype incorrectly allowing multiplying by bools; explicitly cast to integers instead (GH 62595)
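For readers unfamiliar with the group references mentioned in the Series.str.replace() entry, the sketch below shows what \1 and \2 mean, using Python's own re module rather than pandas. The pattern and input string are made up for illustration, and whether every pandas string backend follows re.sub() in all details is not something this example establishes.

```python
import re

# Two capture groups separated by a hyphen.
pattern = r"(\w+)-(\w+)"

# \1 and \2 in the replacement refer back to the first and second groups.
swapped = re.sub(pattern, r"\2-\1", "left-right")
print(swapped)
```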
Interval#
Index.is_monotonic_decreasing(), Index.is_monotonic_increasing(), and Index.is_unique() could incorrectly be False for an Index created from a slice of another Index. (GH 57911)
Bug in Index, Series, DataFrame constructors when given a sequence of Interval subclass objects casting them to Interval (GH 46945)
Bug in interval_range() where start and end numeric types were always cast to 64 bit (GH 57268)
Bug in IntervalIndex.get_indexer() and IntervalIndex.drop() when one of the sides of the index is non-unique (GH 52245)
Construction of IntervalArray and IntervalIndex from arrays with mismatched signed/unsigned integer dtypes (e.g., int64 and uint64) now raises a TypeError instead of proceeding silently. (GH 55715)
Indexing#
Bug in DataFrame.__getitem__() returning modified columns when called with slice in Python 3.12 (GH 57500)
Bug in DataFrame.__getitem__() where slicing a DataFrame with many rows raised an OverflowError (GH 59531)
Bug in DataFrame.from_records() throwing a ValueError when passed an empty list in index (GH 58594)
Bug in DataFrame.loc() and DataFrame.iloc() returning incorrect dtype when selecting from a DataFrame with mixed data types. (GH 60600)
Bug in DataFrame.loc() with inconsistent behavior of loc-set with 2 given indexes to Series (GH 59933)
Bug in Index.equals() when comparing between Series with string dtype Index (GH 61099)
Bug in Index.get_indexer() and similar methods when NaN is located at or after position 128 (GH 58924)
Bug in MultiIndex.insert() when a new value inserted to a datetime-like level gets cast to NaT and fails indexing (GH 60388)
Bug in Series.__setitem__() raising LossySetitemError when assigning a boolean series with a boolean indexer (GH 57338)
Bug in printing Index.names and MultiIndex.levels would not escape single quotes (GH 60190)
Bug in reindexing of DataFrame with PeriodDtype columns in case of consolidated block (GH 60980, GH 60273)
Bug in DataFrame.loc.__getitem__() and DataFrame.iloc.__getitem__() with a CategoricalDtype column with integer categories raising when trying to index a row containing a NaN entry (GH 58954)
Bug in Index.__getitem__() incorrectly raising with a 0-dim np.ndarray key (GH 55601)
Bug in Index.get_indexer() not casting missing values correctly for new string datatype (GH 55833)
Bug in adding new rows with DataFrame.loc.__setitem__() or Series.loc.__setitem__ which failed to retain dtype on the object’s index in some cases (GH 41626)
Bug in indexing on a DatetimeIndex with a timestamp[pyarrow] dtype or on a TimedeltaIndex with a duration[pyarrow] dtype (GH 62277)
Missing#
Bug in DataFrame.fillna() and Series.fillna() that would ignore the limit argument on ExtensionArray dtypes (GH 58001)
Bug in NA.__and__(), NA.__or__() and NA.__xor__() when operating with np.bool_ objects (GH 58427)
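As background for the NA.__and__(), NA.__or__() and NA.__xor__() entry, pd.NA follows three-valued logic: the result is only NA when the other operand does not already decide the outcome. The sketch below uses plain Python bools. That np.bool_ operands are now meant to give the same results is an inference from the note, not something this example verifies.

```python
import pandas as pd

# The other operand decides the outcome, so the result is a plain bool.
decided_false = pd.NA & False
decided_true = pd.NA | True

# The outcome depends on the missing value, so the result stays NA.
undecided_and = pd.NA & True
undecided_or = pd.NA | False
undecided_xor = pd.NA ^ True
```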
MultiIndex#
DataFrame.loc() with axis=0 and MultiIndex when setting a value adds extra columns (GH 58116)
DataFrame.melt() would not accept multiple names in var_name when the columns were a MultiIndex (GH 58033)
MultiIndex.insert() would not insert NA value correctly at unified location of index -1 (GH 59003)
MultiIndex.get_level_values() accessing a DatetimeIndex does not carry the frequency attribute along (GH 58327, GH 57949)
Bug in DataFrame arithmetic operations in case of unaligned MultiIndex columns (GH 60498)
Bug in DataFrame arithmetic operations with Series in case of unaligned MultiIndex (GH 61009)
Bug in MultiIndex.union() raising when indexes have duplicates with differing names (GH 62059)
Bug in MultiIndex.from_tuples() causing wrong output with input of type tuples having NaN values (GH 60695, GH 60988)
Bug in DataFrame.__setitem__() where column alignment logic would reindex the assigned value with an empty index, incorrectly setting all values to NaN. (GH 61841)
Bug in DataFrame.reindex() and Series.reindex() where reindexing Index to a MultiIndex would incorrectly set all values to NaN. (GH 60923)
I/O#
Bug in
DataFrameandSeriesreprofcollections.abc.Mappingelements. (GH 57915)Fix bug in
on_bad_lines callable when returning too many fields: now emits ParserWarning and truncates extra fields regardless of index_col (GH 61837)
Bug in DataFrame.to_json() when "index" was a value in DataFrame.columns and Index.name was None. Now, this will fail with a ValueError (GH 58925)
Bug in io.common.is_fsspec_url() not recognizing chained fsspec URLs (GH 48978)
Bug in DataFrame._repr_html_() which ignored the "display.float_format" option (GH 59876)
Bug in DataFrame.from_records() ignoring columns and index parameters when data is an empty iterator and nrows=0. (GH 61140)
Bug in DataFrame.from_records() where columns parameter with numpy structured array was not reordering and filtering out the columns (GH 59717)
Bug in DataFrame.to_dict() raising an unnecessary UserWarning when columns are not unique and orient='tight'. (GH 58281)
Bug in DataFrame.to_excel() when writing empty DataFrame with MultiIndex on both axes (GH 57696)
Bug in DataFrame.to_excel() where the MultiIndex index with a period level was not a date (GH 60099)
Bug in DataFrame.to_stata() when exporting a column containing both long strings (Stata strL) and pd.NA values (GH 23633)
Bug in DataFrame.to_stata() when input encoded length and normal length are mismatched (GH 61583)
Bug in DataFrame.to_stata() when writing DataFrame and byteorder="big". (GH 58969)
Bug in DataFrame.to_stata() when writing more than 32,000 value labels. (GH 60107)
Bug in DataFrame.to_string() that raised StopIteration with nested DataFrames. (GH 16098)
Bug in HDFStore.get() failing to save data of dtype datetime64[s] correctly (GH 59004)
Bug in HDFStore.select() causing queries on categorical string columns to return unexpected results (GH 57608)
Bug in MultiIndex.factorize() incorrectly raising on length-0 indexes (GH 57517)
Bug in read_csv() causing segmentation fault when encoding_errors is not a string. (GH 59059)
Bug in read_csv() for the c and python engines where parsing numbers with large exponents caused overflows. Now, numbers with large positive exponents are parsed as inf or -inf depending on the sign of the mantissa, while those with large negative exponents are parsed as 0.0 (GH 62617, GH 38794, GH 62740)
Bug in read_csv() raising TypeError when index_col is specified and na_values is a dict containing the key None. (GH 57547)
Bug in read_csv() raising TypeError when nrows and iterator are specified without specifying a chunksize. (GH 59079)
Bug in read_csv() where the order of na_values produced inconsistent results when na_values is a list of non-string values. (GH 59303)
Bug in read_csv() with c and python engines reading big integers as strings. Now reads them as Python integers. (GH 51295)
Bug in read_csv() with engine="c" reading large float numbers with preceding integers as strings. Now reads them as floats. (GH 51295)
Bug in read_csv() with engine="pyarrow" and dtype="Int64" losing precision (GH 56136)
Bug in read_excel() raising ValueError when passing array of boolean values when dtype="boolean". (GH 58159)
Bug in read_html() where rowspan in header row causes incorrect conversion to DataFrame. (GH 60210)
Bug in read_json() ignoring the given dtype when engine="pyarrow" (GH 59516)
Bug in read_json() not validating that the typ argument is exactly "frame" or "series" (GH 59124)
Bug in read_json() where extreme value integers in string format were incorrectly parsed as a different integer number (GH 20608)
Bug in read_stata() raising KeyError when input file is stored in big-endian format and contains strL data. (GH 58638)
Bug in read_stata() where extreme value integers were incorrectly interpreted as missing for format versions 111 and prior (GH 58130)
Bug in read_stata() where the missing code for double was not recognised for format versions 105 and prior (GH 58149)
Bug in set_option() where setting the pandas option display.html.use_mathjax to False has no effect (GH 59884)
Bug in to_csv() where quotechar is not escaped when escapechar is not None (GH 61407)
Bug in to_excel() where MultiIndex columns would be merged to a single row when merge_cells=False is passed (GH 60274)
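As a rough illustration of the read_csv() large-exponent entry above: the results it describes mirror how Python's built-in float handles out-of-range literals. The sketch below does not call read_csv() and makes no claim about how the parser is implemented; the sample literals are made up for illustration.

```python
import math

# Made-up sample literals; these are not taken from the pandas test suite.
samples = ["1e400", "-1e400", "1e-400"]

# Python's float() saturates out-of-range values in the way the entry
# describes: the sign of the mantissa selects inf or -inf, and values too
# small in magnitude underflow to 0.0.
parsed = {text: float(text) for text in samples}

# Collect the literals that overflowed to an infinity.
overflowed = [text for text, value in parsed.items() if math.isinf(value)]

for text, value in parsed.items():
    print(f"{text!r} -> {value}")
```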
Period#
Fixed error message when passing invalid period alias to PeriodIndex.to_timestamp() (GH 58974)
Plotting#
Bug in DataFrameGroupBy.boxplot() failing when there were multiple groupings (GH 14701)
Bug in DataFrame.plot.bar() when subplots and stacked=True are used in conjunction, which causes incorrect stacking. (GH 61018)
Bug in DataFrame.plot.bar() with stacked=True where labels on stacked bars with zero-height segments were incorrectly positioned at the base instead of the label position of the previous segment (GH 59429)
Bug in DataFrame.plot.line() raising ValueError when both color and a dict style are set (GH 59461)
Bug in DataFrame.plot() that causes a shift to the right when the frequency multiplier is greater than one. (GH 57587)
Bug in DataFrame.plot() where title would require extra titles when plotting more than one column per subplot. (GH 61019)
Bug in Series.plot() preventing a line and bar from being aligned on the same plot (GH 61161)
Bug in Series.plot() preventing a line and scatter plot from being aligned (GH 61005)
Bug in Series.plot() with kind="pie" with ArrowDtype (GH 59192)
Groupby/resample/rolling#
Bug in DataFrameGroupBy.__len__() and SeriesGroupBy.__len__() that would raise when the grouping contained NA values and dropna=False (GH 58644)
Bug in DataFrameGroupBy.any() that returned True for groups where all Timedelta values are NaT. (GH 59712)
Bug in DataFrameGroupBy.groups() and SeriesGroupBy.groups() that would fail when the groups were Categorical with an NA value (GH 61356)
Bug in DataFrameGroupBy.groups() and SeriesGroupBy.groups() that would not respect groupby argument dropna (GH 55919)
Bug in DataFrameGroupBy.median() where NaT values gave an incorrect result. (GH 57926)
Bug in DataFrameGroupBy.quantile() when interpolation="nearest" is inconsistent with DataFrame.quantile() (GH 47942)
Bug in DataFrameGroupBy reductions where non-Boolean values were allowed for the numeric_only argument; passing a non-Boolean value will now raise (GH 62778)
Bug in Resampler.interpolate() on a DataFrame with non-uniform sampling and/or indices not aligning with the resulting resampled index that would result in wrong interpolation (GH 21351)
Bug in Series.rolling() when used with a BaseIndexer subclass and computing min/max (GH 46726)
Bug in DataFrame.ewm() and Series.ewm() when passed times and aggregation functions other than mean (GH 51695)
Bug in DataFrame.resample() and Series.resample() not keeping the index name when the index had ArrowDtype timestamp dtype (GH 61222)
Bug in DataFrame.resample() changing index type to MultiIndex when the dataframe is empty and using an upsample method (GH 55572)
Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() that was returning numpy dtype values when input values are pyarrow dtype values, instead of returning pyarrow dtype values. (GH 53030)
Bug in DataFrameGroupBy.agg() that raises AttributeError when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (GH 55041)
Bug in DataFrameGroupBy.agg() where applying a user-defined function to an empty DataFrame returned a Series instead of an empty DataFrame. (GH 61503)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() for empty data frame with group_keys=False still creating output index using group keys. (GH 60471)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() not preserving _metadata attributes from subclassed DataFrames and Series (GH 62134)
Bug in DataFrameGroupBy.apply() that was returning a completely empty DataFrame when all return values of func were None instead of returning an empty DataFrame with the original columns and dtypes. (GH 57775)
Bug in DataFrameGroupBy.apply() with as_index=False that was returning MultiIndex instead of returning Index. (GH 58291)
Bug in DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() where numeric_only parameter was passed indirectly through kwargs instead of passing directly. (GH 58811)
Bug in DataFrameGroupBy.cumsum() where it did not return the correct dtype when the label contained None. (GH 58811)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() with a reducer and observed=False that coerces dtype to float when there are unobserved categories. (GH 55326)
Bug in Rolling.apply() for method="table" where column order was not being respected due to the columns getting sorted by default. (GH 59666)
Bug in Rolling.apply() where the applied function could be called on fewer than min_periods periods if method="table". (GH 58868)
Bug in Series.resample() that could raise when the date range ended shortly before a non-existent time. (GH 58380)
Reshaping#
Bug in concat() with mixed integer and bool dtypes incorrectly casting the bools to integers (GH 45101)
Bug in qcut() where values at the quantile boundaries could be incorrectly assigned (GH 59355)
Bug in DataFrame.combine_first() not preserving the column order (GH 60427)
Bug in DataFrame.explode() producing incorrect result for pyarrow.large_list type (GH 61091)
Bug in DataFrame.join() inconsistently setting result index name (GH 55815)
Bug in DataFrame.join() where a DataFrame with a MultiIndex would raise an AssertionError when MultiIndex.names contained None. (GH 58721)
Bug in DataFrame.merge() where merging on a column containing only NaN values resulted in an out-of-bounds array access (GH 59421)
Bug in DataFrame.unstack() producing incorrect results when sort=False (GH 54987, GH 55516)
Bug in DataFrame.unstack() raising an error with indexes containing NaN with sort=False (GH 61221)
Bug in DataFrame.merge() when merging two DataFrame on intc or uintc types on Windows (GH 60091, GH 58713)
Bug in DataFrame.pivot_table() incorrectly subaggregating results when called without an index argument (GH 58722)
Bug in DataFrame.pivot_table() incorrectly ignoring the values argument when also supplied to the index or columns parameters (GH 57876, GH 61292)
Bug in DataFrame.pivot_table() where margins=True did not correctly include groups with NaN values in the index or columns when dropna=False was explicitly passed. (GH 61509)
Bug in DataFrame.stack() with the new implementation where ValueError is raised when level=[] (GH 60740)
Bug in DataFrame.unstack() producing incorrect results when manipulating empty DataFrame with an ExtensionDtype (GH 59123)
Bug in concat() where concatenating DataFrame and Series with ignore_index=True drops the series name (GH 60723, GH 56257)
Bug in melt() where calling with duplicate column names in id_vars raised a misleading AttributeError (GH 61475)
Bug in DataFrame.merge() where user-provided suffixes could result in duplicate column names if the resulting names matched existing columns. Now raises a MergeError in such cases. (GH 61402)
Bug in DataFrame.merge() with CategoricalDtype columns incorrectly raising RecursionError (GH 56376)
Bug in DataFrame.merge() with a float32 index incorrectly casting the index to float64 (GH 41626)
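To make the suffix-collision entry for DataFrame.merge() above more concrete, here is a minimal pure-Python sketch of how a suffixed name can collide with a column that already exists. The helpers suffixed_names and duplicated are hypothetical and written only for this illustration; they are not pandas functions and are not meant to reflect how pandas detects the collision internally.

```python
from collections import Counter


def suffixed_names(left_cols, right_cols, on, suffixes=("_x", "_y")):
    """Hypothetical helper: apply merge-style suffixes to overlapping names."""
    # Names present on both sides, other than the join keys, get a suffix.
    overlap = (set(left_cols) & set(right_cols)) - set(on)
    left_out = [c + suffixes[0] if c in overlap else c for c in left_cols]
    right_out = [
        c + suffixes[1] if c in overlap else c
        for c in right_cols
        if c not in on  # join keys appear only once in the result
    ]
    return left_out + right_out


def duplicated(names):
    """Hypothetical helper: return the names that occur more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


# The left side already has a column called "b_x", so suffixing "b" collides.
names = suffixed_names(["key", "b", "b_x"], ["key", "b"], on=["key"])
print(names)
print(duplicated(names))
```

In this made-up case the overlapping column b becomes b_x on the left side, which duplicates the pre-existing b_x column; this is the kind of situation the entry says now raises a MergeError.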
Sparse#
Bug in SparseDtype for equal comparison with na fill value. (GH 54770)
Bug in DataFrame.sparse.from_spmatrix() which hard coded an invalid fill_value for certain subtypes. (GH 59063)
Bug in DataFrame.sparse.to_dense() which ignored subclassing and always returned an instance of DataFrame (GH 59913)
ExtensionArray#
Bug in Categorical when constructing with an Index with ArrowDtype (GH 60563)
Bug in arrays.ArrowExtensionArray.__setitem__() which caused wrong behavior when using an integer array with repeated values as a key (GH 58530)
Bug in ArrowExtensionArray.factorize() where NA values were dropped when input was dictionary-encoded even when dropna was set to False (GH 60567)
Bug in api.types.is_datetime64_any_dtype() where a custom ExtensionDtype would return False for array-likes (GH 57055)
Bug in comparison between object with ArrowDtype and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-False (for ==) or all-True (for !=) (GH 59505)
Bug in constructing pandas data structures where passing into dtype a string of the type followed by [pyarrow] while PyArrow is not installed would raise NameError rather than ImportError (GH 57928)
Bug in various DataFrame reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (GH 59234)
Fixed flex arithmetic with ExtensionArray operands raising when fill_value was passed. (GH 62467)
Styler#
Fixed bug in Styler.to_latex() when styling column headers in combination with a hidden index or hidden index levels.
Other#
Bug in DataFrame when passing a dict with a NA scalar and columns that would always return np.nan (GH 57205)
Bug in Series ignoring errors when trying to convert Series input data to the given dtype (GH 60728)
Bug in eval() on ExtensionArray where including division / failed with a TypeError. (GH 58748)
Bug in eval() where method calls on binary operations like (x + y).dropna() would raise AttributeError: 'BinOp' object has no attribute 'value' (GH 61175)
Bug in eval() where the names of the Series were not preserved when using engine="numexpr". (GH 10239)
Bug in eval() with engine="numexpr" returning unexpected result for float division. (GH 59736)
Bug in to_numeric() raising TypeError when arg is a Timedelta or Timestamp scalar. (GH 59944)
Bug in unique() on Index not always returning Index (GH 57043)
Bug in DataFrame.apply() raising RecursionError when passing func=list[int]. (GH 61565)
Bug in DataFrame.apply() where passing engine="numba" ignored args passed to the applied function (GH 58712)
Bug in DataFrame.eval() and DataFrame.query() which caused an exception when using NumPy attributes via @ notation, e.g., df.eval("@np.floor(a)"). (GH 58041)
Bug in DataFrame.eval() and DataFrame.query() which did not allow the use of the tan function. (GH 55091)
Bug in DataFrame.query() where using duplicate column names led to a TypeError. (GH 59950)
Bug in DataFrame.query() which raised an exception or produced incorrect results when expressions contained backtick-quoted column names containing the hash character #, backticks, or characters that fall outside the ASCII range (U+0001..U+007F). (GH 59285, GH 49633)
Bug in DataFrame.query() which raised an exception when querying integer column names using backticks. (GH 60494)
Bug in DataFrame.rename() and Series.rename() when passed a mapper, index, or columns argument that is a Series with non-unique ser.index producing a corrupted result instead of raising ValueError (GH 58621)
Bug in DataFrame.sample() where, with replace=False and (n * max(weights) / sum(weights)) > 1, the method would return biased results. Now raises ValueError. (GH 61516)
Bug in DataFrame.shift() where passing a freq on a DataFrame with no columns did not shift the index correctly. (GH 60102)
Bug in DataFrame.sort_index() when passing axis="columns" and ignore_index=True and ascending=False not returning RangeIndex columns (GH 57293)
Bug in DataFrame.sort_values() where sorting by a column explicitly named None raised a KeyError instead of sorting by the column as expected. (GH 61512)
Bug in DataFrame.transform() that was returning the wrong order unless the index was monotonically increasing. (GH 57069)
Bug in DataFrame.where() where using a non-bool type array in the function would raise a ValueError instead of a TypeError (GH 56330)
Bug in Index.sort_values() when passing a key function that turns values into tuples, e.g. key=natsort.natsort_key, would raise TypeError (GH 56081)
Bug in MultiIndex.fillna() where the error message was referring to isna instead of fillna (GH 60974)
Bug in Series.describe() where median percentile was always included when the percentiles argument was passed (GH 60550)
Bug in Series.diff() allowing non-integer values for the periods argument. (GH 56607)
Bug in Series.dt methods in ArrowDtype that were returning incorrect values. (GH 57355)
Bug in Series.isin() raising TypeError when series is large (>10**6) and values contains NA (GH 60678)
Bug in Series.kurt() and Series.skew() resulting in zero for low variance arrays (GH 57972)
Bug in Series.map() with a timestamp[pyarrow] dtype or duration[pyarrow] dtype incorrectly returning all-NaN entries (GH 61231)
Bug in Series.mode() where an exception was raised when taking the mode with nullable types with no null values in the series. (GH 58926)
Bug in Series.rank() that doesn’t preserve missing values for nullable integers when na_option='keep'. (GH 56976)
Bug in Series.replace() and DataFrame.replace() throwing ValueError when regex=True and all NA values. (GH 60688)
Bug in Series.to_string() when series contains complex floats with exponents (GH 60405)
Bug in read_csv() where chained fsspec TAR file and compression="infer" fails with tarfile.ReadError (GH 60028)
Bug in the DataFrame Interchange Protocol implementation returning incorrect results for data buffers’ associated dtype, for string and datetime columns (GH 54781)
Bug in Series.list methods not preserving the original Index. (GH 58425)
Bug in Series.list methods not preserving the original name. (GH 60522)
Bug in Series.replace() when the Series was created from an Index and Copy-On-Write is enabled (GH 61622)
Bug in divmod and rdivmod with DataFrame, Series, and Index with bool dtypes failing to raise, which was inconsistent with __floordiv__ behavior (GH 46043)
Bug in printing a DataFrame with a DataFrame stored in DataFrame.attrs raising a ValueError (GH 60455)
Bug in printing a Series with a DataFrame stored in Series.attrs raising a ValueError (GH 60568)
Deprecated the keyword check_datetimelike_compat in testing.assert_frame_equal() and testing.assert_series_equal() (GH 55638)
Fixed bug in Series.replace() and DataFrame.replace() when trying to replace NA values in a Float64Dtype object with np.nan; this now works with pd.set_option("mode.nan_is_na", False) and is irrelevant otherwise (GH 55127)
Fixed bug in Series.replace() and DataFrame.replace() when trying to replace np.nan values in a Int64Dtype object with NA; this is now a no-op with pd.set_option("mode.nan_is_na", False) and is irrelevant otherwise (GH 51237)
Fixed bug in Series.rank() with object dtype and extremely small float values (GH 62036)
Fixed bug where the DataFrame constructor misclassified array-like objects with a .name attribute as Series or Index (GH 61443)
Fixed regression in DataFrame.from_records() not initializing subclasses properly (GH 57008)
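The DataFrame.sample() entry above states the condition under which weighted sampling with replace=False now raises. A minimal sketch of that arithmetic follows, using a hypothetical helper (exceeds_sampling_limit is not a pandas function) and made-up weights. One way to read the condition: n * w / sum(weights) is the share of the n draws that the heaviest row would need to receive, and that cannot exceed 1 when each row can be drawn only once.

```python
def exceeds_sampling_limit(n, weights):
    """Hypothetical helper: True when n * max(weights) / sum(weights) > 1."""
    return n * max(weights) / sum(weights) > 1


weights = [0.5, 0.25, 0.25]  # made-up example weights

# 2 * 0.5 / 1.0 = 1.0, which is not above 1.
print(exceeds_sampling_limit(2, weights))

# 3 * 0.5 / 1.0 = 1.5, which is above 1; per the entry, sample() would raise.
print(exceeds_sampling_limit(3, weights))
```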