What’s new in 2.1.0 (Aug 30, 2023)#
These are the changes in pandas 2.1.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
PyArrow will become a required dependency with pandas 3.0#
PyArrow will become a required dependency of pandas starting with pandas 3.0. This decision was made based on PDEP 10.
This will enable more changes that are hugely beneficial to pandas users, including but not limited to:
- Inferring strings as PyArrow-backed strings by default, enabling a significant reduction of the memory footprint and huge performance improvements.
- Inferring more complex dtypes with PyArrow by default, like Decimal, lists, bytes, structured data and more.
- Better interoperability with other libraries that depend on Apache Arrow.
We are collecting feedback on this decision here.
Avoid NumPy object dtype for strings by default#
Previously, all strings were stored in columns with NumPy object dtype by default.
This release introduces an option future.infer_string that infers all
strings as PyArrow backed strings with dtype "string[pyarrow_numpy]" instead.
This is a new string dtype implementation that follows NumPy semantics in comparison
operations and will return np.nan as the missing value indicator.
Setting the option will also infer the dtype "string" as a StringDtype with
storage set to "pyarrow_numpy", ignoring the value behind the option
mode.string_storage.
This option only works if PyArrow is installed. PyArrow backed strings have a significantly reduced memory footprint and provide a big performance improvement compared to NumPy object (GH 54430).
The option can be enabled with:
pd.options.future.infer_string = True
This behavior will become the default with pandas 3.0.
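As a minimal sketch of the option in use, the snippet below scopes it with pd.option_context so the setting does not leak into other code. It assumes pandas 2.1 or later; the variable names are illustrative only, and the exact name of the inferred dtype depends on the pandas version, so it is printed rather than asserted here.

```python
import importlib.util

import pandas as pd

# The option needs PyArrow in pandas 2.1, so only try it when PyArrow is importable.
has_pyarrow = importlib.util.find_spec("pyarrow") is not None

# Strings created with the library defaults.
default_ser = pd.Series(["a", "b", None])
print(default_ser.dtype)

if has_pyarrow:
    # Enable string inference only inside this block.
    with pd.option_context("future.infer_string", True):
        inferred_ser = pd.Series(["a", "b", None])
    # The dtype is a pandas StringDtype; its printed name varies by version.
    print(inferred_ser.dtype)
```

Using a context manager instead of assigning to pd.options keeps the experiment local, which may be convenient while testing code against the future default.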
DataFrame reductions preserve extension dtypes#
In previous versions of pandas, the results of DataFrame reductions
(DataFrame.sum(), DataFrame.mean(), etc.) had NumPy dtypes, even when the DataFrames
were of extension dtypes. Pandas can now keep the dtypes when doing reductions over DataFrame
columns with a common dtype (GH 52788).
Old Behavior
In [1]: df = pd.DataFrame({"a": [1, 1, 2, 1], "b": [np.nan, 2.0, 3.0, 4.0]}, dtype="Int64")
In [2]: df.sum()
Out[2]:
a    5
b    9
dtype: int64
In [3]: df = df.astype("int64[pyarrow]")
In [4]: df.sum()
Out[4]:
a    5
b    9
dtype: int64
New Behavior
In [1]: df = pd.DataFrame({"a": [1, 1, 2, 1], "b": [np.nan, 2.0, 3.0, 4.0]}, dtype="Int64")
In [2]: df.sum()
Out[2]: 
a    5
b    9
dtype: Int64
In [3]: df = df.astype("int64[pyarrow]")
In [4]: df.sum()
Out[4]: 
a    5
b    9
dtype: int64[pyarrow]
Notice that the dtype is now a masked dtype and PyArrow dtype, respectively, while previously it was a NumPy integer dtype.
To allow DataFrame reductions to preserve extension dtypes, ExtensionArray._reduce() has gained a new keyword parameter keepdims. Calling ExtensionArray._reduce() with keepdims=True should return an array of length 1 along the reduction axis. To maintain backward compatibility, the parameter is not yet required, but it will become required in the future. If the parameter is not found in the signature, DataFrame reductions cannot preserve extension dtypes. In that case a FutureWarning will also be emitted, and type checkers like mypy may complain about the signature not being compatible with ExtensionArray._reduce().
Copy-on-Write improvements#
- Series.transform() not respecting Copy-on-Write when func modifies Series inplace (GH 53747)
- Accessing Index.values will now return a read-only NumPy array (GH 53704)
- Setting a Series into a DataFrame now creates a lazy copy instead of a deep copy (GH 53142)
- The DataFrame constructor, when constructing a DataFrame from a dictionary of Index objects and specifying copy=False, will now use a lazy copy of those Index objects for the columns of the DataFrame (GH 52947)
- A shallow copy of a Series or DataFrame (df.copy(deep=False)) will now also return a shallow copy of the rows/columns Index objects instead of only a shallow copy of the data, i.e. the index of the result is no longer identical (df.copy(deep=False).index is df.index is no longer True) (GH 53721)
- DataFrame.head() and DataFrame.tail() will now return deep copies (GH 54011)
- Add lazy copy mechanism to DataFrame.eval() (GH 53746)
- Trying to operate inplace on a temporary column selection (for example, df["a"].fillna(100, inplace=True)) will now always raise a warning when Copy-on-Write is enabled. In this mode, operating inplace like this will never work, since the selection behaves as a temporary copy. This holds true for:
  - DataFrame.update / Series.update
  - DataFrame.fillna / Series.fillna
  - DataFrame.replace / Series.replace
  - DataFrame.clip / Series.clip
  - DataFrame.where / Series.where
  - DataFrame.mask / Series.mask
  - DataFrame.interpolate / Series.interpolate
  - DataFrame.ffill / Series.ffill
  - DataFrame.bfill / Series.bfill
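The last point can be illustrated with a small sketch. It assumes the mode.copy_on_write option is available (as in pandas 2.x) and silences warnings only to keep the output short; the variable names are illustrative.

```python
import warnings

import pandas as pd

with warnings.catch_warnings():
    # Hide the chained-assignment warning (and any option deprecation notice).
    warnings.simplefilter("ignore")
    with pd.option_context("mode.copy_on_write", True):
        df = pd.DataFrame({"a": [1.0, None, 3.0]})

        # Inplace on a temporary selection: only the temporary copy is filled.
        df["a"].fillna(100, inplace=True)
        missing_after_chained = int(df["a"].isna().sum())

        # Operating on the DataFrame itself updates the column.
        df.fillna({"a": 100}, inplace=True)
        missing_after_direct = int(df["a"].isna().sum())

print(missing_after_chained, missing_after_direct)
```

Under Copy-on-Write the first call leaves df untouched, so the missing value is still present until the DataFrame-level call runs.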
 
New DataFrame.map() method and support for ExtensionArrays#
DataFrame.map() has been added and DataFrame.applymap() has been deprecated. DataFrame.map() has the same functionality as DataFrame.applymap(), but the new name better communicates that this is the DataFrame version of Series.map() (GH 52353).
When given a callable, Series.map() applies the callable to all elements of the Series.
Similarly, DataFrame.map() applies the callable to all elements of the DataFrame,
while Index.map() applies the callable to all elements of the Index.
Frequently, it is not desirable to apply the callable to nan-like values of the array and to avoid doing
that, the map method could be called with na_action="ignore", i.e. ser.map(func, na_action="ignore").
However, na_action="ignore" was not implemented for many ExtensionArray and Index types
and na_action="ignore" did not work correctly for any ExtensionArray subclass except the nullable numeric ones (i.e. with dtype Int64 etc.).
na_action="ignore" now works for all array types (GH 52219, GH 51645, GH 51809, GH 51936, GH 52033, GH 52096).
Previous behavior:
In [1]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [2]: ser.map(str.upper, na_action="ignore")
NotImplementedError
In [3]: df = pd.DataFrame(ser)
In [4]: df.applymap(str.upper, na_action="ignore")  # worked for DataFrame
     0
0    A
1    B
2  NaN
In [5]: idx = pd.Index(ser)
In [6]: idx.map(str.upper, na_action="ignore")
TypeError: CategoricalIndex.map() got an unexpected keyword argument 'na_action'
New behavior:
In [5]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [6]: ser.map(str.upper, na_action="ignore")
Out[6]: 
0      A
1      B
2    NaN
dtype: category
Categories (2, object): ['A', 'B']
In [7]: df = pd.DataFrame(ser)
In [8]: df.map(str.upper, na_action="ignore")
Out[8]: 
     0
0    A
1    B
2  NaN
In [9]: idx = pd.Index(ser)
In [10]: idx.map(str.upper, na_action="ignore")
Out[10]: CategoricalIndex(['A', 'B', nan], categories=['A', 'B'], ordered=False, dtype='category')
Also, note that Categorical.map() implicitly has had its na_action set to "ignore" by default.
This has been deprecated and the default for Categorical.map() will change
to na_action=None, consistent with all the other array types.
New implementation of DataFrame.stack()#
pandas has reimplemented DataFrame.stack(). To use the new implementation, pass the argument future_stack=True. This will become the only option in pandas 3.0.
The previous implementation had two main behavioral downsides.
- The previous implementation would unnecessarily introduce NA values into the result. The user could have NA values automatically removed by passing dropna=True (the default), but doing this could also remove NA values from the result that existed in the input. See the examples below.
- The previous implementation with sort=True (the default) would sometimes sort part of the resulting index, and sometimes not. If the input’s columns are not a MultiIndex, then the resulting index would never be sorted. If the columns are a MultiIndex, then in most cases the level(s) in the resulting index that come from stacking the column level(s) would be sorted. In rare cases such level(s) would be sorted in a non-standard order, depending on how the columns were created.
The new implementation (future_stack=True) will no longer unnecessarily introduce NA values when stacking multiple levels and will never sort. As such, the arguments dropna and sort are not utilized and must remain unspecified when using future_stack=True. These arguments will be removed in the next major release.
In [11]: columns = pd.MultiIndex.from_tuples([("B", "d"), ("A", "c")])
In [12]: df = pd.DataFrame([[0, 2], [1, 3]], index=["z", "y"], columns=columns)
In [13]: df
Out[13]: 
   B  A
   d  c
z  0  2
y  1  3
In the previous version (future_stack=False), the default of dropna=True would remove unnecessarily introduced NA values but still coerce the dtype to float64 in the process. In the new version, no NAs are introduced and so there is no coercion of the dtype.
In [14]: df.stack([0, 1], future_stack=False, dropna=True)
Out[14]: 
z  A  c    2.0
   B  d    0.0
y  A  c    3.0
   B  d    1.0
dtype: float64
In [15]: df.stack([0, 1], future_stack=True)
Out[15]: 
z  B  d    0
   A  c    2
y  B  d    1
   A  c    3
dtype: int64
If the input contains NA values, the previous version would drop those as well with dropna=True or introduce new NA values with dropna=False. The new version persists all values from the input.
In [16]: df = pd.DataFrame([[0, 2], [np.nan, np.nan]], columns=columns)
In [17]: df
Out[17]: 
     B    A
     d    c
0  0.0  2.0
1  NaN  NaN
In [18]: df.stack([0, 1], future_stack=False, dropna=True)
Out[18]: 
0  A  c    2.0
   B  d    0.0
dtype: float64
In [19]: df.stack([0, 1], future_stack=False, dropna=False)
Out[19]: 
0  A  d    NaN
      c    2.0
   B  d    0.0
      c    NaN
1  A  d    NaN
      c    NaN
   B  d    NaN
      c    NaN
dtype: float64
In [20]: df.stack([0, 1], future_stack=True)
Out[20]: 
0  B  d    0.0
   A  c    2.0
1  B  d    NaN
   A  c    NaN
dtype: float64
Other enhancements#
- Series.ffill() and Series.bfill() are now supported for objects with IntervalDtype (GH 54247)
- Added filters parameter to read_parquet() to filter out data, compatible with both engines (GH 53212)
- Categorical.map() and CategoricalIndex.map() now have a na_action parameter. Categorical.map() implicitly had a default value of "ignore" for na_action. This has formally been deprecated and will be changed to None in the future. Also notice that Series.map() has default na_action=None and calls to series with categorical data will now use na_action=None unless explicitly set otherwise (GH 44279)
- api.extensions.ExtensionArray now has a map() method (GH 51809)
- DataFrame.applymap() now uses the map() method of underlying api.extensions.ExtensionArray instances (GH 52219)
- MultiIndex.sort_values() now supports na_position (GH 51612)
- MultiIndex.sortlevel() and Index.sortlevel() gained a new keyword na_position (GH 51612)
- arrays.DatetimeArray.map(), arrays.TimedeltaArray.map() and arrays.PeriodArray.map() can now take a na_action argument (GH 51644)
- arrays.SparseArray.map() now supports na_action (GH 52096)
- pandas.read_html() now supports the storage_options keyword when used with a URL, allowing users to add headers to the outbound HTTP request (GH 49944)
- Add Index.diff() and Index.round() (GH 19708)
- Add "latex-math" as an option to the escape argument of Styler which will not escape all characters between "\(" and "\)" during formatting (GH 51903)
- Add dtype of categories to repr information of CategoricalDtype (GH 52179)
- Added engine_kwargs parameter to read_excel() (GH 52214)
- Classes that are useful for type-hinting have been added to the public API in the new submodule pandas.api.typing (GH 48577)
- Implemented Series.dt.is_month_start, Series.dt.is_month_end, Series.dt.is_year_start, Series.dt.is_year_end, Series.dt.is_quarter_start, Series.dt.is_quarter_end, Series.dt.days_in_month, Series.dt.unit, Series.dt.normalize, Series.dt.day_name(), Series.dt.month_name(), Series.dt.tz_convert() for ArrowDtype with pyarrow.timestamp (GH 52388, GH 51718)
- DataFrameGroupBy.agg() and DataFrameGroupBy.transform() now support grouping by multiple keys when the index is not a MultiIndex for engine="numba" (GH 53486)
- SeriesGroupBy.agg() and DataFrameGroupBy.agg() now support passing in multiple functions for engine="numba" (GH 53486)
- SeriesGroupBy.transform() and DataFrameGroupBy.transform() now support passing in a string as the function for engine="numba" (GH 53579)
- DataFrame.stack() gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (GH 15105)
- DataFrame.unstack() gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (GH 15105)
- Series.explode() now supports PyArrow-backed list types (GH 53602)
- Series.str.join() now supports ArrowDtype(pa.string()) (GH 53646)
- Add validate parameter to Categorical.from_codes() (GH 50975)
- Added ExtensionArray.interpolate() used by Series.interpolate() and DataFrame.interpolate() (GH 53659)
- Added engine_kwargs parameter to DataFrame.to_excel() (GH 53220)
- Implemented api.interchange.from_dataframe() for DatetimeTZDtype (GH 54239)
- Implemented __from_arrow__ on DatetimeTZDtype (GH 52201)
- Implemented __pandas_priority__ to allow custom types to take precedence over DataFrame, Series, Index, or ExtensionArray for arithmetic operations, see the developer guide (GH 48347)
- Improve error message when having incompatible columns using DataFrame.merge() (GH 51861)
- Improve error message when setting DataFrame with wrong number of columns through DataFrame.isetitem() (GH 51701)
- Improved error handling when using DataFrame.to_json() with incompatible index and orient arguments (GH 52143)
- Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns (GH 52084)
- Improved error message when providing an invalid index or offset argument to VariableOffsetWindowIndexer (GH 54379)
- Let DataFrame.to_feather() accept a non-default Index and non-string column names (GH 51787)
- Added a new parameter by_row to Series.apply() and DataFrame.apply(). When set to False the supplied callables will always operate on the whole Series or DataFrame (GH 53400, GH 53601)
- DataFrame.shift() and Series.shift() now allow shifting by multiple periods by supplying a list of periods (GH 44424)
- Groupby aggregations with numba (such as DataFrameGroupBy.sum()) now can preserve the dtype of the input instead of casting to float64 (GH 44952)
- Improved error message when DataFrameGroupBy.agg() failed (GH 52930)
- Many read/to_* functions, such as DataFrame.to_pickle() and read_csv(), support forwarding compression arguments to lzma.LZMAFile (GH 52979)
- Reductions Series.argmax(), Series.argmin(), Series.idxmax(), Series.idxmin(), Index.argmax(), Index.argmin(), DataFrame.idxmax(), DataFrame.idxmin() are now supported for object-dtype (GH 4279, GH 18021, GH 40685, GH 43697)
- DataFrame.to_parquet() and read_parquet() will now write and read attrs respectively (GH 54346)
- Index.all() and Index.any() with floating dtypes and timedelta64 dtypes no longer raise TypeError, matching the Series.all() and Series.any() behavior (GH 54566)
- Series.cummax(), Series.cummin() and Series.cumprod() are now supported for pyarrow dtypes with pyarrow version 13.0 and above (GH 52085)
- Added support for the DataFrame Consortium Standard (GH 54383)
- Performance improvement in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() (GH 51722)
- PyArrow-backed integer dtypes now support bitwise operations (GH 54495)
Backwards incompatible API changes#
Increased minimum version for Python#
pandas 2.1.0 supports Python 3.9 and higher.
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| numpy | 1.22.4 | X | X |
| mypy (dev) | 1.4.1 | | X |
| beautifulsoup4 | 4.11.1 | | X |
| bottleneck | 1.3.4 | | X |
| dataframe-api-compat | 0.1.7 | | X |
| fastparquet | 0.8.1 | | X |
| fsspec | 2022.05.0 | | X |
| hypothesis | 6.46.1 | | X |
| gcsfs | 2022.05.0 | | X |
| jinja2 | 3.1.2 | | X |
| lxml | 4.8.0 | | X |
| numba | 0.55.2 | | X |
| numexpr | 2.8.0 | | X |
| openpyxl | 3.0.10 | | X |
| pandas-gbq | 0.17.5 | | X |
| psycopg2 | 2.9.3 | | X |
| pyreadstat | 1.1.5 | | X |
| pyqt5 | 5.15.6 | | X |
| pytables | 3.7.0 | | X |
| pytest | 7.3.2 | | X |
| python-snappy | 0.6.1 | | X |
| pyxlsb | 1.0.9 | | X |
| s3fs | 2022.05.0 | | X |
| scipy | 1.8.1 | | X |
| sqlalchemy | 1.4.36 | | X |
| tabulate | 0.8.10 | | X |
| xarray | 2022.03.0 | | X |
| xlsxwriter | 3.0.3 | | X |
| zstandard | 0.17.0 | | X |
For optional libraries the general recommendation is to use the latest version.
See Dependencies and Optional dependencies for more.
Other API changes#
- arrays.PandasArray has been renamed NumpyExtensionArray and the attached dtype name changed from PandasDtype to NumpyEADtype; importing PandasArray still works until the next major version (GH 53694)
Deprecations#
Deprecated silent upcasting in setitem-like Series operations#
PDEP-6: https://pandas.pydata.org/pdeps/0006-ban-upcasting.html
Setitem-like operations on Series (or DataFrame columns) which silently upcast the dtype are deprecated and show a warning. Examples of affected operations are:
- ser.fillna('foo', inplace=True)
- ser.where(ser.isna(), 'foo', inplace=True)
- ser.iloc[indexer] = 'foo'
- ser.loc[indexer] = 'foo'
- df.iloc[indexer, 0] = 'foo'
- df.loc[indexer, 'a'] = 'foo'
- ser[indexer] = 'foo'
where ser is a Series, df is a DataFrame, and indexer
could be a slice, a mask, a single value, a list or array of values, or any other
allowed indexer.
In a future version, these will raise an error and you should cast to a common dtype first.
Previous behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64
In [3]: ser[0] = 'not an int64'
In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object
New behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64
In [3]: ser[0] = 'not an int64'
FutureWarning:
  Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas.
  Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object
To retain the current behaviour, in the case above you could cast ser to object dtype first:
In [21]: ser = pd.Series([1, 2, 3])
In [22]: ser = ser.astype('object')
In [23]: ser[0] = 'not an int64'
In [24]: ser
Out[24]: 
0    not an int64
1               2
2               3
dtype: object
Depending on the use-case, it might be more appropriate to cast to a different dtype.
In the following, for example, we cast to float64:
In [25]: ser = pd.Series([1, 2, 3])
In [26]: ser = ser.astype('float64')
In [27]: ser[0] = 1.1
In [28]: ser
Out[28]: 
0    1.1
1    2.0
2    3.0
dtype: float64
For further reading, please see https://pandas.pydata.org/pdeps/0006-ban-upcasting.html.
Deprecated parsing datetimes with mixed time zones#
Parsing datetimes with mixed time zones is deprecated and shows a warning unless the user passes utc=True to to_datetime() (GH 50887).
Previous behavior:
In [7]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
In [8]: pd.to_datetime(data, utc=False)
Out[8]:
Index([2020-01-01 00:00:00+06:00, 2020-01-01 00:00:00+01:00], dtype='object')
New behavior:
In [9]: pd.to_datetime(data, utc=False)
FutureWarning:
  In a future version of pandas, parsing datetimes with mixed time zones will raise
  an error unless `utc=True`. Please specify `utc=True` to opt in to the new behaviour
  and silence this warning. To create a `Series` with mixed offsets and `object` dtype,
  please use `apply` and `datetime.datetime.strptime`.
Index([2020-01-01 00:00:00+06:00, 2020-01-01 00:00:00+01:00], dtype='object')
In order to silence this warning and avoid an error in a future version of pandas,
please specify utc=True:
In [29]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
In [30]: pd.to_datetime(data, utc=True)
Out[30]: DatetimeIndex(['2019-12-31 18:00:00+00:00', '2019-12-31 23:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
To create a Series with mixed offsets and object dtype, please use apply
and datetime.datetime.strptime:
In [31]: import datetime as dt
In [32]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
In [33]: pd.Series(data).apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S%z'))
Out[33]: 
0    2020-01-01 00:00:00+06:00
1    2020-01-01 00:00:00+01:00
dtype: object
Other Deprecations#
- Deprecated DataFrameGroupBy.dtypes, check dtypes on the underlying object instead (GH 51045)
- Deprecated DataFrame._data and Series._data, use public APIs instead (GH 33333)
- Deprecated concat() behavior when any of the objects being concatenated have length 0; in the past the dtypes of empty objects were ignored when determining the resulting dtype, in a future version they will not (GH 39122)
- Deprecated Categorical.to_list(), use obj.tolist() instead (GH 51254)
- Deprecated DataFrameGroupBy.all() and DataFrameGroupBy.any() with datetime64 or PeriodDtype values, matching the Series and DataFrame deprecations (GH 34479)
- Deprecated axis=1 in DataFrame.ewm(), DataFrame.rolling(), DataFrame.expanding(), transpose before calling the method instead (GH 51778)
- Deprecated axis=1 in DataFrame.groupby() and in Grouper constructor, do frame.T.groupby(...) instead (GH 51203)
- Deprecated broadcast_axis keyword in Series.align() and DataFrame.align(), upcast before calling align with left = DataFrame({col: left for col in right.columns}, index=right.index) (GH 51856)
- Deprecated downcast keyword in Index.fillna() (GH 53956)
- Deprecated fill_method and limit keywords in DataFrame.pct_change(), Series.pct_change(), DataFrameGroupBy.pct_change(), and SeriesGroupBy.pct_change(), explicitly call e.g. DataFrame.ffill() or DataFrame.bfill() before calling pct_change instead (GH 53491)
- Deprecated method, limit, and fill_axis keywords in DataFrame.align() and Series.align(), explicitly call DataFrame.fillna() or Series.fillna() on the alignment results instead (GH 51856)
- Deprecated quantile keyword in Rolling.quantile() and Expanding.quantile(), renamed to q instead (GH 52550)
- Deprecated accepting slices in DataFrame.take(), call obj[slicer] or pass a sequence of integers instead (GH 51539)
- Deprecated behavior of DataFrame.idxmax(), DataFrame.idxmin(), Series.idxmax(), Series.idxmin() with all-NA entries or any-NA and skipna=False; in a future version these will raise ValueError (GH 51276)
- Deprecated explicit support for subclassing Index (GH 45289)
- Deprecated making functions given to Series.agg() attempt to operate on each element in the Series and only operate on the whole Series if the elementwise operations failed. In the future, functions given to Series.agg() will always operate on the whole Series only. To keep the current behavior, use Series.transform() instead (GH 53325)
- Deprecated making the functions in a list of functions given to DataFrame.agg() attempt to operate on each element in the DataFrame and only operate on the columns of the DataFrame if the elementwise operations failed. To keep the current behavior, use DataFrame.transform() instead (GH 53325)
- Deprecated passing a DataFrame to DataFrame.from_records(), use DataFrame.set_index() or DataFrame.drop() instead (GH 51353)
- Deprecated silently dropping unrecognized timezones when parsing strings to datetimes (GH 18702)
- Deprecated the axis keyword in DataFrame.ewm(), Series.ewm(), DataFrame.rolling(), Series.rolling(), DataFrame.expanding(), Series.expanding() (GH 51778)
- Deprecated the axis keyword in DataFrame.resample(), Series.resample() (GH 51778)
- Deprecated the downcast keyword in Series.interpolate(), DataFrame.interpolate(), Series.fillna(), DataFrame.fillna(), Series.ffill(), DataFrame.ffill(), Series.bfill(), DataFrame.bfill() (GH 40988)
- Deprecated the behavior of concat() when len(keys) != len(objs), in a future version this will raise instead of truncating to the shorter of the two sequences (GH 43485)
- Deprecated the behavior of Series.argsort() in the presence of NA values; in a future version these will be sorted at the end instead of giving -1 (GH 54219)
- Deprecated the default of observed=False in DataFrame.groupby() and Series.groupby(); this will default to True in a future version (GH 43999)
- Deprecated pinning group.name to each group in SeriesGroupBy.aggregate() aggregations; if your operation requires utilizing the groupby keys, iterate over the groupby object instead (GH 41090)
- Deprecated the axis keyword in DataFrameGroupBy.idxmax(), DataFrameGroupBy.idxmin(), DataFrameGroupBy.fillna(), DataFrameGroupBy.take(), DataFrameGroupBy.skew(), DataFrameGroupBy.rank(), DataFrameGroupBy.cumprod(), DataFrameGroupBy.cumsum(), DataFrameGroupBy.cummax(), DataFrameGroupBy.cummin(), DataFrameGroupBy.pct_change(), DataFrameGroupBy.diff(), DataFrameGroupBy.shift(), and DataFrameGroupBy.corrwith(); for axis=1 operate on the underlying DataFrame instead (GH 50405, GH 51046)
- Deprecated DataFrameGroupBy with as_index=False not including groupings in the result when they are not columns of the DataFrame (GH 49519)
- Deprecated is_categorical_dtype(), use isinstance(obj.dtype, pd.CategoricalDtype) instead (GH 52527)
- Deprecated is_datetime64tz_dtype(), check isinstance(dtype, pd.DatetimeTZDtype) instead (GH 52607)
- Deprecated is_int64_dtype(), check dtype == np.dtype(np.int64) instead (GH 52564)
- Deprecated is_interval_dtype(), check isinstance(dtype, pd.IntervalDtype) instead (GH 52607)
- Deprecated is_period_dtype(), check isinstance(dtype, pd.PeriodDtype) instead (GH 52642)
- Deprecated is_sparse(), check isinstance(dtype, pd.SparseDtype) instead (GH 52642)
- Deprecated Styler.applymap_index(). Use the new Styler.map_index() method instead (GH 52708)
- Deprecated Styler.applymap(). Use the new Styler.map() method instead (GH 52708)
- Deprecated DataFrame.applymap(). Use the new DataFrame.map() method instead (GH 52353)
- Deprecated DataFrame.swapaxes() and Series.swapaxes(), use DataFrame.transpose() or Series.transpose() instead (GH 51946)
- Deprecated freq parameter in PeriodArray constructor, pass dtype instead (GH 52462)
- Deprecated allowing non-standard inputs in take(), pass either a numpy.ndarray, ExtensionArray, Index, or Series (GH 52981)
- Deprecated allowing non-standard sequences for isin(), value_counts(), unique(), factorize(), cast to one of numpy.ndarray, Index, ExtensionArray, or Series before calling (GH 52986)
- Deprecated behavior of DataFrame reductions sum, prod, std, var, sem with axis=None, in a future version this will operate over both axes returning a scalar instead of behaving like axis=0; note this also affects numpy functions e.g. np.sum(df) (GH 21597)
- Deprecated behavior of concat() when DataFrame has columns that are all-NA, in a future version these will not be discarded when determining the resulting dtype (GH 40893)
- Deprecated behavior of Series.dt.to_pydatetime(), in a future version this will return a Series containing python datetime objects instead of an ndarray of datetimes; this matches the behavior of other Series.dt properties (GH 20306)
- Deprecated logical operations (|, &, ^) between pandas objects and dtype-less sequences (e.g. list, tuple), wrap a sequence in a Series or NumPy array before operating instead (GH 51521)
- Deprecated parameter convert_type in Series.apply() (GH 52140)
- Deprecated passing a dictionary to SeriesGroupBy.agg(); pass a list of aggregations instead (GH 50684)
- Deprecated the fastpath keyword in Categorical constructor, use Categorical.from_codes() instead (GH 20110)
- Deprecated the behavior of is_bool_dtype() returning True for object-dtype Index of bool objects (GH 52680)
- Deprecated the methods Series.bool() and DataFrame.bool() (GH 51749)
- Deprecated unused closed and normalize keywords in the DatetimeIndex constructor (GH 52628)
- Deprecated unused closed keyword in the TimedeltaIndex constructor (GH 52628)
- Deprecated logical operation between two non-boolean Series with different indexes always coercing the result to bool dtype. In a future version, this will maintain the return type of the inputs (GH 52500, GH 52538)
- Deprecated Period and PeriodDtype with BDay freq, use a DatetimeIndex with BDay freq instead (GH 53446)
- Deprecated - value_counts(), use- pd.Series(obj).value_counts()instead (GH 47862)
- Deprecated - Series.first()and- DataFrame.first(); create a mask and filter using- .locinstead (GH 45908)
- Deprecated - Series.interpolate()and- DataFrame.interpolate()for object-dtype (GH 53631)
- Deprecated - Series.last()and- DataFrame.last(); create a mask and filter using- .locinstead (GH 53692)
- Deprecated allowing arbitrary - fill_valuein- SparseDtype, in a future version the- fill_valuewill need to be compatible with the- dtype.subtype, either a scalar that can be held by that subtype or- NaNfor integer or bool subtypes (GH 23124)
- Deprecated allowing bool dtype in - DataFrameGroupBy.quantile()and- SeriesGroupBy.quantile(), consistent with the- Series.quantile()and- DataFrame.quantile()behavior (GH 51424)
- Deprecated behavior of - testing.assert_series_equal()and- testing.assert_frame_equal()considering NA-like values (e.g.- NaNvs- Noneas equivalent) (GH 52081)
- Deprecated bytes input to read_excel(). To read a file path, use a string or path-like object (GH 53767)
- Deprecated constructing SparseArray from scalar data, pass a sequence instead (GH 53039)
- Deprecated falling back to filling when value is not specified in DataFrame.replace() and Series.replace() with non-dict-like to_replace (GH 33302)
- Deprecated literal json input to read_json(). Wrap literal json string input in io.StringIO instead (GH 53409)
- Deprecated literal string input to read_xml(). Wrap literal string/bytes input in io.StringIO / io.BytesIO instead (GH 53767)
- Deprecated literal string/bytes input to read_html(). Wrap literal string/bytes input in io.StringIO / io.BytesIO instead (GH 53767)
- Deprecated option mode.use_inf_as_na, convert inf entries to NaN before instead (GH 51684)
- Deprecated parameter obj in DataFrameGroupBy.get_group() (GH 53545)
- Deprecated positional indexing on Series with Series.__getitem__() and Series.__setitem__(), in a future version ser[item] will always interpret item as a label, not a position (GH 50617)
- Deprecated replacing builtin and NumPy functions in .agg, .apply, and .transform; use the corresponding string alias (e.g. "sum" for sum or np.sum) instead (GH 53425)
- Deprecated strings T, t, L and l denoting units in to_timedelta() (GH 52536)
- Deprecated the method and limit keywords in ExtensionArray.fillna, implement _pad_or_backfill instead (GH 53621)
- Deprecated the method and limit keywords in DataFrame.replace() and Series.replace() (GH 33302)
- Deprecated the method and limit keywords on Series.fillna(), DataFrame.fillna(), SeriesGroupBy.fillna(), DataFrameGroupBy.fillna(), and Resampler.fillna(), use obj.bfill() or obj.ffill() instead (GH 53394)
- Deprecated the behavior of Series.__getitem__(), Series.__setitem__(), DataFrame.__getitem__(), DataFrame.__setitem__() with an integer slice on objects with a floating-dtype index, in a future version this will be treated as positional indexing (GH 49612)
- Deprecated the use of non-supported datetime64 and timedelta64 resolutions with pandas.array(). Supported resolutions are "s", "ms", "us" and "ns" (GH 53058)
- Deprecated values "pad", "ffill", "bfill", "backfill" for Series.interpolate() and DataFrame.interpolate(), use obj.ffill() or obj.bfill() instead (GH 53581)
- Deprecated the behavior of Index.argmax(), Index.argmin(), Series.argmax(), Series.argmin() with either all-NAs and skipna=True or any-NAs and skipna=False returning -1; in a future version this will raise ValueError (GH 33941, GH 33942)
- Deprecated allowing non-keyword arguments in DataFrame.to_sql() except name and con (GH 54229)
- Deprecated silently ignoring fill_value when passing both freq and fill_value to DataFrame.shift(), Series.shift() and DataFrameGroupBy.shift(); in a future version this will raise ValueError (GH 53832)
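Several of the deprecations above have a direct replacement spelled out in the note itself. The following is a minimal sketch of those replacements, not an exhaustive migration guide; the sample data and variable names are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Literal JSON is wrapped in io.StringIO instead of being passed directly.
df = pd.read_json(io.StringIO('[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'))

ser = pd.Series([1.0, np.nan, 3.0], index=["x", "y", "z"])

# ser.fillna(method="ffill") becomes ser.ffill().
filled = ser.ffill()

# Positional access goes through .iloc rather than ser[0].
first = ser.iloc[0]

# A string alias replaces a builtin or NumPy callable in agg/apply/transform.
totals = df.agg("sum")

# The unit is spelled out instead of using the deprecated "T" alias.
five_minutes = pd.to_timedelta(5, unit="min")

# Instead of mode.use_inf_as_na, inf entries are converted to NaN explicitly.
cleaned = pd.Series([1.0, np.inf, -np.inf]).replace([np.inf, -np.inf], np.nan)
```

Each replacement is available in earlier pandas versions as well, so code can usually be migrated before upgrading.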
Performance improvements#
- Performance improvement in concat() with homogeneous np.float64 or np.float32 dtypes (GH 52685)
- Performance improvement in factorize() for object columns not containing strings (GH 51921)
- Performance improvement in read_orc() when reading a remote URI file path (GH 51609)
- Performance improvement in read_parquet() and DataFrame.to_parquet() when reading a remote file with engine="pyarrow" (GH 51609)
- Performance improvement in read_parquet() on string columns when using use_nullable_dtypes=True (GH 47345)
- Performance improvement in DataFrame.clip() and Series.clip() (GH 51472)
- Performance improvement in DataFrame.filter() when items is given (GH 52941)
- Performance improvement in DataFrame.first_valid_index() and DataFrame.last_valid_index() for extension array dtypes (GH 51549)
- Performance improvement in DataFrame.where() when cond is backed by an extension dtype (GH 51574)
- Performance improvement in MultiIndex.set_levels() and MultiIndex.set_codes() when verify_integrity=True (GH 51873)
- Performance improvement in MultiIndex.sortlevel() when ascending is a list (GH 51612)
- Performance improvement in Series.combine_first() (GH 51777)
- Performance improvement in fillna() when array does not contain nulls (GH 51635)
- Performance improvement in isna() when array has zero nulls or is all nulls (GH 51630)
- Performance improvement when parsing strings to boolean[pyarrow] dtype (GH 51730)
- Performance improvement when searching an Index sliced from other indexes (GH 51738)
- Period’s default formatter (period_format) is now significantly (~twice) faster. This improves performance of str(Period), repr(Period), and Period.strftime(fmt=None), as well as PeriodArray.strftime(fmt=None), PeriodIndex.strftime(fmt=None) and PeriodIndex.format(fmt=None). to_csv operations involving PeriodArray or PeriodIndex with default date_format are also significantly accelerated (GH 51459)
- Performance improvement accessing arrays.IntegerArray.dtype and arrays.FloatingArray.dtype (GH 52998)
- Performance improvement for DataFrameGroupBy / SeriesGroupBy aggregations (e.g. DataFrameGroupBy.sum()) with engine="numba" (GH 53731)
- Performance improvement in DataFrame reductions with axis=1 and extension dtypes (GH 54341)
- Performance improvement in DataFrame reductions with axis=None and extension dtypes (GH 54308)
- Performance improvement in MultiIndex and multi-column operations (e.g. DataFrame.sort_values(), DataFrame.groupby(), Series.unstack()) when index/column values are already sorted (GH 53806)
- Performance improvement in concat() when axis=1 and objects have different indexes (GH 52541)
- Performance improvement in concat() when the concatenation axis is a MultiIndex (GH 53574)
- Performance improvement in merge() for PyArrow backed strings (GH 54443)
- Performance improvement in read_csv() with engine="c" (GH 52632)
- Performance improvement in ArrowExtensionArray.to_numpy() (GH 52525)
- Performance improvement in DataFrameGroupBy.groups() (GH 53088)
- Performance improvement in DataFrame.astype() when dtype is an extension dtype (GH 54299)
- Performance improvement in DataFrame.iloc() when input is a single integer and dataframe is backed by extension dtypes (GH 54508)
- Performance improvement in DataFrame.isin() for extension dtypes (GH 53514)
- Performance improvement in DataFrame.loc() when selecting rows and columns (GH 53014)
- Performance improvement in DataFrame.transpose() when transposing a DataFrame with a single PyArrow dtype (GH 54224)
- Performance improvement in DataFrame.transpose() when transposing a DataFrame with a single masked dtype, e.g. Int64 (GH 52836)
- Performance improvement in Series.add() for PyArrow string and binary dtypes (GH 53150)
- Performance improvement in Series.corr() and Series.cov() for extension dtypes (GH 52502)
- Performance improvement in Series.drop_duplicates() for ArrowDtype (GH 54667)
- Performance improvement in Series.ffill(), Series.bfill(), DataFrame.ffill(), DataFrame.bfill() with PyArrow dtypes (GH 53950)
- Performance improvement in Series.str.get_dummies() for PyArrow-backed strings (GH 53655)
- Performance improvement in Series.str.get() for PyArrow-backed strings (GH 53152)
- Performance improvement in Series.str.split() with expand=True for PyArrow-backed strings (GH 53585)
- Performance improvement in Series.to_numpy() when dtype is a NumPy float dtype and na_value is np.nan (GH 52430)
- Performance improvement in astype() when converting from a PyArrow timestamp or duration dtype to NumPy (GH 53326)
- Performance improvement in various MultiIndex set and indexing operations (GH 53955)
- Performance improvement when doing various reshaping operations on arrays.IntegerArray and arrays.FloatingArray by avoiding unnecessary validation (GH 53013)
- Performance improvement when indexing with PyArrow timestamp and duration dtypes (GH 53368)
- Performance improvement when passing an array to RangeIndex.take(), DataFrame.loc(), or DataFrame.iloc() and the DataFrame is using a RangeIndex (GH 53387)
Bug fixes#
Categorical#
- Bug in CategoricalIndex.remove_categories() where ordered categories would not be maintained (GH 53935)
- Bug in Series.astype() with dtype="category" for nullable arrays with read-only null value masks (GH 53658)
- Bug in Series.map(), where the value of the na_action parameter was not used if the series held a Categorical (GH 22527)
Datetimelike#
- DatetimeIndex.map() with na_action="ignore" now works as expected (GH 51644)
- DatetimeIndex.slice_indexer() now raises KeyError for non-monotonic indexes if either of the slice bounds is not in the index; this behaviour was previously deprecated but inconsistently handled (GH 53983)
- Bug in DateOffset which had inconsistent behavior when multiplying a DateOffset object by a constant (GH 47953)
- Bug in date_range() when freq was a DateOffset with nanoseconds (GH 46877)
- Bug in to_datetime() converting Series or DataFrame containing arrays.ArrowExtensionArray of PyArrow timestamps to NumPy datetimes (GH 52545)
- Bug in DatetimeArray.map() and DatetimeIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)
- Bug in DataFrame.to_sql() raising ValueError for PyArrow-backed date like dtypes (GH 53854)
- Bug in Timestamp.date(), Timestamp.isocalendar(), Timestamp.timetuple(), and Timestamp.toordinal() returning incorrect results for inputs outside those supported by the Python standard library’s datetime module (GH 53668)
- Bug in Timestamp.round() with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsDatetime (GH 51494)
- Bug in constructing a Series or DataFrame from a datetime or timedelta scalar always inferring nanosecond resolution instead of inferring from the input (GH 52212)
- Bug in constructing a Timestamp from a string representing a time without a date inferring an incorrect unit (GH 54097)
- Bug in constructing a Timestamp with ts_input=pd.NA raising TypeError (GH 45481)
- Bug in parsing datetime strings with weekday but no day e.g. “2023 Sept Thu” incorrectly raising AttributeError instead of ValueError (GH 52659)
- Bug in the repr for Series when dtype is a timezone aware datetime with non-nanosecond resolution raising OutOfBoundsDatetime (GH 54623)
Timedelta#
- Bug in TimedeltaIndex division or multiplication leading to .freq of “0 Days” instead of None (GH 51575)
- Bug in Timedelta with NumPy timedelta64 objects not properly raising ValueError (GH 52806)
- Bug in to_timedelta() converting Series or DataFrame containing ArrowDtype of pyarrow.duration to NumPy timedelta64 (GH 54298)
- Bug in Timedelta.__hash__(), raising an OutOfBoundsTimedelta on certain large values of second resolution (GH 54037)
- Bug in Timedelta.round() with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsTimedelta (GH 51494)
- Bug in TimedeltaIndex.map() with na_action="ignore" (GH 51644)
- Bug in arrays.TimedeltaArray.map() and TimedeltaIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)
Timezones#
- Bug in infer_freq() that raises TypeError for Series of timezone-aware timestamps (GH 52456)
- Bug in DatetimeTZDtype.base() that always returns a NumPy dtype with nanosecond resolution (GH 52705)
Numeric#
- Bug in RangeIndex setting step incorrectly when being the subtrahend with minuend a numeric value (GH 53255)
- Bug in Series.corr() and Series.cov() raising AttributeError for masked dtypes (GH 51422)
- Bug when calling Series.kurt() and Series.skew() on NumPy data of all zero returning a Python type instead of a NumPy type (GH 53482)
- Bug in Series.mean(), DataFrame.mean() with object-dtype values containing strings that can be converted to numbers (e.g. “2”) returning incorrect numeric results; these now raise TypeError (GH 36703, GH 44008)
- Bug in DataFrame.corrwith() raising NotImplementedError for PyArrow-backed dtypes (GH 52314)
- Bug in DataFrame.size() and Series.size() returning 64-bit integer instead of a Python int (GH 52897)
- Bug in DataFrame.dot() returning object dtype for ArrowDtype data (GH 53979)
- Bug in Series.any(), Series.all(), DataFrame.any(), and DataFrame.all() where the default value of bool_only was set to None instead of False; this change should have no impact on users (GH 53258)
- Bug in Series.median() and DataFrame.median() with object-dtype values containing strings that can be converted to numbers (e.g. “2”) returning incorrect numeric results; these now raise TypeError (GH 34671)
- Bug in Series.sum() converting dtype uint64 to int64 (GH 53401)
Conversion#
- Bug in DataFrame.style.to_latex() and DataFrame.style.to_html() if the DataFrame contains integers with more digits than can be represented by floating point double precision (GH 52272)
- Bug in array() when given a datetime64 or timedelta64 dtype with unit of “s”, “us”, or “ms” returning NumpyExtensionArray instead of DatetimeArray or TimedeltaArray (GH 52859)
- Bug in array() when given an empty list and no dtype returning NumpyExtensionArray instead of FloatingArray (GH 54371)
- Bug in ArrowDtype.numpy_dtype() returning nanosecond units for non-nanosecond pyarrow.timestamp and pyarrow.duration types (GH 51800)
- Bug in DataFrame.__repr__() incorrectly raising a TypeError when the dtype of a column is np.record (GH 48526)
- Bug in DataFrame.info() raising ValueError when use_numba is set (GH 51922)
- Bug in DataFrame.insert() raising TypeError if loc is np.int64 (GH 53193)
- Bug in HDFStore.select() losing precision of large ints when stored and retrieved (GH 54186)
- Bug in Series.astype() not supporting object_ (GH 54251)
Strings#
- Bug in Series.str() that did not raise a TypeError when iterated (GH 54173)
- Bug in repr for DataFrame with string-dtype columns (GH 54797)
Interval#
- Bug in IntervalIndex.get_indexer() and IntervalIndex.get_indexer_nonunique() raising if target is a read-only array (GH 53703)
- Bug in IntervalDtype where the object could be kept alive when deleted (GH 54184)
- Bug in interval_range() where a float step would produce incorrect intervals from floating point artifacts (GH 54477)
Indexing#
- Bug in DataFrame.__setitem__() losing dtype when setting a DataFrame into duplicated columns (GH 53143)
- Bug in DataFrame.__setitem__() with a boolean mask and DataFrame.putmask() with mixed non-numeric dtypes and a value other than NaN incorrectly raising TypeError (GH 53291)
- Bug in DataFrame.iloc() when using nan as the only element (GH 52234)
- Bug in Series.loc() casting Series to np.ndarray when assigning Series at predefined index of object dtype Series (GH 48933)
Missing#
- Bug in DataFrame.interpolate() failing to fill across data when method is "pad", "ffill", "bfill", or "backfill" (GH 53898)
- Bug in DataFrame.interpolate() ignoring inplace when DataFrame is empty (GH 53199)
- Bug in Series.idxmin(), Series.idxmax(), DataFrame.idxmin(), DataFrame.idxmax() with a DatetimeIndex index containing NaT incorrectly returning NaN instead of NaT (GH 43587)
- Bug in Series.interpolate() and DataFrame.interpolate() failing to raise on invalid downcast keyword, which can be only None or "infer" (GH 53103)
- Bug in Series.interpolate() and DataFrame.interpolate() with complex dtype incorrectly failing to fill NaN entries (GH 53635)
MultiIndex#
- Bug in MultiIndex.set_levels() not preserving dtypes for Categorical (GH 52125)
- Bug in displaying a MultiIndex with a long element (GH 52960)
I/O#
- DataFrame.to_orc() now raises ValueError when non-default Index is given (GH 51828)
- DataFrame.to_sql() now raises ValueError when the name param is left empty while using SQLAlchemy to connect (GH 52675)
- Bug in json_normalize() where metadata fields of list type could not be parsed (GH 37782)
- Bug in read_csv() where it would error when parse_dates was set to a list or dictionary with engine="pyarrow" (GH 47961)
- Bug in read_csv() with engine="pyarrow" raising when specifying a dtype with index_col (GH 53229)
- Bug in read_hdf() not properly closing store after an IndexError is raised (GH 52781)
- Bug in read_html() where style elements were read into DataFrames (GH 52197)
- Bug in read_html() where tail texts were removed together with elements containing display:none style (GH 51629)
- Bug in read_sql_table() raising an exception when reading a view (GH 52969)
- Bug in read_sql() when reading multiple timezone aware columns with the same column name (GH 44421)
- Bug in read_xml() stripping whitespace in string data (GH 53811)
- Bug in DataFrame.to_html() where colspace was incorrectly applied in case of multi index columns (GH 53885)
- Bug in DataFrame.to_html() where conversion for an empty DataFrame with complex dtype raised a ValueError (GH 54167)
- Bug in DataFrame.to_json() where DatetimeArray / DatetimeIndex with non nanosecond precision could not be serialized correctly (GH 53686)
- Bug when writing and reading empty Stata dta files where dtype information was lost (GH 46240)
- Bug where bz2 was treated as a hard requirement (GH 53857)
Period#
- Bug in PeriodDtype constructor failing to raise TypeError when no argument is passed or when None is passed (GH 27388)
- Bug in PeriodDtype constructor incorrectly returning the same normalize for different DateOffset freq inputs (GH 24121)
- Bug in PeriodDtype constructor raising ValueError instead of TypeError when an invalid type is passed (GH 51790)
- Bug in PeriodDtype where the object could be kept alive when deleted (GH 54184)
- Bug in read_csv() not processing empty strings as a null value, with engine="pyarrow" (GH 52087)
- Bug in read_csv() returning object dtype columns instead of float64 dtype columns with engine="pyarrow" for columns that are all null (GH 52087)
- Bug in Period.now() not accepting the freq parameter as a keyword argument (GH 53369)
- Bug in PeriodIndex.map() with na_action="ignore" (GH 51644)
- Bug in arrays.PeriodArray.map() and PeriodIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)
- Bug in incorrectly allowing construction of Period or PeriodDtype with CustomBusinessDay freq; use BusinessDay instead (GH 52534)
Plotting#
- Bug in Series.plot() when invoked with color=None (GH 51953)
- Fixed UserWarning in DataFrame.plot.scatter() when invoked with c="b" (GH 53908)
Groupby/resample/rolling#
- Bug in DataFrameGroupBy.idxmin(), SeriesGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmax() returning the wrong dtype when used on an empty DataFrameGroupBy or SeriesGroupBy (GH 51423)
- Bug in DataFrame.groupby.rank() on nullable datatypes when passing na_option="bottom" or na_option="top" (GH 54206)
- Bug in DataFrame.resample() and Series.resample() incorrectly allowing non-fixed freq when resampling on a TimedeltaIndex (GH 51896)
- Bug in DataFrame.resample() and Series.resample() losing time zone when resampling empty data (GH 53664)
- Bug in DataFrame.resample() and Series.resample() where origin has no effect in resample when values are outside of axis (GH 53662)
- Bug in weighted rolling aggregations when specifying min_periods=0 (GH 51449)
- Bug in DataFrame.groupby() and Series.groupby() where, when the index of the grouped Series or DataFrame was a DatetimeIndex, TimedeltaIndex or PeriodIndex, and the groupby method was given a function as its first argument, the function operated on the whole index rather than each element of the index (GH 51979)
- Bug in DataFrameGroupBy.agg() with lists not respecting as_index=False (GH 52849)
- Bug in DataFrameGroupBy.apply() causing an error to be raised when the input DataFrame was subset as a DataFrame after groupby ([['a']] and not ['a']) and the given callable returned Series that were not all indexed the same (GH 52444)
- Bug in DataFrameGroupBy.apply() raising a TypeError when selecting multiple columns and providing a function that returns np.ndarray results (GH 18930)
- Bug in DataFrameGroupBy.groups() and SeriesGroupBy.groups() with a datetime key in conjunction with another key producing an incorrect number of group keys (GH 51158)
- Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() implicitly sorting the result index in some cases with sort=False (GH 53009)
- Bug in SeriesGroupBy.size() where the dtype would be np.int64 for data with ArrowDtype or masked dtypes (e.g. Int64) (GH 53831)
- Bug in DataFrame.groupby() with column selection on the resulting groupby object not returning names as tuples when grouping by a list consisting of a single element (GH 53500)
- Bug in DataFrameGroupBy.var() and SeriesGroupBy.var() failing to raise TypeError when called with datetime64, timedelta64 or PeriodDtype values (GH 52128, GH 53045)
- Bug in DataFrameGroupBy.resample() with kind="period" raising AttributeError (GH 24103)
- Bug in Resampler.ohlc() with empty object returning a Series instead of empty DataFrame (GH 42902)
- Bug in SeriesGroupBy.count() and DataFrameGroupBy.count() where the dtype would be np.int64 for data with ArrowDtype or masked dtypes (e.g. Int64) (GH 53831)
- Bug in SeriesGroupBy.nth() and DataFrameGroupBy.nth() after performing column selection when using dropna="any" or dropna="all" would not subset columns (GH 53518)
- Bug in SeriesGroupBy.nth() and DataFrameGroupBy.nth() raising after performing column selection when using dropna="any" or dropna="all" resulted in rows being dropped (GH 53518)
- Bug in SeriesGroupBy.sum() and DataFrameGroupBy.sum() summing np.inf + np.inf and (-np.inf) + (-np.inf) to np.nan instead of np.inf and -np.inf respectively (GH 53606)
- Bug in Series.groupby() raising an error when grouped Series has a DatetimeIndex index and a Series with a name that is a month is given to the by argument (GH 48509)
Reshaping#
- Bug in concat() coercing to object dtype when one column has pa.null() dtype (GH 53702)
- Bug in crosstab() when dropna=False would not keep np.nan in the result (GH 10772)
- Bug in melt() where the variable column would lose extension dtypes (GH 54297)
- Bug in merge_asof() raising KeyError for extension dtypes (GH 52904)
- Bug in merge_asof() raising ValueError for data backed by read-only ndarrays (GH 53513)
- Bug in merge_asof() with left_index=True or right_index=True with mismatched index dtypes giving incorrect results in some cases instead of raising MergeError (GH 53870)
- Bug in merge() when merging on integer ExtensionDtype and float NumPy dtype raising TypeError (GH 46178)
- Bug in DataFrame.agg() and Series.agg() on non-unique columns would return incorrect type when dict-like argument passed in (GH 51099)
- Bug in DataFrame.combine_first() ignoring other’s columns if other is empty (GH 53792)
- Bug in DataFrame.idxmin() and DataFrame.idxmax(), where the axis dtype would be lost for empty frames (GH 53265)
- Bug in DataFrame.merge() not merging correctly when having MultiIndex with single level (GH 52331)
- Bug in DataFrame.stack() losing extension dtypes when columns is a MultiIndex and frame contains mixed dtypes (GH 45740)
- Bug in DataFrame.stack() sorting columns lexicographically (GH 53786)
- Bug in DataFrame.transpose() inferring dtype for object column (GH 51546)
- Bug in Series.combine_first() converting int64 dtype to float64 and losing precision on very large integers (GH 51764)
- Bug when joining empty DataFrame objects, where the joined index would be a RangeIndex instead of the joined index type (GH 52777)
Sparse#
- Bug in SparseDtype constructor failing to raise TypeError when given an incompatible dtype for its subtype, which must be a NumPy dtype (GH 53160)
- Bug in arrays.SparseArray.map() allowed the fill value to be included in the sparse values (GH 52095)
ExtensionArray#
- Bug in ArrowStringArray constructor raises ValueError with dictionary types of strings (GH 54074)
- Bug in DataFrame constructor not copying Series with extension dtype when given in dict (GH 53744)
- Bug in ArrowExtensionArray converting pandas non-nanosecond temporal objects from non-zero values to zero values (GH 53171)
- Bug in Series.quantile() for PyArrow temporal types raising ArrowInvalid (GH 52678)
- Bug in Series.rank() returning wrong order for small values with Float64 dtype (GH 52471)
- Bug in Series.unique() for boolean ArrowDtype with NA values (GH 54667)
- Bug in __iter__() and __getitem__() returning python datetime and timedelta objects for non-nano dtypes (GH 53326)
- Bug in factorize() returning incorrect uniques for a pyarrow.dictionary type pyarrow.chunked_array with more than one chunk (GH 54844)
- Bug when passing an ExtensionArray subclass to dtype keywords. This will now raise a UserWarning to encourage passing an instance instead (GH 31356, GH 54592)
- Bug where the DataFrame repr would not work when a column had an ArrowDtype with a pyarrow.ExtensionDtype (GH 54063)
- Bug where the __from_arrow__ method of masked ExtensionDtypes (e.g. Float64Dtype, BooleanDtype) would not accept PyArrow arrays of type pyarrow.null() (GH 52223)
Styler#
Metadata#
- Fixed metadata propagation in DataFrame.max(), DataFrame.min(), DataFrame.prod(), DataFrame.mean(), Series.mode(), DataFrame.median(), DataFrame.sem(), DataFrame.skew(), DataFrame.kurt() (GH 28283)
- Fixed metadata propagation in DataFrame.squeeze() and DataFrame.describe() (GH 28283)
- Fixed metadata propagation in DataFrame.std() (GH 28283)
Other#
- Bug in FloatingArray.__contains__ with NaN item incorrectly returning False when NaN values are present (GH 52840)
- Bug in DataFrame and Series raising for data of complex dtype when NaN values are present (GH 53627)
- Bug in DatetimeIndex where the repr of an index passed with time did not print the time when the time is midnight and the freq is non-day based (GH 53470)
- testing.assert_frame_equal() and testing.assert_series_equal() now raise an assertion error for two unequal sets (GH 51727)
- Bug in testing.assert_frame_equal() checking category dtypes even when asked not to check index type (GH 52126)
- Bug in api.interchange.from_dataframe() not respecting the allow_copy argument (GH 54322)
- Bug in api.interchange.from_dataframe() raising during interchanging from non-pandas tz-aware data containing null values (GH 54287)
- Bug in api.interchange.from_dataframe() when converting an empty DataFrame object (GH 53155)
- Bug in from_dummies() where the resulting Index did not match the original Index (GH 54300)
- Bug in from_dummies() where the resulting data would always be object dtype instead of the dtype of the columns (GH 54300)
- Bug in DataFrameGroupBy.first(), DataFrameGroupBy.last(), SeriesGroupBy.first(), and SeriesGroupBy.last() where an empty group would return np.nan instead of the corresponding ExtensionArray NA value (GH 39098)
- Bug in DataFrame.pivot_table() with casting the mean of ints back to an int (GH 16676)
- Bug in DataFrame.reindex() with a fill_value that should be inferred with an ExtensionDtype incorrectly inferring object dtype (GH 52586)
- Bug in DataFrame.shift() with axis=1 on a DataFrame with a single ExtensionDtype column giving incorrect results (GH 53832)
- Bug in Index.sort_values() when a key is passed (GH 52764)
- Bug in Series.align(), DataFrame.align(), Series.reindex(), DataFrame.reindex(), Series.interpolate(), DataFrame.interpolate(), incorrectly failing to raise with method="asfreq" (GH 53620)
- Bug in Series.argsort() failing to raise when an invalid axis is passed (GH 54257)
- Bug in Series.map() when giving a callable to an empty series, the returned series had object dtype. It now keeps the original dtype (GH 52384)
- Bug in Series.memory_usage() when deep=True throwing an error with Series of objects and returning an incorrect value, as it did not take into account GC corrections (GH 51858)
- Bug in period_range() where the default behavior was incorrect when freq was not passed as an argument (GH 53687)
- Fixed incorrect __name__ attribute of pandas._libs.json (GH 52898)
Contributors#
A total of 266 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- AG + 
- Aarni Koskela 
- Adrian D’Alessandro + 
- Adrien RUAULT + 
- Ahmad + 
- Aidos Kanapyanov + 
- Alex Malins 
- Alexander Seiler + 
- Ali Asgar + 
- Allison Kwan 
- Amanda Bizzinotto + 
- Andres Algaba + 
- Angela Seo + 
- Anirudh Hegde + 
- Antony Evmorfopoulos + 
- Anushka Bishnoi 
- ArnaudChanoine + 
- Artem Vorobyev + 
- Arya Sarkar + 
- Ashwin Srinath 
- Austin Au-Yeung + 
- Austin Burnett + 
- Bear + 
- Ben Mangold + 
- Bernardo Gameiro + 
- Boyd Kane + 
- Brayan Alexander Muñoz B + 
- Brock 
- Chetan0402 + 
- Chris Carini 
- ChristofKaufmann 
- Clark-W + 
- Conrad Mcgee Stocks 
- Corrie Bartelheimer + 
- Coulton Theuer + 
- D067751 + 
- Daniel Isaac 
- Daniele Nicolodi + 
- David Samuel + 
- David Seifert + 
- Dea Leon + 
- Dea María Léon 
- Deepyaman Datta 
- Denis Sapozhnikov + 
- Dharani Akurathi + 
- DimiGrammatikakis + 
- Dirk Ulbricht + 
- Dmitry Shemetov + 
- Dominik Berger 
- Efkan S. Goktepe + 
- Ege Özgüroğlu 
- Eli Schwartz 
- Erdi + 
- Fabrizio Primerano + 
- Facundo Batista + 
- Fangchen Li 
- Felipe Maion + 
- Francis + 
- Future Programmer + 
- Gabriel Kabbe + 
- Gaétan Ramet + 
- Gianluca Ficarelli 
- Godwill Agbehonou + 
- Guillaume Lemaitre 
- Guo Ci 
- Gustavo Vargas + 
- Hamidreza Sanaee + 
- HappyHorse + 
- Harald Husum + 
- Hugo van Kemenade 
- Ido Ronen + 
- Irv Lustig 
- JHM Darbyshire 
- JHM Darbyshire (iMac) 
- JJ + 
- Jarrod Millman 
- Jay + 
- Jeff Reback 
- Jessica Greene + 
- Jiawei Zhang + 
- Jinli Xiao + 
- Joanna Ge + 
- Jona Sassenhagen + 
- Jonas Haag 
- Joris Van den Bossche 
- Joshua Shew + 
- Julian Badillo 
- Julian Ortiz + 
- Julien Palard + 
- Justin Tyson + 
- Justus Magin 
- Kabiir Krishna + 
- Kang Su Min 
- Ketu Patel + 
- Kevin + 
- Kevin Anderson 
- Kevin Jan Anker 
- Kevin Klein + 
- Kevin Sheppard 
- Kostya Farber 
- LM + 
- Lars Lien Ankile + 
- Lawrence Mitchell 
- Liwei Cai + 
- Loic Diridollou 
- Luciana Solorzano + 
- Luke Manley 
- Lumberbot (aka Jack) 
- Marat Kopytjuk + 
- Marc Garcia 
- Marco Edward Gorelli 
- MarcoGorelli 
- Maria Telenczuk + 
- MarvinGravert + 
- Mateusz Sokół + 
- Matt Richards 
- Matthew Barber + 
- Matthew Roeschke 
- Matus Valo + 
- Mia Reimer + 
- Michael Terry + 
- Michael Tiemann + 
- Milad Maani Jou + 
- Miles Cranmer + 
- MirijaH + 
- Miyuu + 
- Natalia Mokeeva 
- Nathan Goldbaum + 
- Nicklaus Roach + 
- Nicolas Camenisch + 
- Nikolay Boev + 
- Nirav 
- Nishu Choudhary 
- Noa Tamir 
- Noy Hanan + 
- Numan + 
- Numan Ijaz + 
- Omar Elbaz + 
- Pandas Development Team 
- Parfait Gasana 
- Parthi 
- Patrick Hoefler 
- Patrick Schleiter + 
- Pawel Kranzberg + 
- Philip 
- Philip Meier + 
- Pranav Saibhushan Ravuri 
- PrathumP + 
- Rahul Siloniya + 
- Rajasvi Vinayak + 
- Rajat Subhra Mukherjee + 
- Ralf Gommers 
- RaphSku 
- Rebecca Chen + 
- Renato Cotrim Maciel + 
- Reza (Milad) Maanijou + 
- Richard Shadrach 
- Rithik Reddy + 
- Robert Luce + 
- Ronalido + 
- Rylie Wei + 
- SOUMYADIP MAL + 
- Sanjith Chockan + 
- Sayed Qaiser Ali + 
- Scott Harp + 
- Se + 
- Shashwat Agrawal 
- Simar Bassi + 
- Simon Brugman + 
- Simon Hawkins 
- Simon Høxbro Hansen 
- Snorf Yang + 
- Sortofamudkip + 
- Stefan Krawczyk 
- Stefanie Molin 
- Stefanie Senger 
- Stelios Petrakis + 
- Stijn Van Hoey 
- Sven 
- Sylvain MARIE 
- Sylvain Marié 
- Terji Petersen 
- Thierry Moisan 
- Thomas 
- Thomas A Caswell 
- Thomas Grainger 
- Thomas Li 
- Thomas Vranken + 
- Tianye Song + 
- Tim Hoffmann 
- Tim Loderhose + 
- Tim Swast 
- Timon Jurschitsch + 
- Tolker-KU + 
- Tomas Pavlik + 
- Toroi + 
- Torsten Wörtwein 
- Travis Gibbs + 
- Umberto Fasci + 
- Valerii + 
- VanMyHu + 
- Victor Momodu + 
- Vijay Vaidyanathan + 
- VomV + 
- William Andrea 
- William Ayd 
- Wolf Behrenhoff + 
- Xiao Yuan 
- Yao Xiao 
- Yasin Tatar 
- Yaxin Li + 
- Yi Wei + 
- Yulia + 
- Yusharth Singh + 
- Zach Breger + 
- Zhengbo Wang 
- abokey1 + 
- ahmad2901 + 
- assafam + 
- auderson 
- august-tengland + 
- bunardsheng + 
- cmmck + 
- cnguyen-03 + 
- coco + 
- dependabot[bot] 
- giplessis + 
- github-actions[bot] 
- gmaiwald + 
- gmollard + 
- jbrockmendel 
- kathleenhang 
- kevx82 + 
- lia2710 + 
- liang3zy22 + 
- ltartaro + 
- lusolorz + 
- m-ganko + 
- mKlepsch + 
- mattkeanny + 
- mrastgoo + 
- nabdoni + 
- omar-elbaz + 
- paulreece + 
- penelopeysm + 
- potap75 + 
- pre-commit-ci[bot] + 
- raanasn + 
- raj-thapa + 
- ramvikrams + 
- rebecca-palmer 
- reddyrg1 + 
- rmhowe425 + 
- segatrade + 
- shteken + 
- sweisss + 
- taytzehao 
- tntmatthews + 
- tpaxman + 
- tzehaoo + 
- v-mcoutinho + 
- wcgonzal + 
- yonashub 
- yusharth + 
- Ádám Lippai 
- Štěpán Műller +