What’s new in 2.1.0 (Aug 30, 2023)
These are the changes in pandas 2.1.0. See Release notes for a full changelog including other versions of pandas.
Enhancements
PyArrow will become a required dependency with pandas 3.0
PyArrow will become a required dependency of pandas starting with pandas 3.0. This decision was made based on PDEP 10.
This will enable more changes that are hugely beneficial to pandas users, including but not limited to:
inferring strings as PyArrow backed strings by default, enabling a significant reduction of the memory footprint and huge performance improvements.
inferring more complex dtypes with PyArrow by default, like Decimal, lists, bytes, structured data and more.
Better interoperability with other libraries that depend on Apache Arrow.
We are collecting feedback on this decision here.
Avoid NumPy object dtype for strings by default
Previously, all strings were stored in columns with NumPy object dtype by default.
This release introduces an option future.infer_string that infers all
strings as PyArrow backed strings with dtype "string[pyarrow_numpy]" instead.
This is a new string dtype implementation that follows NumPy semantics in comparison
operations and will return np.nan as the missing value indicator.
Setting the option will also infer the dtype "string" as a StringDtype with
storage set to "pyarrow_numpy", ignoring the value behind the option
mode.string_storage.
This option only works if PyArrow is installed. PyArrow backed strings have a significantly reduced memory footprint and provide a big performance improvement compared to NumPy object (GH 54430).
The option can be enabled with:
pd.options.future.infer_string = True
This behavior will become the default with pandas 3.0.
DataFrame reductions preserve extension dtypes
In previous versions of pandas, the results of DataFrame reductions
(DataFrame.sum(), DataFrame.mean(), etc.) had NumPy dtypes, even when the DataFrames
were of extension dtypes. pandas can now keep the dtypes when doing reductions over DataFrame
columns with a common dtype (GH 52788).
Old Behavior
In [1]: df = pd.DataFrame({"a": [1, 1, 2, 1], "b": [np.nan, 2.0, 3.0, 4.0]}, dtype="Int64")
In [2]: df.sum()
Out[2]:
a 5
b 9
dtype: int64
In [3]: df = df.astype("int64[pyarrow]")
In [4]: df.sum()
Out[4]:
a 5
b 9
dtype: int64
New Behavior
In [1]: df = pd.DataFrame({"a": [1, 1, 2, 1], "b": [np.nan, 2.0, 3.0, 4.0]}, dtype="Int64")
In [2]: df.sum()
Out[2]:
a 5
b 9
dtype: Int64
In [3]: df = df.astype("int64[pyarrow]")
In [4]: df.sum()
Out[4]:
a 5
b 9
dtype: int64[pyarrow]
Notice that the dtype is now a masked dtype and PyArrow dtype, respectively, while previously it was a NumPy integer dtype.
To allow DataFrame reductions to preserve extension dtypes, ExtensionArray._reduce() has gained a new keyword parameter keepdims. Calling ExtensionArray._reduce() with keepdims=True should return an array of length 1 along the reduction axis. In order to maintain backward compatibility, the parameter is not required, but it will become required in the future. If the parameter is not found in the signature, DataFrame reductions cannot preserve extension dtypes. Also, if the parameter is not found, a FutureWarning will be emitted and type checkers like mypy may complain about the signature not being compatible with ExtensionArray._reduce().
Copy-on-Write improvements
Series.transform() now respects Copy-on-Write when func modifies the Series inplace (GH 53747)
Accessing Index.values will now return a read-only NumPy array (GH 53704)
Setting a Series into a DataFrame now creates a lazy instead of a deep copy (GH 53142)
The DataFrame constructor, when constructing a DataFrame from a dictionary of Index objects and specifying copy=False, will now use a lazy copy of those Index objects for the columns of the DataFrame (GH 52947)
A shallow copy of a Series or DataFrame (df.copy(deep=False)) will now also return a shallow copy of the rows/columns Index objects instead of only a shallow copy of the data, i.e. the index of the result is no longer identical (df.copy(deep=False).index is df.index is no longer True) (GH 53721)
DataFrame.head() and DataFrame.tail() will now return deep copies (GH 54011)
Add lazy copy mechanism to DataFrame.eval() (GH 53746)
Trying to operate inplace on a temporary column selection (for example, df["a"].fillna(100, inplace=True)) will now always raise a warning when Copy-on-Write is enabled. In this mode, operating inplace like this will never work, since the selection behaves as a temporary copy. This holds true for:
DataFrame.update / Series.update
DataFrame.fillna / Series.fillna
DataFrame.replace / Series.replace
DataFrame.clip / Series.clip
DataFrame.where / Series.where
DataFrame.mask / Series.mask
DataFrame.interpolate / Series.interpolate
DataFrame.ffill / Series.ffill
DataFrame.bfill / Series.bfill
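Because a chained call such as df["a"].fillna(100, inplace=True) never updates df under Copy-on-Write, code should target the DataFrame directly. A minimal sketch of two patterns that do update it, using only long-standing public methods and assumed to behave the same with or without Copy-on-Write enabled:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

# Pattern 1: assign the result back to the column instead of modifying
# a temporary column selection in place.
df["a"] = df["a"].fillna(100)

# Pattern 2: operate in place on the DataFrame itself, limiting the change
# to column "b" with a dict of per-column fill values.
df.fillna({"b": -1}, inplace=True)

print(df)
```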
New DataFrame.map() method and support for ExtensionArrays
DataFrame.map() has been added and DataFrame.applymap() has been deprecated. DataFrame.map() has the same functionality as DataFrame.applymap(), but the new name better communicates that this is the DataFrame version of Series.map() (GH 52353).
When given a callable, Series.map() applies the callable to all elements of the Series.
Similarly, DataFrame.map() applies the callable to all elements of the DataFrame,
while Index.map() applies the callable to all elements of the Index.
Frequently, it is not desirable to apply the callable to NaN-like values of the array. To avoid that,
the map method could be called with na_action="ignore", i.e. ser.map(func, na_action="ignore").
However, na_action="ignore" was not implemented for many ExtensionArray and Index types,
and na_action="ignore" did not work correctly for any ExtensionArray subclass except the nullable numeric ones (i.e. with dtype Int64 etc.).
na_action="ignore" now works for all array types (GH 52219, GH 51645, GH 51809, GH 51936, GH 52033, GH 52096).
Previous behavior:
In [1]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [2]: ser.map(str.upper, na_action="ignore")
NotImplementedError
In [3]: df = pd.DataFrame(ser)
In [4]: df.applymap(str.upper, na_action="ignore") # worked for DataFrame
0
0 A
1 B
2 NaN
In [5]: idx = pd.Index(ser)
In [6]: idx.map(str.upper, na_action="ignore")
TypeError: CategoricalIndex.map() got an unexpected keyword argument 'na_action'
New behavior:
In [5]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [6]: ser.map(str.upper, na_action="ignore")
Out[6]:
0 A
1 B
2 NaN
dtype: category
Categories (2, object): ['A', 'B']
In [7]: df = pd.DataFrame(ser)
In [8]: df.map(str.upper, na_action="ignore")
Out[8]:
0
0 A
1 B
2 NaN
In [9]: idx = pd.Index(ser)
In [10]: idx.map(str.upper, na_action="ignore")
Out[10]: CategoricalIndex(['A', 'B', nan], categories=['A', 'B'], ordered=False, dtype='category')
Also, note that Categorical.map() implicitly has had its na_action set to "ignore" by default.
This has been deprecated and the default for Categorical.map() will change
to na_action=None, consistent with all the other array types.
New implementation of DataFrame.stack()
pandas has reimplemented DataFrame.stack(). To use the new implementation, pass the argument future_stack=True. This will become the only option in pandas 3.0.
The previous implementation had two main behavioral downsides.
The previous implementation would unnecessarily introduce NA values into the result. The user could have NA values automatically removed by passing dropna=True (the default), but doing this could also remove NA values from the result that existed in the input. See the examples below.
The previous implementation with sort=True (the default) would sometimes sort part of the resulting index, and sometimes not. If the input’s columns are not a MultiIndex, then the resulting index would never be sorted. If the columns are a MultiIndex, then in most cases the level(s) in the resulting index that come from stacking the column level(s) would be sorted. In rare cases such level(s) would be sorted in a non-standard order, depending on how the columns were created.
The new implementation (future_stack=True) will no longer unnecessarily introduce NA values when stacking multiple levels and will never sort. As such, the arguments dropna and sort are not utilized and must remain unspecified when using future_stack=True. These arguments will be removed in the next major release.
In [11]: columns = pd.MultiIndex.from_tuples([("B", "d"), ("A", "c")])
In [12]: df = pd.DataFrame([[0, 2], [1, 3]], index=["z", "y"], columns=columns)
In [13]: df
Out[13]:
B A
d c
z 0 2
y 1 3
In the previous version (future_stack=False), the default of dropna=True would remove unnecessarily introduced NA values but still coerce the dtype to float64 in the process. In the new version, no NAs are introduced and so there is no coercion of the dtype.
In [14]: df.stack([0, 1], future_stack=False, dropna=True)
Out[14]:
z A c 2.0
B d 0.0
y A c 3.0
B d 1.0
dtype: float64
In [15]: df.stack([0, 1], future_stack=True)
Out[15]:
z B d 0
A c 2
y B d 1
A c 3
dtype: int64
If the input contains NA values, the previous version would drop those as well with dropna=True or introduce new NA values with dropna=False. The new version persists all values from the input.
In [16]: df = pd.DataFrame([[0, 2], [np.nan, np.nan]], columns=columns)
In [17]: df
Out[17]:
B A
d c
0 0.0 2.0
1 NaN NaN
In [18]: df.stack([0, 1], future_stack=False, dropna=True)
Out[18]:
0 A c 2.0
B d 0.0
dtype: float64
In [19]: df.stack([0, 1], future_stack=False, dropna=False)
Out[19]:
0 A d NaN
c 2.0
B d 0.0
c NaN
1 A d NaN
c NaN
B d NaN
c NaN
dtype: float64
In [20]: df.stack([0, 1], future_stack=True)
Out[20]:
0 B d 0.0
A c 2.0
1 B d NaN
A c NaN
dtype: float64
Other enhancements
Series.ffill() and Series.bfill() are now supported for objects with IntervalDtype (GH 54247)
Added filters parameter to read_parquet() to filter out data, compatible with both engines (GH 53212)
Categorical.map() and CategoricalIndex.map() now have a na_action parameter. Categorical.map() implicitly had a default value of "ignore" for na_action. This has formally been deprecated and will be changed to None in the future. Also notice that Series.map() has default na_action=None and calls to series with categorical data will now use na_action=None unless explicitly set otherwise (GH 44279)
api.extensions.ExtensionArray now has a map() method (GH 51809)
DataFrame.applymap() now uses the map() method of underlying api.extensions.ExtensionArray instances (GH 52219)
MultiIndex.sort_values() now supports na_position (GH 51612)
MultiIndex.sortlevel() and Index.sortlevel() gained a new keyword na_position (GH 51612)
arrays.DatetimeArray.map(), arrays.TimedeltaArray.map() and arrays.PeriodArray.map() can now take a na_action argument (GH 51644)
arrays.SparseArray.map() now supports na_action (GH 52096)
pandas.read_html() now supports the storage_options keyword when used with a URL, allowing users to add headers to the outbound HTTP request (GH 49944)
Add Index.diff() and Index.round() (GH 19708)
Add "latex-math" as an option to the escape argument of Styler which will not escape all characters between "\(" and "\)" during formatting (GH 51903)
Add dtype of categories to repr information of CategoricalDtype (GH 52179)
Added engine_kwargs parameter to read_excel() (GH 52214)
Classes that are useful for type-hinting have been added to the public API in the new submodule pandas.api.typing (GH 48577)
Implemented Series.dt.is_month_start, Series.dt.is_month_end, Series.dt.is_year_start, Series.dt.is_year_end, Series.dt.is_quarter_start, Series.dt.is_quarter_end, Series.dt.days_in_month, Series.dt.unit, Series.dt.normalize, Series.dt.day_name(), Series.dt.month_name(), Series.dt.tz_convert() for ArrowDtype with pyarrow.timestamp (GH 52388, GH 51718)
DataFrameGroupBy.agg() and DataFrameGroupBy.transform() now support grouping by multiple keys when the index is not a MultiIndex for engine="numba" (GH 53486)
SeriesGroupBy.agg() and DataFrameGroupBy.agg() now support passing in multiple functions for engine="numba" (GH 53486)
SeriesGroupBy.transform() and DataFrameGroupBy.transform() now support passing in a string as the function for engine="numba" (GH 53579)
DataFrame.stack() gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (GH 15105)
DataFrame.unstack() gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (GH 15105)
Series.explode() now supports PyArrow-backed list types (GH 53602)
Series.str.join() now supports ArrowDtype(pa.string()) (GH 53646)
Add validate parameter to Categorical.from_codes() (GH 50975)
Added ExtensionArray.interpolate() used by Series.interpolate() and DataFrame.interpolate() (GH 53659)
Added engine_kwargs parameter to DataFrame.to_excel() (GH 53220)
Implemented api.interchange.from_dataframe() for DatetimeTZDtype (GH 54239)
Implemented __from_arrow__ on DatetimeTZDtype (GH 52201)
Implemented __pandas_priority__ to allow custom types to take precedence over DataFrame, Series, Index, or ExtensionArray for arithmetic operations, see the developer guide (GH 48347)
Improve error message when having incompatible columns using DataFrame.merge() (GH 51861)
Improve error message when setting DataFrame with wrong number of columns through DataFrame.isetitem() (GH 51701)
Improved error handling when using DataFrame.to_json() with incompatible index and orient arguments (GH 52143)
Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns (GH 52084)
Improved error message when providing an invalid index or offset argument to VariableOffsetWindowIndexer (GH 54379)
Let DataFrame.to_feather() accept a non-default Index and non-string column names (GH 51787)
Added a new parameter by_row to Series.apply() and DataFrame.apply(). When set to False the supplied callables will always operate on the whole Series or DataFrame (GH 53400, GH 53601).
DataFrame.shift() and Series.shift() now allow shifting by multiple periods by supplying a list of periods (GH 44424)
Groupby aggregations with numba (such as DataFrameGroupBy.sum()) now can preserve the dtype of the input instead of casting to float64 (GH 44952)
Improved error message when DataFrameGroupBy.agg() failed (GH 52930)
Many read/to_* functions, such as DataFrame.to_pickle() and read_csv(), support forwarding compression arguments to lzma.LZMAFile (GH 52979)
Reductions Series.argmax(), Series.argmin(), Series.idxmax(), Series.idxmin(), Index.argmax(), Index.argmin(), DataFrame.idxmax(), DataFrame.idxmin() are now supported for object-dtype (GH 4279, GH 18021, GH 40685, GH 43697)
DataFrame.to_parquet() and read_parquet() will now write and read attrs respectively (GH 54346)
Index.all() and Index.any() with floating dtypes and timedelta64 dtypes no longer raise TypeError, matching the Series.all() and Series.any() behavior (GH 54566)
Series.cummax(), Series.cummin() and Series.cumprod() are now supported for pyarrow dtypes with pyarrow version 13.0 and above (GH 52085)
Added support for the DataFrame Consortium Standard (GH 54383)
Performance improvement in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() (GH 51722)
PyArrow-backed integer dtypes now support bitwise operations (GH 54495)
Backwards incompatible API changes
Increased minimum version for Python
pandas 2.1.0 supports Python 3.9 and higher.
Increased minimum versions for dependencies
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| numpy | 1.22.4 | X | X |
| mypy (dev) | 1.4.1 | | X |
| beautifulsoup4 | 4.11.1 | | X |
| bottleneck | 1.3.4 | | X |
| dataframe-api-compat | 0.1.7 | | X |
| fastparquet | 0.8.1 | | X |
| fsspec | 2022.05.0 | | X |
| hypothesis | 6.46.1 | | X |
| gcsfs | 2022.05.0 | | X |
| jinja2 | 3.1.2 | | X |
| lxml | 4.8.0 | | X |
| numba | 0.55.2 | | X |
| numexpr | 2.8.0 | | X |
| openpyxl | 3.0.10 | | X |
| pandas-gbq | 0.17.5 | | X |
| psycopg2 | 2.9.3 | | X |
| pyreadstat | 1.1.5 | | X |
| pyqt5 | 5.15.6 | | X |
| pytables | 3.7.0 | | X |
| pytest | 7.3.2 | | X |
| python-snappy | 0.6.1 | | X |
| pyxlsb | 1.0.9 | | X |
| s3fs | 2022.05.0 | | X |
| scipy | 1.8.1 | | X |
| sqlalchemy | 1.4.36 | | X |
| tabulate | 0.8.10 | | X |
| xarray | 2022.03.0 | | X |
| xlsxwriter | 3.0.3 | | X |
| zstandard | 0.17.0 | | X |
For optional libraries the general recommendation is to use the latest version.
See Dependencies and Optional dependencies for more.
Other API changes
arrays.PandasArray has been renamed NumpyExtensionArray and the attached dtype name changed from PandasDtype to NumpyEADtype; importing PandasArray still works until the next major version (GH 53694)
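Code that has to run on pandas versions from both before and after the rename can resolve the class once. A minimal sketch, assuming only that one of the two names is available in pd.arrays; the alias NumpyEA is hypothetical and chosen for this example:

```python
import numpy as np
import pandas as pd

# Prefer the new name (pandas 2.1+); fall back to the old name on older versions.
NumpyEA = getattr(pd.arrays, "NumpyExtensionArray", None) or pd.arrays.PandasArray

arr = NumpyEA(np.array([1, 2, 3]))
print(type(arr).__name__, type(arr.dtype).__name__)
```

Looking up the new name first avoids touching the deprecated alias on pandas 2.1 and later.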
Deprecations
Deprecated silent upcasting in setitem-like Series operations
PDEP-6: https://pandas.pydata.org/pdeps/0006-ban-upcasting.html
Setitem-like operations on Series (or DataFrame columns) which silently upcast the dtype are deprecated and show a warning. Examples of affected operations are:
ser.fillna('foo', inplace=True)
ser.where(ser.isna(), 'foo', inplace=True)
ser.iloc[indexer] = 'foo'
ser.loc[indexer] = 'foo'
df.iloc[indexer, 0] = 'foo'
df.loc[indexer, 'a'] = 'foo'
ser[indexer] = 'foo'
where ser is a Series, df is a DataFrame, and indexer
could be a slice, a mask, a single value, a list or array of values, or any other
allowed indexer.
In a future version, these will raise an error and you should cast to a common dtype first.
Previous behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0 1
1 2
2 3
dtype: int64
In [3]: ser[0] = 'not an int64'
In [4]: ser
Out[4]:
0 not an int64
1 2
2 3
dtype: object
New behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0 1
1 2
2 3
dtype: int64
In [3]: ser[0] = 'not an int64'
FutureWarning:
Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas.
Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
In [4]: ser
Out[4]:
0 not an int64
1 2
2 3
dtype: object
To retain the current behaviour, in the case above you could cast ser to object dtype first:
In [21]: ser = pd.Series([1, 2, 3])
In [22]: ser = ser.astype('object')
In [23]: ser[0] = 'not an int64'
In [24]: ser
Out[24]:
0 not an int64
1 2
2 3
dtype: object
Depending on the use-case, it might be more appropriate to cast to a different dtype.
In the following, for example, we cast to float64:
In [25]: ser = pd.Series([1, 2, 3])
In [26]: ser = ser.astype('float64')
In [27]: ser[0] = 1.1
In [28]: ser
Out[28]:
0 1.1
1 2.0
2 3.0
dtype: float64
For further reading, please see https://pandas.pydata.org/pdeps/0006-ban-upcasting.html.
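To find code affected by this deprecation before it becomes an error, one option is to probe the assignment on a copy and record any warnings. The helper setitem_keeps_dtype below is hypothetical (not part of pandas). It assumes the FutureWarning described above for pandas 2.1, and it treats a TypeError or ValueError as the error expected in a future version:

```python
import warnings

import pandas as pd


def setitem_keeps_dtype(ser: pd.Series, pos: int, value) -> bool:
    """Hypothetical helper: report whether setting `value` at position `pos`
    keeps the dtype of `ser` without triggering the upcasting deprecation.
    The check runs on a copy, so `ser` itself is never modified."""
    probe = ser.copy()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            probe.iloc[pos] = value
        except (TypeError, ValueError):
            # A future pandas version is expected to raise instead of warning.
            return False
    if any(issubclass(w.category, FutureWarning) for w in caught):
        return False  # pandas 2.1+ warns about the silent upcast
    return probe.dtype == ser.dtype


ints = pd.Series([1, 2, 3])
print(setitem_keeps_dtype(ints, 0, 10))
print(setitem_keeps_dtype(ints, 0, "not an int64"))
print(setitem_keeps_dtype(ints.astype("float64"), 0, 1.1))
```

The final dtype comparison also catches older pandas versions, where the same assignment silently upcasts without any warning.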
Deprecated parsing datetimes with mixed time zones
Parsing datetimes with mixed time zones is deprecated and shows a warning unless the user passes utc=True to to_datetime() (GH 50887)
Previous behavior:
In [7]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
In [8]: pd.to_datetime(data, utc=False)
Out[8]:
Index([2020-01-01 00:00:00+06:00, 2020-01-01 00:00:00+01:00], dtype='object')
New behavior:
In [9]: pd.to_datetime(data, utc=False)
FutureWarning:
In a future version of pandas, parsing datetimes with mixed time zones will raise
a warning unless `utc=True`. Please specify `utc=True` to opt in to the new behaviour
and silence this warning. To create a `Series` with mixed offsets and `object` dtype,
please use `apply` and `datetime.datetime.strptime`.
Index([2020-01-01 00:00:00+06:00, 2020-01-01 00:00:00+01:00], dtype='object')
In order to silence this warning and avoid an error in a future version of pandas,
please specify utc=True:
In [29]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
In [30]: pd.to_datetime(data, utc=True)
Out[30]: DatetimeIndex(['2019-12-31 18:00:00+00:00', '2019-12-31 23:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
To create a Series with mixed offsets and object dtype, please use apply
and datetime.datetime.strptime:
In [31]: import datetime as dt
In [32]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
In [33]: pd.Series(data).apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S%z'))
Out[33]:
0 2020-01-01 00:00:00+06:00
1 2020-01-01 00:00:00+01:00
dtype: object
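Normalizing to UTC discards the original offsets. If they matter, one option is to keep them in a separate column. A minimal sketch using the same sample data and strptime format as above; the column names ts_utc and offset are illustrative:

```python
import datetime as dt

import pandas as pd

data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]

# All timestamps normalized to UTC: one tz-aware dtype, no warning.
utc = pd.to_datetime(data, utc=True)

# utc=True drops the original offsets, so parse them separately if needed.
offsets = pd.Series(data).apply(
    lambda s: dt.datetime.strptime(s, "%Y-%m-%d %H:%M:%S%z").utcoffset()
)

frame = pd.DataFrame({"ts_utc": utc, "offset": offsets})
print(frame)
```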
Other Deprecations
Deprecated DataFrameGroupBy.dtypes, check dtypes on the underlying object instead (GH 51045)
Deprecated DataFrame._data and Series._data, use public APIs instead (GH 33333)
Deprecated concat() behavior when any of the objects being concatenated have length 0; in the past the dtypes of empty objects were ignored when determining the resulting dtype, in a future version they will not (GH 39122)
Deprecated Categorical.to_list(), use obj.tolist() instead (GH 51254)
Deprecated DataFrameGroupBy.all() and DataFrameGroupBy.any() with datetime64 or PeriodDtype values, matching the Series and DataFrame deprecations (GH 34479)
Deprecated axis=1 in DataFrame.ewm(), DataFrame.rolling(), DataFrame.expanding(), transpose before calling the method instead (GH 51778)
Deprecated axis=1 in DataFrame.groupby() and in Grouper constructor, do frame.T.groupby(...) instead (GH 51203)
Deprecated broadcast_axis keyword in Series.align() and DataFrame.align(), upcast before calling align with left = DataFrame({col: left for col in right.columns}, index=right.index) (GH 51856)
Deprecated downcast keyword in Index.fillna() (GH 53956)
Deprecated fill_method and limit keywords in DataFrame.pct_change(), Series.pct_change(), DataFrameGroupBy.pct_change(), and SeriesGroupBy.pct_change(), explicitly call e.g. DataFrame.ffill() or DataFrame.bfill() before calling pct_change instead (GH 53491)
Deprecated method, limit, and fill_axis keywords in DataFrame.align() and Series.align(), explicitly call DataFrame.fillna() or Series.fillna() on the alignment results instead (GH 51856)
Deprecated quantile keyword in Rolling.quantile() and Expanding.quantile(), renamed to q instead (GH 52550)
Deprecated accepting slices in DataFrame.take(), call obj[slicer] or pass a sequence of integers instead (GH 51539)
Deprecated behavior of DataFrame.idxmax(), DataFrame.idxmin(), Series.idxmax(), Series.idxmin() with all-NA entries or any-NA and skipna=False; in a future version these will raise ValueError (GH 51276)
Deprecated explicit support for subclassing Index (GH 45289)
Deprecated making functions given to Series.agg() attempt to operate on each element in the Series and only operate on the whole Series if the elementwise operations failed. In the future, functions given to Series.agg() will always operate on the whole Series only. To keep the current behavior, use Series.transform() instead (GH 53325)
Deprecated making the functions in a list of functions given to DataFrame.agg() attempt to operate on each element in the DataFrame and only operate on the columns of the DataFrame if the elementwise operations failed. To keep the current behavior, use DataFrame.transform() instead (GH 53325)
Deprecated passing a DataFrame to DataFrame.from_records(), use DataFrame.set_index() or DataFrame.drop() instead (GH 51353)
Deprecated silently dropping unrecognized timezones when parsing strings to datetimes (GH 18702)
Deprecated the axis keyword in DataFrame.ewm(), Series.ewm(), DataFrame.rolling(), Series.rolling(), DataFrame.expanding(), Series.expanding() (GH 51778)
Deprecated the axis keyword in DataFrame.resample(), Series.resample() (GH 51778)
Deprecated the downcast keyword in Series.interpolate(), DataFrame.interpolate(), Series.fillna(), DataFrame.fillna(), Series.ffill(), DataFrame.ffill(), Series.bfill(), DataFrame.bfill() (GH 40988)
Deprecated the behavior of concat() when len(keys) != len(objs), in a future version this will raise instead of truncating to the shorter of the two sequences (GH 43485)
Deprecated the behavior of Series.argsort() in the presence of NA values; in a future version these will be sorted at the end instead of giving -1 (GH 54219)
Deprecated the default of observed=False in DataFrame.groupby() and Series.groupby(); this will default to True in a future version (GH 43999)
Deprecated pinning group.name to each group in SeriesGroupBy.aggregate() aggregations; if your operation requires utilizing the groupby keys, iterate over the groupby object instead (GH 41090)
Deprecated the axis keyword in DataFrameGroupBy.idxmax(), DataFrameGroupBy.idxmin(), DataFrameGroupBy.fillna(), DataFrameGroupBy.take(), DataFrameGroupBy.skew(), DataFrameGroupBy.rank(), DataFrameGroupBy.cumprod(), DataFrameGroupBy.cumsum(), DataFrameGroupBy.cummax(), DataFrameGroupBy.cummin(), DataFrameGroupBy.pct_change(), DataFrameGroupBy.diff(), DataFrameGroupBy.shift(), and DataFrameGroupBy.corrwith(); for axis=1 operate on the underlying DataFrame instead (GH 50405, GH 51046)
Deprecated DataFrameGroupBy with as_index=False not including groupings in the result when they are not columns of the DataFrame (GH 49519)
Deprecated is_categorical_dtype(), use isinstance(obj.dtype, pd.CategoricalDtype) instead (GH 52527)
Deprecated is_datetime64tz_dtype(), check isinstance(dtype, pd.DatetimeTZDtype) instead (GH 52607)
Deprecated is_int64_dtype(), check dtype == np.dtype(np.int64) instead (GH 52564)
Deprecated is_interval_dtype(), check isinstance(dtype, pd.IntervalDtype) instead (GH 52607)
Deprecated is_period_dtype(), check isinstance(dtype, pd.PeriodDtype) instead (GH 52642)
Deprecated is_sparse(), check isinstance(dtype, pd.SparseDtype) instead (GH 52642)
Deprecated Styler.applymap_index(). Use the new Styler.map_index() method instead (GH 52708)
Deprecated Styler.applymap(). Use the new Styler.map() method instead (GH 52708)
Deprecated DataFrame.applymap(). Use the new DataFrame.map() method instead (GH 52353)
Deprecated DataFrame.swapaxes() and Series.swapaxes(), use DataFrame.transpose() or Series.transpose() instead (GH 51946)
Deprecated freq parameter in PeriodArray constructor, pass dtype instead (GH 52462)
Deprecated allowing non-standard inputs in take(), pass either a numpy.ndarray, ExtensionArray, Index, or Series (GH 52981)
Deprecated allowing non-standard sequences for isin(), value_counts(), unique(), factorize(), cast to one of numpy.ndarray, Index, ExtensionArray, or Series before calling (GH 52986)
Deprecated behavior of DataFrame reductions sum, prod, std, var, sem with axis=None, in a future version this will operate over both axes returning a scalar instead of behaving like axis=0; note this also affects numpy functions e.g. np.sum(df) (GH 21597)
Deprecated behavior of concat() when DataFrame has columns that are all-NA, in a future version these will not be discarded when determining the resulting dtype (GH 40893)
Deprecated behavior of Series.dt.to_pydatetime(), in a future version this will return a Series containing python datetime objects instead of an ndarray of datetimes; this matches the behavior of other Series.dt properties (GH 20306)
Deprecated logical operations (|, &, ^) between pandas objects and dtype-less sequences (e.g. list, tuple), wrap a sequence in a Series or NumPy array before operating instead (GH 51521)
Deprecated parameter convert_dtype in Series.apply() (GH 52140)
Deprecated passing a dictionary to SeriesGroupBy.agg(); pass a list of aggregations instead (GH 50684)
Deprecated the fastpath keyword in Categorical constructor, use Categorical.from_codes() instead (GH 20110)
Deprecated the behavior of is_bool_dtype() returning True for object-dtype Index of bool objects (GH 52680)
Deprecated the methods Series.bool() and DataFrame.bool() (GH 51749)
Deprecated unused closed and normalize keywords in the DatetimeIndex constructor (GH 52628)
Deprecated unused closed keyword in the TimedeltaIndex constructor (GH 52628)
Deprecated logical operation between two non-boolean Series with different indexes always coercing the result to bool dtype. In a future version, this will maintain the return type of the inputs (GH 52500, GH 52538)
Deprecated Period and PeriodDtype with BDay freq, use a DatetimeIndex with BDay freq instead (GH 53446)
Deprecated value_counts(), use pd.Series(obj).value_counts() instead (GH 47862)
Deprecated Series.first() and DataFrame.first(); create a mask and filter using .loc instead (GH 45908)
Deprecated Series.interpolate() and DataFrame.interpolate() for object-dtype (GH 53631)
Deprecated Series.last() and DataFrame.last(); create a mask and filter using .loc instead (GH 53692)
Deprecated allowing arbitrary fill_value in SparseDtype, in a future version the fill_value will need to be compatible with the dtype.subtype, either a scalar that can be held by that subtype or NaN for integer or bool subtypes (GH 23124)
Deprecated allowing bool dtype in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile(), consistent with the Series.quantile() and DataFrame.quantile() behavior (GH 51424)
Deprecated behavior of testing.assert_series_equal() and testing.assert_frame_equal() considering NA-like values (e.g. NaN vs None) as equivalent (GH 52081)
Deprecated bytes input to read_excel(). To read a file path, use a string or path-like object (GH 53767)
Deprecated constructing SparseArray from scalar data, pass a sequence instead (GH 53039)
Deprecated falling back to filling when value is not specified in DataFrame.replace() and Series.replace() with non-dict-like to_replace (GH 33302)
Deprecated literal json input to read_json(). Wrap literal json string input in io.StringIO instead (GH 53409)
Deprecated literal string input to read_xml(). Wrap literal string/bytes input in io.StringIO / io.BytesIO instead (GH 53767)
Deprecated literal string/bytes input to read_html(). Wrap literal string/bytes input in io.StringIO / io.BytesIO instead (GH 53767)
Deprecated option mode.use_inf_as_na, convert inf entries to NaN before instead (GH 51684)
Deprecated parameter obj in DataFrameGroupBy.get_group() (GH 53545)
Deprecated positional indexing on Series with Series.__getitem__() and Series.__setitem__(), in a future version ser[item] will always interpret item as a label, not a position (GH 50617)
Deprecated replacing builtin and NumPy functions in .agg, .apply, and .transform; use the corresponding string alias (e.g. "sum" for sum or np.sum) instead (GH 53425)
Deprecated strings T, t, L and l denoting units in to_timedelta() (GH 52536)
Deprecated the “method” and “limit” keywords in ExtensionArray.fillna(), implement _pad_or_backfill instead (GH 53621)
Deprecated the method and limit keywords in DataFrame.replace() and Series.replace() (GH 33302)
Deprecated the method and limit keywords on Series.fillna(), DataFrame.fillna(), SeriesGroupBy.fillna(), DataFrameGroupBy.fillna(), and Resampler.fillna(), use obj.bfill() or obj.ffill() instead (GH 53394)
Deprecated the behavior of Series.__getitem__(), Series.__setitem__(), DataFrame.__getitem__(), DataFrame.__setitem__() with an integer slice on objects with a floating-dtype index, in a future version this will be treated as positional indexing (GH 49612)
Deprecated the use of non-supported datetime64 and timedelta64 resolutions with pandas.array(). Supported resolutions are: “s”, “ms”, “us”, “ns” (GH 53058)
Deprecated values "pad", "ffill", "bfill", "backfill" for Series.interpolate() and DataFrame.interpolate(), use obj.ffill() or obj.bfill() instead (GH 53581)
Deprecated the behavior of Index.argmax(), Index.argmin(), Series.argmax(), Series.argmin() with either all-NAs and skipna=True or any-NAs and skipna=False returning -1; in a future version this will raise ValueError (GH 33941, GH 33942)
Deprecated allowing non-keyword arguments in DataFrame.to_sql() except name and con (GH 54229)
Deprecated silently ignoring fill_value when passing both freq and fill_value to DataFrame.shift(), Series.shift() and DataFrameGroupBy.shift(); in a future version this will raise ValueError (GH 53832)
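A minimal migration sketch for four of the deprecations above, assuming pandas 2.1 or later. Each line uses the replacement the corresponding entry recommends, so none of them should emit a FutureWarning; the sample data is illustrative:

```python
import io

import pandas as pd

# read_json(): wrap literal JSON in io.StringIO instead of passing the string itself.
df = pd.read_json(io.StringIO('{"a": {"0": 1, "1": 2}}'))

# is_categorical_dtype(): use an isinstance check on the dtype instead.
cat = pd.Series(["x", "y", "x"], dtype="category")
is_cat = isinstance(cat.dtype, pd.CategoricalDtype)

# fillna(method="ffill"): call ffill() directly instead.
filled = pd.Series([1.0, None, 3.0]).ffill()

# groupby() on categoricals: pass observed explicitly, since its default will change.
counts = pd.DataFrame({"k": cat, "v": [1, 2, 3]}).groupby("k", observed=True)["v"].sum()
```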
Performance improvements#
Performance improvement in concat() with homogeneous np.float64 or np.float32 dtypes (GH 52685)
Performance improvement in factorize() for object columns not containing strings (GH 51921)
Performance improvement in read_orc() when reading a remote URI file path (GH 51609)
Performance improvement in read_parquet() and DataFrame.to_parquet() when reading a remote file with engine="pyarrow" (GH 51609)
Performance improvement in read_parquet() on string columns when using use_nullable_dtypes=True (GH 47345)
Performance improvement in DataFrame.clip() and Series.clip() (GH 51472)
Performance improvement in DataFrame.filter() when items is given (GH 52941)
Performance improvement in DataFrame.first_valid_index() and DataFrame.last_valid_index() for extension array dtypes (GH 51549)
Performance improvement in DataFrame.where() when cond is backed by an extension dtype (GH 51574)
Performance improvement in MultiIndex.set_levels() and MultiIndex.set_codes() when verify_integrity=True (GH 51873)
Performance improvement in MultiIndex.sortlevel() when ascending is a list (GH 51612)
Performance improvement in Series.combine_first() (GH 51777)
Performance improvement in fillna() when array does not contain nulls (GH 51635)
Performance improvement in isna() when array has zero nulls or is all nulls (GH 51630)
Performance improvement when parsing strings to boolean[pyarrow] dtype (GH 51730)
Performance improvement when searching an Index sliced from other indexes (GH 51738)
Period’s default formatter (period_format) is now significantly (~twice) faster. This improves performance of str(Period), repr(Period), and Period.strftime(fmt=None), as well as PeriodArray.strftime(fmt=None), PeriodIndex.strftime(fmt=None) and PeriodIndex.format(fmt=None). to_csv operations involving PeriodArray or PeriodIndex with default date_format are also significantly accelerated (GH 51459)
Performance improvement accessing arrays.IntegerArray.dtype & arrays.FloatingArray.dtype (GH 52998)
Performance improvement for DataFrameGroupBy/SeriesGroupBy aggregations (e.g. DataFrameGroupBy.sum()) with engine="numba" (GH 53731)
Performance improvement in DataFrame reductions with axis=1 and extension dtypes (GH 54341)
Performance improvement in DataFrame reductions with axis=None and extension dtypes (GH 54308)
Performance improvement in MultiIndex and multi-column operations (e.g. DataFrame.sort_values(), DataFrame.groupby(), Series.unstack()) when index/column values are already sorted (GH 53806)
Performance improvement in concat() when axis=1 and objects have different indexes (GH 52541)
Performance improvement in concat() when the concatenation axis is a MultiIndex (GH 53574)
Performance improvement in merge() for PyArrow backed strings (GH 54443)
Performance improvement in read_csv() with engine="c" (GH 52632)
Performance improvement in ArrowExtensionArray.to_numpy() (GH 52525)
Performance improvement in DataFrameGroupBy.groups() (GH 53088)
Performance improvement in DataFrame.astype() when dtype is an extension dtype (GH 54299)
Performance improvement in DataFrame.iloc() when input is a single integer and the DataFrame is backed by extension dtypes (GH 54508)
Performance improvement in DataFrame.isin() for extension dtypes (GH 53514)
Performance improvement in DataFrame.loc() when selecting rows and columns (GH 53014)
Performance improvement in DataFrame.transpose() when transposing a DataFrame with a single PyArrow dtype (GH 54224)
Performance improvement in DataFrame.transpose() when transposing a DataFrame with a single masked dtype, e.g. Int64 (GH 52836)
Performance improvement in Series.add() for PyArrow string and binary dtypes (GH 53150)
Performance improvement in Series.corr() and Series.cov() for extension dtypes (GH 52502)
Performance improvement in Series.drop_duplicates() for ArrowDtype (GH 54667)
Performance improvement in Series.ffill(), Series.bfill(), DataFrame.ffill(), DataFrame.bfill() with PyArrow dtypes (GH 53950)
Performance improvement in Series.str.get_dummies() for PyArrow-backed strings (GH 53655)
Performance improvement in Series.str.get() for PyArrow-backed strings (GH 53152)
Performance improvement in Series.str.split() with expand=True for PyArrow-backed strings (GH 53585)
Performance improvement in Series.to_numpy() when dtype is a NumPy float dtype and na_value is np.nan (GH 52430)
Performance improvement in astype() when converting from a PyArrow timestamp or duration dtype to NumPy (GH 53326)
Performance improvement in various MultiIndex set and indexing operations (GH 53955)
Performance improvement when doing various reshaping operations on arrays.IntegerArray & arrays.FloatingArray by avoiding unnecessary validation (GH 53013)
Performance improvement when indexing with PyArrow timestamp and duration dtypes (GH 53368)
Performance improvement when passing an array to RangeIndex.take(), DataFrame.loc(), or DataFrame.iloc() and the DataFrame is using a RangeIndex (GH 53387)
Bug fixes#
Categorical#
Bug in CategoricalIndex.remove_categories() where ordered categories would not be maintained (GH 53935)
Bug in Series.astype() with dtype="category" for nullable arrays with read-only null value masks (GH 53658)
Bug in Series.map(), where the value of the na_action parameter was not used if the series held a Categorical (GH 22527)
Datetimelike#
DatetimeIndex.map() with na_action="ignore" now works as expected (GH 51644)
DatetimeIndex.slice_indexer() now raises KeyError for non-monotonic indexes if either of the slice bounds is not in the index; this behaviour was previously deprecated but inconsistently handled (GH 53983)
Bug in DateOffset which had inconsistent behavior when multiplying a DateOffset object by a constant (GH 47953)
Bug in date_range() when freq was a DateOffset with nanoseconds (GH 46877)
Bug in to_datetime() converting Series or DataFrame containing arrays.ArrowExtensionArray of PyArrow timestamps to numpy datetimes (GH 52545)
Bug in DatetimeArray.map() and DatetimeIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)
Bug in DataFrame.to_sql() raising ValueError for PyArrow-backed date like dtypes (GH 53854)
Bug in Timestamp.date(), Timestamp.isocalendar(), Timestamp.timetuple(), and Timestamp.toordinal() returning incorrect results for inputs outside those supported by the Python standard library’s datetime module (GH 53668)
Bug in Timestamp.round() with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsDatetime (GH 51494)
Bug in constructing a Series or DataFrame from a datetime or timedelta scalar always inferring nanosecond resolution instead of inferring from the input (GH 52212)
Bug in constructing a Timestamp from a string representing a time without a date inferring an incorrect unit (GH 54097)
Bug in constructing a Timestamp with ts_input=pd.NA raising TypeError (GH 45481)
Bug in parsing datetime strings with weekday but no day e.g. “2023 Sept Thu” incorrectly raising AttributeError instead of ValueError (GH 52659)
Bug in the repr for Series when dtype is a timezone aware datetime with non-nanosecond resolution raising OutOfBoundsDatetime (GH 54623)
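A minimal sketch, assuming pandas 2.1 or later, of the element-wise DatetimeIndex.map() behavior described above, including na_action="ignore" leaving NaT untouched. The dates are arbitrary examples.

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=3, freq="D")

# The callable receives each Timestamp individually, not the whole array.
days = idx.map(lambda ts: ts.day)
print(days.tolist())  # [1, 2, 3]

# With na_action="ignore", NaT entries are passed through without
# being handed to the callable.
with_nat = pd.DatetimeIndex(["2023-01-01", pd.NaT])
mapped = with_nat.map(lambda ts: ts.day, na_action="ignore")
```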
Timedelta#
Bug in TimedeltaIndex division or multiplication leading to .freq of “0 Days” instead of None (GH 51575)
Bug in Timedelta with NumPy timedelta64 objects not properly raising ValueError (GH 52806)
Bug in to_timedelta() converting Series or DataFrame containing ArrowDtype of pyarrow.duration to NumPy timedelta64 (GH 54298)
Bug in Timedelta.__hash__(), raising an OutOfBoundsTimedelta on certain large values of second resolution (GH 54037)
Bug in Timedelta.round() with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsTimedelta (GH 51494)
Bug in TimedeltaIndex.map() with na_action="ignore" (GH 51644)
Bug in arrays.TimedeltaArray.map() and TimedeltaIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)
Timezones#
Bug in infer_freq() that raises TypeError for Series of timezone-aware timestamps (GH 52456)
Bug in DatetimeTZDtype.base() that always returns a NumPy dtype with nanosecond resolution (GH 52705)
Numeric#
Bug in RangeIndex setting step incorrectly when being the subtrahend with minuend a numeric value (GH 53255)
Bug in Series.corr() and Series.cov() raising AttributeError for masked dtypes (GH 51422)
Bug when calling Series.kurt() and Series.skew() on NumPy data of all zero returning a Python type instead of a NumPy type (GH 53482)
Bug in Series.mean(), DataFrame.mean() with object-dtype values containing strings that can be converted to numbers (e.g. “2”) returning incorrect numeric results; these now raise TypeError (GH 36703, GH 44008)
Bug in DataFrame.corrwith() raising NotImplementedError for PyArrow-backed dtypes (GH 52314)
Bug in DataFrame.size() and Series.size() returning 64-bit integer instead of a Python int (GH 52897)
Bug in DataFrame.dot() returning object dtype for ArrowDtype data (GH 53979)
Bug in Series.any(), Series.all(), DataFrame.any(), and DataFrame.all() where the default value of bool_only was set to None instead of False; this change should have no impact on users (GH 53258)
Bug in Series.median() and DataFrame.median() with object-dtype values containing strings that can be converted to numbers (e.g. “2”) returning incorrect numeric results; these now raise TypeError (GH 34671)
Bug in Series.sum() converting dtype uint64 to int64 (GH 53401)
Conversion#
Bug in DataFrame.style.to_latex() and DataFrame.style.to_html() if the DataFrame contains integers with more digits than can be represented by floating point double precision (GH 52272)
Bug in array() when given a datetime64 or timedelta64 dtype with unit of “s”, “us”, or “ms” returning NumpyExtensionArray instead of DatetimeArray or TimedeltaArray (GH 52859)
Bug in array() when given an empty list and no dtype returning NumpyExtensionArray instead of FloatingArray (GH 54371)
Bug in ArrowDtype.numpy_dtype() returning nanosecond units for non-nanosecond pyarrow.timestamp and pyarrow.duration types (GH 51800)
Bug in DataFrame.__repr__() incorrectly raising a TypeError when the dtype of a column is np.record (GH 48526)
Bug in DataFrame.info() raising ValueError when use_numba is set (GH 51922)
Bug in DataFrame.insert() raising TypeError if loc is np.int64 (GH 53193)
Bug in HDFStore.select() losing precision of large int when stored and retrieved (GH 54186)
Bug in Series.astype() not supporting object_ (GH 54251)
Strings#
Bug in Series.str() that did not raise a TypeError when iterated (GH 54173)
Bug in repr for DataFrame with string-dtype columns (GH 54797)
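A minimal sketch, assuming pandas 2.1 or later, showing that iterating the .str accessor itself now raises TypeError, and iterating the Series values instead. The data is illustrative.

```python
import pandas as pd

s = pd.Series(["a", "b"])

raised = False
try:
    for _ in s.str:  # the accessor itself is not iterable
        pass
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Iterate over the values (or use a vectorized .str method) instead.
upper = [v.upper() for v in s]
```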
Interval#
Bug in IntervalIndex.get_indexer() and IntervalIndex.get_indexer_nonunique() raising if target is a read-only array (GH 53703)
Bug in IntervalDtype where the object could be kept alive when deleted (GH 54184)
Bug in interval_range() where a float step would produce incorrect intervals from floating point artifacts (GH 54477)
Indexing#
Bug in DataFrame.__setitem__() losing dtype when setting a DataFrame into duplicated columns (GH 53143)
Bug in DataFrame.__setitem__() with a boolean mask and DataFrame.putmask() with mixed non-numeric dtypes and a value other than NaN incorrectly raising TypeError (GH 53291)
Bug in DataFrame.iloc() when using nan as the only element (GH 52234)
Bug in Series.loc() casting Series to np.ndarray when assigning Series at predefined index of object dtype Series (GH 48933)
Missing#
Bug in DataFrame.interpolate() failing to fill across data when method is "pad", "ffill", "bfill", or "backfill" (GH 53898)
Bug in DataFrame.interpolate() ignoring inplace when DataFrame is empty (GH 53199)
Bug in Series.idxmin(), Series.idxmax(), DataFrame.idxmin(), DataFrame.idxmax() with a DatetimeIndex index containing NaT incorrectly returning NaN instead of NaT (GH 43587)
Bug in Series.interpolate() and DataFrame.interpolate() failing to raise on invalid downcast keyword, which can be only None or "infer" (GH 53103)
Bug in Series.interpolate() and DataFrame.interpolate() with complex dtype incorrectly failing to fill NaN entries (GH 53635)
MultiIndex#
Bug in MultiIndex.set_levels() not preserving dtypes for Categorical (GH 52125)
Bug in displaying a MultiIndex with a long element (GH 52960)
I/O#
DataFrame.to_orc() now raises ValueError when non-default Index is given (GH 51828)
DataFrame.to_sql() now raises ValueError when the name param is left empty while using SQLAlchemy to connect (GH 52675)
Bug in json_normalize() failing to parse metadata fields of list type (GH 37782)
Bug in read_csv() where it would error when parse_dates was set to a list or dictionary with engine="pyarrow" (GH 47961)
Bug in read_csv() with engine="pyarrow" raising when specifying a dtype with index_col (GH 53229)
Bug in read_hdf() not properly closing store after an IndexError is raised (GH 52781)
Bug in read_html() where style elements were read into DataFrames (GH 52197)
Bug in read_html() where tail texts were removed together with elements containing display:none style (GH 51629)
Bug in read_sql_table() raising an exception when reading a view (GH 52969)
Bug in read_sql() when reading multiple timezone aware columns with the same column name (GH 44421)
Bug in read_xml() stripping whitespace in string data (GH 53811)
Bug in DataFrame.to_html() where colspace was incorrectly applied in case of multi index columns (GH 53885)
Bug in DataFrame.to_html() where conversion for an empty DataFrame with complex dtype raised a ValueError (GH 54167)
Bug in DataFrame.to_json() where DatetimeArray/DatetimeIndex with non nanosecond precision could not be serialized correctly (GH 53686)
Bug when writing and reading empty Stata dta files where dtype information was lost (GH 46240)
Bug where bz2 was treated as a hard requirement (GH 53857)
Period#
Bug in PeriodDtype constructor failing to raise TypeError when no argument is passed or when None is passed (GH 27388)
Bug in PeriodDtype constructor incorrectly returning the same normalize for different DateOffset freq inputs (GH 24121)
Bug in PeriodDtype constructor raising ValueError instead of TypeError when an invalid type is passed (GH 51790)
Bug in PeriodDtype where the object could be kept alive when deleted (GH 54184)
Bug in read_csv() not processing empty strings as a null value, with engine="pyarrow" (GH 52087)
Bug in read_csv() returning object dtype columns instead of float64 dtype columns with engine="pyarrow" for columns that are all null (GH 52087)
Bug in Period.now() not accepting the freq parameter as a keyword argument (GH 53369)
Bug in PeriodIndex.map() with na_action="ignore" (GH 51644)
Bug in arrays.PeriodArray.map() and PeriodIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)
Bug in incorrectly allowing construction of Period or PeriodDtype with CustomBusinessDay freq; use BusinessDay instead (GH 52534)
Plotting#
Bug in Series.plot() when invoked with color=None (GH 51953)
Fixed UserWarning in DataFrame.plot.scatter() when invoked with c="b" (GH 53908)
Groupby/resample/rolling#
Bug in DataFrameGroupBy.idxmin(), SeriesGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmax() returning the wrong dtype when used on an empty DataFrameGroupBy or SeriesGroupBy (GH 51423)
Bug in DataFrame.groupby.rank() on nullable datatypes when passing na_option="bottom" or na_option="top" (GH 54206)
Bug in DataFrame.resample() and Series.resample() incorrectly allowing non-fixed freq when resampling on a TimedeltaIndex (GH 51896)
Bug in DataFrame.resample() and Series.resample() losing time zone when resampling empty data (GH 53664)
Bug in DataFrame.resample() and Series.resample() where origin had no effect when values were outside of the axis (GH 53662)
Bug in weighted rolling aggregations when specifying min_periods=0 (GH 51449)
Bug in DataFrame.groupby() and Series.groupby() where, when the index of the grouped Series or DataFrame was a DatetimeIndex, TimedeltaIndex or PeriodIndex, and the groupby method was given a function as its first argument, the function operated on the whole index rather than each element of the index (GH 51979)
Bug in DataFrameGroupBy.agg() with lists not respecting as_index=False (GH 52849)
Bug in DataFrameGroupBy.apply() causing an error to be raised when the input DataFrame was subset as a DataFrame after groupby ([['a']] and not ['a']) and the given callable returned Series that were not all indexed the same (GH 52444)
Bug in DataFrameGroupBy.apply() raising a TypeError when selecting multiple columns and providing a function that returns np.ndarray results (GH 18930)
Bug in DataFrameGroupBy.groups() and SeriesGroupBy.groups() with a datetime key in conjunction with another key producing an incorrect number of group keys (GH 51158)
Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() possibly sorting the result index implicitly with sort=False (GH 53009)
Bug in SeriesGroupBy.size() where the dtype would be np.int64 for data with ArrowDtype or masked dtypes (e.g. Int64) (GH 53831)
Bug in DataFrame.groupby() with column selection on the resulting groupby object not returning names as tuples when grouping by a list consisting of a single element (GH 53500)
Bug in DataFrameGroupBy.var() and SeriesGroupBy.var() failing to raise TypeError when called with datetime64, timedelta64 or PeriodDtype values (GH 52128, GH 53045)
Bug in DataFrameGroupBy.resample() with kind="period" raising AttributeError (GH 24103)
Bug in Resampler.ohlc() with empty object returning a Series instead of empty DataFrame (GH 42902)
Bug in SeriesGroupBy.count() and DataFrameGroupBy.count() where the dtype would be np.int64 for data with ArrowDtype or masked dtypes (e.g. Int64) (GH 53831)
Bug in SeriesGroupBy.nth() and DataFrameGroupBy.nth() where, after performing column selection, using dropna="any" or dropna="all" would not subset columns (GH 53518)
Bug in SeriesGroupBy.nth() and DataFrameGroupBy.nth() raising when, after performing column selection, using dropna="any" or dropna="all" resulted in rows being dropped (GH 53518)
Bug in SeriesGroupBy.sum() and DataFrameGroupBy.sum() summing np.inf + np.inf and (-np.inf) + (-np.inf) to np.nan instead of np.inf and -np.inf respectively (GH 53606)
Bug in Series.groupby() raising an error when grouped Series has a DatetimeIndex index and a Series with a name that is a month is given to the by argument (GH 48509)
Reshaping#
Bug in concat() coercing to object dtype when one column has pa.null() dtype (GH 53702)
Bug in crosstab() when dropna=False would not keep np.nan in the result (GH 10772)
Bug in melt() where the variable column would lose extension dtypes (GH 54297)
Bug in merge_asof() raising KeyError for extension dtypes (GH 52904)
Bug in merge_asof() raising ValueError for data backed by read-only ndarrays (GH 53513)
Bug in merge_asof() with left_index=True or right_index=True with mismatched index dtypes giving incorrect results in some cases instead of raising MergeError (GH 53870)
Bug in merge() when merging on integer ExtensionDtype and float NumPy dtype raising TypeError (GH 46178)
Bug in DataFrame.agg() and Series.agg() on non-unique columns returning an incorrect type when a dict-like argument was passed (GH 51099)
Bug in DataFrame.combine_first() ignoring other’s columns if other is empty (GH 53792)
Bug in DataFrame.idxmin() and DataFrame.idxmax(), where the axis dtype would be lost for empty frames (GH 53265)
Bug in DataFrame.merge() not merging correctly when having MultiIndex with single level (GH 52331)
Bug in DataFrame.stack() losing extension dtypes when columns is a MultiIndex and frame contains mixed dtypes (GH 45740)
Bug in DataFrame.stack() sorting columns lexicographically (GH 53786)
Bug in DataFrame.transpose() inferring dtype for object column (GH 51546)
Bug in Series.combine_first() converting int64 dtype to float64 and losing precision on very large integers (GH 51764)
Bug when joining empty DataFrame objects, where the joined index would be a RangeIndex instead of the joined index type (GH 52777)
Sparse#
Bug in SparseDtype constructor failing to raise TypeError when given an incompatible dtype for its subtype, which must be a NumPy dtype (GH 53160)
Bug in arrays.SparseArray.map() allowing the fill value to be included in the sparse values (GH 52095)
ExtensionArray#
Bug in ArrowStringArray constructor raising ValueError with dictionary types of strings (GH 54074)
Bug in DataFrame constructor not copying Series with extension dtype when given in dict (GH 53744)
Bug in ArrowExtensionArray converting pandas non-nanosecond temporal objects from non-zero values to zero values (GH 53171)
Bug in Series.quantile() for PyArrow temporal types raising ArrowInvalid (GH 52678)
Bug in Series.rank() returning wrong order for small values with Float64 dtype (GH 52471)
Bug in Series.unique() for boolean ArrowDtype with NA values (GH 54667)
Bug in __iter__() and __getitem__() returning python datetime and timedelta objects for non-nano dtypes (GH 53326)
Bug in factorize() returning incorrect uniques for a pyarrow.dictionary type pyarrow.chunked_array with more than one chunk (GH 54844)
Bug when passing an ExtensionArray subclass to dtype keywords. This will now raise a UserWarning to encourage passing an instance instead (GH 31356, GH 54592)
Bug where the DataFrame repr would not work when a column had an ArrowDtype with a pyarrow.ExtensionDtype (GH 54063)
Bug where the __from_arrow__ method of masked ExtensionDtypes (e.g. Float64Dtype, BooleanDtype) would not accept PyArrow arrays of type pyarrow.null() (GH 52223)
Styler#
Metadata#
Fixed metadata propagation in DataFrame.max(), DataFrame.min(), DataFrame.prod(), DataFrame.mean(), Series.mode(), DataFrame.median(), DataFrame.sem(), DataFrame.skew(), DataFrame.kurt() (GH 28283)
Fixed metadata propagation in DataFrame.squeeze() and DataFrame.describe() (GH 28283)
Fixed metadata propagation in DataFrame.std() (GH 28283)
Other#
Bug in FloatingArray.__contains__ with NaN item incorrectly returning False when NaN values are present (GH 52840)
Bug in DataFrame and Series raising for data of complex dtype when NaN values are present (GH 53627)
Bug in DatetimeIndex where the repr did not print the time component when the time is midnight and the freq is not day-based (GH 53470)
Bug in testing.assert_frame_equal() and testing.assert_series_equal() not raising an AssertionError for two unequal sets (GH 51727)
Bug in testing.assert_frame_equal() checking category dtypes even when asked not to check index type (GH 52126)
Bug in api.interchange.from_dataframe() not respecting the allow_copy argument (GH 54322)
Bug in api.interchange.from_dataframe() raising when interchanging from non-pandas tz-aware data containing null values (GH 54287)
Bug in api.interchange.from_dataframe() when converting an empty DataFrame object (GH 53155)
Bug in from_dummies() where the resulting Index did not match the original Index (GH 54300)
Bug in from_dummies() where the resulting data would always be object dtype instead of the dtype of the columns (GH 54300)
Bug in DataFrameGroupBy.first(), DataFrameGroupBy.last(), SeriesGroupBy.first(), and SeriesGroupBy.last() where an empty group would return np.nan instead of the corresponding ExtensionArray NA value (GH 39098)
Bug in DataFrame.pivot_table() casting the mean of ints back to an int (GH 16676)
Bug in DataFrame.reindex() with a fill_value that should be inferred with an ExtensionDtype incorrectly inferring object dtype (GH 52586)
Bug in DataFrame.shift() with axis=1 on a DataFrame with a single ExtensionDtype column giving incorrect results (GH 53832)
Bug in Index.sort_values() when a key is passed (GH 52764)
Bug in Series.align(), DataFrame.align(), Series.reindex(), DataFrame.reindex(), Series.interpolate(), DataFrame.interpolate() incorrectly failing to raise with method="asfreq" (GH 53620)
Bug in Series.argsort() failing to raise when an invalid axis is passed (GH 54257)
Bug in Series.map() where giving a callable to an empty Series returned a Series of object dtype; it now keeps the original dtype (GH 52384)
Bug in Series.memory_usage() with deep=True raising an error with Series of objects and returning an incorrect value, as it did not take GC corrections into account (GH 51858)
Bug in period_range() where the default behavior when freq was not passed as an argument was incorrect (GH 53687)
Fixed incorrect __name__ attribute of pandas._libs.json (GH 52898)
Contributors#
A total of 266 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
AG +
Aarni Koskela
Adrian D’Alessandro +
Adrien RUAULT +
Ahmad +
Aidos Kanapyanov +
Alex Malins
Alexander Seiler +
Ali Asgar +
Allison Kwan
Amanda Bizzinotto +
Andres Algaba +
Angela Seo +
Anirudh Hegde +
Antony Evmorfopoulos +
Anushka Bishnoi
ArnaudChanoine +
Artem Vorobyev +
Arya Sarkar +
Ashwin Srinath
Austin Au-Yeung +
Austin Burnett +
Bear +
Ben Mangold +
Bernardo Gameiro +
Boyd Kane +
Brayan Alexander Muñoz B +
Brock
Chetan0402 +
Chris Carini
ChristofKaufmann
Clark-W +
Conrad Mcgee Stocks
Corrie Bartelheimer +
Coulton Theuer +
D067751 +
Daniel Isaac
Daniele Nicolodi +
David Samuel +
David Seifert +
Dea Leon +
Dea María Léon
Deepyaman Datta
Denis Sapozhnikov +
Dharani Akurathi +
DimiGrammatikakis +
Dirk Ulbricht +
Dmitry Shemetov +
Dominik Berger
Efkan S. Goktepe +
Ege Özgüroğlu
Eli Schwartz
Erdi +
Fabrizio Primerano +
Facundo Batista +
Fangchen Li
Felipe Maion +
Francis +
Future Programmer +
Gabriel Kabbe +
Gaétan Ramet +
Gianluca Ficarelli
Godwill Agbehonou +
Guillaume Lemaitre
Guo Ci
Gustavo Vargas +
Hamidreza Sanaee +
HappyHorse +
Harald Husum +
Hugo van Kemenade
Ido Ronen +
Irv Lustig
JHM Darbyshire
JHM Darbyshire (iMac)
JJ +
Jarrod Millman
Jay +
Jeff Reback
Jessica Greene +
Jiawei Zhang +
Jinli Xiao +
Joanna Ge +
Jona Sassenhagen +
Jonas Haag
Joris Van den Bossche
Joshua Shew +
Julian Badillo
Julian Ortiz +
Julien Palard +
Justin Tyson +
Justus Magin
Kabiir Krishna +
Kang Su Min
Ketu Patel +
Kevin +
Kevin Anderson
Kevin Jan Anker
Kevin Klein +
Kevin Sheppard
Kostya Farber
LM +
Lars Lien Ankile +
Lawrence Mitchell
Liwei Cai +
Loic Diridollou
Luciana Solorzano +
Luke Manley
Lumberbot (aka Jack)
Marat Kopytjuk +
Marc Garcia
Marco Edward Gorelli
MarcoGorelli
Maria Telenczuk +
MarvinGravert +
Mateusz Sokół +
Matt Richards
Matthew Barber +
Matthew Roeschke
Matus Valo +
Mia Reimer +
Michael Terry +
Michael Tiemann +
Milad Maani Jou +
Miles Cranmer +
MirijaH +
Miyuu +
Natalia Mokeeva
Nathan Goldbaum +
Nicklaus Roach +
Nicolas Camenisch +
Nikolay Boev +
Nirav
Nishu Choudhary
Noa Tamir
Noy Hanan +
Numan +
Numan Ijaz +
Omar Elbaz +
Pandas Development Team
Parfait Gasana
Parthi
Patrick Hoefler
Patrick Schleiter +
Pawel Kranzberg +
Philip
Philip Meier +
Pranav Saibhushan Ravuri
PrathumP +
Rahul Siloniya +
Rajasvi Vinayak +
Rajat Subhra Mukherjee +
Ralf Gommers
RaphSku
Rebecca Chen +
Renato Cotrim Maciel +
Reza (Milad) Maanijou +
Richard Shadrach
Rithik Reddy +
Robert Luce +
Ronalido +
Rylie Wei +
SOUMYADIP MAL +
Sanjith Chockan +
Sayed Qaiser Ali +
Scott Harp +
Se +
Shashwat Agrawal
Simar Bassi +
Simon Brugman +
Simon Hawkins
Simon Høxbro Hansen
Snorf Yang +
Sortofamudkip +
Stefan Krawczyk
Stefanie Molin
Stefanie Senger
Stelios Petrakis +
Stijn Van Hoey
Sven
Sylvain MARIE
Sylvain Marié
Terji Petersen
Thierry Moisan
Thomas
Thomas A Caswell
Thomas Grainger
Thomas Li
Thomas Vranken +
Tianye Song +
Tim Hoffmann
Tim Loderhose +
Tim Swast
Timon Jurschitsch +
Tolker-KU +
Tomas Pavlik +
Toroi +
Torsten Wörtwein
Travis Gibbs +
Umberto Fasci +
Valerii +
VanMyHu +
Victor Momodu +
Vijay Vaidyanathan +
VomV +
William Andrea
William Ayd
Wolf Behrenhoff +
Xiao Yuan
Yao Xiao
Yasin Tatar
Yaxin Li +
Yi Wei +
Yulia +
Yusharth Singh +
Zach Breger +
Zhengbo Wang
abokey1 +
ahmad2901 +
assafam +
auderson
august-tengland +
bunardsheng +
cmmck +
cnguyen-03 +
coco +
dependabot[bot]
giplessis +
github-actions[bot]
gmaiwald +
gmollard +
jbrockmendel
kathleenhang
kevx82 +
lia2710 +
liang3zy22 +
ltartaro +
lusolorz +
m-ganko +
mKlepsch +
mattkeanny +
mrastgoo +
nabdoni +
omar-elbaz +
paulreece +
penelopeysm +
potap75 +
pre-commit-ci[bot] +
raanasn +
raj-thapa +
ramvikrams +
rebecca-palmer
reddyrg1 +
rmhowe425 +
segatrade +
shteken +
sweisss +
taytzehao
tntmatthews +
tpaxman +
tzehaoo +
v-mcoutinho +
wcgonzal +
yonashub
yusharth +
Ádám Lippai
Štěpán Műller +