What’s new in 2.1.0 (Month XX, 2023)#
These are the changes in pandas 2.1.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
enhancement1#
map(func, na_action="ignore") now works for all array types#
When given a callable, Series.map() applies the callable to all elements of the Series. Similarly, DataFrame.map() applies the callable to all elements of the DataFrame, while Index.map() applies the callable to all elements of the Index.
Frequently, it is not desirable to apply the callable to NaN-like values of the array. To avoid doing so, the map method could be called with na_action="ignore", i.e. ser.map(func, na_action="ignore"). However, na_action="ignore" was not implemented for many ExtensionArray and Index types, and it did not work correctly for any ExtensionArray subclass except the nullable numeric ones (i.e. with dtype Int64 etc.).
na_action="ignore" now works for all array types (GH 52219, GH 51645, GH 51809, GH 51936, GH 52033, GH 52096).
Previous behavior:
In [1]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [2]: ser.map(str.upper, na_action="ignore")
NotImplementedError
In [3]: df = pd.DataFrame(ser)
In [4]: df.applymap(str.upper, na_action="ignore")  # worked for DataFrame
     0
0    A
1    B
2  NaN
In [5]: idx = pd.Index(ser)
In [6]: idx.map(str.upper, na_action="ignore")
TypeError: CategoricalIndex.map() got an unexpected keyword argument 'na_action'
New behavior:
In [1]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [2]: ser.map(str.upper, na_action="ignore")
Out[2]:
0      A
1      B
2    NaN
dtype: category
Categories (2, object): ['A', 'B']
In [3]: df = pd.DataFrame(ser)
In [4]: df.map(str.upper, na_action="ignore")
Out[4]:
     0
0    A
1    B
2  NaN
In [5]: idx = pd.Index(ser)
In [6]: idx.map(str.upper, na_action="ignore")
Out[6]: CategoricalIndex(['A', 'B', nan], categories=['A', 'B'], ordered=False, dtype='category')
Notice also that in this version, DataFrame.map() has been added and DataFrame.applymap() has been deprecated. DataFrame.map() has the same functionality as DataFrame.applymap(), but the new name better communicates that this is the DataFrame version of Series.map() (GH 52353).
Also, note that Categorical.map() has implicitly had its na_action set to "ignore" by default. This has been deprecated, and in the future Categorical.map() will change the default to na_action=None, like all the other array types.
Other enhancements#
Categorical.map() and CategoricalIndex.map() now have a na_action parameter. Categorical.map() implicitly had a default value of "ignore" for na_action. This has formally been deprecated and will be changed to None in the future. Also notice that Series.map() has default na_action=None and calls to series with categorical data will now use na_action=None unless explicitly set otherwise (GH 44279)
api.extensions.ExtensionArray now has a map() method (GH 51809)
DataFrame.applymap() now uses the map() method of underlying api.extensions.ExtensionArray instances (GH 52219)
MultiIndex.sort_values() now supports na_position (GH 51612)
MultiIndex.sortlevel() and Index.sortlevel() gained a new keyword na_position (GH 51612)
arrays.DatetimeArray.map(), arrays.TimedeltaArray.map() and arrays.PeriodArray.map() can now take a na_action argument (GH 51644)
arrays.SparseArray.map() now supports na_action (GH 52096)
Added dtype of categories to the repr information of CategoricalDtype (GH 52179)
Added to the escape mode “latex-math”: all characters between “\(” and “\)” in formatter are now preserved without escaping (GH 51903)
Added engine_kwargs parameter to read_excel() (GH 52214)
Classes that are useful for type-hinting have been added to the public API in the new submodule pandas.api.typing (GH 48577)
Implemented Series.dt.is_month_start, Series.dt.is_month_end, Series.dt.is_year_start, Series.dt.is_year_end, Series.dt.is_quarter_start, Series.dt.is_quarter_end, Series.dt.days_in_month, Series.dt.unit, Series.dt.normalize(), Series.dt.day_name(), Series.dt.month_name(), Series.dt.tz_convert() for ArrowDtype with pyarrow.timestamp (GH 52388, GH 51718)
Implemented __from_arrow__ on DatetimeTZDtype (GH 52201)
Implemented __pandas_priority__ to allow custom types to take precedence over DataFrame, Series, Index, or ExtensionArray for arithmetic operations, see the developer guide (GH 48347)
Improved error message when having incompatible columns using DataFrame.merge() (GH 51861)
Improved error message when setting DataFrame with wrong number of columns through DataFrame.isetitem() (GH 51701)
Improved error handling when using DataFrame.to_json() with incompatible index and orient arguments (GH 52143)
Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns (GH 52084)
Let DataFrame.to_feather() accept a non-default Index and non-string column names (GH 51787)
Performance improvement in read_csv() with engine="c" (GH 52632)
Categorical.from_codes() has gotten a validate parameter (GH 50975)
DataFrame.stack() gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (GH 15105)
DataFrame.unstack() gained the sort keyword to dictate whether the resulting MultiIndex levels are sorted (GH 15105)
DataFrameGroupBy.agg() and DataFrameGroupBy.transform() now support grouping by multiple keys when the index is not a MultiIndex for engine="numba" (GH 53486)
SeriesGroupBy.agg() and DataFrameGroupBy.agg() now support passing in multiple functions for engine="numba" (GH 53486)
Added engine_kwargs parameter to DataFrame.to_excel() (GH 53220)
Added a new parameter by_row to Series.apply(). When set to False the supplied callables will always operate on the whole Series (GH 53400)
Many read/to_* functions, such as DataFrame.to_pickle() and read_csv(), support forwarding compression arguments to lzma.LZMAFile (GH 52979)
Performance improvement in concat() with homogeneous np.float64 or np.float32 dtypes (GH 52685)
Performance improvement in DataFrame.filter() when items is given (GH 52941)
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
notable_bug_fix1#
notable_bug_fix2#
Backwards incompatible API changes#
Increased minimum version for Python#
pandas 2.1.0 supports Python 3.9 and higher.
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
Package | Minimum Version | Required | Changed |
---|---|---|---|
numpy | 1.21.6 | X | X |
mypy (dev) | 1.2 | X | |
beautifulsoup4 | 4.11.1 | X | |
bottleneck | 1.3.4 | X | |
fastparquet | 0.8.1 | X | |
fsspec | 2022.05.0 | X | |
hypothesis | 6.46.1 | X | |
gcsfs | 2022.05.0 | X | |
jinja2 | 3.1.2 | X | |
lxml | 4.8.0 | X | |
numba | 0.55.2 | X | |
numexpr | 2.8.0 | X | |
openpyxl | 3.0.10 | X | |
pandas-gbq | 0.17.5 | X | |
psycopg2 | 2.9.3 | X | |
pyreadstat | 1.1.5 | X | |
pyqt5 | 5.15.6 | X | |
pytables | 3.7.0 | X | |
python-snappy | 0.6.1 | X | |
pyxlsb | 1.0.9 | X | |
s3fs | 2022.05.0 | X | |
scipy | 1.8.1 | X | |
sqlalchemy | 1.4.36 | X | |
tabulate | 0.8.10 | X | |
xarray | 2022.03.0 | X | |
xlsxwriter | 3.0.3 | X | |
zstandard | 0.17.0 | X | |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
Package | Minimum Version | Changed |
---|---|---|
 | | X |
See Dependencies and Optional dependencies for more.
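As a rough way to compare an environment against the minimum versions listed in the table above, the following standard-library sketch may help. The helper names (parse_version, meets_minimum, check_installed) are hypothetical and not part of pandas, and the simplified version parsing is an assumption of this example; for real use, a dedicated version parser such as the one in the packaging library is more robust.

```python
from importlib import metadata
from itertools import takewhile

# A few minimums copied from the table above; extend as needed.
MINIMUM_VERSIONS = {
    "numpy": "1.21.6",
    "bottleneck": "1.3.4",
    "fsspec": "2022.05.0",
}


def parse_version(text):
    """Turn '2022.05.0' into (2022, 5, 0), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding the shorter one with zeros."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_installed(minimums=MINIMUM_VERSIONS):
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


print(check_installed())
```

A value of None in the report simply means the optional package is absent, which is fine since these minimums apply only if the package is installed.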
Other API changes#
Deprecations#
Deprecated ‘broadcast_axis’ keyword in Series.align() and DataFrame.align(), upcast before calling align with left = DataFrame({col: left for col in right.columns}, index=right.index) (GH 51856)
Deprecated ‘method’, ‘limit’, and ‘fill_axis’ keywords in DataFrame.align() and Series.align(), explicitly call fillna on the alignment results instead (GH 51856)
Deprecated ‘quantile’ keyword in Rolling.quantile() and Expanding.quantile(), renamed as ‘q’ instead (GH 52550)
Deprecated DataFrameGroupBy.apply() and methods on the objects returned by DataFrameGroupBy.resample() operating on the grouping column(s); select the columns to operate on after groupby to either explicitly include or exclude the groupings and avoid the FutureWarning (GH 7155)
Deprecated GroupBy.all() and GroupBy.any() with datetime64 or PeriodDtype values, matching the Series and DataFrame deprecations (GH 34479)
Deprecated Categorical.to_list(), use obj.tolist() instead (GH 51254)
Deprecated DataFrame._data() and Series._data(), use public APIs instead (GH 33333)
Deprecated DataFrameGroupBy.dtypes(), check dtypes on the underlying object instead (GH 51045)
Deprecated axis=1 in DataFrame.ewm(), DataFrame.rolling(), DataFrame.expanding(), transpose before calling the method instead (GH 51778)
Deprecated axis=1 in DataFrame.groupby() and in Grouper constructor, do frame.T.groupby(...) instead (GH 51203)
Deprecated accepting slices in DataFrame.take(), call obj[slicer] or pass a sequence of integers instead (GH 51539)
Deprecated explicit support for subclassing Index (GH 45289)
Deprecated making functions given to Series.agg() attempt to operate on each element in the Series and only operate on the whole Series if the elementwise operations failed. In the future, functions given to Series.agg() will always operate on the whole Series only. To keep the current behavior, use Series.transform() instead. (GH 53325)
Deprecated making the functions in a list of functions given to DataFrame.agg() attempt to operate on each element in the DataFrame and only operate on the columns of the DataFrame if the elementwise operations failed. To keep the current behavior, use DataFrame.transform() instead. (GH 53325)
Deprecated passing a DataFrame to DataFrame.from_records(), use DataFrame.set_index() or DataFrame.drop() instead (GH 51353)
Deprecated silently dropping unrecognized timezones when parsing strings to datetimes (GH 18702)
Deprecated the axis keyword in DataFrame.ewm(), Series.ewm(), DataFrame.rolling(), Series.rolling(), DataFrame.expanding(), Series.expanding() (GH 51778)
Deprecated the axis keyword in DataFrame.resample(), Series.resample() (GH 51778)
Deprecated the behavior of concat() with len(keys) != len(objs), in a future version this will raise instead of truncating to the shorter of the two sequences (GH 43485)
Deprecated the default of observed=False in DataFrame.groupby() and Series.groupby(); this will default to True in a future version (GH 43999)
Deprecated pinning group.name to each group in SeriesGroupBy.aggregate() aggregations; if your operation requires utilizing the groupby keys, iterate over the groupby object instead (GH 41090)
Deprecated the ‘axis’ keyword in GroupBy.idxmax(), GroupBy.idxmin(), GroupBy.fillna(), GroupBy.take(), GroupBy.skew(), GroupBy.rank(), GroupBy.cumprod(), GroupBy.cumsum(), GroupBy.cummax(), GroupBy.cummin(), GroupBy.pct_change(), GroupBy.diff(), GroupBy.shift(), and DataFrameGroupBy.corrwith(); for axis=1 operate on the underlying DataFrame instead (GH 50405, GH 51046)
Deprecated DataFrameGroupBy with as_index=False not including groupings in the result when they are not columns of the DataFrame (GH 49519)
Deprecated is_categorical_dtype(), use isinstance(obj.dtype, pd.CategoricalDtype) instead (GH 52527)
Deprecated is_datetime64tz_dtype(), check isinstance(dtype, pd.DatetimeTZDtype) instead (GH 52607)
Deprecated is_int64_dtype(), check dtype == np.dtype(np.int64) instead (GH 52564)
Deprecated is_interval_dtype(), check isinstance(dtype, pd.IntervalDtype) instead (GH 52607)
Deprecated is_period_dtype(), check isinstance(dtype, pd.PeriodDtype) instead (GH 52642)
Deprecated is_sparse(), check isinstance(dtype, pd.SparseDtype) instead (GH 52642)
Deprecated Styler.applymap_index(). Use the new Styler.map_index() method instead (GH 52708)
Deprecated Styler.applymap(). Use the new Styler.map() method instead (GH 52708)
Deprecated DataFrame.applymap(). Use the new DataFrame.map() method instead (GH 52353)
Deprecated DataFrame.swapaxes() and Series.swapaxes(), use DataFrame.transpose() or Series.transpose() instead (GH 51946)
Deprecated freq parameter in PeriodArray constructor, pass dtype instead (GH 52462)
Deprecated allowing non-standard inputs in take(), pass either a numpy.ndarray, ExtensionArray, Index, or Series (GH 52981)
Deprecated allowing non-standard sequences for isin(), value_counts(), unique(), factorize(), cast to one of numpy.ndarray, Index, ExtensionArray, or Series before calling (GH 52986)
Deprecated behavior of DataFrame reductions sum, prod, std, var, sem with axis=None, in a future version this will operate over both axes returning a scalar instead of behaving like axis=0; note this also affects numpy functions e.g. np.sum(df) (GH 21597)
Deprecated behavior of concat() when DataFrame has columns that are all-NA, in a future version these will not be discarded when determining the resulting dtype (GH 40893)
Deprecated behavior of Series.dt.to_pydatetime(), in a future version this will return a Series containing python datetime objects instead of an ndarray of datetimes; this matches the behavior of other Series.dt properties (GH 20306)
Deprecated logical operations (|, &, ^) between pandas objects and dtype-less sequences (e.g. list, tuple), wrap a sequence in a Series or numpy array before operating instead (GH 51521)
Deprecated making Series.apply() return a DataFrame when the passed-in callable returns a Series object. In the future this will return a Series whose values are themselves Series. This pattern was very slow and it’s recommended to use alternative methods to achieve the same goal (GH 52116)
Deprecated parameter convert_type in Series.apply() (GH 52140)
Deprecated passing a dictionary to SeriesGroupBy.agg(); pass a list of aggregations instead (GH 50684)
Deprecated the “fastpath” keyword in Categorical constructor, use Categorical.from_codes() instead (GH 20110)
Deprecated the behavior of is_bool_dtype() returning True for object-dtype Index of bool objects (GH 52680)
Deprecated the methods Series.bool() and DataFrame.bool() (GH 51749)
Deprecated unused “closed” and “normalize” keywords in the DatetimeIndex constructor (GH 52628)
Deprecated unused “closed” keyword in the TimedeltaIndex constructor (GH 52628)
Deprecated logical operation between two non boolean Series with different indexes always coercing the result to bool dtype. In a future version, this will maintain the return type of the inputs. (GH 52500, GH 52538)
Deprecated Series.first() and DataFrame.first() (please create a mask and filter using .loc instead) (GH 45908)
Deprecated allowing downcast keyword other than None, False, “infer”, or a dict with these as values in Series.fillna(), DataFrame.fillna() (GH 40988)
Deprecated allowing arbitrary fill_value in SparseDtype, in a future version the fill_value will need to be compatible with the dtype.subtype, either a scalar that can be held by that subtype or NaN for integer or bool subtypes (GH 23124)
Deprecated behavior of assert_series_equal() and assert_frame_equal() considering NA-like values (e.g. NaN vs None) as equivalent (GH 52081)
Deprecated constructing SparseArray from scalar data, pass a sequence instead (GH 53039)
Deprecated option “mode.use_inf_as_na”, convert inf entries to NaN before instead (GH 51684)
Deprecated positional indexing on Series with Series.__getitem__() and Series.__setitem__(), in a future version ser[item] will always interpret item as a label, not a position (GH 50617)
Deprecated the “method” and “limit” keywords on Series.fillna(), DataFrame.fillna(), SeriesGroupBy.fillna(), DataFrameGroupBy.fillna(), and Resampler.fillna(), use obj.bfill() or obj.ffill() instead (GH 53394)
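A few of the replacements suggested in the deprecations above can be sketched as follows. This is an illustrative sample rather than a complete migration guide; the data and variable names are made up for the example, and only the suggested (non-deprecated) spellings are executed.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])

# Instead of ser.fillna(method="ffill"), call ffill() directly.
filled = ser.ffill()

# Instead of positional ser[0], be explicit: iloc for position, loc for label.
first_by_position = ser.iloc[0]
first_by_label = ser.loc["a"]

# Instead of is_categorical_dtype(...) or is_datetime64tz_dtype(...),
# check the dtype object with isinstance.
cat = pd.Series(["x", "y"], dtype="category")
stamps = pd.Series(pd.date_range("2023-01-01", periods=2, tz="UTC"))
is_categorical = isinstance(cat.dtype, pd.CategoricalDtype)
is_tz_aware = isinstance(stamps.dtype, pd.DatetimeTZDtype)

# Instead of df.groupby(..., axis=1), transpose, group the rows, transpose back.
df = pd.DataFrame({"a1": [1, 2], "a2": [3, 4], "b1": [5, 6]})
by_prefix = df.T.groupby(lambda name: name[0]).sum().T

print(filled.tolist(), first_by_position, first_by_label)
print(is_categorical, is_tz_aware)
print(by_prefix)
```

Each replacement here also works on older pandas versions, so code can be migrated before upgrading.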
Performance improvements#
Performance improvement in factorize() for object columns not containing strings (GH 51921)
Performance improvement in read_orc() when reading a remote URI file path (GH 51609)
Performance improvement in read_parquet() and DataFrame.to_parquet() when reading a remote file with engine="pyarrow" (GH 51609)
Performance improvement in read_parquet() on string columns when using use_nullable_dtypes=True (GH 47345)
Performance improvement in DataFrame.clip() and Series.clip() (GH 51472)
Performance improvement in DataFrame.first_valid_index() and DataFrame.last_valid_index() for extension array dtypes (GH 51549)
Performance improvement in DataFrame.where() when cond is backed by an extension dtype (GH 51574)
Performance improvement in MultiIndex.set_levels() and MultiIndex.set_codes() when verify_integrity=True (GH 51873)
Performance improvement in MultiIndex.sortlevel() when ascending is a list (GH 51612)
Performance improvement in Series.combine_first() (GH 51777)
Performance improvement in fillna() when array does not contain nulls (GH 51635)
Performance improvement in isna() when array has zero nulls or is all nulls (GH 51630)
Performance improvement when parsing strings to boolean[pyarrow] dtype (GH 51730)
Performance improvement when searching an Index sliced from other indexes (GH 51738)
Period’s default formatter (period_format) is now significantly (~twice) faster. This improves performance of str(Period), repr(Period), and Period.strftime(fmt=None), as well as PeriodArray.strftime(fmt=None), PeriodIndex.strftime(fmt=None) and PeriodIndex.format(fmt=None). Finally, to_csv operations involving PeriodArray or PeriodIndex with default date_format are also significantly accelerated. (GH 51459)
Performance improvement accessing arrays.IntegerArray.dtype & arrays.FloatingArray.dtype (GH 52998)
Performance improvement in concat() when axis=1 and objects have different indexes (GH 52541)
Performance improvement in DataFrameGroupBy.groups() (GH 53088)
Performance improvement in DataFrame.isin() for extension dtypes (GH 53514)
Performance improvement in DataFrame.loc() when selecting rows and columns (GH 53014)
Performance improvement in Series.add() for pyarrow string and binary dtypes (GH 53150)
Performance improvement in Series.corr() and Series.cov() for extension dtypes (GH 52502)
Performance improvement in Series.str.get() for pyarrow-backed strings (GH 53152)
Performance improvement in Series.to_numpy() when dtype is a numpy float dtype and na_value is np.nan (GH 52430)
Performance improvement in astype() when converting from a pyarrow timestamp or duration dtype to numpy (GH 53326)
Performance improvement in to_numpy() (GH 52525)
Performance improvement when doing various reshaping operations on arrays.IntegerArray & arrays.FloatingArray by avoiding doing unnecessary validation (GH 53013)
Performance improvement when indexing with pyarrow timestamp and duration dtypes (GH 53368)
Bug fixes#
Categorical#
Bug in Series.map(), where the value of the na_action parameter was not used if the series held a Categorical (GH 22527).
Datetimelike#
DatetimeIndex.map() with na_action="ignore" now works as expected (GH 51644)
Bug in date_range() when freq was a DateOffset with nanoseconds (GH 46877)
Bug in Timestamp.round() with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsDatetime (GH 51494)
Bug in arrays.DatetimeArray.map() and DatetimeIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)
Bug in constructing a Series or DataFrame from a datetime or timedelta scalar always inferring nanosecond resolution instead of inferring from the input (GH 52212)
Bug in parsing datetime strings with weekday but no day e.g. “2023 Sept Thu” incorrectly raising AttributeError instead of ValueError (GH 52659)
Timedelta#
TimedeltaIndex.map() with na_action="ignore" now works as expected (GH 51644)
Bug in TimedeltaIndex division or multiplication leading to .freq of “0 Days” instead of None (GH 51575)
Bug in Timedelta.round() with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsTimedelta (GH 51494)
Bug in arrays.TimedeltaArray.map() and TimedeltaIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)
Timezones#
Bug in infer_freq() raising TypeError for Series of timezone-aware timestamps (GH 52456)
Bug in DatetimeTZDtype.base() always returning a NumPy dtype with nanosecond resolution (GH 52705)
Numeric#
Bug in RangeIndex setting step incorrectly when being the subtrahend with minuend a numeric value (GH 53255)
Bug in Series.corr() and Series.cov() raising AttributeError for masked dtypes (GH 51422)
Bug when calling Series.kurt() and Series.skew() on numpy data of all zero returning a python type instead of a numpy type (GH 53482)
Bug in Series.mean(), DataFrame.mean() with object-dtype values containing strings that can be converted to numbers (e.g. “2”) returning incorrect numeric results; these now raise TypeError (GH 36703, GH 44008)
Bug in DataFrame.corrwith() raising NotImplementedError for pyarrow-backed dtypes (GH 52314)
Bug in DataFrame.size() and Series.size() returning 64-bit integer instead of int (GH 52897)
Bug in Series.any(), Series.all(), DataFrame.any(), and DataFrame.all() where the default value of bool_only was set to None instead of False; this change should have no impact on users (GH 53258)
Bug in Series.median() and DataFrame.median() with object-dtype values containing strings that can be converted to numbers (e.g. “2”) returning incorrect numeric results; these now raise TypeError (GH 34671)
Bug in Series.sum() converting dtype uint64 to int64 (GH 53401)
Conversion#
Bug in DataFrame.style.to_latex() and DataFrame.style.to_html() if the DataFrame contains integers with more digits than can be represented by floating point double precision (GH 52272)
Bug in array() when given a datetime64 or timedelta64 dtype with unit of “s”, “us”, or “ms” returning PandasArray instead of DatetimeArray or TimedeltaArray (GH 52859)
Bug in ArrowDtype.numpy_dtype() returning nanosecond units for non-nanosecond pyarrow.timestamp and pyarrow.duration types (GH 51800)
Bug in DataFrame.__repr__() incorrectly raising a TypeError when the dtype of a column is np.record (GH 48526)
Bug in DataFrame.info() raising ValueError when use_numba is set (GH 51922)
Bug in DataFrame.insert() raising TypeError if loc is np.int64 (GH 53193)
Strings#
Interval#
Indexing#
Missing#
Bug in DataFrame.interpolate() ignoring inplace when DataFrame is empty (GH 53199)
Bug in Series.interpolate() and DataFrame.interpolate() failing to raise on invalid downcast keyword, which can be only None or “infer” (GH 53103)
MultiIndex#
Bug in MultiIndex.set_levels() not preserving dtypes for Categorical (GH 52125)
Bug in displaying a MultiIndex with a long element (GH 52960)
I/O#
DataFrame.to_orc() now raises ValueError when a non-default Index is given (GH 51828)
DataFrame.to_sql() now raises ValueError when the name param is left empty while using SQLAlchemy to connect (GH 52675)
Bug in json_normalize() where metadata fields of list type could not be parsed (GH 37782)
Bug in read_csv() where it would error when parse_dates was set to a list or dictionary with engine="pyarrow" (GH 47961)
Bug in read_csv() with engine="pyarrow" erroring when specifying a dtype with index_col (GH 53229)
Bug in read_hdf() not properly closing store after an IndexError is raised (GH 52781)
Bug in read_html() where style elements were read into DataFrames (GH 52197)
Bug in read_html() where tail texts were removed together with elements containing display:none style (GH 51629)
Bug in read_sql() when reading multiple timezone aware columns with the same column name (GH 44421)
Bug when writing and reading empty Stata dta files where dtype information was lost (GH 46240)
Period#
PeriodIndex.map() with na_action="ignore" now works as expected (GH 51644)
Bug in PeriodDtype constructor failing to raise TypeError when no argument is passed or when None is passed (GH 27388)
Bug in PeriodDtype constructor incorrectly returning the same normalize for different DateOffset freq inputs (GH 24121)
Bug in PeriodDtype constructor raising ValueError instead of TypeError when an invalid type is passed (GH 51790)
Bug in read_csv() not processing empty strings as a null value, with engine="pyarrow" (GH 52087)
Bug in read_csv() returning object dtype columns instead of float64 dtype columns for columns that are all null with engine="pyarrow" (GH 52087)
Bug in arrays.PeriodArray.map() and PeriodIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)
Bug in incorrectly allowing construction of Period or PeriodDtype with CustomBusinessDay freq; use BusinessDay instead (GH 52534)
Plotting#
Bug in Series.plot() when invoked with color=None (GH 51953)
Groupby/resample/rolling#
Bug in DataFrame.resample() and Series.resample() incorrectly allowing non-fixed freq when resampling on a TimedeltaIndex (GH 51896)
Bug in DataFrameGroupBy.idxmin(), SeriesGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmax() returning the wrong dtype when used on an empty DataFrameGroupBy or SeriesGroupBy (GH 51423)
Bug in weighted rolling aggregations when specifying min_periods=0 (GH 51449)
Bug in DataFrame.groupby() and Series.groupby(), where, when the index of the grouped Series or DataFrame was a DatetimeIndex, TimedeltaIndex or PeriodIndex, and the groupby method was given a function as its first argument, the function operated on the whole index rather than each element of the index (GH 51979)
Bug in DataFrame.groupby() with column selection on the resulting groupby object not returning names as tuples when grouping by a list of a single element (GH 53500)
Bug in DataFrameGroupBy.agg() with lists not respecting as_index=False (GH 52849)
Bug in DataFrameGroupBy.apply() causing an error to be raised when the input DataFrame was subset as a DataFrame after groupby ([['a']] and not ['a']) and the given callable returned Series that were not all indexed the same (GH 52444)
Bug in DataFrameGroupBy.apply() raising a TypeError when selecting multiple columns and providing a function that returns np.ndarray results (GH 18930)
Bug in GroupBy.groups() with a datetime key in conjunction with another key producing an incorrect number of group keys (GH 51158)
Bug in GroupBy.quantile() where the result index may be implicitly sorted with sort=False (GH 53009)
Bug in GroupBy.var() failing to raise TypeError when called with datetime64, timedelta64 or PeriodDtype values (GH 52128, GH 53045)
Bug in SeriesGroupBy.nth() and DataFrameGroupBy.nth() where, after performing column selection, using dropna="any" or dropna="all" would not subset columns (GH 53518)
Bug in SeriesGroupBy.nth() and DataFrameGroupBy.nth() where, after performing column selection, using dropna="any" or dropna="all" resulted in rows being dropped (GH 53518)
Reshaping#
Bug in crosstab() when dropna=False would not keep np.nan in the result (GH 10772)
Bug in merge_asof() raising KeyError for extension dtypes (GH 52904)
Bug in merge_asof() raising ValueError for data backed by read-only ndarrays (GH 53513)
Bug in DataFrame.agg() and Series.agg() on non-unique columns returning an incorrect type when a dict-like argument is passed in (GH 51099)
Bug in DataFrame.idxmin() and DataFrame.idxmax(), where the axis dtype would be lost for empty frames (GH 53265)
Bug in DataFrame.merge() not merging correctly when having MultiIndex with single level (GH 52331)
Bug in DataFrame.stack() losing extension dtypes when columns is a MultiIndex and frame contains mixed dtypes (GH 45740)
Bug in DataFrame.transpose() inferring dtype for object column (GH 51546)
Bug in Series.combine_first() converting int64 dtype to float64 and losing precision on very large integers (GH 51764)
Sparse#
Bug in SparseDtype constructor failing to raise TypeError when given an incompatible dtype for its subtype, which must be a numpy dtype (GH 53160)
Bug in arrays.SparseArray.map() allowed the fill value to be included in the sparse values (GH 52095)
ExtensionArray#
Bug in ArrowExtensionArray converting pandas non-nanosecond temporal objects from non-zero values to zero values (GH 53171)
Bug in Series.quantile() for pyarrow temporal types raising ArrowInvalid (GH 52678)
Bug in Series.rank() returning wrong order for small values with Float64 dtype (GH 52471)
Bug in __iter__() and __getitem__() returning python datetime and timedelta objects for non-nano dtypes (GH 53326)
Bug where the __from_arrow__ method of masked ExtensionDtypes (e.g. Float64Dtype, BooleanDtype) would not accept pyarrow arrays of type pyarrow.null() (GH 52223)
Styler#
Bug in Styler._copy() calling overridden methods in subclasses of Styler (GH 52728)
Metadata#
Fixed metadata propagation in DataFrame.squeeze() and DataFrame.describe() (GH 28283)
Fixed metadata propagation in DataFrame.std() (GH 28283)
Other#
Bug in FloatingArray.__contains__ with NaN item incorrectly returning False when NaN values are present (GH 52840)
Bug in api.interchange.from_dataframe() when converting an empty DataFrame object (GH 53155)
Bug in assert_almost_equal() now throwing assertion error for two unequal sets (GH 51727)
Bug in assert_frame_equal() checking category dtypes even when asked not to check index type (GH 52126)
Bug in DataFrame.reindex() with a fill_value that should be inferred with an ExtensionDtype incorrectly inferring object dtype (GH 52586)
Bug in Series.map() when giving a callable to an empty series, where the returned series had object dtype. It now keeps the original dtype (GH 52384)
Bug in Series.memory_usage() when deep=True throwing an error with Series of objects and returning an incorrect value, as it does not take into account GC corrections (GH 51858)
Fixed incorrect __name__ attribute of pandas._libs.json (GH 52898)