What’s new in 2.2.0 (Month XX, 2024)#
These are the changes in pandas 2.2.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
ADBC Driver support in to_sql and read_sql#
read_sql() and to_sql() now work with Apache Arrow ADBC drivers. Compared to traditional drivers used via SQLAlchemy, ADBC drivers should provide significant performance improvements, better type support and cleaner nullability handling.
import adbc_driver_postgresql.dbapi as pg_dbapi
import pandas as pd

df = pd.DataFrame(
    [
        [1, 2, 3],
        [4, 5, 6],
    ],
    columns=['a', 'b', 'c']
)
uri = "postgresql://postgres:postgres@localhost/postgres"
with pg_dbapi.connect(uri) as conn:
    df.to_sql("pandas_table", conn, index=False)

# for roundtripping
with pg_dbapi.connect(uri) as conn:
    df2 = pd.read_sql("pandas_table", conn)
The Arrow type system offers a wider array of types that can more closely match what databases like PostgreSQL can offer. To illustrate, note this (non-exhaustive) listing of types available in different databases and pandas backends:
| numpy/pandas | arrow | postgres | sqlite |
|---|---|---|---|
| int16/Int16 | int16 | SMALLINT | INTEGER |
| int32/Int32 | int32 | INTEGER | INTEGER |
| int64/Int64 | int64 | BIGINT | INTEGER |
| float32 | float32 | REAL | REAL |
| float64 | float64 | DOUBLE PRECISION | REAL |
| object | string | TEXT | TEXT |
| bool | | BOOLEAN | |
| datetime64[ns] | timestamp(us) | TIMESTAMP | |
| datetime64[ns,tz] | timestamp(us,tz) | TIMESTAMPTZ | |
| | date32 | DATE | |
| | month_day_nano_interval | INTERVAL | |
| | binary | BINARY | BLOB |
| | decimal128 | DECIMAL [1] | |
| | list | ARRAY [1] | |
| | struct | | |
Footnotes
If you want to preserve database types as closely as possible throughout the lifecycle of your DataFrame, use the dtype_backend="pyarrow" argument of read_sql():

# for roundtripping
with pg_dbapi.connect(uri) as conn:
    df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow")
This will prevent your data from being converted to the traditional pandas/NumPy type system, which often converts SQL types in ways that make them impossible to round-trip.
For a full list of ADBC drivers and their development status, see the ADBC Driver Implementation Status documentation.
Series.struct accessor for PyArrow structured data#
The Series.struct accessor provides attributes and methods for processing data with struct[pyarrow] dtype Series. For example, Series.struct.explode() converts PyArrow structured data to a pandas DataFrame. (GH 54938)
In [1]: import pyarrow as pa
In [2]: series = pd.Series(
...: [
...: {"project": "pandas", "version": "2.2.0"},
...: {"project": "numpy", "version": "1.25.2"},
...: {"project": "pyarrow", "version": "13.0.0"},
...: ],
...: dtype=pd.ArrowDtype(
...: pa.struct([
...: ("project", pa.string()),
...: ("version", pa.string()),
...: ])
...: ),
...: )
...:
In [3]: series.struct.explode()
Out[3]:
project version
0 pandas 2.2.0
1 numpy 1.25.2
2 pyarrow 13.0.0
Series.list accessor for PyArrow list data#
The Series.list accessor provides attributes and methods for processing data with list[pyarrow] dtype Series. For example, Series.list.__getitem__() allows indexing pyarrow lists in a Series. (GH 55323)
In [4]: import pyarrow as pa
In [5]: series = pd.Series(
...: [
...: [1, 2, 3],
...: [4, 5],
...: [6],
...: ],
...: dtype=pd.ArrowDtype(
...: pa.list_(pa.int64())
...: ),
...: )
...:
In [6]: series.list[0]
Out[6]:
0 1
1 4
2 6
dtype: int64[pyarrow]
Calamine engine for read_excel()#
The calamine engine was added to read_excel(). It uses python-calamine, which provides Python bindings for the Rust library calamine. This engine supports Excel files (.xlsx, .xlsm, .xls, .xlsb) and OpenDocument spreadsheets (.ods) (GH 50395).
There are two advantages of this engine:
- Calamine is often faster than other engines; some benchmarks show it to be up to 5x faster than ‘openpyxl’, 20x faster than ‘odf’, 4x faster than ‘pyxlsb’, and 1.5x faster than ‘xlrd’. However, ‘openpyxl’ and ‘pyxlsb’ are faster at reading a few rows from large files because they iterate over rows lazily.
- Calamine recognizes datetimes in .xlsb files, unlike ‘pyxlsb’, which is the only other engine in pandas that can read .xlsb files.
pd.read_excel("path_to_file.xlsb", engine="calamine")
For more, see Calamine (Excel and ODS files) in the user guide on IO tools.
Other enhancements#
- to_sql() with the method parameter set to "multi" now works with Oracle as the backend
- Series.attrs / DataFrame.attrs now use a deepcopy for propagating attrs (GH 54134).
- read_csv() now supports the on_bad_lines parameter with engine="pyarrow". (GH 54480)
- read_spss() now returns a DataFrame that stores the metadata in DataFrame.attrs. (GH 54264)
- tseries.api.guess_datetime_format() is now part of the public API (GH 54727)
- ExtensionArray._explode() interface method added to allow extension type implementations of the explode method (GH 54833)
- ExtensionArray.duplicated() added to allow extension type implementations of the duplicated method (GH 55255)
- Allow passing read_only, data_only and keep_links arguments to openpyxl using engine_kwargs of read_excel() (GH 55027)
- DataFrame.apply now allows the usage of numba (via engine="numba") to JIT compile the passed function, allowing for potential speedups (GH 54666)
- Implemented masked algorithms for Series.value_counts() (GH 54984)
- Improved error message when constructing Period with invalid offsets such as “QS” (GH 55785)
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
merge() and DataFrame.join() now consistently follow documented sort behavior#
In previous versions of pandas, merge() and DataFrame.join() did not always return a result that followed the documented sort behavior. pandas now follows the documented sort behavior in merge and join operations (GH 54611).
As documented, sort=True sorts the join keys lexicographically in the resulting DataFrame. With sort=False, the order of the join keys depends on the join type (how keyword):

- how="left": preserve the order of the left keys
- how="right": preserve the order of the right keys
- how="inner": preserve the order of the left keys
- how="outer": sort keys lexicographically

One example with changing behavior is inner joins with non-unique left join keys and sort=False:
In [7]: left = pd.DataFrame({"a": [1, 2, 1]})
In [8]: right = pd.DataFrame({"a": [1, 2]})
In [9]: result = pd.merge(left, right, how="inner", on="a", sort=False)
Old Behavior
In [5]: result
Out[5]:
a
0 1
1 1
2 2
New Behavior
In [10]: result
Out[10]:
a
0 1
1 2
2 1
merge() and DataFrame.join() no longer reorder levels when levels differ#
In previous versions of pandas, merge() and DataFrame.join() would reorder index levels when joining on two indexes with different levels (GH 34133).
In [11]: left = pd.DataFrame({"left": 1}, index=pd.MultiIndex.from_tuples([("x", 1), ("x", 2)], names=["A", "B"]))
In [12]: right = pd.DataFrame({"right": 2}, index=pd.MultiIndex.from_tuples([(1, 1), (2, 2)], names=["B", "C"]))
In [13]: result = left.join(right)
Old Behavior
In [5]: result
Out[5]:
left right
B A C
1 x 1 1 2
2 x 2 1 2
New Behavior
In [14]: result
Out[14]:
left right
A B C
x 1 1 1 2
2 2 1 2
Backwards incompatible API changes#
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| | | X | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed |
|---|---|---|
| | | X |
See Dependencies and Optional dependencies for more.
Other API changes#
- check_exact now only takes effect for floating-point dtypes in testing.assert_frame_equal() and testing.assert_series_equal(). In particular, integer dtypes are always checked exactly (GH 55882)
Deprecations#
Deprecate aliases M, SM, BM, CBM, Q, BQ, Y, and BY in favour of ME, SME, BME, CBME, QE, BQE, YE, and BYE for offsets#
Deprecated the following frequency aliases (GH 9586):
- M (month end) has been renamed ME for offsets
- SM (semi month end) has been renamed SME for offsets
- BM (business month end) has been renamed BME for offsets
- CBM (custom business month end) has been renamed CBME for offsets
- Q (quarter end) has been renamed QE for offsets
- BQ (business quarter end) has been renamed BQE for offsets
- Y (year end) has been renamed YE for offsets
- BY (business year end) has been renamed BYE for offsets
For example:
Previous behavior:
In [8]: pd.date_range('2020-01-01', periods=3, freq='Q-NOV')
Out[8]:
DatetimeIndex(['2020-02-29', '2020-05-31', '2020-08-31'],
dtype='datetime64[ns]', freq='Q-NOV')
Future behavior:
In [15]: pd.date_range('2020-01-01', periods=3, freq='QE-NOV')
Out[15]: DatetimeIndex(['2020-02-29', '2020-05-31', '2020-08-31'], dtype='datetime64[ns]', freq='QE-NOV')
Other Deprecations#
- Changed Timedelta.resolution_string() to return h, min, s, ms, us, and ns instead of H, T, S, L, U, and N, for compatibility with respective deprecations in frequency aliases (GH 52536)
- Deprecated pandas.api.types.is_interval() and pandas.api.types.is_period(); use isinstance(obj, pd.Interval) and isinstance(obj, pd.Period) instead (GH 55264)
- Deprecated read_gbq() and DataFrame.to_gbq(). Use pandas_gbq.read_gbq and pandas_gbq.to_gbq instead (see https://pandas-gbq.readthedocs.io/en/latest/api.html) (GH 55525)
- Deprecated DataFrameGroupBy.fillna() and SeriesGroupBy.fillna(); use DataFrameGroupBy.ffill(), DataFrameGroupBy.bfill() for forward and backward filling or DataFrame.fillna() to fill with a single value (or the Series equivalents) (GH 55718)
- Deprecated Index.format(); use index.astype(str) or index.map(formatter) instead (GH 55413)
- Deprecated Series.ravel(); the underlying array is already 1D, so ravel is not necessary (GH 52511)
- Deprecated Series.view(); use astype instead to change the dtype (GH 20251)
- Deprecated core.internals members Block, ExtensionBlock, and DatetimeTZBlock; use public APIs instead (GH 55139)
- Deprecated year, month, quarter, day, hour, minute, and second keywords in the PeriodIndex constructor; use PeriodIndex.from_fields() instead (GH 55960)
- Deprecated allowing non-integer periods argument in date_range(), timedelta_range(), period_range(), and interval_range() (GH 56036)
- Deprecated allowing non-keyword arguments in DataFrame.to_clipboard(). (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_csv() except path_or_buf. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_dict(). (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_excel() except excel_writer. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_gbq() except destination_table. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_hdf() except path_or_buf. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_html() except buf. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_json() except path_or_buf. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_latex() except buf. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_markdown() except buf. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_parquet() except path. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_pickle() except path. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_string() except buf. (GH 54229)
- Deprecated allowing non-keyword arguments in DataFrame.to_xml() except path_or_buffer. (GH 54229)
- Deprecated allowing passing BlockManager objects to DataFrame or SingleBlockManager objects to Series (GH 52419)
- Deprecated automatic downcasting of object-dtype results in Series.replace() and DataFrame.replace(); explicitly call result = result.infer_objects(copy=False) instead. To opt in to the future version, use pd.set_option("future.no_silent_downcasting", True) (GH 54710)
- Deprecated downcasting behavior in Series.where(), DataFrame.where(), Series.mask(), DataFrame.mask(), Series.clip(), DataFrame.clip(); in a future version these will not infer object-dtype columns to non-object dtype, or all-round floats to integer dtype. Call result.infer_objects(copy=False) on the result for object inference, or explicitly cast floats to ints. To opt in to the future version, use pd.set_option("future.no_silent_downcasting", True) (GH 53656)
- Deprecated including the groups in computations when using DataFrameGroupBy.apply() and DataFrameGroupBy.resample(); pass include_groups=False to exclude the groups (GH 7155)
- Deprecated indexing an Index with a boolean indexer of length zero (GH 55820)
- Deprecated not passing a tuple to DataFrameGroupBy.get_group or SeriesGroupBy.get_group when grouping by a length-1 list-like (GH 25971)
- Deprecated string AS denoting frequency in YearBegin and strings AS-DEC, AS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 54275)
- Deprecated string A denoting frequency in YearEnd and strings A-DEC, A-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 54275)
- Deprecated string BAS denoting frequency in BYearBegin and strings BAS-DEC, BAS-JAN, etc. denoting annual frequencies with various fiscal year starts (GH 54275)
- Deprecated string BA denoting frequency in BYearEnd and strings BA-DEC, BA-JAN, etc. denoting annual frequencies with various fiscal year ends (GH 54275)
- Deprecated strings H, BH, and CBH denoting frequencies in Hour, BusinessHour, CustomBusinessHour (GH 52536)
- Deprecated strings H, S, U, and N denoting units in to_timedelta() (GH 52536)
- Deprecated strings H, T, S, L, U, and N denoting units in Timedelta (GH 52536)
- Deprecated strings T, S, L, U, and N denoting frequencies in Minute, Second, Milli, Micro, Nano (GH 52536)
- Deprecated the BaseGrouper attributes group_keys_seq and reconstructed_codes; these will be removed in a future version of pandas (GH 56148)
- Deprecated the Grouping attributes group_index, result_index, and group_arraylike; these will be removed in a future version of pandas (GH 56148)
- Deprecated the errors="ignore" option in to_datetime(), to_timedelta(), and to_numeric(); explicitly catch exceptions instead (GH 54467)
- Deprecated the fastpath keyword in the Series constructor (GH 20110)
- Deprecated the ordinal keyword in PeriodIndex; use PeriodIndex.from_ordinals() instead (GH 55960)
- Deprecated the behavior of Series.value_counts() and Index.value_counts() with object dtype; in a future version these will not perform dtype inference on the resulting Index; do result.index = result.index.infer_objects() to retain the old behavior (GH 56161)
- Deprecated the extension test classes BaseNoReduceTests, BaseBooleanReduceTests, and BaseNumericReduceTests; use BaseReduceTests instead (GH 54663)
- Deprecated the option mode.data_manager and the ArrayManager; only the BlockManager will be available in future versions (GH 55043)
- Deprecated the previous implementation of DataFrame.stack; specify future_stack=True to adopt the future version (GH 53515)
- Deprecated downcasting the results of DataFrame.fillna(), Series.fillna(), DataFrame.ffill(), Series.ffill(), DataFrame.bfill(), Series.bfill() in object-dtype cases. To opt in to the future version, use pd.set_option("future.no_silent_downcasting", True) (GH 54261)
Performance improvements#
- Performance improvement in testing.assert_frame_equal() and testing.assert_series_equal() (GH 55949, GH 55971)
- Performance improvement in concat() with axis=1 and objects with unaligned indexes (GH 55084)
- Performance improvement in get_dummies() (GH 56089)
- Performance improvement in merge_asof() when by is not None (GH 55580, GH 55678)
- Performance improvement in read_stata() for files with many variables (GH 55515)
- Performance improvement in to_dict() when converting a DataFrame to a dictionary (GH 50990)
- Performance improvement in DataFrame.groupby() when aggregating pyarrow timestamp and duration dtypes (GH 55031)
- Performance improvement in DataFrame.loc() and Series.loc() when indexing with a MultiIndex (GH 56062)
- Performance improvement in DataFrame.sort_index() and Series.sort_index() when indexed by a MultiIndex (GH 54835)
- Performance improvement in Index.difference() (GH 55108)
- Performance improvement in Index.sort_values() when index is already sorted (GH 56128)
- Performance improvement in MultiIndex.get_indexer() when method is not None (GH 55839)
- Performance improvement in Series.duplicated() for pyarrow dtypes (GH 55255)
- Performance improvement in Series.str.get_dummies() when dtype is "string[pyarrow]" or "string[pyarrow_numpy]" (GH 56110)
- Performance improvement in Series.str methods (GH 55736)
- Performance improvement in Series.value_counts() and Series.mode() for masked dtypes (GH 54984, GH 55340)
- Performance improvement in SeriesGroupBy.idxmax(), SeriesGroupBy.idxmin(), DataFrameGroupBy.idxmax(), DataFrameGroupBy.idxmin() (GH 54234)
- Performance improvement when indexing into a non-unique index (GH 55816)
- Performance improvement when indexing with more than 4 keys (GH 54550)
- Performance improvement when localizing time to UTC (GH 55241)
Bug fixes#
Categorical#
Datetimelike#
- Bug in DatetimeIndex construction when passing both a tz and either dayfirst or yearfirst ignoring dayfirst/yearfirst (GH 55813)
- Bug in DatetimeIndex when passing an object-dtype ndarray of float objects and a tz incorrectly localizing the result (GH 55780)
- Bug in concat() raising AttributeError when concatenating an all-NA DataFrame with a DatetimeTZDtype dtype DataFrame. (GH 52093)
- Bug in testing.assert_extension_array_equal() that could use the wrong unit when comparing resolutions (GH 55730)
- Bug in to_datetime() and DatetimeIndex when passing a list of mixed-string-and-numeric types incorrectly raising (GH 55780)
- Bug in to_datetime() and DatetimeIndex when passing mixed-type objects with a mix of timezones or mix of timezone-awareness failing to raise ValueError (GH 55693)
- Bug in DatetimeIndex.shift() with non-nanosecond resolution incorrectly returning with nanosecond resolution (GH 56117)
- Bug in DatetimeIndex.union() returning object dtype for tz-aware indexes with the same timezone but different units (GH 55238)
- Bug in Index.is_monotonic_increasing() and Index.is_monotonic_decreasing() always caching Index.is_unique() as True when first value in index is NaT (GH 55755)
- Bug in Index.view() to a datetime64 dtype with non-supported resolution incorrectly raising (GH 55710)
- Bug in Series.dt.round() with non-nanosecond resolution and NaT entries incorrectly raising OverflowError (GH 56158)
- Bug in Tick.delta() with very large ticks raising OverflowError instead of OutOfBoundsTimedelta (GH 55503)
- Bug in Timestamp.unit() being inferred incorrectly from an ISO8601 format string with minute or hour resolution and a timezone offset (GH 56208)
- Bug in .astype converting from a higher-resolution datetime64 dtype to a lower-resolution datetime64 dtype (e.g. datetime64[us]->datetime64[ms]) silently overflowing with values near the lower implementation bound (GH 55979)
- Bug in adding or subtracting a Week offset to a datetime64 Series, Index, or DataFrame column with non-nanosecond resolution returning incorrect results (GH 55583)
- Bug in addition or subtraction of BusinessDay offset with offset attribute to non-nanosecond Index, Series, or DataFrame column giving incorrect results (GH 55608)
- Bug in addition or subtraction of DateOffset objects with microsecond components to datetime64 Index, Series, or DataFrame columns with non-nanosecond resolution (GH 55595)
- Bug in addition or subtraction of very large Tick objects with Timestamp or Timedelta objects raising OverflowError instead of OutOfBoundsTimedelta (GH 55503)
- Bug in creating an Index, Series, or DataFrame with a non-nanosecond DatetimeTZDtype and inputs that would be out of bounds with nanosecond resolution incorrectly raising OutOfBoundsDatetime (GH 54620)
- Bug in creating an Index, Series, or DataFrame with a non-nanosecond datetime64 (or DatetimeTZDtype) from mixed-numeric inputs treating those as nanoseconds instead of as multiples of the dtype’s unit (which would happen with non-mixed numeric inputs) (GH 56004)
- Bug in creating an Index, Series, or DataFrame with a non-nanosecond datetime64 dtype and inputs that would be out of bounds for a datetime64[ns] incorrectly raising OutOfBoundsDatetime (GH 55756)
- Bug in parsing datetime strings with nanosecond resolution with non-ISO8601 formats incorrectly truncating sub-microsecond components (GH 56051)
- Bug in parsing datetime strings with sub-second resolution and trailing zeros incorrectly inferring second or millisecond resolution (GH 55737)
- Bug in the results of pd.to_datetime() with a floating-dtype argument with unit not matching the pointwise results of Timestamp (GH 56037)
Timedelta#
- Bug in Timedelta construction raising OverflowError instead of OutOfBoundsTimedelta (GH 55503)
- Bug in rendering (__repr__) of TimedeltaIndex and Series with timedelta64 values with non-nanosecond resolution entries that are all multiples of 24 hours failing to use the compact representation used in the nanosecond cases (GH 55405)
Timezones#
- Bug in AbstractHolidayCalendar where timezone data was not propagated when computing holiday observances (GH 54580)
- Bug in Timestamp construction with an ambiguous value and a pytz timezone failing to raise pytz.AmbiguousTimeError (GH 55657)
- Bug in Timestamp.tz_localize() with nonexistent="shift_forward" around UTC+0 during DST (GH 51501)
Numeric#
- Bug in read_csv() with engine="pyarrow" causing rounding errors for large integers (GH 52505)
- Bug in Series.pow() not filling missing values correctly (GH 55512)
Conversion#
- Bug in astype() when called with str on an unpickled array, where the array might be changed in place (GH 54654)
- Bug in Series.convert_dtypes() not converting an all-NA column to null[pyarrow] (GH 55346)
Strings#
- Bug in pandas.api.types.is_string_dtype() when checking whether an object array with no elements is of the string dtype (GH 54661)
- Bug in DataFrame.apply() failing when engine="numba" and columns or index have StringDtype (GH 56189)
- Bug in Series.str.startswith() and Series.str.endswith() with arguments of type tuple[str, ...] for string[pyarrow] (GH 54942)
Interval#
- Bug in Interval __repr__ not displaying UTC offsets for Timestamp bounds. Additionally the hour, minute and second components will now be shown. (GH 55015)
- Bug in IntervalIndex.factorize() and Series.factorize() with IntervalDtype with datetime64 or timedelta64 intervals not preserving non-nanosecond units (GH 56099)
- Bug in IntervalIndex.from_arrays() when passed datetime64 or timedelta64 arrays with mismatched resolutions constructing an invalid IntervalArray object (GH 55714)
- Bug in IntervalIndex.get_indexer() with datetime or timedelta intervals incorrectly matching on integer targets (GH 47772)
- Bug in IntervalIndex.get_indexer() with timezone-aware datetime intervals incorrectly matching on a sequence of timezone-naive targets (GH 47772)
- Bug in setting values on a Series with an IntervalIndex using a slice incorrectly raising (GH 54722)
Indexing#
- Bug in DataFrame.loc() when setting Series with extension dtype into NumPy dtype (GH 55604)
- Bug in Index.difference() not returning a unique set of values when other is empty or other is considered non-comparable (GH 55113)
- Bug in setting Categorical values into a DataFrame with numpy dtypes raising RecursionError (GH 52927)
Missing#
MultiIndex#
- Bug in MultiIndex.get_indexer() not raising ValueError when method provided and index is non-monotonic (GH 53452)
I/O#
- Bug in read_csv() where on_bad_lines="warn" would write to stderr instead of raising a Python warning. This now yields an errors.ParserWarning (GH 54296)
- Bug in read_csv() with engine="pyarrow" where usecols wasn’t working with a csv with no headers (GH 54459)
- Bug in read_excel(), with engine="xlrd" (xls files) erroring when the file contains NaNs/Infs (GH 54564)
- Bug in read_json() not handling dtype conversion properly if infer_string is set (GH 56195)
- Bug in to_excel(), with OdsWriter (ods files) writing boolean/string value (GH 54994)
- Bug in DataFrame.to_hdf() and read_hdf() with datetime64 dtypes with non-nanosecond resolution failing to round-trip correctly (GH 55622)
- Bug in pandas.read_excel() with engine="odf" (ods files) when a string contains an annotation (GH 55200)
- Bug in pandas.read_excel() with an ODS file without cached formatted cell for float values (GH 55219)
- Bug where DataFrame.to_json() would raise an OverflowError instead of a TypeError with unsupported NumPy types (GH 55403)
Period#
- Bug in PeriodIndex construction when more than one of data, ordinal and **fields are passed failing to raise ValueError (GH 55961)
- Bug in Period addition silently wrapping around instead of raising OverflowError (GH 55503)
- Bug in casting from PeriodDtype with astype to datetime64 or DatetimeTZDtype with non-nanosecond unit incorrectly returning with nanosecond unit (GH 55958)
Plotting#
- Bug in DataFrame.plot.box() with vert=False and a matplotlib Axes created with sharey=True (GH 54941)
- Bug in DataFrame.plot.scatter() discarding string columns (GH 56142)
- Bug in Series.plot() when reusing an ax object failing to raise when a how keyword is passed (GH 55953)
Groupby/resample/rolling#
- Bug in Rolling where duplicate datetimelike indexes are treated as consecutive rather than equal with closed='left' and closed='neither' (GH 20712)
- Bug in DataFrameGroupBy.idxmin(), DataFrameGroupBy.idxmax(), SeriesGroupBy.idxmin(), and SeriesGroupBy.idxmax() not retaining Categorical dtype when the index was a CategoricalIndex that contained NA values (GH 54234)
- Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() incorrectly raising on unobserved categories when observed=False and f="idxmin" or f="idxmax" (GH 54234)
- Bug in DataFrame.asfreq() and Series.asfreq() with a DatetimeIndex with non-nanosecond resolution incorrectly converting to nanosecond resolution (GH 55958)
- Bug in DataFrame.resample() not respecting closed and label arguments for BusinessDay (GH 55282)
- Bug in DataFrame.resample() where bin edges were not correct for BusinessDay (GH 55281)
- Bug in DataFrame.resample() where bin edges were not correct for MonthBegin (GH 55271)
- Bug in DataFrameGroupBy.value_counts() and SeriesGroupBy.value_counts() that could result in incorrect sorting if the columns of the DataFrame or name of the Series are integers (GH 55951)
- Bug in DataFrameGroupBy.value_counts() and SeriesGroupBy.value_counts() not respecting sort=False in DataFrame.groupby() and Series.groupby() (GH 55951)
- Bug in DataFrameGroupBy.value_counts() and SeriesGroupBy.value_counts() sorting by proportions rather than frequencies when sort=True and normalize=True (GH 55951)
Reshaping#
- Bug in concat() ignoring sort parameter when passed DatetimeIndex indexes (GH 54769)
- Bug in merge_asof() raising TypeError when by dtype is not object, int64, or uint64 (GH 22794)
- Bug in merge() returning columns in incorrect order when left and/or right is empty (GH 51929)
- Bug in pandas.DataFrame.melt() where an exception was raised if var_name was not a string (GH 55948)
- Bug in pandas.DataFrame.melt() where it would not preserve the datetime (GH 55254)
- Bug in pandas.DataFrame.pivot_table() where the row margin is incorrect when the columns have numeric names (GH 26568)
Sparse#
- Bug in SparseArray.take() when using a different fill value than the array’s fill value (GH 55181)
ExtensionArray#
Styler#
Other#
- Bug in DataFrame.describe() where, when formatting percentiles in the result, percentile 99.999% was rounded to 100% (GH 55765)
- Bug in cut() incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (GH 54964)
- Bug in infer_freq() and DatetimeIndex.inferred_freq() with weekly frequencies and non-nanosecond resolutions (GH 55609)
- Bug in DataFrame.apply() where passing raw=True ignored args passed to the applied function (GH 55009)
- Bug in DataFrame.from_dict() which would always sort the rows of the created DataFrame. (GH 55683)
- Bug in rendering inf values inside a DataFrame with the use_inf_as_na option enabled (GH 55483)
- Bug in rendering a Series with a MultiIndex when one of the index level’s names is 0 not having that name displayed (GH 55415)
- Bug in the error message when assigning an empty DataFrame to a column (GH 55956)