What’s new in 2.0.0 (April 3, 2023)#
These are the changes in pandas 2.0.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
Installing optional dependencies with pip extras#
When installing pandas using pip, sets of optional dependencies can also be installed by specifying extras.
pip install "pandas[performance, aws]>=2.0.0"
The available extras, found in the installation guide, are
[all, performance, computation, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql,
sql-other, html, xml, plot, output_formatting, clipboard, compression, test]
(GH39164).
Index can now hold numpy numeric dtypes#
It is now possible to use any numpy numeric dtype in an Index (GH42717).
Previously it was only possible to use int64, uint64 & float64 dtypes:
In [1]: pd.Index([1, 2, 3], dtype=np.int8)
Out[1]: Int64Index([1, 2, 3], dtype="int64")
In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
Out[2]: UInt64Index([1, 2, 3], dtype="uint64")
In [3]: pd.Index([1, 2, 3], dtype=np.float32)
Out[3]: Float64Index([1.0, 2.0, 3.0], dtype="float64")
Int64Index, UInt64Index & Float64Index were deprecated in pandas version 1.4 and have now been removed. Instead, Index should be used directly, and it can now take all numpy numeric dtypes, i.e. int8/int16/int32/int64/uint8/uint16/uint32/uint64/float32/float64 dtypes:
In [1]: pd.Index([1, 2, 3], dtype=np.int8)
Out[1]: Index([1, 2, 3], dtype='int8')
In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
Out[2]: Index([1, 2, 3], dtype='uint16')
In [3]: pd.Index([1, 2, 3], dtype=np.float32)
Out[3]: Index([1.0, 2.0, 3.0], dtype='float32')
The ability for Index to hold the numpy numeric dtypes has meant some changes in pandas functionality. In particular, operations that previously were forced to create 64-bit indexes can now create indexes with lower bit sizes, e.g. 32-bit indexes.
Below is a possibly non-exhaustive list of changes:
- Instantiating using a numpy numeric array now follows the dtype of the numpy array. Previously, all indexes created from numpy numeric arrays were forced to 64-bit. Now, for example, Index(np.array([1, 2, 3])) will be int32 on 32-bit systems, where it previously would have been int64 even on 32-bit systems. Instantiating Index using a list of numbers will still return 64-bit dtypes, e.g. Index([1, 2, 3]) will have an int64 dtype, which is the same as previously.
- The various numeric datetime attributes of DatetimeIndex (day, month, year etc.) were previously of dtype int64, while they were int32 for arrays.DatetimeArray. They are now int32 on DatetimeIndex also:

In [4]: idx = pd.date_range(start='1/1/2018', periods=3, freq='M')

In [5]: idx.array.year
Out[5]: array([2018, 2018, 2018], dtype=int32)

In [6]: idx.year
Out[6]: Index([2018, 2018, 2018], dtype='int32')

- Level dtypes on Indexes from Series.sparse.from_coo() are now of dtype int32, the same as they are on the rows/cols on a scipy sparse matrix. Previously they were of dtype int64.

In [7]: from scipy import sparse

In [8]: A = sparse.coo_matrix(
   ...:     ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4)
   ...: )
   ...:

In [9]: ser = pd.Series.sparse.from_coo(A)

In [10]: ser.index.dtypes
Out[10]:
level_0    int32
level_1    int32
dtype: object

- Index cannot be instantiated using a float16 dtype. Previously instantiating an Index using dtype float16 resulted in a Float64Index with a float64 dtype. It now raises a NotImplementedError:

In [11]: pd.Index([1, 2, 3], dtype=np.float16)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[11], line 1
----> 1 pd.Index([1, 2, 3], dtype=np.float16)

File ~/work/pandas/pandas/pandas/core/indexes/base.py:562, in Index.__new__(cls, data, dtype, copy, name, tupleize_cols)
    558 arr = ensure_wrapped_if_datetimelike(arr)
    560 klass = cls._dtype_to_subclass(arr.dtype)
--> 562 arr = klass._ensure_array(arr, arr.dtype, copy=False)
    563 return klass._simple_new(arr, name, refs=refs)

File ~/work/pandas/pandas/pandas/core/indexes/base.py:575, in Index._ensure_array(cls, data, dtype, copy)
    572     raise ValueError("Index data must be 1-dimensional")
    573 elif dtype == np.float16:
    574     # float16 not supported (no indexing engine)
--> 575     raise NotImplementedError("float16 indexes are not supported")
    577 if copy:
    578     # asarray_tuplesafe does not always copy underlying data,
    579     # so need to make sure that this happens
    580     data = data.copy()

NotImplementedError: float16 indexes are not supported
Argument dtype_backend, to return pyarrow-backed or numpy-backed nullable dtypes#
Several I/O and conversion functions gained a new keyword dtype_backend (GH36712).
When this option is set to "numpy_nullable" it will return a DataFrame that is backed by nullable dtypes.
When this keyword is set to "pyarrow", these functions will return pyarrow-backed nullable ArrowDtype DataFrames (GH48957, GH49997):
In [12]: import io
In [13]: data = io.StringIO("""a,b,c,d,e,f,g,h,i
....: 1,2.5,True,a,,,,,
....: 3,4.5,False,b,6,7.5,True,a,
....: """)
....:
In [14]: df = pd.read_csv(data, dtype_backend="pyarrow")
In [15]: df.dtypes
Out[15]:
a int64[pyarrow]
b double[pyarrow]
c bool[pyarrow]
d string[pyarrow]
e int64[pyarrow]
f double[pyarrow]
g bool[pyarrow]
h string[pyarrow]
i null[pyarrow]
dtype: object
In [16]: data.seek(0)
Out[16]: 0
In [17]: df_pyarrow = pd.read_csv(data, dtype_backend="pyarrow", engine="pyarrow")
In [18]: df_pyarrow.dtypes
Out[18]:
a int64[pyarrow]
b double[pyarrow]
c bool[pyarrow]
d string[pyarrow]
e int64[pyarrow]
f double[pyarrow]
g bool[pyarrow]
h string[pyarrow]
i null[pyarrow]
dtype: object
Copy-on-Write improvements#
A new lazy copy mechanism that defers the copy until the object in question is modified was added to the methods listed in Copy-on-Write optimizations. These methods return views when Copy-on-Write is enabled, which provides a significant performance improvement compared to the regular execution (GH49473).
- Accessing a single column of a DataFrame as a Series (e.g. df["col"]) now always returns a new object every time it is constructed when Copy-on-Write is enabled (not returning multiple times an identical, cached Series object). This ensures that those Series objects correctly follow the Copy-on-Write rules (GH49450)
- The Series constructor will now create a lazy copy (deferring the copy until a modification to the data happens) when constructing a Series from an existing Series with the default of copy=False (GH50471)
- The DataFrame constructor will now create a lazy copy (deferring the copy until a modification to the data happens) when constructing from an existing DataFrame with the default of copy=False (GH51239)
- The DataFrame constructor, when constructing a DataFrame from a dictionary of Series objects and specifying copy=False, will now use a lazy copy of those Series objects for the columns of the DataFrame (GH50777)
- The DataFrame constructor, when constructing a DataFrame from a Series or Index and specifying copy=False, will now respect Copy-on-Write.
- The DataFrame and Series constructors, when constructing from a NumPy array, will now copy the array by default to avoid mutating the DataFrame/Series when mutating the array. Specify copy=False to get the old behavior. When setting copy=False pandas does not guarantee correct Copy-on-Write behavior when the NumPy array is modified after creation of the DataFrame/Series.
- DataFrame.from_records() will now respect Copy-on-Write when called with a DataFrame.
- Trying to set values using chained assignment (for example, df["a"][1:3] = 0) will now always raise a warning when Copy-on-Write is enabled. In this mode, chained assignment can never work because we are always setting into a temporary object that is the result of an indexing operation (getitem), which under Copy-on-Write always behaves as a copy. Thus, assigning through a chain can never update the original Series or DataFrame. Therefore, an informative warning is raised to the user to avoid silently doing nothing (GH49467)
- DataFrame.replace() will now respect the Copy-on-Write mechanism when inplace=True.
- DataFrame.transpose() will now respect the Copy-on-Write mechanism.
- Arithmetic operations that can be inplace, e.g. ser *= 2, will now respect the Copy-on-Write mechanism.
- DataFrame.__getitem__() will now respect the Copy-on-Write mechanism when the DataFrame has MultiIndex columns.
- Series.__getitem__() will now respect the Copy-on-Write mechanism when the Series has a MultiIndex.
- Series.view() will now respect the Copy-on-Write mechanism.
Copy-on-Write can be enabled through one of
pd.set_option("mode.copy_on_write", True)
pd.options.mode.copy_on_write = True
Alternatively, copy on write can be enabled locally through:
with pd.option_context("mode.copy_on_write", True):
...
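For illustration, here is a minimal sketch of the lazy-copy behavior described above with Copy-on-Write enabled (the column and variable names are illustrative only):

import pandas as pd

pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"a": [1, 2, 3]})
subset = df["a"]        # no data is copied yet; the copy is deferred
subset.iloc[0] = 100    # the copy happens here, only for the modified object
print(df["a"].iloc[0])  # the parent DataFrame is unchanged -> prints 1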
Other enhancements#
- Added support for str accessor methods when using ArrowDtype with a pyarrow.string type (GH50325)
- Added support for dt accessor methods when using ArrowDtype with a pyarrow.timestamp type (GH50954)
- read_sas() now supports using encoding='infer' to correctly read and use the encoding specified by the SAS file (GH48048)
- DataFrameGroupBy.quantile(), SeriesGroupBy.quantile() and DataFrameGroupBy.std() now preserve nullable dtypes instead of casting to numpy dtypes (GH37493)
- DataFrameGroupBy.std(), SeriesGroupBy.std() now support datetime64, timedelta64, and DatetimeTZDtype dtypes (GH48481)
- Series.add_suffix(), DataFrame.add_suffix(), Series.add_prefix() and DataFrame.add_prefix() support an axis argument. If axis is set, the default behaviour of which axis to consider can be overwritten (GH47819)
- testing.assert_frame_equal() now shows the first element where the DataFrames differ, analogously to pytest's output (GH47910)
- Added index parameter to DataFrame.to_dict() (GH46398)
- Added support for extension array dtypes in merge() (GH44240)
- Added metadata propagation for binary operators on DataFrame (GH28283)
- Added cumsum, cumprod, cummin and cummax to the ExtensionArray interface via _accumulate (GH28385)
- CategoricalConversionWarning, InvalidComparison, InvalidVersion, LossySetitemError, and NoBufferPresent are now exposed in pandas.errors (GH27656)
- Fixed the test optional extra by adding the missing test package pytest-asyncio (GH48361)
- The exception message thrown by DataFrame.astype() now includes the column name when type conversion is not possible (GH47571)
- date_range() now supports a unit keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (GH49106)
- timedelta_range() now supports a unit keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (GH49824)
- DataFrame.to_json() now supports a mode keyword with supported inputs 'w' and 'a'. Defaulting to 'w', 'a' can be used when lines=True and orient='records' to append record-oriented JSON lines to an existing JSON file (GH35849)
- Added name parameter to IntervalIndex.from_breaks(), IntervalIndex.from_arrays() and IntervalIndex.from_tuples() (GH48911)
- Improved exception message when using testing.assert_frame_equal() on a DataFrame to include the column that is compared (GH50323)
- Improved error message for merge_asof() when join-columns were duplicated (GH50102)
- Added support for extension array dtypes to get_dummies() (GH32430)
- Added Index.infer_objects() analogous to Series.infer_objects() (GH50034)
- Added copy parameter to Series.infer_objects() and DataFrame.infer_objects(); passing False will avoid making copies for series or columns that are already non-object or where no better dtype can be inferred (GH50096)
- DataFrame.plot.hist() now recognizes xlabel and ylabel arguments (GH49793)
- Series.drop_duplicates() has gained an ignore_index keyword to reset the index (GH48304)
- Series.dropna() and DataFrame.dropna() have gained an ignore_index keyword to reset the index (GH31725)
- Improved error message in to_datetime() for non-ISO8601 formats, informing users about the position of the first error (GH50361)
- Improved error message when trying to align DataFrame objects (for example, in DataFrame.compare()) to clarify that "identically labelled" refers to both index and columns (GH50083)
- Added support for Index.min() and Index.max() for pyarrow string dtypes (GH51397)
- Added DatetimeIndex.as_unit() and TimedeltaIndex.as_unit() to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (GH50616)
- Added Series.dt.unit() and Series.dt.as_unit() to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (GH51223)
- Added new argument dtype to read_sql() to be consistent with read_sql_query() (GH50797)
- read_csv(), read_table(), read_fwf() and read_excel() now accept date_format (GH50601)
- to_datetime() now accepts "ISO8601" as an argument to format, which will match any ISO8601 string (but possibly not identically-formatted) (GH50411)
- to_datetime() now accepts "mixed" as an argument to format, which will infer the format for each element individually (GH50972)
- Added new argument engine to read_json() to support parsing JSON with pyarrow by specifying engine="pyarrow" (GH48893)
- Added support for SQLAlchemy 2.0 (GH40686)
- Added support for the decimal parameter when engine="pyarrow" in read_csv() (GH51302)
- Index set operations Index.union(), Index.intersection(), Index.difference(), and Index.symmetric_difference() now support sort=True, which will always return a sorted result, unlike the default sort=None which does not sort in some cases (GH25151); see the short example after this list
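As a short illustration of the new sort=True option for Index set operations mentioned in the last item (a minimal sketch with arbitrary values):

import pandas as pd

idx1 = pd.Index([3, 1, 2])
idx2 = pd.Index([2, 4])
idx1.union(idx2, sort=True)   # always sorted: Index([1, 2, 3, 4], dtype='int64')
idx1.union(idx2, sort=None)   # default behavior; may skip sorting in some cases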
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() overflow instead of lossy casting to float#
In previous versions we cast to float when applying cumsum and cumprod, which led to incorrect results even if the result could be held by int64 dtype. Additionally, the aggregation now overflows, consistent with numpy and the regular DataFrame.cumprod() and DataFrame.cumsum() methods, when the limit of int64 is reached (GH37493).
Old Behavior
In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [2]: df.groupby("key")["value"].cumprod()[5]
Out[2]: 5.960464477539062e+16
We return incorrect results with the 6th value.
New Behavior
In [19]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [20]: df.groupby("key")["value"].cumprod()
Out[20]:
0 625
1 390625
2 244140625
3 152587890625
4 95367431640625
5 59604644775390625
6 359414837200037393
Name: value, dtype: int64
We overflow with the 7th value, but the 6th value is still correct.
DataFrameGroupBy.nth() and SeriesGroupBy.nth() now behave as filtrations#
In previous versions of pandas, DataFrameGroupBy.nth() and SeriesGroupBy.nth() acted as if they were aggregations. However, for most inputs n, they may return either zero or multiple rows per group. This means that they are filtrations, similar to e.g. DataFrameGroupBy.head(). pandas now treats them as filtrations (GH13666).
In [21]: df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [np.nan, 2.0, 3.0, 4.0, 5.0]})
In [22]: gb = df.groupby("a")
Old Behavior
In [5]: gb.nth(n=1)
Out[5]:
A B
1 1 2.0
4 2 5.0
New Behavior
In [23]: gb.nth(n=1)
Out[23]:
a b
1 1 2.0
4 2 5.0
In particular, the index of the result is derived from the input by selecting the appropriate rows. Also, when n is larger than the group, no rows are returned instead of NaN.
Old Behavior
In [5]: gb.nth(n=3, dropna="any")
Out[5]:
B
A
1 NaN
2 NaN
New Behavior
In [24]: gb.nth(n=3, dropna="any")
Out[24]:
Empty DataFrame
Columns: [a, b]
Index: []
Backwards incompatible API changes#
Construction with datetime64 or timedelta64 dtype with unsupported resolution#
In past versions, when constructing a Series or DataFrame and passing a "datetime64" or "timedelta64" dtype with unsupported resolution (i.e. anything other than "ns"), pandas would silently replace the given dtype with its nanosecond analogue:
Previous behavior:
In [5]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[5]:
0 2016-01-01
dtype: datetime64[ns]
In [6]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
Out[6]:
0 2016-01-01
dtype: datetime64[ns]
In pandas 2.0 we support resolutions “s”, “ms”, “us”, and “ns”. When passing a supported dtype (e.g. “datetime64[s]”), the result now has exactly the requested dtype:
New behavior:
In [25]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[25]:
0 2016-01-01
dtype: datetime64[s]
With an unsupported dtype, pandas now raises instead of silently swapping in a supported dtype:
New behavior:
In [26]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[26], line 1
----> 1 pd.Series(["2016-01-01"], dtype="datetime64[D]")
File ~/work/pandas/pandas/pandas/core/series.py:509, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
507 data = data.copy()
508 else:
--> 509 data = sanitize_array(data, index, dtype, copy)
511 manager = get_option("mode.data_manager")
512 if manager == "block":
File ~/work/pandas/pandas/pandas/core/construction.py:599, in sanitize_array(data, index, dtype, copy, allow_2d)
596 subarr = np.array([], dtype=np.float64)
598 elif dtype is not None:
--> 599 subarr = _try_cast(data, dtype, copy)
601 else:
602 subarr = maybe_convert_platform(data)
File ~/work/pandas/pandas/pandas/core/construction.py:756, in _try_cast(arr, dtype, copy)
751 return lib.ensure_string_array(arr, convert_na_value=False, copy=copy).reshape(
752 shape
753 )
755 elif dtype.kind in ["m", "M"]:
--> 756 return maybe_cast_to_datetime(arr, dtype)
758 # GH#15832: Check if we are requesting a numeric dtype and
759 # that we can convert the data to the requested dtype.
760 elif is_integer_dtype(dtype):
761 # this will raise if we have e.g. floats
File ~/work/pandas/pandas/pandas/core/dtypes/cast.py:1221, in maybe_cast_to_datetime(value, dtype)
1217 raise TypeError("value must be listlike")
1219 # TODO: _from_sequence would raise ValueError in cases where
1220 # _ensure_nanosecond_dtype raises TypeError
-> 1221 _ensure_nanosecond_dtype(dtype)
1223 if is_timedelta64_dtype(dtype):
1224 res = TimedeltaArray._from_sequence(value, dtype=dtype)
File ~/work/pandas/pandas/pandas/core/dtypes/cast.py:1279, in _ensure_nanosecond_dtype(dtype)
1276 raise ValueError(msg)
1277 # TODO: ValueError or TypeError? existing test
1278 # test_constructor_generic_timestamp_bad_frequency expects TypeError
-> 1279 raise TypeError(
1280 f"dtype={dtype} is not supported. Supported resolutions are 's', "
1281 "'ms', 'us', and 'ns'"
1282 )
TypeError: dtype=datetime64[D] is not supported. Supported resolutions are 's', 'ms', 'us', and 'ns'
Value counts sets the resulting name to count#
In past versions, when running Series.value_counts(), the result would inherit the original object's name, and the result index would be nameless. This would cause confusion when resetting the index, and the column names would not correspond with the column values.
Now, the result name will be 'count' (or 'proportion' if normalize=True was passed), and the index will be named after the original object (GH49497).
Previous behavior:
In [8]: pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
Out[8]:
quetzal 2
elk 1
Name: animal, dtype: int64
New behavior:
In [27]: pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
Out[27]:
animal
quetzal 2
elk 1
Name: count, dtype: int64
Likewise for other value_counts methods (for example, DataFrame.value_counts()).
Disallow astype conversion to non-supported datetime64/timedelta64 dtypes#
In previous versions, converting a Series or DataFrame from datetime64[ns] to a different datetime64[X] dtype would return with datetime64[ns] dtype instead of the requested dtype. In pandas 2.0, support is added for "datetime64[s]", "datetime64[ms]", and "datetime64[us]" dtypes, so converting to those dtypes gives exactly the requested dtype:
Previous behavior:
In [28]: idx = pd.date_range("2016-01-01", periods=3)
In [29]: ser = pd.Series(idx)
Previous behavior:
In [4]: ser.astype("datetime64[s]")
Out[4]:
0 2016-01-01
1 2016-01-02
2 2016-01-03
dtype: datetime64[ns]
With the new behavior, we get exactly the requested dtype:
New behavior:
In [30]: ser.astype("datetime64[s]")
Out[30]:
0 2016-01-01
1 2016-01-02
2 2016-01-03
dtype: datetime64[s]
For non-supported resolutions e.g. “datetime64[D]”, we raise instead of silently ignoring the requested dtype:
New behavior:
In [31]: ser.astype("datetime64[D]")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[31], line 1
----> 1 ser.astype("datetime64[D]")
File ~/work/pandas/pandas/pandas/core/generic.py:6324, in NDFrame.astype(self, dtype, copy, errors)
6317 results = [
6318 self.iloc[:, i].astype(dtype, copy=copy)
6319 for i in range(len(self.columns))
6320 ]
6322 else:
6323 # else, only a single dtype is given
-> 6324 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6325 return self._constructor(new_data).__finalize__(self, method="astype")
6327 # GH 33113: handle empty frame or series
File ~/work/pandas/pandas/pandas/core/internals/managers.py:451, in BaseBlockManager.astype(self, dtype, copy, errors)
448 elif using_copy_on_write():
449 copy = False
--> 451 return self.apply(
452 "astype",
453 dtype=dtype,
454 copy=copy,
455 errors=errors,
456 using_cow=using_copy_on_write(),
457 )
File ~/work/pandas/pandas/pandas/core/internals/managers.py:352, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
350 applied = b.apply(f, **kwargs)
351 else:
--> 352 applied = getattr(b, f)(**kwargs)
353 result_blocks = extend_blocks(applied, result_blocks)
355 out = type(self).from_blocks(result_blocks, self.axes)
File ~/work/pandas/pandas/pandas/core/internals/blocks.py:511, in Block.astype(self, dtype, copy, errors, using_cow)
491 """
492 Coerce to the new dtype.
493
(...)
507 Block
508 """
509 values = self.values
--> 511 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
513 new_values = maybe_coerce_values(new_values)
515 refs = None
File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:242, in astype_array_safe(values, dtype, copy, errors)
239 dtype = dtype.numpy_dtype
241 try:
--> 242 new_values = astype_array(values, dtype, copy=copy)
243 except (ValueError, TypeError):
244 # e.g. _astype_nansafe can fail on object-dtype of strings
245 # trying to convert to float
246 if errors == "ignore":
File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:184, in astype_array(values, dtype, copy)
180 return values
182 if not isinstance(values, np.ndarray):
183 # i.e. ExtensionArray
--> 184 values = values.astype(dtype, copy=copy)
186 else:
187 values = _astype_nansafe(values, dtype, copy=copy)
File ~/work/pandas/pandas/pandas/core/arrays/datetimes.py:701, in DatetimeArray.astype(self, dtype, copy)
699 elif is_period_dtype(dtype):
700 return self.to_period(freq=dtype.freq)
--> 701 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
File ~/work/pandas/pandas/pandas/core/arrays/datetimelike.py:487, in DatetimeLikeArrayMixin.astype(self, dtype, copy)
480 elif (
481 is_datetime_or_timedelta_dtype(dtype)
482 and not is_dtype_equal(self.dtype, dtype)
483 ) or is_float_dtype(dtype):
484 # disallow conversion between datetime/timedelta,
485 # and conversions for any datetimelike to float
486 msg = f"Cannot cast {type(self).__name__} to dtype {dtype}"
--> 487 raise TypeError(msg)
488 else:
489 return np.asarray(self, dtype=dtype)
TypeError: Cannot cast DatetimeArray to dtype datetime64[D]
For conversion from timedelta64[ns] dtypes, the old behavior converted to a floating point format.
Previous behavior:
In [32]: idx = pd.timedelta_range("1 Day", periods=3)
In [33]: ser = pd.Series(idx)
Previous behavior:
In [7]: ser.astype("timedelta64[s]")
Out[7]:
0 86400.0
1 172800.0
2 259200.0
dtype: float64
In [8]: ser.astype("timedelta64[D]")
Out[8]:
0 1.0
1 2.0
2 3.0
dtype: float64
The new behavior, as for datetime64, either gives exactly the requested dtype or raises:
New behavior:
In [34]: ser.astype("timedelta64[s]")
Out[34]:
0 1 days 00:00:00
1 2 days 00:00:00
2 3 days 00:00:00
dtype: timedelta64[s]
In [35]: ser.astype("timedelta64[D]")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[35], line 1
----> 1 ser.astype("timedelta64[D]")
File ~/work/pandas/pandas/pandas/core/generic.py:6324, in NDFrame.astype(self, dtype, copy, errors)
6317 results = [
6318 self.iloc[:, i].astype(dtype, copy=copy)
6319 for i in range(len(self.columns))
6320 ]
6322 else:
6323 # else, only a single dtype is given
-> 6324 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6325 return self._constructor(new_data).__finalize__(self, method="astype")
6327 # GH 33113: handle empty frame or series
File ~/work/pandas/pandas/pandas/core/internals/managers.py:451, in BaseBlockManager.astype(self, dtype, copy, errors)
448 elif using_copy_on_write():
449 copy = False
--> 451 return self.apply(
452 "astype",
453 dtype=dtype,
454 copy=copy,
455 errors=errors,
456 using_cow=using_copy_on_write(),
457 )
File ~/work/pandas/pandas/pandas/core/internals/managers.py:352, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
350 applied = b.apply(f, **kwargs)
351 else:
--> 352 applied = getattr(b, f)(**kwargs)
353 result_blocks = extend_blocks(applied, result_blocks)
355 out = type(self).from_blocks(result_blocks, self.axes)
File ~/work/pandas/pandas/pandas/core/internals/blocks.py:511, in Block.astype(self, dtype, copy, errors, using_cow)
491 """
492 Coerce to the new dtype.
493
(...)
507 Block
508 """
509 values = self.values
--> 511 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
513 new_values = maybe_coerce_values(new_values)
515 refs = None
File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:242, in astype_array_safe(values, dtype, copy, errors)
239 dtype = dtype.numpy_dtype
241 try:
--> 242 new_values = astype_array(values, dtype, copy=copy)
243 except (ValueError, TypeError):
244 # e.g. _astype_nansafe can fail on object-dtype of strings
245 # trying to convert to float
246 if errors == "ignore":
File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:184, in astype_array(values, dtype, copy)
180 return values
182 if not isinstance(values, np.ndarray):
183 # i.e. ExtensionArray
--> 184 values = values.astype(dtype, copy=copy)
186 else:
187 values = _astype_nansafe(values, dtype, copy=copy)
File ~/work/pandas/pandas/pandas/core/arrays/timedeltas.py:363, in TimedeltaArray.astype(self, dtype, copy)
359 return type(self)._simple_new(
360 res_values, dtype=res_values.dtype, freq=self.freq
361 )
362 else:
--> 363 raise ValueError(
364 f"Cannot convert from {self.dtype} to {dtype}. "
365 "Supported resolutions are 's', 'ms', 'us', 'ns'"
366 )
368 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy=copy)
ValueError: Cannot convert from timedelta64[ns] to timedelta64[D]. Supported resolutions are 's', 'ms', 'us', 'ns'
UTC and fixed-offset timezones default to standard-library tzinfo objects#
In previous versions, the default tzinfo
object used to represent UTC
was pytz.UTC
. In pandas 2.0, we default to datetime.timezone.utc
instead.
Similarly, for timezones represent fixed UTC offsets, we use datetime.timezone
objects instead of pytz.FixedOffset
objects. See (GH34916)
Previous behavior:
In [2]: ts = pd.Timestamp("2016-01-01", tz="UTC")
In [3]: type(ts.tzinfo)
Out[3]: pytz.UTC
In [4]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")
In [5]: type(ts2.tzinfo)
Out[5]: pytz._FixedOffset
New behavior:
In [36]: ts = pd.Timestamp("2016-01-01", tz="UTC")
In [37]: type(ts.tzinfo)
Out[37]: datetime.timezone
In [38]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")
In [39]: type(ts2.tzinfo)
Out[39]: datetime.timezone
For timezones that are neither UTC nor fixed offsets, e.g. “US/Pacific”, we
continue to default to pytz
objects.
Empty DataFrames/Series will now default to have a RangeIndex#
Before, constructing an empty (where data is None or an empty list-like argument) Series or DataFrame without specifying the axes (index=None, columns=None) would return the axes as an empty Index with object dtype.
Now, the axes return an empty RangeIndex (GH49572).
Previous behavior:
In [8]: pd.Series().index
Out[8]:
Index([], dtype='object')
In [9]: pd.DataFrame().axes
Out[9]:
[Index([], dtype='object'), Index([], dtype='object')]
New behavior:
In [40]: pd.Series().index
Out[40]: RangeIndex(start=0, stop=0, step=1)
In [41]: pd.DataFrame().axes
Out[41]: [RangeIndex(start=0, stop=0, step=1), RangeIndex(start=0, stop=0, step=1)]
DataFrame to LaTeX has a new render engine#
The existing DataFrame.to_latex()
has been restructured to utilise the
extended implementation previously available under Styler.to_latex()
.
The arguments signature is similar, albeit col_space
has been removed since
it is ignored by LaTeX engines. This render engine also requires jinja2
as a
dependency which needs to be installed, since rendering is based upon jinja2 templates.
The pandas latex options below are no longer used and have been removed. The generic max rows and columns arguments remain, but for this functionality should be replaced by the Styler equivalents. The alternative options giving similar functionality are indicated below:
- display.latex.escape: replaced with styler.format.escape
- display.latex.longtable: replaced with styler.latex.environment
- display.latex.multicolumn, display.latex.multicolumn_format and display.latex.multirow: replaced with styler.sparse.rows, styler.sparse.columns, styler.latex.multirow_align and styler.latex.multicol_align
- display.latex.repr: replaced with styler.render.repr
- display.max_rows and display.max_columns: replaced with styler.render.max_rows, styler.render.max_columns and styler.render.max_elements
Note that due to this change some defaults have also changed:
- multirow now defaults to True.
- multirow_align defaults to "r" instead of "l".
- multicol_align defaults to "r" instead of "l".
- escape now defaults to False.
Note that the behaviour of _repr_latex_ is also changed. Previously setting display.latex.repr would generate LaTeX only when using nbconvert for a Jupyter Notebook, and not when the user is running the notebook. Now the styler.render.repr option allows control of the specific output within Jupyter Notebooks for operations (not just on nbconvert). See GH39911.
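For illustration, a minimal sketch of the new render engine in use (this assumes the optional jinja2 dependency is installed; the frame contents are arbitrary):

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# DataFrame.to_latex() now renders through the Styler-based engine
print(df.to_latex())

# the Styler API exposes the full set of LaTeX rendering options,
# e.g. choosing the LaTeX environment explicitly
print(df.style.to_latex(environment="longtable"))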
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
Package | Minimum Version | Required | Changed
---|---|---|---
mypy (dev) | 1.0 | X |
pytest (dev) | 7.0.0 | X |
pytest-xdist (dev) | 2.2.0 | X |
hypothesis (dev) | 6.34.2 | X |
python-dateutil | 2.8.2 | X | X
tzdata | 2022.1 | X | X
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
Package | Minimum Version | Changed
---|---|---
pyarrow | 7.0.0 | X
matplotlib | 3.6.1 | X
fastparquet | 0.6.3 | X
xarray | 0.21.0 | X
See Dependencies and Optional dependencies for more.
Datetimes are now parsed with a consistent format#
In the past, to_datetime() guessed the format for each element independently. This was appropriate for some cases where elements had mixed date formats; however, it would regularly cause problems when users expected a consistent format but the function would switch formats between elements. As of version 2.0.0, parsing will use a consistent format, determined by the first non-NA value (unless the user specifies a format, in which case that is used).
Old behavior:
In [1]: ser = pd.Series(['13-01-2000', '12-01-2000'])
In [2]: pd.to_datetime(ser)
Out[2]:
0 2000-01-13
1 2000-12-01
dtype: datetime64[ns]
New behavior:
In [42]: ser = pd.Series(['13-01-2000', '12-01-2000'])
In [43]: pd.to_datetime(ser)
Out[43]:
0 2000-01-13
1 2000-01-12
dtype: datetime64[ns]
Note that this affects read_csv() as well.
If you still need to parse dates with inconsistent formats, you can use format='mixed' (possibly alongside dayfirst):
ser = pd.Series(['13-01-2000', '12 January 2000'])
pd.to_datetime(ser, format='mixed', dayfirst=True)
or, if your formats are all ISO8601 (but possibly not identically-formatted)
ser = pd.Series(['2020-01-01', '2020-01-01 03:00'])
pd.to_datetime(ser, format='ISO8601')
Other API changes#
- The freq, tz, nanosecond, and unit keywords in the Timestamp constructor are now keyword-only (GH45307, GH32526)
- Passing nanoseconds greater than 999 or less than 0 in Timestamp now raises a ValueError (GH48538, GH48255)
- read_csv(): specifying an incorrect number of columns with index_col now raises ParserError instead of IndexError when using the c parser
- Default value of dtype in get_dummies() is changed to bool from uint8 (GH45848)
- DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting datetime64 data to any of "datetime64[s]", "datetime64[ms]", "datetime64[us]" will return an object with the given resolution instead of coercing back to "datetime64[ns]" (GH48928)
- DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting timedelta64 data to any of "timedelta64[s]", "timedelta64[ms]", "timedelta64[us]" will return an object with the given resolution instead of coercing to "float64" dtype (GH48963)
- DatetimeIndex.astype(), TimedeltaIndex.astype(), PeriodIndex.astype(), Series.astype(), DataFrame.astype() with datetime64, timedelta64 or PeriodDtype dtypes no longer allow converting to integer dtypes other than "int64"; do obj.astype('int64', copy=False).astype(dtype) instead (GH49715)
- Index.astype() now allows casting from float64 dtype to datetime-like dtypes, matching Series behavior (GH49660)
- Passing data with dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to "timedelta64[ns]"; timedelta64 data with lower resolution will be cast to the lowest supported resolution "timedelta64[s]" (GH49014)
- Passing dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for Series or DataFrame will be cast to the lowest supported resolution "timedelta64[s]" (GH49014)
- Passing a np.datetime64 object with non-nanosecond resolution to Timestamp will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (GH49008)
- Passing datetime64 values with resolution other than nanosecond to to_datetime() will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (GH50369)
- Passing integer values and a non-nanosecond datetime64 dtype (e.g. "datetime64[s]") to DataFrame, Series, or Index will treat the values as multiples of the dtype's unit, matching the behavior of e.g. Series(np.array(values, dtype="M8[s]")) (GH51092)
- Passing a string in ISO-8601 format to Timestamp will retain the resolution of the parsed input if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (GH49737)
- The other argument in DataFrame.mask() and Series.mask() now defaults to no_default instead of np.nan, consistent with DataFrame.where() and Series.where(). Entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes) (GH49111)
- Changed behavior of Series.quantile() and DataFrame.quantile() with SparseDtype to retain sparse dtype (GH49583)
- When creating a Series with an object-dtype Index of datetime objects, pandas no longer silently converts the index to a DatetimeIndex (GH39307, GH23598)
- pandas.testing.assert_index_equal() with parameter exact="equiv" now considers two indexes equal when both are either a RangeIndex or Index with an int64 dtype. Previously it meant either a RangeIndex or an Int64Index (GH51098)
- Series.unique() with dtype "timedelta64[ns]" or "datetime64[ns]" now returns TimedeltaArray or DatetimeArray instead of numpy.ndarray (GH49176)
- to_datetime() and DatetimeIndex now allow sequences containing both datetime objects and numeric entries, matching Series behavior (GH49037, GH50453)
- pandas.api.types.is_string_dtype() now only returns True for array-likes with dtype=object when the elements are inferred to be strings (GH15585)
- Passing a sequence containing datetime objects and date objects to the Series constructor will return with object dtype instead of datetime64[ns] dtype, consistent with Index behavior (GH49341)
- Passing strings that cannot be parsed as datetimes to Series or DataFrame with dtype="datetime64[ns]" will raise instead of silently ignoring the keyword and returning object dtype (GH24435)
- Passing a sequence containing a type that cannot be converted to Timedelta to to_timedelta() or to the Series or DataFrame constructor with dtype="timedelta64[ns]" or to TimedeltaIndex now raises TypeError instead of ValueError (GH49525)
- Changed behavior of the Index constructor with a sequence containing at least one NaT and everything else either None or NaN to infer datetime64[ns] dtype instead of object, matching Series behavior (GH49340)
- read_stata() with parameter index_col set to None (the default) will now set the index on the returned DataFrame to a RangeIndex instead of an Int64Index (GH49745)
- Changed behavior of Index, Series, and DataFrame arithmetic methods when working with object-dtypes; the results no longer do type inference on the result of the array operations, use result.infer_objects(copy=False) to do type inference on the result (GH49999, GH49714)
- Changed behavior of the Index constructor with an object-dtype numpy.ndarray containing all-bool values or all-complex values; this will now retain object dtype, consistent with the Series behavior (GH49594)
- Changed behavior of Series.astype() from object-dtype containing bytes objects to string dtypes; this now does val.decode() on bytes objects instead of str(val), matching Index.astype() behavior (GH45326)
- Added "None" to default na_values in read_csv() (GH50286)
- Changed behavior of Series and DataFrame constructors when given an integer dtype and floating-point data that is not round numbers; this now raises ValueError instead of silently retaining the float dtype; do Series(data) or DataFrame(data) to get the old behavior, and Series(data).astype(dtype) or DataFrame(data).astype(dtype) to get the specified dtype (GH49599)
- Changed behavior of DataFrame.shift() with axis=1, an integer fill_value, and homogeneous datetime-like dtype; this now fills new columns with integer dtypes instead of casting to datetimelike (GH49842)
- Files are now closed when encountering an exception in read_json() (GH49921)
- Changed behavior of read_csv(), read_json() & read_fwf(), where the index will now always be a RangeIndex when no index is specified. Previously the index would be an Index with dtype object if the new DataFrame/Series has length 0 (GH49572)
- DataFrame.values(), DataFrame.to_numpy(), DataFrame.xs(), DataFrame.reindex(), DataFrame.fillna(), and DataFrame.replace() no longer silently consolidate the underlying arrays; do df = df.copy() to ensure consolidation (GH49356)
- Creating a new DataFrame using a full slice on both axes with loc or iloc (thus, df.loc[:, :] or df.iloc[:, :]) now returns a new DataFrame (shallow copy) instead of the original DataFrame, consistent with other methods to get a full slice (for example df.loc[:] or df[:]) (GH49469)
- The Series and DataFrame constructors will now return a shallow copy (i.e. share data, but not attributes) when passed a Series and DataFrame, respectively, and with the default of copy=False (and if no other keyword triggers a copy). Previously, the new Series or DataFrame would share the index attribute (e.g. df.index = ... would also update the index of the parent or child) (GH49523)
- Disallow computing cumprod for Timedelta object; previously this returned incorrect values (GH50246)
- DataFrame objects read from a HDFStore file without an index now have a RangeIndex instead of an int64 index (GH51076)
- Instantiating an Index with a numeric numpy dtype with data containing NA and/or NaT now raises a ValueError. Previously a TypeError was raised (GH51050)
- Loading a JSON file with duplicate columns using read_json(orient='split') renames columns to avoid duplicates, as read_csv() and the other readers do (GH50370)
- The levels of the index of the Series returned from Series.sparse.from_coo now always have dtype int32. Previously they had dtype int64 (GH50926)
- to_datetime() with unit of either "Y" or "M" will now raise if a sequence contains a non-round float value, matching the Timestamp behavior (GH50301)
- The methods Series.round(), DataFrame.__invert__(), Series.__invert__(), DataFrame.swapaxes(), DataFrame.first(), DataFrame.last(), Series.first(), Series.last() and DataFrame.align() will now always return new objects (GH51032)
- DataFrame and DataFrameGroupBy aggregations (e.g. "sum") with object-dtype columns no longer infer non-object dtypes for their results; explicitly call result.infer_objects(copy=False) on the result to obtain the old behavior (GH51205, GH49603)
- Division by zero with ArrowDtype dtypes returns -inf, nan, or inf depending on the numerator, instead of raising (GH51541)
- Added pandas.api.types.is_any_real_numeric_dtype() to check for real numeric dtypes (GH51152); see the short example after this list
- value_counts() now returns data with ArrowDtype with pyarrow.int64 type instead of "Int64" type (GH51462)
- ArrowExtensionArray comparison methods now return data with ArrowDtype with pyarrow.bool_ type instead of "boolean" dtype (GH51643)
- factorize() and unique() preserve the original dtype when passed numpy timedelta64 or datetime64 with non-nanosecond resolution (GH48670)
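For illustration, a short sketch of the new real-numeric dtype check mentioned in the list above (example inputs are arbitrary):

import numpy as np
import pandas as pd
from pandas.api.types import is_any_real_numeric_dtype

is_any_real_numeric_dtype(pd.Index([1.0, 2.0]))    # True
is_any_real_numeric_dtype("category")              # False
is_any_real_numeric_dtype(np.dtype("complex128"))  # False; complex is not considered real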
Note
A current PDEP proposes the deprecation and removal of the keywords inplace and copy for all but a small subset of methods from the pandas API. The current discussion takes place on GitHub. The keywords won't be necessary anymore in the context of Copy-on-Write. If this proposal is accepted, both keywords would be deprecated in the next release of pandas and removed in pandas 3.0.
Deprecations#
- Deprecated parsing datetime strings with system-local timezone to tzlocal, pass a tz keyword or explicitly call tz_localize instead (GH50791)
- Deprecated argument infer_datetime_format in to_datetime() and read_csv(), as a strict version of it is now the default (GH48621)
- Deprecated behavior of to_datetime() with unit when parsing strings; in a future version these will be parsed as datetimes (matching unit-less behavior) instead of cast to floats. To retain the old behavior, cast strings to numeric types before calling to_datetime() (GH50735)
- Deprecated pandas.io.sql.execute() (GH50185)
- Index.is_boolean() has been deprecated. Use pandas.api.types.is_bool_dtype() instead (GH50042)
- Index.is_integer() has been deprecated. Use pandas.api.types.is_integer_dtype() instead (GH50042)
- Index.is_floating() has been deprecated. Use pandas.api.types.is_float_dtype() instead (GH50042)
- Index.holds_integer() has been deprecated. Use pandas.api.types.infer_dtype() instead (GH50243)
- Index.is_numeric() has been deprecated. Use pandas.api.types.is_any_real_numeric_dtype() instead (GH50042, GH51152)
- Index.is_categorical() has been deprecated. Use pandas.api.types.is_categorical_dtype() instead (GH50042)
- Index.is_object() has been deprecated. Use pandas.api.types.is_object_dtype() instead (GH50042)
- Index.is_interval() has been deprecated. Use pandas.api.types.is_interval_dtype() instead (GH50042)
- Deprecated argument date_parser in read_csv(), read_table(), read_fwf(), and read_excel() in favour of date_format (GH50601)
- Deprecated all and any reductions with datetime64 and DatetimeTZDtype dtypes, use e.g. (obj != pd.Timestamp(0, tz=obj.tz)).all() instead (GH34479)
- Deprecated unused arguments *args and **kwargs in Resampler (GH50977)
- Deprecated calling float or int on a single element Series to return a float or int respectively. Extract the element before calling float or int instead (GH51101)
- Deprecated Grouper.groups(), use Groupby.groups() instead (GH51182)
- Deprecated Grouper.grouper(), use Groupby.grouper() instead (GH51182)
- Deprecated Grouper.obj(), use Groupby.obj() instead (GH51206)
- Deprecated Grouper.indexer(), use Resampler.indexer() instead (GH51206)
- Deprecated Grouper.ax(), use Resampler.ax() instead (GH51206)
- Deprecated keyword use_nullable_dtypes in read_parquet(), use dtype_backend instead (GH51853)
- Deprecated Series.pad() in favor of Series.ffill() (GH33396); see the example after this list
- Deprecated Series.backfill() in favor of Series.bfill() (GH33396)
- Deprecated DataFrame.pad() in favor of DataFrame.ffill() (GH33396)
- Deprecated DataFrame.backfill() in favor of DataFrame.bfill() (GH33396)
- Deprecated StataReader.close(). Use StataReader as a context manager instead (GH49228)
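As noted for the fill-method deprecations above, the replacements are drop-in; a minimal sketch with arbitrary data:

import pandas as pd

ser = pd.Series([1.0, None, 3.0])
ser.ffill()   # use instead of the deprecated ser.pad()
ser.bfill()   # use instead of the deprecated ser.backfill()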
Removal of prior version deprecations/changes#
Removed
Int64Index
,UInt64Index
andFloat64Index
. See also here for more information (GH42717)Removed deprecated
Timestamp.freq
,Timestamp.freqstr
and argumentfreq
from theTimestamp
constructor andTimestamp.fromordinal()
(GH14146)Removed deprecated
CategoricalBlock
,Block.is_categorical()
, require datetime64 and timedelta64 values to be wrapped inDatetimeArray
orTimedeltaArray
before passing toBlock.make_block_same_class()
, requireDatetimeTZBlock.values
to have the correct ndim when passing to theBlockManager
constructor, and removed the “fastpath” keyword from theSingleBlockManager
constructor (GH40226, GH40571)Removed deprecated global option
use_inf_as_null
in favor ofuse_inf_as_na
(GH17126)Removed deprecated module
pandas.core.index
(GH30193)Removed deprecated alias
pandas.core.tools.datetimes.to_time
, import the function directly frompandas.core.tools.times
instead (GH34145)Removed deprecated alias
pandas.io.json.json_normalize
, import the function directly frompandas.json_normalize
instead (GH27615)Removed deprecated
Categorical.to_dense()
, usenp.asarray(cat)
instead (GH32639)Removed deprecated
Categorical.take_nd()
(GH27745)Removed deprecated
Categorical.mode()
, useSeries(cat).mode()
instead (GH45033)Removed deprecated
Categorical.is_dtype_equal()
andCategoricalIndex.is_dtype_equal()
(GH37545)Removed deprecated
CategoricalIndex.take_nd()
(GH30702)Removed deprecated
Index.is_type_compatible()
(GH42113)Removed deprecated
Index.is_mixed()
, checkindex.inferred_type
directly instead (GH32922)Removed deprecated
pandas.api.types.is_categorical()
; usepandas.api.types.is_categorical_dtype()
instead (GH33385)Removed deprecated
Index.asi8()
(GH37877)Enforced deprecation changing behavior when passing
datetime64[ns]
dtype data and timezone-aware dtype toSeries
, interpreting the values as wall-times instead of UTC times, matchingDatetimeIndex
behavior (GH41662)Enforced deprecation changing behavior when applying a numpy ufunc on multiple non-aligned (on the index or columns)
DataFrame
that will now align the inputs first (GH39239)Removed deprecated
DataFrame._AXIS_NUMBERS()
,DataFrame._AXIS_NAMES()
,Series._AXIS_NUMBERS()
,Series._AXIS_NAMES()
(GH33637)Removed deprecated
Index.to_native_types()
, useobj.astype(str)
instead (GH36418)Removed deprecated
Series.iteritems()
,DataFrame.iteritems()
, useobj.items
instead (GH45321)Removed deprecated
DataFrame.lookup()
(GH35224)Removed deprecated
Series.append()
,DataFrame.append()
, useconcat()
instead (GH35407)Removed deprecated
Series.iteritems()
,DataFrame.iteritems()
andHDFStore.iteritems()
useobj.items
instead (GH45321)Removed deprecated
DatetimeIndex.union_many()
(GH45018)Removed deprecated
weekofyear
andweek
attributes ofDatetimeArray
,DatetimeIndex
anddt
accessor in favor ofisocalendar().week
(GH33595)Removed deprecated
RangeIndex._start()
,RangeIndex._stop()
,RangeIndex._step()
, usestart
,stop
,step
instead (GH30482)Removed deprecated
DatetimeIndex.to_perioddelta()
, Usedtindex - dtindex.to_period(freq).to_timestamp()
instead (GH34853)Removed deprecated
Styler.hide_index()
andStyler.hide_columns()
(GH49397)Removed deprecated
Styler.set_na_rep()
andStyler.set_precision()
(GH49397)Removed deprecated
Styler.where()
(GH49397)Removed deprecated
Styler.render()
(GH49397)Removed deprecated argument
col_space
inDataFrame.to_latex()
(GH47970)Removed deprecated argument
null_color
inStyler.highlight_null()
(GH49397)Removed deprecated argument
check_less_precise
intesting.assert_frame_equal()
,testing.assert_extension_array_equal()
,testing.assert_series_equal()
,testing.assert_index_equal()
(GH30562)Removed deprecated
null_counts
argument inDataFrame.info()
. Useshow_counts
instead (GH37999)Removed deprecated
Index.is_monotonic()
, andSeries.is_monotonic()
; useobj.is_monotonic_increasing
instead (GH45422)Removed deprecated
Index.is_all_dates()
(GH36697)Enforced deprecation disallowing passing a timezone-aware
Timestamp
anddtype="datetime64[ns]"
toSeries
orDataFrame
constructors (GH41555)Enforced deprecation disallowing passing a sequence of timezone-aware values and
dtype="datetime64[ns]"
to toSeries
orDataFrame
constructors (GH41555)Enforced deprecation disallowing
numpy.ma.mrecords.MaskedRecords
in theDataFrame
constructor; pass"{name: data[name] for name in data.dtype.names}
instead (GH40363)Enforced deprecation disallowing unit-less “datetime64” dtype in
Series.astype()
andDataFrame.astype()
(GH47844)Enforced deprecation disallowing using
.astype
to convert adatetime64[ns]
Series
,DataFrame
, orDatetimeIndex
to timezone-aware dtype, useobj.tz_localize
orser.dt.tz_localize
instead (GH39258)Enforced deprecation disallowing using
.astype
to convert a timezone-awareSeries
,DataFrame
, orDatetimeIndex
to timezone-naivedatetime64[ns]
dtype, useobj.tz_localize(None)
orobj.tz_convert("UTC").tz_localize(None)
instead (GH39258)Enforced deprecation disallowing passing non boolean argument to sort in
concat()
(GH44629)Removed Date parser functions
parse_date_time()
,parse_date_fields()
,parse_all_fields()
andgeneric_parser()
(GH24518)Removed argument
index
from thecore.arrays.SparseArray
constructor (GH43523)Remove argument
squeeze
fromDataFrame.groupby()
andSeries.groupby()
(GH32380)Removed deprecated
apply
,apply_index
,__call__
,onOffset
, andisAnchored
attributes fromDateOffset
(GH34171)Removed
keep_tz
argument inDatetimeIndex.to_series()
(GH29731)Remove arguments
names
anddtype
fromIndex.copy()
andlevels
andcodes
fromMultiIndex.copy()
(GH35853, GH36685)Remove argument
inplace
fromMultiIndex.set_levels()
andMultiIndex.set_codes()
(GH35626)Removed arguments
verbose
andencoding
fromDataFrame.to_excel()
andSeries.to_excel()
(GH47912)Removed argument
line_terminator
fromDataFrame.to_csv()
andSeries.to_csv()
, uselineterminator
instead (GH45302)Removed argument
inplace
fromDataFrame.set_axis()
andSeries.set_axis()
, useobj = obj.set_axis(..., copy=False)
instead (GH48130)Disallow passing positional arguments to
MultiIndex.set_levels()
andMultiIndex.set_codes()
(GH41485)Disallow parsing to Timedelta strings with components with units “Y”, “y”, or “M”, as these do not represent unambiguous durations (GH36838)
Removed
MultiIndex.is_lexsorted()
andMultiIndex.lexsort_depth()
(GH38701)Removed argument
how
fromPeriodIndex.astype()
, usePeriodIndex.to_timestamp()
instead (GH37982)Removed argument
try_cast
fromDataFrame.mask()
,DataFrame.where()
,Series.mask()
andSeries.where()
(GH38836)Removed argument
tz
fromPeriod.to_timestamp()
, useobj.to_timestamp(...).tz_localize(tz)
instead (GH34522)Removed argument
sort_columns
inDataFrame.plot()
andSeries.plot()
(GH47563)Removed argument
is_copy
fromDataFrame.take()
andSeries.take()
(GH30615)Removed argument
kind
fromIndex.get_slice_bound()
,Index.slice_indexer()
andIndex.slice_locs()
(GH41378)Removed arguments
prefix
,squeeze
,error_bad_lines
andwarn_bad_lines
fromread_csv()
(GH40413, GH43427)Removed arguments
squeeze
fromread_excel()
(GH43427)Removed argument
datetime_is_numeric
fromDataFrame.describe()
andSeries.describe()
as datetime data will always be summarized as numeric data (GH34798)Disallow passing list
key
toSeries.xs()
andDataFrame.xs()
, pass a tuple instead (GH41789)Disallow subclass-specific keywords (e.g. “freq”, “tz”, “names”, “closed”) in the
Index
constructor (GH38597)Removed argument
inplace
fromCategorical.remove_unused_categories()
(GH37918)Disallow passing non-round floats to
Timestamp
withunit="M"
orunit="Y"
(GH47266)Remove keywords
convert_float
andmangle_dupe_cols
fromread_excel()
(GH41176)Remove keyword
mangle_dupe_cols
fromread_csv()
andread_table()
(GH48137)Removed
errors
keyword fromDataFrame.where()
,Series.where()
,DataFrame.mask()
andSeries.mask()
(GH47728)Disallow passing non-keyword arguments to
read_excel()
exceptio
andsheet_name
(GH34418)Disallow passing non-keyword arguments to
DataFrame.drop()
andSeries.drop()
exceptlabels
(GH41486)Disallow passing non-keyword arguments to
DataFrame.fillna()
andSeries.fillna()
exceptvalue
(GH41485)Disallow passing non-keyword arguments to
StringMethods.split()
andStringMethods.rsplit()
except forpat
(GH47448)Disallow passing non-keyword arguments to
DataFrame.set_index()
exceptkeys
(GH41495)Disallow passing non-keyword arguments to
Resampler.interpolate()
exceptmethod
(GH41699)Disallow passing non-keyword arguments to
DataFrame.reset_index()
andSeries.reset_index()
exceptlevel
(GH41496)Disallow passing non-keyword arguments to
DataFrame.dropna()
andSeries.dropna()
(GH41504)Disallow passing non-keyword arguments to
ExtensionArray.argsort()
(GH46134)Disallow passing non-keyword arguments to
Categorical.sort_values()
(GH47618)Disallow passing non-keyword arguments to
Index.drop_duplicates()
andSeries.drop_duplicates()
(GH41485)Disallow passing non-keyword arguments to
DataFrame.drop_duplicates()
except forsubset
(GH41485)Disallow passing non-keyword arguments to
DataFrame.sort_index()
andSeries.sort_index()
(GH41506)Disallow passing non-keyword arguments to
DataFrame.interpolate()
andSeries.interpolate()
except formethod
(GH41510)Disallow passing non-keyword arguments to
DataFrame.any()
andSeries.any()
(GH44896)Disallow passing non-keyword arguments to
Index.set_names()
except fornames
(GH41551)Disallow passing non-keyword arguments to
Index.join()
except forother
(GH46518)Disallow passing non-keyword arguments to
concat()
except forobjs
(GH41485)Disallow passing non-keyword arguments to
pivot()
except fordata
(GH48301)Disallow passing non-keyword arguments to
DataFrame.pivot()
(GH48301)Disallow passing non-keyword arguments to
read_html()
except forio
(GH27573)Disallow passing non-keyword arguments to
read_json()
except forpath_or_buf
(GH27573)Disallow passing non-keyword arguments to
read_sas()
except forfilepath_or_buffer
(GH47154)Disallow passing non-keyword arguments to
read_stata()
except forfilepath_or_buffer
(GH48128)Disallow passing non-keyword arguments to
read_csv()
exceptfilepath_or_buffer
(GH41485)Disallow passing non-keyword arguments to
read_table()
exceptfilepath_or_buffer
(GH41485)Disallow passing non-keyword arguments to
read_fwf()
exceptfilepath_or_buffer
(GH44710)Disallow passing non-keyword arguments to
read_xml()
except forpath_or_buffer
(GH45133)Disallow passing non-keyword arguments to
Series.mask()
andDataFrame.mask()
exceptcond
andother
(GH41580)Disallow passing non-keyword arguments to
DataFrame.to_stata()
except forpath
(GH48128)Disallow passing non-keyword arguments to
DataFrame.where()
andSeries.where()
except forcond
andother
(GH41523)Disallow passing non-keyword arguments to
Series.set_axis()
andDataFrame.set_axis()
except forlabels
(GH41491)Disallow passing non-keyword arguments to
Series.rename_axis()
andDataFrame.rename_axis()
except formapper
(GH47587)Disallow passing non-keyword arguments to
Series.clip()
andDataFrame.clip()
exceptlower
andupper
(GH41511)Disallow passing non-keyword arguments to
Series.bfill()
,Series.ffill()
,DataFrame.bfill()
andDataFrame.ffill()
(GH41508)Disallow passing non-keyword arguments to
DataFrame.replace()
,Series.replace()
except forto_replace
andvalue
(GH47587)Disallow passing non-keyword arguments to
DataFrame.sort_values()
except forby
(GH41505)Disallow passing non-keyword arguments to
Series.sort_values()
(GH41505)Disallow passing non-keyword arguments to
DataFrame.reindex()
except forlabels
(GH17966)Disallow
Disallow Index.reindex() with non-unique Index objects (GH42568)
Disallowed constructing Categorical with scalar data (GH38433)
Disallowed constructing CategoricalIndex without passing data (GH38944)
Removed Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH43665)
Removed Rolling.win_type returning "freq" (GH38963)
Removed Rolling.is_datetimelike (GH38963)
Removed the level keyword in DataFrame and Series aggregations; use groupby instead (GH39983)
Removed deprecated Timedelta.delta(), Timedelta.is_populated(), and Timedelta.freq (GH46430, GH46476)
Removed deprecated NaT.freq (GH45071)
Removed deprecated Categorical.replace(), use Series.replace() instead (GH44929)
Removed the numeric_only keyword from Categorical.min() and Categorical.max() in favor of skipna (GH48821)
Changed behavior of DataFrame.median() and DataFrame.mean() with numeric_only=None to not exclude datetime-like columns (note: this is superseded by the enforced removal of numeric_only=None later in this list) (GH29941)
Removed is_extension_type() in favor of is_extension_array_dtype() (GH29457)
Removed ExponentialMovingWindow.vol (GH39220)
Removed Index.get_value() and Index.set_value() (GH33907, GH28621)
Removed Series.slice_shift() and DataFrame.slice_shift() (GH37601)
Removed DataFrameGroupBy.pad() and DataFrameGroupBy.backfill() (GH45076)
Removed the numpy argument from read_json() (GH30636)
Disallow passing abbreviations for orient in DataFrame.to_dict() (GH32516)
Disallow partial slicing on a non-monotonic DatetimeIndex with keys which are not in the index. This now raises a KeyError (GH18531)
Removed get_offset in favor of to_offset() (GH30340)
Removed the warn keyword in infer_freq() (GH45947)
Removed the include_start and include_end arguments in DataFrame.between_time() in favor of inclusive (GH43248)
Removed the closed argument in date_range() and bdate_range() in favor of the inclusive argument (GH40245)
Removed the center keyword in DataFrame.expanding() (GH20647)
Removed the method and tolerance arguments in Index.get_loc(). Use index.get_indexer([label], method=..., tolerance=...) instead (GH42269)
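For the Index.get_loc() change directly above, the suggested replacement looks roughly like this (illustrative values, pandas 2.0 or later assumed):

import pandas as pd

idx = pd.Index([10, 20, 30])

# Previously: idx.get_loc(21, method="nearest", tolerance=2)
pos = idx.get_indexer([21], method="nearest", tolerance=2)[0]   # position of the nearest match (here: 1)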
Removed the pandas.datetime submodule (GH30489)
Removed the pandas.np submodule (GH30296)
Removed pandas.util.testing in favor of pandas.testing (GH30745)
Removed Series.str.__iter__() (GH28277)
Removed pandas.SparseArray in favor of arrays.SparseArray (GH30642)
Removed pandas.SparseSeries and pandas.SparseDataFrame, including pickle support (GH30642)
Enforced disallowing passing an integer fill_value to DataFrame.shift() and Series.shift() with datetime64, timedelta64, or period dtypes (GH32591)
Enforced disallowing a string column label into times in DataFrame.ewm() (GH43265)
Enforced disallowing passing True and False into inclusive in Series.between() in favor of "both" and "neither" respectively (GH40628)
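A short sketch of the Series.between() change directly above (the example values are made up):

import pandas as pd

ser = pd.Series([1, 2, 3, 4])

ser.between(2, 3, inclusive="both")      # replaces the removed inclusive=True
ser.between(2, 3, inclusive="neither")   # replaces the removed inclusive=False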
Enforced disallowing using usecols with out of bounds indices for read_csv with engine="c" (GH25623)
Enforced disallowing the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH40430)
Enforced disallowing a tuple of column labels into DataFrameGroupBy.__getitem__() (GH30546)
Enforced disallowing missing labels when indexing with a sequence of labels on a level of a MultiIndex. This now raises a KeyError (GH42351)
Enforced disallowing setting values with .loc using a positional slice. Use .loc with labels or .iloc with positions instead (GH31840)
Enforced disallowing positional indexing with a float key even if that key is a round number; manually cast to integer instead (GH34193)
Enforced disallowing using a DataFrame indexer with .iloc; use .loc instead for automatic alignment (GH39022)
Enforced disallowing set or dict indexers in __getitem__ and __setitem__ methods (GH42825)
Enforced disallowing indexing on an Index or positional indexing on a Series producing multi-dimensional objects e.g. obj[:, None]; convert to numpy before indexing instead (GH35141)
Enforced disallowing dict or set objects in suffixes in merge() (GH34810)
Enforced disallowing merge() to produce duplicated columns through the suffixes keyword and already existing columns (GH22818)
Enforced disallowing using merge() or join() on a different number of levels (GH34862)
Enforced disallowing the value_name argument in DataFrame.melt() to match an element in the DataFrame columns (GH35003)
Enforced disallowing passing showindex into **kwargs in DataFrame.to_markdown() and Series.to_markdown() in favor of index (GH33091)
Removed setting Categorical._codes directly (GH41429)
Removed setting Categorical.categories directly (GH47834)
Removed argument inplace from Categorical.add_categories(), Categorical.remove_categories(), Categorical.set_categories(), Categorical.rename_categories(), Categorical.reorder_categories(), Categorical.set_ordered(), Categorical.as_ordered(), Categorical.as_unordered() (GH37981, GH41118, GH41133, GH47834)
Enforced Rolling.count() with min_periods=None to default to the size of the window (GH31302)
Renamed fname to path in DataFrame.to_parquet(), DataFrame.to_stata() and DataFrame.to_feather() (GH30338)
Enforced disallowing indexing a Series with a single item list with a slice (e.g. ser[[slice(0, 2)]]). Either convert the list to a tuple, or pass the slice directly instead (GH31333)
Changed behavior of indexing on a DataFrame with a DatetimeIndex index using a string indexer; previously this operated as a slice on rows, now it operates like any other column key; use frame.loc[key] for the old behavior (GH36179)
Enforced the display.max_colwidth option to not accept negative integers (GH31569)
Removed the display.column_space option in favor of df.to_string(col_space=...) (GH47280)
Removed the deprecated method mad from pandas classes (GH11787)
Removed the deprecated method tshift from pandas classes (GH11631)
Changed behavior of empty data passed into Series; the default dtype will be object instead of float64 (GH29405)
Changed the behavior of DatetimeIndex.union(), DatetimeIndex.intersection(), and DatetimeIndex.symmetric_difference() with mismatched timezones to convert to UTC instead of casting to object dtype (GH39328)
Changed the behavior of to_datetime() with argument "now" with utc=False to match Timestamp("now") (GH18705)
Changed the behavior of indexing on a timezone-aware DatetimeIndex with a timezone-naive datetime object or vice-versa; these now behave like any other non-comparable type by raising KeyError (GH36148)
Changed the behavior of Index.reindex(), Series.reindex(), and DataFrame.reindex() with a datetime64 dtype and a datetime.date object for fill_value; these are no longer considered equivalent to datetime.datetime objects so the reindex casts to object dtype (GH39767)
Changed behavior of SparseArray.astype() when given a dtype that is not explicitly SparseDtype; cast to the exact requested dtype rather than silently using a SparseDtype instead (GH34457)
Changed behavior of Index.ravel() to return a view on the original Index instead of a np.ndarray (GH36900)
Changed behavior of Series.to_frame() and Index.to_frame() with explicit name=None to use None for the column name instead of the index's name or default 0 (GH45523)
Changed behavior of concat() with one array of bool-dtype and another of integer dtype; this now returns object dtype instead of integer dtype. Explicitly cast the bool object to integer before concatenating to get the old behavior (GH45101)
Changed behavior of the DataFrame constructor given floating-point data and an integer dtype; when the data cannot be cast losslessly, the floating point dtype is retained, matching Series behavior (GH41170)
Changed behavior of the Index constructor when given a np.ndarray with object-dtype containing numeric entries; this now retains object dtype rather than inferring a numeric dtype, consistent with Series behavior (GH42870)
Changed behavior of Index.__and__(), Index.__or__() and Index.__xor__() to behave as logical operations (matching Series behavior) instead of aliases for set operations (GH37374)
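A minimal sketch of the Index.__and__()/__or__() change directly above; the boolean indexes here are purely illustrative (pandas 2.0 or later assumed):

import pandas as pd

left = pd.Index([True, False, True])
right = pd.Index([True, True, False])

left & right                                # now an element-wise logical AND, as for Series
pd.Index([1, 2, 3]).intersection([2, 3])    # use the named methods for set operations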
Changed behavior of the DataFrame constructor when passed a list whose first element is a Categorical; this now treats the elements as rows casting to object dtype, consistent with behavior for other types (GH38845)
Changed behavior of the DataFrame constructor when passed a dtype (other than int) that the data cannot be cast to; it now raises instead of silently ignoring the dtype (GH41733)
Changed the behavior of the Series constructor; it will no longer infer a datetime64 or timedelta64 dtype from string entries (GH41731)
Changed behavior of the Timestamp constructor with a np.datetime64 object and a tz passed to interpret the input as a wall-time as opposed to a UTC time (GH42288)
Changed behavior of Timestamp.utcfromtimestamp() to return a timezone-aware object satisfying Timestamp.utcfromtimestamp(val).timestamp() == val (GH45083)
Changed behavior of the Index constructor when passed a SparseArray or SparseDtype to retain that dtype instead of casting to numpy.ndarray (GH43930)
Changed behavior of setitem-like operations (__setitem__, fillna, where, mask, replace, insert, fill_value for shift) on an object with DatetimeTZDtype when using a value with a non-matching timezone; the value will be cast to the object's timezone instead of casting both to object-dtype (GH44243)
Changed behavior of Index, Series, and DataFrame constructors with floating-dtype data and a DatetimeTZDtype; the data are now interpreted as UTC-times instead of wall-times, consistent with how integer-dtype data are treated (GH45573)
Changed behavior of Series and DataFrame constructors with integer dtype and floating-point data containing NaN; this now raises IntCastingNaNError (GH40110)
Changed behavior of Series and DataFrame constructors with an integer dtype and values that are too large to losslessly cast to this dtype; this now raises ValueError (GH41734)
Changed behavior of Series and DataFrame constructors with an integer dtype and values having either datetime64 or timedelta64 dtypes; this now raises TypeError, use values.view("int64") instead (GH41770)
Removed the deprecated base and loffset arguments from pandas.DataFrame.resample(), pandas.Series.resample() and pandas.Grouper. Use offset or origin instead (GH31809)
Changed behavior of Series.fillna() and DataFrame.fillna() with timedelta64[ns] dtype and an incompatible fill_value; this now casts to object dtype instead of raising, consistent with the behavior with other dtypes (GH45746)
Changed the default argument of regex for Series.str.replace() from True to False. Additionally, a single character pat with regex=True is now treated as a regular expression instead of a string literal (GH36695, GH24804)
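The Series.str.replace() default change directly above in a minimal sketch (the example strings are made up):

import pandas as pd

ser = pd.Series(["a.b", "c.d"])

ser.str.replace(".", "-")               # regex now defaults to False, so "." is a literal dot
ser.str.replace(".", "-", regex=True)   # opt back in to regular-expression matching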
Changed behavior of DataFrame.any() and DataFrame.all() with bool_only=True; object-dtype columns with all-bool values will no longer be included, manually cast to bool dtype first (GH46188)
Changed behavior of DataFrame.max(), DataFrame.min(), DataFrame.mean(), DataFrame.median(), DataFrame.skew(), DataFrame.kurt() with axis=None to return a scalar applying the aggregation across both axes (GH45072)
Changed behavior of comparison of a Timestamp with a datetime.date object; these now compare as un-equal and raise on inequality comparisons, matching the datetime.datetime behavior (GH36131)
Changed behavior of comparison of NaT with a datetime.date object; these now raise on inequality comparisons (GH39196)
Enforced deprecation of silently dropping columns that raised a TypeError in Series.transform and DataFrame.transform when used with a list or dictionary (GH43740)
Changed behavior of DataFrame.apply() with list-like so that any partial failure will raise an error (GH43740)
Changed behavior of DataFrame.to_latex() to now use the Styler implementation via Styler.to_latex() (GH47970)
Changed behavior of Series.__setitem__() with an integer key and a Float64Index when the key is not present in the index; previously we treated the key as positional (behaving like series.iloc[key] = val), now we treat it as a label (behaving like series.loc[key] = val), consistent with Series.__getitem__() behavior (GH33469)
Removed the na_sentinel argument from factorize(), Index.factorize(), and ExtensionArray.factorize() (GH47157)
Changed behavior of Series.diff() and DataFrame.diff() with ExtensionDtype dtypes whose arrays do not implement diff; these now raise TypeError rather than casting to numpy (GH31025)
Enforced deprecation of calling numpy "ufunc"s on DataFrame with method="outer"; this now raises NotImplementedError (GH36955)
Enforced deprecation disallowing passing numeric_only=True to Series reductions (rank, any, all, …) with non-numeric dtype (GH47500)
Changed behavior of DataFrameGroupBy.apply() and SeriesGroupBy.apply() so that group_keys is respected even if a transformer is detected (GH34998)
Comparisons between a DataFrame and a Series where the frame's columns do not match the series's index raise ValueError instead of automatically aligning; do left, right = left.align(right, axis=1, copy=False) before comparing (GH36795)
Enforced deprecation of numeric_only=None (the default) in DataFrame reductions that would silently drop columns that raised; numeric_only now defaults to False (GH41480)
Changed default of numeric_only to False in all DataFrame methods with that argument (GH46096, GH46906)
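As a quick illustration of the numeric_only default change directly above (the frame below is made up; pandas 2.0 or later assumed):

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

df.mean(numeric_only=True)   # aggregates only the numeric column "x"
# df.mean()                  # numeric_only now defaults to False, so the object column is no
#                            # longer dropped silently and this raises a TypeError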
Changed default of numeric_only to False in Series.rank() (GH47561)
Enforced deprecation of silently dropping nuisance columns in groupby and resample operations when numeric_only=False (GH41475)
Enforced deprecation of silently dropping nuisance columns in Rolling, Expanding, and ExponentialMovingWindow ops. This will now raise errors.DataError (GH42834)
Changed behavior in setting values with df.loc[:, foo] = bar or df.iloc[:, foo] = bar; these now always attempt to set values inplace before falling back to casting (GH45333)
Changed default of numeric_only in various DataFrameGroupBy methods; all methods now default to numeric_only=False (GH46072)
Changed default of numeric_only to False in Resampler methods (GH47177)
Using the method DataFrameGroupBy.transform() with a callable that returns DataFrames will align to the input's index (GH47244)
When providing a list of columns of length one to DataFrame.groupby(), the keys that are returned by iterating over the resulting DataFrameGroupBy object will now be tuples of length one (GH47761)
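The groupby change directly above in a minimal sketch (the data is illustrative):

import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

for group_key, group in df.groupby(["key"]):   # grouping by a list of length one
    print(group_key)                           # now a tuple of length one, e.g. ("a",)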
Removed deprecated methods ExcelWriter.write_cells(), ExcelWriter.save(), ExcelWriter.cur_sheet(), ExcelWriter.handles(), ExcelWriter.path() (GH45795)
The ExcelWriter attribute book can no longer be set; it is still available to be accessed and mutated (GH48943)
Removed unused *args and **kwargs in Rolling, Expanding, and ExponentialMovingWindow ops (GH47851)
Removed the deprecated argument line_terminator from DataFrame.to_csv() (GH45302)
Removed the deprecated argument label from lreshape() (GH30219)
Arguments after expr in DataFrame.eval() and DataFrame.query() are keyword-only (GH47587)
Removed Index._get_attributes_dict() (GH50648)
Removed Series.__array_wrap__() (GH50648)
Changed behavior of DataFrame.value_counts() to return a Series with a MultiIndex for any list-like (whether it has one element or not) but an Index for a single label (GH50829)
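A small sketch of the DataFrame.value_counts() change directly above (the frame is made up):

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

df.value_counts(["a"])   # list-like subset, even of length one -> counts indexed by a MultiIndex
df.value_counts("a")     # single label -> counts indexed by a plain Index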
Performance improvements#
Performance improvement in DataFrameGroupBy.median(), SeriesGroupBy.median() and DataFrameGroupBy.cumprod() for nullable dtypes (GH37493)
Performance improvement in DataFrameGroupBy.all(), DataFrameGroupBy.any(), SeriesGroupBy.all(), and SeriesGroupBy.any() for object dtype (GH50623)
Performance improvement in MultiIndex.argsort() and MultiIndex.sort_values() (GH48406)
Performance improvement in MultiIndex.size() (GH48723)
Performance improvement in MultiIndex.union() without missing values and without duplicates (GH48505, GH48752)
Performance improvement in MultiIndex.difference() (GH48606)
Performance improvement in MultiIndex set operations with sort=None (GH49010)
Performance improvement in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.var(), and SeriesGroupBy.var() for extension array dtypes (GH37493)
Performance improvement in MultiIndex.isin() when level=None (GH48622, GH49577)
Performance improvement in MultiIndex.putmask() (GH49830)
Performance improvement in Index.union() and MultiIndex.union() when index contains duplicates (GH48900)
Performance improvement in Series.rank() for pyarrow-backed dtypes (GH50264)
Performance improvement in Series.searchsorted() for pyarrow-backed dtypes (GH50447)
Performance improvement in Series.fillna() for extension array dtypes (GH49722, GH50078)
Performance improvement in Index.join(), Index.intersection() and Index.union() for masked and arrow dtypes when Index is monotonic (GH50310, GH51365)
Performance improvement for Series.value_counts() with nullable dtype (GH48338)
Performance improvement for Series constructor passing integer numpy array with nullable dtype (GH48338)
Performance improvement for DatetimeIndex constructor passing a list (GH48609)
Performance improvement in merge() and DataFrame.join() when joining on a sorted MultiIndex (GH48504)
Performance improvement in to_datetime() when parsing strings with timezone offsets (GH50107)
Performance improvement in DataFrame.loc() and Series.loc() for tuple-based indexing of a MultiIndex (GH48384)
Performance improvement for Series.replace() with categorical dtype (GH49404)
Performance improvement for MultiIndex.unique() (GH48335)
Performance improvement for indexing operations with nullable and arrow dtypes (GH49420, GH51316)
Performance improvement for concat() with extension array backed indexes (GH49128, GH49178)
Performance improvement for api.types.infer_dtype() (GH51054)
Reduce memory usage of DataFrame.to_pickle()/Series.to_pickle() when using BZ2 or LZMA (GH49068)
Performance improvement for StringArray constructor passing a numpy array with type np.str_ (GH49109)
Performance improvement in from_tuples() (GH50620)
Performance improvement in factorize() (GH49177)
Performance improvement in ArrowExtensionArray comparison methods when array contains NA (GH50524)
Performance improvement when parsing strings to BooleanDtype (GH50613)
Performance improvement in DataFrame.join() when joining on a subset of a MultiIndex (GH48611)
Performance improvement for MultiIndex.intersection() (GH48604)
Performance improvement in DataFrame.__setitem__() (GH46267)
Performance improvement in var and std for nullable dtypes (GH48379)
Performance improvement when iterating over pyarrow and nullable dtypes (GH49825, GH49851)
Performance improvements to read_sas() (GH47403, GH47405, GH47656, GH48502)
Memory improvement in RangeIndex.sort_values() (GH48801)
Performance improvement in Series.to_numpy() if copy=True by avoiding copying twice (GH24345)
Performance improvement in Series.rename() with MultiIndex (GH21055)
Performance improvement in DataFrameGroupBy and SeriesGroupBy when by is a categorical type and sort=False (GH48976)
Performance improvement in DataFrameGroupBy and SeriesGroupBy when by is a categorical type and observed=False (GH49596)
Performance improvement in read_stata() with parameter index_col set to None (the default). Now the index will be a RangeIndex instead of Int64Index (GH49745)
Performance improvement in merge() when not merging on the index - the new index will now be RangeIndex instead of Int64Index (GH49478)
Performance improvement in DataFrame.to_dict() and Series.to_dict() when using any non-object dtypes (GH46470)
Performance improvement in read_html() when there are multiple tables (GH49929)
Performance improvement in the Period constructor when constructing from a string or integer (GH38312)
Performance improvement in to_datetime() when using '%Y%m%d' format (GH17410)
Performance improvement in to_datetime() when format is given or can be inferred (GH50465)
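The two to_datetime() items above benefit most when a format is supplied explicitly; a minimal, illustrative example (the dates are made up):

import pandas as pd

dates = ["20230101", "20230102", "20230103"]
pd.to_datetime(dates, format="%Y%m%d")   # an explicit format enables the fast parsing path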
Performance improvement in Series.median() for nullable dtypes (GH50838)
Performance improvement in read_csv() when passing a to_datetime() lambda function to date_parser and inputs have mixed timezone offsets (GH35296)
Performance improvement in SeriesGroupBy.value_counts() with categorical dtype (GH46202)
Fixed a reference leak in read_hdf() (GH37441)
Fixed a memory leak in DataFrame.to_json() and Series.to_json() when serializing datetimes and timedeltas (GH40443)
Decreased memory usage in many DataFrameGroupBy methods (GH51090)
Performance improvement in DataFrame.round() for an integer decimal parameter (GH17254)
Performance improvement in DataFrame.replace() and Series.replace() when using a large dict for to_replace (GH6697)
Memory improvement in StataReader when reading seekable files (GH48922)
Bug fixes#
Categorical#
Bug in Categorical.set_categories() losing dtype information (GH48812)
Bug in Series.replace() with categorical dtype when to_replace values overlap with new values (GH49404)
Bug in Series.replace() with categorical dtype losing nullable dtypes of underlying categories (GH49404)
Bug in DataFrame.groupby() and Series.groupby() would reorder categories when used as a grouper (GH48749)
Bug in Categorical constructor when constructing from a Categorical object and dtype="category" losing ordered-ness (GH49309)
Bug in SeriesGroupBy.min(), SeriesGroupBy.max(), DataFrameGroupBy.min(), and DataFrameGroupBy.max() with unordered CategoricalDtype with no groups failing to raise TypeError (GH51034)
Datetimelike#
Bug in pandas.infer_freq(), raising TypeError when inferred on RangeIndex (GH47084)
Bug in to_datetime() incorrectly raising OverflowError with string arguments corresponding to large integers (GH50533)
Bug in to_datetime() was raising on invalid offsets with errors='coerce' and infer_datetime_format=True (GH48633)
Bug in DatetimeIndex constructor failing to raise when tz=None is explicitly specified in conjunction with timezone-aware dtype or data (GH48659)
Bug in subtracting a datetime scalar from DatetimeIndex failing to retain the original freq attribute (GH48818)
Bug in pandas.tseries.holiday.Holiday where a half-open date interval causes inconsistent return types from USFederalHolidayCalendar.holidays() (GH49075)
Bug in rendering DatetimeIndex and Series and DataFrame with timezone-aware dtypes with dateutil or zoneinfo timezones near daylight-savings transitions (GH49684)
Bug in to_datetime() was raising ValueError when parsing Timestamp, datetime.datetime, datetime.date, or np.datetime64 objects when non-ISO8601 format was passed (GH49298, GH50036)
Bug in to_datetime() was raising ValueError when parsing an empty string and a non-ISO8601 format was passed. Now, empty strings will be parsed as NaT, for compatibility with how it is done for ISO8601 formats (GH50251)
Bug in Timestamp was showing UserWarning, which was not actionable by users, when parsing non-ISO8601 delimited date strings (GH50232)
Bug in to_datetime() was showing a misleading ValueError when parsing dates with a format containing ISO week directive and ISO weekday directive (GH50308)
Bug in Timestamp.round() when the freq argument has zero-duration (e.g. "0ns") returning incorrect results instead of raising (GH49737)
Bug in to_datetime() was not raising ValueError when an invalid format was passed and errors was 'ignore' or 'coerce' (GH50266)
Bug in DateOffset was throwing TypeError when constructing with milliseconds and another super-daily argument (GH49897)
Bug in to_datetime() was not raising ValueError when parsing a string with a decimal date with format '%Y%m%d' (GH50051)
Bug in to_datetime() was not converting None to NaT when parsing mixed-offset date strings with ISO8601 format (GH50071)
Bug in to_datetime() was not returning input when parsing an out-of-bounds date string with errors='ignore' and format='%Y%m%d' (GH14487)
Bug in to_datetime() was converting timezone-naive datetime.datetime to timezone-aware when parsing with timezone-aware strings, ISO8601 format, and utc=False (GH50254)
Bug in to_datetime() was throwing ValueError when parsing dates with ISO8601 format where some values were not zero-padded (GH21422)
Bug in to_datetime() was giving incorrect results when using format='%Y%m%d' and errors='ignore' (GH26493)
Bug in to_datetime() was failing to parse date strings 'today' and 'now' if format was not ISO8601 (GH50359)
Bug in Timestamp.utctimetuple() raising a TypeError (GH32174)
Bug in to_datetime() was raising ValueError when parsing mixed-offset Timestamp with errors='ignore' (GH50585)
Bug in to_datetime() was incorrectly handling floating-point inputs within 1 unit of the overflow boundaries (GH50183)
Bug in to_datetime() with unit of "Y" or "M" giving incorrect results, not matching pointwise Timestamp results (GH50870)
Bug in Series.interpolate() and DataFrame.interpolate() with datetime or timedelta dtypes incorrectly raising ValueError (GH11312)
Bug in to_datetime() was not returning input with errors='ignore' when input was out-of-bounds (GH50587)
Bug in DataFrame.from_records() when given a DataFrame input with timezone-aware datetime64 columns incorrectly dropping the timezone-awareness (GH51162)
Bug in to_datetime() was raising decimal.InvalidOperation when parsing date strings with errors='coerce' (GH51084)
Bug in to_datetime() with both unit and origin specified returning incorrect results (GH42624)
Bug in Series.astype() and DataFrame.astype() when converting an object-dtype object containing timezone-aware datetimes or strings to datetime64[ns] incorrectly localizing as UTC instead of raising TypeError (GH50140)
Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() with datetime or timedelta dtypes giving incorrect results for groups containing NaT (GH51373)
Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() incorrectly raising with PeriodDtype or DatetimeTZDtype (GH51373)
Timedelta#
Bug in to_timedelta() raising an error when the input has nullable dtype Float64 (GH48796)
Bug in Timedelta constructor incorrectly raising instead of returning NaT when given a np.timedelta64("nat") (GH48898)
Bug in Timedelta constructor failing to raise when passed both a Timedelta object and keywords (e.g. days, seconds) (GH48898)
Bug in Timedelta comparisons with very large datetime.timedelta objects incorrectly raising OutOfBoundsTimedelta (GH49021)
Timezones#
Bug in Series.astype() and DataFrame.astype() with object-dtype containing multiple timezone-aware datetime objects with heterogeneous timezones to a DatetimeTZDtype incorrectly raising (GH32581)
Bug in to_datetime() was failing to parse date strings with timezone name when format was specified with %Z (GH49748)
Better error message when passing invalid values to the ambiguous parameter in Timestamp.tz_localize() (GH49565)
Bug in string parsing incorrectly allowing a Timestamp to be constructed with an invalid timezone, which would raise when trying to print (GH50668)
Numeric#
Bug in DataFrame.add() where a ufunc could not be applied when the inputs contained a mix of DataFrame and Series types (GH39853)
Bug in arithmetic operations on Series not propagating the mask when combining masked dtypes and numpy dtypes (GH45810, GH42630)
Bug in DataFrame.sem() and Series.sem() where an erroneous TypeError would always raise when using data backed by an ArrowDtype (GH49759)
Bug in Series.__add__() casting to object for list and masked Series (GH22962)
Bug in mode() where dropna=False was not respected when there were NA values (GH50982)
Bug in DataFrame.query() with engine="numexpr" where columns named min or max would raise a TypeError (GH50937)
Bug in DataFrame.min() and DataFrame.max() with tz-aware data containing pd.NaT and axis=1 would return incorrect results (GH51242)
Conversion#
Bug in constructing Series with int64 dtype from a string list raising instead of casting (GH44923)
Bug in constructing Series with masked dtype and boolean values with NA raising (GH42137)
Bug in DataFrame.eval() incorrectly raising an AttributeError when there are negative values in a function call (GH46471)
Bug in Series.convert_dtypes() not converting dtype to nullable dtype when the Series contains NA and has dtype object (GH48791)
Bug where any ExtensionDtype subclass with kind="M" would be interpreted as a timezone type (GH34986)
Bug in arrays.ArrowExtensionArray that would raise NotImplementedError when passed a sequence of strings or binary (GH49172)
Bug in Series.astype() raising pyarrow.ArrowInvalid when converting from a non-pyarrow string dtype to a pyarrow numeric type (GH50430)
Bug in DataFrame.astype() modifying the input array inplace when converting to string and copy=False (GH51073)
Bug in Series.to_numpy() converting to a NumPy array before applying na_value (GH48951)
Bug in DataFrame.astype() not copying data when converting to a pyarrow dtype (GH50984)
Bug in to_datetime() was not respecting the exact argument when format was an ISO8601 format (GH12649)
Bug in TimedeltaArray.astype() raising TypeError when converting to a pyarrow duration type (GH49795)
Bug in DataFrame.eval() and DataFrame.query() raising for extension array dtypes (GH29618, GH50261, GH31913)
Bug in Series() not copying data when created from an Index and dtype is equal to the dtype from the Index (GH52008)
Strings#
Bug in pandas.api.types.is_string_dtype() that would not return True for StringDtype or ArrowDtype with pyarrow.string() (GH15585)
Bug in converting string dtypes to "datetime64[ns]" or "timedelta64[ns]" incorrectly raising TypeError (GH36153)
Bug in setting values in a string-dtype column with an array, mutating the array as side effect when it contains missing values (GH51299)
Interval#
Bug in IntervalIndex.is_overlapping() giving incorrect output if the interval has duplicate left boundaries (GH49581)
Bug in Series.infer_objects() failing to infer IntervalDtype for an object series of Interval objects (GH50090)
Bug in Series.shift() with IntervalDtype and an invalid null fill_value failing to raise TypeError (GH51258)
Indexing#
Bug in DataFrame.__setitem__() raising when the indexer is a DataFrame with boolean dtype (GH47125)
Bug in DataFrame.reindex() filling with wrong values when indexing columns and index for uint dtypes (GH48184)
Bug in DataFrame.loc() when setting a DataFrame with different dtypes coercing values to a single dtype (GH50467)
Bug in DataFrame.sort_values() where None was not returned when by is an empty list and inplace=True (GH50643)
Bug in DataFrame.loc() coercing dtypes when setting values with a list indexer (GH49159)
Bug in Series.loc() raising an error for an out of bounds end of a slice indexer (GH50161)
Bug in DataFrame.loc() raising ValueError with an all-False bool indexer and empty object (GH51450)
Bug in DataFrame.loc() raising ValueError with a bool indexer and MultiIndex (GH47687)
Bug in DataFrame.loc() raising IndexError when setting values for a pyarrow-backed column with a non-scalar indexer (GH50085)
Bug in DataFrame.__getitem__(), Series.__getitem__(), DataFrame.__setitem__() and Series.__setitem__() when indexing on indexes with extension float dtypes (Float64 & Float64) or complex dtypes using integers (GH51053)
Bug in DataFrame.loc() modifying the object when setting an incompatible value with an empty indexer (GH45981)
Bug in DataFrame.__setitem__() raising ValueError when the right hand side is a DataFrame with MultiIndex columns (GH49121)
Bug in DataFrame.reindex() casting dtype to object when the DataFrame has a single extension array column when re-indexing columns and index (GH48190)
Bug in DataFrame.iloc() raising IndexError when the indexer is a Series with numeric extension array dtype (GH49521)
Bug in describe() when formatting percentiles in the resulting index showed more decimals than needed (GH46362)
Bug in DataFrame.compare() does not recognize differences when comparing NA with a value in nullable dtypes (GH48939)
Bug in Series.rename() with MultiIndex losing extension array dtypes (GH21055)
Bug in DataFrame.isetitem() coercing extension array dtypes in DataFrame to object (GH49922)
Bug in Series.__getitem__() returning a corrupt object when selecting from an empty pyarrow-backed object (GH51734)
Bug in BusinessHour would cause creation of DatetimeIndex to fail when no opening hour was included in the index (GH49835)
Missing#
Bug in Index.equals() raising TypeError when the Index consists of tuples that contain NA (GH48446)
Bug in Series.map() caused an incorrect result when the data has NaNs and a defaultdict mapping was used (GH48813)
Bug in NA raising a TypeError instead of returning NA when performing a binary operation with a bytes object (GH49108)
Bug in DataFrame.update() with overwrite=False raising TypeError when self has a column with NaT values and the column is not present in other (GH16713)
Bug in Series.replace() raising RecursionError when replacing a value in an object-dtype Series containing NA (GH47480)
Bug in Series.replace() raising RecursionError when replacing a value in a numeric Series with NA (GH50758)
MultiIndex#
Bug in MultiIndex.get_indexer() not matching NaN values (GH29252, GH37222, GH38623, GH42883, GH43222, GH46173, GH48905)
Bug in MultiIndex.argsort() raising TypeError when the index contains NA (GH48495)
Bug in MultiIndex.difference() losing extension array dtype (GH48606)
Bug in MultiIndex.set_levels raising IndexError when setting an empty level (GH48636)
Bug in MultiIndex.unique() losing extension array dtype (GH48335)
Bug in MultiIndex.intersection() losing extension array (GH48604)
Bug in MultiIndex.union() losing extension array (GH48498, GH48505, GH48900)
Bug in MultiIndex.union() not sorting when sort=None and the index contains missing values (GH49010)
Bug in MultiIndex.append() not checking names for equality (GH48288)
Bug in MultiIndex.symmetric_difference() losing extension array (GH48607)
Bug in MultiIndex.join() losing dtypes when the MultiIndex has duplicates (GH49830)
Bug in MultiIndex.putmask() losing extension array (GH49830)
Bug in MultiIndex.value_counts() returning a Series indexed by a flat index of tuples instead of a MultiIndex (GH49558)
I/O#
Bug in read_sas() caused fragmentation of the DataFrame and raised errors.PerformanceWarning (GH48595)
Improved error message in read_excel() by including the offending sheet name when an exception is raised while reading a file (GH48706)
Bug when pickling a subset of PyArrow-backed data that would serialize the entire data instead of the subset (GH42600)
Bug in read_sql_query() ignoring the dtype argument when chunksize is specified and the result is empty (GH50245)
Bug in read_csv() for a single-line csv with fewer columns than names raised errors.ParserError with engine="c" (GH47566)
Bug in read_json() raising with orient="table" and NA value (GH40255)
Bug in displaying string dtypes not showing the storage option (GH50099)
Bug in DataFrame.to_string() with header=False that printed the index name on the same line as the first row of the data (GH49230)
Bug in DataFrame.to_string() ignoring the float formatter for extension arrays (GH39336)
Fixed memory leak which stemmed from the initialization of the internal JSON module (GH49222)
Fixed issue where json_normalize() would incorrectly remove leading characters from column names that matched the sep argument (GH49861)
Bug in read_csv() unnecessarily overflowing for extension array dtype when containing NA (GH32134)
Bug in DataFrame.to_dict() not converting NA to None (GH50795)
Bug in DataFrame.to_json() where it would segfault when failing to encode a string (GH50307)
Bug in DataFrame.to_html() with na_rep set when the DataFrame contains non-scalar data (GH47103)
Bug in read_xml() where file-like objects failed when iterparse is used (GH50641)
Bug in read_csv() when engine="pyarrow" where the encoding parameter was not handled correctly (GH51302)
Bug in read_xml() ignored repeated elements when iterparse is used (GH51183)
Bug in ExcelWriter leaving file handles open if an exception occurred during instantiation (GH51443)
Bug in DataFrame.to_parquet() where non-string index or columns were raising a ValueError when engine="pyarrow" (GH52036)
Period#
Bug in Period.strftime() and PeriodIndex.strftime(), raising UnicodeDecodeError when a locale-specific directive was passed (GH46319)
Bug in adding a Period object to an array of DateOffset objects incorrectly raising TypeError (GH50162)
Bug in Period where passing a string with finer resolution than nanosecond would result in a KeyError instead of dropping the extra precision (GH50417)
Bug in parsing strings representing Week-periods e.g. "2017-01-23/2017-01-29" as minute-frequency instead of week-frequency (GH50803)
Bug in DataFrameGroupBy.sum(), DataFrameGroupBy.cumsum(), DataFrameGroupBy.prod(), DataFrameGroupBy.cumprod() with PeriodDtype failing to raise TypeError (GH51040)
Bug in parsing an empty string with Period incorrectly raising ValueError instead of returning NaT (GH51349)
Plotting#
Bug in DataFrame.plot.hist(), not dropping elements of weights corresponding to NaN values in data (GH48884)
ax.set_xlim was sometimes raising UserWarning which users couldn't address due to set_xlim not accepting parsing arguments - the converter now uses Timestamp() instead (GH49148)
Groupby/resample/rolling#
Bug in ExponentialMovingWindow with online not raising a NotImplementedError for unsupported operations (GH48834)
Bug in DataFrameGroupBy.sample() raises ValueError when the object is empty (GH48459)
Bug in Series.groupby() raises ValueError when an entry of the index is equal to the name of the index (GH48567)
Bug in DataFrameGroupBy.resample() produces inconsistent results when passing an empty DataFrame (GH47705)
Bug in DataFrameGroupBy and SeriesGroupBy would not include unobserved categories in the result when grouping by categorical indexes (GH49354)
Bug in DataFrameGroupBy and SeriesGroupBy would change result order depending on the input index when grouping by categoricals (GH49223)
Bug in DataFrameGroupBy and SeriesGroupBy when grouping on categorical data would sort result values even when used with sort=False (GH42482)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() with as_index=False would not attempt the computation without using the grouping keys when using them failed with a TypeError (GH49256)
Bug in DataFrameGroupBy.describe() would describe the group keys (GH49256)
Bug in SeriesGroupBy.describe() with as_index=False would have the incorrect shape (GH49256)
Bug in DataFrameGroupBy and SeriesGroupBy with dropna=False would drop NA values when the grouper was categorical (GH36327)
Bug in SeriesGroupBy.nunique() would incorrectly raise when the grouper was an empty categorical and observed=True (GH21334)
Bug in SeriesGroupBy.nth() would raise when the grouper contained NA values after subsetting from a DataFrameGroupBy (GH26454)
Bug in DataFrame.groupby() would not include a Grouper specified by key in the result when as_index=False (GH50413)
Bug in DataFrameGroupBy.value_counts() would raise when used with a TimeGrouper (GH50486)
Bug in Resampler.size() caused a wide DataFrame to be returned instead of a Series with MultiIndex (GH46826)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() would raise incorrectly when the grouper had axis=1 for the "idxmin" and "idxmax" arguments (GH45986)
Bug in DataFrameGroupBy would raise when used with an empty DataFrame, categorical grouper, and dropna=False (GH50634)
Bug in SeriesGroupBy.value_counts() did not respect sort=False (GH50482)
Bug in DataFrameGroupBy.resample() raises KeyError when getting the result from a key list when resampling on a time index (GH50840)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() would raise incorrectly when the grouper had axis=1 for the "ngroup" argument (GH45986)
Bug in DataFrameGroupBy.describe() produced incorrect results when the data had duplicate columns (GH50806)
Bug in DataFrameGroupBy.agg() with engine="numba" failing to respect as_index=False (GH51228)
Bug in DataFrameGroupBy.agg(), SeriesGroupBy.agg(), and Resampler.agg() would ignore arguments when passed a list of functions (GH50863)
Bug in DataFrameGroupBy.ohlc() ignoring as_index=False (GH51413)
Reshaping#
Bug in DataFrame.pivot_table() raising TypeError for nullable dtype and margins=True (GH48681)
Bug in DataFrame.unstack() and Series.unstack() unstacking the wrong level of a MultiIndex when the MultiIndex has mixed names (GH48763)
Bug in DataFrame.melt() losing extension array dtype (GH41570)
Bug in DataFrame.pivot() not respecting None as the column name (GH48293)
Bug in DataFrame.join() when left_on or right_on is or includes a CategoricalIndex incorrectly raising AttributeError (GH48464)
Bug in DataFrame.pivot_table() raising ValueError with parameter margins=True when the result is an empty DataFrame (GH49240)
Clarified error message in merge() when passing an invalid validate option (GH49417)
Bug in DataFrame.explode() raising ValueError on multiple columns with NaN values or empty lists (GH46084)
Bug in DataFrame.transpose() with an IntervalDtype column with timedelta64[ns] endpoints (GH44917)
Bug in DataFrame.agg() and Series.agg() would ignore arguments when passed a list of functions (GH50863)
Sparse#
Bug in Series.astype() when converting a SparseDtype with datetime64[ns] subtype to int64 dtype raising, inconsistent with the non-sparse behavior (GH49631, GH50087)
Bug in Series.astype() when converting from datetime64[ns] to Sparse[datetime64[ns]] incorrectly raising (GH50082)
Bug in Series.sparse.to_coo() raising SystemError when the MultiIndex contains an ExtensionArray (GH50996)
ExtensionArray#
Bug in Series.mean() overflowing unnecessarily with nullable integers (GH48378)
Bug in Series.tolist() for nullable dtypes returning numpy scalars instead of python scalars (GH49890)
Bug in Series.round() for pyarrow-backed dtypes raising AttributeError (GH50437)
Bug when concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype, the resulting dtype turned into object (GH48510)
Bug in array.PandasArray.to_numpy() raising with NA value when na_value is specified (GH40638)
Bug in api.types.is_numeric_dtype() where a custom ExtensionDtype would not return True if _is_numeric returned True (GH50563)
Bug in api.types.is_integer_dtype(), api.types.is_unsigned_integer_dtype(), api.types.is_signed_integer_dtype(), api.types.is_float_dtype() where a custom ExtensionDtype would not return True if kind returned the corresponding NumPy type (GH50667)
Bug in Series constructor unnecessarily overflowing for nullable unsigned integer dtypes (GH38798, GH25880)
Bug in setting a non-string value into StringArray raising ValueError instead of TypeError (GH49632)
Bug in DataFrame.reindex() not honoring the default copy=True keyword in case of columns with ExtensionDtype (and as a result also selecting multiple columns with getitem ([]) didn't correctly result in a copy) (GH51197)
Bug in Series.any() and Series.all() returning NA for empty or all null pyarrow-backed data when skipna=True (GH51624)
Bug in ArrowExtensionArray logical operations & and | raising KeyError (GH51688)
Styler#
Fix background_gradient() for nullable dtype Series with NA values (GH50712)
Metadata#
Fixed metadata propagation in DataFrame.corr() and DataFrame.cov() (GH28283)
Other#
Contributors#
A total of 260 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
5j9 +
ABCPAN-rank +
Aarni Koskela +
Aashish KC +
Abubeker Mohammed +
Adam Mróz +
Adam Ormondroyd +
Aditya Anulekh +
Ahmed Ibrahim
Akshay Babbar +
Aleksa Radojicic +
Alex +
Alex Buzenet +
Alex Kirko
Allison Kwan +
Amay Patel +
Ambuj Pawar +
Amotz +
Andreas Schwab +
Andrew Chen +
Anton Shevtsov
Antonio Ossa Guerra +
Antonio Ossa-Guerra +
Anushka Bishnoi +
Arda Kosar
Armin Berres
Asadullah Naeem +
Asish Mahapatra
Bailey Lissington +
BarkotBeyene
Ben Beasley
Bhavesh Rajendra Patil +
Bibek Jha +
Bill +
Bishwas +
CarlosGDCJ +
Carlotta Fabian +
Chris Roth +
Chuck Cadman +
Corralien +
DG +
Dan Hendry +
Daniel Isaac
David Kleindienst +
David Poznik +
David Rudel +
DavidKleindienst +
Dea María Léon +
Deepak Sirohiwal +
Dennis Chukwunta
Douglas Lohmann +
Dries Schaumont
Dustin K +
Edoardo Abati +
Eduardo Chaves +
Ege Özgüroğlu +
Ekaterina Borovikova +
Eli Schwartz +
Elvis Lim +
Emily Taylor +
Emma Carballal Haire +
Erik Welch +
Fangchen Li
Florian Hofstetter +
Flynn Owen +
Fredrik Erlandsson +
Gaurav Sheni
Georeth Chow +
George Munyoro +
Guilherme Beltramini
Gulnur Baimukhambetova +
H L +
Hans
Hatim Zahid +
HighYoda +
Hiki +
Himanshu Wagh +
Hugo van Kemenade +
Idil Ismiguzel +
Irv Lustig
Isaac Chung
Isaac Virshup
JHM Darbyshire
JHM Darbyshire (iMac)
JMBurley
Jaime Di Cristina
Jan Koch
JanVHII +
Janosh Riebesell
JasmandeepKaur +
Jeremy Tuloup
Jessica M +
Jonas Haag
Joris Van den Bossche
João Meirelles +
Julia Aoun +
Justus Magin +
Kang Su Min +
Kevin Sheppard
Khor Chean Wei
Kian Eliasi
Kostya Farber +
KotlinIsland +
Lakmal Pinnaduwage +
Lakshya A Agrawal +
Lawrence Mitchell +
Levi Ob +
Loic Diridollou
Lorenzo Vainigli +
Luca Pizzini +
Lucas Damo +
Luke Manley
Madhuri Patil +
Marc Garcia
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Maren Westermann +
Maria Stazherova +
Marie K +
Marielle +
Mark Harfouche +
Marko Pacak +
Martin +
Matheus Cerqueira +
Matheus Pedroni +
Matteo Raso +
Matthew Roeschke
MeeseeksMachine +
Mehdi Mohammadi +
Michael Harris +
Michael Mior +
Natalia Mokeeva +
Neal Muppidi +
Nick Crews
Nishu Choudhary +
Noa Tamir
Noritada Kobayashi
Omkar Yadav +
P. Talley +
Pablo +
Pandas Development Team
Parfait Gasana
Patrick Hoefler
Pedro Nacht +
Philip +
Pietro Battiston
Pooja Subramaniam +
Pranav Saibhushan Ravuri +
Pranav. P. A +
Ralf Gommers +
RaphSku +
Richard Shadrach
Robsdedude +
Roger
Roger Thomas
RogerThomas +
SFuller4 +
Salahuddin +
Sam Rao
Sean Patrick Malloy +
Sebastian Roll +
Shantanu
Shashwat +
Shashwat Agrawal +
Shiko Wamwea +
Shoham Debnath
Shubhankar Lohani +
Siddhartha Gandhi +
Simon Hawkins
Soumik Dutta +
Sowrov Talukder +
Stefanie Molin
Stefanie Senger +
Stepfen Shawn +
Steven Rotondo
Stijn Van Hoey
Sudhansu +
Sven
Sylvain MARIE
Sylvain Marié
Tabea Kossen +
Taylor Packard
Terji Petersen
Thierry Moisan
Thomas H +
Thomas Li
Torsten Wörtwein
Tsvika S +
Tsvika Shapira +
Vamsi Verma +
Vinicius Akira +
William Andrea
William Ayd
William Blum +
Wilson Xing +
Xiao Yuan +
Xnot +
Yasin Tatar +
Yuanhao Geng
Yvan Cywan +
Zachary Moon +
Zhengbo Wang +
abonte +
adrienpacifico +
alm
amotzop +
andyjessen +
anonmouse1 +
bang128 +
bishwas jha +
calhockemeyer +
carla-alves-24 +
carlotta +
casadipietra +
catmar22 +
cfabian +
codamuse +
dataxerik
davidleon123 +
dependabot[bot] +
fdrocha +
github-actions[bot]
himanshu_wagh +
iofall +
jakirkham +
jbrockmendel
jnclt +
joelchen +
joelsonoda +
joshuabello2550
joycewamwea +
kathleenhang +
krasch +
ltoniazzi +
luke396 +
milosz-martynow +
minat-hub +
mliu08 +
monosans +
nealxm
nikitaved +
paradox-lab +
partev
raisadz +
ram vikram singh +
rebecca-palmer
sarvaSanjay +
seljaks +
silviaovo +
smij720 +
soumilbaldota +
stellalin7 +
strawberry beach sandals +
tmoschou +
uzzell +
yqyqyq-W +
yun +
Ádám Lippai
김동현 (Daniel Donghyun Kim) +