What’s new in 2.0.0 (??)#

These are the changes in pandas 2.0.0. See Release notes for a full changelog including other versions of pandas.

Enhancements#

Installing optional dependencies with pip extras#

When installing pandas using pip, sets of optional dependencies can also be installed by specifying extras.

pip install "pandas[performance, aws]>=2.0.0"

The available extras, found in the installation guide, are [all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, sql-other, html, xml, plot, output_formatting, clipboard, compression, test] (GH39164).

Configuration option, mode.dtype_backend, to return pyarrow-backed dtypes#

The use_nullable_dtypes keyword argument has been expanded to the following functions to enable automatic conversion to nullable dtypes (GH36712)

To simplify opting-in to nullable dtypes for these functions, a new option nullable_dtypes was added that allows setting the keyword argument globally to True if not specified directly. The option can be enabled through:

In [1]: pd.options.mode.nullable_dtypes = True

The option will only work for functions with the keyword use_nullable_dtypes.

Additionally, a new global configuration, mode.dtype_backend, can now be used in conjunction with the parameter use_nullable_dtypes=True in the following functions to select the nullable dtypes implementation.

And the following methods will also utilize the mode.dtype_backend option.

By default, mode.dtype_backend is set to "pandas" to return existing, numpy-backed nullable dtypes, but it can also be set to "pyarrow" to return pyarrow-backed, nullable ArrowDtype (GH48957, GH49997).

In [2]: import io

In [3]: data = io.StringIO("""a,b,c,d,e,f,g,h,i
   ...:     1,2.5,True,a,,,,,
   ...:     3,4.5,False,b,6,7.5,True,a,
   ...: """)
   ...: 

In [4]: with pd.option_context("mode.dtype_backend", "pandas"):
   ...:     df = pd.read_csv(data, use_nullable_dtypes=True)
   ...: 

In [5]: df.dtypes
Out[5]: 
a             Int64
b           Float64
c           boolean
d    string[python]
e             Int64
f           Float64
g           boolean
h    string[python]
i             Int64
dtype: object

In [6]: data.seek(0)
Out[6]: 0

In [7]: with pd.option_context("mode.dtype_backend", "pyarrow"):
   ...:     df_pyarrow = pd.read_csv(data, use_nullable_dtypes=True, engine="pyarrow")
   ...: 

In [8]: df_pyarrow.dtypes
Out[8]: 
a     int64[pyarrow]
b    double[pyarrow]
c      bool[pyarrow]
d    string[pyarrow]
e     int64[pyarrow]
f    double[pyarrow]
g      bool[pyarrow]
h    string[pyarrow]
i      null[pyarrow]
dtype: object

Copy-on-Write improvements#

Copy-on-Write can be enabled through

pd.set_option("mode.copy_on_write", True)
pd.options.mode.copy_on_write = True

Alternatively, copy on write can be enabled locally through:

with pd.option_context("mode.copy_on_write", True):
    ...
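A minimal sketch of the effect, assuming the option is available in your build (variable names here are illustrative):

```python
import pandas as pd

# With Copy-on-Write enabled, an object derived from a DataFrame behaves
# as a lazy copy: modifying the child never mutates the parent.
pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
subset = df["a"]       # no data is copied yet
subset.iloc[0] = 100   # the shared data is copied here, on write

print(df["a"].tolist())  # the parent DataFrame is unchanged: [1, 2, 3]
```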

Other enhancements#

Notable bug fixes#

These are bug fixes that might have notable behavior changes.

DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() overflow instead of lossy casting to float#

In previous versions, we cast to float when applying cumsum and cumprod, which led to incorrect results even when the result could be held by int64 dtype. Additionally, the aggregation now overflows consistently with numpy and the regular DataFrame.cumprod() and DataFrame.cumsum() methods when the limit of int64 is reached (GH37493).

Old Behavior

In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [2]: df.groupby("key")["value"].cumprod()[5]
Out[2]: 5.960464477539062e+16

We returned incorrect results with the 6th value.

New Behavior

In [9]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})

In [10]: df.groupby("key")["value"].cumprod()
Out[10]: 
0                   625
1                390625
2             244140625
3          152587890625
4        95367431640625
5     59604644775390625
6    359414837200037393
Name: value, dtype: int64

We overflow with the 7th value, but the 6th value is still correct.
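For reference, this is a sketch (not part of the original example) of the plain numpy int64 semantics that the groupby result now matches: the cumulative product wraps around past the int64 limit instead of being cast to float.

```python
import numpy as np

# The same products computed directly with numpy: int64 arithmetic
# wraps around on overflow rather than being cast to float.
arr = np.array([625] * 7, dtype=np.int64)
out = np.cumprod(arr)
print(out[5])  # 59604644775390625 -- the last exact value
print(out[6])  # wrapped around past the int64 limit
```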

DataFrameGroupBy.nth() and SeriesGroupBy.nth() now behave as filtrations#

In previous versions of pandas, DataFrameGroupBy.nth() and SeriesGroupBy.nth() acted as if they were aggregations. However, for most inputs n, they may return either zero or multiple rows per group. This means that they are filtrations, similar to e.g. DataFrameGroupBy.head(). pandas now treats them as filtrations (GH13666).

In [11]: df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [np.nan, 2.0, 3.0, 4.0, 5.0]})

In [12]: gb = df.groupby("a")

Old Behavior

In [5]: gb.nth(n=1)
Out[5]:
   a    b
1  1  2.0
4  2  5.0

New Behavior

In [13]: gb.nth(n=1)
Out[13]: 
   a    b
1  1  2.0
4  2  5.0

In particular, the index of the result is derived from the input by selecting the appropriate rows. Also, when n is larger than the group, no rows are returned instead of NaN.

Old Behavior

In [5]: gb.nth(n=3, dropna="any")
Out[5]:
    b
a
1 NaN
2 NaN

New Behavior

In [14]: gb.nth(n=3, dropna="any")
Out[14]: 
Empty DataFrame
Columns: [a, b]
Index: []
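If existing code relied on the old aggregation-like shape, here is a hedged sketch of both styles side by side:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [np.nan, 2.0, 3.0, 4.0, 5.0]})
gb = df.groupby("a")

# nth() is now a filtration: the result keeps the original row index.
second_rows = gb.nth(1)

# For a one-row-per-group result indexed by the group key, true
# aggregations such as first() remain available.
first_b = gb["b"].first()
```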

Backwards incompatible API changes#

Construction with datetime64 or timedelta64 dtype with unsupported resolution#

In past versions, when constructing a Series or DataFrame and passing a “datetime64” or “timedelta64” dtype with unsupported resolution (i.e. anything other than “ns”), pandas would silently replace the given dtype with its nanosecond analogue:

Previous behavior:

In [5]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[5]:
0   2016-01-01
dtype: datetime64[ns]

In [6]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
Out[6]:
0   2016-01-01
dtype: datetime64[ns]

In pandas 2.0 we support resolutions “s”, “ms”, “us”, and “ns”. When passing a supported dtype (e.g. “datetime64[s]”), the result now has exactly the requested dtype:

New behavior:

In [15]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[15]: 
0   2016-01-01
dtype: datetime64[s]

With an unsupported dtype, pandas now raises instead of silently swapping in a supported dtype:

New behavior:

In [16]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 pd.Series(["2016-01-01"], dtype="datetime64[D]")

File ~/work/pandas/pandas/pandas/core/series.py:486, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
    484         data = data.copy()
    485 else:
--> 486     data = sanitize_array(data, index, dtype, copy)
    488     manager = get_option("mode.data_manager")
    489     if manager == "block":

File ~/work/pandas/pandas/pandas/core/construction.py:602, in sanitize_array(data, index, dtype, copy, allow_2d)
    599     subarr = np.array([], dtype=np.float64)
    601 elif dtype is not None:
--> 602     subarr = _try_cast(data, dtype, copy)
    604 else:
    605     subarr = maybe_convert_platform(data)

File ~/work/pandas/pandas/pandas/core/construction.py:759, in _try_cast(arr, dtype, copy)
    754     return lib.ensure_string_array(arr, convert_na_value=False, copy=copy).reshape(
    755         shape
    756     )
    758 elif dtype.kind in ["m", "M"]:
--> 759     return maybe_cast_to_datetime(arr, dtype)
    761 # GH#15832: Check if we are requesting a numeric dtype and
    762 # that we can convert the data to the requested dtype.
    763 elif is_integer_dtype(dtype):
    764     # this will raise if we have e.g. floats

File ~/work/pandas/pandas/pandas/core/dtypes/cast.py:1201, in maybe_cast_to_datetime(value, dtype)
   1197     raise TypeError("value must be listlike")
   1199 # TODO: _from_sequence would raise ValueError in cases where
   1200 #  _ensure_nanosecond_dtype raises TypeError
-> 1201 _ensure_nanosecond_dtype(dtype)
   1203 if is_timedelta64_dtype(dtype):
   1204     res = TimedeltaArray._from_sequence(value, dtype=dtype)

File ~/work/pandas/pandas/pandas/core/dtypes/cast.py:1276, in _ensure_nanosecond_dtype(dtype)
   1273     raise ValueError(msg)
   1274 # TODO: ValueError or TypeError? existing test
   1275 #  test_constructor_generic_timestamp_bad_frequency expects TypeError
-> 1276 raise TypeError(
   1277     f"dtype={dtype} is not supported. Supported resolutions are 's', "
   1278     "'ms', 'us', and 'ns'"
   1279 )

TypeError: dtype=datetime64[D] is not supported. Supported resolutions are 's', 'ms', 'us', and 'ns'

Disallow astype conversion to non-supported datetime64/timedelta64 dtypes#

In previous versions, converting a Series or DataFrame from datetime64[ns] to a different datetime64[X] dtype would return with datetime64[ns] dtype instead of the requested dtype. In pandas 2.0, support is added for “datetime64[s]”, “datetime64[ms]”, and “datetime64[us]” dtypes, so converting to those dtypes gives exactly the requested dtype:


In [17]: idx = pd.date_range("2016-01-01", periods=3)

In [18]: ser = pd.Series(idx)

Previous behavior:

In [4]: ser.astype("datetime64[s]")
Out[4]:
0   2016-01-01
1   2016-01-02
2   2016-01-03
dtype: datetime64[ns]

With the new behavior, we get exactly the requested dtype:

New behavior:

In [19]: ser.astype("datetime64[s]")
Out[19]: 
0   2016-01-01
1   2016-01-02
2   2016-01-03
dtype: datetime64[s]

For non-supported resolutions e.g. “datetime64[D]”, we raise instead of silently ignoring the requested dtype:

New behavior:

In [20]: ser.astype("datetime64[D]")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 ser.astype("datetime64[D]")

File ~/work/pandas/pandas/pandas/core/generic.py:6327, in NDFrame.astype(self, dtype, copy, errors)
   6320     results = [
   6321         self.iloc[:, i].astype(dtype, copy=copy)
   6322         for i in range(len(self.columns))
   6323     ]
   6325 else:
   6326     # else, only a single dtype is given
-> 6327     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6328     return self._constructor(new_data).__finalize__(self, method="astype")
   6330 # GH 33113: handle empty frame or series

File ~/work/pandas/pandas/pandas/core/internals/managers.py:439, in BaseBlockManager.astype(self, dtype, copy, errors)
    438 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 439     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

File ~/work/pandas/pandas/pandas/core/internals/managers.py:349, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    347         applied = b.apply(f, **kwargs)
    348     else:
--> 349         applied = getattr(b, f)(**kwargs)
    350     result_blocks = extend_blocks(applied, result_blocks)
    352 out = type(self).from_blocks(result_blocks, self.axes)

File ~/work/pandas/pandas/pandas/core/internals/blocks.py:489, in Block.astype(self, dtype, copy, errors)
    471 """
    472 Coerce to the new dtype.
    473 
   (...)
    485 Block
    486 """
    487 values = self.values
--> 489 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    491 new_values = maybe_coerce_values(new_values)
    492 newb = self.make_block(new_values)

File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:239, in astype_array_safe(values, dtype, copy, errors)
    236     dtype = dtype.numpy_dtype
    238 try:
--> 239     new_values = astype_array(values, dtype, copy=copy)
    240 except (ValueError, TypeError):
    241     # e.g. _astype_nansafe can fail on object-dtype of strings
    242     #  trying to convert to float
    243     if errors == "ignore":

File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:181, in astype_array(values, dtype, copy)
    177     return values
    179 if not isinstance(values, np.ndarray):
    180     # i.e. ExtensionArray
--> 181     values = values.astype(dtype, copy=copy)
    183 else:
    184     values = _astype_nansafe(values, dtype, copy=copy)

File ~/work/pandas/pandas/pandas/core/arrays/datetimes.py:700, in DatetimeArray.astype(self, dtype, copy)
    698 elif is_period_dtype(dtype):
    699     return self.to_period(freq=dtype.freq)
--> 700 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)

File ~/work/pandas/pandas/pandas/core/arrays/datetimelike.py:487, in DatetimeLikeArrayMixin.astype(self, dtype, copy)
    480 elif (
    481     is_datetime_or_timedelta_dtype(dtype)
    482     and not is_dtype_equal(self.dtype, dtype)
    483 ) or is_float_dtype(dtype):
    484     # disallow conversion between datetime/timedelta,
    485     # and conversions for any datetimelike to float
    486     msg = f"Cannot cast {type(self).__name__} to dtype {dtype}"
--> 487     raise TypeError(msg)
    488 else:
    489     return np.asarray(self, dtype=dtype)

TypeError: Cannot cast DatetimeArray to dtype datetime64[D]

For conversion from timedelta64[ns] dtypes, the old behavior converted to a floating point format.


In [21]: idx = pd.timedelta_range("1 Day", periods=3)

In [22]: ser = pd.Series(idx)

Previous behavior:

In [7]: ser.astype("timedelta64[s]")
Out[7]:
0     86400.0
1    172800.0
2    259200.0
dtype: float64

In [8]: ser.astype("timedelta64[D]")
Out[8]:
0    1.0
1    2.0
2    3.0
dtype: float64

The new behavior, as for datetime64, either gives exactly the requested dtype or raises:

New behavior:

In [23]: ser.astype("timedelta64[s]")
Out[23]: 
0   1 days 00:00:00
1   2 days 00:00:00
2   3 days 00:00:00
dtype: timedelta64[s]

In [24]: ser.astype("timedelta64[D]")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[24], line 1
----> 1 ser.astype("timedelta64[D]")

File ~/work/pandas/pandas/pandas/core/generic.py:6327, in NDFrame.astype(self, dtype, copy, errors)
   6320     results = [
   6321         self.iloc[:, i].astype(dtype, copy=copy)
   6322         for i in range(len(self.columns))
   6323     ]
   6325 else:
   6326     # else, only a single dtype is given
-> 6327     new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   6328     return self._constructor(new_data).__finalize__(self, method="astype")
   6330 # GH 33113: handle empty frame or series

File ~/work/pandas/pandas/pandas/core/internals/managers.py:439, in BaseBlockManager.astype(self, dtype, copy, errors)
    438 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 439     return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

File ~/work/pandas/pandas/pandas/core/internals/managers.py:349, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
    347         applied = b.apply(f, **kwargs)
    348     else:
--> 349         applied = getattr(b, f)(**kwargs)
    350     result_blocks = extend_blocks(applied, result_blocks)
    352 out = type(self).from_blocks(result_blocks, self.axes)

File ~/work/pandas/pandas/pandas/core/internals/blocks.py:489, in Block.astype(self, dtype, copy, errors)
    471 """
    472 Coerce to the new dtype.
    473 
   (...)
    485 Block
    486 """
    487 values = self.values
--> 489 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    491 new_values = maybe_coerce_values(new_values)
    492 newb = self.make_block(new_values)

File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:239, in astype_array_safe(values, dtype, copy, errors)
    236     dtype = dtype.numpy_dtype
    238 try:
--> 239     new_values = astype_array(values, dtype, copy=copy)
    240 except (ValueError, TypeError):
    241     # e.g. _astype_nansafe can fail on object-dtype of strings
    242     #  trying to convert to float
    243     if errors == "ignore":

File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:181, in astype_array(values, dtype, copy)
    177     return values
    179 if not isinstance(values, np.ndarray):
    180     # i.e. ExtensionArray
--> 181     values = values.astype(dtype, copy=copy)
    183 else:
    184     values = _astype_nansafe(values, dtype, copy=copy)

File ~/work/pandas/pandas/pandas/core/arrays/timedeltas.py:353, in TimedeltaArray.astype(self, dtype, copy)
    349         return type(self)._simple_new(
    350             res_values, dtype=res_values.dtype, freq=self.freq
    351         )
    352     else:
--> 353         raise ValueError(
    354             f"Cannot convert from {self.dtype} to {dtype}. "
    355             "Supported resolutions are 's', 'ms', 'us', 'ns'"
    356         )
    358 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy=copy)

ValueError: Cannot convert from timedelta64[ns] to timedelta64[D]. Supported resolutions are 's', 'ms', 'us', 'ns'

UTC and fixed-offset timezones default to standard-library tzinfo objects#

In previous versions, the default tzinfo object used to represent UTC was pytz.UTC. In pandas 2.0, we default to datetime.timezone.utc instead. Similarly, for timezones that represent fixed UTC offsets, we use datetime.timezone objects instead of pytz.FixedOffset objects (GH34916).

Previous behavior:

In [2]: ts = pd.Timestamp("2016-01-01", tz="UTC")
In [3]: type(ts.tzinfo)
Out[3]: pytz.UTC

In [4]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")
In [5]: type(ts2.tzinfo)
Out[5]: pytz._FixedOffset

New behavior:

In [25]: ts = pd.Timestamp("2016-01-01", tz="UTC")

In [26]: type(ts.tzinfo)
Out[26]: datetime.timezone

In [27]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")

In [28]: type(ts2.tzinfo)
Out[28]: datetime.timezone

For timezones that are neither UTC nor fixed offsets, e.g. “US/Pacific”, we continue to default to pytz objects.
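A quick sketch of checking the new defaults in code (the offset value is taken from the example above):

```python
import datetime
import pandas as pd

# UTC now maps to the standard-library tzinfo object:
ts = pd.Timestamp("2016-01-01", tz="UTC")
assert ts.tzinfo is datetime.timezone.utc

# Fixed offsets likewise use datetime.timezone instances:
ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")
assert isinstance(ts2.tzinfo, datetime.timezone)
assert ts2.tzinfo.utcoffset(None) == datetime.timedelta(hours=-7)
```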

Empty DataFrames/Series will now default to have a RangeIndex#

Before, constructing an empty Series or DataFrame (where data is None or an empty list-like argument) without specifying the axes (index=None, columns=None) would return the axes as an empty Index with object dtype.

Now, the axes return an empty RangeIndex.

Previous behavior:

In [8]: pd.Series().index
Out[8]:
Index([], dtype='object')

In [9]: pd.DataFrame().axes
Out[9]:
[Index([], dtype='object'), Index([], dtype='object')]

New behavior:

In [29]: pd.Series().index
Out[29]: RangeIndex(start=0, stop=0, step=1)

In [30]: pd.DataFrame().axes
Out[30]: [RangeIndex(start=0, stop=0, step=1), RangeIndex(start=0, stop=0, step=1)]
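A small sketch of the new default, plus one way to explicitly request the old object-dtype axes if code depends on them (illustrative, not required by pandas):

```python
import pandas as pd

# Empty construction now yields RangeIndex axes:
s = pd.Series()
assert isinstance(s.index, pd.RangeIndex)

# To keep an object-dtype index, pass it explicitly:
s_old = pd.Series(index=pd.Index([], dtype="object"))
assert s_old.index.dtype == object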

DataFrame to LaTeX has a new render engine#

The existing DataFrame.to_latex() has been restructured to utilise the extended implementation previously available under Styler.to_latex(). The argument signature is similar, albeit col_space has been removed since it is ignored by LaTeX engines. This render engine also requires jinja2 as a dependency, which needs to be installed, since rendering is based upon jinja2 templates.

The pandas options below are no longer used and will be removed in future releases. The alternative options giving similar functionality are indicated below:

  • display.latex.escape: replaced with styler.format.escape,

  • display.latex.longtable: replaced with styler.latex.environment,

  • display.latex.multicolumn, display.latex.multicolumn_format and display.latex.multirow: replaced with styler.sparse.rows, styler.sparse.columns, styler.latex.multirow_align and styler.latex.multicol_align,

  • display.latex.repr: replaced with styler.render.repr,

  • display.max_rows and display.max_columns: replaced with styler.render.max_rows, styler.render.max_columns and styler.render.max_elements.

Note that the behaviour of _repr_latex_ is also changed. Previously, setting display.latex.repr would generate LaTeX output only when using nbconvert for a Jupyter Notebook, and not when the user was running the notebook itself. Now the styler.render.repr option allows control of the specific output within Jupyter Notebooks directly (not just on nbconvert). See GH39911.
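As a sketch, the replacement options listed above can be set like any other pandas option (the values chosen here are illustrative):

```python
import pandas as pd

# Migrating two of the removed display.latex.* options to their
# styler.* replacements:
pd.set_option("styler.format.escape", "latex")          # was display.latex.escape
pd.set_option("styler.latex.environment", "longtable")  # was display.latex.longtable
```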

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package             Minimum Version   Required   Changed
mypy (dev)          0.991                        X
pytest (dev)        7.0.0                        X
pytest-xdist (dev)  2.2.0                        X
hypothesis (dev)    6.34.2                       X
python-dateutil     2.8.2             X          X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package       Minimum Version   Changed
pyarrow       6.0.0             X
matplotlib    3.6.1             X
fastparquet   0.6.3             X
xarray        0.21.0            X

See Dependencies and Optional dependencies for more.

Datetimes are now parsed with a consistent format#

In the past, to_datetime() guessed the format for each element independently. This was appropriate for some cases where elements had mixed date formats - however, it would regularly cause problems when users expected a consistent format but the function would switch formats between elements. As of version 2.0.0, parsing will use a consistent format, determined by the first non-NA value (unless the user specifies a format, in which case that is used).

Old behavior:

In [1]: ser = pd.Series(['13-01-2000', '12-01-2000'])
In [2]: pd.to_datetime(ser)
Out[2]:
0   2000-01-13
1   2000-12-01
dtype: datetime64[ns]

New behavior:

In [31]: ser = pd.Series(['13-01-2000', '12-01-2000'])

In [32]: pd.to_datetime(ser)
Out[32]: 
0   2000-01-13
1   2000-01-12
dtype: datetime64[ns]

Note that this affects read_csv() as well.

If you still need to parse dates with inconsistent formats, you’ll need to apply to_datetime() to each element individually, e.g.

ser = pd.Series(['13-01-2000', '12 January 2000'])
ser.apply(pd.to_datetime)
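Alternatively, supplying an explicit format sidesteps format guessing entirely, e.g. for day-first dates like those above:

```python
import pandas as pd

# An explicit format removes any ambiguity in day-first dates:
ser = pd.Series(["13-01-2000", "12-01-2000"])
parsed = pd.to_datetime(ser, format="%d-%m-%Y")
# parsed[1] is 2000-01-12 (day 12, month 1), not 2000-12-01
```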

Other API changes#

  • The freq, tz, nanosecond, and unit keywords in the Timestamp constructor are now keyword-only (GH45307, GH32526)

  • Passing nanoseconds greater than 999 or less than 0 in Timestamp now raises a ValueError (GH48538, GH48255)

  • read_csv(): specifying an incorrect number of columns with index_col now raises ParserError instead of IndexError when using the c parser.

  • Default value of dtype in get_dummies() is changed to bool from uint8 (GH45848)

  • DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting datetime64 data to any of “datetime64[s]”, “datetime64[ms]”, “datetime64[us]” will return an object with the given resolution instead of coercing back to “datetime64[ns]” (GH48928)

  • DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting timedelta64 data to any of “timedelta64[s]”, “timedelta64[ms]”, “timedelta64[us]” will return an object with the given resolution instead of coercing to “float64” dtype (GH48963)

  • DatetimeIndex.astype(), TimedeltaIndex.astype(), PeriodIndex.astype(), Series.astype(), DataFrame.astype() with datetime64, timedelta64 or PeriodDtype dtypes no longer allow converting to integer dtypes other than “int64”, do obj.astype('int64', copy=False).astype(dtype) instead (GH49715)

  • Index.astype() now allows casting from float64 dtype to datetime-like dtypes, matching Series behavior (GH49660)

  • Passing data with dtype of “timedelta64[s]”, “timedelta64[ms]”, or “timedelta64[us]” to TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to “timedelta64[ns]”; timedelta64 data with lower resolution will be cast to the lowest supported resolution “timedelta64[s]” (GH49014)

  • Passing dtype of “timedelta64[s]”, “timedelta64[ms]”, or “timedelta64[us]” to TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to “timedelta64[ns]”; passing a dtype with lower resolution for Series or DataFrame will be cast to the lowest supported resolution “timedelta64[s]” (GH49014)

  • Passing a np.datetime64 object with non-nanosecond resolution to Timestamp will retain the input resolution if it is “s”, “ms”, “us”, or “ns”; otherwise it will be cast to the closest supported resolution (GH49008)

  • Passing datetime64 values with resolution other than nanosecond to to_datetime() will retain the input resolution if it is “s”, “ms”, “us”, or “ns”; otherwise it will be cast to the closest supported resolution (GH50369)

  • Passing a string in ISO-8601 format to Timestamp will retain the resolution of the parsed input if it is “s”, “ms”, “us”, or “ns”; otherwise it will be cast to the closest supported resolution (GH49737)

  • The other argument in DataFrame.mask() and Series.mask() now defaults to no_default instead of np.nan consistent with DataFrame.where() and Series.where(). Entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes). (GH49111)

  • Changed behavior of Series.quantile() and DataFrame.quantile() with SparseDtype to retain sparse dtype (GH49583)

  • When creating a Series with a object-dtype Index of datetime objects, pandas no longer silently converts the index to a DatetimeIndex (GH39307, GH23598)

  • Series.unique() with dtype “timedelta64[ns]” or “datetime64[ns]” now returns TimedeltaArray or DatetimeArray instead of numpy.ndarray (GH49176)

  • to_datetime() and DatetimeIndex now allow sequences containing both datetime objects and numeric entries, matching Series behavior (GH49037, GH50453)

  • pandas.api.types.is_string_dtype() now only returns True for array-likes with dtype=object when the elements are inferred to be strings (GH15585)

  • Passing a sequence containing datetime objects and date objects to Series constructor will return with object dtype instead of datetime64[ns] dtype, consistent with Index behavior (GH49341)

  • Passing strings that cannot be parsed as datetimes to Series or DataFrame with dtype="datetime64[ns]" will raise instead of silently ignoring the keyword and returning object dtype (GH24435)

  • Passing a sequence containing a type that cannot be converted to Timedelta to to_timedelta() or to the Series or DataFrame constructor with dtype="timedelta64[ns]" or to TimedeltaIndex now raises TypeError instead of ValueError (GH49525)

  • Changed behavior of Index constructor with sequence containing at least one NaT and everything else either None or NaN to infer datetime64[ns] dtype instead of object, matching Series behavior (GH49340)

  • read_stata() with parameter index_col set to None (the default) will now set the index on the returned DataFrame to a RangeIndex instead of an Int64Index (GH49745)

  • Changed behavior of Index, Series, and DataFrame arithmetic methods when working with object-dtypes, the results no longer do type inference on the result of the array operations, use result.infer_objects(copy=False) to do type inference on the result (GH49999, GH49714)

  • Changed behavior of Index constructor with an object-dtype numpy.ndarray containing all-bool values or all-complex values, this will now retain object dtype, consistent with the Series behavior (GH49594)

  • Added "None" to default na_values in read_csv() (GH50286)

  • Changed behavior of Series and DataFrame constructors when given an integer dtype and floating-point data that is not round numbers, this now raises ValueError instead of silently retaining the float dtype; do Series(data) or DataFrame(data) to get the old behavior, and Series(data).astype(dtype) or DataFrame(data).astype(dtype) to get the specified dtype (GH49599)

  • Changed behavior of DataFrame.shift() with axis=1, an integer fill_value, and homogeneous datetime-like dtype, this now fills new columns with integer dtypes instead of casting to datetimelike (GH49842)

  • Files are now closed when encountering an exception in read_json() (GH49921)

  • Changed behavior of read_csv(), read_json() & read_fwf(), where the index will now always be a RangeIndex when no index is specified. Previously the index would be an Index with dtype object if the new DataFrame/Series had length 0 (GH49572)

  • DataFrame.values(), DataFrame.to_numpy(), DataFrame.xs(), DataFrame.reindex(), DataFrame.fillna(), and DataFrame.replace() no longer silently consolidate the underlying arrays; do df = df.copy() to ensure consolidation (GH49356)

  • Creating a new DataFrame using a full slice on both axes with loc or iloc (thus, df.loc[:, :] or df.iloc[:, :]) now returns a new DataFrame (shallow copy) instead of the original DataFrame, consistent with other methods to get a full slice (for example df.loc[:] or df[:]) (GH49469)

  • Disallow computing cumprod for Timedelta object; previously this returned incorrect values (GH50246)

  • Instantiating an Index with a numeric numpy dtype with data containing NA and/or NaT now raises a ValueError. Previously a TypeError was raised (GH51050)

  • Loading a JSON file with duplicate columns using read_json(orient='split') renames columns to avoid duplicates, as read_csv() and the other readers do (GH50370)

  • The levels of the index of the Series returned from Series.sparse.from_coo now always have dtype int32. Previously they had dtype int64 (GH50926)

  • to_datetime() with unit of either “Y” or “M” will now raise if a sequence contains a non-round float value, matching the Timestamp behavior (GH50301)
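One change from the list above in action: a sketch of the new get_dummies() default dtype.

```python
import pandas as pd

# get_dummies() now produces bool columns by default instead of uint8:
dummies = pd.get_dummies(pd.Series(["a", "b", "a"]))
# dummies.dtypes -> all bool
```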

Deprecations#

Removal of prior version deprecations/changes#

  • Removed deprecated Timestamp.freq, Timestamp.freqstr and argument freq from the Timestamp constructor and Timestamp.fromordinal() (GH14146)

  • Removed deprecated CategoricalBlock, Block.is_categorical(), require datetime64 and timedelta64 values to be wrapped in DatetimeArray or TimedeltaArray before passing to Block.make_block_same_class(), require DatetimeTZBlock.values to have the correct ndim when passing to the BlockManager constructor, and removed the “fastpath” keyword from the SingleBlockManager constructor (GH40226, GH40571)

  • Removed deprecated global option use_inf_as_null in favor of use_inf_as_na (GH17126)

  • Removed deprecated module pandas.core.index (GH30193)

  • Removed deprecated alias pandas.core.tools.datetimes.to_time, import the function directly from pandas.core.tools.times instead (GH34145)

  • Removed deprecated alias pandas.io.json.json_normalize, use pandas.json_normalize() instead (GH27615)

  • Removed deprecated Categorical.to_dense(), use np.asarray(cat) instead (GH32639)

  • Removed deprecated Categorical.take_nd() (GH27745)

  • Removed deprecated Categorical.mode(), use Series(cat).mode() instead (GH45033)

  • Removed deprecated Categorical.is_dtype_equal() and CategoricalIndex.is_dtype_equal() (GH37545)

  • Removed deprecated CategoricalIndex.take_nd() (GH30702)

  • Removed deprecated Index.is_type_compatible() (GH42113)

  • Removed deprecated Index.is_mixed(), check index.inferred_type directly instead (GH32922)

  • Removed deprecated pandas.api.types.is_categorical(); use pandas.api.types.is_categorical_dtype() instead (GH33385)

  • Removed deprecated Index.asi8() (GH37877)

  • Enforced deprecation changing behavior when passing datetime64[ns] dtype data and timezone-aware dtype to Series, interpreting the values as wall-times instead of UTC times, matching DatetimeIndex behavior (GH41662)

  • Enforced deprecation changing behavior when applying a numpy ufunc on multiple non-aligned (on the index or columns) DataFrames, which will now align the inputs first (GH39239)

  • Removed deprecated DataFrame._AXIS_NUMBERS(), DataFrame._AXIS_NAMES(), Series._AXIS_NUMBERS(), Series._AXIS_NAMES() (GH33637)

  • Removed deprecated Index.to_native_types(), use obj.astype(str) instead (GH36418)

  • Removed deprecated Series.iteritems(), DataFrame.iteritems(), use obj.items instead (GH45321)

  • Removed deprecated DataFrame.lookup() (GH35224)

  • Removed deprecated Series.append(), DataFrame.append(), use concat() instead (GH35407)

  • Removed deprecated HDFStore.iteritems(), use obj.items instead (GH45321)

  • Removed deprecated DatetimeIndex.union_many() (GH45018)

  • Removed deprecated weekofyear and week attributes of DatetimeArray, DatetimeIndex and dt accessor in favor of isocalendar().week (GH33595)

  • Removed deprecated RangeIndex._start(), RangeIndex._stop(), RangeIndex._step(), use start, stop, step instead (GH30482)

  • Removed deprecated DatetimeIndex.to_perioddelta(); use dtindex - dtindex.to_period(freq).to_timestamp() instead (GH34853)

  • Removed deprecated Styler.hide_index() and Styler.hide_columns() (GH49397)

  • Removed deprecated Styler.set_na_rep() and Styler.set_precision() (GH49397)

  • Removed deprecated Styler.where() (GH49397)

  • Removed deprecated Styler.render() (GH49397)

  • Removed deprecated argument col_space in DataFrame.to_latex() (GH47970)

  • Removed deprecated argument null_color in Styler.highlight_null() (GH49397)

  • Removed deprecated argument check_less_precise in testing.assert_frame_equal(), testing.assert_extension_array_equal(), testing.assert_series_equal(), testing.assert_index_equal() (GH30562)

  • Removed deprecated null_counts argument in DataFrame.info(). Use show_counts instead (GH37999)

  • Removed deprecated Index.is_monotonic() and Series.is_monotonic(); use obj.is_monotonic_increasing instead (GH45422)

  • Removed deprecated Index.is_all_dates() (GH36697)

  • Enforced deprecation disallowing passing a timezone-aware Timestamp and dtype="datetime64[ns]" to Series or DataFrame constructors (GH41555)

  • Enforced deprecation disallowing passing a sequence of timezone-aware values and dtype="datetime64[ns]" to Series or DataFrame constructors (GH41555)

  • Enforced deprecation disallowing numpy.ma.mrecords.MaskedRecords in the DataFrame constructor; pass {name: data[name] for name in data.dtype.names} instead (GH40363)

  • Enforced deprecation disallowing unit-less “datetime64” dtype in Series.astype() and DataFrame.astype() (GH47844)

  • Enforced deprecation disallowing using .astype to convert a datetime64[ns] Series, DataFrame, or DatetimeIndex to timezone-aware dtype, use obj.tz_localize or ser.dt.tz_localize instead (GH39258)

  • Enforced deprecation disallowing using .astype to convert a timezone-aware Series, DataFrame, or DatetimeIndex to timezone-naive datetime64[ns] dtype, use obj.tz_localize(None) or obj.tz_convert("UTC").tz_localize(None) instead (GH39258)

  • Enforced deprecation disallowing passing a non-boolean argument to sort in concat() (GH44629)

  • Removed Date parser functions parse_date_time(), parse_date_fields(), parse_all_fields() and generic_parser() (GH24518)

  • Removed argument index from the core.arrays.SparseArray constructor (GH43523)

  • Removed argument squeeze from DataFrame.groupby() and Series.groupby() (GH32380)

  • Removed deprecated apply, apply_index, __call__, onOffset, and isAnchored attributes from DateOffset (GH34171)

  • Removed keep_tz argument in DatetimeIndex.to_series() (GH29731)

  • Removed arguments names and dtype from Index.copy() and levels and codes from MultiIndex.copy() (GH35853, GH36685)

  • Removed argument inplace from MultiIndex.set_levels() and MultiIndex.set_codes() (GH35626)

  • Removed arguments verbose and encoding from DataFrame.to_excel() and Series.to_excel() (GH47912)

  • Removed argument line_terminator from DataFrame.to_csv() and Series.to_csv(), use lineterminator instead (GH45302)

  • Removed argument inplace from DataFrame.set_axis() and Series.set_axis(), use obj = obj.set_axis(..., copy=False) instead (GH48130)

  • Disallow passing positional arguments to MultiIndex.set_levels() and MultiIndex.set_codes() (GH41485)

  • Disallow parsing to Timedelta strings with components with units “Y”, “y”, or “M”, as these do not represent unambiguous durations (GH36838)

  • Removed MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth() (GH38701)

  • Removed argument how from PeriodIndex.astype(), use PeriodIndex.to_timestamp() instead (GH37982)

  • Removed argument try_cast from DataFrame.mask(), DataFrame.where(), Series.mask() and Series.where() (GH38836)

  • Removed argument tz from Period.to_timestamp(), use obj.to_timestamp(...).tz_localize(tz) instead (GH34522)

  • Removed argument sort_columns in DataFrame.plot() and Series.plot() (GH47563)

  • Removed argument is_copy from DataFrame.take() and Series.take() (GH30615)

  • Removed argument kind from Index.get_slice_bound(), Index.slice_indexer() and Index.slice_locs() (GH41378)

  • Removed arguments prefix, squeeze, error_bad_lines and warn_bad_lines from read_csv() (GH40413, GH43427)

  • Removed argument datetime_is_numeric from DataFrame.describe() and Series.describe() as datetime data will always be summarized as numeric data (GH34798)

  • Disallow passing list key to Series.xs() and DataFrame.xs(), pass a tuple instead (GH41789)

  • Disallow subclass-specific keywords (e.g. “freq”, “tz”, “names”, “closed”) in the Index constructor (GH38597)

  • Removed argument inplace from Categorical.remove_unused_categories() (GH37918)

  • Disallow passing non-round floats to Timestamp with unit="M" or unit="Y" (GH47266)

  • Removed keywords convert_float and mangle_dupe_cols from read_excel() (GH41176)

  • Removed keyword mangle_dupe_cols from read_csv() and read_table() (GH48137)

  • Removed errors keyword from DataFrame.where(), Series.where(), DataFrame.mask() and Series.mask() (GH47728)

  • Disallow passing non-keyword arguments to read_excel() except io and sheet_name (GH34418)

  • Disallow passing non-keyword arguments to DataFrame.drop() and Series.drop() except labels (GH41486)

  • Disallow passing non-keyword arguments to DataFrame.fillna() and Series.fillna() except value (GH41485)

  • Disallow passing non-keyword arguments to StringMethods.split() and StringMethods.rsplit() except for pat (GH47448)

  • Disallow passing non-keyword arguments to DataFrame.set_index() except keys (GH41495)

  • Disallow passing non-keyword arguments to Resampler.interpolate() except method (GH41699)

  • Disallow passing non-keyword arguments to DataFrame.reset_index() and Series.reset_index() except level (GH41496)

  • Disallow passing non-keyword arguments to DataFrame.dropna() and Series.dropna() (GH41504)

  • Disallow passing non-keyword arguments to ExtensionArray.argsort() (GH46134)

  • Disallow passing non-keyword arguments to Categorical.sort_values() (GH47618)

  • Disallow passing non-keyword arguments to Index.drop_duplicates() and Series.drop_duplicates() (GH41485)

  • Disallow passing non-keyword arguments to DataFrame.drop_duplicates() except for subset (GH41485)

  • Disallow passing non-keyword arguments to DataFrame.sort_index() and Series.sort_index() (GH41506)

  • Disallow passing non-keyword arguments to DataFrame.interpolate() and Series.interpolate() except for method (GH41510)

  • Disallow passing non-keyword arguments to DataFrame.any() and Series.any() (GH44896)

  • Disallow passing non-keyword arguments to Index.set_names() except for names (GH41551)

  • Disallow passing non-keyword arguments to Index.join() except for other (GH46518)

  • Disallow passing non-keyword arguments to concat() except for objs (GH41485)

  • Disallow passing non-keyword arguments to pivot() except for data (GH48301)

  • Disallow passing non-keyword arguments to DataFrame.pivot() (GH48301)

  • Disallow passing non-keyword arguments to read_html() except for io (GH27573)

  • Disallow passing non-keyword arguments to read_json() except for path_or_buf (GH27573)

  • Disallow passing non-keyword arguments to read_sas() except for filepath_or_buffer (GH47154)

  • Disallow passing non-keyword arguments to read_stata() except for filepath_or_buffer (GH48128)

  • Disallow passing non-keyword arguments to read_csv() except filepath_or_buffer (GH41485)

  • Disallow passing non-keyword arguments to read_table() except filepath_or_buffer (GH41485)

  • Disallow passing non-keyword arguments to read_fwf() except filepath_or_buffer (GH44710)

  • Disallow passing non-keyword arguments to read_xml() except for path_or_buffer (GH45133)

  • Disallow passing non-keyword arguments to Series.mask() and DataFrame.mask() except cond and other (GH41580)

  • Disallow passing non-keyword arguments to DataFrame.to_stata() except for path (GH48128)

  • Disallow passing non-keyword arguments to DataFrame.where() and Series.where() except for cond and other (GH41523)

  • Disallow passing non-keyword arguments to Series.set_axis() and DataFrame.set_axis() except for labels (GH41491)

  • Disallow passing non-keyword arguments to Series.rename_axis() and DataFrame.rename_axis() except for mapper (GH47587)

  • Disallow passing non-keyword arguments to Series.clip() and DataFrame.clip() (GH41511)

  • Disallow passing non-keyword arguments to Series.bfill(), Series.ffill(), DataFrame.bfill() and DataFrame.ffill() (GH41508)

  • Disallow passing non-keyword arguments to DataFrame.replace(), Series.replace() except for to_replace and value (GH47587)

  • Disallow passing non-keyword arguments to DataFrame.sort_values() except for by (GH41505)

  • Disallow passing non-keyword arguments to Series.sort_values() (GH41505)

  • Disallow passing non-keyword arguments to DataFrame.reindex() except for labels (GH17966)

  • Disallow Index.reindex() with non-unique Index objects (GH42568)

  • Disallowed constructing Categorical with scalar data (GH38433)

  • Disallowed constructing CategoricalIndex without passing data (GH38944)

  • Removed Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH43665)

  • Removed Rolling.win_type returning "freq" (GH38963)

  • Removed Rolling.is_datetimelike (GH38963)

  • Removed the level keyword in DataFrame and Series aggregations; use groupby instead (GH39983)

  • Removed deprecated Timedelta.delta(), Timedelta.is_populated(), and Timedelta.freq (GH46430, GH46476)

  • Removed deprecated NaT.freq (GH45071)

  • Removed deprecated Categorical.replace(), use Series.replace() instead (GH44929)

  • Removed the numeric_only keyword from Categorical.min() and Categorical.max() in favor of skipna (GH48821)

  • Changed behavior of DataFrame.median() and DataFrame.mean() with numeric_only=None to not exclude datetime-like columns (GH29941)

  • Removed is_extension_type() in favor of is_extension_array_dtype() (GH29457)

  • Removed ExponentialMovingWindow.vol (GH39220)

  • Removed Index.get_value() and Index.set_value() (GH33907, GH28621)

  • Removed Series.slice_shift() and DataFrame.slice_shift() (GH37601)

  • Removed DataFrameGroupBy.pad() and DataFrameGroupBy.backfill() (GH45076)

  • Removed the numpy argument from read_json() (GH30636)

  • Disallow passing abbreviations for orient in DataFrame.to_dict() (GH32516)

  • Disallow partial slicing on a non-monotonic DatetimeIndex with keys which are not in the Index. This now raises a KeyError (GH18531)

  • Removed get_offset in favor of to_offset() (GH30340)

  • Removed the warn keyword in infer_freq() (GH45947)

  • Removed the include_start and include_end arguments in DataFrame.between_time() in favor of inclusive (GH43248)

  • Removed the closed argument in date_range() and bdate_range() in favor of the inclusive argument (GH40245)

  • Removed the center keyword in DataFrame.expanding() (GH20647)

  • Removed the truediv keyword from eval() (GH29812)

  • Removed the method and tolerance arguments in Index.get_loc(). Use index.get_indexer([label], method=..., tolerance=...) instead (GH42269)

  • Removed the pandas.datetime submodule (GH30489)

  • Removed the pandas.np submodule (GH30296)

  • Removed pandas.util.testing in favor of pandas.testing (GH30745)

  • Removed Series.str.__iter__() (GH28277)

  • Removed pandas.SparseArray in favor of arrays.SparseArray (GH30642)

  • Removed pandas.SparseSeries and pandas.SparseDataFrame, including pickle support. (GH30642)

  • Enforced disallowing passing an integer fill_value to DataFrame.shift() and Series.shift() with datetime64, timedelta64, or period dtypes (GH32591)

  • Enforced disallowing passing a string column label into times in DataFrame.ewm() (GH43265)

  • Enforced disallowing passing True and False into inclusive in Series.between() in favor of "both" and "neither" respectively (GH40628)

  • Enforced disallowing using usecols with out of bounds indices for read_csv with engine="c" (GH25623)

  • Enforced disallowing the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH40430)

  • Enforced disallowing a tuple of column labels into DataFrameGroupBy.__getitem__() (GH30546)

  • Enforced disallowing missing labels when indexing with a sequence of labels on a level of a MultiIndex. This now raises a KeyError (GH42351)

  • Enforced disallowing setting values with .loc using a positional slice. Use .loc with labels or .iloc with positions instead (GH31840)

  • Enforced disallowing positional indexing with a float key even if that key is a round number, manually cast to integer instead (GH34193)

  • Enforced disallowing using a DataFrame indexer with .iloc, use .loc instead for automatic alignment (GH39022)

  • Enforced disallowing set or dict indexers in __getitem__ and __setitem__ methods (GH42825)

  • Enforced disallowing indexing on an Index or positional indexing on a Series producing multi-dimensional objects e.g. obj[:, None]; convert to numpy before indexing instead (GH35141)

  • Enforced disallowing dict or set objects in suffixes in merge() (GH34810)

  • Enforced disallowing merge() to produce duplicated columns through the suffixes keyword and already existing columns (GH22818)

  • Enforced disallowing using merge() or join() on a different number of levels (GH34862)

  • Enforced disallowing the value_name argument in DataFrame.melt() from matching an element in the DataFrame columns (GH35003)

  • Enforced disallowing passing showindex into **kwargs in DataFrame.to_markdown() and Series.to_markdown() in favor of index (GH33091)

  • Removed setting Categorical._codes directly (GH41429)

  • Removed setting Categorical.categories directly (GH47834)

  • Removed argument inplace from Categorical.add_categories(), Categorical.remove_categories(), Categorical.set_categories(), Categorical.rename_categories(), Categorical.reorder_categories(), Categorical.set_ordered(), Categorical.as_ordered(), Categorical.as_unordered() (GH37981, GH41118, GH41133, GH47834)

  • Enforced Rolling.count() with min_periods=None to default to the size of the window (GH31302)

  • Renamed fname to path in DataFrame.to_parquet(), DataFrame.to_stata() and DataFrame.to_feather() (GH30338)

  • Enforced disallowing indexing a Series with a single item list with a slice (e.g. ser[[slice(0, 2)]]). Either convert the list to tuple, or pass the slice directly instead (GH31333)

  • Changed behavior of indexing on a DataFrame with a DatetimeIndex index using a string indexer; previously this operated as a slice on rows, now it operates like any other column key; use frame.loc[key] for the old behavior (GH36179)

  • Enforced the display.max_colwidth option to not accept negative integers (GH31569)

  • Removed the display.column_space option in favor of df.to_string(col_space=...) (GH47280)

  • Removed the deprecated method mad from pandas classes (GH11787)

  • Removed the deprecated method tshift from pandas classes (GH11631)

  • Changed behavior of empty data passed into Series; the default dtype will be object instead of float64 (GH29405)

  • Changed the behavior of DatetimeIndex.union(), DatetimeIndex.intersection(), and DatetimeIndex.symmetric_difference() with mismatched timezones to convert to UTC instead of casting to object dtype (GH39328)

  • Changed the behavior of to_datetime() with argument “now” with utc=False to match Timestamp("now") (GH18705)

  • Changed the behavior of indexing on a timezone-aware DatetimeIndex with a timezone-naive datetime object or vice-versa; these now behave like any other non-comparable type by raising KeyError (GH36148)

  • Changed the behavior of Index.reindex(), Series.reindex(), and DataFrame.reindex() with a datetime64 dtype and a datetime.date object for fill_value; these are no longer considered equivalent to datetime.datetime objects so the reindex casts to object dtype (GH39767)

  • Changed behavior of SparseArray.astype() when given a dtype that is not explicitly SparseDtype, cast to the exact requested dtype rather than silently using a SparseDtype instead (GH34457)

  • Changed behavior of Index.ravel() to return a view on the original Index instead of a np.ndarray (GH36900)

  • Changed behavior of Series.to_frame() and Index.to_frame() with explicit name=None to use None for the column name instead of the index’s name or default 0 (GH45523)

  • Changed behavior of concat() with one array of bool-dtype and another of integer dtype, this now returns object dtype instead of integer dtype; explicitly cast the bool object to integer before concatenating to get the old behavior (GH45101)

  • Changed behavior of DataFrame constructor given floating-point data and an integer dtype, when the data cannot be cast losslessly, the floating point dtype is retained, matching Series behavior (GH41170)

  • Changed behavior of Index constructor when given a np.ndarray with object-dtype containing numeric entries; this now retains object dtype rather than inferring a numeric dtype, consistent with Series behavior (GH42870)

  • Changed behavior of Index.__and__(), Index.__or__() and Index.__xor__() to behave as logical operations (matching Series behavior) instead of aliases for set operations (GH37374)

  • Changed behavior of DataFrame constructor when passed a list whose first element is a Categorical, this now treats the elements as rows casting to object dtype, consistent with behavior for other types (GH38845)

  • Changed behavior of DataFrame constructor when passed a dtype (other than int) that the data cannot be cast to; it now raises instead of silently ignoring the dtype (GH41733)

  • Changed the behavior of Series constructor, it will no longer infer a datetime64 or timedelta64 dtype from string entries (GH41731)

  • Changed behavior of Timestamp constructor with a np.datetime64 object and a tz passed to interpret the input as a wall-time as opposed to a UTC time (GH42288)

  • Changed behavior of Timestamp.utcfromtimestamp() to return a timezone-aware object satisfying Timestamp.utcfromtimestamp(val).timestamp() == val (GH45083)

  • Changed behavior of Index constructor when passed a SparseArray or SparseDtype to retain that dtype instead of casting to numpy.ndarray (GH43930)

  • Changed behavior of setitem-like operations (__setitem__, fillna, where, mask, replace, insert, fill_value for shift) on an object with DatetimeTZDtype when using a value with a non-matching timezone, the value will be cast to the object’s timezone instead of casting both to object-dtype (GH44243)

  • Changed behavior of Index, Series, DataFrame constructors with floating-dtype data and a DatetimeTZDtype, the data are now interpreted as UTC-times instead of wall-times, consistent with how integer-dtype data are treated (GH45573)

  • Changed behavior of Series and DataFrame constructors with integer dtype and floating-point data containing NaN, this now raises IntCastingNaNError (GH40110)

  • Changed behavior of Series and DataFrame constructors with an integer dtype and values that are too large to losslessly cast to this dtype, this now raises ValueError (GH41734)

  • Changed behavior of Series and DataFrame constructors with an integer dtype and values having either datetime64 or timedelta64 dtypes, this now raises TypeError, use values.view("int64") instead (GH41770)

  • Removed the deprecated base and loffset arguments from pandas.DataFrame.resample(), pandas.Series.resample() and pandas.Grouper. Use offset or origin instead (GH31809)

  • Changed behavior of Series.fillna() and DataFrame.fillna() with timedelta64[ns] dtype and an incompatible fill_value; this now casts to object dtype instead of raising, consistent with the behavior with other dtypes (GH45746)

  • Change the default argument of regex for Series.str.replace() from True to False. Additionally, a single character pat with regex=True is now treated as a regular expression instead of a string literal. (GH36695, GH24804)

  • Changed behavior of DataFrame.any() and DataFrame.all() with bool_only=True; object-dtype columns with all-bool values will no longer be included, manually cast to bool dtype first (GH46188)

  • Changed behavior of DataFrame.max(), DataFrame.min(), DataFrame.mean(), DataFrame.median(), DataFrame.skew() and DataFrame.kurt() with axis=None to return a scalar applying the aggregation across both axes (GH45072)

  • Changed behavior of comparison of a Timestamp with a datetime.date object; these now compare as un-equal and raise on inequality comparisons, matching the datetime.datetime behavior (GH36131)

  • Changed behavior of comparison of NaT with a datetime.date object; these now raise on inequality comparisons (GH39196)

  • Enforced deprecation of silently dropping columns that raised a TypeError in Series.transform and DataFrame.transform when used with a list or dictionary (GH43740)

  • Changed behavior of DataFrame.apply() with list-like so that any partial failure will raise an error (GH43740)

  • Changed behavior of DataFrame.to_latex() to now use the Styler implementation via Styler.to_latex() (GH47970)

  • Changed behavior of Series.__setitem__() with an integer key and a Float64Index when the key is not present in the index; previously we treated the key as positional (behaving like series.iloc[key] = val), now we treat it as a label (behaving like series.loc[key] = val), consistent with Series.__getitem__() behavior (GH33469)

  • Removed na_sentinel argument from factorize(), Index.factorize(), and ExtensionArray.factorize() (GH47157)

  • Changed behavior of Series.diff() and DataFrame.diff() with ExtensionDtype dtypes whose arrays do not implement diff, these now raise TypeError rather than casting to numpy (GH31025)

  • Enforced deprecation of calling numpy “ufunc”s on DataFrame with method="outer"; this now raises NotImplementedError (GH36955)

  • Enforced deprecation disallowing passing numeric_only=True to Series reductions (rank, any, all, …) with non-numeric dtype (GH47500)

  • Changed behavior of DataFrameGroupBy.apply() and SeriesGroupBy.apply() so that group_keys is respected even if a transformer is detected (GH34998)

  • Comparisons between a DataFrame and a Series where the frame’s columns do not match the series’s index raise ValueError instead of automatically aligning, do left, right = left.align(right, axis=1, copy=False) before comparing (GH36795)

  • Enforced deprecation of numeric_only=None (the default) in DataFrame reductions that would silently drop columns that raised; numeric_only now defaults to False (GH41480)

  • Changed default of numeric_only to False in all DataFrame methods with that argument (GH46096, GH46906)

  • Changed default of numeric_only to False in Series.rank() (GH47561)

  • Enforced deprecation of silently dropping nuisance columns in groupby and resample operations when numeric_only=False (GH41475)

  • Enforced deprecation of silently dropping nuisance columns in Rolling, Expanding, and ExponentialMovingWindow ops. This will now raise an errors.DataError (GH42834)

  • Changed behavior in setting values with df.loc[:, foo] = bar or df.iloc[:, foo] = bar, these now always attempt to set values inplace before falling back to casting (GH45333)

  • Changed default of numeric_only in various DataFrameGroupBy methods; all methods now default to numeric_only=False (GH46072)

  • Changed default of numeric_only to False in Resampler methods (GH47177)

  • Using the method DataFrameGroupBy.transform() with a callable that returns DataFrames will align to the input’s index (GH47244)

  • When providing a list of columns of length one to DataFrame.groupby(), the keys that are returned by iterating over the resulting DataFrameGroupBy object will now be tuples of length one (GH47761)

  • Removed deprecated methods ExcelWriter.write_cells(), ExcelWriter.save(), ExcelWriter.cur_sheet(), ExcelWriter.handles(), ExcelWriter.path() (GH45795)

  • The ExcelWriter attribute book can no longer be set; it is still available to be accessed and mutated (GH48943)

  • Removed unused *args and **kwargs in Rolling, Expanding, and ExponentialMovingWindow ops (GH47851)

  • Removed the deprecated argument line_terminator from DataFrame.to_csv() (GH45302)

  • Removed the deprecated argument label from lreshape() (GH30219)

  • Arguments after expr in DataFrame.eval() and DataFrame.query() are keyword-only (GH47587)

  • Removed Index._get_attributes_dict() (GH50648)

  • Removed Series.__array_wrap__() (GH50648)
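Many of the removals above have direct replacements. As a brief sketch of two common migrations (the sample data here is made up): Series.append() is replaced by concat(), and iteritems() by items():

```python
import pandas as pd

s1 = pd.Series([1, 2])
s2 = pd.Series([3, 4])

# Series.append() was removed: build the same result with pd.concat()
combined = pd.concat([s1, s2], ignore_index=True)

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# DataFrame.iteritems() was removed: iterate (name, Series) pairs with items()
cols = [name for name, _col in df.items()]
```

Both replacements work identically on earlier pandas versions, so they can be adopted before upgrading.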

Performance improvements#

Bug fixes#

Categorical#

Datetimelike#

  • Bug in pandas.infer_freq(), raising TypeError when inferred on RangeIndex (GH47084)

  • Bug in to_datetime() incorrectly raising OverflowError with string arguments corresponding to large integers (GH50533)

  • Bug in to_datetime() was raising on invalid offsets with errors='coerce' and infer_datetime_format=True (GH48633)

  • Bug in DatetimeIndex constructor failing to raise when tz=None is explicitly specified in conjunction with timezone-aware dtype or data (GH48659)

  • Bug in subtracting a datetime scalar from DatetimeIndex failing to retain the original freq attribute (GH48818)

  • Bug in pandas.tseries.holiday.Holiday where a half-open date interval causes inconsistent return types from USFederalHolidayCalendar.holidays() (GH49075)

  • Bug in rendering DatetimeIndex and Series and DataFrame with timezone-aware dtypes with dateutil or zoneinfo timezones near daylight-savings transitions (GH49684)

  • Bug in to_datetime() was raising ValueError when parsing Timestamp, datetime.datetime, datetime.date, or np.datetime64 objects when non-ISO8601 format was passed (GH49298, GH50036)

  • Bug in to_datetime() was raising ValueError when parsing empty string and non-ISO8601 format was passed. Now, empty strings will be parsed as NaT, matching the behavior for ISO8601 formats (GH50251)

  • Bug in Timestamp was showing UserWarning, which was not actionable by users, when parsing non-ISO8601 delimited date strings (GH50232)

  • Bug in to_datetime() was showing misleading ValueError when parsing dates with format containing ISO week directive and ISO weekday directive (GH50308)

  • Bug in Timestamp.round() when the freq argument has zero-duration (e.g. “0ns”) returning incorrect results instead of raising (GH49737)

  • Bug in to_datetime() was not raising ValueError when invalid format was passed and errors was 'ignore' or 'coerce' (GH50266)

  • Bug in DateOffset was throwing TypeError when constructing with milliseconds and another super-daily argument (GH49897)

  • Bug in to_datetime() was not raising ValueError when parsing string with decimal date with format '%Y%m%d' (GH50051)

  • Bug in to_datetime() was not converting None to NaT when parsing mixed-offset date strings with ISO8601 format (GH50071)

  • Bug in to_datetime() was not returning input when parsing out-of-bounds date string with errors='ignore' and format='%Y%m%d' (GH14487)

  • Bug in to_datetime() was converting timezone-naive datetime.datetime to timezone-aware when parsing with timezone-aware strings, ISO8601 format, and utc=False (GH50254)

  • Bug in to_datetime() was throwing ValueError when parsing dates with ISO8601 format where some values were not zero-padded (GH21422)

  • Bug in to_datetime() was giving incorrect results when using format='%Y%m%d' and errors='ignore' (GH26493)

  • Bug in to_datetime() was failing to parse date strings 'today' and 'now' if format was not ISO8601 (GH50359)

  • Bug in Timestamp.utctimetuple() raising a TypeError (GH32174)

  • Bug in to_datetime() was raising ValueError when parsing mixed-offset Timestamp with errors='ignore' (GH50585)

  • Bug in to_datetime() was incorrectly handling floating-point inputs within 1 unit of the overflow boundaries (GH50183)

  • Bug in to_datetime() with unit of “Y” or “M” giving incorrect results, not matching pointwise Timestamp results (GH50870)
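Several of the to_datetime() fixes above concern strict parsing with an explicit format. A minimal sketch (inputs are made up) of how errors="coerce" interacts with a format string:

```python
import pandas as pd

# With an explicit format, unparseable entries become NaT under errors="coerce"
result = pd.to_datetime(
    ["2020-01-01", "not-a-date"], format="%Y-%m-%d", errors="coerce"
)
```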

Timedelta#

  • Bug in to_timedelta() raising error when input has nullable dtype Float64 (GH48796)

  • Bug in Timedelta constructor incorrectly raising instead of returning NaT when given a np.timedelta64("nat") (GH48898)

  • Bug in Timedelta constructor failing to raise when passed both a Timedelta object and keywords (e.g. days, seconds) (GH48898)

Timezones#

Numeric#

Conversion#

  • Bug in constructing Series with int64 dtype from a string list raising instead of casting (GH44923)

  • Bug in constructing Series with masked dtype and boolean values with NA raising (GH42137)

  • Bug in DataFrame.eval() incorrectly raising an AttributeError when there are negative values in function call (GH46471)

  • Bug in Series.convert_dtypes() not converting dtype to nullable dtype when Series contains NA and has dtype object (GH48791)

  • Bug where any ExtensionDtype subclass with kind="M" would be interpreted as a timezone type (GH34986)

  • Bug in arrays.ArrowExtensionArray that would raise NotImplementedError when passed a sequence of strings or binary (GH49172)

  • Bug in Series.astype() raising pyarrow.ArrowInvalid when converting from a non-pyarrow string dtype to a pyarrow numeric type (GH50430)

  • Bug in Series.to_numpy() converting to NumPy array before applying na_value (GH48951)

  • Bug in DataFrame.astype() not copying data when converting to pyarrow dtype (GH50984)

  • Bug in to_datetime() was not respecting exact argument when format was an ISO8601 format (GH12649)

  • Bug in TimedeltaArray.astype() raising TypeError when converting to a pyarrow duration type (GH49795)
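One of the fixes above concerns Series.convert_dtypes(). A small sketch (sample data is made up) of converting an object-dtype column with missing values to a nullable dtype:

```python
import pandas as pd

# object-dtype integers with a missing value convert to the nullable Int64 dtype
ser = pd.Series([1, 2, None], dtype=object)
converted = ser.convert_dtypes()
```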

Strings#

  • Bug in pandas.api.types.is_string_dtype() that would not return True for StringDtype or ArrowDtype with pyarrow.string() (GH15585)

  • Bug in converting string dtypes to “datetime64[ns]” or “timedelta64[ns]” incorrectly raising TypeError (GH36153)

Interval#

Indexing#

Missing#

MultiIndex#

I/O#

Period#

  • Bug in Period.strftime() and PeriodIndex.strftime(), raising UnicodeDecodeError when a locale-specific directive was passed (GH46319)

  • Bug in adding a Period object to an array of DateOffset objects incorrectly raising TypeError (GH50162)

  • Bug in Period where passing a string with finer resolution than nanosecond would result in a KeyError instead of dropping the extra precision (GH50417)

  • Bug in parsing strings representing Week-periods e.g. “2017-01-23/2017-01-29” as minute-frequency instead of week-frequency (GH50803)

  • Bug in GroupBy.sum(), GroupBy.cumsum(), GroupBy.prod(), GroupBy.cumprod() with PeriodDtype failing to raise TypeError (GH51040)
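The strftime() fix above relates to locale-aware Period formatting. A minimal illustration (the period value is made up):

```python
import pandas as pd

# Format a monthly Period with strftime directives
p = pd.Period("2020-01", freq="M")
label = p.strftime("%Y-%m")
```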

Plotting#

  • Bug in DataFrame.plot.hist(), not dropping elements of weights corresponding to NaN values in data (GH48884)

  • Bug where ax.set_xlim was sometimes raising UserWarning which users could not address, as set_xlim does not accept parsing arguments; the converter now uses Timestamp() instead (GH49148)

Groupby/resample/rolling#

  • Bug in ExponentialMovingWindow with online not raising a NotImplementedError for unsupported operations (GH48834)

  • Bug in DataFrameGroupBy.sample() raising ValueError when the object is empty (GH48459)

  • Bug in Series.groupby() raising ValueError when an entry of the index is equal to the name of the index (GH48567)

  • Bug in DataFrameGroupBy.resample() producing inconsistent results when passing empty DataFrame (GH47705)

  • Bug in DataFrameGroupBy and SeriesGroupBy would not include unobserved categories in result when grouping by categorical indexes (GH49354)

  • Bug in DataFrameGroupBy and SeriesGroupBy would change result order depending on the input index when grouping by categoricals (GH49223)

  • Bug in DataFrameGroupBy and SeriesGroupBy when grouping on categorical data would sort result values even when used with sort=False (GH42482)

  • Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() with as_index=False would not attempt the computation without using the grouping keys when using them failed with a TypeError (GH49256)

  • Bug in DataFrameGroupBy.describe() would describe the group keys (GH49256)

  • Bug in SeriesGroupBy.describe() with as_index=False would have the incorrect shape (GH49256)

  • Bug in DataFrameGroupBy and SeriesGroupBy with dropna=False would drop NA values when the grouper was categorical (GH36327)

  • Bug in SeriesGroupBy.nunique() would incorrectly raise when the grouper was an empty categorical and observed=True (GH21334)

  • Bug in SeriesGroupBy.nth() would raise when grouper contained NA values after subsetting from a DataFrameGroupBy (GH26454)

  • Bug in DataFrame.groupby() would not include a Grouper specified by key in the result when as_index=False (GH50413)

  • Bug in DataFrameGroupBy.value_counts() would raise when used with a TimeGrouper (GH50486)

  • Bug in Resampler.size() caused a wide DataFrame to be returned instead of a Series with MultiIndex (GH46826)

  • Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() would raise incorrectly when grouper had axis=1 for "idxmin" and "idxmax" arguments (GH45986)

  • Bug in DataFrameGroupBy would raise when used with an empty DataFrame, categorical grouper, and dropna=False (GH50634)

  • Bug in SeriesGroupBy.value_counts() did not respect sort=False (GH50482)

  • Bug in DataFrameGroupBy.resample() raising KeyError when getting the result from a key list when resampling on a time index (GH50840)

  • Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() would raise incorrectly when grouper had axis=1 for "ngroup" argument (GH45986)
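Several fixes above involve dropna handling in groupby. A small sketch (the sample frame is made up) of keeping NA group keys with dropna=False:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", np.nan], "val": [1, 2, 3]})

# dropna=False keeps rows whose group key is NA as their own group
counts = df.groupby("key", dropna=False)["val"].sum()
```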

Reshaping#

Sparse#

ExtensionArray#

Styler#

Metadata#

Other#

Contributors#