What’s new in 1.4.0 (January 22, 2022)#

These are the changes in pandas 1.4.0. See Release notes for a full changelog including other versions of pandas.

Enhancements#

Improved warning messages#

Previously, warning messages could point to lines within the pandas library. Running the following script, setting_with_copy_warning.py,

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
df[:2].loc[:, 'a'] = 5

with pandas 1.3 resulted in:

.../site-packages/pandas/core/indexing.py:1951: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.

This made it difficult to determine where the warning was being generated from. Now pandas will inspect the call stack, reporting the first line outside of the pandas library that gave rise to the warning. The output of the above script is now:

setting_with_copy_warning.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
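The call-stack inspection works like this: count the consecutive "internal" frames above the function that issues the warning, then turn that count into a `stacklevel` for `warnings.warn`. Below is a minimal sketch in plain Python. It is not pandas' internal helper. The names `find_stack_level`, `is_library_frame`, `_set_values`, `public_setter` and `user_code` are all hypothetical. Here "internal" is decided by function name rather than by file location, only so the example fits in a single file.

```python
import inspect
import warnings
from types import FrameType
from typing import Callable, Optional


def find_stack_level(is_internal: Callable[[FrameType], bool]) -> int:
    """Return a stacklevel for warnings.warn pointing at the first frame
    that the (hypothetical) is_internal predicate does not claim."""
    frame: Optional[FrameType] = inspect.currentframe()
    # Skip this helper's own frame; start at the function that calls warnings.warn.
    frame = frame.f_back if frame is not None else None
    n = 0
    while frame is not None and is_internal(frame):
        n += 1
        frame = frame.f_back
    # stacklevel=1 means "the function calling warnings.warn", so the first
    # non-internal frame sits n + 1 levels up.
    return n + 1


# Hypothetical "library" functions, identified by name for illustration only.
LIBRARY_FUNCTIONS = {"_set_values", "public_setter"}


def is_library_frame(frame: FrameType) -> bool:
    return frame.f_code.co_name in LIBRARY_FUNCTIONS


def _set_values() -> None:
    warnings.warn(
        "A value is trying to be set on a copy of a slice.",
        UserWarning,
        stacklevel=find_stack_level(is_library_frame),
    )


def public_setter() -> None:
    _set_values()


def user_code() -> None:
    public_setter()  # the warning is reported against this line
```

Calling `user_code()` attributes the warning to the `public_setter()` line inside `user_code`, not to `_set_values`. In a real package, a predicate that checks whether `frame.f_code.co_filename` lies under the package directory would be the more natural choice.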

Index can hold arbitrary ExtensionArrays#

Until now, passing a custom ExtensionArray to pd.Index would cast the array to object dtype. Now Index can directly hold arbitrary ExtensionArrays (GH43930).

In [1]: arr = pd.array([1, 2, pd.NA])

In [2]: idx = pd.Index(arr)

In the old behavior, idx would be object-dtype:

Previous behavior:

In [1]: idx
Out[1]: Index([1, 2, <NA>], dtype='object')

With the new behavior, we keep the original dtype:

New behavior:

In [3]: idx
Out[3]: Index([1, 2, <NA>], dtype='Int64')

One exception to this is SparseArray, which will continue to cast to numpy dtype until pandas 2.0. At that point it will retain its dtype like other ExtensionArrays.
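Here is a short sketch of the new behavior, assuming pandas 1.4 or later. If downstream code depends on the old result, passing `dtype=object` explicitly is one way to get an object-dtype Index back.

```python
import pandas as pd

arr = pd.array([1, 2, pd.NA])  # nullable integer ExtensionArray, dtype "Int64"
idx = pd.Index(arr)            # pandas >= 1.4 keeps the Int64 dtype
print(idx.dtype)

# Request the pre-1.4 object-dtype result explicitly if code relied on it.
obj_idx = pd.Index(arr, dtype=object)
print(obj_idx.dtype)
```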

Styler#

Styler has been further developed in 1.4.0. The following general enhancements have been made:

Additionally, there are enhancements specific to HTML rendering:

  • Styler.bar() introduces additional arguments to control alignment and display (GH26070, GH36419), and it also validates the input arguments width and height (GH42511)

  • Styler.to_html() introduces keyword arguments sparse_index, sparse_columns, bold_headers, caption, max_rows and max_columns (GH41946, GH43149, GH42972)

  • Styler.to_html() omits CSSStyle rules for hidden table elements as a performance enhancement (GH43619)

  • Custom CSS classes can now be directly specified without string replacement (GH43686)

  • Ability to render hyperlinks automatically via a new hyperlinks formatting keyword argument (GH45058)

There are also some LaTeX specific enhancements:

  • Styler.to_latex() introduces keyword argument environment, which also allows a specific “longtable” entry through a separate jinja2 template (GH41866)

  • Naive sparsification is now possible for LaTeX without needing to include the multirow package (GH43369)

  • cline support has been added for MultiIndex row sparsification through a keyword argument (GH45138)

Multi-threaded CSV reading with a new CSV Engine based on pyarrow#

pandas.read_csv() now accepts engine="pyarrow" (requires at least pyarrow 1.0.1) as an argument, allowing for faster CSV parsing on multicore machines with pyarrow installed. See the I/O docs for more info. (GH23697, GH43706)
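The sketch below opts into the new engine only when pyarrow can be imported and otherwise falls back to the default C engine. The helper name `read_csv_prefer_pyarrow` is hypothetical. The import check does not verify that the installed pyarrow meets the 1.0.1 minimum. The CSV text is passed as bytes through `io.BytesIO`, which both engines accept.

```python
import importlib.util
import io

import pandas as pd


def read_csv_prefer_pyarrow(data: bytes) -> pd.DataFrame:
    """Parse CSV bytes with engine="pyarrow" if pyarrow is importable,
    otherwise with the default C engine (hypothetical helper)."""
    engine = "pyarrow" if importlib.util.find_spec("pyarrow") is not None else "c"
    return pd.read_csv(io.BytesIO(data), engine=engine)


df = read_csv_prefer_pyarrow(b"a,b\n1,x\n2,y\n3,z\n")
print(df)
```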

Rank function for rolling and expanding windows#

Added rank function to Rolling and Expanding. The new function supports the method, ascending, and pct flags of DataFrame.rank(). The method argument supports min, max, and average ranking methods. Example:

In [4]: s = pd.Series([1, 4, 2, 3, 5, 3])

In [5]: s.rolling(3).rank()
Out[5]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    1.5
dtype: float64

In [6]: s.rolling(3).rank(method="max")
Out[6]: 
0    NaN
1    NaN
2    2.0
3    2.0
4    3.0
5    2.0
dtype: float64
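The same method is available on Expanding windows. Here is a short sketch on the series above, assuming pandas 1.4 or later. With an expanding window, each value is ranked against every value seen so far. The final 3 therefore ties with the earlier 3 and receives the average rank 3.5, or 3.0 with method="min".

```python
import pandas as pd

s = pd.Series([1, 4, 2, 3, 5, 3])

# Rank of each value among all values seen so far (default method="average").
expanding_avg = s.expanding().rank()
# Ties receive the lowest rank in the window instead of the average.
expanding_min = s.expanding().rank(method="min")
# Ranks in percentile form, as with DataFrame.rank(pct=True).
expanding_pct = s.expanding().rank(pct=True)

print(expanding_avg)
```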

Groupby positional indexing#

It is now possible to specify positional ranges relative to the ends of each group.

Negative arguments for DataFrameGroupBy.head(), SeriesGroupBy.head(), DataFrameGroupBy.tail(), and SeriesGroupBy.tail() now work correctly and result in ranges relative to the end and start of each group, respectively. Previously, negative arguments returned empty frames.

In [7]: df = pd.DataFrame([["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"],
   ...:                    ["h", "h0"], ["h", "h1"]], columns=["A", "B"])
   ...: 

In [8]: df.groupby("A").head(-1)
Out[8]: 
   A   B
0  g  g0
1  g  g1
2  g  g2
4  h  h0
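tail() works symmetrically: a negative argument removes rows from the start of each group. Here is a sketch on the same frame, assuming pandas 1.4 or later. tail(-1) keeps g1, g2, g3 and h1, which is every row except the first of each group.

```python
import pandas as pd

df = pd.DataFrame([["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"],
                   ["h", "h0"], ["h", "h1"]], columns=["A", "B"])

# Negative n in tail() drops |n| rows from the start of each group.
tail_neg = df.groupby("A").tail(-1)
print(tail_neg)
```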

DataFrameGroupBy.nth() and SeriesGroupBy.nth() now accept a slice or list of integers and slices.

In [9]: df.groupby("A").nth(slice(1, -1))
Out[9]: 
    B
A    
g  g1
g  g2

In [10]: df.groupby("A").nth([slice(None, 1), slice(-1, None)])
Out[10]: 
    B
A    
g  g0
g  g3
h  h0
h  h1

DataFrameGroupBy.nth() and SeriesGroupBy.nth() now accept index notation.

In [11]: df.groupby("A").nth[1, -1]
Out[11]: 
    B
A    
g  g1
g  g3
h  h1

In [12]: df.groupby("A").nth[1:-1]
Out[12]: 
    B
A    
g  g1
g  g2

In [13]: df.groupby("A").nth[:1, -1:]
Out[13]: 
    B
A    
g  g0
g  g3
h  h0
h  h1

DataFrame.from_dict and DataFrame.to_dict have new 'tight' option#

A new 'tight' dictionary format that preserves MultiIndex entries and names is now available with the DataFrame.from_dict() and DataFrame.to_dict() methods and can be used with the standard json library to produce a tight representation of DataFrame objects (GH4889).

In [14]: df = pd.DataFrame.from_records(
   ....:     [[1, 3], [2, 4]],
   ....:     index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")],
   ....:                                     names=["n1", "n2"]),
   ....:     columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)],
   ....:                                       names=["z1", "z2"]),
   ....: )
   ....: 

In [15]: df
Out[15]: 
z1     x  y
z2     1  2
n1 n2      
a  b   1  3
   c   2  4

In [16]: df.to_dict(orient='tight')
Out[16]: 
{'index': [('a', 'b'), ('a', 'c')],
 'columns': [('x', 1), ('y', 2)],
 'data': [[1, 3], [2, 4]],
 'index_names': ['n1', 'n2'],
 'column_names': ['z1', 'z2']}
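DataFrame.from_dict(orient="tight") turns the tight dict back into an equivalent DataFrame. Because the dict holds only lists, tuples, strings and numbers, json.dumps accepts it. Here is a sketch of that round trip on the frame above, assuming pandas 1.4 or later. JSON encodes tuples as arrays, so this sketch rebuilds the frame from the in-memory dict rather than from the decoded JSON.

```python
import json

import pandas as pd

df = pd.DataFrame.from_records(
    [[1, 3], [2, 4]],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")], names=["n1", "n2"]),
    columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)], names=["z1", "z2"]),
)

tight = df.to_dict(orient="tight")
# Plain Python containers only, so the standard json module can serialize it.
payload = json.dumps(tight)

# from_dict rebuilds both MultiIndexes, including their level names.
restored = pd.DataFrame.from_dict(tight, orient="tight")
pd.testing.assert_frame_equal(df, restored)
```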

Other enhancements#

Notable bug fixes#

These are bug fixes that might have notable behavior changes.

Inconsistent date string parsing#

The dayfirst option of to_datetime() isn’t strict, and this can lead to surprising behavior:

In [17]: pd.to_datetime(["31-12-2021"], dayfirst=False)
Out[17]: DatetimeIndex(['2021-12-31'], dtype='datetime64[ns]', freq=None)

Now, a warning will be raised if a date string cannot be parsed in accordance with the given dayfirst value when the value is a delimited date string (e.g. 31-12-2012).

Ignoring dtypes in concat with empty or all-NA columns#

Note

This behaviour change has been reverted in pandas 1.4.3.

When using concat() to concatenate two or more DataFrame objects, if one of the DataFrames was empty or had all-NA values, its dtype was sometimes ignored when finding the concatenated dtype. These are now consistently not ignored (GH43507).

In [18]: df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=range(1))

In [19]: df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2))

In [20]: res = pd.concat([df1, df2])

Previously, the float-dtype in df2 would be ignored so the result dtype would be datetime64[ns]. As a result, the np.nan would be cast to NaT.

Previous behavior:

In [4]: res
Out[4]:
         bar
0 2013-01-01
1        NaT

Now the float-dtype is respected. Since the common dtype for these DataFrames is object, the np.nan is retained.

New behavior:

In [4]: res
Out[4]:
                   bar
0  2013-01-01 00:00:00
1                  NaN

Null-values are no longer coerced to NaN-value in value_counts and mode#

Series.value_counts() and Series.mode() no longer coerce None, NaT and other null-values to a NaN-value for object dtype. This behavior is now consistent with unique, isin and others (GH42688).

In [21]: s = pd.Series([True, None, pd.NaT, None, pd.NaT, None])

In [22]: res = s.value_counts(dropna=False)

Previously, all null-values were replaced by a NaN-value.

Previous behavior:

In [3]: res
Out[3]:
NaN     5
True    1
dtype: int64

Now null-values are no longer mangled.

New behavior:

In [23]: res
Out[23]: 
None    3
NaT     2
True    1
dtype: int64

mangle_dupe_cols in read_csv no longer renames unique columns conflicting with target names#

read_csv() no longer renames unique column labels which conflict with the target names of duplicated columns. Already existing columns are skipped, i.e. the next available index is used for the target column name (GH14704).

In [24]: import io

In [25]: data = "a,a,a.1\n1,2,3"

In [26]: res = pd.read_csv(io.StringIO(data))

Previously, the second column was called a.1, while the third column was also renamed to a.1.1.

Previous behavior:

In [3]: res
Out[3]:
    a  a.1  a.1.1
0   1    2      3

Now the renaming checks whether a.1 already exists when choosing a new name for the second column and skips that index. The second column is instead renamed to a.2.

New behavior:

In [27]: res
Out[27]: 
   a  a.2  a.1
0  1    2    3

unstack and pivot_table no longer raise ValueError for results that would exceed the int32 limit#

Previously DataFrame.pivot_table() and DataFrame.unstack() would raise a ValueError if the operation could produce a result with more than 2**31 - 1 elements. This operation now emits an errors.PerformanceWarning instead (GH26314).

Previous behavior:

In [3]: df = pd.DataFrame({"ind1": np.arange(2 ** 16), "ind2": np.arange(2 ** 16), "count": 0})
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
ValueError: Unstacked DataFrame is too big, causing int32 overflow

New behavior:

In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
PerformanceWarning: The following operation may generate 4294967296 cells in the resulting pandas object.

groupby.apply consistent transform detection#

DataFrameGroupBy.apply() and SeriesGroupBy.apply() are designed to be flexible, allowing users to perform aggregations, transformations, filters, and use it with user-defined functions that might not fall into any of these categories. As part of this, apply will attempt to detect when an operation is a transform, and in such a case, the result will have the same index as the input. In order to determine if the operation is a transform, pandas compares the input’s index to the result’s and determines if it has been mutated. Previously in pandas 1.3, different code paths used different definitions of “mutated”: some would use Python’s is whereas others would test only up to equality.

This inconsistency has been removed; pandas now tests up to equality.

In [28]: def func(x):
   ....:     return x.copy()
   ....: 

In [29]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

In [30]: df
Out[30]: 
   a  b  c
0  1  3  5
1  2  4  6

Previous behavior:

In [3]: df.groupby(['a']).apply(func)
Out[3]:
     a  b  c
a
1 0  1  3  5
2 1  2  4  6

In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[4]:
     c
a b
1 3  5
2 4  6

In the examples above, the first uses a code path where pandas uses Python’s is and determines that func is not a transform, whereas the second tests up to equality and determines that func is a transform. As a result, in the first case the result’s index is not the same as the input’s.

New behavior:

In [5]: df.groupby(['a']).apply(func)
Out[5]:
   a  b  c
0  1  3  5
1  2  4  6

In [6]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[6]:
     c
a b
1 3  5
2 4  6

Now in both cases it is determined that func is a transform. In each case, the result has the same index as the input.

Backwards incompatible API changes#

Increased minimum version for Python#

pandas 1.4.0 supports Python 3.8 and higher.

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package           Minimum Version  Required  Changed
numpy             1.18.5           X         X
pytz              2020.1           X         X
python-dateutil   2.8.1            X         X
bottleneck        1.3.1                      X
numexpr           2.7.1                      X
pytest (dev)      6.0
mypy (dev)        0.930                      X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package           Minimum Version  Changed
beautifulsoup4    4.8.2            X
fastparquet       0.4.0
fsspec            0.7.4
gcsfs             0.6.0
lxml              4.5.0            X
matplotlib        3.3.2            X
numba             0.50.1           X
openpyxl          3.0.3            X
pandas-gbq        0.14.0           X
pyarrow           1.0.1            X
pymysql           0.10.1           X
pytables          3.6.1            X
s3fs              0.4.0
scipy             1.4.1            X
sqlalchemy        1.4.0            X
tabulate          0.8.7
xarray            0.15.1           X
xlrd              2.0.1            X
xlsxwriter        1.2.2            X
xlwt              1.3.0

See Dependencies and Optional dependencies for more.

Other API changes#

  • Index.get_indexer_for() no longer accepts keyword arguments (other than target); in the past these would be silently ignored if the index was not unique (GH42310)

  • Change in the position of the min_rows argument in DataFrame.to_string() due to change in the docstring (GH44304)

  • Reduction operations for DataFrame or Series now raise a ValueError when None is passed for skipna (GH44178)

  • read_csv() and read_html() no longer raise an error when one of the header rows consists only of Unnamed: columns (GH13054)

  • Changed the name attribute of several holidays in USFederalHolidayCalendar to match the official federal holiday names, specifically:

    • “New Year’s Day” gains the possessive apostrophe

    • “Presidents Day” becomes “Washington’s Birthday”

    • “Martin Luther King Jr. Day” is now “Birthday of Martin Luther King, Jr.”

    • “July 4th” is now “Independence Day”

    • “Thanksgiving” is now “Thanksgiving Day”

    • “Christmas” is now “Christmas Day”

    • Added “Juneteenth National Independence Day”

Deprecations#

Deprecated Int64Index, UInt64Index & Float64Index#

Int64Index, UInt64Index and Float64Index have been deprecated in favor of the base Index class and will be removed in pandas 2.0 (GH43028).

For constructing a numeric index, you can use the base Index class instead, specifying the data type (which will also work on older pandas releases):

# replace
pd.Int64Index([1, 2, 3])
# with
pd.Index([1, 2, 3], dtype="int64")

For checking the data type of an index object, you can replace isinstance checks with checking the dtype:

# replace
isinstance(idx, pd.Int64Index)
# with
idx.dtype == "int64"
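Sometimes an isinstance(idx, pd.Int64Index) check really means "is this an integer index". In that case, a dtype-based test avoids depending on the deprecated subclasses. Here is a sketch using pandas.api.types.is_integer_dtype, which also accepts int32 and unsigned dtypes. The == "int64" comparison shown above matches int64 only.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

idx = pd.Index([1, 2, 3], dtype="int64")
uint_idx = pd.Index([1, 2, 3], dtype="uint64")

# Exact check: true only for int64.
print(idx.dtype == "int64")
# Broader check: true for any integer dtype, signed or unsigned.
print(is_integer_dtype(idx.dtype), is_integer_dtype(uint_idx.dtype))
```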

Currently, in order to maintain backward compatibility, calls to Index will continue to return Int64Index, UInt64Index and Float64Index when given numeric data, but in the future, an Index will be returned.

Current behavior:

In [1]: pd.Index([1, 2, 3], dtype="int32")
Out[1]: Int64Index([1, 2, 3], dtype='int64')
In [2]: pd.Index([1, 2, 3], dtype="uint64")
Out[2]: UInt64Index([1, 2, 3], dtype='uint64')

Future behavior:

In [3]: pd.Index([1, 2, 3], dtype="int32")
Out[3]: Index([1, 2, 3], dtype='int32')
In [4]: pd.Index([1, 2, 3], dtype="uint64")
Out[4]: Index([1, 2, 3], dtype='uint64')

Deprecated DataFrame.append and Series.append#

DataFrame.append() and Series.append() have been deprecated and will be removed in a future version. Use pandas.concat() instead (GH35407).

Deprecated syntax

In [1]: pd.Series([1, 2]).append(pd.Series([3, 4]))
Out [1]:
<stdin>:1: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
0    1
1    2
0    3
1    4
dtype: int64

In [2]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [3]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [4]: df1.append(df2)
Out [4]:
<stdin>:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
   A  B
0  1  2
1  3  4
0  5  6
1  7  8

Recommended syntax

In [31]: pd.concat([pd.Series([1, 2]), pd.Series([3, 4])])
Out[31]: 
0    1
1    2
0    3
1    4
dtype: int64

In [32]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))

In [33]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))

In [34]: pd.concat([df1, df2])
Out[34]: 
   A  B
0  1  2
1  3  4
0  5  6
1  7  8
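When append was called repeatedly in a loop, the usual replacement is to collect the pieces in a list and call concat once at the end. Here is a sketch with made-up data, where the loop body stands in for whatever produces each piece. ignore_index=True renumbers the rows instead of repeating index labels, as happens in the output above.

```python
import pandas as pd

# Hypothetical per-iteration results; real code might read files or chunks here.
pieces = []
for i in range(3):
    pieces.append(pd.DataFrame({"A": [i], "B": [i * i]}))

# A single concat at the end replaces repeated DataFrame.append calls.
result = pd.concat(pieces, ignore_index=True)
print(result)
```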

Other Deprecations#

  • Deprecated Index.is_type_compatible() (GH42113)

  • Deprecated method argument in Index.get_loc(), use index.get_indexer([label], method=...) instead (GH42269)

  • Deprecated treating integer keys in Series.__setitem__() as positional when the index is a Float64Index not containing the key, an IntervalIndex with no entries containing the key, or a MultiIndex with leading Float64Index level not containing the key (GH33469)

  • Deprecated treating numpy.datetime64 objects as UTC times when passed to the Timestamp constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use Timestamp(dt64).tz_localize("UTC").tz_convert(tz) (GH24559)

  • Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (GH42351)

  • Creating an empty Series without a dtype will now raise a more visible FutureWarning instead of a DeprecationWarning (GH30017)

  • Deprecated the kind argument in Index.get_slice_bound(), Index.slice_indexer(), and Index.slice_locs(); in a future version passing kind will raise (GH42857)

  • Deprecated dropping of nuisance columns in Rolling, Expanding, and EWM aggregations (GH42738)

  • Deprecated Index.reindex() with a non-unique Index (GH42568)

  • Deprecated Styler.render() in favor of Styler.to_html() (GH42140)

  • Deprecated Styler.hide_index() and Styler.hide_columns() in favor of Styler.hide() (GH43758)

  • Deprecated passing in a string column label into times in DataFrame.ewm() (GH43265)

  • Deprecated the include_start and include_end arguments in DataFrame.between_time(); in a future version passing include_start or include_end will raise (GH40245)

  • Deprecated the squeeze argument to read_csv(), read_table(), and read_excel(). Users should squeeze the DataFrame afterwards with .squeeze("columns") instead (GH43242)

  • Deprecated the index argument to SparseArray construction (GH23089)

  • Deprecated the closed argument in date_range() and bdate_range() in favor of the inclusive argument; in a future version passing closed will raise (GH40245)

  • Deprecated Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH43665)

  • Deprecated silent dropping of columns that raised a TypeError in Series.transform and DataFrame.transform when used with a dictionary (GH43740)

  • Deprecated silent dropping of columns that raised a TypeError, DataError, and some cases of ValueError in Series.aggregate(), DataFrame.aggregate(), Series.groupby.aggregate(), and DataFrame.groupby.aggregate() when used with a list (GH43740)

  • Deprecated casting behavior when setting timezone-aware value(s) into a timezone-aware Series or DataFrame column when the timezones do not match. Previously this cast to object dtype. In a future version, the values being inserted will be converted to the series or column’s existing timezone (GH37605)

  • Deprecated casting behavior when passing an item with a mismatched timezone to DatetimeIndex.insert(), DatetimeIndex.putmask(), DatetimeIndex.where(), DatetimeIndex.fillna(), Series.mask(), Series.where(), Series.fillna(), Series.shift(), Series.replace(), Series.reindex() (and DataFrame column analogues). In the past this has cast to object dtype. In a future version, these will cast the passed item to the index or series’s timezone (GH37605, GH44940)

  • Deprecated the prefix keyword argument in read_csv() and read_table(); in a future version the argument will be removed (GH43396)

  • Deprecated passing a non-boolean argument to sort in concat() (GH41518)

  • Deprecated passing arguments as positional for read_fwf() other than filepath_or_buffer (GH41485)

  • Deprecated passing arguments as positional for read_xml() other than path_or_buffer (GH45133)

  • Deprecated passing skipna=None for DataFrame.mad() and Series.mad(), pass skipna=True instead (GH44580)

  • Deprecated the behavior of to_datetime() with the string “now” with utc=False; in a future version this will match Timestamp("now"), which in turn matches Timestamp.now() returning the local time (GH18705)

  • Deprecated DateOffset.apply(), use offset + other instead (GH44522)

  • Deprecated parameter names in Index.copy() (GH44916)

  • A deprecation warning is now shown for DataFrame.to_latex() indicating that the argument signature may change to more closely emulate the arguments of Styler.to_latex() in future versions (GH44411)

  • Deprecated behavior of concat() between objects with bool-dtype and numeric-dtypes; in a future version these will cast to object dtype instead of coercing bools to numeric values (GH39817)

  • Deprecated Categorical.replace(), use Series.replace() instead (GH44929)

  • Deprecated passing set or dict as indexer for DataFrame.loc.__setitem__(), DataFrame.loc.__getitem__(), Series.loc.__setitem__(), Series.loc.__getitem__(), DataFrame.__getitem__(), Series.__getitem__() and Series.__setitem__() (GH42825)

  • Deprecated Index.__getitem__() with a bool key; use index.values[key] to get the old behavior (GH44051)

  • Deprecated downcasting column-by-column in DataFrame.where() with integer-dtypes (GH44597)

  • Deprecated DatetimeIndex.union_many(), use DatetimeIndex.union() instead (GH44091)

  • Deprecated Groupby.pad() in favor of Groupby.ffill() (GH33396)

  • Deprecated Groupby.backfill() in favor of Groupby.bfill() (GH33396)

  • Deprecated Resample.pad() in favor of Resample.ffill() (GH33396)

  • Deprecated Resample.backfill() in favor of Resample.bfill() (GH33396)

  • Deprecated numeric_only=None in DataFrame.rank(); in a future version numeric_only must be either True or False (the default) (GH45036)

  • Deprecated the behavior of Timestamp.utcfromtimestamp(); in the future it will return a timezone-aware UTC Timestamp (GH22451)

  • Deprecated NaT.freq() (GH45071)

  • Deprecated behavior of Series and DataFrame construction when passed float-dtype data containing NaN and an integer dtype ignoring the dtype argument; in a future version this will raise (GH40110)

  • Deprecated the behaviour of Series.to_frame() and Index.to_frame() to ignore the name argument when name=None. Currently, this means to preserve the existing name, but in the future explicitly passing name=None will set None as the name of the column in the resulting DataFrame (GH44212)
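Several of the deprecations above have drop-in replacements. The sketch below shows four of them side by side, assuming pandas 1.4 or later (the inclusive argument to date_range() is new in this release). The data is made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# date_range(): `inclusive` replaces the deprecated `closed` argument.
dates = pd.date_range("2022-01-01", "2022-01-05", inclusive="left")

# read_csv(): squeeze the result afterwards instead of passing squeeze=True.
series = pd.read_csv(io.StringIO("x\n1\n2\n3\n")).squeeze("columns")

# GroupBy.pad() -> GroupBy.ffill(); filling stays within each group.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1.0, np.nan, np.nan, 2.0]})
filled = df.groupby("key").ffill()

# Timestamp from a naive datetime64 plus a timezone: the recipe from the
# deprecation note that keeps the old "treat the datetime64 as UTC" meaning.
dt64 = np.datetime64("2021-01-01T12:00:00")
ts = pd.Timestamp(dt64).tz_localize("UTC").tz_convert("America/New_York")
print(dates, series, filled, ts, sep="\n")
```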

Performance improvements#

Bug fixes#

Categorical#

  • Bug in setting dtype-incompatible values into a Categorical (or Series or DataFrame backed by Categorical) raising ValueError instead of TypeError (GH41919)

  • Bug in Categorical.searchsorted() when passing a dtype-incompatible value raising KeyError instead of TypeError (GH41919)

  • Bug in Categorical.astype() casting datetimes and Timestamp to int for dtype object (GH44930)

  • Bug in Series.where() with CategoricalDtype when passing a dtype-incompatible value raising ValueError instead of TypeError (GH41919)

  • Bug in Categorical.fillna() when passing a dtype-incompatible value raising ValueError instead of TypeError (GH41919)

  • Bug in Categorical.fillna() with a tuple-like category raising ValueError instead of TypeError when filling with a non-category tuple (GH41919)

Datetimelike#

  • Bug in DataFrame constructor unnecessarily copying non-datetimelike 2D object arrays (GH39272)

  • Bug in to_datetime() with format and pandas.NA was raising ValueError (GH42957)

  • to_datetime() would silently swap MM/DD/YYYY and DD/MM/YYYY formats if the given dayfirst option could not be respected - now, a warning is raised in the case of delimited date strings (e.g. 31-12-2012) (GH12585)

  • Bug in date_range() and bdate_range() not returning the right bound when start == end and the range is closed on one side (GH43394)

  • Bug in inplace addition and subtraction of DatetimeIndex or TimedeltaIndex with DatetimeArray or TimedeltaArray (GH43904)

  • Bug in calling np.isnan, np.isfinite, or np.isinf on a timezone-aware DatetimeIndex incorrectly raising TypeError (GH43917)

  • Bug in constructing a Series from datetime-like strings with mixed timezones incorrectly partially-inferring datetime values (GH40111)

  • Bug in addition of a Tick object and a np.timedelta64 object incorrectly raising instead of returning Timedelta (GH44474)

  • np.maximum.reduce and np.minimum.reduce now correctly return Timestamp and Timedelta objects when operating on Series, DataFrame, or Index with datetime64[ns] or timedelta64[ns] dtype (GH43923)

  • Bug in adding a np.timedelta64 object to a BusinessDay or CustomBusinessDay object incorrectly raising (GH44532)

  • Bug in Index.insert() for inserting np.datetime64, np.timedelta64 or tuple into Index with dtype='object' with negative loc adding None and replacing existing value (GH44509)

  • Bug in Timestamp.to_pydatetime() failing to retain the fold attribute (GH45087)

  • Bug in Series.mode() with DatetimeTZDtype incorrectly returning timezone-naive and PeriodDtype incorrectly raising (GH41927)

  • Fixed regression in reindex() raising an error when using an incompatible fill value with a datetime-like dtype (or not raising a deprecation warning for using a datetime.date as fill value) (GH42921)

  • Bug in DateOffset addition with Timestamp where offset.nanoseconds would not be included in the result (GH43968, GH36589)

  • Bug in Timestamp.fromtimestamp() not supporting the tz argument (GH45083)

  • Bug in DataFrame construction from dict of Series with mismatched index dtypes sometimes raising depending on the ordering of the passed dict (GH44091)

  • Bug in Timestamp hashing during some DST transitions causing a segmentation fault (GH33931 and GH40817)

Timedelta#

  • Bug in division of an all-NaT TimedeltaIndex, Series or DataFrame column with an object-dtype array-like of numbers failing to infer the result as timedelta64-dtype (GH39750)

  • Bug in floor division of timedelta64[ns] data with a scalar returning garbage values (GH44466)

  • Bug in Timedelta not properly taking into account the nanoseconds contribution of each keyword argument (GH43764, GH45227)

Time Zones#

Numeric#

  • Bug in floor-dividing a list or tuple of integers by a Series incorrectly raising (GH44674)

  • Bug in DataFrame.rank() raising ValueError with object columns and method="first" (GH41931)

  • Bug in DataFrame.rank() treating missing values and extreme values as equal (for example np.nan and np.inf), causing incorrect results when na_option="bottom" or na_option="top" is used (GH41931)

  • Bug in numexpr engine still being used when the option compute.use_numexpr is set to False (GH32556)

  • Bug in DataFrame arithmetic ops with a subclass whose _constructor() attribute is a callable other than the subclass itself (GH43201)

  • Bug in arithmetic operations involving RangeIndex where the result would have the incorrect name (GH43962)

  • Bug in arithmetic operations involving Series where the result could have the incorrect name when the operands have matching NA or matching tuple names (GH44459)

  • Bug in division with IntegerDtype or BooleanDtype array and NA scalar incorrectly raising (GH44685)

  • Bug in multiplying a Series with FloatingDtype with a timedelta-like scalar incorrectly raising (GH44772)

Conversion#

Strings#

  • Bug in checking for string[pyarrow] dtype incorrectly raising an ImportError when pyarrow is not installed (GH44276)

Interval#

  • Bug in Series.where() with IntervalDtype incorrectly raising when the where call should not replace anything (GH44181)

Indexing#

  • Bug in Series.rename() with a MultiIndex when level is provided (GH43659)

  • Bug in DataFrame.truncate() and Series.truncate() when the object’s Index has a length greater than one but only one unique value (GH42365)

  • Bug in Series.loc() and DataFrame.loc() with a MultiIndex when indexing with a tuple in which one of the levels is also a tuple (GH27591)

  • Bug in Series.loc() with a MultiIndex whose first level contains only np.nan values (GH42055)

  • Bug in indexing on a Series or DataFrame with a DatetimeIndex when passing a string, the return type depended on whether the index was monotonic (GH24892)

  • Bug in indexing on a MultiIndex failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (GH42476)

  • Bug in DataFrame.sort_values() and Series.sort_values() when passing an ascending value, failing to raise or incorrectly raising ValueError (GH41634)

  • Bug in updating values of pandas.Series using boolean index, created by using pandas.DataFrame.pop() (GH42530)

  • Bug in Index.get_indexer_non_unique() when index contains multiple np.nan (GH35392)

  • Bug in DataFrame.query() not handling the degree sign in a backticked column name, such as `Temp(°C)`, used in an expression to query a DataFrame (GH42826)

  • Bug in DataFrame.drop() where the error message did not show missing labels with commas when raising KeyError (GH42881)

  • Bug in DataFrame.query() where method calls in query strings led to errors when the numexpr package was installed (GH22435)

  • Bug in DataFrame.nlargest() and Series.nlargest() where sorted result did not count indexes containing np.nan (GH28984)

  • Bug in indexing on a non-unique object-dtype Index with an NA scalar (e.g. np.nan) (GH43711)

  • Bug in DataFrame.__setitem__() incorrectly writing into an existing column’s array rather than setting a new array when the new dtype and the old dtype match (GH43406)

  • Bug in setting floating-dtype values into a Series with integer dtype failing to set inplace when those values can be losslessly converted to integers (GH44316)

  • Bug in Series.__setitem__() with object dtype when setting an array with matching size and dtype='datetime64[ns]' or dtype='timedelta64[ns]' incorrectly converting the datetime/timedeltas to integers (GH43868)

  • Bug in DataFrame.sort_index() where ignore_index=True was not being respected when the index was already sorted (GH43591)

  • Bug in Index.get_indexer_non_unique() when index contains multiple np.datetime64("NaT") and np.timedelta64("NaT") (GH43869)

  • Bug in setting a scalar Interval value into a Series with IntervalDtype when the scalar’s sides are floats and the values’ sides are integers (GH44201)

  • Bug when setting string-backed Categorical values that can be parsed to datetimes into a DatetimeArray or Series or DataFrame column backed by DatetimeArray failing to parse these strings (GH44236)

  • Bug in Series.__setitem__() with an integer dtype other than int64 setting with a range object unnecessarily upcasting to int64 (GH44261)

  • Bug in Series.__setitem__() with a boolean mask indexer setting a listlike value of length 1 incorrectly broadcasting that value (GH44265)

  • Bug in Series.reset_index() not ignoring name argument when drop and inplace are set to True (GH44575)

  • Bug in DataFrame.loc.__setitem__() and DataFrame.iloc.__setitem__() with mixed dtypes sometimes failing to operate in-place (GH44345)

  • Bug in DataFrame.loc.__getitem__() incorrectly raising KeyError when selecting a single column with a boolean key (GH44322).

  • Bug in setting DataFrame.iloc() with a single ExtensionDtype column and setting 2D values e.g. df.iloc[:] = df.values incorrectly raising (GH44514)

  • Bug in setting values with DataFrame.iloc() with a single ExtensionDtype column and a tuple of arrays as the indexer (GH44703)

  • Bug in indexing on columns with loc or iloc using a slice with a negative step with ExtensionDtype columns incorrectly raising (GH44551)

  • Bug in DataFrame.loc.__setitem__() changing dtype when indexer was completely False (GH37550)

  • Bug in IntervalIndex.get_indexer_non_unique() returning boolean mask instead of array of integers for a non-unique and non-monotonic index (GH44084)

  • Bug in IntervalIndex.get_indexer_non_unique() not handling targets of dtype 'object' with NaNs correctly (GH44482)

  • Fixed regression where a single-column np.matrix was no longer coerced to a 1d np.ndarray when added to a DataFrame (GH42376)

  • Bug in Series.__getitem__() with a CategoricalIndex of integers treating lists of integers as positional indexers, inconsistent with the behavior with a single scalar integer (GH15470, GH14865)

  • Bug in Series.__setitem__() when setting floats or integers into integer-dtype Series failing to upcast when necessary to retain precision (GH45121)

  • Bug in DataFrame.iloc.__setitem__() ignoring the axis argument (GH45032)
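
A minimal sketch of the DataFrame.sort_index() fix (GH43591), assuming pandas 1.4.0 or later: the frame below already has a sorted index, so sorting is a no-op, but ignore_index=True should still replace the labels with a default RangeIndex. The column name and values are arbitrary.

```python
import pandas as pd

# An index that is already in sorted order, so no reordering is needed.
df = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])

# ignore_index=True should still discard the labels 10, 20, 30.
result = df.sort_index(ignore_index=True)
print(result.index)  # RangeIndex(start=0, stop=3, step=1)
```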

Missing

MultiIndex

I/O

  • Bug in read_excel() attempting to read chart sheets from .xlsx files (GH41448)

  • Bug in json_normalize() where errors="ignore" could fail to ignore missing values of meta when record_path has a length greater than one (GH41876)

  • Bug in read_csv() with multi-header input and arguments referencing column names as tuples (GH42446)

  • Bug in read_fwf() where a difference in the lengths of colspecs and names did not raise ValueError (GH40830)

  • Bug in Series.to_json() and DataFrame.to_json() where some attributes were skipped when serializing plain Python objects to JSON (GH42768, GH33043)

  • Bug where column headers were dropped when constructing a DataFrame from SQLAlchemy Row objects (GH40682)

  • Bug in unpickling an Index with object dtype incorrectly inferring numeric dtypes (GH43188)

  • Bug in read_csv() where reading multi-header input with unequal lengths incorrectly raised IndexError (GH43102)

  • Bug in read_csv() raising ParserError when reading a file in chunks and some chunk blocks have fewer columns than the header for engine="c" (GH21211)

  • Bug in read_csv(): the exception raised when a file path or file-like object was expected has changed from OSError to TypeError (GH43366)

  • Bug in read_csv() and read_fwf() ignoring all skiprows except first when nrows is specified for engine='python' (GH44021, GH10261)

  • Bug in read_csv() keeping the original column in object format when keep_date_col=True is set (GH13378)

  • Bug in read_json() not handling non-numpy dtypes correctly (especially category) (GH21892, GH33205)

  • Bug in json_normalize() where a multi-character sep parameter was incorrectly prefixed to every key (GH43831)

  • Bug in json_normalize() where reading data with missing multi-level metadata would not respect errors="ignore" (GH44312)

  • Bug in read_csv() using the second row to guess the implicit index when header was set to None for engine="python" (GH22144)

  • Bug in read_csv() not recognizing bad lines when names were given for engine="c" (GH22144)

  • Bug in read_csv() with float_precision="round_trip" not skipping leading and trailing whitespace (GH43713)

  • Bug when Python is built without the lzma module: a warning was raised at pandas import time, even if the lzma capability isn’t used (GH43495)

  • Bug in read_csv() not applying dtype for index_col (GH9435)

  • Bug in dumping/loading a DataFrame with yaml.dump(frame) (GH42748)

  • Bug in read_csv() raising ValueError when names was longer than header but equal to data rows for engine="python" (GH38453)

  • Bug in ExcelWriter, where engine_kwargs were not passed through to all engines (GH43442)

  • Bug in read_csv() raising ValueError when parse_dates was used with MultiIndex columns (GH8991)

  • Bug in read_csv() not raising a ValueError when \n was specified as delimiter or sep, which conflicts with lineterminator (GH43528)

  • Bug in to_csv() converting datetimes in categorical Series to integers (GH40754)

  • Bug in read_csv() converting columns to numeric after date parsing failed (GH11019)

  • Bug in read_csv() not replacing NaN values with np.nan before attempting date conversion (GH26203)

  • Bug in read_csv() raising AttributeError when attempting to read a .csv file and infer the index column dtype from a nullable integer type (GH44079)

  • Bug in to_csv() always coercing datetime columns with different formats to the same format (GH21734)

  • DataFrame.to_csv() and Series.to_csv() with compression set to 'zip' no longer create a zip file containing a file ending with “.zip”. Instead, they try to infer the inner file name more intelligently (GH39465)

  • Bug in read_csv() where reading a mixed column of booleans and missing values to a float type results in the missing values becoming 1.0 rather than NaN (GH42808, GH34120)

  • Bug in to_xml() raising error for pd.NA with extension array dtype (GH43903)

  • Bug in read_csv() still calling the parser when a date_parser was passed together with parse_dates=False (GH44366)

  • Bug in read_csv() not setting name of MultiIndex columns correctly when index_col is not the first column (GH38549)

  • Bug in read_csv() silently ignoring errors when failing to create a memory-mapped file (GH44766)

  • Bug in read_csv() when passing a tempfile.SpooledTemporaryFile opened in binary mode (GH44748)

  • Bug in read_json() raising ValueError when attempting to parse json strings containing “://” (GH36271)

  • Bug in read_csv() causing a segfault when engine="c" and encoding_errors=None (GH45180)

  • Bug in read_csv() where an invalid value of usecols led to an unclosed file handle (GH45384)

  • Fixed a memory leak in DataFrame.to_json() (GH43877)
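
The read_fwf() length check (GH40830) can be seen with a tiny in-memory file. This is a sketch assuming pandas 1.4.0 or later; the data, column specifications, and names are made up for illustration. Passing three names for two colspecs should raise ValueError, while matching lengths parse normally.

```python
import io

import pandas as pd

# Two fixed-width columns, each one character wide.
data = "1 2\n3 4\n"

try:
    # Three names but only two colspecs: the lengths do not match.
    pd.read_fwf(io.StringIO(data), colspecs=[(0, 1), (2, 3)], names=["a", "b", "c"])
except ValueError as err:
    message = str(err)
else:
    message = None

print(message)

# With matching lengths the same data parses without error.
df = pd.read_fwf(io.StringIO(data), colspecs=[(0, 1), (2, 3)], names=["a", "b"])
print(df)
```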

Period

Plotting

Groupby/resample/rolling

Reshaping

Sparse

  • Bug in DataFrame.sparse.to_coo() raising AttributeError when column names are not unique (GH29564)

  • Bug in SparseArray.max() and SparseArray.min() raising ValueError for arrays with 0 non-null elements (GH43527)

  • Bug in DataFrame.sparse.to_coo() silently converting non-zero fill values to zero (GH24817)

  • Bug in SparseArray comparison methods with an array-like operand of mismatched length raising AssertionError or unclear ValueError depending on the input (GH43863)

  • Bug in SparseArray arithmetic methods floordiv and mod behaviors when dividing by zero not matching the non-sparse Series behavior (GH38172)

  • Bug in SparseArray unary methods, as well as SparseArray.isna(), not recalculating indexes (GH44955)
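
To illustrate the SparseArray.max() and SparseArray.min() fix (GH43527), the sketch below builds a float SparseArray with no non-null elements. Assuming pandas 1.4.0 or later, both reductions should return a missing value (NaN for a float subtype) instead of raising ValueError.

```python
import numpy as np
import pandas as pd

# Every element is missing, and NaN is also the default fill value,
# so the array stores zero non-null elements.
arr = pd.arrays.SparseArray([np.nan, np.nan, np.nan])

# Previously these raised ValueError; now a missing value is returned.
print(arr.max(), arr.min())
```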

ExtensionArray

  • Bug in array() failing to preserve PandasArray (GH43887)

  • NumPy ufuncs np.abs, np.positive, np.negative now correctly preserve dtype when called on ExtensionArrays that implement __abs__, __pos__, __neg__, respectively. In particular this is fixed for TimedeltaArray (GH43899, GH23316)

  • NumPy ufuncs np.minimum.reduce, np.maximum.reduce, np.add.reduce, and np.multiply.reduce now work correctly instead of raising NotImplementedError on Series with IntegerDtype or FloatDtype (GH43923, GH44793)

  • NumPy ufuncs with out keyword are now supported by arrays with IntegerDtype and FloatingDtype (GH45122)

  • Avoid raising PerformanceWarning about a fragmented DataFrame when using many columns with an extension dtype (GH44098)

  • Bug in IntegerArray and FloatingArray construction incorrectly coercing mismatched NA values (e.g. np.timedelta64("NaT")) to numeric NA (GH44514)

  • Bug in BooleanArray.__eq__() and BooleanArray.__ne__() raising TypeError on comparison with an incompatible type (like a string). This caused DataFrame.replace() to sometimes raise a TypeError if a nullable boolean column was included (GH44499)

  • Bug in array() incorrectly raising when passed an ndarray with float16 dtype (GH44715)

  • Bug in calling np.sqrt on BooleanArray returning a malformed FloatingArray (GH44715)

  • Bug in Series.where() with ExtensionDtype when other is an NA scalar incompatible with the Series dtype (e.g. NaT with a numeric dtype) incorrectly casting to a compatible NA value (GH44697)

  • Bug in Series.replace() where explicitly passing value=None was treated as if no value had been passed, so None did not appear in the result (GH36984, GH19998)

  • Bug in Series.replace() with unwanted downcasting being done in no-op replacements (GH44498)

  • Bug in Series.replace() with FloatDtype, string[python], or string[pyarrow] dtype not being preserved when possible (GH33484, GH40732, GH31644, GH41215, GH25438)
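
The NumPy reduction fix (GH43923, GH44793) can be exercised on a small nullable-integer Series. This is a sketch assuming pandas 1.4.0 or later; the values are arbitrary and deliberately contain no missing values, since NumPy-style reductions do not skip NA.

```python
import numpy as np
import pandas as pd

# A nullable-integer (Int64) Series without missing values.
s = pd.Series([1, 2, 3], dtype="Int64")

# These ufunc reductions no longer raise NotImplementedError.
total = np.add.reduce(s)
largest = np.maximum.reduce(s)
smallest = np.minimum.reduce(s)
print(total, largest, smallest)
```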

Styler

Other

  • Bug in DataFrame.astype() with non-unique columns and a Series dtype argument (GH44417)

  • Bug in CustomBusinessMonthBegin.__add__() (CustomBusinessMonthEnd.__add__()) not applying the extra offset parameter when beginning (end) of the target month is already a business day (GH41356)

  • Bug in RangeIndex.union() with another RangeIndex with matching (even) step and starts differing by strictly less than step / 2 (GH44019)

  • Bug in RangeIndex.difference() with sort=None and step<0 failing to sort (GH44085)

  • Bug in Series.replace() and DataFrame.replace() with value=None and ExtensionDtypes (GH44270, GH37899)

  • Bug in FloatingArray.equals() failing to consider two arrays equal if they contain np.nan values (GH44382)

  • Bug in DataFrame.shift() with axis=1 and ExtensionDtype columns incorrectly raising when an incompatible fill_value is passed (GH44564)

  • Bug in DataFrame.shift() with axis=1 and periods larger than len(frame.columns) producing an invalid DataFrame (GH44978)

  • Bug in DataFrame.diff() when passing a NumPy integer object instead of an int object (GH44572)

  • Bug in Series.replace() raising ValueError when using regex=True with a Series containing np.nan values (GH43344)

  • Bug in DataFrame.to_records() where an incorrect n was used when missing names were replaced by level_n (GH44818)

  • Bug in DataFrame.eval() where the resolvers argument was overriding the default resolvers (GH34966)

  • Series.__repr__() and DataFrame.__repr__() no longer replace all null-values in indexes with “NaN” but use their real string-representations. “NaN” is used only for float("nan") (GH45263)
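
Two of the fixes above can be checked in a few lines, assuming pandas 1.4.0 or later: DataFrame.diff() accepting a NumPy integer for periods (GH44572), and RangeIndex.difference() with a negative step sorting its result when sort=None (GH44085). The data and ranges are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 6]})

# periods given as a NumPy integer rather than a Python int.
diffed = df.diff(np.int64(1))
print(diffed["a"].tolist())  # [nan, 2.0, 3.0]

# A RangeIndex with a negative step: 10, 8, 6, 4, 2.
idx = pd.RangeIndex(10, 0, -2)

# Removing 4 with sort=None should give an ascending result.
remaining = idx.difference(pd.RangeIndex(4, 5), sort=None)
print(list(remaining))  # [2, 6, 8, 10]
```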

Contributors

A total of 275 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

  • Abhishek R

  • Albert Villanova del Moral

  • Alessandro Bisiani +

  • Alex Lim

  • Alex-Gregory-1 +

  • Alexander Gorodetsky

  • Alexander Regueiro +

  • Alexey Györi

  • Alexis Mignon

  • Aleš Erjavec

  • Ali McMaster

  • Alibi +

  • Andrei Batomunkuev +

  • Andrew Eckart +

  • Andrew Hawyrluk

  • Andrew Wood

  • Anton Lodder +

  • Armin Berres +

  • Arushi Sharma +

  • Benedikt Heidrich +

  • Beni Bienz +

  • Benoît Vinot

  • Bert Palm +

  • Boris Rumyantsev +

  • Brian Hulette

  • Brock

  • Bruno Costa +

  • Bryan Racic +

  • Caleb Epstein

  • Calvin Ho

  • ChristofKaufmann +

  • Christopher Yeh +

  • Chuliang Xiao +

  • ClaudiaSilver +

  • DSM

  • Daniel Coll +

  • Daniel Schmidt +

  • Dare Adewumi

  • David +

  • David Sanders +

  • David Wales +

  • Derzan Chiang +

  • DeviousLab +

  • Dhruv B Shetty +

  • Digres45 +

  • Dominik Kutra +

  • Drew Levitt +

  • DriesS

  • EdAbati

  • Elle

  • Elliot Rampono

  • Endre Mark Borza

  • Erfan Nariman

  • Evgeny Naumov +

  • Ewout ter Hoeven +

  • Fangchen Li

  • Felix Divo

  • Felix Dulys +

  • Francesco Andreuzzi +

  • Francois Dion +

  • Frans Larsson +

  • Fred Reiss

  • GYvan

  • Gabriel Di Pardi Arruda +

  • Gesa Stupperich

  • Giacomo Caria +

  • Greg Siano +

  • Griffin Ansel

  • Hiroaki Ogasawara +

  • Horace +

  • Horace Lai +

  • Irv Lustig

  • Isaac Virshup

  • JHM Darbyshire (MBP)

  • JHM Darbyshire (iMac)

  • JHM Darbyshire +

  • Jack Liu

  • Jacob Skwirsk +

  • Jaime Di Cristina +

  • James Holcombe +

  • Janosh Riebesell +

  • Jarrod Millman

  • Jason Bian +

  • Jeff Reback

  • Jernej Makovsek +

  • Jim Bradley +

  • Joel Gibson +

  • Joeperdefloep +

  • Johannes Mueller +

  • John S Bogaardt +

  • John Zangwill +

  • Jon Haitz Legarreta Gorroño +

  • Jon Wiggins +

  • Jonas Haag +

  • Joris Van den Bossche

  • Josh Friedlander

  • José Duarte +

  • Julian Fleischer +

  • Julien de la Bruère-T

  • Justin McOmie

  • Kadatatlu Kishore +

  • Kaiqi Dong

  • Kashif Khan +

  • Kavya9986 +

  • Kendall +

  • Kevin Sheppard

  • Kiley Hewitt

  • Koen Roelofs +

  • Krishna Chivukula

  • KrishnaSai2020

  • Leonardo Freua +

  • Leonardus Chen

  • Liang-Chi Hsieh +

  • Loic Diridollou +

  • Lorenzo Maffioli +

  • Luke Manley +

  • LunarLanding +

  • Marc Garcia

  • Marcel Bittar +

  • Marcel Gerber +

  • Marco Edward Gorelli

  • Marco Gorelli

  • MarcoGorelli

  • Marvin +

  • Mateusz Piotrowski +

  • Mathias Hauser +

  • Matt Richards +

  • Matthew Davis +

  • Matthew Roeschke

  • Matthew Zeitlin

  • Matthias Bussonnier

  • Matti Picus

  • Mauro Silberberg +

  • Maxim Ivanov

  • Maximilian Carr +

  • MeeseeksMachine

  • Michael Sarrazin +

  • Michael Wang +

  • Michał Górny +

  • Mike Phung +

  • Mike Taves +

  • Mohamad Hussein Rkein +

  • NJOKU OKECHUKWU VALENTINE +

  • Neal McBurnett +

  • Nick Anderson +

  • Nikita Sobolev +

  • Olivier Cavadenti +

  • PApostol +

  • Pandas Development Team

  • Patrick Hoefler

  • Peter

  • Peter Tillmann +

  • Prabha Arivalagan +

  • Pradyumna Rahul

  • Prerana Chakraborty

  • Prithvijit +

  • Rahul Gaikwad +

  • Ray Bell

  • Ricardo Martins +

  • Richard Shadrach

  • Robbert-jan ‘t Hoen +

  • Robert Voyer +

  • Robin Raymond +

  • Rohan Sharma +

  • Rohan Sirohia +

  • Roman Yurchak

  • Ruan Pretorius +

  • Sam James +

  • Scott Talbert

  • Shashwat Sharma +

  • Sheogorath27 +

  • Shiv Gupta

  • Shoham Debnath

  • Simon Hawkins

  • Soumya +

  • Stan West +

  • Stefanie Molin +

  • Stefano Alberto Russo +

  • Stephan Heßelmann

  • Stephen

  • Suyash Gupta +

  • Sven

  • Swanand01 +

  • Sylvain Marié +

  • TLouf

  • Tania Allard +

  • Terji Petersen

  • TheDerivator +

  • Thomas Dickson

  • Thomas Kastl +

  • Thomas Kluyver

  • Thomas Li

  • Thomas Smith

  • Tim Swast

  • Tim Tran +

  • Tobias McNulty +

  • Tobias Pitters

  • Tomoki Nakagawa +

  • Tony Hirst +

  • Torsten Wörtwein

  • V.I. Wood +

  • Vaibhav K +

  • Valentin Oliver Loftsson +

  • Varun Shrivastava +

  • Vivek Thazhathattil +

  • Vyom Pathak

  • Wenjun Si

  • William Andrea +

  • William Bradley +

  • Wojciech Sadowski +

  • Yao-Ching Huang +

  • Yash Gupta +

  • Yiannis Hadjicharalambous +

  • Yoshiki Vázquez Baeza

  • Yuanhao Geng

  • Yury Mikhaylov

  • Yvan Gatete +

  • Yves Delley +

  • Zach Rait

  • Zbyszek Królikowski +

  • Zero +

  • Zheyuan

  • Zhiyi Wu +

  • aiudirog

  • ali sayyah +

  • aneesh98 +

  • aptalca

  • arw2019 +

  • attack68

  • brendandrury +

  • bubblingoak +

  • calvinsomething +

  • claws +

  • deponovo +

  • dicristina

  • el-g-1 +

  • evensure +

  • fotino21 +

  • fshi01 +

  • gfkang +

  • github-actions[bot]

  • i-aki-y

  • jbrockmendel

  • jreback

  • juliandwain +

  • jxb4892 +

  • kendall smith +

  • lmcindewar +

  • lrepiton

  • maximilianaccardo +

  • michal-gh

  • neelmraman

  • partev

  • phofl +

  • pratyushsharan +

  • quantumalaviya +

  • rafael +

  • realead

  • rocabrera +

  • rosagold

  • saehuihwang +

  • salomondush +

  • shubham11941140 +

  • srinivasan +

  • stphnlyd

  • suoniq

  • trevorkask +

  • tushushu

  • tyuyoshi +

  • usersblock +

  • vernetya +

  • vrserpa +

  • willie3838 +

  • zeitlinv +

  • zhangxiaoxing +