What’s new in 1.3.0 (July 2, 2021)

These are the changes in pandas 1.3.0. See Release notes for a full changelog including other versions of pandas.

Warning

When reading new Excel 2007+ (.xlsx) files, the default argument engine=None to read_excel() will now result in using the openpyxl engine in all cases when the option io.excel.xlsx.reader is set to "auto". Previously, some cases would use the xlrd engine instead. See What’s new 1.2.0 for background on this change.

Enhancements

Custom HTTP(s) headers when reading csv or json files

When reading from a remote URL that is not handled by fsspec (e.g. HTTP and HTTPS) the dictionary passed to storage_options will be used to create the headers included in the request. This can be used to control the User-Agent header or send other custom headers (GH36688). For example:

In [1]: headers = {"User-Agent": "pandas"}

In [2]: df = pd.read_csv(
   ...:     "https://download.bls.gov/pub/time.series/cu/cu.item",
   ...:     sep="\t",
   ...:     storage_options=headers
   ...: )
   ...: 
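For plain HTTP(S) URLs, the dictionary passed as storage_options becomes the request headers. The standard-library sketch below builds such a request with urllib.request.Request but does not send it, to show what the request would carry. It illustrates the idea and is not pandas' internal code.

```python
import urllib.request

headers = {"User-Agent": "pandas"}

# Build (but do not send) a request carrying the custom headers.
req = urllib.request.Request(
    "https://download.bls.gov/pub/time.series/cu/cu.item", headers=headers
)

# urllib normalizes header names with str.capitalize(), so "User-Agent" is stored as "User-agent".
print(req.get_header("User-agent"))  # pandas
```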

Read and write XML documents

We added I/O support to read and render shallow versions of XML documents with read_xml() and DataFrame.to_xml(). Using lxml as parser, both XPath 1.0 and XSLT 1.0 are available. (GH27554)

In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:  <row>
   ...:     <shape>square</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides>4.0</sides>
   ...:  </row>
   ...:  <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides/>
   ...:  </row>
   ...:  <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>180</degrees>
   ...:     <sides>3.0</sides>
   ...:  </row>
   ...:  </data>"""

In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>

For more, see Writing XML in the user guide on IO tools.
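When lxml is not installed, both functions also accept parser="etree", which uses Python's built-in xml.etree. In that case XSLT stylesheets are unavailable and XPath support is limited. A minimal sketch of a round trip follows. It passes the XML through StringIO and writes it back without the default <index> element:

```python
from io import StringIO

import pandas as pd

xml = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides/></row>
</data>"""

# parser="etree" uses the standard library instead of lxml
df = pd.read_xml(StringIO(xml), parser="etree")

# index=False drops the <index> element that to_xml() writes by default
out = df.to_xml(index=False, parser="etree")
print(out)
```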

Styler enhancements

Styler received focused development in this release. See also the Styler documentation, which has been revised and improved (GH39720, GH39317, GH40493).

DataFrame constructor honors copy=False with dict

When passing a dictionary to DataFrame with copy=False, a copy will no longer be made (GH32960).

In [3]: arr = np.array([1, 2, 3])

In [4]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

In [5]: df
Out[5]: 
   A  B
0  1  1
1  2  2
2  3  3

[3 rows x 2 columns]

df["A"] remains a view on arr:

In [6]: arr[0] = 0

In [7]: assert df.iloc[0, 0] == 0

The default behavior when not passing copy will remain unchanged, i.e. a copy will be made.

PyArrow backed string data type

We’ve enhanced the StringDtype, an extension type dedicated to string data. (GH39908)

It is now possible to specify a storage keyword option to StringDtype. Use pandas options or specify the dtype using dtype='string[pyarrow]' to allow the StringArray to be backed by a PyArrow array instead of a NumPy array of Python objects.

The PyArrow backed StringArray requires pyarrow 1.0.0 or greater to be installed.

Warning

string[pyarrow] is currently considered experimental. The implementation and parts of the API may change without warning.

In [8]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[8]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string

You can use the alias "string[pyarrow]" as well.

In [9]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")

In [10]: s
Out[10]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string

You can also create a PyArrow backed string array using pandas options.

In [11]: with pd.option_context("string_storage", "pyarrow"):
   ....:     s = pd.Series(['abc', None, 'def'], dtype="string")
   ....: 

In [12]: s
Out[12]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

In [13]: s.str.upper()
Out[13]: 
0     ABC
1    <NA>
2     DEF
Length: 3, dtype: string

In [14]: s.str.split('b', expand=True).dtypes
Out[14]: 
0    string
1    string
Length: 2, dtype: object

String accessor methods returning integers will return a value with Int64Dtype:

In [15]: s.str.count("a")
Out[15]: 
0       1
1    <NA>
2       0
Length: 3, dtype: Int64

Centered datetime-like rolling windows

When performing rolling calculations on DataFrame and Series objects with a datetime-like index, a centered datetime-like window can now be used (GH38780). For example:

In [16]: df = pd.DataFrame(
   ....:     {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
   ....: )
   ....: 

In [17]: df
Out[17]: 
            A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4

[5 rows x 1 columns]

In [18]: df.rolling("2D", center=True).mean()
Out[18]: 
              A
2020-01-01  0.5
2020-01-02  1.5
2020-01-03  2.5
2020-01-04  3.5
2020-01-05  4.0

[5 rows x 1 columns]
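The output above is consistent with each centered "2D" window covering the half-open interval (t - 1 day, t + 1 day]. In other words, the window for a row contains that row and the following day. This interval rule is inferred from the output, not taken from pandas' implementation. A pure-Python sketch that reproduces the numbers under this assumption:

```python
from datetime import date, timedelta


def centered_time_mean(index, values, window):
    """Mean over (t - window/2, t + window/2] for each timestamp t (assumed bounds)."""
    half = window / 2
    out = []
    for t in index:
        in_window = [v for s, v in zip(index, values) if t - half < s <= t + half]
        out.append(sum(in_window) / len(in_window))
    return out


idx = [date(2020, 1, 1) + timedelta(days=i) for i in range(5)]
result = centered_time_mean(idx, [0, 1, 2, 3, 4], timedelta(days=2))
print(result)  # [0.5, 1.5, 2.5, 3.5, 4.0]
```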

Other enhancements

Notable bug fixes

These are bug fixes that might have notable behavior changes.

Categorical.unique now always maintains same dtype as original

Previously, when calling Categorical.unique() with categorical data, unused categories in the new array would be removed, making the dtype of the new array different from the original (GH18291).

As an example of this, given:

In [19]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)

In [20]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)

In [21]: original = pd.Series(cat)

In [22]: unique = original.unique()

Previous behavior:

In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False

New behavior:

In [23]: unique
Out[23]: 
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']

In [24]: original.dtype == unique.dtype
Out[24]: True
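Code that relied on the old, reduced categories can call Categorical.remove_unused_categories() on the result of unique(), as sketched below. This is a suggested workaround, not something the release note prescribes:

```python
import pandas as pd

dtype = pd.CategoricalDtype(["bad", "neutral", "good"], ordered=True)
original = pd.Series(pd.Categorical(["good", "good", "bad", "bad"], dtype=dtype))

unique = original.unique()                   # keeps all three categories
trimmed = unique.remove_unused_categories()  # drops "neutral", like the old result

print(list(trimmed.categories))  # ['bad', 'good']
```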

Preserve dtypes in DataFrame.combine_first()

DataFrame.combine_first() will now preserve dtypes (GH7509).

In [25]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])

In [26]: df1
Out[26]: 
   A  B
0  1  1
1  2  2
2  3  3

[3 rows x 2 columns]

In [27]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])

In [28]: df2
Out[28]: 
   B  C
2  4  1
3  5  2
4  6  3

[3 rows x 2 columns]

In [29]: combined = df1.combine_first(df2)

Previous behavior:

In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object

New behavior:

In [30]: combined.dtypes
Out[30]: 
A    float64
B      int64
C    float64
Length: 3, dtype: object

Groupby methods agg and transform no longer change return dtype for callables

Previously the methods DataFrameGroupBy.aggregate(), SeriesGroupBy.aggregate(), DataFrameGroupBy.transform(), and SeriesGroupBy.transform() might cast the result dtype when the argument func is callable, possibly leading to undesirable results (GH21240). The cast would occur if the result is numeric and casting back to the input dtype does not change any values as measured by np.allclose. Now no such casting occurs.

In [31]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})

In [32]: df
Out[32]: 
   key      a     b
0    1   True  True
1    1  False  True

[2 rows x 3 columns]

Previous behavior:

In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
        a  b
key
1    True  2

New behavior:

In [33]: df.groupby('key').agg(lambda x: x.sum())
Out[33]: 
     a  b
key      
1    1  2

[1 rows x 2 columns]
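The pre-1.3 casting rule described above can be sketched with NumPy. old_maybe_downcast is a hypothetical helper written for illustration and is not pandas' internal function. Apply it to the per-group sums from the example: column a sums to 1 and column b to 2, both from boolean input. The helper reproduces the old output, in which a came back as True and b stayed 2:

```python
import numpy as np


def old_maybe_downcast(result, input_dtype):
    """Hypothetical helper mimicking the old rule: cast a numeric result back to the
    input dtype whenever the round trip leaves the values unchanged per np.allclose."""
    if np.issubdtype(result.dtype, np.number):
        candidate = result.astype(input_dtype)
        if np.allclose(candidate.astype(result.dtype), result):
            return candidate
    return result


print(old_maybe_downcast(np.array([1]), bool))  # [ True]  (1 survives the round trip)
print(old_maybe_downcast(np.array([2]), bool))  # [2]      (2 -> True -> 1 changes the value)
```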

float result for GroupBy.mean(), GroupBy.median(), and GroupBy.var()

Previously, these methods could result in different dtypes depending on the input values. Now, these methods will always return a float dtype. (GH41137)

In [34]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})

Previous behavior:

In [5]: df.groupby(df.index).mean()
Out[5]:
        a  b    c
0    True  1  1.0

New behavior:

In [35]: df.groupby(df.index).mean()
Out[35]: 
     a    b    c
0  1.0  1.0  1.0

[1 rows x 3 columns]

Try operating inplace when setting values with loc and iloc

When setting an entire column using loc or iloc, pandas will try to insert the values into the existing data rather than create an entirely new array.

In [36]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [37]: values = df.values

In [38]: new = np.array([5, 6, 7], dtype="int64")

In [39]: df.loc[[0, 1, 2], "A"] = new

In both the new and old behavior, the data in values is overwritten, but in the old behavior the dtype of df["A"] changed to int64.

Previous behavior:

In [1]: df.dtypes
Out[1]:
A    int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False

In pandas 1.3.0, df continues to share data with values:

New behavior:

In [40]: df.dtypes
Out[40]: 
A    float64
Length: 1, dtype: object

In [41]: np.shares_memory(df["A"], new)
Out[41]: False

In [42]: np.shares_memory(df["A"], values)
Out[42]: True

Never operate inplace when setting frame[keys] = values

When setting multiple columns using frame[keys] = values, new arrays will replace the pre-existing arrays for these keys; the pre-existing arrays are not overwritten (GH39510). As a result, the columns will retain the dtype(s) of values, never casting to the dtypes of the existing arrays.

In [43]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [44]: df[["A"]] = 5

In the old behavior, 5 was cast to float64 and inserted into the existing array backing df:

Previous behavior:

In [1]: df.dtypes
Out[1]:
A    float64

In the new behavior, we get a new array, and retain an integer-dtyped 5:

New behavior:

In [45]: df.dtypes
Out[45]: 
A    int64
Length: 1, dtype: object
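The two setting paths can be compared directly. The sketch below reuses the data from the examples above; the names via_loc and via_setitem are illustrative. Assigning through loc with explicit row labels writes into the existing float64 array. Plain column assignment replaces the column with a new int64 array:

```python
import numpy as np
import pandas as pd

base = pd.DataFrame(range(3), columns=["A"], dtype="float64")
new = np.array([5, 6, 7], dtype="int64")

# loc with explicit row labels: values are written into the existing float64 array
via_loc = base.copy()
via_loc.loc[[0, 1, 2], "A"] = new

# plain column assignment: the column is replaced by a new int64 array
via_setitem = base.copy()
via_setitem["A"] = new

print(via_loc["A"].dtype, via_setitem["A"].dtype)  # float64 int64
```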

Consistent casting with setting into Boolean Series

Setting non-boolean values into a Series with dtype=bool now consistently casts to dtype=object (GH38709).

In [46]: orig = pd.Series([True, False])

In [47]: ser = orig.copy()

In [48]: ser.iloc[1] = np.nan

In [49]: ser2 = orig.copy()

In [50]: ser2.iloc[1] = 2.0

Previous behavior:

In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object

New behavior:

In [51]: ser
Out[51]: 
0    True
1     NaN
Length: 2, dtype: object

In [52]: ser2
Out[52]: 
0    True
1     2.0
Length: 2, dtype: object

GroupBy.rolling no longer returns grouped-by column in values

The group-by column will now be dropped from the result of a groupby.rolling operation (GH32262).

In [53]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

In [54]: df
Out[54]: 
   A  B
0  1  0
1  1  1
2  2  2
3  3  3

[4 rows x 2 columns]

Previous behavior:

In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
       A    B
A
1 0  NaN  NaN
  1  2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN

New behavior:

In [55]: df.groupby("A").rolling(2).sum()
Out[55]: 
       B
A       
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN

[4 rows x 1 columns]

Removed artificial truncation in rolling variance and standard deviation

Rolling.std() and Rolling.var() will no longer artificially truncate to zero results that are less than ~1e-8 and ~1e-15, respectively (GH37051, GH40448, GH39872).

However, floating point artifacts may now exist in the results when rolling over larger values.

In [56]: s = pd.Series([7, 5, 5, 5])

In [57]: s.rolling(3).var()
Out[57]: 
0         NaN
1         NaN
2    1.333333
3    0.000000
Length: 4, dtype: float64
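As an independent check, the values above can be recomputed window by window with statistics.variance. That function computes the sample variance, which matches the ddof=1 default of Rolling.var(). This from-scratch computation is a pure-Python check, not how pandas computes rolling windows. An incremental algorithm that adds and removes values as the window slides can leave a tiny nonzero residue where the exact answer is 0, which is the kind of floating point artifact mentioned above.

```python
import statistics

values = [7, 5, 5, 5]
window = 3

# The first window - 1 positions have an incomplete window (NaN in pandas, None here).
result = [None] * (window - 1)
for end in range(window, len(values) + 1):
    # statistics.variance divides by n - 1, like ddof=1
    result.append(statistics.variance(values[end - window:end]))

print(result)
```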

GroupBy.rolling with MultiIndex no longer drops levels in the result

GroupBy.rolling() will no longer drop levels of a DataFrame with a MultiIndex in the result. This can lead to a perceived duplication of levels in the resulting MultiIndex, but this change restores the behavior that was present in version 1.1.3 (GH38787, GH38523).

In [58]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])

In [59]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)

In [60]: df
Out[60]: 
               a  b
label1 label2      
idx1   idx2    1  2

[1 rows x 2 columns]

Previous behavior:

In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
          a    b
label1
idx1    1.0  2.0

New behavior:

In [61]: df.groupby('label1').rolling(1).sum()
Out[61]: 
                        a    b
label1 label1 label2          
idx1   idx1   idx2    1.0  2.0

[1 rows x 2 columns]

Backwards incompatible API changes

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package           Minimum Version   Required   Changed
numpy             1.17.3            X          X
pytz              2017.3            X
python-dateutil   2.7.3             X
bottleneck        1.2.1
numexpr           2.7.0                        X
pytest (dev)      6.0                          X
mypy (dev)        0.812                        X
setuptools        38.6.0                       X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package           Minimum Version   Changed
beautifulsoup4    4.6.0
fastparquet       0.4.0             X
fsspec            0.7.4
gcsfs             0.6.0
lxml              4.3.0
matplotlib        2.2.3
numba             0.46.0
openpyxl          3.0.0             X
pyarrow           0.17.0            X
pymysql           0.8.1             X
pytables          3.5.1
s3fs              0.4.0
scipy             1.2.0
sqlalchemy        1.3.0             X
tabulate          0.8.7             X
xarray            0.12.0
xlrd              1.2.0
xlsxwriter        1.0.2
xlwt              1.3.0
pandas-gbq        0.12.0

See Dependencies and Optional dependencies for more.
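A minimal sketch follows for checking an environment against the required minimums. The names REQUIRED_MINIMUMS, parse_version and too_old are hypothetical. The parser assumes plain dotted numeric versions; strings such as "1.21.0rc1" need a proper parser (for example packaging.version). Installed versions can be read with importlib.metadata.version().

```python
# Minimums for the required dependencies, copied from the table above.
REQUIRED_MINIMUMS = {
    "numpy": "1.17.3",
    "pytz": "2017.3",
    "python-dateutil": "2.7.3",
}


def parse_version(text):
    """Turn '1.17.3' into (1, 17, 3). Assumes purely numeric, dot-separated parts."""
    return tuple(int(part) for part in text.split("."))


def too_old(installed):
    """Return the names in `installed` (name -> version) that are below the minimum."""
    return [
        name
        for name, version in installed.items()
        if name in REQUIRED_MINIMUMS
        and parse_version(version) < parse_version(REQUIRED_MINIMUMS[name])
    ]


print(too_old({"numpy": "1.16.6", "pytz": "2021.1"}))  # ['numpy']
```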

Other API changes

  • Partially initialized CategoricalDtype objects (i.e. those with categories=None) will no longer compare as equal to fully initialized dtype objects (GH38516)

  • Accessing _constructor_expanddim on a DataFrame or _constructor_sliced on a Series now raises an AttributeError. Previously a NotImplementedError was raised (GH38782)

  • Added new engine and **engine_kwargs parameters to DataFrame.to_sql() to support other future “SQL engines”. Currently we still only use SQLAlchemy under the hood, but more engines are planned to be supported such as turbodbc (GH36893)

  • Removed redundant freq from PeriodIndex string representation (GH41653)

  • ExtensionDtype.construct_array_type() is now a required method instead of an optional one for ExtensionDtype subclasses (GH24860)

  • Calling hash on non-hashable pandas objects will now raise TypeError with the built-in error message (e.g. unhashable type: 'Series'). Previously it would raise a custom message such as 'Series' objects are mutable, thus they cannot be hashed. Furthermore, isinstance(<Series>, collections.abc.Hashable) will now return False (GH40013)

  • Styler.from_custom_template() now has two new arguments for template names, and the old template name argument has been removed, because template inheritance has been introduced for better parsing (GH42053). Subclasses that modify Styler attributes may also need to be updated.
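Python's standard way for a class to opt out of hashing is to set __hash__ = None. Doing so produces the built-in "unhashable type" message and makes isinstance(..., collections.abc.Hashable) return False, which matches the behavior described for the hashing change above. The sketch assumes this mechanism and uses a hypothetical MutableContainer class instead of pandas itself:

```python
from collections.abc import Hashable


class MutableContainer:
    """Hypothetical stand-in for a mutable pandas object such as Series."""

    __hash__ = None  # opt out of hashing


obj = MutableContainer()
print(isinstance(obj, Hashable))  # False

try:
    hash(obj)
except TypeError as err:
    print(err)  # unhashable type: 'MutableContainer'
```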

Build

  • Documentation in .pptx and .pdf formats is no longer included in wheels or source distributions. (GH30741)

Deprecations

Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations

When calling a reduction (e.g. .min, .max, .sum) on a DataFrame with numeric_only=None (the default), columns where the reduction raises a TypeError are silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example:

In [62]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [63]: df
Out[63]: 
   A          B
0  1 2016-01-01
1  2 2016-01-02
2  3 2016-01-03
3  4 2016-01-04

[4 rows x 2 columns]

Old behavior:

In [3]: df.prod()
Out[3]:
A    24
dtype: int64

Future behavior:

In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'

In [5]: df[["A"]].prod()
Out[5]:
A    24
dtype: int64

Similarly, when applying a function to DataFrameGroupBy, columns on which the function raises TypeError are currently silently ignored and dropped from the result.

This behavior is deprecated. In a future version, the TypeError will be raised, and users will need to select only valid columns before calling the function.

For example:

In [64]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

In [65]: gb = df.groupby([1, 1, 2, 2])

Old behavior:

In [4]: gb.prod(numeric_only=False)
Out[4]:
A
1   2
2  12

Future behavior:

In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations

In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
    A
1   2
2  12
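To prepare for the future behavior, you can select the valid columns explicitly before reducing, as sketched below. DataFrame.select_dtypes(include="number") is one way to do this. Treating "valid" as "numeric" is an assumption that fits prod(), which is only meaningful for the numeric column here:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

# Keep only numeric columns, so nothing is silently dropped (or raises) inside prod()
result = df.select_dtypes(include="number").prod()
print(result)
```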

Other Deprecations

Performance improvements

Bug fixes

Categorical

  • Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data is passed (GH38614)

  • Bug in CategoricalIndex.reindex failing when the Index passed was not categorical but its values were all labels in the category (GH28690)

  • Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH38552)

  • Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH38857)

  • Bug in setting categorical values into an object-dtype column in a DataFrame (GH39136)

  • Bug in DataFrame.reindex() raising an IndexError when the new index contained duplicates and the old index was a CategoricalIndex (GH38906)

  • Bug in Categorical.fillna() with a tuple-like category raising NotImplementedError instead of ValueError when filling with a non-category tuple (GH41914)

Datetimelike

Timedelta

  • Bug in constructing Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] (GH38965)

  • Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH39462)

  • Bug in constructing Timedelta from an input string with only symbols and no digits failing to raise an error (GH39710)

  • Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converting to timedelta64[ns] (GH40008)

Timezones

  • Bug in different tzinfo objects representing UTC not being treated as equivalent (GH39216)

  • Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other UTC-representing tzinfos (GH39276)

Numeric

Conversion

  • Bug in Series.to_dict() with orient='records' not returning Python native types (GH25969)

  • Bug in Series.view() and Index.view() when converting between datetime-like (datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH39788)

  • Bug in creating a DataFrame from an empty np.recarray not retaining the original dtypes (GH40121)

  • Bug in DataFrame failing to raise a TypeError when constructing from a frozenset (GH40163)

  • Bug in Index construction silently ignoring a passed dtype when the data cannot be cast to that dtype (GH21311)

  • Bug in StringArray.astype() falling back to NumPy and raising when converting to dtype='categorical' (GH40450)

  • Bug in factorize() where, when given an array with a numeric NumPy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH41132)

  • Bug in DataFrame construction with a dictionary containing an array-like with ExtensionDtype and copy=True failing to make a copy (GH38939)

  • Bug in qcut() raising an error when taking Float64Dtype as input (GH40730)

  • Bug in DataFrame and Series construction with datetime64[ns] data and dtype=object resulting in datetime objects instead of Timestamp objects (GH41599)

  • Bug in DataFrame and Series construction with timedelta64[ns] data and dtype=object resulting in np.timedelta64 objects instead of Timedelta objects (GH41599)

  • Bug in DataFrame construction when given a two-dimensional object-dtype np.ndarray of Period or Interval objects failing to cast to PeriodDtype or IntervalDtype, respectively (GH41812)

  • Bug in constructing a Series from a list and a PandasDtype (GH39357)

  • Bug in creating a Series from a range object that does not fit in the bounds of int64 dtype (GH30173)

  • Bug in creating a Series from a dict with all-tuple keys and an Index that requires reindexing (GH41707)

  • Bug in infer_dtype() not recognizing Series, Index, or array with a Period dtype (GH23553)

  • Bug in infer_dtype() raising an error for general ExtensionArray objects. It will now return "unknown-array" instead of raising (GH37367)

  • Bug in DataFrame.convert_dtypes() incorrectly raising a ValueError when called on an empty DataFrame (GH40393)

Strings

Interval

  • Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH38653, GH38741)

  • Bug in IntervalIndex.intersection() returning duplicates when at least one of the Index objects has duplicates which are present in the other (GH38743)

  • IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference(), and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising a TypeError when operating with another IntervalIndex with incompatible dtype (GH39267)

  • PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference(), PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with incompatible dtype (GH39306)

  • Bug in IntervalIndex.is_monotonic(), IntervalIndex.get_loc(), IntervalIndex.get_indexer_for(), and IntervalIndex.__contains__() when NA values are present (GH41831)

Indexing

  • Bug in Index.union() and MultiIndex.union() dropping duplicate Index values when Index was not monotonic or sort was set to False (GH36289, GH31326, GH40862)

  • Bug in CategoricalIndex.get_indexer() failing to raise InvalidIndexError when non-unique (GH38372)

  • Bug in IntervalIndex.get_indexer() when target has CategoricalDtype and both the index and the target contain NA values (GH41934)

  • Bug in Series.loc() raising a ValueError when input was filtered with a Boolean list and values to set were a list with lower dimension (GH20438)

  • Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH38380)

  • Bug in DataFrame.__setitem__() raising a ValueError when setting multiple values to duplicate columns (GH15695)

  • Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returning incorrect elements for non-monotonic DatetimeIndex for string slices (GH33146)

  • Bug in DataFrame.reindex() and Series.reindex() with timezone aware indexes raising a TypeError for method="ffill" and method="bfill" and specified tolerance (GH38566)

  • Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value requires casting to object dtype (GH39755)

  • Bug in DataFrame.__setitem__() raising a ValueError when setting on an empty DataFrame using specified columns and a nonempty DataFrame value (GH38831)

  • Bug in DataFrame.loc.__setitem__() raising a ValueError when operating on a unique column when the DataFrame has duplicate columns (GH38521)

  • Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() with mixed dtypes when setting with a dictionary value (GH38335)

  • Bug in Series.loc.__setitem__() and DataFrame.loc.__setitem__() raising KeyError when provided a Boolean generator (GH39614)

  • Bug in Series.iloc() and DataFrame.iloc() raising a KeyError when provided a generator (GH39614)

  • Bug in DataFrame.__setitem__() not raising a ValueError when the right hand side is a DataFrame with wrong number of columns (GH38604)

  • Bug in Series.__setitem__() raising a ValueError when setting a Series with a scalar indexer (GH38303)

  • Bug in DataFrame.loc() dropping levels of a MultiIndex when the DataFrame used as input has only one row (GH10521)

  • Bug in DataFrame.__getitem__() and Series.__getitem__() always raising KeyError when slicing with existing strings where the Index has milliseconds (GH33589)

  • Bug in setting timedelta64 or datetime64 values into numeric Series failing to cast to object dtype (GH39086, GH39619)

  • Bug in setting Interval values into a Series or DataFrame with mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH39120)

  • Bug in setting datetime64 values into a Series with integer-dtype incorrectly casting the datetime64 values to integers (GH39266)

  • Bug in setting np.datetime64("NaT") into a Series with Datetime64TZDtype incorrectly treating the timezone-naive value as timezone-aware (GH39769)

  • Bug in Index.get_loc() not raising KeyError when key=NaN and method is specified but NaN is not in the Index (GH39382)

  • Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a timezone-aware index incorrectly treating the timezone-naive value as timezone-aware (GH39769)

  • Bug in incorrectly raising in Index.insert(), when setting a new column that cannot be held in the existing frame.columns, or in Series.reset_index() or DataFrame.reset_index() instead of casting to a compatible dtype (GH39068)

  • Bug in RangeIndex.append() where a single object of length 1 was concatenated incorrectly (GH39401)

  • Bug in RangeIndex.astype() where, when converting to CategoricalIndex, the categories became an Int64Index instead of a RangeIndex (GH41263)

  • Bug in setting numpy.timedelta64 values into an object-dtype Series using a Boolean indexer (GH39488)

  • Bug in setting numeric values into a boolean-dtype Series using at or iat failing to cast to object-dtype (GH39582)

  • Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising ValueError when trying to index with a row-slice and setting a list as values (GH40440)

  • Bug in DataFrame.loc() not raising KeyError when the key was not found in MultiIndex and the levels were not fully specified (GH41170)

  • Bug in DataFrame.loc.__setitem__() when setting-with-expansion incorrectly raising when the index in the expanding axis contained duplicates (GH40096)

  • Bug in DataFrame.loc.__getitem__() with MultiIndex casting to float when at least one index column has float dtype and we retrieve a scalar (GH41369)

  • Bug in DataFrame.loc() incorrectly matching non-Boolean index elements (GH20432)

  • Bug in indexing with np.nan on a Series or DataFrame with a CategoricalIndex incorrectly raising KeyError when np.nan keys are present (GH41933)

  • Bug in Series.__delitem__() with ExtensionDtype incorrectly casting to ndarray (GH40386)

  • Bug in DataFrame.at() with a CategoricalIndex returning incorrect results when passed integer keys (GH41846)

  • Bug in DataFrame.loc() returning a MultiIndex in the wrong order if an indexer has duplicates (GH40978)

  • Bug in DataFrame.__setitem__() raising a TypeError when using a str subclass as the column name with a DatetimeIndex (GH37366)

  • Bug in PeriodIndex.get_loc() failing to raise a KeyError when given a Period with a mismatched freq (GH41670)

  • Bug in .loc.__getitem__ with a UInt64Index and negative-integer keys raising OverflowError instead of KeyError in some cases, and wrapping around to positive integers in others (GH41777)

  • Bug in Index.get_indexer() failing to raise ValueError in some cases with invalid method, limit, or tolerance arguments (GH41918)

  • Bug when slicing a Series or DataFrame with a TimedeltaIndex when passing an invalid string raising ValueError instead of a TypeError (GH41821)

  • Bug in Index constructor sometimes silently ignoring a specified dtype (GH38879)

  • Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH39412)
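The Index.where() and Index.putmask() equivalence in the last entry can be checked directly. A minimal sketch, with an arbitrary example index and mask:

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3])
mask = np.array([True, False, True])

kept = idx.where(mask, 0)         # keep values where mask is True, else use 0
replaced = idx.putmask(~mask, 0)  # put 0 where ~mask is True

print(kept.tolist(), kept.equals(replaced))  # [1, 0, 3] True
```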

Missing

MultiIndex

  • Bug in DataFrame.drop() raising a TypeError when the MultiIndex is non-unique and level is not provided (GH36293)

  • Bug in MultiIndex.intersection() duplicating NaN in the result (GH38623)

  • Bug in MultiIndex.equals() incorrectly returning True when the MultiIndex contained NaN even when they are differently ordered (GH38439)

  • Bug in MultiIndex.intersection() always returning an empty result when intersecting with CategoricalIndex (GH38653)

  • Bug in MultiIndex.difference() incorrectly raising TypeError when indexes contain non-sortable entries (GH41915)

  • Bug in MultiIndex.reindex() raising a ValueError when used on an empty MultiIndex and indexing only a specific level (GH41170)

  • Bug in MultiIndex.reindex() raising TypeError when reindexing against a flat Index (GH41707)

I/O

Period

  • Comparisons of Period objects or Index, Series, or DataFrame with mismatched PeriodDtype now behave like other mismatched-type comparisons, returning False for equals, True for not-equal, and raising TypeError for inequality checks (GH39274)

Plotting

  • Bug in plotting.scatter_matrix() raising when a 2D ax argument was passed (GH16253)

  • Prevent warnings when Matplotlib’s constrained_layout is enabled (GH25261)

  • Bug in DataFrame.plot() showing the wrong colors in the legend if the function was called repeatedly and some calls used yerr while others didn’t (GH39522)

  • Bug in DataFrame.plot() showing the wrong colors in the legend if the function was called repeatedly and some calls used secondary_y while others used legend=False (GH40044)

  • Bug in DataFrame.plot.box() where, with the dark_background theme selected, caps or min/max markers for the plot were not visible (GH40769)

Groupby/resample/rolling

  • Bug in GroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH38254)

  • Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical Series were not tallied (GH38672)

  • Bug in SeriesGroupBy.value_counts() where an error was raised on an empty Series (GH39172)

  • Bug in GroupBy.indices() containing non-existent indices when null values were present in the groupby keys (GH9304)

  • Fixed bug in GroupBy.sum() causing a loss of precision; it now uses Kahan summation (GH38778)

  • Fixed bug in GroupBy.cumsum() and GroupBy.mean() causing a loss of precision; they now use Kahan summation (GH38934)

  • Bug in Resampler.aggregate() and DataFrame.transform() raising a TypeError instead of SpecificationError when missing keys had mixed dtypes (GH39025)

  • Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH38733)

  • Bug in Series.resample() raising when the index was a PeriodIndex consisting of NaT (GH39227)

  • Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing other that was longer than each group (GH39591)

  • Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing other that was longer than each group (GH39591)

  • Bug in GroupBy.mean(), GroupBy.median() and DataFrame.pivot_table() not propagating metadata (GH28283)

  • Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly when window is an offset and dates are in descending order (GH40002)

  • Bug in Series.groupby() and DataFrame.groupby() on an empty Series or DataFrame losing index, columns, and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod, and skew or using them through apply, aggregate, or resample (GH26411)

  • Bug in GroupBy.apply() where a MultiIndex would be created instead of an Index when used on a RollingGroupby object (GH39732)

  • Bug in DataFrameGroupBy.sample() where an error was raised when weights was specified and the index was an Int64Index (GH39927)

  • Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() sometimes raising a SpecificationError when passed a dictionary and columns were missing; they now always raise a KeyError instead (GH40004)

  • Bug in DataFrameGroupBy.sample() where column selection was not applied before computing the result (GH39928)

  • Bug in ExponentialMovingWindow where calling __getitem__ would incorrectly raise a ValueError when times was provided (GH40164)

  • Bug in ExponentialMovingWindow where calling __getitem__ would not retain the com, span, alpha or halflife attributes (GH40164)

  • ExponentialMovingWindow now raises a NotImplementedError when specifying times with adjust=False due to an incorrect calculation (GH40098)

  • Bug in ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH40951)

  • Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were used in the case of multiple groups (GH40951)

  • Bug in ExponentialMovingWindowGroupby where the times vector and values became out of sync for non-trivial groups (GH40951)

  • Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index was not sorted (GH39805)

  • Bug in aggregation functions for DataFrame not respecting the numeric_only argument when the level keyword was given (GH40660)

  • Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index caused an incorrect Index shape (GH40014)

  • Bug in RollingGroupby where the as_index=False argument to groupby was ignored (GH39433)

  • Bug in GroupBy.any() and GroupBy.all() raising a ValueError when used with nullable-type columns holding NA, even with skipna=True (GH40585)

  • Bug in GroupBy.cummin() and GroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH40767)

  • Bug in GroupBy.rank() with nullable dtypes incorrectly raising a TypeError (GH41010)

  • Bug in GroupBy.cummin() and GroupBy.cummax() computing wrong results with nullable data types too large to roundtrip when cast to float (GH37493)

  • Bug in DataFrame.rolling() returning a mean of zero for an all-NaN window with min_periods=0 if the calculation was not numerically stable (GH41053)

  • Bug in DataFrame.rolling() returning a non-zero sum for an all-NaN window with min_periods=0 if the calculation was not numerically stable (GH41053)

  • Bug in SeriesGroupBy.agg() failing to retain ordered CategoricalDtype on order-preserving aggregations (GH41147)

  • Bug in GroupBy.min() and GroupBy.max() with multiple object-dtype columns and numeric_only=False incorrectly raising a ValueError (GH41111)

  • Bug in DataFrameGroupBy.rank() with the GroupBy object’s axis=0 and the rank method’s keyword axis=1 (GH41320)

  • Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly returning a malformed SeriesGroupBy instead of DataFrameGroupBy (GH41427)

  • Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly raising an AttributeError (GH41427)

  • Bug in Resampler.apply() with non-unique columns incorrectly dropping duplicated columns (GH41445)

  • Bug in Series.groupby() aggregations incorrectly returning an empty Series instead of raising a TypeError on aggregations that are invalid for its dtype, e.g. .prod with datetime64[ns] dtype (GH41342)

  • Bug in DataFrameGroupBy aggregations incorrectly failing to drop columns with invalid dtypes for that aggregation when there are no valid columns (GH41291)

  • Bug in DataFrame.rolling.__iter__() where on was not assigned to the index of the resulting objects (GH40373)

  • Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine="numba" where *args were being cached with the user-passed function (GH41647)

  • Bug in DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad, pct_change, shift, ohlc dropping .columns.names (GH41497)
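Kahan (compensated) summation, referenced in the GH38778 and GH38934 entries, keeps a second accumulator holding the low-order bits lost in each floating-point addition and feeds them back into the next step. The following is a minimal pure-Python sketch of the technique; naive_sum and kahan_sum are illustrative names, not pandas functions, and pandas applies the idea per group inside its compiled groupby code rather than in Python.

```python
import math


def naive_sum(values):
    # Plain left-to-right accumulation, written as an explicit loop so the
    # result does not depend on how the built-in sum() handles floats.
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    # Kahan compensated summation: `compensation` holds the rounding error
    # of the previous addition and is subtracted from the next addend.
    total = 0.0
    compensation = 0.0
    for x in values:
        y = x - compensation
        t = total + y
        # (t - total) is what was actually added; subtracting y recovers
        # (with opposite sign) the part of y that was rounded away.
        compensation = (t - total) - y
        total = t
    return total


values = [1.0] + [1e-16] * 10  # exact sum is 1 + 1e-15
print(naive_sum(values))       # 1.0: each 1e-16 is below half an ulp of 1.0
print(kahan_sum(values))       # close to 1 + 1e-15
print(math.fsum(values))       # correctly rounded reference
```

With one large value followed by many tiny ones, the naive loop rounds every tiny addend away, while the compensated loop retains them at the cost of a few extra floating-point operations per element.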
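The GroupBy.cummin() and GroupBy.cummax() entries (GH40767, GH37493) come down to the fact that float64 has a 53-bit significand, so not every int64 value above 2**53 in magnitude survives a cast to float and back. The pure-Python sketch below models a running maximum computed on float copies and then cast back to int; the helper names are hypothetical and the sketch illustrates the failure mode, not pandas' implementation.

```python
INT64_MAX = 2**63 - 1


def roundtrips_through_float64(n: int) -> bool:
    # True if n survives int -> float64 -> int unchanged.
    return int(float(n)) == n


def cummax_exact(values):
    # Running maximum using ordinary comparisons on the original values.
    out = []
    best = None
    for v in values:
        if best is None or v > best:
            best = v
        out.append(best)
    return out


def cummax_via_float(values):
    # Hypothetical flawed variant: compute on float copies, then cast back.
    return [int(v) for v in cummax_exact([float(v) for v in values])]


vals = [2**62 + 1, 2**62]
print(roundtrips_through_float64(2**53))      # True
print(roundtrips_through_float64(INT64_MAX))  # False
print(cummax_exact(vals))      # [4611686018427387905, 4611686018427387905]
print(cummax_via_float(vals))  # [4611686018427387904, 4611686018427387904]
```

Near 2**62 adjacent float64 values are 1024 apart, so 2**62 + 1 collapses onto 2**62 and the exact maximum is lost; comparing the integers directly avoids the problem.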
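The GH41053 entries concern rolling aggregations over all-NaN windows with min_periods=0. A common way to compute a fixed-size rolling sum is to keep a running total, adding each value as it enters the window and subtracting it as it leaves; floating-point rounding can then leave a small residue once every value has left. The following is a simplified pure-Python model, not pandas' Cython implementation; the function rolling_sum and its reset_when_empty flag are hypothetical. It counts the non-NaN observations in the window and resets the total when that count drops to zero.

```python
import math


def rolling_sum(values, window, min_periods=0, reset_when_empty=True):
    # Hypothetical fixed-size rolling sum kept as a running total: add the
    # value entering the window, subtract the one leaving it. NaNs are
    # skipped and `nobs` counts the non-NaN values inside the window.
    out = []
    total = 0.0
    nobs = 0
    for i, x in enumerate(values):
        if not math.isnan(x):
            total += x
            nobs += 1
        if i >= window:
            old = values[i - window]
            if not math.isnan(old):
                total -= old
                nobs -= 1
        if nobs < min_periods:
            out.append(math.nan)
        elif nobs == 0 and reset_when_empty:
            # All-NaN window: discard any rounding residue and report 0.0.
            total = 0.0
            out.append(0.0)
        else:
            out.append(total)
    return out


data = [0.1, 0.2, math.nan, math.nan]
print(rolling_sum(data, 2, reset_when_empty=False)[-1])  # tiny non-zero residue (about 2.8e-17)
print(rolling_sum(data, 2)[-1])                          # 0.0
```

Resetting the total whenever the window empties also keeps the residue from leaking into later windows once new values arrive.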

Reshaping

Sparse

  • Bug in DataFrame.sparse.to_coo() raising a KeyError with columns that are a numeric Index without a 0 (GH18414)

  • Bug in SparseArray.astype() with copy=False producing incorrect results when going from integer dtype to floating dtype (GH34456)

  • Bug in SparseArray.max() and SparseArray.min() always returning an empty result (GH40921)
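SparseArray stores only the values that differ from fill_value, so a reduction such as max() or min() has to take the fill value into account whenever some positions were not stored. The toy class below is a hypothetical model of that bookkeeping (ToySparse is not a pandas class, and this is not the pandas implementation); it skips NaN values and returns NaN when nothing valid remains.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySparse:
    # Hypothetical model: `length` logical positions, of which only those in
    # `indices` are stored (with matching `values`); every other position
    # implicitly holds `fill_value`.
    length: int
    indices: List[int] = field(default_factory=list)
    values: List[float] = field(default_factory=list)
    fill_value: float = 0.0

    def _candidates(self) -> List[float]:
        cands = list(self.values)
        # Any unstored position ("gap") contributes fill_value to the result.
        if len(self.indices) < self.length:
            cands.append(self.fill_value)
        return [v for v in cands if not math.isnan(v)]

    def max(self) -> float:
        cands = self._candidates()
        return max(cands) if cands else math.nan

    def min(self) -> float:
        cands = self._candidates()
        return min(cands) if cands else math.nan


arr = ToySparse(length=5, indices=[1, 3], values=[-2.0, -1.0], fill_value=0.0)
print(arr.max())  # 0.0, contributed by the three unstored positions
print(arr.min())  # -2.0
```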

ExtensionArray

Styler

  • Bug in Styler where the subset argument in methods raised an error for some valid MultiIndex slices (GH33562)

  • The HTML rendered by Styler has seen minor alterations to better conform to W3C standards (GH39626)

  • Bug in Styler where rendered HTML was missing a column class identifier for certain header cells (GH39716)

  • Bug in Styler.background_gradient() where text-color was not determined correctly (GH39888)

  • Bug in Styler.set_table_styles() where multiple elements in CSS-selectors of the table_styles argument were not correctly added (GH34061)

  • Bug in Styler where copying from Jupyter dropped the top left cell and misaligned headers (GH12147)

  • Bug in Styler.where() where kwargs were not passed to the applicable callable (GH40845)

  • Bug in Styler causing CSS to duplicate on multiple renders (GH39395, GH40334)

Other

Contributors

A total of 251 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

  • Abhishek R +

  • Ada Draginda

  • Adam J. Stewart

  • Adam Turner +

  • Aidan Feldman +

  • Ajitesh Singh +

  • Akshat Jain +

  • Albert Villanova del Moral

  • Alexandre Prince-Levasseur +

  • Andrew Hawyrluk +

  • Andrew Wieteska

  • AnglinaBhambra +

  • Ankush Dua +

  • Anna Daglis

  • Ashlan Parker +

  • Ashwani +

  • Avinash Pancham

  • Ayushman Kumar +

  • BeanNan

  • Benoît Vinot

  • Bharat Raghunathan

  • Bijay Regmi +

  • Bobin Mathew +

  • Bogdan Pilyavets +

  • Brian Hulette +

  • Brian Sun +

  • Brock +

  • Bryan Cutler

  • Caleb +

  • Calvin Ho +

  • Chathura Widanage +

  • Chinmay Rane +

  • Chris Lynch

  • Chris Withers

  • Christos Petropoulos

  • Corentin Girard +

  • DaPy15 +

  • Damodara Puddu +

  • Daniel Hrisca

  • Daniel Saxton

  • DanielFEvans

  • Dare Adewumi +

  • Dave Willmer

  • David Schlachter +

  • David-dmh +

  • Deepang Raval +

  • Doris Lee +

  • Dr. Jan-Philip Gehrcke +

  • DriesS +

  • Dylan Percy

  • Erfan Nariman

  • Eric Leung

  • EricLeer +

  • Eve

  • Fangchen Li

  • Felix Divo

  • Florian Jetter

  • Fred Reiss

  • GFJ138 +

  • Gaurav Sheni +

  • Geoffrey B. Eisenbarth +

  • Gesa Stupperich +

  • Griffin Ansel +

  • Gustavo C. Maciel +

  • Heidi +

  • Henry +

  • Hung-Yi Wu +

  • Ian Ozsvald +

  • Irv Lustig

  • Isaac Chung +

  • Isaac Virshup

  • JHM Darbyshire (MBP) +

  • JHM Darbyshire (iMac) +

  • Jack Liu +

  • James Lamb +

  • Jeet Parekh

  • Jeff Reback

  • Jiezheng2018 +

  • Jody Klymak

  • Johan Kåhrström +

  • John McGuigan

  • Joris Van den Bossche

  • Jose

  • JoseNavy

  • Josh Dimarsky

  • Josh Friedlander

  • Joshua Klein +

  • Julia Signell

  • Julian Schnitzler +

  • Kaiqi Dong

  • Kasim Panjri +

  • Katie Smith +

  • Kelly +

  • Kenil +

  • Keppler, Kyle +

  • Kevin Sheppard

  • Khor Chean Wei +

  • Kiley Hewitt +

  • Larry Wong +

  • Lightyears +

  • Lucas Holtz +

  • Lucas Rodés-Guirao

  • Lucky Sivagurunathan +

  • Luis Pinto

  • Maciej Kos +

  • Marc Garcia

  • Marco Edward Gorelli +

  • Marco Gorelli

  • MarcoGorelli +

  • Mark Graham

  • Martin Dengler +

  • Martin Grigorov +

  • Marty Rudolf +

  • Matt Roeschke

  • Matthew Roeschke

  • Matthew Zeitlin

  • Max Bolingbroke

  • Maxim Ivanov

  • Maxim Kupfer +

  • Mayur +

  • MeeseeksMachine

  • Micael Jarniac

  • Michael Hsieh +

  • Michel de Ruiter +

  • Mike Roberts +

  • Miroslav Šedivý

  • Mohammad Jafar Mashhadi

  • Morisa Manzella +

  • Mortada Mehyar

  • Muktan +

  • Naveen Agrawal +

  • Noah

  • Nofar Mishraki +

  • Oleh Kozynets

  • Olga Matoula +

  • Oli +

  • Omar Afifi

  • Omer Ozarslan +

  • Owen Lamont +

  • Ozan Öğreden +

  • Pandas Development Team

  • Paolo Lammens

  • Parfait Gasana +

  • Patrick Hoefler

  • Paul McCarthy +

  • Paulo S. Costa +

  • Pav A

  • Peter

  • Pradyumna Rahul +

  • Punitvara +

  • QP Hou +

  • Rahul Chauhan

  • Rahul Sathanapalli

  • Richard Shadrach

  • Robert Bradshaw

  • Robin to Roxel

  • Rohit Gupta

  • Sam Purkis +

  • Samuel GIFFARD +

  • Sean M. Law +

  • Shahar Naveh +

  • ShaharNaveh +

  • Shiv Gupta +

  • Shrey Dixit +

  • Shudong Yang +

  • Simon Boehm +

  • Simon Hawkins

  • Sioned Baker +

  • Stefan Mejlgaard +

  • Steven Pitman +

  • Steven Schaerer +

  • Stéphane Guillou +

  • TLouf +

  • Tegar D Pratama +

  • Terji Petersen

  • Theodoros Nikolaou +

  • Thomas Dickson

  • Thomas Li

  • Thomas Smith

  • Thomas Yu +

  • ThomasBlauthQC +

  • Tim Hoffmann

  • Tom Augspurger

  • Torsten Wörtwein

  • Tyler Reddy

  • UrielMaD

  • Uwe L. Korn

  • Venaturum +

  • VirosaLi

  • Vladimir Podolskiy

  • Vyom Pathak +

  • WANG Aiyong

  • Waltteri Koskinen +

  • Wenjun Si +

  • William Ayd

  • Yeshwanth N +

  • Yuanhao Geng

  • Zito Relova +

  • aflah02 +

  • arredond +

  • attack68

  • cdknox +

  • chinggg +

  • fathomer +

  • ftrihardjo +

  • github-actions[bot] +

  • gunjan-solanki +

  • guru kiran

  • hasan-yaman

  • i-aki-y +

  • jbrockmendel

  • jmholzer +

  • jordi-crespo +

  • jotasi +

  • jreback

  • juliansmidek +

  • kylekeppler

  • lrepiton +

  • lucasrodes

  • maroth96 +

  • mikeronayne +

  • mlondschien

  • moink +

  • morrme

  • mschmookler +

  • mzeitlin11

  • na2 +

  • nofarmishraki +

  • partev

  • patrick

  • ptype

  • realead

  • rhshadrach

  • rlukevie +

  • rosagold +

  • saucoide +

  • sdementen +

  • shawnbrown

  • sstiijn +

  • stphnlyd +

  • sukriti1 +

  • taytzehao

  • theOehrly +

  • theodorju +

  • thordisstella +

  • tonyyyyip +

  • tsinggggg +

  • tushushu +

  • vangorade +

  • vladu +

  • wertha +