What’s new in 2.1.0 (Month XX, 2023)#

These are the changes in pandas 2.1.0. See Release notes for a full changelog including other versions of pandas.

Enhancements#

enhancement1#

map(func, na_action="ignore") now works for all array types#

When given a callable, Series.map() applies the callable to all elements of the Series. Similarly, DataFrame.map() applies the callable to all elements of the DataFrame, while Index.map() applies the callable to all elements of the Index.

Frequently, it is not desirable to apply the callable to NaN-like values of the array. To avoid doing that, the map method can be called with na_action="ignore", i.e. ser.map(func, na_action="ignore"). However, na_action="ignore" was not implemented for many ExtensionArray and Index types, and it did not work correctly for any ExtensionArray subclass except the nullable numeric ones (i.e. those with dtype Int64 etc.).

na_action="ignore" now works for all array types (GH 52219, GH 51645, GH 51809, GH 51936, GH 52033, GH 52096).

Previous behavior:

In [1]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [2]: ser.map(str.upper, na_action="ignore")
NotImplementedError
In [3]: df = pd.DataFrame(ser)
In [4]: df.applymap(str.upper, na_action="ignore")  # worked for DataFrame
     0
0    A
1    B
2  NaN
In [5]: idx = pd.Index(ser)
In [6]: idx.map(str.upper, na_action="ignore")
TypeError: CategoricalIndex.map() got an unexpected keyword argument 'na_action'

New behavior:

In [1]: ser = pd.Series(["a", "b", np.nan], dtype="category")

In [2]: ser.map(str.upper, na_action="ignore")
Out[2]: 
0      A
1      B
2    NaN
dtype: category
Categories (2, object): ['A', 'B']

In [3]: df = pd.DataFrame(ser)

In [4]: df.map(str.upper, na_action="ignore")
Out[4]: 
     0
0    A
1    B
2  NaN

In [5]: idx = pd.Index(ser)

In [6]: idx.map(str.upper, na_action="ignore")
Out[6]: CategoricalIndex(['A', 'B', nan], categories=['A', 'B'], ordered=False, dtype='category')

Note also that in this version, DataFrame.map() has been added and DataFrame.applymap() has been deprecated. DataFrame.map() has the same functionality as DataFrame.applymap(), but the new name better communicates that this is the DataFrame version of Series.map() (GH 52353).
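For code that has to run both before and after the rename, one option is to pick whichever method is available. The following is a minimal sketch, not part of pandas: the helper name elementwise_map is hypothetical and exists only for illustration.

```python
import numpy as np
import pandas as pd


def elementwise_map(df, func, **kwargs):
    """Hypothetical helper: prefer DataFrame.map, fall back to applymap."""
    method = getattr(df, "map", None)
    if method is None:
        # Older pandas versions only provide DataFrame.applymap.
        method = df.applymap
    return method(func, **kwargs)


df = pd.DataFrame({"x": ["a", "b", np.nan]})

# na_action="ignore" leaves the missing value untouched.
result = elementwise_map(df, str.upper, na_action="ignore")
print(result)
```

Once support for older pandas versions is dropped, the helper can be replaced by a direct call to df.map(...).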

Also, note that Categorical.map() implicitly has had its na_action set to "ignore" by default. This has been deprecated, and the default for Categorical.map() will change to na_action=None in the future, as for all the other array types.

Other enhancements#

Notable bug fixes#

These are bug fixes that might have notable behavior changes.

notable_bug_fix1#

notable_bug_fix2#

Backwards incompatible API changes#

Increased minimum version for Python#

pandas 2.1.0 supports Python 3.9 and higher.
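A project that wants to fail early on an unsupported interpreter could guard against it as sketched below. The names MIN_PYTHON and python_is_supported are illustrative only and are not part of pandas.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum stated above for pandas 2.1.0


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    print("This interpreter is older than the minimum supported by pandas 2.1.0")
```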

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package          Minimum Version  Required  Changed
---------------  ---------------  --------  -------
numpy            1.21.6           X         X
mypy (dev)       1.2                        X
beautifulsoup4   4.11.1                     X
bottleneck       1.3.4                      X
fastparquet      0.8.1                      X
fsspec           2022.05.0                  X
hypothesis       6.46.1                     X
gcsfs            2022.05.0                  X
jinja2           3.1.2                      X
lxml             4.8.0                      X
numba            0.55.2                     X
numexpr          2.8.0                      X
openpyxl         3.0.10                     X
pandas-gbq       0.17.5                     X
psycopg2         2.9.3                      X
pyreadstat       1.1.5                      X
pyqt5            5.15.6                     X
pytables         3.7.0                      X
python-snappy    0.6.1                      X
pyxlsb           1.0.9                      X
s3fs             2022.05.0                  X
scipy            1.8.1                      X
sqlalchemy       1.4.36                     X
tabulate         0.8.10                     X
xarray           2022.03.0                  X
xlsxwriter       3.0.3                      X
zstandard        0.17.0                     X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package          Minimum Version  Changed
---------------  ---------------  -------
                                  X

See Dependencies and Optional dependencies for more.
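To see whether an environment meets some of these minimums, the installed versions can be compared against the listed ones. This is a rough sketch only: the helper names are hypothetical, the version parser is deliberately simplistic (it keeps only leading numeric components), and only three entries from the table are included as an example.

```python
from importlib import metadata

# A few minimums taken from the table above, for illustration.
MINIMUMS = {"numpy": "1.21.6", "scipy": "1.8.1", "fsspec": "2022.05.0"}


def parse_version(text):
    """Rough parser: keep the leading numeric components only."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def below_minimum(installed, minimum):
    """Return True if the installed version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)


def outdated(minimums=MINIMUMS):
    """Map each installed-but-too-old package to its installed version."""
    found = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # optional dependency that is not installed
        if below_minimum(installed, minimum):
            found[name] = installed
    return found


print(outdated())
```

For anything beyond a quick check, a dedicated version-parsing library is a safer choice than the hand-written parser used here.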

Other API changes#

Deprecations#

Performance improvements#

Bug fixes#

Categorical#

Datetimelike#

  • DatetimeIndex.map() with na_action="ignore" now works as expected (GH 51644)

  • Bug in date_range() when freq was a DateOffset with nanoseconds (GH 46877)

  • Bug in Timestamp.round() with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsDatetime (GH 51494)

  • Bug in arrays.DatetimeArray.map() and DatetimeIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)

  • Bug in constructing a Series or DataFrame from a datetime or timedelta scalar always inferring nanosecond resolution instead of inferring from the input (GH 52212)

  • Bug in parsing datetime strings with weekday but no day e.g. “2023 Sept Thu” incorrectly raising AttributeError instead of ValueError (GH 52659)
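The two DatetimeIndex.map() entries above can be illustrated with a short sketch, assuming a pandas version that includes these fixes. The lambda functions are arbitrary examples.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2023-01-01", "2023-06-15"])

# The callable receives one Timestamp per call, not the whole index.
kinds = idx.map(lambda ts: type(ts).__name__)

with_gap = pd.DatetimeIndex(["2023-01-01", None])

# With na_action="ignore" the callable is not applied to the NaT entry.
years = with_gap.map(lambda ts: ts.year, na_action="ignore")

print(kinds)
print(years)
```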

Timedelta#

  • TimedeltaIndex.map() with na_action="ignore" now works as expected (GH 51644)

  • Bug in TimedeltaIndex division or multiplication leading to .freq of “0 Days” instead of None (GH 51575)

  • Bug in Timedelta.round() with values close to the implementation bounds returning incorrect results instead of raising OutOfBoundsTimedelta (GH 51494)

  • Bug in arrays.TimedeltaArray.map() and TimedeltaIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)

Timezones#

  • Bug in infer_freq() raising TypeError for Series of timezone-aware timestamps (GH 52456)

  • Bug in DatetimeTZDtype.base() always returning a NumPy dtype with nanosecond resolution (GH 52705)

Numeric#

Conversion#

  • Bug in DataFrame.style.to_latex() and DataFrame.style.to_html() if the DataFrame contains integers with more digits than can be represented by floating point double precision (GH 52272)

  • Bug in array() when given a datetime64 or timedelta64 dtype with unit of “s”, “us”, or “ms” returning PandasArray instead of DatetimeArray or TimedeltaArray (GH 52859)

  • Bug in ArrowDtype.numpy_dtype() returning nanosecond units for non-nanosecond pyarrow.timestamp and pyarrow.duration types (GH 51800)

  • Bug in DataFrame.__repr__() incorrectly raising a TypeError when the dtype of a column is np.record (GH 48526)

  • Bug in DataFrame.info() raising ValueError when use_numba is set (GH 51922)

  • Bug in DataFrame.insert() raising TypeError if loc is np.int64 (GH 53193)

Strings#

Interval#

Indexing#

  • Bug in DataFrame.__setitem__() losing dtype when setting a DataFrame into duplicated columns (GH 53143)

  • Bug in DataFrame.__setitem__() with a boolean mask and DataFrame.putmask() with mixed non-numeric dtypes and a value other than NaN incorrectly raising TypeError (GH 53291)

Missing#

MultiIndex#

I/O#

Period#

  • PeriodIndex.map() with na_action="ignore" now works as expected (GH 51644)

  • Bug in PeriodDtype constructor failing to raise TypeError when no argument is passed or when None is passed (GH 27388)

  • Bug in PeriodDtype constructor incorrectly returning the same normalize for different DateOffset freq inputs (GH 24121)

  • Bug in PeriodDtype constructor raising ValueError instead of TypeError when an invalid type is passed (GH 51790)

  • Bug in read_csv() not processing empty strings as a null value, with engine="pyarrow" (GH 52087)

  • Bug in read_csv() returning object dtype columns instead of float64 dtype columns with engine="pyarrow" for columns that are all null (GH 52087)

  • Bug in arrays.PeriodArray.map() and PeriodIndex.map(), where the supplied callable operated array-wise instead of element-wise (GH 51977)

  • Bug in incorrectly allowing construction of Period or PeriodDtype with CustomBusinessDay freq; use BusinessDay instead (GH 52534)

Plotting#

Groupby/resample/rolling#

  • Bug in DataFrame.resample() and Series.resample() incorrectly allowing non-fixed freq when resampling on a TimedeltaIndex (GH 51896)

  • Bug in DataFrameGroupBy.idxmin(), SeriesGroupBy.idxmin(), DataFrameGroupBy.idxmax() and SeriesGroupBy.idxmax() returning the wrong dtype when used on an empty DataFrameGroupBy or SeriesGroupBy (GH 51423)

  • Bug in weighted rolling aggregations when specifying min_periods=0 (GH 51449)

  • Bug in DataFrame.groupby() and Series.groupby() where, when the index of the grouped Series or DataFrame was a DatetimeIndex, TimedeltaIndex or PeriodIndex and the groupby method was given a function as its first argument, the function operated on the whole index rather than on each element of the index (GH 51979)

  • Bug in DataFrame.groupby() with column selection on the resulting groupby object not returning names as tuples when grouping by a list of a single element (GH 53500)

  • Bug in DataFrameGroupBy.agg() with lists not respecting as_index=False (GH 52849)

  • Bug in DataFrameGroupBy.apply() raising an error when the input DataFrame was subset as a DataFrame after groupby ([['a']] and not ['a']) and the given callable returned Series that were not all indexed the same (GH 52444)

  • Bug in DataFrameGroupBy.apply() raising a TypeError when selecting multiple columns and providing a function that returns np.ndarray results (GH 18930)

  • Bug in GroupBy.groups() with a datetime key in conjunction with another key producing an incorrect number of group keys (GH 51158)

  • Bug in GroupBy.quantile() where the result index may be implicitly sorted with sort=False (GH 53009)

  • Bug in GroupBy.var() failing to raise TypeError when called with datetime64, timedelta64 or PeriodDtype values (GH 52128, GH 53045)

  • Bug in SeriesGroupBy.nth() and DataFrameGroupBy.nth() not subsetting columns after performing column selection when using dropna="any" or dropna="all" (GH 53518)

  • Bug in SeriesGroupBy.nth() and DataFrameGroupBy.nth() raising after performing column selection when using dropna="any" or dropna="all" resulted in rows being dropped (GH 53518)
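The groupby entry about functions applied to a datetime-like index (GH 51979) can be pictured with a small example. This is only a sketch with made-up data. The grouping function is called with individual Timestamp objects.

```python
import pandas as pd

ser = pd.Series(
    [1, 2, 3],
    index=pd.DatetimeIndex(["2022-12-31", "2023-01-01", "2023-01-02"]),
)

# The function is called once per index element (a Timestamp),
# and its return value becomes the group key.
per_year = ser.groupby(lambda ts: ts.year).sum()

print(per_year)
```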

Reshaping#

Sparse#

  • Bug in SparseDtype constructor failing to raise TypeError when given an incompatible dtype for its subtype, which must be a numpy dtype (GH 53160)

  • Bug in arrays.SparseArray.map() allowed the fill value to be included in the sparse values (GH 52095)

ExtensionArray#

  • Bug in ArrowExtensionArray converting pandas non-nanosecond temporal objects from non-zero values to zero values (GH 53171)

  • Bug in Series.quantile() for pyarrow temporal types raising ArrowInvalid (GH 52678)

  • Bug in Series.rank() returning wrong order for small values with Float64 dtype (GH 52471)

  • Bug in __iter__() and __getitem__() returning python datetime and timedelta objects for non-nano dtypes (GH 53326)

  • Bug where the __from_arrow__ method of masked ExtensionDtypes (e.g. Float64Dtype, BooleanDtype) would not accept pyarrow arrays of type pyarrow.null() (GH 52223)

Styler#

  • Bug in Styler._copy() calling overridden methods in subclasses of Styler (GH 52728)

Metadata#

Other#

  • Bug in FloatingArray.__contains__ with NaN item incorrectly returning False when NaN values are present (GH 52840)

  • Bug in api.interchange.from_dataframe() when converting an empty DataFrame object (GH 53155)

  • assert_almost_equal() now raises an assertion error for two unequal sets (GH 51727)

  • Bug in assert_frame_equal() checking category dtypes even when asked not to check index type (GH 52126)

  • Bug in DataFrame.reindex() with a fill_value that should be inferred with an ExtensionDtype incorrectly inferring object dtype (GH 52586)

  • Bug in Series.map() where giving a callable to an empty Series returned a Series with object dtype; it now keeps the original dtype (GH 52384)

  • Bug in Series.memory_usage() with deep=True raising an error for a Series of objects and returning an incorrect value, as it did not take GC corrections into account (GH 51858)

  • Fixed incorrect __name__ attribute of pandas._libs.json (GH 52898)

Contributors#