What’s new in 1.5.1 (October 19, 2022)

These are the changes in pandas 1.5.1. See Release notes for a full changelog including other versions of pandas.

Behavior of `groupby` with categorical groupers (GH48645)

In versions of pandas prior to 1.5, groupby with dropna=False would still drop NA values when the grouper was a categorical dtype. A fix for this was attempted in 1.5; however, it introduced a regression where passing observed=False and dropna=False to groupby would return only observed categories. The patch fixing the dropna=False bug turned out to be incompatible with observed=False, so the resolution chosen was to restore the correct observed=False behavior at the cost of reintroducing the dropna=False bug.

In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "x": pd.Categorical([1, None], categories=[1, 2, 3]),
   ...:         "y": [3, 4],
   ...:     }
   ...: )
   ...: 

In [2]: df
Out[2]: 
     x  y
0    1  3
1  NaN  4

1.5.0 behavior:

In [3]: # Correct behavior, NA values are not dropped
        df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4


In [4]: # Incorrect behavior, only observed categories present
        df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4

1.5.1 behavior:

# Incorrect behavior, NA values are dropped
In [3]: df.groupby("x", observed=True, dropna=False).sum()
Out[3]: 
   y
x   
1  3

# Correct behavior, unobserved categories present (NA values still dropped)
In [4]: df.groupby("x", observed=False, dropna=False).sum()
Out[4]: 
   y
x   
1  3
2  0
3  0
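
The release notes do not prescribe a workaround for the reintroduced dropna=False bug. One possible approach, sketched here under stated assumptions, is to turn missing values into an explicit category before grouping, so that they survive regardless of how dropna is handled. The sentinel value -1 and the names SENTINEL and x_filled are hypothetical choices for illustration, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# Hypothetical placeholder standing in for missing values; pick any value
# that cannot collide with a real category in your data.
SENTINEL = -1

# Register the placeholder as a category, then fill the NA entries with it.
x_filled = df["x"].cat.add_categories([SENTINEL]).fillna(SENTINEL)

# Group on the filled column. observed=False keeps unobserved categories,
# and the former NA rows now form their own group under the placeholder.
result = df.assign(x=x_filled).groupby("x", observed=False).sum()
print(result)
```

The trade-off is that the missing-value group is labeled with the placeholder rather than NaN, so downstream code has to treat that label as "missing".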

Fixed regressions

Fixed regression in Series.__setitem__() casting None to NaN for object dtype (GH48665)
Fixed regression in DataFrame.loc() when setting values as a DataFrame with all True indexer (GH48701)
Fixed regression in read_csv() causing an EmptyDataError when using a UTF-8 file handle that was already read from (GH48646)
Fixed regression in to_datetime() when utc=True and arg contained timezone naive and aware arguments raised a ValueError (GH48678)
Fixed regression in DataFrame.loc() raising FutureWarning when setting an empty DataFrame (GH48480)
Fixed regression in DataFrame.describe() raising TypeError when result contains NA (GH48778)
Fixed regression in DataFrame.plot() ignoring invalid colormap for kind="scatter" (GH48726)
Fixed regression in MultiIndex.values resetting freq attribute of underlying Index object (GH49054)
Fixed performance regression in factorize() when na_sentinel is not None and sort=False (GH48620)
Fixed regression causing an AttributeError while emitting a warning when the table name provided to DataFrame.to_sql() and the table name actually used in the database do not match (GH48733)
Fixed regression in to_datetime() when arg was a date string with nanosecond and format contained %f would raise a ValueError (GH48767)
Fixed regression in assert_frame_equal() raising for MultiIndex with Categorical and check_like=True (GH48975)
Fixed regression in DataFrame.fillna() replacing wrong values for datetime64[ns] dtype and inplace=True (GH48863)
Fixed regression in DataFrameGroupBy.size() not returning a Series when axis=1 (GH48738)
Fixed regression in DataFrameGroupBy.apply() when user defined function is called on an empty DataFrame (GH47985)
Fixed regression in DataFrame.apply() when passing non-zero axis via keyword argument (GH48656)
Fixed regression in Series.groupby() and DataFrame.groupby() when the grouper is a nullable data type (e.g. Int64) or a PyArrow-backed string array, contains null values, and dropna=False (GH48794)
Fixed performance regression in Series.isin() with mismatching dtypes (GH49162)
Fixed regression in DataFrame.to_parquet() raising when file name was specified as bytes (GH48944)
Fixed regression in ExcelWriter where the book attribute could no longer be set; however, setting this attribute is now deprecated and this ability will be removed in a future version of pandas (GH48780)
Fixed regression in DataFrame.corrwith() when computing correlation on tied data with method="spearman" (GH48826)
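
As a rough illustration of two of the entries above, the following sketch exercises groupby with a nullable Int64 grouper and dropna=False (GH48794) and to_datetime with utc=True on mixed timezone-naive and timezone-aware inputs (GH48678). The data and variable names are made up for this example, and the exact inputs that triggered the original regressions may have differed; on a release containing the fixes, both calls are expected to succeed.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Nullable Int64 grouper containing a missing value (illustrative data).
df = pd.DataFrame(
    {
        "key": pd.array([1, None, 1], dtype="Int64"),
        "value": [10, 20, 30],
    }
)

# With dropna=False the missing key should form its own group.
by_key = df.groupby("key", dropna=False)["value"].sum()
print(by_key)

# One timezone-naive and one timezone-aware datetime (UTC+01:00).
naive = datetime(2020, 1, 1, 12, 0)
aware = datetime(2020, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=1)))

# utc=True localizes naive inputs as UTC and converts aware inputs to UTC.
as_utc = pd.to_datetime([naive, aware], utc=True)
print(as_utc)
```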

Bug fixes

Bug in Series.__getitem__() not falling back to positional for integer keys and boolean Index (GH48653)
Bug in DataFrame.to_hdf() raising AssertionError with boolean index (GH48667)
Bug in assert_index_equal() for extension arrays with non-matching NA raising ValueError (GH48608)
Bug in DataFrame.pivot_table() raising unexpected FutureWarning when setting datetime column as index (GH48683)
Bug in DataFrame.sort_values() emitting unnecessary FutureWarning when called on DataFrame with boolean sparse columns (GH48784)
Bug in arrays.ArrowExtensionArray where using a comparison operator with an invalid object would not raise a NotImplementedError (GH48833)

Other

Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (GH48692)

Contributors

A total of 16 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

Amay Patel +
Deepak Sirohiwal +
Dennis Chukwunta
Gaurav Sheni
Himanshu Wagh +
Lorenzo Vainigli +
Marc Garcia
Marco Edward Gorelli
Matthew Roeschke
MeeseeksMachine
Noa Tamir
Pandas Development Team
Patrick Hoefler
Richard Shadrach
Shantanu
Torsten Wörtwein