What’s new in 1.5.1 (October 19, 2022)

These are the changes in pandas 1.5.1. See Release notes for a full changelog including other versions of pandas.

Behavior of groupby with categorical groupers (GH48645)

In versions of pandas prior to 1.5, groupby with dropna=False would still drop NA values when the grouper was a categorical dtype. A fix for this was attempted in 1.5; however, it introduced a regression where passing observed=False and dropna=False to groupby would return only observed categories. The patch fixing the dropna=False bug was found to be incompatible with observed=False, so it was decided that the best resolution is to restore the correct observed=False behavior at the cost of reintroducing the dropna=False bug.

In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "x": pd.Categorical([1, None], categories=[1, 2, 3]),
   ...:         "y": [3, 4],
   ...:     }
   ...: )
   ...: 

In [2]: df
Out[2]: 
     x  y
0    1  3
1  NaN  4

[2 rows x 2 columns]

1.5.0 behavior:

In [3]: # Correct behavior, NA values are not dropped
        df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4


In [4]: # Incorrect behavior, only observed categories present
        df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4

1.5.1 behavior:

# Incorrect behavior, NA values are dropped
In [3]: df.groupby("x", observed=True, dropna=False).sum()
Out[3]: 
   y
x   
1  3

[1 rows x 1 columns]

# Correct behavior, unobserved categories present (NA values still dropped)
In [4]: df.groupby("x", observed=False, dropna=False).sum()
Out[4]: 
   y
x   
1  3
2  0
3  0

[3 rows x 1 columns]
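Because neither release yields both the unobserved categories and the NA group for a categorical grouper, the totals can be computed without relying on groupby's categorical handling. The following is a minimal sketch, not an official workaround: the helper name sums_with_na_and_unobserved is hypothetical (it is not a pandas API), and it assumes the summed column holds integers.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)


def sums_with_na_and_unobserved(frame, key, value):
    """Hypothetical helper (not part of pandas).

    Returns a dict of per-category sums, including unobserved
    categories, plus the sum over rows whose key is NA.
    Assumes ``frame[key]`` is categorical and ``frame[value]`` is integer.
    """
    col = frame[key]
    # Comparing a categorical column to one of its categories gives a
    # boolean mask; NA rows compare as False, so they are excluded here.
    per_category = {
        cat: int(frame.loc[col == cat, value].sum())
        for cat in col.cat.categories
    }
    # NA rows are summed separately so they are never silently dropped.
    na_total = int(frame.loc[col.isna(), value].sum())
    return per_category, na_total


per_category, na_total = sums_with_na_and_unobserved(df, "x", "y")
print(per_category)  # {1: 3, 2: 0, 3: 0}
print(na_total)      # 4
```

This sketch uses boolean masks rather than groupby, so it does not depend on the observed/dropna interaction described above. It is only practical for a small number of categories.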

Fixed regressions

  • Fixed regression in Series.__setitem__() casting None to NaN for object dtype (GH48665)

  • Fixed regression in DataFrame.loc() when setting values as a DataFrame with all True indexer (GH48701)

  • Fixed regression in read_csv() causing an EmptyDataError when using a UTF-8 file handle that was already read from (GH48646)

  • Fixed regression in to_datetime() raising a ValueError when utc=True and arg contained both timezone-naive and timezone-aware arguments (GH48678)

  • Fixed regression in DataFrame.loc() raising FutureWarning when setting an empty DataFrame (GH48480)

  • Fixed regression in DataFrame.describe() raising TypeError when result contains NA (GH48778)

  • Fixed regression in DataFrame.plot() ignoring invalid colormap for kind="scatter" (GH48726)

  • Fixed regression in MultiIndex.values resetting freq attribute of underlying Index object (GH49054)

  • Fixed performance regression in factorize() when na_sentinel is not None and sort=False (GH48620)

  • Fixed regression causing an AttributeError while emitting a warning when the table name provided to DataFrame.to_sql() and the table name actually used in the database do not match (GH48733)

  • Fixed regression in to_datetime() raising a ValueError when arg was a date string with nanoseconds and format contained %f (GH48767)

  • Fixed regression in assert_frame_equal() raising for MultiIndex with Categorical and check_like=True (GH48975)

  • Fixed regression in DataFrame.fillna() replacing wrong values for datetime64[ns] dtype and inplace=True (GH48863)

  • Fixed DataFrameGroupBy.size() not returning a Series when axis=1 (GH48738)

  • Fixed regression in DataFrameGroupBy.apply() when a user-defined function is called on an empty DataFrame (GH47985)

  • Fixed regression in DataFrame.apply() when passing non-zero axis via keyword argument (GH48656)

  • Fixed regression in Series.groupby() and DataFrame.groupby() when the grouper is a nullable data type (e.g. Int64) or a PyArrow-backed string array, contains null values, and dropna=False (GH48794)

  • Fixed performance regression in Series.isin() with mismatching dtypes (GH49162)

  • Fixed regression in DataFrame.to_parquet() raising when file name was specified as bytes (GH48944)

  • Fixed regression in ExcelWriter where the book attribute could no longer be set; however, setting this attribute is now deprecated and this ability will be removed in a future version of pandas (GH48780)

  • Fixed regression in DataFrame.corrwith() when computing correlation on tied data with method="spearman" (GH48826)
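Two of the entries above are easy to check by hand. The snippet below is an illustrative sketch rather than the project's own test cases: the data and variable names are made up, and the behavior described in the comments is what the entries for GH48665 and GH48794 lead one to expect on 1.5.1 or later.

```python
import pandas as pd

# GH48665: assigning None into an object-dtype Series should keep None
# rather than casting it to NaN.
ser = pd.Series(["a", "b"], dtype=object)
ser[0] = None
kept_none = ser.iloc[0] is None

# GH48794: a nullable Int64 grouper that contains a null, with dropna=False,
# should produce a separate group for the null key.
frame = pd.DataFrame(
    {
        "k": pd.array([1, None, 1], dtype="Int64"),
        "v": [10, 20, 30],
    }
)
sums = frame.groupby("k", dropna=False)["v"].sum()

print(kept_none)
print(sums)
```

On an affected 1.5.0 installation, either check may fail or raise, which can help confirm whether an upgrade is needed.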

Bug fixes

Other

  • Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (GH48692)

Contributors

A total of 16 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

  • Amay Patel +

  • Deepak Sirohiwal +

  • Dennis Chukwunta

  • Gaurav Sheni

  • Himanshu Wagh +

  • Lorenzo Vainigli +

  • Marc Garcia

  • Marco Edward Gorelli

  • Matthew Roeschke

  • MeeseeksMachine

  • Noa Tamir

  • Pandas Development Team

  • Patrick Hoefler

  • Richard Shadrach

  • Shantanu

  • Torsten Wörtwein