What’s new in 1.5.1 (October 19, 2022)
These are the changes in pandas 1.5.1. See Release notes for a full changelog including other versions of pandas.
Behavior of groupby with categorical groupers (GH48645)
In versions of pandas prior to 1.5, groupby with dropna=False would still drop
NA values when the grouper had a categorical dtype. A fix was attempted in
1.5, but it introduced a regression: passing observed=False and
dropna=False to groupby returned only the observed categories. The patch
fixing the dropna=False bug turned out to be incompatible with observed=False,
so the chosen resolution is to restore the correct observed=False
behavior at the cost of reintroducing the dropna=False bug.
In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "x": pd.Categorical([1, None], categories=[1, 2, 3]),
   ...:         "y": [3, 4],
   ...:     }
   ...: )
   ...: 
In [2]: df
Out[2]: 
     x  y
0    1  3
1  NaN  4
[2 rows x 2 columns]
1.5.0 behavior:
In [3]: # Correct behavior, NA values are not dropped
        df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4
In [4]: # Incorrect behavior, only observed categories present
        df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4
1.5.1 behavior:
# Incorrect behavior, NA values are dropped
In [3]: df.groupby("x", observed=True, dropna=False).sum()
Out[3]: 
   y
x   
1  3
[1 rows x 1 columns]
# Correct behavior, unobserved categories present (NA values still dropped)
In [4]: df.groupby("x", observed=False, dropna=False).sum()
Out[4]: 
   y
x   
1  3
2  0
3  0
[3 rows x 1 columns]
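If both the unobserved categories and the NA rows are needed, one possible approach is to aggregate them separately. The following is a minimal sketch rather than an officially recommended workaround; the names per_category and na_total are illustrative, and it assumes a simple sum over the y column of the frame shown above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# Per-category totals; observed=False keeps the unobserved categories 2 and 3.
# dropna is left at its default, so rows with an NA grouper are excluded here.
per_category = df.groupby("x", observed=False)["y"].sum()

# Total for the rows whose grouper is NA, computed outside of groupby.
na_total = df.loc[df["x"].isna(), "y"].sum()

print(per_category)
print("NA rows:", na_total)
```

Because the NA rows are selected with a boolean mask, this does not depend on how dropna=False behaves for categorical groupers in a given pandas version.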
Fixed regressions
- Fixed regression in Series.__setitem__() casting None to NaN for object dtype (GH48665)
- Fixed regression in DataFrame.loc() when setting values as a DataFrame with an all-True indexer (GH48701)
- Fixed regression in read_csv() causing an EmptyDataError when using a UTF-8 file handle that was already read from (GH48646)
- Fixed regression in to_datetime() raising a ValueError when utc=True and arg contained timezone-naive and timezone-aware arguments (GH48678)
- Fixed regression in DataFrame.loc() raising FutureWarning when setting an empty DataFrame (GH48480)
- Fixed regression in DataFrame.describe() raising TypeError when result contains NA (GH48778)
- Fixed regression in DataFrame.plot() ignoring invalid colormap for kind="scatter" (GH48726)
- Fixed regression in MultiIndex.values resetting the freq attribute of the underlying Index object (GH49054)
- Fixed performance regression in factorize() when na_sentinel is not None and sort=False (GH48620)
- Fixed regression causing an AttributeError when a warning is emitted because the provided table name in DataFrame.to_sql() and the table name actually used in the database do not match (GH48733)
- Fixed regression in to_datetime() raising a ValueError when arg was a date string with nanoseconds and format contained %f (GH48767)
- Fixed regression in assert_frame_equal() raising for MultiIndex with Categorical and check_like=True (GH48975)
- Fixed regression in DataFrame.fillna() replacing wrong values for datetime64[ns] dtype and inplace=True (GH48863)
- Fixed DataFrameGroupBy.size() not returning a Series when axis=1 (GH48738)
- Fixed regression in DataFrameGroupBy.apply() when a user-defined function is called on an empty DataFrame (GH47985)
- Fixed regression in DataFrame.apply() when passing non-zero axis via keyword argument (GH48656)
- Fixed regression in Series.groupby() and DataFrame.groupby() when the grouper is a nullable data type (e.g. Int64) or a PyArrow-backed string array, contains null values, and dropna=False (GH48794)
- Fixed performance regression in Series.isin() with mismatching dtypes (GH49162)
- Fixed regression in DataFrame.to_parquet() raising when file name was specified as bytes (GH48944)
- Fixed regression in ExcelWriter where the book attribute could no longer be set; however, setting this attribute is now deprecated and this ability will be removed in a future version of pandas (GH48780)
- Fixed regression in DataFrame.corrwith() when computing correlation on tied data with method="spearman" (GH48826)
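To illustrate the nullable-grouper entry above (GH48794), the sketch below groups by an Int64 column that contains a null value with dropna=False. The frame and the column names key and value are made up for illustration; the expectation, assuming pandas 1.5.1 or later, is that the null key forms its own group instead of being dropped.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # Nullable integer grouper with one missing key.
        "key": pd.array([1, None, 1], dtype="Int64"),
        "value": [10, 20, 30],
    }
)

# dropna=False asks groupby to keep the rows whose key is NA as a group.
totals = df.groupby("key", dropna=False)["value"].sum()

print(totals)
```

Unlike the categorical case described earlier on this page, nullable and PyArrow-backed groupers are listed here as fixed, so no separate handling of the NA rows should be needed for them.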
Bug fixes
- Bug in Series.__getitem__() not falling back to positional for integer keys and boolean Index (GH48653)
- Bug in DataFrame.to_hdf() raising AssertionError with boolean index (GH48667)
- Bug in assert_index_equal() for extension arrays with non-matching NA raising ValueError (GH48608)
- Bug in DataFrame.pivot_table() raising unexpected FutureWarning when setting datetime column as index (GH48683)
- Bug in DataFrame.sort_values() emitting unnecessary FutureWarning when called on DataFrame with boolean sparse columns (GH48784)
- Bug in arrays.ArrowExtensionArray where a comparison operator applied to an invalid object would not raise a NotImplementedError (GH48833)
Other
- Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (GH48692) 
Contributors
A total of 16 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Amay Patel + 
- Deepak Sirohiwal + 
- Dennis Chukwunta 
- Gaurav Sheni 
- Himanshu Wagh + 
- Lorenzo Vainigli + 
- Marc Garcia 
- Marco Edward Gorelli 
- Matthew Roeschke 
- MeeseeksMachine 
- Noa Tamir 
- Pandas Development Team 
- Patrick Hoefler 
- Richard Shadrach 
- Shantanu 
- Torsten Wörtwein