What’s new in 1.5.1 (October 19, 2022)
These are the changes in pandas 1.5.1. See Release notes for a full changelog including other versions of pandas.
Behavior of groupby with categorical groupers (GH 48645)
In versions of pandas prior to 1.5, groupby with dropna=False would still drop
NA values when the grouper was a categorical dtype. A fix for this was attempted in
1.5; however, it introduced a regression where passing observed=False and
dropna=False to groupby would return only observed categories. The patch fixing
the dropna=False bug was found to be incompatible with observed=False, so
pandas 1.5.1 restores the correct observed=False behavior at the cost of
reintroducing the dropna=False bug.
In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "x": pd.Categorical([1, None], categories=[1, 2, 3]),
   ...:         "y": [3, 4],
   ...:     }
   ...: )
   ...:

In [2]: df
Out[2]:
     x  y
0    1  3
1  NaN  4
1.5.0 behavior:
In [3]: # Correct behavior, NA values are not dropped
        df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4

In [4]: # Incorrect behavior, only observed categories present
        df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4
1.5.1 behavior:
# Incorrect behavior, NA values are dropped
In [3]: df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
   y
x
1  3

# Correct behavior, unobserved categories present (NA values still dropped)
In [4]: df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
   y
x
1  3
2  0
3  0
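Because the dropna=False result for categorical groupers differs between releases, the observed parameter is easier to see in isolation. The following is a minimal sketch, not part of the original release notes, that leaves dropna at its default so the NA row is dropped in every version; the names all_categories and observed_only are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# observed=False keeps every declared category, including the unobserved 2 and 3.
all_categories = df.groupby("x", observed=False).sum()

# observed=True keeps only the categories that actually occur in the data.
observed_only = df.groupby("x", observed=True).sum()

print(all_categories)
print(observed_only)
```

Unobserved categories appear with a sum of 0, which is the observed=False behavior that 1.5.1 restores.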
Fixed regressions
Fixed regression in Series.__setitem__() casting None to NaN for object dtype (GH 48665)
Fixed regression in DataFrame.loc() when setting values as a DataFrame with all True indexer (GH 48701)
Fixed regression in read_csv() causing an EmptyDataError when using a UTF-8 file handle that was already read from (GH 48646)
Fixed regression in to_datetime() raising a ValueError when utc=True and arg contained both timezone-naive and timezone-aware arguments (GH 48678)
Fixed regression in DataFrame.loc() raising FutureWarning when setting an empty DataFrame (GH 48480)
Fixed regression in DataFrame.describe() raising TypeError when result contains NA (GH 48778)
Fixed regression in DataFrame.plot() ignoring invalid colormap for kind="scatter" (GH 48726)
Fixed regression in MultiIndex.values() resetting freq attribute of underlying Index object (GH 49054)
Fixed performance regression in factorize() when na_sentinel is not None and sort=False (GH 48620)
Fixed regression causing an AttributeError during warning emitted if the provided table name in DataFrame.to_sql() and the table name actually used in the database do not match (GH 48733)
Fixed regression in to_datetime() raising a ValueError when arg was a date string with nanoseconds and format contained %f (GH 48767)
Fixed regression in testing.assert_frame_equal() raising for MultiIndex with Categorical and check_like=True (GH 48975)
Fixed regression in DataFrame.fillna() replacing wrong values for datetime64[ns] dtype and inplace=True (GH 48863)
Fixed regression in DataFrameGroupBy.size() not returning a Series when axis=1 (GH 48738)
Fixed regression in DataFrameGroupBy.apply() when user-defined function is called on an empty dataframe (GH 47985)
Fixed regression in DataFrame.apply() when passing non-zero axis via keyword argument (GH 48656)
Fixed regression in Series.groupby() and DataFrame.groupby() when the grouper is a nullable data type (e.g. Int64) or a PyArrow-backed string array, contains null values, and dropna=False (GH 48794)
Fixed performance regression in Series.isin() with mismatching dtypes (GH 49162)
Fixed regression in DataFrame.to_parquet() raising when file name was specified as bytes (GH 48944)
Fixed regression in ExcelWriter where the book attribute could no longer be set; however, setting this attribute is now deprecated and this ability will be removed in a future version of pandas (GH 48780)
Fixed regression in DataFrame.corrwith() when computing correlation on tied data with method="spearman" (GH 48826)
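Two of the regressions above lend themselves to a quick check after upgrading. The following is a hedged sketch with made-up data, not taken from the linked issues: it exercises a nullable Int64 grouper with dropna=False (the scenario described for GH 48794) and to_datetime() with utc=True on mixed timezone-naive and timezone-aware inputs (the scenario described for GH 48678). The exact inputs that triggered the original reports may differ.

```python
import pandas as pd

# Nullable Int64 grouper containing a missing key, grouped with dropna=False.
df = pd.DataFrame(
    {
        "key": pd.array([1, None, 1], dtype="Int64"),
        "val": [1, 2, 3],
    }
)
totals = df.groupby("key", dropna=False)["val"].sum()
print(totals)

# Timezone-naive and timezone-aware timestamps converted together with utc=True.
mixed = pd.to_datetime(
    [
        pd.Timestamp("2022-01-01 00:00:00"),
        pd.Timestamp("2022-01-01 00:00:00+01:00"),
    ],
    utc=True,
)
print(mixed)
```

On a fixed version, the missing key forms its own group after the non-null keys, and the naive timestamp is interpreted as UTC while the aware one is converted to UTC.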
Bug fixes
Bug in Series.__getitem__() not falling back to positional for integer keys and boolean Index (GH 48653)
Bug in DataFrame.to_hdf() raising AssertionError with boolean index (GH 48667)
Bug in testing.assert_index_equal() for extension arrays with non-matching NA raising ValueError (GH 48608)
Bug in DataFrame.pivot_table() raising unexpected FutureWarning when setting datetime column as index (GH 48683)
Bug in DataFrame.sort_values() emitting unnecessary FutureWarning when called on DataFrame with boolean sparse columns (GH 48784)
Bug in arrays.ArrowExtensionArray where a comparison operator applied to an invalid object would not raise a NotImplementedError (GH 48833)
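As an illustration of the testing.assert_index_equal() entry (GH 48608), the sketch below compares two Int64 indexes whose missing values do not line up and records which exception type is raised. The data and the variable outcome are assumptions made for this example, not the reproducer from the issue.

```python
import pandas as pd

left = pd.Index([1, pd.NA], dtype="Int64")
right = pd.Index([1, 2], dtype="Int64")

try:
    pd.testing.assert_index_equal(left, right)
except AssertionError:
    # Expected for a test helper: the mismatch is reported as a failed assertion.
    outcome = "AssertionError"
except ValueError:
    # The failure mode described in the bug entry above.
    outcome = "ValueError"
else:
    outcome = "no error"

print(outcome)
```

A test helper is expected to report a mismatch as an AssertionError so that test runners treat it as a failed comparison and not as an unrelated error.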
Other
Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (GH 48692)
Contributors
A total of 16 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Amay Patel +
Deepak Sirohiwal +
Dennis Chukwunta
Gaurav Sheni
Himanshu Wagh +
Lorenzo Vainigli +
Marc Garcia
Marco Edward Gorelli
Matthew Roeschke
MeeseeksMachine
Noa Tamir
Pandas Development Team
Patrick Hoefler
Richard Shadrach
Shantanu
Torsten Wörtwein