What’s new in 1.5.1 (October 19, 2022)
These are the changes in pandas 1.5.1. See Release notes for a full changelog including other versions of pandas.
Behavior of groupby with categorical groupers (GH48645)
In versions of pandas prior to 1.5, groupby with dropna=False would still drop NA values when the grouper was a categorical dtype. A fix for this was attempted in 1.5; however, it introduced a regression where passing observed=False and dropna=False to groupby would result in only observed categories. It was found that the patch fixing the dropna=False bug is incompatible with observed=False, and it was decided that the best resolution is to restore the correct observed=False behavior at the cost of reintroducing the dropna=False bug.
In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "x": pd.Categorical([1, None], categories=[1, 2, 3]),
   ...:         "y": [3, 4],
   ...:     }
   ...: )
   ...:

In [2]: df
Out[2]:
     x  y
0    1  3
1  NaN  4

[2 rows x 2 columns]
1.5.0 behavior:
In [3]: # Correct behavior, NA values are not dropped
        df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4

In [4]: # Incorrect behavior, only observed categories present
        df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4
1.5.1 behavior:
# Incorrect behavior, NA values are dropped
In [3]: df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
   y
x
1  3

[1 rows x 1 columns]

# Correct behavior, unobserved categories present (NA values still dropped)
In [4]: df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
   y
x
1  3
2  0
3  0

[3 rows x 1 columns]
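Code that needs both the unobserved categories and a group for the missing values, regardless of which of these versions is installed, can sidestep the dropna=False bug by turning the missing values into an explicit category before grouping. The following is a minimal sketch of that idea, not something taken from the release notes; the "missing" label is a hypothetical placeholder chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# Assumption: "missing" is an arbitrary label picked for this sketch.
# Register it as a category, then use it to fill the NA entries.
x_filled = df["x"].cat.add_categories(["missing"]).fillna("missing")

# NA is now an ordinary category, so the result no longer depends on
# how dropna interacts with observed=False.
result = df.assign(x=x_filled).groupby("x", observed=False).sum()
print(result)
```

With this approach the result should contain one row per declared category (1, 2, 3) plus a row for the placeholder, at the cost of the grouper no longer holding real NA values.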
Fixed regressions
Fixed regression in Series.__setitem__() casting None to NaN for object dtype (GH48665)
Fixed regression in DataFrame.loc() when setting values as a DataFrame with all True indexer (GH48701)
Regression in read_csv() causing an EmptyDataError when using a UTF-8 file handle that was already read from (GH48646)
Regression in to_datetime() when utc=True and arg contained timezone naive and aware arguments raised a ValueError (GH48678)
Fixed regression in DataFrame.loc() raising FutureWarning when setting an empty DataFrame (GH48480)
Fixed regression in DataFrame.describe() raising TypeError when result contains NA (GH48778)
Fixed regression in DataFrame.plot() ignoring invalid colormap for kind="scatter" (GH48726)
Fixed regression in MultiIndex.values() resetting freq attribute of underlying Index object (GH49054)
Fixed performance regression in factorize() when na_sentinel is not None and sort=False (GH48620)
Fixed regression causing an AttributeError during warning emitted if the provided table name in DataFrame.to_sql() and the table name actually used in the database do not match (GH48733)
Fixed regression in to_datetime() when arg was a date string with nanosecond and format contained %f would raise a ValueError (GH48767)
Fixed regression in testing.assert_frame_equal() raising for MultiIndex with Categorical and check_like=True (GH48975)
Fixed regression in DataFrame.fillna() replacing wrong values for datetime64[ns] dtype and inplace=True (GH48863)
Fixed DataFrameGroupBy.size() not returning a Series when axis=1 (GH48738)
Fixed regression in DataFrameGroupBy.apply() when user defined function is called on an empty dataframe (GH47985)
Fixed regression in DataFrame.apply() when passing non-zero axis via keyword argument (GH48656)
Fixed regression in Series.groupby() and DataFrame.groupby() when the grouper is a nullable data type (e.g. Int64) or a PyArrow-backed string array, contains null values, and dropna=False (GH48794)
Fixed performance regression in Series.isin() with mismatching dtypes (GH49162)
Fixed regression in DataFrame.to_parquet() raising when file name was specified as bytes (GH48944)
Fixed regression in ExcelWriter where the book attribute could no longer be set; however setting this attribute is now deprecated and this ability will be removed in a future version of pandas (GH48780)
Fixed regression in DataFrame.corrwith() when computing correlation on tied data with method="spearman" (GH48826)
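To make one of the entries above concrete, the sketch below illustrates the kind of call covered by the to_datetime() entry (GH48678): a list mixing a timezone-naive and a timezone-aware value, converted with utc=True. It is only an illustration under assumptions; the specific datetimes are made up for this example, and it relies on the documented to_datetime() behavior that naive inputs are treated as UTC while aware inputs are converted to UTC.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Hypothetical inputs: one naive datetime and one aware datetime (UTC+01:00)
naive = datetime(2020, 1, 1, 0, 0)
aware = datetime(2020, 1, 1, 1, 0, tzinfo=timezone(timedelta(hours=1)))

# With utc=True the naive value is localized as UTC and the aware value
# is converted to UTC, so both end up in a single tz-aware index.
result = pd.to_datetime([naive, aware], utc=True)
print(result)
```

Because 01:00 at UTC+01:00 is midnight UTC, both elements should refer to the same instant after conversion.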
Bug fixes
Bug in Series.__getitem__() not falling back to positional for integer keys and boolean Index (GH48653)
Bug in DataFrame.to_hdf() raising AssertionError with boolean index (GH48667)
Bug in testing.assert_index_equal() for extension arrays with non matching NA raising ValueError (GH48608)
Bug in DataFrame.pivot_table() raising unexpected FutureWarning when setting datetime column as index (GH48683)
Bug in DataFrame.sort_values() emitting unnecessary FutureWarning when called on DataFrame with boolean sparse columns (GH48784)
Bug in arrays.ArrowExtensionArray with a comparison operator to an invalid object would not raise a NotImplementedError (GH48833)
Other
Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (GH48692)
Contributors
A total of 16 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Amay Patel +
Deepak Sirohiwal +
Dennis Chukwunta
Gaurav Sheni
Himanshu Wagh +
Lorenzo Vainigli +
Marc Garcia
Marco Edward Gorelli
Matthew Roeschke
MeeseeksMachine
Noa Tamir
Pandas Development Team
Patrick Hoefler
Richard Shadrach
Shantanu
Torsten Wörtwein