Version 0.22.0 (December 29, 2017)#
This is a major release from 0.21.1 and includes a single, API-breaking change. We recommend that all users upgrade to this version after carefully reading the release note (singular!).
Backwards incompatible API changes#
pandas 0.22.0 changes the handling of empty and all-NA sums and products. The summary is that
- The sum of an empty or all-NA Series is now 0
- The product of an empty or all-NA Series is now 1
- We’ve added a min_count parameter to .sum() and .prod() controlling the minimum number of valid values for the result to be valid. If fewer than min_count non-NA values are present, the result is NA. The default is 0. To return NaN, the 0.21 behavior, use min_count=1.
Some background: In pandas 0.21, we fixed a long-standing inconsistency in the return value of all-NA series depending on whether or not bottleneck was installed. See “Sum/prod of all-NaN or empty Series/DataFrames is now consistently NaN” in the 0.21.0 release notes. At the same time, we changed the sum and prod of an empty Series to also be NaN.
Based on feedback, we’ve partially reverted those changes.
Arithmetic operations#
The default sum for empty or all-NA Series is now 0.
pandas 0.21.x
In [1]: pd.Series([]).sum()
Out[1]: nan
In [2]: pd.Series([np.nan]).sum()
Out[2]: nan
pandas 0.22.0
In [1]: pd.Series([]).sum()
Out[1]: 0
In [2]: pd.Series([np.nan]).sum()
Out[2]: 0.0
The default behavior is the same as pandas 0.20.3 with bottleneck installed. It also matches the behavior of NumPy’s np.nansum on empty and all-NA arrays.
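The comparison with np.nansum can be checked directly. The following is a minimal sketch, assuming pandas 0.22.0 or later; the variable names are illustrative only, and the empty Series is given an explicit float64 dtype so the example does not depend on the default dtype of an empty Series.

```python
import numpy as np
import pandas as pd

empty = pd.Series([], dtype="float64")
all_na = pd.Series([np.nan])

# np.nansum treats NaN as zero, so both inputs sum to 0.0
numpy_empty = np.nansum(empty.values)
numpy_all_na = np.nansum(all_na.values)

# With the 0.22.0 defaults (skipna=True, min_count=0) pandas agrees
pandas_empty = empty.sum()
pandas_all_na = all_na.sum()

print(numpy_empty, pandas_empty)    # 0.0 0.0
print(numpy_all_na, pandas_all_na)  # 0.0 0.0
```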
To have the sum of an empty series return NaN (the default behavior of pandas 0.20.3 without bottleneck, or pandas 0.21.x), use the min_count keyword.
In [3]: pd.Series([]).sum(min_count=1)
Out[3]: nan
Thanks to the skipna parameter, with skipna=True (the default) the .sum of an all-NA series is conceptually the same as the .sum of an empty one.
In [4]: pd.Series([np.nan]).sum(min_count=1) # skipna=True by default
Out[4]: nan
The min_count parameter refers to the minimum number of non-null values required for a non-NA sum or product.
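To illustrate how the threshold works for values other than 0 and 1, here is a small sketch on a Series with exactly one non-NA value. It assumes pandas 0.22.0 or later; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan])  # exactly one non-NA value

sum_min0 = s.sum(min_count=0)   # threshold met: 1.0
sum_min1 = s.sum(min_count=1)   # threshold met: 1.0
sum_min2 = s.sum(min_count=2)   # only one valid value, so the result is NaN
prod_min2 = s.prod(min_count=2) # the same rule applies to the product

print(sum_min0, sum_min1, sum_min2, prod_min2)  # 1.0 1.0 nan nan
```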
Series.prod() has been updated to behave the same as Series.sum(), returning 1 for empty or all-NA input instead of NaN.
In [5]: pd.Series([]).prod()
Out[5]: 1
In [6]: pd.Series([np.nan]).prod()
Out[6]: 1.0
In [7]: pd.Series([]).prod(min_count=1)
Out[7]: nan
These changes affect DataFrame.sum() and DataFrame.prod() as well.
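As a sketch of the DataFrame case, assuming pandas 0.22.0 or later, the column-wise reductions below follow the same rules as the Series examples. The column names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"all_na": [np.nan, np.nan], "mixed": [1.0, np.nan]})

col_sums = df.sum()              # all_na -> 0.0, mixed -> 1.0
col_prods = df.prod()            # all_na -> 1.0, mixed -> 1.0
strict_sums = df.sum(min_count=1)  # all_na -> NaN, mixed -> 1.0

print(col_sums)
print(col_prods)
print(strict_sums)
```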
Finally, a few less obvious places in pandas are affected by this change.
Grouping by a Categorical#
Grouping by a Categorical and summing now returns 0 instead of NaN for categories with no observations. The product now returns 1 instead of NaN.
pandas 0.21.x
In [8]: grouper = pd.Categorical(['a', 'a'], categories=['a', 'b'])
In [9]: pd.Series([1, 2]).groupby(grouper).sum()
Out[9]:
a 3.0
b NaN
dtype: float64
pandas 0.22.0
In [8]: grouper = pd.Categorical(["a", "a"], categories=["a", "b"])
In [9]: pd.Series([1, 2]).groupby(grouper).sum()
Out[9]:
a 3
b 0
Length: 2, dtype: int64
To restore the 0.21 behavior of returning NaN for unobserved groups, use min_count>=1.
In [10]: pd.Series([1, 2]).groupby(grouper).sum(min_count=1)
Out[10]:
a 3.0
b NaN
Length: 2, dtype: float64
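The examples above cover the sum; the product of an unobserved category can be sketched the same way. This assumes a pandas release newer than 0.22.0, because it passes observed=False explicitly so that unobserved categories are kept regardless of the version's default. That keyword is not part of this release.

```python
import numpy as np
import pandas as pd

grouper = pd.Categorical(["a", "a"], categories=["a", "b"])
values = pd.Series([1, 2])

# observed=False keeps category "b" in the result even though it has no rows
# (assumption: a pandas version that accepts the `observed` keyword)
prods = values.groupby(grouper, observed=False).prod()
strict_prods = values.groupby(grouper, observed=False).prod(min_count=1)

print(prods)         # a -> 2, b -> 1
print(strict_prods)  # a -> 2.0, b -> NaN
```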
Resample#
The sum and product of all-NA bins have changed from NaN to 0 for sum and 1 for product.
pandas 0.21.x
In [11]: s = pd.Series([1, 1, np.nan, np.nan],
....: index=pd.date_range('2017', periods=4))
....: s
Out[11]:
2017-01-01 1.0
2017-01-02 1.0
2017-01-03 NaN
2017-01-04 NaN
Freq: D, dtype: float64
In [12]: s.resample('2d').sum()
Out[12]:
2017-01-01 2.0
2017-01-03 NaN
Freq: 2D, dtype: float64
pandas 0.22.0
In [11]: s = pd.Series([1, 1, np.nan, np.nan], index=pd.date_range("2017", periods=4))
In [12]: s.resample("2d").sum()
Out[12]:
2017-01-01 2.0
2017-01-03 0.0
Freq: 2D, Length: 2, dtype: float64
To restore the 0.21 behavior of returning NaN, use min_count>=1.
In [13]: s.resample("2d").sum(min_count=1)
Out[13]:
2017-01-01 2.0
2017-01-03 NaN
Freq: 2D, Length: 2, dtype: float64
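The product behaves analogously for all-NA bins. A minimal sketch, assuming pandas 0.22.0 or later and reusing the same four-day series as above:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2017-01-01", periods=4, freq="D")
s = pd.Series([1.0, 1.0, np.nan, np.nan], index=idx)

# The second two-day bin contains only NaN
bin_prods = s.resample("2D").prod()               # [1.0, 1.0]
strict_prods = s.resample("2D").prod(min_count=1)  # [1.0, NaN]

print(bin_prods)
print(strict_prods)
```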
In particular, upsampling and taking the sum or product is affected, as upsampling introduces missing values even if the original series was entirely valid.
pandas 0.21.x
In [14]: idx = pd.DatetimeIndex(['2017-01-01', '2017-01-02'])
In [15]: pd.Series([1, 2], index=idx).resample('12H').sum()
Out[15]:
2017-01-01 00:00:00 1.0
2017-01-01 12:00:00 NaN
2017-01-02 00:00:00 2.0
Freq: 12H, dtype: float64
pandas 0.22.0
In [14]: idx = pd.DatetimeIndex(["2017-01-01", "2017-01-02"])
In [15]: pd.Series([1, 2], index=idx).resample("12H").sum()
Out[15]:
2017-01-01 00:00:00 1
2017-01-01 12:00:00 0
2017-01-02 00:00:00 2
Freq: 12H, Length: 3, dtype: int64
Once again, the min_count keyword is available to restore the 0.21 behavior.
In [16]: pd.Series([1, 2], index=idx).resample("12H").sum(min_count=1)
Out[16]:
2017-01-01 00:00:00 1.0
2017-01-01 12:00:00 NaN
2017-01-02 00:00:00 2.0
Freq: 12H, Length: 3, dtype: float64
Rolling and expanding#
Rolling and expanding already have a min_periods keyword that behaves similarly to min_count. The only case that changes is when doing a rolling or expanding sum with min_periods=0. Previously this returned NaN when the window held no non-NA values. Now it returns 0.
pandas 0.21.1
In [17]: s = pd.Series([np.nan, np.nan])
In [18]: s.rolling(2, min_periods=0).sum()
Out[18]:
0 NaN
1 NaN
dtype: float64
pandas 0.22.0
In [17]: s = pd.Series([np.nan, np.nan])
In [18]: s.rolling(2, min_periods=0).sum()
Out[18]:
0 0.0
1 0.0
Length: 2, dtype: float64
The default behavior of min_periods=None, implying that min_periods equals the window size, is unchanged.
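The expanding case is not shown above, so here is a short sketch of it alongside the unchanged rolling default. It assumes pandas 0.22.0 or later; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1.0])

# With min_periods=0, windows holding no valid values now sum to 0.0
expanding_sums = s.expanding(min_periods=0).sum()  # [0.0, 0.0, 1.0]

# Default min_periods for rolling equals the window size (2 here), so any
# window with fewer than two valid values still yields NaN
default_rolling = s.rolling(2).sum()               # [NaN, NaN, NaN]

print(expanding_sums)
print(default_rolling)
```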
Compatibility#
If you maintain a library that should work across pandas versions, it may be easiest to exclude pandas 0.21 from your requirements. Otherwise, all your sum() calls would need to check if the Series is empty before summing.
With setuptools, in your setup.py use:
install_requires=['pandas!=0.21.*', ...]
With conda, use:
requirements:
  run:
    - pandas !=0.21.0,!=0.21.1
Note that the inconsistency in the return value for all-NA series is still there for pandas 0.20.3 and earlier. Avoiding pandas 0.21 will only help with the empty case.
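If excluding versions is not an option, the check mentioned above could be wrapped in a small helper. The function below is a hypothetical sketch, not part of pandas: it counts the non-NA values first and returns 0.0 when there are none, so the result should not depend on which version's default applies.

```python
import numpy as np
import pandas as pd


def sum_treating_empty_as_zero(values):
    """Hypothetical helper: sum a Series, returning 0.0 if it has no valid values."""
    # Series.count() is the number of non-NA entries
    if values.count() == 0:
        return 0.0
    return values.sum()


empty_total = sum_treating_empty_as_zero(pd.Series([], dtype="float64"))
all_na_total = sum_treating_empty_as_zero(pd.Series([np.nan]))
mixed_total = sum_treating_empty_as_zero(pd.Series([1.0, 2.0, np.nan]))

print(empty_total, all_na_total, mixed_total)  # 0.0 0.0 3.0
```

Checking the count rather than the length covers both the empty and the all-NA case with one test.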
Contributors#
A total of 1 person contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Tom Augspurger