v0.18.1 (May 3, 2016)
This is a minor bug-fix release from 0.18.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.
Highlights include:
- .groupby(...) has been enhanced to provide convenient syntax when working with .rolling(..), .expanding(..) and .resample(..) per group, see here.
- pd.to_datetime() has gained the ability to assemble dates from a DataFrame, see here.
- Method chaining improvements, see here.
- Custom business hour offset, see here.
- Many bug fixes in the handling of sparse, see here.
- Expanded the Tutorials section with a feature on modern pandas, courtesy of @TomAugsburger (GH13045).
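Here is a minimal sketch of the pd.to_datetime() highlight. It assumes a DataFrame whose columns are named year, month and day, and the values are illustrative only. Each row is assembled into one timestamp:

```python
import pandas as pd

# Illustrative date parts; each row describes one calendar date.
parts = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})

# Passing a DataFrame makes pd.to_datetime() assemble the columns row-wise
# into a datetime64 Series.
dates = pd.to_datetime(parts)
print(dates)
```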
New features
Custom Business Hour
The CustomBusinessHour offset is a mixture of BusinessHour and CustomBusinessDay that allows you to specify arbitrary holidays. For details, see Custom Business Hour (GH11514).
In [1]: from pandas.tseries.offsets import CustomBusinessHour
In [2]: from pandas.tseries.holiday import USFederalHolidayCalendar
In [3]: bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())
Friday before MLK Day
In [4]: import datetime
In [5]: dt = datetime.datetime(2014, 1, 17, 15)
In [6]: dt + bhour_us
Out[6]: Timestamp('2014-01-17 16:00:00')
Tuesday after MLK Day (Monday is skipped because it’s a holiday)
In [7]: dt + bhour_us * 2
Out[7]: Timestamp('2014-01-21 09:00:00')
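CustomBusinessHour also accepts an explicit holidays list and custom start/end times instead of a calendar. The sketch below is illustrative: the 10:00 to 18:00 business day and the single holiday are hypothetical choices, not defaults. It shows that a result landing exactly on closing time rolls over to the next business opening, skipping the holiday:

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessHour

# Hypothetical 10:00-18:00 business day with one explicit holiday (MLK Day 2014).
bhour = CustomBusinessHour(start="10:00", end="18:00", holidays=["2014-01-20"])

dt = pd.Timestamp("2014-01-17 16:00")  # a Friday afternoon

one_hour = dt + bhour        # stays within Friday's business hours
two_hours = dt + bhour * 2   # hits closing time, so moves to the next opening
print(one_hour, two_hours)
```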
.groupby(..) syntax with window and resample operations
.groupby(...) has been enhanced to provide convenient syntax when working with .rolling(..), .expanding(..) and .resample(..) per group (see GH12486, GH12738).
You can now use .rolling(..) and .expanding(..) as methods on groupbys. These return another deferred object (similar to what .rolling() and .expanding() do on ungrouped pandas objects). You can then operate on these RollingGroupby objects in a similar manner.
Previously you would have to do this to get a rolling window mean per-group:
In [8]: df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
...: 'B': np.arange(40)})
...:
In [9]: df
Out[9]:
A B
0 1 0
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
.. .. ..
33 3 33
34 3 34
35 3 35
36 3 36
37 3 37
38 3 38
39 3 39
[40 rows x 2 columns]
In [10]: df.groupby('A').apply(lambda x: x.rolling(4).B.mean())
Out[10]:
A
1  0      NaN
   1      NaN
   2      NaN
   3      1.5
   4      2.5
   5      3.5
   6      4.5
         ...
3  33     NaN
   34     NaN
   35    33.5
   36    34.5
   37    35.5
   38    36.5
   39    37.5
Name: B, Length: 40, dtype: float64
Now you can do:
In [11]: df.groupby('A').rolling(4).B.mean()
Out[11]:
A
1  0      NaN
   1      NaN
   2      NaN
   3      1.5
   4      2.5
   5      3.5
   6      4.5
         ...
3  33     NaN
   34     NaN
   35    33.5
   36    34.5
   37    35.5
   38    36.5
   39    37.5
Name: B, Length: 40, dtype: float64
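The same pattern applies to .expanding(..), which the examples above do not show. Here is a minimal sketch on a small illustrative frame that computes a running mean of B, restarting for each value of A. Selecting B before calling .expanding() is a choice made for the sketch:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2], "B": [1.0, 2.0, 3.0, 10.0, 20.0]})

# df.groupby("A")["B"] is a SeriesGroupBy; .expanding() returns a deferred
# per-group window object, and .mean() computes the running mean per group.
running_mean = df.groupby("A")["B"].expanding().mean()
print(running_mean)
```

The result is indexed by (A, original row label), like the rolling example above.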
For .resample(..) type operations, previously you would have to:
In [12]: df = pd.DataFrame({'date': pd.date_range(start='2016-01-01',
....: periods=4,
....: freq='W'),
....: 'group': [1, 1, 2, 2],
....: 'val': [5, 6, 7, 8]}).set_index('date')
....:
In [13]: df
Out[13]:
group val
date
2016-01-03 1 5
2016-01-10 1 6
2016-01-17 2 7
2016-01-24 2 8
[4 rows x 2 columns]
In [14]: df.groupby('group').apply(lambda x: x.resample('1D').ffill())
Out[14]:
                  group  val
group date
1     2016-01-03      1    5
      2016-01-04      1    5
      2016-01-05      1    5
      2016-01-06      1    5
      2016-01-07      1    5
      2016-01-08      1    5
      2016-01-09      1    5
...                 ...  ...
2     2016-01-18      2    7
      2016-01-19      2    7
      2016-01-20      2    7
      2016-01-21      2    7
      2016-01-22      2    7
      2016-01-23      2    7
      2016-01-24      2    8
[16 rows x 2 columns]
Now you can do:
In [15]: df.groupby('group').resample('1D').ffill()
Out[15]:
                  group  val
group date
1     2016-01-03      1    5
      2016-01-04      1    5
      2016-01-05      1    5
      2016-01-06      1    5
      2016-01-07      1    5
      2016-01-08      1    5
      2016-01-09      1    5
...                 ...  ...
2     2016-01-18      2    7
      2016-01-19      2    7
      2016-01-20      2    7
      2016-01-21      2    7
      2016-01-22      2    7
      2016-01-23      2    7
      2016-01-24      2    8
[16 rows x 2 columns]
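Aggregations work with the same syntax as the .ffill() upsampling shown above. Here is a sketch that downsamples each group into month-start bins and sums val. Selecting the val column first is an illustrative choice that keeps the grouping column out of the result:

```python
import pandas as pd

df = pd.DataFrame(
    {"group": [1, 1, 2, 2], "val": [5, 6, 7, 8]},
    index=pd.date_range(start="2016-01-01", periods=4, freq="W"),
)

# One row per (group, month start); val is summed within each bin.
monthly = df.groupby("group")["val"].resample("MS").sum()
print(monthly)
```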
Method chaining improvements
The following methods and indexers now accept a callable. This is intended to make them more useful in method chains; see the documentation (GH11485, GH12533).
- .where() and .mask()
- .loc[], .iloc[] and .ix[]
- [] indexing
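To show why this helps in method chains, here is a minimal sketch using an illustrative frame and a derived column C. Each callable receives the intermediate object produced by the previous step. This lets the chain filter on a column created earlier in the same chain without binding a temporary variable:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

result = (
    df.assign(C=lambda x: x.A * x.B)  # derived column C = [4, 10, 18]
    .loc[lambda x: x.C > 5]           # .loc[] with a callable: keep rows where C > 5
    [lambda x: ["A", "C"]]            # [] indexing with a callable: pick columns
)
print(result)
```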
.where() and .mask()
These can accept a callable for the condition and other arguments.
In [16]: df = pd.DataFrame({'A': [1, 2, 3],
....: 'B': [4, 5, 6],
....: 'C': [7, 8, 9]})
....:
In [17]: df.where(lambda x: x > 4, lambda x: x + 10)
Out[17]:
A B C
0 11 14 7
1 12 5 8
2 13 6 9
[3 rows x 3 columns]
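.mask() accepts the same callable arguments but replaces values where the condition is True, which is the inverse of .where(). Here is a minimal sketch that rebuilds the same illustrative frame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Entries where the condition is True are replaced by the value computed
# from `other` (here, the negated entry); all other entries are kept.
masked = df.mask(lambda x: x > 4, lambda x: -x)
print(masked)
```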
.loc[], .iloc[], .ix[]
These can accept a callable, or a tuple of callables as a slicer. The callable can return a valid boolean indexer or anything else that is valid input for these indexers.
# callable returns bool indexer
In [18]: df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]
Out[18]:
B C
1 5 8
2 6 9
[2 rows x 2 columns]
# callable returns list of labels
In [19]: df.loc[lambda x: [1, 2], lambda x: ['A', 'B']]