Time Series / Date functionality — pandas 0.7.3 documentation

DateOffset objects

A DateOffset instance represents a frequency increment. Subclasses implement different offset logic:

Class name	Description
DateOffset	Generic offset class, defaults to 1 calendar day
BDay	business day (weekday)
Week	one week, optionally anchored on a day of the week
MonthEnd	calendar month end
BMonthEnd	business month end
QuarterEnd	calendar quarter end
BQuarterEnd	business quarter end
YearEnd	calendar year end
YearBegin	calendar year begin
BYearEnd	business year end
Hour	one hour
Minute	one minute
Second	one second

The basic DateOffset takes the same arguments as dateutil.relativedelta, which works like:

In [847]: d = datetime(2008, 8, 18)

In [848]: d + relativedelta(months=4, days=5)
Out[848]: datetime.datetime(2008, 12, 23, 0, 0)

We could have done the same thing with DateOffset:

In [849]: from pandas.core.datetools import *

In [850]: d + DateOffset(months=4, days=5)
Out[850]: datetime.datetime(2008, 12, 23, 0, 0)
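To make the arithmetic behind months=4, days=5 explicit, here is a minimal standard-library sketch. The helper add_months_and_days is hypothetical (it is not part of pandas or dateutil), and it assumes the month shift is applied first, with the day clamped to the length of the target month, before the days are added.

```python
import calendar
from datetime import datetime, timedelta


def add_months_and_days(d, months=0, days=0):
    """Hypothetical helper: shift d by whole months, then by days."""
    # Count months since year 0 so that year rollover is handled by divmod.
    month_index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1
    # Clamp the day so that e.g. January 31 + 1 month lands on the
    # last day of February instead of raising an error.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day) + timedelta(days=days)


d = datetime(2008, 8, 18)
shifted = add_months_and_days(d, months=4, days=5)
clamped = add_months_and_days(datetime(2008, 1, 31), months=1)
```

Here shifted is December 23, 2008, matching the session above, and clamped shows the end-of-month case.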

The key features of a DateOffset object are:

it can be added to or subtracted from a datetime object to obtain a shifted date

it can be multiplied by an integer (positive or negative) so that the increment will be applied multiple times

it has rollforward and rollback methods for moving a date forward or backward to the next or previous “offset date”

Subclasses of DateOffset define the apply function which dictates custom date increment logic, such as adding business days:

class BDay(DateOffset):
    """DateOffset increments between business days"""
    def apply(self, other):
        ...

In [851]: d - 5 * BDay()
Out[851]: datetime.datetime(2008, 8, 11, 0, 0)

In [852]: d + BMonthEnd()
Out[852]: datetime.datetime(2008, 8, 29, 0, 0)
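As a rough illustration of how an apply method can drive addition, subtraction and integer multiplication, here is a plain-Python sketch. ToyBDay is a hypothetical stand-in written for this example; it is not the pandas BDay implementation, and it assumes weekends are the only non-business days.

```python
from datetime import datetime, timedelta


class ToyBDay:
    """Hypothetical business-day offset (not the pandas class)."""

    def __init__(self, n=1):
        self.n = n

    def apply(self, other):
        # Walk one calendar day at a time in the direction of n,
        # counting only Monday-Friday dates.
        step = timedelta(days=1 if self.n >= 0 else -1)
        remaining = abs(self.n)
        result = other
        while remaining > 0:
            result += step
            if result.weekday() < 5:
                remaining -= 1
        return result

    def __mul__(self, k):
        # Multiplying by an integer applies the increment k times.
        return ToyBDay(self.n * k)

    __rmul__ = __mul__

    def __neg__(self):
        return ToyBDay(-self.n)

    def __radd__(self, other):
        # Handles datetime + offset.
        return self.apply(other)

    def __rsub__(self, other):
        # Handles datetime - offset.
        return (-self).apply(other)


d = datetime(2008, 8, 18)
shifted_back = d - 5 * ToyBDay()
shifted_forward = d + ToyBDay()
```

With d on Monday, August 18, 2008, shifted_back is August 11, 2008, the same result as the BDay session above.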

The rollforward and rollback methods move a date to the next or previous offset date, respectively:

In [853]: d
Out[853]: datetime.datetime(2008, 8, 18, 0, 0)

In [854]: offset = BMonthEnd()

In [855]: offset.rollforward(d)
Out[855]: datetime.datetime(2008, 8, 29, 0, 0)

In [856]: offset.rollback(d)
Out[856]: datetime.datetime(2008, 7, 31, 0, 0)
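The rolling behavior for business month ends can be sketched with the standard library alone. The functions below are hypothetical helpers, not pandas code; they assume midnight datetimes, treat weekends as the only non-business days, and leave a date that is already a business month end unchanged.

```python
import calendar
from datetime import datetime, timedelta


def business_month_end(year, month):
    """Last Monday-Friday date of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    day = datetime(year, month, last_day)
    while day.weekday() >= 5:
        day -= timedelta(days=1)
    return day


def rollforward(d):
    """Return d if it is a business month end, else the next one."""
    candidate = business_month_end(d.year, d.month)
    if d <= candidate:
        return candidate
    # d falls after this month's last business day, so use next month's.
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return business_month_end(year, month)


def rollback(d):
    """Return d if it is a business month end, else the previous one."""
    candidate = business_month_end(d.year, d.month)
    if d >= candidate:
        return candidate
    year, month = (d.year - 1, 12) if d.month == 1 else (d.year, d.month - 1)
    return business_month_end(year, month)


d = datetime(2008, 8, 18)
forward = rollforward(d)
back = rollback(d)
```

For August 18, 2008 this gives August 29 and July 31, the same dates as the BMonthEnd session above.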

It’s definitely worth exploring the pandas.core.datetools module and the various docstrings for the classes.

Parametric offsets

Some of the offsets can be “parameterized” when created to result in different behavior. For example, the Week offset for generating weekly data accepts a weekday parameter which results in the generated dates always lying on a particular day of the week:

In [857]: d + Week()
Out[857]: datetime.datetime(2008, 8, 25, 0, 0)

In [858]: d + Week(weekday=4)
Out[858]: datetime.datetime(2008, 8, 22, 0, 0)

In [859]: (d + Week(weekday=4)).weekday()
Out[859]: 4
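One way to picture the anchored behavior is the following standard-library sketch. add_anchored_week is a hypothetical helper, and it assumes that a date already on the anchor day moves a full week ahead; treat that edge case as an assumption rather than a statement about the Week class.

```python
from datetime import datetime, timedelta


def add_anchored_week(d, weekday):
    """Hypothetical helper: next date after d that falls on weekday (Monday=0)."""
    days_ahead = (weekday - d.weekday()) % 7
    if days_ahead == 0:
        # Assumption: a date already on the anchor day moves a full week.
        days_ahead = 7
    return d + timedelta(days=days_ahead)


d = datetime(2008, 8, 18)              # a Monday
friday = add_anchored_week(d, 4)       # anchor on Friday
next_friday = add_anchored_week(friday, 4)
```

Starting from Monday, August 18, 2008, friday is August 22, 2008, matching the Week(weekday=4) result above.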

Time rules

A number of string aliases are given to useful common time series frequencies. We will refer to these aliases as time rules.

Rule name	Description
WEEKDAY	business day frequency
EOM	business month end frequency
W@MON	weekly frequency (Mondays)
W@TUE	weekly frequency (Tuesdays)
W@WED	weekly frequency (Wednesdays)
W@THU	weekly frequency (Thursdays)
W@FRI	weekly frequency (Fridays)
Q@JAN	quarterly frequency, starting January
Q@FEB	quarterly frequency, starting February
Q@MAR	quarterly frequency, starting March
A@DEC	annual frequency, year end (December)
A@JAN	annual frequency, anchored end of January
A@FEB	annual frequency, anchored end of February
A@MAR	annual frequency, anchored end of March
A@APR	annual frequency, anchored end of April
A@MAY	annual frequency, anchored end of May
A@JUN	annual frequency, anchored end of June
A@JUL	annual frequency, anchored end of July
A@AUG	annual frequency, anchored end of August
A@SEP	annual frequency, anchored end of September
A@OCT	annual frequency, anchored end of October
A@NOV	annual frequency, anchored end of November

These can be used as arguments to DateRange and various other time series-related functions in pandas.

Generating date ranges (DateRange)

The DateRange class uses these offsets (and any that may be added later) to generate fixed-frequency date ranges:

In [860]: start = datetime(2009, 1, 1)

In [861]: end = datetime(2010, 1, 1)

In [862]: rng = DateRange(start, end, offset=BDay())

In [863]: rng
Out[863]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[2009-01-01 00:00:00, ..., 2010-01-01 00:00:00]
length: 262

In [864]: DateRange(start, end, offset=BMonthEnd())
Out[864]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessMonthEnd>, tzinfo: None
[2009-01-30 00:00:00, ..., 2009-12-31 00:00:00]
length: 12

Business day frequency is the default for DateRange. You can also generate a DateRange of an exact length by providing either a start or end date together with a periods argument:

In [865]: DateRange(start, periods=20)
Out[865]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[2009-01-01 00:00:00, ..., 2009-01-28 00:00:00]
length: 20

In [866]: DateRange(end=end, periods=20)
Out[866]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[2009-12-07 00:00:00, ..., 2010-01-01 00:00:00]
length: 20

The start and end dates are inclusive bounds: when they are specified, no dates outside them are generated.
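The inclusive bounds and the periods argument can be illustrated with a small standard-library sketch. business_days is a hypothetical function, not the DateRange implementation; it only handles the forward direction from a start date and assumes weekends are the only non-business days.

```python
from datetime import datetime, timedelta


def business_days(start, end=None, periods=None):
    """Hypothetical helper: Monday-Friday dates from start, bounded by end or periods."""
    if end is None and periods is None:
        raise ValueError("provide end, periods, or both")
    dates = []
    day = start
    # Stop at the inclusive end date or once enough dates were collected.
    while (end is None or day <= end) and (periods is None or len(dates) < periods):
        if day.weekday() < 5:
            dates.append(day)
        day += timedelta(days=1)
    return dates


start = datetime(2009, 1, 1)
end = datetime(2010, 1, 1)
full_year = business_days(start, end=end)
first_20 = business_days(start, periods=20)
```

The sketch yields 262 dates for the full range and ends the 20-period range on January 28, 2009, in line with the DateRange output shown above.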

DateRange is a valid Index

One of the main uses for DateRange is as an index for pandas objects. When working with a lot of time series data, there are several reasons to use DateRange objects when possible:

A large range of dates for various offsets are pre-computed and cached under the hood in order to make generating subsequent date ranges very fast (just have to grab a slice)

Fast shifting using the shift method on pandas objects

Unioning of overlapping DateRange objects with the same frequency is very fast (important for fast data alignment)

A DateRange is a valid index, and slicing it returns another DateRange with the offset adjusted where needed:

In [867]: rng = DateRange(start, end, offset=BMonthEnd())

In [868]: ts = Series(randn(len(rng)), index=rng)

In [869]: ts.index
Out[869]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessMonthEnd>, tzinfo: None
[2009-01-30 00:00:00, ..., 2009-12-31 00:00:00]
length: 12

In [870]: ts[:5].index
Out[870]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessMonthEnd>, tzinfo: None
[2009-01-30 00:00:00, ..., 2009-05-29 00:00:00]
length: 5

In [871]: ts[::2].index
Out[871]: 
<class 'pandas.core.daterange.DateRange'>
offset: <2 BusinessMonthEnds>, tzinfo: None
[2009-01-30 00:00:00, ..., 2009-11-30 00:00:00]
length: 6

More complicated fancy indexing will result in an Index that is no longer a DateRange, however:

In [872]: ts[[0, 2, 6]].index
Out[872]: Index([2009-01-30 00:00:00, 2009-03-31 00:00:00, 2009-07-31 00:00:00], dtype=object)

Time series-related instance methods

Shifting / lagging

One may want to shift or lag the values in a TimeSeries back and forward in time. The method for this is shift, which is available on all of the pandas objects. In DataFrame, shift will currently only shift along the index and in Panel along the major_axis.

In [873]: ts = ts[:5]

In [874]: ts.shift(1)
Out[874]: 
2009-01-30         NaN
2009-02-27    0.469112
2009-03-31   -0.282863
2009-04-30   -1.509059
2009-05-29   -1.135632

The shift method accepts an offset argument, which can be a DateOffset object, another timedelta-like object, or a time rule. With an offset, the index dates are shifted and the values are left in place:

In [875]: ts.shift(5, offset=datetools.bday)
Out[875]: 
2009-02-06    0.469112
2009-03-06   -0.282863
2009-04-07   -1.509059
2009-05-07   -1.135632
2009-06-05    1.212112

In [876]: ts.shift(5, offset='EOM')
Out[876]: 
2009-06-30    0.469112
2009-07-31   -0.282863
2009-08-31   -1.509059
2009-09-30   -1.135632
2009-10-30    1.212112
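The difference between the two kinds of shift can be sketched with plain lists. Both functions are hypothetical illustrations, not the pandas implementation: shift_values lags the data against a fixed index (assuming 0 <= periods <= len(values)), while shift_index moves the dates by a timedelta and keeps every value.

```python
from datetime import datetime, timedelta

dates = [datetime(2009, 1, 30), datetime(2009, 2, 27), datetime(2009, 3, 31)]
values = [1.0, 2.0, 3.0]


def shift_values(dates, values, periods):
    """Lag values by `periods` positions; the index stays the same."""
    padded = [None] * periods + values[:len(values) - periods]
    return list(zip(dates, padded))


def shift_index(dates, values, delta):
    """Move every index date by `delta`; no value is lost."""
    return [(d + delta, v) for d, v in zip(dates, values)]


lagged = shift_values(dates, values, 1)
moved = shift_index(dates, values, timedelta(days=7))
```

In lagged the first entry is empty and the last value drops off the end, whereas moved keeps all three values under new dates.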

Frequency conversion

The primary function for changing frequencies is asfreq, a thin but convenient wrapper that generates a DateRange and calls reindex.

In [877]: dr = DateRange('1/1/2010', periods=3, offset=3 * datetools.bday)

In [878]: ts = Series(randn(3), index=dr)

In [879]: ts
Out[879]: 
2010-01-01    0.721555
2010-01-06   -0.706771
2010-01-11   -1.039575

In [880]: ts.asfreq(BDay())
Out[880]: 
2010-01-01    0.721555
2010-01-04         NaN
2010-01-05         NaN
2010-01-06   -0.706771
2010-01-07         NaN
2010-01-08         NaN
2010-01-11   -1.039575

In [881]: ts.asfreq(BDay(), method='pad')
Out[881]: 
2010-01-01    0.721555
2010-01-04    0.721555
2010-01-05    0.721555
2010-01-06   -0.706771
2010-01-07   -0.706771
2010-01-08   -0.706771
2010-01-11   -1.039575
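The "generate a range, then reindex" idea can be sketched with a dictionary and the standard library. asfreq_business_days is a hypothetical function, not the pandas method; it assumes the target frequency is business days and that pad means carrying the last seen value forward.

```python
from datetime import datetime, timedelta


def asfreq_business_days(series, pad=False):
    """Hypothetical helper: reindex a {datetime: value} dict onto business days."""
    start, end = min(series), max(series)
    result = {}
    last_seen = None
    day = start
    while day <= end:
        if day in series:
            last_seen = series[day]
        if day.weekday() < 5:
            # Missing dates become None, or the last seen value when padding.
            result[day] = series.get(day, last_seen if pad else None)
        day += timedelta(days=1)
    return result


ts = {
    datetime(2010, 1, 1): 0.721555,
    datetime(2010, 1, 6): -0.706771,
    datetime(2010, 1, 11): -1.039575,
}
sparse = asfreq_business_days(ts)
padded = asfreq_business_days(ts, pad=True)
```

Both results contain the seven business days from January 1 to January 11, 2010; sparse leaves the new dates empty and padded fills them forward, mirroring the two asfreq calls above.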

Filling forward / backward

Related to asfreq and reindex is the fillna function documented in the missing data section.

Up- and downsampling

We plan to add some efficient methods for resampling during frequency conversion, for example converting secondly data into 5-minutely data. This is extremely common in, but not limited to, financial applications.

Until then, your best bet is a clever (or kludgy, depending on your point of view) application of GroupBy. Carry out the following steps:

Generate the target DateRange of interest

dr1hour = DateRange(start, end, offset=Hour())
dr5day = DateRange(start, end, offset=5 * datetools.day)
dr10day = DateRange(start, end, offset=10 * datetools.day)

Use the asof function (“as of”) of the DateRange to do a groupby expression

grouped = data.groupby(dr5day.asof)
means = grouped.mean()

Here is a fully-worked example:

# some minutely data
In [882]: minutely = DateRange('1/3/2000 00:00:00', '1/3/2000 12:00:00',
   .....:                      offset=datetools.Minute())

In [883]: ts = Series(randn(len(minutely)), index=minutely)

In [884]: ts.index
Out[884]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 Minute>, tzinfo: None
[2000-01-03 00:00:00, ..., 2000-01-03 12:00:00]
length: 721

In [885]: hourly = DateRange('1/3/2000', '1/4/2000', offset=datetools.Hour())

In [886]: grouped = ts.groupby(hourly.asof)

In [887]: grouped.mean()
Out[887]: 
key_0
2000-01-03 00:00:00   -0.119068
2000-01-03 01:00:00    0.020282
2000-01-03 02:00:00    0.102562
2000-01-03 03:00:00   -0.106713
2000-01-03 04:00:00   -0.128935
2000-01-03 05:00:00   -0.146319
2000-01-03 06:00:00   -0.002938
2000-01-03 07:00:00   -0.131361
2000-01-03 08:00:00   -0.005749
2000-01-03 09:00:00   -0.399136
2000-01-03 10:00:00    0.097238
2000-01-03 11:00:00   -0.127307
2000-01-03 12:00:00   -0.273955

Some things to note about this method:

This is rather inefficient because we haven’t exploited the orderedness of the data at all. Calling the asof function on every date in the minutely time series is not strictly necessary. We’ll be writing some significantly more efficient methods in the near future.

The dates in the result mark the beginning of the period. Be careful about which convention you use; you don’t want to end up misaligning data because you used the wrong resampling convention.
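The grouping technique can be sketched without pandas to show what asof contributes. The asof function below is a hypothetical standard-library version built on bisect, and the data are synthetic (each value equals its minute number) so that the group means are easy to check by hand.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

start = datetime(2000, 1, 3)
# Three hours of synthetic minutely data.
minutely = [start + timedelta(minutes=i) for i in range(180)]
values = [float(i) for i in range(180)]
hourly = [start + timedelta(hours=h) for h in range(4)]


def asof(sorted_dates, when):
    """Latest date in sorted_dates that is <= when, or None if there is none."""
    pos = bisect_right(sorted_dates, when)
    return sorted_dates[pos - 1] if pos else None


# Group every observation under the start of the hour it belongs to.
groups = defaultdict(list)
for stamp, value in zip(minutely, values):
    groups[asof(hourly, stamp)].append(value)

means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

Each key in means is the beginning of an hour, which is the period-start convention described in the notes above. The lookup is repeated for every observation, which is the inefficiency those notes mention.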
