pandas 0.7.3 documentation

Time Series / Date functionality

pandas has proven very successful as a tool for working with time series data, especially in the financial data analysis space. Over the coming year we will be looking to consolidate the various Python libraries for time series data, e.g. scikits.timeseries, using the new NumPy datetime64 dtype, to create a very nice integrated solution. Everything in pandas at the moment is based on using Python datetime objects.

In working with time series data, we will frequently seek to:

  • generate sequences of fixed-frequency dates
  • conform or convert time series to a particular frequency
  • compute “relative” dates based on various non-standard time increments (e.g. 5 business days before the last business day of the year), or “roll” dates forward or backward

pandas provides a relatively compact and self-contained set of tools for performing the above tasks.

Note

This area of pandas has gotten less development attention recently, though this should change in the near future.

DateOffset objects

A DateOffset instance represents a frequency increment. Different offset logic via subclasses:

Class name Description
DateOffset Generic offset class, defaults to 1 calendar day
BDay business day (weekday)
Week one week, optionally anchored on a day of the week
MonthEnd calendar month end
BMonthEnd business month end
QuarterEnd calendar quarter end
BQuarterEnd business quarter end
YearEnd calendar year end
YearBegin calendar year begin
BYearEnd business year end
Hour one hour
Minute one minute
Second one second

The basic DateOffset takes the same arguments as dateutil.relativedelta, which works like:

In [847]: d = datetime(2008, 8, 18)

In [848]: d + relativedelta(months=4, days=5)
Out[848]: datetime.datetime(2008, 12, 23, 0, 0)

We could have done the same thing with DateOffset:

In [849]: from pandas.core.datetools import *

In [850]: d + DateOffset(months=4, days=5)
Out[850]: datetime.datetime(2008, 12, 23, 0, 0)

The key features of a DateOffset object are:

  • it can be added / subtracted to/from a datetime object to obtain a shifted date
  • it can be multiplied by an integer (positive or negative) so that the increment will be applied multiple times
  • it has rollforward and rollback methods for moving a date forward or backward to the next or previous “offset date”

Subclasses of DateOffset define the apply function which dictates custom date increment logic, such as adding business days:

class BDay(DateOffset):
    """DateOffset increments between business days"""
    def apply(self, other):
        ...
In [851]: d - 5 * BDay()
Out[851]: datetime.datetime(2008, 8, 11, 0, 0)

In [852]: d + BMonthEnd()
Out[852]: datetime.datetime(2008, 8, 29, 0, 0)

The rollforward and rollback methods do exactly what you would expect:

In [853]: d
Out[853]: datetime.datetime(2008, 8, 18, 0, 0)

In [854]: offset = BMonthEnd()

In [855]: offset.rollforward(d)
Out[855]: datetime.datetime(2008, 8, 29, 0, 0)

In [856]: offset.rollback(d)
Out[856]: datetime.datetime(2008, 7, 31, 0, 0)

It’s definitely worth exploring the pandas.core.datetools module and the various docstrings for the classes.

Parametric offsets

Some of the offsets can be “parameterized” when created to result in different behavior. For example, the Week offset for generating weekly data accepts a weekday parameter which results in the generated dates always lying on a particular day of the week:

In [857]: d + Week()
Out[857]: datetime.datetime(2008, 8, 25, 0, 0)

In [858]: d + Week(weekday=4)
Out[858]: datetime.datetime(2008, 8, 22, 0, 0)

In [859]: (d + Week(weekday=4)).weekday()
Out[859]: 4

Time rules

A number of string aliases are given to useful common time series frequencies. We will refer to these aliases as time rules.

Rule name Description
WEEKDAY business day frequency
EOM business month end frequency
W@MON weekly frequency (mondays)
W@TUE weekly frequency (tuesdays)
W@WED weekly frequency (wednesdays)
W@THU weekly frequency (thursdays)
W@FRI weekly frequency (fridays)
Q@JAN quarterly frequency, starting January
Q@FEB quarterly frequency, starting February
Q@MAR quarterly frequency, starting March
A@DEC annual frequency, year end (December)
A@JAN annual frequency, anchored end of January
A@FEB annual frequency, anchored end of February
A@MAR annual frequency, anchored end of March
A@APR annual frequency, anchored end of April
A@MAY annual frequency, anchored end of May
A@JUN annual frequency, anchored end of June
A@JUL annual frequency, anchored end of July
A@AUG annual frequency, anchored end of August
A@SEP annual frequency, anchored end of September
A@OCT annual frequency, anchored end of October
A@NOV annual frequency, anchored end of November

These can be used as arguments to DateRange and various other time series-related functions in pandas.

Generating date ranges (DateRange)

The DateRange class utilizes these offsets (and any ones that we might add) to generate fixed-frequency date ranges:

In [860]: start = datetime(2009, 1, 1)

In [861]: end = datetime(2010, 1, 1)

In [862]: rng = DateRange(start, end, offset=BDay())

In [863]: rng
Out[863]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[2009-01-01 00:00:00, ..., 2010-01-01 00:00:00]
length: 262

In [864]: DateRange(start, end, offset=BMonthEnd())
Out[864]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessMonthEnd>, tzinfo: None
[2009-01-30 00:00:00, ..., 2009-12-31 00:00:00]
length: 12

Business day frequency is the default for DateRange. You can also strictly generate a DateRange of a certain length by providing either a start or end date and a periods argument:

In [865]: DateRange(start, periods=20)
Out[865]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[2009-01-01 00:00:00, ..., 2009-01-28 00:00:00]
length: 20

In [866]: DateRange(end=end, periods=20)
Out[866]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[2009-12-07 00:00:00, ..., 2010-01-01 00:00:00]
length: 20

The start and end dates are strictly inclusive. So it will not generate any dates outside of those dates if specified.

DateRange is a valid Index

One of the main uses for DateRange is as an index for pandas objects. When working with a lot of time series data, there are several reasons to use DateRange objects when possible:

  • A large range of dates for various offsets are pre-computed and cached under the hood in order to make generating subsequent date ranges very fast (just have to grab a slice)
  • Fast shifting using the shift method on pandas objects
  • Unioning of overlapping DateRange objects with the same frequency is very fast (important for fast data alignment)

The DateRange is a valid index and can even be intelligent when doing slicing, etc.

In [867]: rng = DateRange(start, end, offset=BMonthEnd())

In [868]: ts = Series(randn(len(rng)), index=rng)

In [869]: ts.index
Out[869]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessMonthEnd>, tzinfo: None
[2009-01-30 00:00:00, ..., 2009-12-31 00:00:00]
length: 12

In [870]: ts[:5].index
Out[870]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessMonthEnd>, tzinfo: None
[2009-01-30 00:00:00, ..., 2009-05-29 00:00:00]
length: 5

In [871]: ts[::2].index
Out[871]: 
<class 'pandas.core.daterange.DateRange'>
offset: <2 BusinessMonthEnds>, tzinfo: None
[2009-01-30 00:00:00, ..., 2009-11-30 00:00:00]
length: 6

More complicated fancy indexing will result in an Index that is no longer a DateRange, however:

In [872]: ts[[0, 2, 6]].index
Out[872]: Index([2009-01-30 00:00:00, 2009-03-31 00:00:00, 2009-07-31 00:00:00], dtype=object)

Up- and downsampling

We plan to add some efficient methods for doing resampling during frequency conversion. For example, converting secondly data into 5-minutely data. This is extremely common in, but not limited to, financial applications.

Until then, your best bet is a clever (or kludgy, depending on your point of view) application of GroupBy. Carry out the following steps:

  1. Generate the target DateRange of interest
dr1hour = DateRange(start, end, offset=Hour())
dr5day = DateRange(start, end, offset=5 * datetools.day)
dr10day = DateRange(start, end, offset=10 * datetools.day)
  1. Use the asof function (“as of”) of the DateRange to do a groupby expression
grouped = data.groupby(dr5day.asof)
means = grouped.mean()

Here is a fully-worked example:

# some minutely data
In [882]: minutely = DateRange('1/3/2000 00:00:00', '1/3/2000 12:00:00',
   .....:                      offset=datetools.Minute())

In [883]: ts = Series(randn(len(minutely)), index=minutely)

In [884]: ts.index
Out[884]: 
<class 'pandas.core.daterange.DateRange'>
offset: <1 Minute>, tzinfo: None
[2000-01-03 00:00:00, ..., 2000-01-03 12:00:00]
length: 721

In [885]: hourly = DateRange('1/3/2000', '1/4/2000', offset=datetools.Hour())

In [886]: grouped = ts.groupby(hourly.asof)

In [887]: grouped.mean()
Out[887]: 
key_0
2000-01-03 00:00:00   -0.119068
2000-01-03 01:00:00    0.020282
2000-01-03 02:00:00    0.102562
2000-01-03 03:00:00   -0.106713
2000-01-03 04:00:00   -0.128935
2000-01-03 05:00:00   -0.146319
2000-01-03 06:00:00   -0.002938
2000-01-03 07:00:00   -0.131361
2000-01-03 08:00:00   -0.005749
2000-01-03 09:00:00   -0.399136
2000-01-03 10:00:00    0.097238
2000-01-03 11:00:00   -0.127307
2000-01-03 12:00:00   -0.273955

Some things to note about this method:

  • This is rather inefficient because we haven’t exploited the orderedness of the data at all. Calling the asof function on every date in the minutely time series is not strictly necessary. We’ll be writing some significantly more efficient methods in the near future
  • The dates in the result mark the beginning of the period. Be careful about which convention you use; you don’t want to end up misaligning data because you used the wrong upsampling convention