Cookbook¶

In [1]: df = pd.DataFrame(np.random.randn(6,1), index=pd.date_range('2013-08-01', periods=6, freq='B'), columns=list('A'))

In [2]: df.iloc[3, 0] = np.nan

In [3]: df

                   A
2013-08-01  0.469112
2013-08-02 -0.282863
2013-08-05 -1.509059
2013-08-06       NaN
2013-08-07  1.212112
2013-08-08 -0.173215
[6 rows x 1 columns]

In [4]: df.reindex(df.index[::-1]).ffill()

                   A
2013-08-08 -0.173215
2013-08-07  1.212112
2013-08-06  1.212112
2013-08-05 -1.509059
2013-08-02 -0.282863
2013-08-01  0.469112
[6 rows x 1 columns]

cumsum reset at NaN values
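One way to get a running sum that restarts at every NaN is sketched below. The sample values and the names `block_id` and `running` are illustrative assumptions, not taken from the linked recipe.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0, np.nan, 5.0])

# Every NaN bumps the counter, so each NaN opens a new block of rows.
block_id = s.isna().cumsum()

# Count the NaN itself as 0, then take the running sum within each block.
running = s.fillna(0.0).groupby(block_id).cumsum()

print(running.tolist())
```

If the NaN positions should stay missing in the result, `running.where(s.notna())` puts them back.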

Replace¶

Using replace with backrefs
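A minimal sketch of a regex replacement that reuses captured groups. The sample strings and the pattern are made up for illustration; with `regex=True` the replacement string can refer to groups as `\1`, `\2`.

```python
import pandas as pd

s = pd.Series(['2013-08', '2014-01', 'n/a'])

# Swap "YYYY-MM" into "MM/YYYY"; strings that do not match are left alone.
swapped = s.replace(r'^(\d{4})-(\d{2})$', r'\2/\1', regex=True)

print(swapped.tolist())
```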

Timeseries¶

Between times

Using indexer between time
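The two entries above can be illustrated with a small hourly series. This is a sketch on invented data: `between_time` selects rows by time of day, while `DatetimeIndex.indexer_between_time` returns the matching integer positions.

```python
import numpy as np
import pandas as pd

# Two days of hourly observations (illustrative data).
idx = pd.date_range('2013-08-01', periods=48, freq=pd.Timedelta(hours=1))
ts = pd.Series(np.arange(48), index=idx)

# Rows whose time of day falls in 09:00-10:00, both ends included by default.
morning = ts.between_time('09:00', '10:00')

# The same selection expressed as integer positions into the index.
positions = ts.index.indexer_between_time('09:00', '10:00')
same_rows = ts.iloc[positions]

print(morning)
```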

Vectorized Lookup

Turn a matrix with hours in columns and days in rows into a continuous row sequence in the form of a time series. How to rearrange a python pandas dataframe?
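A hedged sketch of the rearrangement, assuming the columns hold integer hour offsets and the index holds the days. The frame `wide` and its values are invented for the example.

```python
import pandas as pd

days = pd.date_range('2013-08-01', periods=2, freq='D')
# Rows are days, columns are hours of the day (only three hours shown).
wide = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=days, columns=[0, 1, 2])

# stack() walks the matrix row by row, giving a (day, hour) MultiIndex.
stacked = wide.stack()
day = stacked.index.get_level_values(0)
hour = stacked.index.get_level_values(1)

# Combine day and hour into a single timestamp per observation.
ts = pd.Series(stacked.values, index=day + pd.to_timedelta(hour, unit='h'))

print(ts)
```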

Resampling¶

The Resample docs.

TimeGrouping of values grouped across time

TimeGrouping #2

Using TimeGrouper and another grouping to create subgroups, then apply a custom function
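The sketch below groups by day and by a second key, then applies a custom function to each subgroup. It uses `pd.Grouper(freq=...)` in place of `TimeGrouper`, which newer pandas releases no longer provide; the data, the column names and the `spread` function are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'key': ['a', 'b', 'a', 'a', 'b', 'b'],
     'value': [1.0, 2.0, 4.0, 10.0, 3.0, 8.0]},
    index=pd.to_datetime(['2013-08-01 09:00', '2013-08-01 10:00',
                          '2013-08-01 15:00', '2013-08-02 09:00',
                          '2013-08-02 12:00', '2013-08-02 18:00']),
)

def spread(x):
    """Custom reduction: range of the values in one subgroup."""
    return x.max() - x.min()

# One subgroup per (calendar day, key) pair.
out = df.groupby([pd.Grouper(freq='D'), 'key'])['value'].agg(spread)

print(out)
```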

Resampling with custom periods

Resample intraday frame without adding new days

Resample minute data

Resample with groupby

Merge¶

The Concat docs. The Join docs.

emulate R rbind
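Row-binding two frames with the same columns can be sketched with `pd.concat`. The frames `top` and `bottom` are invented for the example.

```python
import pandas as pd

top = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})
bottom = pd.DataFrame({'x': [3], 'y': ['c']})

# ignore_index=True renumbers the rows 0..n-1, as rbind would.
stacked = pd.concat([top, bottom], ignore_index=True)

print(stacked)
```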

Self Join

How to set the index and join

KDB like asof join
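In pandas versions that provide `pd.merge_asof`, an as-of join can be sketched as follows. This is not necessarily the approach in the linked recipe, and the `quotes` and `trades` frames are invented. Both inputs must be sorted on the join key.

```python
import pandas as pd

quotes = pd.DataFrame({
    'time': pd.to_datetime(['2013-08-01 09:00:00', '2013-08-01 09:00:05',
                            '2013-08-01 09:00:10']),
    'bid': [100.0, 101.0, 102.0],
})
trades = pd.DataFrame({
    'time': pd.to_datetime(['2013-08-01 09:00:03', '2013-08-01 09:00:10',
                            '2013-08-01 09:00:20']),
    'qty': [5, 7, 9],
})

# For each trade, pick the most recent quote at or before the trade time.
joined = pd.merge_asof(trades, quotes, on='time')

print(joined)
```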

Join with a criteria based on the values

Plotting¶

The Plotting docs.

Make Matplotlib look like R

Setting x-axis major and minor labels

Plotting multiple charts in an ipython notebook

Creating a multi-line plot

Plotting a heatmap

Annotate a time-series plot

Annotate a time-series plot #2

Data In/Out¶

Performance comparison of SQL vs HDF5

CSV¶

The CSV docs

read_csv in action

appending to a csv

Reading a csv chunk-by-chunk

Reading only certain rows of a csv chunk-by-chunk
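Chunked reading with a per-chunk filter can be sketched like this. An in-memory buffer stands in for a file on disk, and the column names and filter condition are assumptions for the example.

```python
import io
import pandas as pd

# Ten rows of illustrative data: id and its square.
csv_text = 'id,value\n' + ''.join('%d,%d\n' % (i, i * i) for i in range(10))

pieces = []
# chunksize makes read_csv yield DataFrames of at most 4 rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Keep only the rows of interest before moving to the next chunk.
    pieces.append(chunk[chunk['value'] % 2 == 0])

kept = pd.concat(pieces)

print(kept)
```

Only the filtered rows are held in memory at the end, which is the point of filtering chunk by chunk.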

Reading the first few lines of a frame

Reading a file that is compressed, but not by gzip/bz2 (the native compressed formats which read_csv understands). This example shows a WinZipped file, but it is a general application of opening the file within a context manager and using that handle to read. See here
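The context-manager pattern described above might look like the sketch below. The archive is built in memory so the example is self-contained, and the member name `data.csv` is hypothetical.

```python
import io
import zipfile
import pandas as pd

# Build a small zip archive in memory to stand in for a WinZipped file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('data.csv', 'a,b\n1,2\n3,4\n')
buf.seek(0)

# Open the archive, open the member, and hand that handle to read_csv.
with zipfile.ZipFile(buf) as zf:
    with zf.open('data.csv') as handle:
        frame = pd.read_csv(handle)

print(frame)
```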

Inferring dtypes from a file

Dealing with bad lines

Dealing with bad lines II

Reading CSV with Unix timestamps and converting to local timezone
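A minimal sketch of the conversion, assuming the timestamps are whole seconds since the epoch. The column names and the target zone `Europe/Berlin` are chosen only for illustration.

```python
import io
import pandas as pd

csv_text = 'ts,value\n1375315200,1.5\n1375318800,2.5\n'
df = pd.read_csv(io.StringIO(csv_text))

# Interpret the integers as seconds since the epoch in UTC,
# then convert to the desired local time zone.
df['ts'] = pd.to_datetime(df['ts'], unit='s', utc=True).dt.tz_convert('Europe/Berlin')

print(df)
```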

Write a multi-row index CSV without writing duplicates

SQL¶

The SQL docs

Reading from databases with SQL
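Reading a query result into a frame can be sketched with an in-memory SQLite database. The table name `items` and its contents are invented for the example.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')

# Write a small table so there is something to query.
pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']}).to_sql(
    'items', conn, index=False)

# Parameters are passed separately from the SQL text.
out = pd.read_sql_query('SELECT id, name FROM items WHERE id > ?',
                        conn, params=(1,))
conn.close()

print(out)
```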

Excel¶

The Excel docs

Reading from a filelike handle

Reading HTML tables from a server that cannot handle the default request header

HDFStore¶

The HDFStores docs

Simple Queries with a Timestamp Index

Managing heterogeneous data using a linked multiple table hierarchy

Merging on-disk tables with millions of rows

Deduplicating a large store by chunks, essentially a recursive reduction operation. Shows a function for taking in data from a csv file and creating a store by chunks, with date parsing as well. See here

Creating a store chunk-by-chunk from a csv file

Appending to a store, while creating a unique index

Large Data work flows

Reading in a sequence of files, then providing a global unique index to a store while appending

Groupby on a HDFStore

Counting with a HDFStore

Troubleshoot HDFStore exceptions

Setting min_itemsize with strings

Using ptrepack to create a completely-sorted-index on a store

Storing Attributes to a group node

In [5]: df = pd.DataFrame(np.random.randn(8,3))

In [6]: store = pd.HDFStore('test.h5')

In [7]: store.put('df',df)

# you can store an arbitrary Python object via pickle
In [8]: store.get_storer('df').attrs.my_attribute = dict(A = 10)

In [9]: store.get_storer('df').attrs.my_attribute
{'A': 10}

Computation¶

Numerical integration (sample-based) of a time series
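One sample-based approach is the trapezoidal rule over irregularly spaced timestamps, sketched below. The data is invented, and integrating over elapsed seconds is an assumption about the desired units.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [0.0, 1.0, 2.0, 3.0],
    index=pd.to_datetime(['2013-08-01 00:00:00', '2013-08-01 00:00:01',
                          '2013-08-01 00:00:03', '2013-08-01 00:00:06']),
)

# Elapsed time of each sample in seconds, measured from the first sample.
seconds = np.asarray((s.index - s.index[0]).total_seconds())

# Trapezoid rule: interval width times the mean of the two endpoint values.
widths = np.diff(seconds)
heights = (s.values[1:] + s.values[:-1]) / 2.0
area = float(np.sum(widths * heights))

print(area)
```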

Miscellaneous¶

The Timedeltas docs.

Operating with timedeltas

Create timedeltas with date differences

Adding days to dates in a dataframe
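Adding a per-row number of days to a date column can be sketched with `pd.to_timedelta`. The column names `start`, `days` and `end` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'start': pd.to_datetime(['2013-08-01', '2013-08-30']),
    'days': [1, 5],
})

# Convert the integer day counts to timedeltas, then add them element-wise.
df['end'] = df['start'] + pd.to_timedelta(df['days'], unit='D')

print(df)
```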

Aliasing Axis Names¶

To globally provide aliases for axis names, one can define these two functions:

In [10]: def set_axis_alias(cls, axis, alias):
   ....:      if axis not in cls._AXIS_NUMBERS:
   ....:          raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
   ....:      cls._AXIS_ALIASES[alias] = axis
   ....: 

In [11]: def clear_axis_alias(cls, axis, alias):
   ....:      if axis not in cls._AXIS_NUMBERS:
   ....:          raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
   ....:      cls._AXIS_ALIASES.pop(alias,None)
   ....: 

In [12]: set_axis_alias(pd.DataFrame,'columns', 'myaxis2')

In [13]: df2 = pd.DataFrame(np.random.randn(3,2),columns=['c1','c2'],index=['i1','i2','i3'])

In [14]: df2.sum(axis='myaxis2')

i1   -0.499427
i2    0.966720
i3    0.174175
dtype: float64

In [15]: clear_axis_alias(pd.DataFrame,'columns', 'myaxis2')
