Cookbook

This is a repository of short and sweet examples and links for useful pandas recipes. We encourage users to add to this documentation.

Adding interesting links, or putting short code inline for existing links, makes a great first pull request.

Data In/Out

HDFStore

The HDFStore docs

Simple Queries with a Timestamp Index

Managing heterogeneous data using a linked multiple table hierarchy

Merging on-disk tables with millions of rows

Deduplicating a large store by chunks, essentially a recursive reduction operation. Shows a function for taking in data from a CSV file and creating a store by chunks, with date parsing as well.

Large Data work flows

Groupby on a HDFStore

Troubleshoot HDFStore exceptions

Setting min_itemsize with strings

Storing Attributes to a group node

# assumes: import numpy as np; from pandas import DataFrame, HDFStore
In [440]: df = DataFrame(np.random.randn(8, 3))

In [441]: store = HDFStore('test.h5')

In [442]: store.put('df',df)

# you can store an arbitrary Python object via pickle
In [443]: store.get_storer('df').attrs.my_attribute = dict(A = 10)

In [444]: store.get_storer('df').attrs.my_attribute
Out[444]: {'A': 10}
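
The session above sets and reads the attribute on an open store. As a minimal sketch of the same idea, the hypothetical helper below (`roundtrip_attribute` is not part of pandas) writes the attribute, closes the file, and reads it back from a fresh handle. It assumes PyTables is installed and that `path` points to a writable location.

```python
import numpy as np
import pandas as pd


def roundtrip_attribute(path, value):
    """Store `value` on the 'df' group node, reopen the file, and return it.

    Hypothetical helper for illustration; requires PyTables.
    """
    df = pd.DataFrame(np.random.randn(8, 3))

    # Write the frame, then attach the attribute to its group node.
    with pd.HDFStore(path, mode="w") as store:
        store.put("df", df)
        store.get_storer("df").attrs.my_attribute = value

    # Reopen read-only and fetch the attribute from the same node.
    with pd.HDFStore(path, mode="r") as store:
        return store.get_storer("df").attrs.my_attribute
```

Calling `roundtrip_attribute('test.h5', dict(A=10))` should hand back an equal dictionary, since the attribute is kept in the file alongside the node rather than on the in-memory store object.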

Aliasing Axis Names

To provide aliases for axis names globally, one can define these two functions:

In [445]: def set_axis_alias(cls, axis, alias):
   .....:      if axis not in cls._AXIS_NUMBERS:
   .....:          raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
   .....:      cls._AXIS_ALIASES[alias] = axis
   .....:

In [446]: def clear_axis_alias(cls, axis, alias):
   .....:      if axis not in cls._AXIS_NUMBERS:
   .....:          raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
   .....:      cls._AXIS_ALIASES.pop(alias,None)
   .....:

In [447]: set_axis_alias(DataFrame,'columns', 'myaxis2')

In [448]: df2 = DataFrame(np.random.randn(3,2),columns=['c1','c2'],index=['i1','i2','i3'])

In [449]: df2.sum(axis='myaxis2')
Out[449]: 
i1    0.981751
i2   -2.754270
i3   -1.528539
dtype: float64

In [450]: clear_axis_alias(DataFrame,'columns', 'myaxis2')
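
The functions above work through the private class attributes `_AXIS_NUMBERS` and `_AXIS_ALIASES`, which may not be available in every pandas version. A minimal sketch of the same idea using only public API is shown here. The module-level registry `AXIS_ALIASES` and the helper `resolve_axis` are hypothetical names introduced for illustration, and the alias is translated explicitly at the call site rather than being understood by `DataFrame` itself.

```python
import pandas as pd

# Hypothetical registry: user-defined alias -> real axis name.
AXIS_ALIASES = {}
VALID_AXES = ("index", "columns")


def set_axis_alias(axis, alias):
    """Register `alias` as another name for `axis`."""
    if axis not in VALID_AXES:
        raise ValueError("invalid axis [%s] for alias [%s]" % (axis, alias))
    AXIS_ALIASES[alias] = axis


def clear_axis_alias(alias):
    """Forget `alias`; do nothing if it was never registered."""
    AXIS_ALIASES.pop(alias, None)


def resolve_axis(axis):
    """Return the real axis name for an alias, or `axis` unchanged."""
    return AXIS_ALIASES.get(axis, axis)


set_axis_alias("columns", "myaxis2")

df2 = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    columns=["c1", "c2"],
    index=["i1", "i2", "i3"],
)

# Summing along 'columns' collapses the columns, giving one value per row.
row_sums = df2.sum(axis=resolve_axis("myaxis2"))

clear_axis_alias("myaxis2")
```

The trade-off is that every call has to go through `resolve_axis`, but nothing on the `DataFrame` class is modified.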