Cookbook

This is a repository for short and sweet examples and links to useful pandas recipes. We encourage users to add to this documentation.

Contributing here makes a great first pull request: add interesting links and/or put short code inline for existing links.

Selection

The indexing docs.

Indexing using both row labels and conditionals, see here
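A minimal sketch on a small made-up frame: `.loc` accepts a boolean row mask together with a column label, so rows can be filtered by a condition and a specific column selected or assigned in a single step.

```python
import pandas as pd

# Small illustrative frame (the values are made up).
df = pd.DataFrame({"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40]})

# Select column BBB only for the rows where AAA >= 5.
selected = df.loc[df["AAA"] >= 5, "BBB"]

# The same row/column combination can also be assigned to in place.
df.loc[df["AAA"] >= 5, "BBB"] = -1
```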

Use loc for label-oriented slicing and iloc for positional slicing, see here
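A short sketch of the difference, assuming an integer index whose labels deliberately do not match the positions: `.loc` slices by label and includes both endpoints, while `.iloc` slices by position and excludes the stop, like a Python list.

```python
import pandas as pd

# Integer labels that differ from positions make the two behaviours visible.
s = pd.Series(["a", "b", "c", "d", "e"], index=[10, 20, 30, 40, 50])

# Label-based: includes both the start label 20 and the stop label 40.
by_label = s.loc[20:40]

# Position-based: positions 1 and 2; the stop position 3 is excluded.
by_position = s.iloc[1:3]
```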

Extend a panel frame by transposing, adding a new dimension, and transposing back to the original dimensions, see here

Mask a panel by using np.where and then reconstructing the panel with the new masked values, see here

Using ~ to take the complement of a boolean array, see here
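A minimal illustration with made-up data: `~` inverts a boolean Series element-wise, so the same mask can select a subset of rows and its complement. (Python's `not` does not work here, because it tries to reduce the whole Series to a single truth value.)

```python
import pandas as pd

df = pd.DataFrame({"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40]})

mask = df["AAA"] > 5     # boolean Series
kept = df[mask]          # rows where AAA > 5
dropped = df[~mask]      # ~ flips each value: rows where AAA <= 5
```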

Efficiently creating columns using applymap
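A sketch assuming a small frame of integer codes and a hypothetical `categories` lookup dict: an element-wise function turns every code into a label, and the results are stored under new column names. Because `DataFrame.applymap` was renamed `DataFrame.map` in pandas 2.1, the code picks whichever the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({"AAA": [1, 2, 1, 3], "BBB": [1, 1, 2, 2], "CCC": [2, 1, 3, 1]})

source_cols = list(df.columns)
new_cols = [str(col) + "_cat" for col in source_cols]
categories = {1: "Alpha", 2: "Beta", 3: "Charlie"}  # hypothetical code-to-label mapping

# Element-wise application: applymap in older pandas, DataFrame.map from 2.1 on.
elementwise = getattr(pd.DataFrame, "map", None) or pd.DataFrame.applymap

# Assigning a frame to a list of new column names fills them positionally.
df[new_cols] = elementwise(df[source_cols], categories.get)
```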

Missing Data

The missing data docs.

Data In/Out

Performance comparison of SQL vs HDF5

CSV

The CSV docs

read_csv in action
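A minimal example, using an in-memory string as a stand-in for a file on disk: `parse_dates` converts the named column to datetimes, and `index_col` promotes it to the index.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file (the data is made up).
csv_text = """date,ticker,price
2013-01-02,AAA,10.5
2013-01-03,AAA,10.7
2013-01-02,BBB,20.1
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")
```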

Appending to a CSV
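A sketch writing to a temporary file (the path is only for illustration): the header is written once, and later frames are appended with `mode="a"` and `header=False` so the header is not repeated.

```python
import os
import tempfile

import pandas as pd

first = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
second = pd.DataFrame({"a": [5], "b": [6]})

path = os.path.join(tempfile.mkdtemp(), "data.csv")

# Write the header once, then append more rows without repeating it.
first.to_csv(path, index=False)
second.to_csv(path, mode="a", header=False, index=False)

combined = pd.read_csv(path)
```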

Reading a CSV chunk by chunk
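A minimal sketch on generated data: passing `chunksize` makes `read_csv` yield a sequence of smaller DataFrames, so an aggregate can be built up without holding the whole file in memory at once.

```python
import io

import pandas as pd

# Ten made-up rows: x and x squared.
csv_text = "x,y\n" + "\n".join(f"{i},{i * i}" for i in range(10))

total = 0
n_chunks = 0
# Each chunk is an ordinary DataFrame of at most 4 rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["y"].sum()
    n_chunks += 1
```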

Reading the first few lines of a frame
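A short example on generated data: `nrows` stops parsing after the given number of data rows (the header line is not counted), which is a cheap way to peek at a large file.

```python
import io

import pandas as pd

csv_text = "x,y\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))

# Only the first five data rows are parsed.
head = pd.read_csv(io.StringIO(csv_text), nrows=5)
```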

Reading a file that is compressed, but not with gzip/bz2 (the native compressed formats that read_csv understands). This example shows a WinZipped file, but the approach applies generally: open the file within a context manager and pass that handle to read_csv. See here
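A self-contained sketch of the context-manager approach, using a zip archive built in memory as an assumed stand-in for the WinZipped file: the archive member is opened as a binary handle, wrapped as text, and handed to `read_csv`.

```python
import io
import zipfile

import pandas as pd

# Build a small zip archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "a,b\n1,2\n3,4\n")
buffer.seek(0)

# Open the member inside context managers and pass a text handle to read_csv.
with zipfile.ZipFile(buffer) as zf:
    with zf.open("data.csv") as raw:
        df = pd.read_csv(io.TextIOWrapper(raw, encoding="utf-8"))
```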

Inferring dtypes from a file
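A small illustration on made-up data: `read_csv` infers a dtype per column, which can be inspected via `df.dtypes`, and the `dtype` argument overrides the inference for specific columns when it guesses wrong (for example, IDs that should stay strings).

```python
import io

import pandas as pd

csv_text = "id,score,label,flag\n1,0.5,a,True\n2,1.5,b,False\n"

df = pd.read_csv(io.StringIO(csv_text))
inferred = df.dtypes  # id: int64, score: float64, label: text, flag: bool

# Override inference for one column.
df_fixed = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})
```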

Dealing with bad lines

Dealing with bad lines II

Reading CSV with Unix timestamps and converting to local timezone
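A minimal sketch, assuming the timestamps are whole seconds since the epoch: parse them as UTC with `pd.to_datetime(..., unit="s", utc=True)`, then convert with `.dt.tz_convert`. The zone `America/New_York` is only an example choice.

```python
import io

import pandas as pd

csv_text = "ts,value\n0,1.0\n3600,2.0\n"

df = pd.read_csv(io.StringIO(csv_text))

# Seconds since the epoch -> UTC-aware datetimes -> local time zone.
df["ts"] = pd.to_datetime(df["ts"], unit="s", utc=True).dt.tz_convert("America/New_York")
```

The epoch (1970-01-01 00:00 UTC) becomes 1969-12-31 19:00 in New York, since the offset in January is UTC-5.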

Write a multi-row index CSV without writing duplicates

HDFStore

The HDFStore docs

Simple Queries with a Timestamp Index

Managing heterogeneous data using a linked multiple table hierarchy

Merging on-disk tables with millions of rows

Deduplicating a large store by chunks, essentially a recursive reduction operation. It also shows a function that reads data from a CSV file and creates a store chunk by chunk, with date parsing. See here
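The reduction idea can be sketched in memory, with `read_csv` chunks standing in for chunks selected from a store (an assumption; the linked recipe works against an HDFStore): each chunk is deduplicated, then folded into a running result that is deduplicated again. `dedupe_pair` is a hypothetical helper name.

```python
import io
from functools import reduce

import pandas as pd

# Made-up data with many repeated rows: only 3 distinct (key, value) pairs.
csv_text = "key,value\n" + "\n".join(f"{i % 3},{i % 3 * 10}" for i in range(9))


def dedupe_pair(acc, chunk):
    # Hypothetical helper: combine the running result with the next chunk,
    # then drop duplicates again so the accumulator stays small.
    return pd.concat([acc, chunk]).drop_duplicates()


chunks = pd.read_csv(io.StringIO(csv_text), chunksize=4)
deduped = reduce(dedupe_pair, (c.drop_duplicates() for c in chunks))
```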

Large data workflows

Reading in a sequence of files, then providing a global unique index to a store while appending
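An in-memory sketch of the indexing part, assuming each file is read whole: a running counter offsets each new frame's index so the labels stay globally unique across files. Here the pieces are collected in a list; with an HDFStore, each frame would instead be passed to `store.append`.

```python
import io

import pandas as pd

# Stand-ins for a sequence of CSV files on disk (the data is made up).
files = [io.StringIO("a\n1\n2\n"), io.StringIO("a\n3\n4\n5\n")]

pieces = []
next_id = 0
for f in files:
    df = pd.read_csv(f)
    # Replace the per-file 0..n-1 index with a globally unique range.
    df.index = pd.RangeIndex(next_id, next_id + len(df))
    next_id += len(df)
    pieces.append(df)

combined = pd.concat(pieces)
```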

Groupby on an HDFStore

Troubleshoot HDFStore exceptions

Setting min_itemsize with strings

Storing attributes on a group node

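In [1]: import numpy as np
   ...: from pandas import DataFrame, HDFStore  # HDFStore needs the PyTables package
   ...: df = DataFrame(np.random.randn(8, 3))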

In [2]: store = HDFStore('test.h5')

In [3]: store.put('df',df)

# you can store an arbitrary python object via pickle
In [4]: store.get_storer('df').attrs.my_attribute = dict(A = 10)

In [5]: store.get_storer('df').attrs.my_attribute
{'A': 10}

Aliasing Axis Names

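To provide aliases for axis names globally, one can define these two functions. Note that they modify the private class attributes _AXIS_NUMBERS and _AXIS_ALIASES, which are pandas internals and may change between versions: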

In [6]: from numpy.random import randn
   ...: from pandas import DataFrame
   ...: 
   ...: def set_axis_alias(cls, axis, alias):
   ...:     if axis not in cls._AXIS_NUMBERS:
   ...:         raise ValueError("invalid axis [%s] for alias [%s]" % (axis, alias))
   ...:     cls._AXIS_ALIASES[alias] = axis
   ...: 

In [7]: def clear_axis_alias(cls, axis, alias):
   ...:     if axis not in cls._AXIS_NUMBERS:
   ...:         raise ValueError("invalid axis [%s] for alias [%s]" % (axis, alias))
   ...:     cls._AXIS_ALIASES.pop(alias, None)
   ...: 

In [8]: set_axis_alias(DataFrame,'columns', 'myaxis2')

In [9]: df2 = DataFrame(randn(3,2),columns=['c1','c2'],index=['i1','i2','i3'])

In [10]: df2.sum(axis='myaxis2')

i1    0.981751
i2   -2.754270
i3   -1.528539
dtype: float64

In [11]: clear_axis_alias(DataFrame,'columns', 'myaxis2')