Cookbook¶
This is a repository of short and sweet examples and links for useful pandas recipes. We encourage users to add to this documentation.
This is a great First Pull Request (to add interesting links and/or put short code inline for existing links).
Selection¶
The indexing docs.
Using loc and iloc in selections
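As a minimal sketch of the difference between the two indexers (the frame, its labels, and the variable names below are illustrative assumptions, not taken from the linked recipe): loc selects by label, iloc selects by integer position.

```python
import pandas as pd

# Hypothetical example data with string row labels
df = pd.DataFrame(
    {"A": [10, 20, 30, 40], "B": [1.5, 2.5, 3.5, 4.5]},
    index=["w", "x", "y", "z"],
)

# loc selects by label; a label slice includes both endpoints
by_label = df.loc["x":"y", "A"]

# iloc selects by integer position; the stop position is excluded
by_position = df.iloc[1:3, 0]

# loc also accepts a boolean mask for the rows
large = df.loc[df["A"] > 20, "B"]
```

Here the label slice "x":"y" and the positional slice 1:3 pick the same two rows, because label slices are inclusive while positional slices are half-open.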
MultiIndexing¶
The MultiIndexing docs.
Grouping¶
The grouping docs.
Apply to different items in a group
Replacing values with groupby means
Sort by group with aggregation
Create multiple aggregated columns
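The recipes linked above can be approximated with a few groupby calls. This is a sketch under assumptions: the column names key and val are hypothetical, and "replacing values" is shown here as filling missing values, which may differ from what the linked answers do.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: two groups with some missing values
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b", "b"],
        "val": [1.0, np.nan, 4.0, np.nan, 8.0],
    }
)

# Replace missing values with the mean of their own group;
# transform returns a result aligned with the original rows
group_mean = df.groupby("key")["val"].transform("mean")
filled = df["val"].fillna(group_mean)

# Create multiple aggregated columns in one pass
summary = df.groupby("key")["val"].agg(["mean", "max", "count"])

# Sort the groups by an aggregate (here the group mean, descending)
ranked = summary.sort_values("mean", ascending=False)
```

transform is used for the fill because it keeps the original row order and length, whereas agg collapses each group to a single row.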
Splitting¶
Timeseries¶
Turn a matrix with hours in columns and days in rows into a continuous row sequence in the form of a time series. How to rearrange a python pandas dataframe?
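One way to do this reshaping, sketched under the assumption that the rows are indexed by date and the columns hold the integer hour of day (the data and names below are hypothetical, and the linked answer may use a different approach), is to flatten the values row by row and build a matching timestamp for each one.

```python
import numpy as np
import pandas as pd

# Hypothetical matrix: one row per day, one column per hour of day
days = pd.date_range("2013-01-01", periods=2, freq="D")
matrix = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=days,
    columns=[0, 1, 2],
)

n_days, n_hours = matrix.shape

# Flatten row by row: all hours of day 1, then all hours of day 2, ...
values = matrix.to_numpy().ravel()

# Repeat each day once per hour, and cycle the hour offsets once per day
day_part = matrix.index.repeat(n_hours)
hour_part = pd.to_timedelta(np.tile(matrix.columns.to_numpy(), n_days), unit="h")

# One timestamp per value, in the same row-major order
ts = pd.Series(values, index=day_part + hour_part)
```

The result is a single Series whose index runs through the hours of the first day, then the hours of the second day, and so on.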
Data In/Out¶
HDFStore¶
The HDFStore docs.
Simple Queries with a Timestamp Index
Managing heterogeneous data using a linked multiple table hierarchy
Merging on-disk tables with millions of rows
Deduplicating a large store by chunks, essentially a recursive reduction operation. Shows a function for taking in data from a csv file and creating a store by chunks, with date parsing as well. See here
Troubleshoot HDFStore exceptions
Setting min_itemsize with strings
Storing Attributes to a group node
# assumes: import numpy as np; from pandas import DataFrame, HDFStore
In [440]: df = DataFrame(np.random.randn(8,3))
In [441]: store = HDFStore('test.h5')
In [442]: store.put('df',df)
# you can store an arbitrary python object via pickle
In [443]: store.get_storer('df').attrs.my_attribute = dict(A = 10)
In [444]: store.get_storer('df').attrs.my_attribute
Out[444]: {'A': 10}
Miscellaneous¶
The Timedeltas docs.
Aliasing Axis Names¶
To globally provide aliases for axis names, one can define these 2 functions:
In [445]: def set_axis_alias(cls, axis, alias):
   .....:     if axis not in cls._AXIS_NUMBERS:
   .....:         raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
   .....:     cls._AXIS_ALIASES[alias] = axis
   .....:
In [446]: def clear_axis_alias(cls, axis, alias):
   .....:     if axis not in cls._AXIS_NUMBERS:
   .....:         raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
   .....:     cls._AXIS_ALIASES.pop(alias,None)
   .....:
In [447]: set_axis_alias(DataFrame,'columns', 'myaxis2')
In [448]: df2 = DataFrame(np.random.randn(3,2),columns=['c1','c2'],index=['i1','i2','i3'])
In [449]: df2.sum(axis='myaxis2')
Out[449]:
i1 0.981751
i2 -2.754270
i3 -1.528539
dtype: float64
In [450]: clear_axis_alias(DataFrame,'columns', 'myaxis2')