Cookbook¶
This is a respository for short and sweet examples and links for useful pandas recipes. We encourage users to add to this documentation.
This is a great First Pull Request (to add interesting links and/or put short code inline for existing links)
Idioms¶
These are some neat pandas idioms
How to split a frame with a boolean criterion?
Selection¶
The indexing docs.
Indexing using both row labels and conditionals, see here
Use loc for label-oriented slicing and iloc positional slicing, see here
Extend a panel frame by transposing, adding a new dimension, and transposing back to the original dimensions, see here
Mask a panel by using np.where and then reconstructing the panel with the new masked values here
Using ~ to take the complement of a boolean array, see here
MultiIndexing¶
The multindexing docs.
Grouping¶
The grouping docs.
Apply to different items in a group
Replacing values with groupby means
Sort by group with aggregation
Create multiple aggregated columns
Create a value counts column and reassign back to the DataFrame
Splitting¶
Timeseries¶
Turn a matrix with hours in columns and days in rows into a continous row sequence in the form of a time series. How to rearrange a python pandas dataframe?
Plotting¶
The Plotting docs.
Setting x-axis major and minor labels
Data In/Out¶
Performance comparison of SQL vs HDF5
CSV¶
The CSV docs
Reading the first few lines of a frame
Reading a file that is compressed but not by gzip/bz2 (the native compresed formats which read_csv understands). This example shows a WinZipped file, but is a general application of opening the file within a context manager and using that handle to read. See here
Reading CSV with Unix timestamps and converting to local timezone
HDFStore¶
The HDFStores docs
Simple Queries with a Timestamp Index
Managing heteregenous data using a linked multiple table hierarchy
Merging on-disk tables with millions of rows
Deduplicating a large store by chunks, essentially a recusive reduction operation. Shows a function for taking in data from csv file and creating a store by chunks, with date parsing as well. See here
Reading in a sequence of files, then providing a global unique index to a store while appending
Troubleshoot HDFStore exceptions
Setting min_itemsize with strings
Storing Attributes to a group node
In [1]: df = DataFrame(np.random.randn(8,3))
In [2]: store = HDFStore('test.h5')
In [3]: store.put('df',df)
# you can store an arbitrary python object via pickle
In [4]: store.get_storer('df').attrs.my_attribute = dict(A = 10)
In [5]: store.get_storer('df').attrs.my_attribute
{'A': 10}
Aliasing Axis Names¶
To globally provide aliases for axis names, one can define these 2 functions:
In [6]: def set_axis_alias(cls, axis, alias):
...: if axis not in cls._AXIS_NUMBERS:
...: raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
...: cls._AXIS_ALIASES[alias] = axis
...:
In [7]: def clear_axis_alias(cls, axis, alias):
...: if axis not in cls._AXIS_NUMBERS:
...: raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
...: cls._AXIS_ALIASES.pop(alias,None)
...:
In [8]: set_axis_alias(DataFrame,'columns', 'myaxis2')
In [9]: df2 = DataFrame(randn(3,2),columns=['c1','c2'],index=['i1','i2','i3'])
In [10]: df2.sum(axis='myaxis2')
i1 0.981751
i2 -2.754270
i3 -1.528539
dtype: float64
In [11]: clear_axis_alias(DataFrame,'columns', 'myaxis2')