Cookbook¶
This is a repository for short and sweet examples and links for useful pandas recipes. We encourage users to add to this documentation.
This is a great First Pull Request (to add interesting links and/or put short code inline for existing links)
Idioms¶
These are some neat pandas idioms
How to split a frame with a boolean criterion?
How to select from a frame with complex criteria?
Select rows closest to a user-defined number
How to reduce a sequence (e.g. of Series) using a binary operator
Selection¶
The indexing docs.
Indexing using both row labels and conditionals, see here
Use loc for label-oriented slicing and iloc positional slicing, see here
Extend a panel frame by transposing, adding a new dimension, and transposing back to the original dimensions, see here
Mask a panel by using np.where and then reconstructing the panel with the new masked values here
Using ~ to take the complement of a boolean array, see here
MultiIndexing¶
The multindexing docs.
Missing Data¶
The missing data docs.
Fill forward a reversed timeseries
In [1]: df = pd.DataFrame(np.random.randn(6,1), index=pd.date_range('2013-08-01', periods=6, freq='B'), columns=list('A'))
In [2]: df.ix[3,'A'] = np.nan
In [3]: df
Out[3]:
A
2013-08-01 0.469112
2013-08-02 -0.282863
2013-08-05 -1.509059
2013-08-06 NaN
2013-08-07 1.212112
2013-08-08 -0.173215
[6 rows x 1 columns]
In [4]: df.reindex(df.index[::-1]).ffill()
Out[4]:
A
2013-08-08 -0.173215
2013-08-07 1.212112
2013-08-06 1.212112
2013-08-05 -1.509059
2013-08-02 -0.282863
2013-08-01 0.469112
[6 rows x 1 columns]
Replace¶
Grouping¶
The grouping docs.
Apply to different items in a group
Replacing values with groupby means
Sort by group with aggregation
Create multiple aggregated columns
Create a value counts column and reassign back to the DataFrame
Splitting¶
Timeseries¶
Turn a matrix with hours in columns and days in rows into a continuous row sequence in the form of a time series. How to rearrange a python pandas DataFrame?
Plotting¶
The Plotting docs.
Setting x-axis major and minor labels
Plotting multiple charts in an ipython notebook
Annotate a time-series plot #2
Generate Embedded plots in excel files using Pandas, Vincent and xlsxwriter
Data In/Out¶
Performance comparison of SQL vs HDF5
CSV¶
The CSV docs
Reading only certain rows of a csv chunk-by-chunk
Reading the first few lines of a frame
Reading a file that is compressed but not by gzip/bz2 (the native compressed formats which read_csv understands). This example shows a WinZipped file, but is a general application of opening the file within a context manager and using that handle to read. See here
Reading CSV with Unix timestamps and converting to local timezone
Excel¶
The Excel docs
Reading from a filelike handle
Reading HTML tables from a server that cannot handle the default request header
HDFStore¶
The HDFStores docs
Simple Queries with a Timestamp Index
Managing heterogeneous data using a linked multiple table hierarchy
Merging on-disk tables with millions of rows
Deduplicating a large store by chunks, essentially a recursive reduction operation. Shows a function for taking in data from csv file and creating a store by chunks, with date parsing as well. See here
Creating a store chunk-by-chunk from a csv file
Appending to a store, while creating a unique index
Reading in a sequence of files, then providing a global unique index to a store while appending
Troubleshoot HDFStore exceptions
Setting min_itemsize with strings
Using ptrepack to create a completely-sorted-index on a store
Storing Attributes to a group node
In [5]: df = DataFrame(np.random.randn(8,3))
In [6]: store = HDFStore('test.h5')
In [7]: store.put('df',df)
# you can store an arbitrary python object via pickle
In [8]: store.get_storer('df').attrs.my_attribute = dict(A = 10)
In [9]: store.get_storer('df').attrs.my_attribute
Out[9]: {'A': 10}
Aliasing Axis Names¶
To globally provide aliases for axis names, one can define these 2 functions:
In [10]: def set_axis_alias(cls, axis, alias):
....: if axis not in cls._AXIS_NUMBERS:
....: raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
....: cls._AXIS_ALIASES[alias] = axis
....: