Cookbook¶

Turn a matrix with hours in columns and days in rows into a continuous row sequence in the form of a time series. How to rearrange a Python pandas DataFrame?
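A minimal sketch of the reshaping described above, assuming integer hour labels in the columns and one row per day. The frame `wide` and its contents are made up for illustration, and the linked recipe may take a different route.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 3 days in rows, 24 hourly readings in columns.
days = pd.date_range("2013-01-01", periods=3, freq="D")
wide = pd.DataFrame(np.arange(72).reshape(3, 24), index=days, columns=range(24))

# Build one timestamp per (day, hour) cell, in row-major order.
index = pd.DatetimeIndex(
    [day + pd.Timedelta(hours=hour) for day in wide.index for hour in wide.columns]
)

# Flatten the values in the same row-major order to get a single series.
series = pd.Series(wide.to_numpy().ravel(), index=index)
```

Because the values are flattened row by row, each day's 24 readings stay together and the resulting index is already in chronological order.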

Resampling¶

The Resample docs.

TimeGrouping of values grouped across time

TimeGrouping #2

Using TimeGrouper and another grouping to create subgroups, then apply a custom function
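The recipe title above names TimeGrouper; in later pandas releases the same time-based grouping is written `pd.Grouper(freq=...)`, which is what this sketch assumes. The column names and the `price_range` function are hypothetical.

```python
import pandas as pd

# Hypothetical trades: a timestamp column, a symbol and a price.
df = pd.DataFrame(
    {
        "when": pd.to_datetime(
            [
                "2013-01-01 09:00",
                "2013-01-01 15:00",
                "2013-01-02 10:00",
                "2013-01-02 11:00",
            ]
        ),
        "symbol": ["A", "A", "A", "B"],
        "price": [10.0, 14.0, 20.0, 5.0],
    }
)


def price_range(values):
    """Custom aggregation: spread between highest and lowest price."""
    return values.max() - values.min()


# Group by calendar day and by symbol, then apply the custom function
# to each (day, symbol) subgroup.
result = df.groupby([pd.Grouper(key="when", freq="D"), "symbol"])["price"].agg(
    price_range
)
```

The result is indexed by a (day, symbol) MultiIndex, so a single subgroup can be looked up with `result.loc[(pd.Timestamp("2013-01-01"), "A")]`.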

Resampling with custom periods

Resample intraday frame without adding new days

Resample minute data
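As a rough illustration of resampling minute data, the sketch below downsamples ten one-minute observations to five-minute means. It assumes a pandas version in which `resample` returns an object that is then aggregated; the data are invented.

```python
import pandas as pd

# Ten one-minute observations starting at 09:00.
idx = pd.date_range("2013-01-01 09:00", periods=10, freq="min")
prices = pd.Series(range(10), index=idx, dtype=float)

# Downsample to 5-minute bins and take the mean of each bin.
five_min = prices.resample("5min").mean()
```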

Merge¶

The Concat docs. The Join docs.

emulate R rbind
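One common way to get rbind-like behaviour is `pd.concat` with `ignore_index=True`, sketched here on two made-up frames; the linked recipe may differ in detail.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
df2 = pd.DataFrame({"a": [3], "b": ["z"]})

# Stack the rows of both frames and renumber the index from 0.
stacked = pd.concat([df1, df2], ignore_index=True)
```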

Self Join

How to set the index and join

KDB like asof join
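In more recent pandas versions an asof join can be expressed with `pd.merge_asof`, which matches each left row to the last right row whose key is less than or equal to it. The sketch below assumes that function is available; the `trades` and `quotes` frames are hypothetical, and the linked recipe may use an older approach.

```python
import pandas as pd

trades = pd.DataFrame(
    {
        "time": pd.to_datetime(["2013-01-01 09:00:01", "2013-01-01 09:00:05"]),
        "qty": [100, 200],
    }
)
quotes = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2013-01-01 09:00:00", "2013-01-01 09:00:03", "2013-01-01 09:00:06"]
        ),
        "bid": [10.0, 10.1, 10.2],
    }
)

# Both frames must be sorted by the key. Each trade picks up the most
# recent quote at or before its own timestamp.
joined = pd.merge_asof(trades, quotes, on="time")
```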

Join with a criteria based on the values

Plotting¶

The Plotting docs.

Make Matplotlib look like R

Setting x-axis major and minor labels

Plotting multiple charts in an ipython notebook

Creating a multi-line plot

Plotting a heatmap

Annotate a time-series plot

Data In/Out¶

Performance comparison of SQL vs HDF5

CSV¶

The CSV docs

read_csv in action

appending to a csv

Reading a csv chunk-by-chunk
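A small sketch of chunked reading, using an in-memory CSV in place of a real file. Passing `chunksize` makes `read_csv` return an iterator of DataFrames, so only one chunk is held in memory at a time.

```python
import io

import pandas as pd

# In-memory stand-in for a large file: one column "x" with values 0..9.
csv_text = "x\n" + "\n".join(str(i) for i in range(10)) + "\n"

total = 0
n_chunks = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Each chunk is an ordinary DataFrame with at most 4 rows.
    total += int(chunk["x"].sum())
    n_chunks += 1
```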

Reading the first few lines of a frame

Reading a file that is compressed, but not by gzip/bz2 (the native compressed formats which read_csv understands). This example shows a WinZipped file, but is a general application of opening the file within a context manager and using that handle to read. See here
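The context-manager pattern described above might look like the following sketch. It builds a small zip archive in memory so that it runs on its own; the member name `data.csv` is an assumption, and a real archive would be opened by path instead.

```python
import io
import zipfile

import pandas as pd

# Create a tiny zip archive in memory as a stand-in for a file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", "a,b\n1,2\n3,4\n")
buffer.seek(0)

# Open the archive, then the member inside it, and hand that open
# handle to read_csv. Both handles are closed when the blocks exit.
with zipfile.ZipFile(buffer) as archive:
    with archive.open("data.csv") as handle:
        df = pd.read_csv(handle)
```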

Inferring dtypes from a file

Dealing with bad lines

Dealing with bad lines II

Reading CSV with Unix timestamps and converting to local timezone

Write a multi-row index CSV without writing duplicates

SQL¶

The SQL docs

Reading from databases with SQL
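As a self-contained illustration, the sketch below reads a query result from an in-memory SQLite database with `pd.read_sql_query`. The table and column names are invented, and other databases would need their own connection objects.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database with a small table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?)",
    [("A", 100), ("B", 200), ("A", 50)],
)
conn.commit()

# Let the database aggregate, then load the result as a DataFrame.
query = (
    "SELECT symbol, SUM(qty) AS total FROM trades "
    "GROUP BY symbol ORDER BY symbol"
)
df = pd.read_sql_query(query, conn)
conn.close()
```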

Excel¶

The Excel docs

Reading from a filelike handle

HDFStore¶

The HDFStores docs

Simple Queries with a Timestamp Index

Managing heterogeneous data using a linked multiple table hierarchy

Merging on-disk tables with millions of rows

Deduplicating a large store by chunks, essentially a recursive reduction operation. Shows a function for taking in data from a csv file and creating a store by chunks, with date parsing as well. See here

Large Data work flows

Reading in a sequence of files, then providing a global unique index to a store while appending

Groupby on a HDFStore

Troubleshoot HDFStore exceptions

Setting min_itemsize with strings

Storing Attributes to a group node

In [1]: df = DataFrame(np.random.randn(8,3))

In [2]: store = HDFStore('test.h5')

In [3]: store.put('df',df)

# you can store an arbitrary python object via pickle
In [4]: store.get_storer('df').attrs.my_attribute = dict(A = 10)

In [5]: store.get_storer('df').attrs.my_attribute
{'A': 10}

Computation¶

Numerical integration (sample-based) of a time series
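One possible reading of sample-based integration is the trapezoidal rule applied to irregularly spaced samples, with elapsed seconds as the x-axis. The sketch below assumes that interpretation and uses made-up data; the linked recipe may use a different rule or time unit.

```python
import numpy as np
import pandas as pd

# Irregularly spaced samples at 0 s, 1 s, 3 s and 6 s.
index = pd.to_datetime(
    [
        "2013-01-01 00:00:00",
        "2013-01-01 00:00:01",
        "2013-01-01 00:00:03",
        "2013-01-01 00:00:06",
    ]
)
series = pd.Series([0.0, 1.0, 3.0, 6.0], index=index)

# Elapsed time in seconds since the first sample.
seconds = np.asarray((series.index - series.index[0]).total_seconds(), dtype=float)
values = series.to_numpy()

# Trapezoidal rule: interval width times the mean of the two endpoints.
integral = float(np.sum(np.diff(seconds) * (values[1:] + values[:-1]) / 2.0))
```

Here the values equal the elapsed seconds, so the trapezoids reproduce the exact area under a straight line from 0 to 6, which is 18.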

Miscellaneous¶

The Timedeltas docs.

Operating with timedeltas

Create timedeltas with date differences

Adding days to dates in a dataframe
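A short sketch of adding a per-row number of days to a date column, assuming the offsets are stored as integers. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "start": pd.to_datetime(["2013-01-01", "2013-01-30"]),
        "days": [1, 3],
    }
)

# Convert the integer day counts to timedeltas, then add them row by row.
df["end"] = df["start"] + pd.to_timedelta(df["days"], unit="D")
```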

Aliasing Axis Names¶

To globally provide aliases for axis names, one can define these 2 functions:

In [6]: def set_axis_alias(cls, axis, alias):
   ...:      if axis not in cls._AXIS_NUMBERS:
   ...:          raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
   ...:      cls._AXIS_ALIASES[alias] = axis
   ...: 

In [7]: def clear_axis_alias(cls, axis, alias):
   ...:      if axis not in cls._AXIS_NUMBERS:
   ...:          raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
   ...:      cls._AXIS_ALIASES.pop(alias,None)
   ...: 

In [8]: set_axis_alias(DataFrame,'columns', 'myaxis2')

In [9]: df2 = DataFrame(np.random.randn(3,2),columns=['c1','c2'],index=['i1','i2','i3'])

In [10]: df2.sum(axis='myaxis2')

i1    0.981751
i2   -2.754270
i3   -1.528539
dtype: float64

In [11]: clear_axis_alias(DataFrame,'columns', 'myaxis2')
