pandas 0.8.1 documentation

API Reference

General functions

Data manipulations

pivot_table(data[, values, rows, cols, ...]) Create a spreadsheet-style pivot table as a DataFrame. The levels in the
merge(left, right[, how, on, left_on, ...]) Merge DataFrame objects by performing a database-style join operation by
concat(objs[, axis, join, join_axes, ...]) Concatenate pandas objects along a particular axis with optional set logic along the other axes.
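
As a rough illustration of the call forms listed above, the following minimal sketch joins two small frames with merge and stacks them with concat. The frames, column names, and the `pd` alias are made up for the example, and the behavior shown is that of recent pandas releases; it has not been checked against 0.8.1.

```python
import pandas as pd

# Two hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Database-style inner join: only keys present in both frames survive.
joined = pd.merge(left, right, how="inner", on="key")

# Concatenate row-wise; join="inner" keeps only the columns both frames share.
stacked = pd.concat([left, right], axis=0, join="inner")
```

With how="outer" the merge would instead keep every key and fill the missing side with NaN.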

Pickling

load(path) Load pickled pandas object (or any other pickled object) from the specified
save(obj, path) Pickle (serialize) object to input file path

File IO

read_table(filepath_or_buffer[, sep, ...]) Read general delimited file into DataFrame
read_csv(filepath_or_buffer[, sep, dialect, ...]) Read CSV (comma-separated) file into DataFrame
ExcelFile.parse(sheetname[, header, ...]) Read Excel table into DataFrame
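
Because the first argument is named filepath_or_buffer, the readers should also accept an in-memory buffer. The sketch below assumes a modern Python with io.StringIO and uses invented sample data; it shows the idea rather than the exact 0.8.1 behavior.

```python
import io
import pandas as pd

# Hypothetical comma-separated content held in memory instead of on disk.
csv_text = "date,price\n2012-01-02,10.5\n2012-01-03,11.0\n"
frame = pd.read_csv(io.StringIO(csv_text))

# read_table is the general-delimiter variant; the separator is given explicitly.
tab_text = csv_text.replace(",", "\t")
same = pd.read_table(io.StringIO(tab_text), sep="\t")
```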

HDFStore: PyTables (HDF5)

HDFStore.put(key, value[, table, append, ...]) Store object in HDFStore
HDFStore.get(key) Retrieve pandas object stored in file

Standard moving window functions

rolling_count(arg, window[, freq, time_rule]) Rolling count of number of non-NaN observations inside provided window.
rolling_sum(arg, window[, min_periods, ...]) Moving sum
rolling_mean(arg, window[, min_periods, ...]) Moving mean
rolling_median(arg, window[, min_periods, ...]) O(N log(window)) implementation using skip list
rolling_var(arg, window[, min_periods, ...]) Unbiased moving variance
rolling_std(arg, window[, min_periods, ...]) Unbiased moving standard deviation
rolling_corr(arg1, arg2, window[, ...]) Moving sample correlation
rolling_cov(arg1, arg2, window[, ...]) Unbiased moving covariance
rolling_skew(arg, window[, min_periods, ...]) Unbiased moving skewness
rolling_kurt(arg, window[, min_periods, ...]) Unbiased moving kurtosis
rolling_apply(arg, window, func[, ...]) Generic moving function application
rolling_quantile(arg, window, quantile[, ...]) Moving quantile
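
To make the window and min_periods arguments concrete, here is a plain-Python sketch of what a trailing moving mean computes. The helper name moving_mean is hypothetical and this is not the library's implementation. It assumes that NaN observations are skipped and that a position yields NaN when fewer than min_periods valid observations fall inside the window.

```python
import math

def moving_mean(values, window, min_periods=None):
    """Trailing moving mean over a list of floats (illustrative helper only)."""
    if min_periods is None:
        min_periods = window  # assumption: a full window is required by default
    out = []
    for i in range(len(values)):
        # The window covers the current position and the window - 1 before it.
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        if valid and len(valid) >= min_periods:
            out.append(sum(valid) / len(valid))
        else:
            out.append(float("nan"))
    return out

# Full windows required: the first position has only one observation.
result = moving_mean([1.0, 2.0, 3.0, 4.0], window=2)

# A single valid observation is enough when min_periods=1.
relaxed = moving_mean([1.0, float("nan"), 3.0], window=2, min_periods=1)
```

The other functions in this group differ mainly in the statistic applied to each window.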

Exponentially-weighted moving window functions

ewma(arg[, com, span, min_periods, freq, ...]) Exponentially-weighted moving average
ewmstd(arg[, com, span, min_periods, bias, ...]) Exponentially-weighted moving std
ewmvar(arg[, com, span, min_periods, bias, ...]) Exponentially-weighted moving variance
ewmcorr(arg1, arg2[, com, span, ...]) Exponentially-weighted moving correlation
ewmcov(arg1, arg2[, com, span, min_periods, ...]) Exponentially-weighted moving covariance

Series

Attributes and underlying data

Axes
  • index: axis labels
Series.values Return Series as ndarray
Series.dtype Data-type of the array’s elements.
Series.isnull(obj) Replacement for numpy.isnan / -numpy.isfinite which is suitable for use on object arrays.
Series.notnull(obj) Replacement for numpy.isfinite / -numpy.isnan which is suitable for use on object arrays.

Conversion / Constructors

Series.__init__([data, index, dtype, name, copy]) One-dimensional ndarray with axis labels (including time series).
Series.astype(dtype) See numpy.ndarray.astype
Series.copy([order]) Return new Series with copy of underlying values

Indexing, iteration

Series.get(label[, default]) Returns value occupying requested label, default to specified missing value if not present.
Series.ix
Series.__iter__()
Series.iteritems([index]) Lazily iterate over (index, value) tuples

Binary operator functions

Series.add(other[, level, fill_value]) Binary operator add with support to substitute a fill_value for missing data
Series.div(other[, level, fill_value]) Binary operator divide with support to substitute a fill_value for missing data
Series.mul(other[, level, fill_value]) Binary operator multiply with support to substitute a fill_value for missing data
Series.sub(other[, level, fill_value]) Binary operator subtract with support to substitute a fill_value for missing data
Series.combine(other, func[, fill_value]) Perform elementwise binary operation on two Series using given function
Series.combine_first(other) Combine Series values, choosing the calling Series’s values
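
The effect of fill_value is easiest to see next to the plain operator. This minimal sketch uses two invented Series with partly overlapping labels and assumes the behavior of recent pandas releases.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0], index=["a", "b"])
s2 = pd.Series([10.0, 20.0], index=["b", "c"])

# Plain addition aligns on labels; a label missing on either side gives NaN.
plain = s1 + s2

# With fill_value, the missing side is treated as 0.0 before adding.
filled = s1.add(s2, fill_value=0.0)

# combine_first keeps s1's values and takes s2's only where s1 has none.
patched = s1.combine_first(s2)
```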

Function application, GroupBy

Series.apply(func[, convert_dtype]) Invoke function on values of Series. Can be ufunc or Python function
Series.map(arg) Map values of Series using input correspondence (which can be
Series.groupby([by, axis, level, as_index, ...]) Group series using mapper (dict or key function, apply given function
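
A short sketch of the three entries above, using an invented Series. It assumes that a function passed to groupby is called on each index label, as in recent pandas releases.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a1", "a2", "b1", "b2"])

# apply: call a Python function on every value.
doubled = s.apply(lambda v: v * 2)

# map: translate values through a dict correspondence.
names = s.map({1: "one", 2: "two", 3: "three", 4: "four"})

# groupby with a key function: group by the first character of each label.
totals = s.groupby(lambda label: label[0]).sum()
```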

Computations / Descriptive Stats

Series.autocorr() Lag-1 autocorrelation
Series.clip([lower, upper, out]) Trim values at input threshold(s)
Series.clip_lower(threshold) Return copy of series with values below given value truncated
Series.clip_upper(threshold) Return copy of series with values above given value truncated
Series.corr(other[, method]) Compute correlation between two Series, excluding missing values
Series.count([level]) Return number of non-NA/null observations in the Series
Series.cumprod([axis, dtype, out, skipna]) Cumulative product of values.
Series.cumsum([axis, dtype, out, skipna]) Cumulative sum of values.
Series.describe([percentile_width]) Generate various summary statistics of Series, excluding NaN
Series.diff([periods]) 1st discrete difference of object
Series.max([axis, out, skipna, level]) Return maximum of values
Series.mean([axis, dtype, out, skipna, level]) Return mean of values
Series.median([axis, dtype, out, skipna, level]) Return median of values
Series.min([axis, out, skipna, level]) Return minimum of values
Series.prod([axis, dtype, out, skipna, level]) Return product of values
Series.quantile([q]) Return value at the given quantile, a la scoreatpercentile in
Series.skew([skipna, level]) Return unbiased skewness of values
Series.std([axis, dtype, out, ddof, skipna, ...]) Return standard deviation of values
Series.sum([axis, dtype, out, skipna, level]) Return sum of values
Series.var([axis, dtype, out, ddof, skipna, ...]) Return variance of values
Series.value_counts() Returns Series containing counts of unique values. The resulting Series

Reindexing / Selection / Label manipulation

Series.align(other[, join, level, copy, ...]) Align two Series objects with the specified join method
Series.drop(labels[, axis, level]) Return new object with labels in requested axis removed
Series.reindex([index, method, level, ...]) Conform Series to new index with optional filling logic, placing
Series.reindex_like(other[, method, limit]) Reindex Series to match index of another Series, optionally with
Series.rename(mapper[, inplace]) Alter Series index using dict or function
Series.select(crit[, axis]) Return data corresponding to axis labels matching criteria
Series.take(indices[, axis]) Analogous to ndarray.take, return Series corresponding to requested
Series.truncate([before, after, copy]) Truncate a sorted DataFrame / Series before and/or after
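
As an illustration of label-based versus positional selection, the sketch below uses an invented Series and assumes the behavior of recent pandas releases.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

# reindex conforms to the requested labels; the unknown label "z" becomes NaN.
conformed = s.reindex(["c", "b", "z"])

# drop returns a new Series without the given labels.
without_a = s.drop(["a"])

# take selects by integer position rather than by label.
picked = s.take([0, 2])
```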

Missing data handling

Series.dropna() Return Series without null values
Series.fillna([value, method, inplace, limit]) Fill NA/NaN values using the specified method
Series.interpolate([method]) Interpolate missing values (after the first valid value)
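
The three approaches to missing data listed above can be compared on one invented Series. This sketch assumes the default interpolation is linear, as in recent pandas releases.

```python
import pandas as pd

s = pd.Series([1.0, float("nan"), 3.0], index=["a", "b", "c"])

dropped = s.dropna()           # remove the missing entry
zero_filled = s.fillna(0.0)    # replace it with a constant
interpolated = s.interpolate() # estimate it from its neighbours
```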

Reshaping, sorting

Series.argsort([axis, kind, order]) Overrides ndarray.argsort.
Series.order([na_last, ascending, kind]) Sorts Series object, by value, maintaining index-value link
Series.sort([axis, kind, order]) Sort values and index labels by value, in place.
Series.sort_index([ascending]) Sort object by labels (along an axis)
Series.sortlevel([level, ascending]) Sort Series with MultiIndex by chosen level. Data will be
Series.unstack([level]) Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame

Combining / joining / merging

Series.append(to_append[, verify_integrity]) Concatenate two or more Series. The indexes must not overlap

Plotting

Series.hist([ax, grid, xlabelsize, xrot, ...]) Draw histogram of the input series using matplotlib
Series.plot(series[, label, kind, ...]) Plot the input series with the index on the x-axis using matplotlib

Serialization / IO / Conversion

Series.from_csv(path[, sep, parse_dates, ...]) Read delimited file into Series
Series.load(path)
Series.save(path)
Series.to_csv(path[, index, sep, na_rep, ...]) Write Series to a comma-separated values (csv) file
Series.to_dict() Convert Series to {label -> value} dict
Series.to_sparse([kind, fill_value]) Convert Series to SparseSeries

DataFrame

Attributes and underlying data

Axes

  • index: row labels
  • columns: column labels
DataFrame.as_matrix([columns]) Convert the frame to its Numpy-array matrix representation. Columns
DataFrame.dtypes
DataFrame.get_dtype_counts()
DataFrame.values Convert the frame to its Numpy-array matrix representation. Columns
DataFrame.axes
DataFrame.ndim
DataFrame.shape

Conversion / Constructors

DataFrame.__init__([data, index, columns, ...]) Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
DataFrame.astype(dtype) Cast object to input numpy.dtype
DataFrame.copy([deep]) Make a copy of this object

Indexing, iteration

DataFrame.ix
DataFrame.insert(loc, column, value) Insert column into DataFrame at specified location. Raises Exception if
DataFrame.__iter__() Iterate over columns of the frame.
DataFrame.iteritems() Iterator over (column, series) pairs
DataFrame.pop(item) Return column and drop from frame.
DataFrame.xs(key[, axis, level, copy]) Returns a cross-section (row or column) from the DataFrame as a Series

Binary operator functions

DataFrame.add(other[, axis, level, fill_value]) Binary operator add with support to substitute a fill_value for missing data in
DataFrame.div(other[, axis, level, fill_value]) Binary operator divide with support to substitute a fill_value for missing data in
DataFrame.mul(other[, axis, level, fill_value]) Binary operator multiply with support to substitute a fill_value for missing data in
DataFrame.sub(other[, axis, level, fill_value]) Binary operator subtract with support to substitute a fill_value for missing data in
DataFrame.radd(other[, axis, level, fill_value]) Binary operator radd with support to substitute a fill_value for missing data in
DataFrame.rdiv(other[, axis, level, fill_value]) Binary operator rdivide with support to substitute a fill_value for missing data in
DataFrame.rmul(other[, axis, level, fill_value]) Binary operator rmultiply with support to substitute a fill_value for missing data in
DataFrame.rsub(other[, axis, level, fill_value]) Binary operator rsubtract with support to substitute a fill_value for missing data in
DataFrame.combine(other, func[, fill_value]) Add two DataFrame objects and do not propagate NaN values, so if for a
DataFrame.combineAdd(other) Add two DataFrame objects and do not propagate
DataFrame.combine_first(other) Combine two DataFrame objects and default to non-null values in frame
DataFrame.combineMult(other) Multiply two DataFrame objects and do not propagate NaN values, so if

Function application, GroupBy

DataFrame.apply(func[, axis, broadcast, ...]) Applies function along input axis of DataFrame. Objects passed to
DataFrame.applymap(func) Apply a function to a DataFrame that is intended to operate
DataFrame.groupby([by, axis, level, ...]) Group series using mapper (dict or key function, apply given function

Computations / Descriptive Stats

DataFrame.clip([upper, lower]) Trim values at input threshold(s)
DataFrame.clip_lower(threshold) Trim values below threshold
DataFrame.clip_upper(threshold) Trim values above threshold
DataFrame.corr([method]) Compute pairwise correlation of columns, excluding NA/null values
DataFrame.corrwith(other[, axis, drop]) Compute pairwise correlation between rows or columns of two DataFrame
DataFrame.count([axis, level, numeric_only]) Return Series with number of non-NA/null observations over requested
DataFrame.cumprod([axis, skipna]) Return cumulative product over requested axis as DataFrame
DataFrame.cumsum([axis, skipna]) Return DataFrame of cumulative sums over requested axis.
DataFrame.describe([percentile_width]) Generate various summary statistics of each column, excluding
DataFrame.diff([periods]) 1st discrete difference of object
DataFrame.mad([axis, skipna, level]) Return mean absolute deviation over requested axis.
DataFrame.max([axis, skipna, level]) Return maximum over requested axis.
DataFrame.mean([axis, skipna, level]) Return mean over requested axis.
DataFrame.median([axis, skipna, level]) Return median over requested axis.
DataFrame.min([axis, skipna, level]) Return minimum over requested axis.
DataFrame.prod([axis, skipna, level]) Return product over requested axis.
DataFrame.quantile([q, axis]) Return values at the given quantile over requested axis, a la
DataFrame.skew([axis, skipna, level]) Return unbiased skewness over requested axis.
DataFrame.sum([axis, numeric_only, skipna, ...]) Return sum over requested axis.
DataFrame.std([axis, skipna, level, ddof]) Return standard deviation over requested axis.
DataFrame.var([axis, skipna, level, ddof]) Return variance over requested axis.
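
Most of the reductions above take an axis argument. The following sketch, on an invented frame, assumes that axis=0 reduces down each column, axis=1 reduces across each row, and NaN values are skipped by default, as in recent pandas releases.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, float("nan")]})

col_sums = df.sum(axis=0)    # one value per column, NaN skipped
row_means = df.mean(axis=1)  # one value per row, NaN skipped
counts = df.count()          # non-NA observations per column
```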

Reindexing / Selection / Label manipulation

DataFrame.add_prefix(prefix) Concatenate prefix string with panel items names.
DataFrame.add_suffix(suffix) Concatenate suffix string with panel items names
DataFrame.align(other[, join, axis, level, ...]) Align two DataFrame objects on their index and columns with the
DataFrame.drop(labels[, axis, level]) Return new object with labels in requested axis removed
DataFrame.filter([items, like, regex]) Restrict frame’s columns to set of items or wildcard
DataFrame.reindex([index, columns, method, ...]) Conform DataFrame to new index with optional filling logic, placing
DataFrame.reindex_like(other[, method, ...]) Reindex DataFrame to match indices of another DataFrame, optionally
DataFrame.rename([index, columns, copy, inplace]) Alter index and / or columns using input function or functions.
DataFrame.select(crit[, axis]) Return data corresponding to axis labels matching criteria
DataFrame.take(indices[, axis]) Analogous to ndarray.take, return DataFrame corresponding to requested
DataFrame.truncate([before, after, copy]) Truncate a sorted DataFrame / Series before and/or after
DataFrame.head([n]) Returns first n rows of DataFrame
DataFrame.tail([n]) Returns last n rows of DataFrame

Missing data handling

DataFrame.dropna([axis, how, thresh, subset]) Return object with labels on given axis omitted where alternately any
DataFrame.fillna([value, method, axis, ...]) Fill NA/NaN values using the specified method

Reshaping, sorting, transposing

DataFrame.sort_index([axis, by, ascending, ...]) Sort DataFrame either by labels (along either axis) or by the values in
DataFrame.delevel(*args, **kwargs)
DataFrame.pivot([index, columns, values]) Reshape data (produce a “pivot” table) based on column values.
DataFrame.sortlevel([level, axis, ascending]) Sort multilevel index by chosen axis and primary level.
DataFrame.swaplevel(i, j[, axis]) Swap levels i and j in a MultiIndex on a particular axis
DataFrame.stack([level, dropna]) Pivot a level of the (possibly hierarchical) column labels, returning a
DataFrame.unstack([level]) Pivot a level of the (necessarily hierarchical) index labels, returning
DataFrame.T Returns a DataFrame with the rows/columns switched. If the DataFrame is
DataFrame.transpose() Returns a DataFrame with the rows/columns switched. If the DataFrame is
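
A minimal sketch of how pivot, stack, unstack, and T relate to one another, using an invented long-format frame. Column names are illustrative, and the behavior shown is that of recent pandas releases.

```python
import pandas as pd

# Long format: one row per (date, variable) observation.
long = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "variable": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# pivot: dates become the index, variables become the columns.
wide = long.pivot(index="date", columns="variable", values="value")

# stack moves the column labels into an inner index level, giving a Series.
stacked = wide.stack()

# unstack moves that inner level back out into columns.
restored = stacked.unstack()

# T swaps rows and columns.
transposed = wide.T
```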

Combining / joining / merging

DataFrame.join(other[, on, how, lsuffix, ...]) Join columns with other DataFrame either on index or on a key
DataFrame.merge(right[, how, on, left_on, ...]) Merge DataFrame objects by performing a database-style join operation by
DataFrame.append(other[, ignore_index, ...]) Append columns of other to end of this frame’s columns and index, returning a new object.
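
For index-based joins, a sketch along these lines may help. The frames are invented, and the how values are assumed to behave as in recent pandas releases.

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"y": [10, 30]}, index=["a", "c"])

# Left join on the index: keep left's labels, NaN where right has no match.
joined = left.join(right, how="left")

# Outer join: keep the union of both indexes.
outer = left.join(right, how="outer")
```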

Time series-related

DataFrame.asfreq(freq[, method, how]) Convert all TimeSeries inside to specified frequency using DateOffset
DataFrame.shift([periods, freq]) Shift the index of the DataFrame by desired number of periods with an
DataFrame.first_valid_index() Return label for first non-NA/null value
DataFrame.last_valid_index() Return label for last non-NA/null value

Plotting

DataFrame.hist(data[, grid, xlabelsize, ...]) Draw histogram of the DataFrame’s series using matplotlib / pylab.
DataFrame.plot([frame, x, y, subplots, ...]) Make line or bar plot of DataFrame’s series with the index on the x-axis

Serialization / IO / Conversion

DataFrame.from_csv(path[, header, sep, ...]) Read delimited file into DataFrame
DataFrame.from_records(data[, index, ...]) Convert structured or record ndarray to DataFrame
DataFrame.to_csv(path_or_buf[, sep, na_rep, ...]) Write DataFrame to a comma-separated values (csv) file
DataFrame.to_excel(excel_writer[, ...]) Write DataFrame to an Excel sheet
DataFrame.to_dict([outtype]) Convert DataFrame to dictionary.
DataFrame.to_records([index]) Convert DataFrame to record array. Index will be put in the
DataFrame.to_sparse([fill_value, kind]) Convert to SparseDataFrame
DataFrame.to_string([buf, columns, ...]) Render a DataFrame to a console-friendly tabular output.
DataFrame.save(path)
DataFrame.load(path)
DataFrame.info([verbose, buf]) Concise summary of a DataFrame, used in __repr__ when very large.
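
The conversion methods can be tried without touching the file system. This sketch assumes a modern Python with io.StringIO and the behavior of recent pandas releases. The frame is invented.

```python
import io
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3.5, 4.5]}, index=["a", "b"])

# to_csv accepts a path or a buffer; here the text is captured in memory.
buf = io.StringIO()
df.to_csv(buf)
text = buf.getvalue()

# Nested dict of the form {column -> {index label -> value}}.
as_dict = df.to_dict()

# Round trip through a NumPy record array, leaving the index out.
records = df.to_records(index=False)
rebuilt = pd.DataFrame.from_records(records)
```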

Panel

Computations / Descriptive Stats