API Reference
General functions
Data manipulations
| pivot_table(data[, values, rows, cols, ...]) | Create a spreadsheet-style pivot table as a DataFrame. The levels in the |
| merge(left, right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation by |
| concat(objs[, axis, join, join_axes, ...]) | Concatenate pandas objects along a particular axis with optional set logic along the other axes. |
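To make the "database-style join" behind `merge` concrete, the following is a minimal plain-Python sketch of an inner join on a single key column. It is not pandas code: the helper name `inner_join` and the list-of-dicts row layout are assumptions made only for illustration, and the `how`, `left_on` and related options listed above are not modelled.

```python
from collections import defaultdict


def inner_join(left, right, on):
    """Join two lists of row dicts on the shared key column `on` (hypothetical helper)."""
    # Index the right-hand rows by key so each left row can look up its matches.
    right_by_key = defaultdict(list)
    for row in right:
        right_by_key[row[on]].append(row)

    joined = []
    for lrow in left:
        # Left rows whose key has no partner on the right are dropped (inner join).
        for rrow in right_by_key.get(lrow[on], []):
            merged = dict(lrow)
            merged.update(rrow)
            joined.append(merged)
    return joined


left = [{"key": "a", "x": 1}, {"key": "b", "x": 2}, {"key": "c", "x": 3}]
right = [{"key": "a", "y": 10}, {"key": "b", "y": 20}, {"key": "d", "y": 40}]
result = inner_join(left, right, on="key")
print(result)
```

Only keys present on both sides ("a" and "b") survive; an outer join would instead keep "c" and "d" with missing values on the unmatched side.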
Pickling
| load(path) | Load pickled pandas object (or any other pickled object) from the specified |
| save(obj, path) | Pickle (serialize) object to input file path |
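The two helpers above are described as thin wrappers around pickling. A rough standard-library equivalent, assuming nothing beyond the `save(obj, path)` and `load(path)` signatures shown in the table, could look like the sketch below; it is not the library's own implementation, and the sample dict merely stands in for a pandas object.

```python
import os
import pickle
import tempfile


def save(obj, path):
    """Pickle (serialize) obj to the file at path."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(path):
    """Load whatever pickled object is stored at path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a pandas object; any picklable value round-trips the same way.
data = {"index": ["a", "b"], "values": [1.0, 2.0]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "obj.pickle")
    save(data, path)
    restored = load(path)

print(restored == data)
```

Because unpickling can execute arbitrary code, files from untrusted sources should not be passed to a loader like this.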
File IO
| read_table(filepath_or_buffer[, sep, ...]) | Read general delimited file into DataFrame |
| read_csv(filepath_or_buffer[, sep, header, ...]) | Read CSV (comma-separated) file into DataFrame |
| ExcelFile.parse(sheetname[, header, ...]) | Read Excel table into DataFrame |
HDFStore: PyTables (HDF5)
| HDFStore.put(key, value[, table, append, ...]) | Store object in HDFStore |
| HDFStore.get(key) | Retrieve pandas object stored in file |
Standard moving window functions
| rolling_count(arg, window[, time_rule]) | Rolling count of number of non-NaN observations inside provided window. |
| rolling_sum(arg, window[, min_periods, ...]) | Moving sum |
| rolling_mean(arg, window[, min_periods, ...]) | Moving mean |
| rolling_median(arg, window[, min_periods, ...]) | Moving median, O(N log(window)) implementation using skip list |
| rolling_var(arg, window[, min_periods, ...]) | Unbiased moving variance |
| rolling_std(arg, window[, min_periods, ...]) | Unbiased moving standard deviation |
| rolling_corr(arg1, arg2, window[, ...]) | Moving sample correlation |
| rolling_cov(arg1, arg2, window[, ...]) | Unbiased moving covariance |
| rolling_skew(arg, window[, min_periods, ...]) | Unbiased moving skewness |
| rolling_kurt(arg, window[, min_periods, ...]) | Unbiased moving kurtosis |
| rolling_apply(arg, window, func[, ...]) | Generic moving function application |
| rolling_quantile(arg, window, quantile[, ...]) | Moving quantile |
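The `window` and `min_periods` arguments shared by these functions can be illustrated with a plain-Python moving mean. The sketch assumes a trailing window (the current observation plus the `window - 1` before it), assumes `min_periods` defaults to `window`, and skips NaN inside the window; these are assumptions of the sketch, and `time_rule` is not modelled.

```python
import math


def rolling_mean(values, window, min_periods=None):
    """Trailing moving mean over a list of floats (illustrative sketch)."""
    if min_periods is None:
        # Assumption: require a full window unless told otherwise.
        min_periods = window
    out = []
    for i in range(len(values)):
        # The current observation and up to window - 1 observations before it.
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        if valid and len(valid) >= min_periods:
            out.append(sum(valid) / len(valid))
        else:
            # Too few non-NaN observations in the window.
            out.append(math.nan)
    return out


data = [1.0, 2.0, 3.0, 4.0, 5.0]
full = rolling_mean(data, window=3)                    # [nan, nan, 2.0, 3.0, 4.0]
partial = rolling_mean(data, window=3, min_periods=1)  # [1.0, 1.5, 2.0, 3.0, 4.0]
print(full)
print(partial)
```

Lowering `min_periods` trades the leading NaN values for estimates based on fewer observations.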
Exponentially-weighted moving window functions
| ewma(arg[, com, span, min_periods, time_rule]) | Exponentially-weighted moving average |
| ewmstd(arg[, com, span, min_periods, bias, ...]) | Exponentially-weighted moving std |
| ewmvar(arg[, com, span, min_periods, bias, ...]) | Exponentially-weighted moving variance |
| ewmcorr(arg1, arg2[, com, span, ...]) | Exponentially-weighted moving correlation |
| ewmcov(arg1, arg2[, com, span, min_periods, ...]) | Exponentially-weighted moving covariance |
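The `com` (center of mass) and `span` arguments are two ways of specifying the same decay. The sketch below assumes the usual relations `com = (span - 1) / 2` and `alpha = 1 / (1 + com)`, and applies the simple recursive form of the average. The helper name `smoothing_factor` is hypothetical, and the library's own start-up adjustment, `min_periods` and NaN handling are not reproduced.

```python
def smoothing_factor(com=None, span=None):
    """Convert com or span into the per-step weight alpha (hypothetical helper)."""
    if (com is None) == (span is None):
        raise ValueError("pass exactly one of com or span")
    if span is not None:
        com = (span - 1) / 2.0
    return 1.0 / (1.0 + com)


def ewma(values, com=None, span=None):
    """Recursive exponentially-weighted mean: y[t] = (1 - alpha) * y[t-1] + alpha * x[t]."""
    alpha = smoothing_factor(com=com, span=span)
    out = []
    for x in values:
        if not out:
            out.append(x)  # seed the recursion with the first observation
        else:
            out.append((1 - alpha) * out[-1] + alpha * x)
    return out


print(smoothing_factor(span=9))            # same decay as com=4
print(ewma([1.0, 2.0, 3.0], com=1))        # [1.0, 1.5, 2.25]
```

A larger `com` or `span` gives a smaller `alpha`, so older observations keep more weight and the result is smoother.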
Series
Attributes and underlying data
Axes
- index: axis labels
| Series.values | Return Series as ndarray |
| Series.dtype | Data-type of the array’s elements. |
| Series.isnull(obj) | Replacement for numpy.isnan / -numpy.isfinite which is suitable for use on object arrays. |
| Series.notnull(obj) | Replacement for numpy.isfinite / -numpy.isnan which is suitable for use on object arrays. |
Conversion / Constructors
| Series.__init__([data, index, dtype, name, copy]) | One-dimensional ndarray with axis labels (including time series). |
| Series.astype(t) | Copy of the array, cast to a specified type. |
| Series.copy() | Return new Series with copy of underlying values |
Indexing, iteration
| Series.get(label[, default]) | Returns value occupying requested label, default to specified missing value if not present. |
| Series.ix | |
| Series.__iter__() | |
| Series.iteritems([index]) | Lazily iterate over (index, value) tuples |
Binary operator functions
| Series.add(other[, level, fill_value]) | Binary operator add with support to substitute a fill_value for missing data |
| Series.div(other[, level, fill_value]) | Binary operator divide with support to substitute a fill_value for missing data |
| Series.mul(other[, level, fill_value]) | Binary operator multiply with support to substitute a fill_value for missing data |
| Series.sub(other[, level, fill_value]) | Binary operator subtract with support to substitute a fill_value for missing data |
| Series.combine(other, func[, fill_value]) | Perform elementwise binary operation on two Series using given function |
| Series.combine_first(other) | Combine Series values, choosing the calling Series’s values |
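The `fill_value` argument of the binary operator methods matters when the two operands do not share all labels. The sketch below models a Series as a plain label-to-value dict (an assumption for illustration only) and shows label alignment for addition: labels are unioned, and `fill_value` is substituted only when exactly one side is missing.

```python
import math


def add(left, right, fill_value=None):
    """Add two label -> value dicts, aligning on the union of their labels."""
    result = {}
    for label in sorted(set(left) | set(right)):
        a = left.get(label, math.nan)
        b = right.get(label, math.nan)
        if fill_value is not None:
            # Substitute only when exactly one operand is missing;
            # if both are missing, the result stays missing.
            if math.isnan(a) and not math.isnan(b):
                a = fill_value
            elif math.isnan(b) and not math.isnan(a):
                b = fill_value
        result[label] = a + b
    return result


s1 = {"a": 1.0, "b": 2.0}
s2 = {"b": 10.0, "c": 20.0}
plain = add(s1, s2)                   # a and c are NaN, b is 12.0
filled = add(s1, s2, fill_value=0.0)  # {'a': 1.0, 'b': 12.0, 'c': 20.0}
print(plain)
print(filled)
```

Without `fill_value`, any label present on only one side yields NaN, which is the behaviour of the ordinary `+` operator on misaligned data.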
Function application, GroupBy
| Series.apply(func) | Invoke function on values of Series. Can be ufunc or Python function |
| Series.map(arg) | Map values of Series using input correspondence (which can be |
| Series.groupby([by, axis, level, as_index, sort]) | Group series using mapper (dict or key function, apply given function |
Computations / Descriptive Stats
| Series.autocorr() | Lag-1 autocorrelation |
| Series.clip([lower, upper, out]) | Trim values at input threshold(s) |
| Series.clip_lower(threshold) | Return copy of series with values below given value truncated |
| Series.clip_upper(threshold) | Return copy of series with values above given value truncated |
| Series.corr(other[, method]) | Compute correlation between two Series, excluding missing values |
| Series.count([level]) | Return number of non-NA/null observations in the Series |
| Series.cumprod([axis, dtype, out, skipna]) | Cumulative product of values. |
| Series.cumsum([axis, dtype, out, skipna]) | Cumulative sum of values. |
| Series.describe() | Generate various summary statistics of Series, excluding NaN |
| Series.diff([periods]) | 1st discrete difference of object |
| Series.max([axis, out, skipna, level]) | Return maximum of values |
| Series.mean([axis, dtype, out, skipna, level]) | Return mean of values |
| Series.median([skipna, level]) | Return median of values |
| Series.min([axis, out, skipna, level]) | Return minimum of values |
| Series.prod([axis, dtype, out, skipna, level]) | Return product of values |
| Series.quantile([q]) | Return value at the given quantile, a la scoreatpercentile in |
| Series.skew([skipna, level]) | Return unbiased skewness of values |
| Series.std([axis, dtype, out, ddof, skipna, ...]) | Return unbiased standard deviation of values |
| Series.sum([axis, dtype, out, skipna, level]) | Return sum of values |
| Series.var([axis, dtype, out, ddof, skipna, ...]) | Return unbiased variance of values |
| Series.value_counts() | Returns Series containing counts of unique values. The resulting Series |
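Several entries above are described as "unbiased" and accept `ddof` and `skipna`. As a sketch of what those two arguments control, the plain-Python functions below compute variance and standard deviation over a list of floats, assuming `ddof=1` (divide by `n - 1`) and `skipna=True` as defaults. They illustrate the arithmetic only and do not mirror the library's implementation or its `axis` and `level` options.

```python
import math


def var(values, ddof=1, skipna=True):
    """Variance with divisor n - ddof; ddof=1 gives the unbiased estimate."""
    if skipna:
        values = [v for v in values if not math.isnan(v)]
    n = len(values)
    if n - ddof <= 0:
        return math.nan  # not enough observations for this ddof
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


def std(values, ddof=1, skipna=True):
    """Standard deviation as the square root of var()."""
    return math.sqrt(var(values, ddof=ddof, skipna=skipna))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, math.nan]
print(var(data))           # 32 / 7, NaN skipped, divisor n - 1
print(var(data, ddof=0))   # 4.0, population variance
print(std(data, ddof=0))   # 2.0
```

With `skipna=False` the single NaN propagates and every result becomes NaN.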
Reindexing / Selection / Label manipulation
| Series.align(other[, join, level, copy]) | Align two Series objects with the specified join method |
| Series.drop(labels[, axis]) | Return new object with labels in requested axis removed |
| Series.reindex([index, method, level, copy]) | Conform Series to new index with optional filling logic, placing |
| Series.reindex_like(other[, method]) | Reindex Series to match index of another Series, optionally with |
| Series.rename(mapper) | Alter Series index using dict or function |
| Series.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
| Series.take(indices[, axis]) | Analogous to ndarray.take, return Series corresponding to requested |
| Series.truncate([before, after, copy]) | Truncate a sorted DataFrame / Series before and/or after |
Missing data handling
| Series.dropna() | Return Series without null values |
| Series.fillna([value, method]) | Fill NA/NaN values using the specified method |
| Series.interpolate([method]) | Interpolate missing values (after the first valid value) |
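The difference between dropping and filling missing values can be sketched in plain Python over a list of floats. The sketch assumes that the `method` argument of `fillna` accepts `"pad"` to mean forward fill (carry the last valid value forward); interpolation and any other methods are not modelled.

```python
import math


def dropna(values):
    """Return the values with NaN entries removed."""
    return [v for v in values if not math.isnan(v)]


def fillna(values, value=None, method=None):
    """Fill NaN with a constant value, or forward-fill when method == 'pad'."""
    out = []
    last_valid = math.nan
    for v in values:
        if math.isnan(v):
            if method == "pad":
                v = last_valid  # stays NaN until a valid value has been seen
            elif value is not None:
                v = value
        else:
            last_valid = v
        out.append(v)
    return out


data = [math.nan, 1.0, math.nan, math.nan, 4.0]
dropped = dropna(data)                 # [1.0, 4.0]
constant = fillna(data, value=0.0)     # [0.0, 1.0, 0.0, 0.0, 4.0]
padded = fillna(data, method="pad")    # [nan, 1.0, 1.0, 1.0, 4.0]
print(dropped, constant, padded)
```

Forward filling cannot fill a leading gap, since there is no earlier valid value to carry forward.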
Reshaping, sorting
| Series.argsort([axis, kind, order]) | Overrides ndarray.argsort. |
| Series.order([na_last, ascending, kind]) | Sorts Series object, by value, maintaining index-value link |
| Series.sort([axis, kind, order]) | Sort values and index labels by value, in place. |
| Series.sort_index([ascending]) | Sort object by labels (along an axis) |
| Series.sortlevel([level, ascending]) | Sort Series with MultiIndex by chosen level. Data will be |
| Series.unstack([level]) | Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame |
Combining / joining / merging
| Series.append(to_append) | Concatenate two or more Series. The indexes must not overlap |
Plotting
| Series.hist([ax, grid]) | Draw histogram of the input series using matplotlib |
| Series.plot([label, kind, use_index, rot, ...]) | Plot the input series with the index on the x-axis using matplotlib |
Serialization / IO / Conversion
| Series.from_csv(path[, sep, parse_dates, ...]) | Read delimited file into Series |
| Series.load(path) | |
| Series.save(path) | |
| Series.to_csv(path[, index, sep, na_rep, ...]) | Write Series to a comma-separated values (csv) file |
| Series.to_dict() | Convert Series to {label -> value} dict |
| Series.to_sparse([kind, fill_value]) | Convert Series to SparseSeries |
DataFrame
Attributes and underlying data
Axes
- index: row labels
- columns: column labels
| DataFrame.as_matrix([columns]) | Convert the frame to its Numpy-array matrix representation. Columns |
| DataFrame.dtypes | |
| DataFrame.get_dtype_counts() | |
| DataFrame.values | Convert the frame to its Numpy-array matrix representation. Columns |
| DataFrame.axes | |
| DataFrame.ndim | |
| DataFrame.shape | |
Conversion / Constructors
| DataFrame.__init__([data, index, columns, ...]) | Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). |
| DataFrame.astype(dtype) | Cast object to input numpy.dtype |
| DataFrame.copy([deep]) | Make a copy of this object |
Indexing, iteration
| DataFrame.ix | |
| DataFrame.insert(loc, column, value) | Insert column into DataFrame at specified location. Raises Exception if |
| DataFrame.__iter__() | Iterate over columns of the frame. |
| DataFrame.iteritems() | Iterator over (column, series) pairs |
| DataFrame.pop(item) | Return column and drop from frame. |
| DataFrame.xs(key[, axis, level, copy]) | Returns a cross-section (row or column) from the DataFrame as a Series |
Binary operator functions
| DataFrame.add(other[, axis, level, fill_value]) | Binary operator add with support to substitute a fill_value for missing data in |
| DataFrame.div(other[, axis, level, fill_value]) | Binary operator divide with support to substitute a fill_value for missing data in |
| DataFrame.mul(other[, axis, level, fill_value]) | Binary operator multiply with support to substitute a fill_value for missing data in |
| DataFrame.sub(other[, axis, level, fill_value]) | Binary operator subtract with support to substitute a fill_value for missing data in |
| DataFrame.radd(other[, axis, level, fill_value]) | Binary operator radd with support to substitute a fill_value for missing data in |
| DataFrame.rdiv(other[, axis, level, fill_value]) | Binary operator rdivide with support to substitute a fill_value for missing data in |
| DataFrame.rmul(other[, axis, level, fill_value]) | Binary operator rmultiply with support to substitute a fill_value for missing data in |
| DataFrame.rsub(other[, axis, level, fill_value]) | Binary operator rsubtract with support to substitute a fill_value for missing data in |
| DataFrame.combine(other, func[, fill_value]) | Combine two DataFrame objects using func and do not propagate NaN values, so if for a |
| DataFrame.combineAdd(other) | Add two DataFrame objects and do not propagate |
| DataFrame.combine_first(other) | Combine two DataFrame objects and default to non-null values in frame |
| DataFrame.combineMult(other) | Multiply two DataFrame objects and do not propagate NaN values, so if |
Function application, GroupBy
| DataFrame.apply(func[, axis, broadcast, ...]) | Applies function along input axis of DataFrame. Objects passed to |
| DataFrame.applymap(func) | Apply a function to a DataFrame that is intended to operate |
| DataFrame.groupby([by, axis, level, ...]) | Group series using mapper (dict or key function, apply given function |
Computations / Descriptive Stats
| DataFrame.clip([upper, lower]) | Trim values at input threshold(s) |
| DataFrame.clip_lower(threshold) | Trim values below threshold |
| DataFrame.clip_upper(threshold) | Trim values above threshold |
| DataFrame.corr([method]) | Compute pairwise correlation of columns, excluding NA/null values |
| DataFrame.corrwith(other[, axis, drop]) | Compute pairwise correlation between rows or columns of two DataFrame |
| DataFrame.count([axis, level, numeric_only]) | Return Series with number of non-NA/null observations over requested |
| DataFrame.cumprod([axis, skipna]) | Return cumulative product over requested axis as DataFrame |
| DataFrame.cumsum([axis, skipna]) | Return DataFrame of cumulative sums over requested axis. |
| DataFrame.describe() | Generate various summary statistics of each column, excluding NaN |
| DataFrame.diff([periods]) | 1st discrete difference of object |
| DataFrame.mad([axis, skipna, level]) | Return mean absolute deviation over requested axis. |
| DataFrame.max([axis, skipna, level]) | Return maximum over requested axis. |
| DataFrame.mean([axis, skipna, level]) | Return mean over requested axis. |
| DataFrame.median([axis, skipna, level]) | Return median over requested axis. |
| DataFrame.min([axis, skipna, level]) | Return minimum over requested axis. |
| DataFrame.prod([axis, skipna, level]) | Return product over requested axis. |
| DataFrame.quantile([q, axis]) | Return values at the given quantile over requested axis, a la |
| DataFrame.skew([axis, skipna, level]) | Return unbiased skewness over requested axis. |
| DataFrame.sum([axis, numeric_only, skipna, ...]) | Return sum over requested axis. |
| DataFrame.std([axis, skipna, level]) | Return unbiased standard deviation over requested axis. |
| DataFrame.var([axis, skipna, level]) | Return unbiased variance over requested axis. |
Reindexing / Selection / Label manipulation
| DataFrame.add_prefix(prefix) | Concatenate prefix string with column names. |
| DataFrame.add_suffix(suffix) | Concatenate suffix string with column names. |
| DataFrame.align(other[, join, axis, level, copy]) | Align two DataFrame object on their index and columns with the |
| DataFrame.drop(labels[, axis]) | Return new object with labels in requested axis removed |
| DataFrame.filter([items, like, regex]) | Restrict frame’s columns to set of items or wildcard |
| DataFrame.reindex([index, columns, method, ...]) | Conform DataFrame to new index with optional filling logic, placing |
| DataFrame.reindex_like(other[, method, copy]) | Reindex DataFrame to match indices of another DataFrame, optionally |
| DataFrame.rename([index, columns, copy]) | Alter index and / or columns using input function or functions. |
| DataFrame.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
| DataFrame.take(indices[, axis]) | Analogous to ndarray.take, return DataFrame corresponding to requested |
| DataFrame.truncate([before, after, copy]) | Truncate a sorted DataFrame / Series before and/or after |
| DataFrame.head([n]) | Returns first n rows of DataFrame |
| DataFrame.tail([n]) | Returns last n rows of DataFrame |
Missing data handling
| DataFrame.dropna([axis, how, thresh, subset]) | Return object with labels on given axis omitted where alternately any |
| DataFrame.fillna([value, method]) | Fill NA/NaN values using the specified method. Member Series / |
Reshaping, sorting, transposing
| DataFrame.sort_index([axis, by, ascending]) | Sort DataFrame either by labels (along either axis) or by the values in |
| DataFrame.delevel(*args, **kwargs) | |
| DataFrame.pivot([index, columns, values]) | Reshape data (produce a “pivot” table) based on column values. |
| DataFrame.sortlevel([level, axis, ascending]) | Sort multilevel index by chosen axis and primary level. |
| DataFrame.swaplevel(i, j[, axis]) | Swap levels i and j in a MultiIndex on a particular axis |
| DataFrame.stack([level, dropna]) | Pivot a level of the (possibly hierarchical) column labels, returning a |
| DataFrame.unstack([level]) | Pivot a level of the (necessarily hierarchical) index labels, returning |
| DataFrame.T | Returns a DataFrame with the rows/columns switched. If the DataFrame is |
| DataFrame.transpose() | Returns a DataFrame with the rows/columns switched. If the DataFrame is |
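The reshaping performed by `pivot` can be sketched without pandas: long-format records are turned into a nested mapping keyed first by the `index` field and then by the `columns` field. The record layout, the field names and the choice to raise on a duplicate (index, column) pair are assumptions of this sketch rather than a statement about the library's behaviour.

```python
def pivot(records, index, columns, values):
    """Reshape long-format records into {index_label: {column_label: value}}."""
    table = {}
    for rec in records:
        row = table.setdefault(rec[index], {})
        if rec[columns] in row:
            # Assumption of this sketch: a cell may be filled only once.
            raise ValueError("duplicate (index, column) pair: %r, %r"
                             % (rec[index], rec[columns]))
        row[rec[columns]] = rec[values]
    return table


records = [
    {"date": "2000-01-03", "variable": "A", "value": 0.5},
    {"date": "2000-01-03", "variable": "B", "value": 1.5},
    {"date": "2000-01-04", "variable": "A", "value": 0.7},
    {"date": "2000-01-04", "variable": "B", "value": 1.7},
]
table = pivot(records, index="date", columns="variable", values="value")
print(table)
```

Each distinct `date` becomes a row and each distinct `variable` becomes a column, which is the "pivot" layout the table entry refers to; aggregating duplicates is the job of `pivot_table` instead.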
Combining / joining / merging
| DataFrame.join(other[, on, how, lsuffix, ...]) | Join columns with other DataFrame either on index or on a key |
| DataFrame.merge(right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation by |
| DataFrame.append(other[, ignore_index, ...]) | Append rows of other to the end of this frame, returning a new object. |
Time series-related
| DataFrame.asfreq(freq[, method]) | Convert all TimeSeries inside to specified frequency using DateOffset |
| DataFrame.shift(periods[, offset]) | Shift the index of the DataFrame by desired number of periods with an |
| DataFrame.first_valid_index() | Return label for first non-NA/null value |
| DataFrame.last_valid_index() | Return label for last non-NA/null value |
Plotting
| DataFrame.hist([grid]) | Draw histogram of the DataFrame’s series using matplotlib / pylab. |
| DataFrame.plot([subplots, sharex, sharey, ...]) | Make line plot of DataFrame’s series with the index on the x-axis using |
Serialization / IO / Conversion
| DataFrame.from_csv(path[, header, sep, ...]) | Read delimited file into DataFrame |
| DataFrame.from_records(data[, index, ...]) | Convert structured or record ndarray to DataFrame |
| DataFrame.to_csv(path_or_buf[, sep, na_rep, ...]) | Write DataFrame to a comma-separated values (csv) file |
| DataFrame.to_dict() | Convert DataFrame to nested dictionary |
| DataFrame.to_records([index]) | Convert DataFrame to record array. Index will be put in the |
| DataFrame.to_sparse([fill_value, kind]) | Convert to SparseDataFrame |
| DataFrame.to_string([buf, columns, ...]) | Render a DataFrame to a console-friendly tabular output. |
| DataFrame.save(path) | |
| DataFrame.load(path) | |
| DataFrame.info([verbose, buf]) | Concise summary of a DataFrame, used in __repr__ when very large. |