API Reference
General functions
Data manipulations
pivot_table(data[, values, rows, cols, ...]) | Create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table are stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame. |
merge(left, right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation by columns or indexes. |
concat(objs[, axis, join, join_axes, ...]) | Concatenate pandas objects along a particular axis with optional set logic along the other axes. |
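The join semantics behind merge can be illustrated with a minimal pure-Python sketch of an inner join over lists of row dicts. The helper name `inner_merge` is hypothetical, and this is not pandas' implementation. It models only the inner-join case of `how`, and it does not disambiguate overlapping non-key column names.

```python
def inner_merge(left, right, on):
    """Inner join of two lists of row dicts on the column named `on`."""
    # Build a hash index of the right-hand rows keyed by the join column.
    by_key = {}
    for row in right:
        by_key.setdefault(row[on], []).append(row)
    result = []
    for lrow in left:
        # Pair each left row with every right row that shares its key.
        for rrow in by_key.get(lrow[on], []):
            merged = dict(lrow)
            merged.update(rrow)  # overlapping non-key columns: right wins here
            result.append(merged)
    return result


left = [{"key": "a", "lval": 1}, {"key": "b", "lval": 2}]
right = [{"key": "a", "rval": 3}, {"key": "a", "rval": 4}, {"key": "c", "rval": 5}]
rows = inner_merge(left, right, on="key")
print(rows)  # [{'key': 'a', 'lval': 1, 'rval': 3}, {'key': 'a', 'lval': 1, 'rval': 4}]
```

Each left row is paired with every right row sharing its key. Duplicate keys therefore multiply rows, as in a SQL join.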
Pickling
load(path) | Load pickled pandas object (or any other pickled object) from the specified file path |
save(obj, path) | Pickle (serialize) object to input file path |
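`load` and `save` wrap Python pickling. The round trip can be sketched with the standard library alone. The helper names `save_pickle` and `load_pickle` are hypothetical stand-ins, and their bodies are an assumption about the behaviour rather than pandas' source.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Pickle (serialize) obj to the file at path."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load a pickled object from the file at path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip a plain object through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "obj.pickle")
save_pickle({"a": [1, 2, 3]}, path)
restored = load_pickle(path)
```

Unpickling can execute arbitrary code, so only load files from trusted sources.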
File IO
read_table(filepath_or_buffer[, sep, ...]) | Read general delimited file into DataFrame |
read_csv(filepath_or_buffer[, sep, header, ...]) | Read CSV (comma-separated) file into DataFrame |
ExcelFile.parse(sheetname[, header, ...]) | Read Excel table into DataFrame |
HDFStore: PyTables (HDF5)
HDFStore.put(key, value[, table, append, ...]) | Store object in HDFStore |
HDFStore.get(key) | Retrieve pandas object stored in file |
Standard moving window functions
rolling_count(arg, window[, time_rule]) | Rolling count of number of non-NaN observations inside provided window. |
rolling_sum(arg, window[, min_periods, ...]) | Moving sum |
rolling_mean(arg, window[, min_periods, ...]) | Moving mean |
rolling_median(arg, window[, min_periods, ...]) | Moving median (O(N log(window)) implementation using a skip list) |
rolling_var(arg, window[, min_periods, ...]) | Unbiased moving variance |
rolling_std(arg, window[, min_periods, ...]) | Unbiased moving standard deviation |
rolling_corr(arg1, arg2, window[, ...]) | Moving sample correlation |
rolling_cov(arg1, arg2, window[, ...]) | Unbiased moving covariance |
rolling_skew(arg, window[, min_periods, ...]) | Unbiased moving skewness |
rolling_kurt(arg, window[, min_periods, ...]) | Unbiased moving kurtosis |
rolling_apply(arg, window, func[, ...]) | Generic moving function application |
rolling_quantile(arg, window, quantile[, ...]) | Moving quantile |
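The interplay of `window` and `min_periods` is easier to see in a small pure-Python sketch of a trailing-window statistic that skips NaN. The sketch makes three assumptions. First, `min_periods` defaults to the window length. Second, a result is produced only when the window holds at least `min_periods` valid observations. Third, "unbiased" means dividing by n - 1. `rolling_apply_sketch` is a hypothetical name. The sketch recomputes each window from scratch in O(N x window), unlike the skip-list approach listed for rolling_median.

```python
import math


def rolling_apply_sketch(values, window, func, min_periods=None):
    """Apply func to the valid (non-NaN) values of each trailing window.

    Assumption: min_periods defaults to window, and positions whose window
    holds fewer than min_periods valid observations yield NaN.
    """
    if min_periods is None:
        min_periods = window
    out = []
    for i in range(len(values)):
        # Trailing window ending at position i (truncated near the start).
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        out.append(func(valid) if len(valid) >= min_periods else math.nan)
    return out


def mean(xs):
    return sum(xs) / len(xs) if xs else math.nan


def unbiased_var(xs):
    # Divide by n - 1 (Bessel's correction); undefined for fewer than 2 values.
    if len(xs) < 2:
        return math.nan
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


data = [1.0, 2.0, 3.0, 4.0, math.nan, 6.0]
means = rolling_apply_sketch(data, 3, mean)  # [nan, nan, 2.0, 3.0, nan, nan]
variances = rolling_apply_sketch(data, 3, unbiased_var, min_periods=2)
# variances == [nan, 0.5, 1.0, 1.0, 0.5, 2.0]
```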
Exponentially-weighted moving window functions
ewma(arg[, com, span, min_periods, time_rule]) | Exponentially-weighted moving average |
ewmstd(arg[, com, span, min_periods, bias, ...]) | Exponentially-weighted moving std |
ewmvar(arg[, com, span, min_periods, bias, ...]) | Exponentially-weighted moving variance |
ewmcorr(arg1, arg2[, com, span, ...]) | Exponentially-weighted moving correlation |
ewmcov(arg1, arg2[, com, span, min_periods, ...]) | Exponentially-weighted moving covariance |
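The `com` (center of mass) and `span` parameters are two ways to choose the smoothing factor. The relationship is alpha = 1 / (1 + com), and a span s corresponds to com = (s - 1) / 2, i.e. alpha = 2 / (s + 1). The sketch below uses the plain recursive form. As an assumption, it applies no bias adjustment to the earliest observations and does no NaN handling, so it may not match the library's output exactly. `ewma_sketch` is a hypothetical name.

```python
def ewma_sketch(values, com=None, span=None):
    """Recursive exponentially-weighted moving average.

    Assumptions: no bias adjustment for early observations and no NaN
    handling. Exactly one of com or span must be given.
    """
    if (com is None) == (span is None):
        raise ValueError("specify exactly one of com or span")
    if span is not None:
        com = (span - 1) / 2.0  # span s corresponds to com (s - 1) / 2
    alpha = 1.0 / (1.0 + com)
    out = []
    for i, x in enumerate(values):
        # y_0 = x_0;  y_t = alpha * x_t + (1 - alpha) * y_{t-1}
        out.append(x if i == 0 else alpha * x + (1 - alpha) * out[-1])
    return out


smoothed = ewma_sketch([1.0, 2.0, 3.0], com=1.0)  # alpha = 0.5 -> [1.0, 1.5, 2.25]
same = ewma_sketch([1.0, 2.0, 3.0], span=3)       # span 3 is com 1, same result
```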
Series
Attributes and underlying data
Axes
- index: axis labels
Series.values | Return Series as ndarray |
Series.dtype | Data-type of the array’s elements. |
Series.isnull(obj) | Detect missing values; replacement for numpy.isnan / ~numpy.isfinite that is also suitable for use on object arrays. |
Series.notnull(obj) | Detect non-missing values; replacement for numpy.isfinite / ~numpy.isnan that is also suitable for use on object arrays. |
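These replacements exist because numpy.isnan raises a TypeError on object-dtype arrays and cannot recognize None. The pure-Python sketch below treats both None and float NaN as missing. The helper names are hypothetical, and this is not the library's code.

```python
import math


def isnull_sketch(values):
    """Flag missing entries in a list that may mix numbers, strings and None."""
    def missing(v):
        # None marks missing object data; NaN marks missing floating-point data.
        return v is None or (isinstance(v, float) and math.isnan(v))
    return [missing(v) for v in values]


def notnull_sketch(values):
    """Elementwise negation of isnull_sketch."""
    return [not flag for flag in isnull_sketch(values)]


mixed = [1.0, float("nan"), "a", None]
print(isnull_sketch(mixed))   # [False, True, False, True]
print(notnull_sketch(mixed))  # [True, False, True, False]
```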
Conversion / Constructors
Series.__init__([data, index, dtype, name, copy]) | One-dimensional ndarray with axis labels (including time series). |
Series.astype(dtype) | See numpy.ndarray.astype |
Series.copy() | Return new Series with copy of underlying values |
Indexing, iteration
Series.get(label[, default]) | Return the value at the requested label, or default if the label is not present. |
Series.ix | Label-based indexing and slicing accessor |
Series.__iter__() | Iterate over the values of the Series |
Series.iteritems([index]) | Lazily iterate over (index, value) tuples |
Binary operator functions
Series.add(other[, level, fill_value]) | Binary operator add with support to substitute a fill_value for missing data |
Series.div(other[, level, fill_value]) | Binary operator divide with support to substitute a fill_value for missing data |
Series.mul(other[, level, fill_value]) | Binary operator multiply with support to substitute a fill_value for missing data |
Series.sub(other[, level, fill_value]) | Binary operator subtract with support to substitute a fill_value for missing data |
Series.combine(other, func[, fill_value]) | Perform elementwise binary operation on two Series using given function |
Series.combine_first(other) | Combine Series values, choosing the calling Series’s values first. The result index is the union of the two indexes. |
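The alignment rules behind `fill_value` and `combine_first` can be sketched on plain `{label: value}` dicts of numbers. The sketch assumes three things:

- labels are sortable;
- the result index is the sorted union of both indexes;
- `fill_value` is substituted only where exactly one side is missing, so a label missing on both sides still gives NaN.

`add_with_fill` and `combine_first_sketch` are hypothetical names.

```python
import math


def _missing(v):
    # Absent labels come back from dict.get as None; NaN marks missing floats.
    return v is None or math.isnan(v)


def add_with_fill(left, right, fill_value=None):
    """Label-aligned addition of two {label: value} dicts."""
    out = {}
    for label in sorted(set(left) | set(right)):
        a, b = left.get(label), right.get(label)
        if fill_value is not None and not (_missing(a) and _missing(b)):
            # Substitute fill_value only where exactly one side is missing.
            a = fill_value if _missing(a) else a
            b = fill_value if _missing(b) else b
        out[label] = math.nan if _missing(a) or _missing(b) else a + b
    return out


def combine_first_sketch(primary, fallback):
    """Prefer primary's values, falling back to the other dict where missing."""
    out = {}
    for label in sorted(set(primary) | set(fallback)):
        v = primary.get(label)
        out[label] = fallback.get(label, math.nan) if _missing(v) else v
    return out


s1 = {"a": 1.0, "b": 2.0}
s2 = {"b": 10.0, "c": 20.0}
print(add_with_fill(s1, s2))                # {'a': nan, 'b': 12.0, 'c': nan}
print(add_with_fill(s1, s2, fill_value=0))  # {'a': 1.0, 'b': 12.0, 'c': 20.0}
```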
Function application, GroupBy
Series.apply(func) | Invoke function on values of Series. Can be ufunc or Python function |
Series.map(arg) | Map values of Series using input correspondence (which can be a dict, Series, or function) |
Series.groupby([by, axis, level, as_index, ...]) | Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns |
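The following is a pure-Python sketch of `map` and of a group-then-reduce step on `{label: value}` dicts. The helpers are hypothetical. The sketch assumes that values absent from a dict mapping produce NaN, and that a key function passed to groupby is applied to the index labels.

```python
import math
from collections import defaultdict


def map_values(series, arg):
    """Map each value of a {label: value} dict through a dict or a function.

    Assumption: with a dict, values that are not keys of the mapping give NaN.
    """
    if isinstance(arg, dict):
        return {label: arg.get(value, math.nan) for label, value in series.items()}
    return {label: arg(value) for label, value in series.items()}


def groupby_apply(series, key, func):
    """Group values by key(label), then reduce each group with func."""
    groups = defaultdict(list)
    for label, value in series.items():
        groups[key(label)].append(value)
    return {k: func(vs) for k, vs in sorted(groups.items())}


animals = {"x": "cat", "y": "dog", "z": "emu"}
legs = map_values(animals, {"cat": 4, "dog": 4})  # {'x': 4, 'y': 4, 'z': nan}
shouted = map_values(animals, str.upper)          # {'x': 'CAT', 'y': 'DOG', 'z': 'EMU'}

sales = {"2012-01": 10, "2012-02": 20, "2013-01": 5}
by_year = groupby_apply(sales, lambda label: label[:4], sum)  # {'2012': 30, '2013': 5}
```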
Computations / Descriptive Stats
Series.autocorr() | Lag-1 autocorrelation |
Series.clip([lower, upper, out]) | Trim values at input threshold(s) |
Series.clip_lower(threshold) | Return copy of series with values below given value truncated |
Series.clip_upper(threshold) | Return copy of series with values above given value truncated |
Series.corr(other[, method]) | Compute correlation with other Series, excluding missing values |
Series.count([level]) | Return number of non-NA/null observations in the Series |
Series.cumprod([axis, dtype, out, skipna]) | Cumulative product of values. |
Series.cumsum([axis, dtype, out, skipna]) | Cumulative sum of values. |
Series.describe([percentile_width]) | Generate various summary statistics of Series, excluding NaN |
Series.diff([periods]) | 1st discrete difference of object |
Series.max([axis, out, skipna, level]) | Return maximum of values |
Series.mean([axis, dtype, out, skipna, level]) | Return mean of values |
Series.median([skipna, level]) | Return median of values |
Series.min([axis, out, skipna, level]) | Return minimum of values |
Series.prod([axis, dtype, out, skipna, level]) | Return product of values |
Series.quantile([q]) | Return value at the given quantile, a la scoreatpercentile in scipy.stats |
Series.skew([skipna, level]) | Return unbiased skewness of values |
Series.std([axis, dtype, out, ddof, skipna, ...]) | Return standard deviation of values |
Series.sum([axis, dtype, out, skipna, level]) | Return sum of values |
Series.var([axis, dtype, out, ddof, skipna, ...]) | Return variance of values |
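Series.value_counts() | Return Series containing counts of unique values. The resulting Series is in descending order, so that the first element is the most frequently occurring element. Excludes NA values. |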
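Two of these statistics are easy to restate in plain Python. `autocorr` is the Pearson correlation between the series and itself lagged by one observation. `value_counts` tallies the unique non-missing values, most frequent first. The helper names below are hypothetical, and the autocorrelation sketch assumes an input with no missing values.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


def autocorr_lag1(values):
    # Correlate each observation with the one immediately before it.
    return pearson(values[1:], values[:-1])


def value_counts(values):
    """(value, count) pairs for non-NaN values, most frequent first."""
    present = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    return Counter(present).most_common()


trend = autocorr_lag1([1.0, 2.0, 3.0, 4.0])    # 1.0: each step continues the trend
flip = autocorr_lag1([1.0, -1.0, 1.0, -1.0])   # approximately -1.0: the sign alternates
counts = value_counts(["a", "b", "a", float("nan"), "a", "b", "c"])
# counts == [('a', 3), ('b', 2), ('c', 1)]
```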
Reindexing / Selection / Label manipulation
Series.align(other[, join, level, copy, ...]) | Align two Series objects with the specified join method |
Series.drop(labels[, axis, level]) | Return new object with labels in requested axis removed |
Series.reindex([index, method, level, ...]) | Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index |
Series.reindex_like(other[, method]) | Reindex Series to match index of another Series, optionally with filling logic |
Series.rename(mapper) | Alter Series index using dict or function |
Series.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
Series.take(indices[, axis]) | Analogous to ndarray.take, return Series corresponding to requested indices |
Series.truncate([before, after, copy]) | Truncate a sorted DataFrame / Series before and/or after some particular index value |
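How `reindex` fills gaps can be sketched on a `{label: value}` dict. The sketch assumes sortable labels and models only the forward-fill rule (`method='pad'`). `reindex_sketch` is a hypothetical name.

```python
import bisect
import math


def reindex_sketch(series, new_index, method=None):
    """Conform a {label: value} dict to new_index.

    Labels absent from the old index get NaN, unless method="pad", which
    carries forward the value at the nearest preceding old label.
    """
    old_labels = sorted(series)
    out = {}
    for label in new_index:
        if label in series:
            out[label] = series[label]
        elif method == "pad":
            # Position of the last old label <= label, or -1 if none exists.
            pos = bisect.bisect_right(old_labels, label) - 1
            out[label] = series[old_labels[pos]] if pos >= 0 else math.nan
        else:
            out[label] = math.nan
    return out


s = {1: 10.0, 3: 30.0}
plain = reindex_sketch(s, [0, 1, 2, 3, 4])                 # {0: nan, 1: 10.0, 2: nan, 3: 30.0, 4: nan}
filled = reindex_sketch(s, [0, 1, 2, 3, 4], method="pad")  # {0: nan, 1: 10.0, 2: 10.0, 3: 30.0, 4: 30.0}
```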
Missing data handling
Series.dropna() | Return Series without null values |
Series.fillna([value, method, inplace]) | Fill NA/NaN values using the specified method |
Series.interpolate([method]) | Interpolate missing values (after the first valid value) |
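The following is a pure-Python sketch of forward filling and of linear interpolation that treats positions as equally spaced. It assumes that only gaps with a valid value on both sides are interpolated. Leading NaN (before the first valid value) are left untouched. Both helper names are hypothetical.

```python
import math


def fillna_pad(values):
    """Forward-fill: replace each NaN with the last valid value seen."""
    out, last = [], math.nan
    for v in values:
        if math.isnan(v):
            out.append(last)
        else:
            out.append(v)
            last = v
    return out


def interpolate_linear(values):
    """Linearly interpolate NaN gaps bounded by valid values on both sides."""
    out = list(values)
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    for lo, hi in zip(valid, valid[1:]):
        for i in range(lo + 1, hi):
            frac = (i - lo) / (hi - lo)
            out[i] = values[lo] + frac * (values[hi] - values[lo])
    return out


gappy = [math.nan, 1.0, math.nan, math.nan, 4.0, math.nan]
ffilled = fillna_pad(gappy)          # [nan, 1.0, 1.0, 1.0, 4.0, 4.0]
interp = interpolate_linear(gappy)   # [nan, 1.0, 2.0, 3.0, 4.0, nan]
```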
Reshaping, sorting
Series.argsort([axis, kind, order]) | Overrides ndarray.argsort. |
Series.order([na_last, ascending, kind]) | Sorts Series object, by value, maintaining index-value link |
Series.sort([axis, kind, order]) | Sort values and index labels by value, in place. |
Series.sort_index([ascending]) | Sort object by labels (along an axis) |
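Series.sortlevel([level, ascending]) | Sort Series with MultiIndex by chosen level. Data will be lexicographically sorted by the chosen level followed by the other levels (in order) |
Series.unstack([level]) | Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame |
Combining / joining / merging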
Series.append(to_append) | Concatenate two or more Series. The indexes must not overlap |
Plotting
Series.hist([ax, grid, xlabelsize, xrot, ...]) | Draw histogram of the input series using matplotlib |
Series.plot(series[, label, kind, ...]) | Plot the input series with the index on the x-axis using matplotlib |
Serialization / IO / Conversion
Series.from_csv(path[, sep, parse_dates, ...]) | Read delimited file into Series |
Series.load(path) | Load pickled Series from the specified file path |
Series.save(path) | Pickle (serialize) Series to input file path |
Series.to_csv(path[, index, sep, na_rep, ...]) | Write Series to a comma-separated values (csv) file |
Series.to_dict() | Convert Series to {label -> value} dict |
Series.to_sparse([kind, fill_value]) | Convert Series to SparseSeries |
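DataFrame
Attributes and underlying data
Axes
- index: row labels
- columns: column labels
DataFrame.as_matrix([columns]) | Convert the frame (or the given subset of columns) to its Numpy-array matrix representation |
DataFrame.dtypes | Series of the data type of each column |
DataFrame.get_dtype_counts() | Series of the number of columns of each data type |
DataFrame.values | Numpy-array matrix representation of the frame |
DataFrame.axes | List of the row and column axes: [index, columns] |
DataFrame.ndim | Number of axes (2 for a DataFrame) |
DataFrame.shape | Tuple of (number of rows, number of columns) |
Conversion / Constructors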
DataFrame.__init__([data, index, columns, ...]) | Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). |
DataFrame.astype(dtype) | Cast object to input numpy.dtype |
DataFrame.copy([deep]) | Make a copy of this object |
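Indexing, iteration
DataFrame.ix | Label-based indexing and slicing accessor for rows and columns |
DataFrame.insert(loc, column, value) | Insert column into DataFrame at specified location. Raises Exception if the column is already contained in the DataFrame |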
DataFrame.__iter__() | Iterate over columns of the frame. |
DataFrame.iteritems() | Iterator over (column, series) pairs |
DataFrame.pop(item) | Return column and drop from frame. |
DataFrame.xs(key[, axis, level, copy]) | Returns a cross-section (row or column) from the DataFrame as a Series |
Binary operator functions
DataFrame.add(other[, axis, level, fill_value]) | Binary operator add with support to substitute a fill_value for missing data in one of the inputs |
DataFrame.div(other[, axis, level, fill_value]) | Binary operator divide with support to substitute a fill_value for missing data in one of the inputs |
DataFrame.mul(other[, axis, level, fill_value]) | Binary operator multiply with support to substitute a fill_value for missing data in one of the inputs |
DataFrame.sub(other[, axis, level, fill_value]) | Binary operator subtract with support to substitute a fill_value for missing data in one of the inputs |
DataFrame.radd(other[, axis, level, fill_value]) | Reflected add (other + frame) with support to substitute a fill_value for missing data in one of the inputs |
DataFrame.rdiv(other[, axis, level, fill_value]) | Reflected divide (other / frame) with support to substitute a fill_value for missing data in one of the inputs |
DataFrame.rmul(other[, axis, level, fill_value]) | Reflected multiply (other * frame) with support to substitute a fill_value for missing data in one of the inputs |
DataFrame.rsub(other[, axis, level, fill_value]) | Reflected subtract (other - frame) with support to substitute a fill_value for missing data in one of the inputs |
DataFrame.combine(other, func[, fill_value]) | Perform elementwise binary operation on two DataFrame objects using given function, with optional fill_value for missing data |
DataFrame.combineAdd(other) | Add two DataFrame objects and do not propagate NaN values: if for a (column, time) one frame is missing a value, it defaults to the other frame’s value (which might be NaN as well) |
DataFrame.combine_first(other) | Combine two DataFrame objects and default to non-null values in the frame calling the method |
DataFrame.combineMult(other) | Multiply two DataFrame objects and do not propagate NaN values: if for a (column, time) one frame is missing a value, it defaults to the other frame’s value (which might be NaN as well) |
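The "do not propagate NaN" rule of `combineAdd` and `combineMult` can be sketched on frames stored as `{column: {row: value}}` dicts. The sketch assumes the result covers the union of columns and rows, and that a cell missing in both inputs stays NaN. All helper names are hypothetical.

```python
import math


def _missing(v):
    return v is None or math.isnan(v)


def combine_no_nan(a, b, func):
    """Combine two {column: {row: value}} frames cell by cell with func.

    Where only one frame has a valid value, that value is kept instead of
    propagating NaN; a cell missing in both frames stays NaN.
    """
    columns = sorted(set(a) | set(b))
    rows = sorted({r for frame in (a, b) for col in frame.values() for r in col})
    out = {}
    for col in columns:
        ca, cb = a.get(col, {}), b.get(col, {})
        out[col] = {}
        for row in rows:
            x, y = ca.get(row), cb.get(row)
            if _missing(x) and _missing(y):
                out[col][row] = math.nan
            elif _missing(x):
                out[col][row] = y
            elif _missing(y):
                out[col][row] = x
            else:
                out[col][row] = func(x, y)
    return out


def combine_add(a, b):
    return combine_no_nan(a, b, lambda x, y: x + y)


def combine_mult(a, b):
    return combine_no_nan(a, b, lambda x, y: x * y)


a = {"A": {0: 1.0, 1: math.nan}}
b = {"A": {0: 2.0, 1: 5.0, 2: 7.0}, "B": {0: 3.0}}
summed = combine_add(a, b)
# summed["A"] == {0: 3.0, 1: 5.0, 2: 7.0}; summed["B"] == {0: 3.0, 1: nan, 2: nan}
product = combine_mult(a, b)
```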
Function application, GroupBy
DataFrame.apply(func[, axis, broadcast, ...]) | Applies function along input axis of DataFrame. Objects passed to functions are Series objects having index either the DataFrame’s index (axis=0) or the columns (axis=1) |
DataFrame.applymap(func) | Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame |
DataFrame.groupby([by, axis, level, ...]) | Group DataFrame using mapper (dict or key function) or by a series of columns |
Computations / Descriptive Stats
DataFrame.clip([upper, lower]) | Trim values at input threshold(s) |
DataFrame.clip_lower(threshold) | Trim values below threshold |
DataFrame.clip_upper(threshold) | Trim values above threshold |
DataFrame.corr([method]) | Compute pairwise correlation of columns, excluding NA/null values |
DataFrame.corrwith(other[, axis, drop]) | Compute pairwise correlation between rows or columns of two DataFrame objects |
DataFrame.count([axis, level, numeric_only]) | Return Series with number of non-NA/null observations over requested axis. Works with non-floating point data as well (detects NaN and None) |
DataFrame.cumprod([axis, skipna]) | Return cumulative product over requested axis as DataFrame |
DataFrame.cumsum([axis, skipna]) | Return DataFrame of cumulative sums over requested axis. |
DataFrame.describe([percentile_width]) | Generate various summary statistics of each column, excluding NaN values |
DataFrame.diff([periods]) | 1st discrete difference of object |
DataFrame.mad([axis, skipna, level]) | Return mean absolute deviation over requested axis. |
DataFrame.max([axis, skipna, level]) | Return maximum over requested axis. |
DataFrame.mean([axis, skipna, level]) | Return mean over requested axis. |
DataFrame.median([axis, skipna, level]) | Return median over requested axis. |
DataFrame.min([axis, skipna, level]) | Return minimum over requested axis. |
DataFrame.prod([axis, skipna, level]) | Return product over requested axis. |
DataFrame.quantile([q, axis]) | Return values at the given quantile over requested axis, a la scoreatpercentile in scipy.stats |
DataFrame.skew([axis, skipna, level]) | Return unbiased skewness over requested axis. |
DataFrame.sum([axis, numeric_only, skipna, ...]) | Return sum over requested axis. |
DataFrame.std([axis, skipna, level, ddof]) | Return standard deviation over requested axis. |
DataFrame.var([axis, skipna, level, ddof]) | Return variance over requested axis. |
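`mad` is the mean absolute deviation about the mean. The sketch below applies that reduction column by column (the `axis=0` case). It assumes NaN are skipped when `skipna=True`, and the helper names are hypothetical.

```python
import math


def mad(values, skipna=True):
    """Mean absolute deviation about the mean of a list of floats."""
    if skipna:
        values = [v for v in values if not math.isnan(v)]
    elif any(math.isnan(v) for v in values):
        return math.nan
    if not values:
        return math.nan
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)


def reduce_columns(frame, func):
    """Apply a reduction to each column of a {column: [values]} frame (axis=0)."""
    return {col: func(vals) for col, vals in frame.items()}


frame = {"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, math.nan, 2.0, 8.0]}
mads = reduce_columns(frame, mad)  # 'a': 1.0, 'b': 8/3 (NaN skipped)
```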
Reindexing / Selection / Label manipulation
DataFrame.add_prefix(prefix) | Concatenate prefix string with column names |
DataFrame.add_suffix(suffix) | Concatenate suffix string with column names |
DataFrame.align(other[, join, axis, level, ...]) | Align two DataFrame objects on their index and columns with the specified join method for each axis |
DataFrame.drop(labels[, axis, level]) | Return new object with labels in requested axis removed |
DataFrame.filter([items, like, regex]) | Restrict frame’s columns to set of items or wildcard |
DataFrame.reindex([index, columns, method, ...]) | Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index |
DataFrame.reindex_like(other[, method, copy]) | Reindex DataFrame to match indices of another DataFrame, optionally with filling logic |
DataFrame.rename([index, columns, copy]) | Alter index and / or columns using input function or functions. |
DataFrame.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
DataFrame.take(indices[, axis]) | Analogous to ndarray.take, return DataFrame corresponding to requested indices along an axis |
DataFrame.truncate([before, after, copy]) | Truncate a sorted DataFrame / Series before and/or after some particular index value |
DataFrame.head([n]) | Returns first n rows of DataFrame |
DataFrame.tail([n]) | Returns last n rows of DataFrame |
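The three selection modes of `filter` can be sketched over a list of column labels:

- `items` keeps listed labels that exist;
- `like` keeps labels containing a substring;
- `regex` keeps labels matched by `re.search`.

These matching rules are an assumption, and `filter_columns` is a hypothetical name.

```python
import re


def filter_columns(columns, items=None, like=None, regex=None):
    """Select column labels by explicit list, substring, or regular expression.

    Exactly one of items, like or regex must be given.
    """
    if sum(arg is not None for arg in (items, like, regex)) != 1:
        raise ValueError("pass exactly one of items, like or regex")
    if items is not None:
        # Keep requested labels that actually exist, in the requested order.
        return [c for c in items if c in columns]
    if like is not None:
        return [c for c in columns if like in c]
    return [c for c in columns if re.search(regex, c)]


cols = ["price_open", "price_close", "volume"]
print(filter_columns(cols, like="price"))             # ['price_open', 'price_close']
print(filter_columns(cols, regex="e$"))               # ['price_close', 'volume']
print(filter_columns(cols, items=["volume", "bid"]))  # ['volume']
```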
Missing data handling
DataFrame.dropna([axis, how, thresh, subset]) | Return object with labels on given axis omitted where, depending on how, any or all of the data are missing |
DataFrame.fillna([value, method, axis, inplace]) | Fill NA/NaN values using the specified method |
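The `how`, `thresh` and `subset` options of `dropna` can be sketched for dropping rows, with each row represented as a `{column: float}` dict. The sketch assumes that `thresh`, when given, overrides `how`, and that `subset` restricts which columns are checked. `dropna_rows` is a hypothetical name.

```python
import math


def dropna_rows(rows, how="any", thresh=None, subset=None):
    """Drop rows (dicts of column -> float) that have missing values."""
    kept = []
    for row in rows:
        cols = subset if subset is not None else list(row)
        n_valid = sum(not math.isnan(row[c]) for c in cols)
        if thresh is not None:
            keep = n_valid >= thresh          # at least thresh valid values
        elif how == "any":
            keep = n_valid == len(cols)       # drop if any value is missing
        elif how == "all":
            keep = n_valid > 0                # drop only if all are missing
        else:
            raise ValueError("how must be 'any' or 'all'")
        if keep:
            kept.append(row)
    return kept


nan = math.nan
frame_rows = [
    {"a": 1.0, "b": 2.0},
    {"a": nan, "b": 3.0},
    {"a": nan, "b": nan},
]
```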
Reshaping, sorting, transposing
DataFrame.sort_index([axis, by, ascending]) | Sort DataFrame either by labels (along either axis) or by the values in a column |
DataFrame.delevel(*args, **kwargs) | Deprecated alias of DataFrame.reset_index |
DataFrame.pivot([index, columns, values]) | Reshape data (produce a “pivot” table) based on column values. |
DataFrame.sortlevel([level, axis, ascending]) | Sort multilevel index by chosen axis and primary level. |
DataFrame.swaplevel(i, j[, axis]) | Swap levels i and j in a MultiIndex on a particular axis |
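DataFrame.stack([level, dropna]) | Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels |
DataFrame.unstack([level]) | Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels |
DataFrame.T | Returns a DataFrame with the rows/columns switched. If the DataFrame is homogeneously typed, the data is not copied |
DataFrame.transpose() | Returns a DataFrame with the rows/columns switched. If the DataFrame is homogeneously typed, the data is not copied |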
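`pivot` turns long-format records into a wide table keyed by the values of two columns. The pure-Python sketch below returns `{column_value: {index_value: value}}`. The helper name is hypothetical. The sketch raises on duplicate (index, column) pairs, because each cell of the result can hold only one value.

```python
import math


def pivot_records(records, index, columns, values):
    """Reshape a list of record dicts into a wide {column: {row: value}} table."""
    table, seen = {}, set()
    for rec in records:
        key = (rec[index], rec[columns])
        if key in seen:
            raise ValueError("duplicate entry for %r" % (key,))
        seen.add(key)
        table.setdefault(rec[columns], {})[rec[index]] = rec[values]
    # Fill cells that had no record with NaN so every column has every row.
    row_labels = sorted({rec[index] for rec in records})
    return {col: {r: cells.get(r, math.nan) for r in row_labels}
            for col, cells in sorted(table.items())}


records = [
    {"date": "2012-01-02", "item": "A", "value": 1.0},
    {"date": "2012-01-02", "item": "B", "value": 2.0},
    {"date": "2012-01-03", "item": "A", "value": 3.0},
]
wide = pivot_records(records, index="date", columns="item", values="value")
# wide["A"] == {'2012-01-02': 1.0, '2012-01-03': 3.0}; wide["B"]["2012-01-03"] is nan
```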
Combining / joining / merging
DataFrame.join(other[, on, how, lsuffix, ...]) | Join columns with other DataFrame either on index or on a key column |
DataFrame.merge(right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation by columns or indexes |
DataFrame.append(other[, ignore_index, ...]) | Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns. |
Time series-related
DataFrame.asfreq(freq[, method]) | Convert all TimeSeries inside to specified frequency using DateOffset objects. Optionally provide a fill method to pad/backfill missing values. |
DataFrame.shift(periods[, offset]) | Shift the index of the DataFrame by desired number of periods with an optional time offset |
DataFrame.first_valid_index() | Return label for first non-NA/null value |
DataFrame.last_valid_index() | Return label for last non-NA/null value |
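The following sketches position-based shifting and the lookup of the first and last valid labels, using parallel lists of labels and values. It assumes that without an offset the labels stay put while the values move, with NaN padding the vacated positions. Shifting by a time offset is not modeled. Helper names are hypothetical.

```python
import math


def shift_values(values, periods):
    """Move values forward (periods > 0) or backward (periods < 0) by whole
    positions, leaving labels in place and padding with NaN."""
    n = len(values)
    if periods >= 0:
        k = min(periods, n)
        return [math.nan] * k + values[: n - k]
    k = min(-periods, n)
    return values[k:] + [math.nan] * k


def first_valid_index(labels, values):
    """Label of the first non-NaN value, or None if there is none."""
    for label, v in zip(labels, values):
        if not math.isnan(v):
            return label
    return None


def last_valid_index(labels, values):
    """Label of the last non-NaN value, or None if there is none."""
    return first_valid_index(labels[::-1], values[::-1])


labels = ["2012-01-02", "2012-01-03", "2012-01-04", "2012-01-05"]
values = [math.nan, 1.0, 2.0, math.nan]
ahead = shift_values([1.0, 2.0, 3.0], 1)    # [nan, 1.0, 2.0]
behind = shift_values([1.0, 2.0, 3.0], -1)  # [2.0, 3.0, nan]
```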
Plotting
DataFrame.hist(data[, grid, xlabelsize, ...]) | Draw histograms of the DataFrame’s series using matplotlib / pylab |
DataFrame.plot([frame, subplots, sharex, ...]) | Make line or bar plot of DataFrame’s series with the index on the x-axis |
Serialization / IO / Conversion
DataFrame.from_csv(path[, header, sep, ...]) | Read delimited file into DataFrame |
DataFrame.from_records(data[, index, ...]) | Convert structured or record ndarray to DataFrame |
DataFrame.to_csv(path_or_buf[, sep, na_rep, ...]) | Write DataFrame to a comma-separated values (csv) file |
DataFrame.to_excel(excel_writer[, ...]) | Write DataFrame to an Excel sheet |
DataFrame.to_dict() | Convert DataFrame to nested dictionary |
DataFrame.to_records([index]) | Convert DataFrame to record array. Index will be put in the 'index' field of the record array if requested |
DataFrame.to_sparse([fill_value, kind]) | Convert to SparseDataFrame |
DataFrame.to_string([buf, columns, ...]) | Render a DataFrame to a console-friendly tabular output. |
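DataFrame.save(path) | Pickle (serialize) DataFrame to input file path |
DataFrame.load(path) | Load pickled DataFrame from the specified file path |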
DataFrame.info([verbose, buf]) | Concise summary of a DataFrame, used in __repr__ when very large. |