API Reference
General functions
Data manipulations
pivot_table(data[, values, rows, cols, ...]) | Create a spreadsheet-style pivot table as a DataFrame. |
merge(left, right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation. |
concat(objs[, axis, join, join_axes, ...]) | Concatenate pandas objects along a particular axis with optional set logic along the other axes. |
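As a minimal sketch of the join and concatenation behaviour summarised above, the example below uses only the `how`, `on`, `axis` and `join` arguments shown in the signatures. The frames and column names are made up for illustration.

```python
import pandas as pd

# Two small illustrative frames sharing a "key" column (hypothetical data).
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Database-style inner join: only keys present in both frames survive.
inner = pd.merge(left, right, how="inner", on="key")

# Concatenate along the row axis; join="inner" applies set logic to the
# other axis, keeping only the columns common to both inputs ("key").
stacked = pd.concat([left, right], axis=0, join="inner")

print(inner)
print(stacked)
```

With `join="outer"` the concatenation would instead keep the union of columns and fill the gaps with NaN.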
Pickling
load(path) | Load pickled pandas object (or any other pickled object) from the specified file path |
save(obj, path) | Pickle (serialize) object to input file path |
File IO
read_table(filepath_or_buffer[, sep, ...]) | Read general delimited file into DataFrame |
read_csv(filepath_or_buffer[, sep, dialect, ...]) | Read CSV (comma-separated) file into DataFrame |
ExcelFile.parse(sheetname[, header, ...]) | Read Excel table into DataFrame |
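The readers accept a buffer as well as a file path (`filepath_or_buffer`), so a quick way to try them is with an in-memory string. The sketch below assumes Python 3's `io.StringIO` and relies on `read_table` defaulting to a tab separator; the sample text is invented.

```python
import io

import pandas as pd

# Hypothetical comma-separated content held in memory instead of a file.
csv_text = "name,score\nann,1\nbob,2\n"

# read_csv parses comma-separated input by default.
from_csv = pd.read_csv(io.StringIO(csv_text))

# read_table handles general delimited input; here the same data, tab-separated.
tsv_text = csv_text.replace(",", "\t")
from_table = pd.read_table(io.StringIO(tsv_text))

print(from_csv)
print(from_table)
```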
HDFStore: PyTables (HDF5)
HDFStore.put(key, value[, table, append, ...]) | Store object in HDFStore |
HDFStore.get(key) | Retrieve pandas object stored in file |
Standard moving window functions
rolling_count(arg, window[, freq, center, ...]) | Rolling count of number of non-NaN observations inside provided window. |
rolling_sum(arg, window[, min_periods, ...]) | Moving sum |
rolling_mean(arg, window[, min_periods, ...]) | Moving mean |
rolling_median(arg, window[, min_periods, ...]) | Moving median (O(N log(window)) implementation using a skip list) |
rolling_var(arg, window[, min_periods, ...]) | Unbiased moving variance |
rolling_std(arg, window[, min_periods, ...]) | Unbiased moving standard deviation |
rolling_corr(arg1, arg2, window[, ...]) | Moving sample correlation |
rolling_cov(arg1, arg2, window[, ...]) | Unbiased moving covariance |
rolling_skew(arg, window[, min_periods, ...]) | Unbiased moving skewness |
rolling_kurt(arg, window[, min_periods, ...]) | Unbiased moving kurtosis |
rolling_apply(arg, window, func[, ...]) | Generic moving function application |
rolling_quantile(arg, window, quantile[, ...]) | Moving quantile |
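To make the roles of `window` and `min_periods` concrete, here is a plain-Python sketch of a trailing moving mean. It is not the library's implementation. It assumes the window covers the current observation plus the preceding `window - 1` ones, that NaN inputs are skipped, and that a result is NaN until at least `min_periods` valid values are available (defaulting to a full window). The helper name `rolling_mean_sketch` is hypothetical.

```python
import math


def rolling_mean_sketch(values, window, min_periods=None):
    """Trailing moving mean over a list of floats (illustrative only)."""
    if min_periods is None:
        min_periods = window  # assumption: require a full window by default
    out = []
    for i in range(len(values)):
        # Current observation plus up to window - 1 earlier ones.
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        if len(valid) >= min_periods:
            out.append(sum(valid) / len(valid))
        else:
            out.append(float("nan"))
    return out


result = rolling_mean_sketch([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(result)
```

Passing `min_periods=1` in this sketch yields a value from the first observation onward, computed over however many points are available so far.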
Standard expanding window functions
expanding_count(arg[, freq, center, time_rule]) | Expanding count of number of non-NaN observations. |
expanding_sum(arg[, min_periods, freq, ...]) | Expanding sum |
expanding_mean(arg[, min_periods, freq, ...]) | Expanding mean |
expanding_median(arg[, min_periods, freq, ...]) | Expanding median (O(N log(window)) implementation using a skip list) |
expanding_var(arg[, min_periods, freq, ...]) | Unbiased expanding variance |
expanding_std(arg[, min_periods, freq, ...]) | Unbiased expanding standard deviation |
expanding_corr(arg1, arg2[, min_periods, ...]) | Expanding sample correlation |
expanding_cov(arg1, arg2[, min_periods, ...]) | Unbiased expanding covariance |
expanding_skew(arg[, min_periods, freq, ...]) | Unbiased expanding skewness |
expanding_kurt(arg[, min_periods, freq, ...]) | Unbiased expanding kurtosis |
expanding_apply(arg, func[, min_periods, ...]) | Generic expanding function application |
expanding_quantile(arg, quantile[, ...]) | Expanding quantile |
Exponentially-weighted moving window functions
ewma(arg[, com, span, min_periods, freq, ...]) | Exponentially-weighted moving average |
ewmstd(arg[, com, span, min_periods, bias, ...]) | Exponentially-weighted moving std |
ewmvar(arg[, com, span, min_periods, bias, ...]) | Exponentially-weighted moving variance |
ewmcorr(arg1, arg2[, com, span, ...]) | Exponentially-weighted moving correlation |
ewmcov(arg1, arg2[, com, span, min_periods, ...]) | Exponentially-weighted moving covariance |
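The `com` (center of mass) and `span` arguments are two ways of specifying the same decay. The plain-Python sketch below uses the relations alpha = 1 / (1 + com) and com = (span - 1) / 2, and computes a weighted average with weights (1 - alpha)^i normalised by their sum. Whether the functions listed here apply exactly this normalisation by default is an assumption. The helper names are hypothetical.

```python
def com_from_span(span):
    # Center of mass equivalent to a given span: com = (span - 1) / 2.
    return (span - 1) / 2.0


def ewma_sketch(values, com):
    """Exponentially weighted mean with normalised weights (illustrative only)."""
    alpha = 1.0 / (1.0 + com)
    out = []
    for t in range(len(values)):
        # Weight (1 - alpha)**i for the observation i steps in the past.
        weights = [(1.0 - alpha) ** i for i in range(t + 1)]
        weighted_sum = sum(w * values[t - i] for i, w in enumerate(weights))
        out.append(weighted_sum / sum(weights))
    return out


smoothed = ewma_sketch([1.0, 2.0, 3.0], com=com_from_span(3))
print(smoothed)
```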
Series
Attributes and underlying data
Axes
- index: axis labels
Series.values | Return Series as ndarray |
Series.dtype | Data-type of the array’s elements. |
Series.isnull(obj) | Detect missing values (NaN in numeric arrays, None/NaN in object arrays) |
Series.notnull(obj) | Replacement for numpy.isfinite / -numpy.isnan which is suitable for use on object arrays. |
Conversion / Constructors
Series.__init__([data, index, dtype, name, copy]) | One-dimensional ndarray with axis labels (including time series). |
Series.astype(dtype) | See numpy.ndarray.astype |
Series.copy([order]) | Return new Series with copy of underlying values |
Indexing, iteration
Series.get(label[, default]) | Returns value occupying requested label, default to specified missing value if not present. |
Series.ix | |
Series.__iter__() | |
Series.iteritems([index]) | Lazily iterate over (index, value) tuples |
Binary operator functions
Series.add(other[, level, fill_value]) | Binary operator add with support to substitute a fill_value for missing data |
Series.div(other[, level, fill_value]) | Binary operator divide with support to substitute a fill_value for missing data |
Series.mul(other[, level, fill_value]) | Binary operator multiply with support to substitute a fill_value for missing data |
Series.sub(other[, level, fill_value]) | Binary operator subtract with support to substitute a fill_value for missing data |
Series.combine(other, func[, fill_value]) | Perform elementwise binary operation on two Series using given function |
Series.combine_first(other) | Combine Series values, choosing the calling Series’s values |
Series.round([decimals, out]) | Return the values with each element rounded to the given number of decimals. |
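The difference between the plain operators and the `fill_value` variants is easiest to see on two Series whose labels only partly overlap. This is a small sketch with invented data.

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "w"])

# The plain operator aligns on labels and gives NaN wherever one side is missing.
plain = a + b

# add() with fill_value substitutes 0.0 for a label missing on one side.
filled = a.add(b, fill_value=0.0)

# combine_first keeps the calling Series's values and falls back to the
# other Series only for labels the caller lacks.
patched = a.combine_first(b)

print(plain)
print(filled)
print(patched)
```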
Function application, GroupBy
Series.apply(func[, convert_dtype, args]) | Invoke function on values of Series. Can be a ufunc (a NumPy function) or a Python function |
Series.map(arg[, na_action]) | Map values of Series using input correspondence (a dict, Series, or function) |
Series.groupby([by, axis, level, as_index, ...]) | Group series using mapper (dict or key function), apply given function to group |
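A short sketch of the three methods side by side. The data and the grouping dict are illustrative. The dict passed to `groupby` maps index labels to group names.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# apply: call a Python function on each value.
squared = s.apply(lambda v: v * v)

# map: translate values through a dict correspondence.
parity = s.map({1: "odd", 2: "even", 3: "odd", 4: "even"})

# groupby with a dict mapper: index labels -> group names, then aggregate.
sums = s.groupby({"a": "first", "b": "first", "c": "second", "d": "second"}).sum()

print(squared)
print(parity)
print(sums)
```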
Computations / Descriptive Stats
Series.abs() | Return an object with absolute value taken. |
Series.any([axis, out]) | Returns True if any of the elements evaluate to True. |
Series.autocorr() | Lag-1 autocorrelation |
Series.between(left, right[, inclusive]) | Return boolean Series equivalent to left <= series <= right. |
Series.clip([lower, upper, out]) | Trim values at input threshold(s) |
Series.clip_lower(threshold) | Return copy of series with values below given value truncated |
Series.clip_upper(threshold) | Return copy of series with values above given value truncated |
Series.corr(other[, method, min_periods]) | Compute correlation between two Series, excluding missing values |
Series.count([level]) | Return number of non-NA/null observations in the Series |
Series.cov(other[, min_periods]) | Compute covariance with Series, excluding missing values |
Series.cummax([axis, dtype, out, skipna]) | Cumulative max of values. |
Series.cummin([axis, dtype, out, skipna]) | Cumulative min of values. |
Series.cumprod([axis, dtype, out, skipna]) | Cumulative product of values. |
Series.cumsum([axis, dtype, out, skipna]) | Cumulative sum of values. |
Series.describe([percentile_width]) | Generate various summary statistics of Series, excluding NaN values |
Series.diff([periods]) | 1st discrete difference of object |
Series.kurt([skipna, level]) | Return unbiased kurtosis of values |
Series.mad([skipna, level]) | Return mean absolute deviation of values |
Series.max([axis, out, skipna, level]) | Return maximum of values |
Series.mean([axis, dtype, out, skipna, level]) | Return mean of values |
Series.median([axis, dtype, out, skipna, level]) | Return median of values |
Series.min([axis, out, skipna, level]) | Return minimum of values |
Series.nunique() | Return count of unique elements in the Series |
Series.pct_change([periods, fill_method, ...]) | Percent change over given number of periods |
Series.prod([axis, dtype, out, skipna, level]) | Return product of values |
Series.quantile([q]) | Return value at the given quantile, a la scoreatpercentile |
Series.rank([method, na_option, ascending]) | Compute data ranks (1 through n). |
Series.skew([skipna, level]) | Return unbiased skewness of values |
Series.std([axis, dtype, out, ddof, skipna, ...]) | Return standard deviation of values |
Series.sum([axis, dtype, out, skipna, level]) | Return sum of values |
Series.unique() | Return array of unique values in the Series. |
Series.var([axis, dtype, out, ddof, skipna, ...]) | Return variance of values |
Series.value_counts() | Returns Series containing counts of unique values. |
Reindexing / Selection / Label manipulation
Series.align(other[, join, level, copy, ...]) | Align two Series objects with the specified join method |
Series.drop(labels[, axis, level]) | Return new object with labels in requested axis removed |
Series.first(offset) | Convenience method for subsetting initial periods of time series data |
Series.head([n]) | Returns first n rows of Series |
Series.idxmax([axis, out, skipna]) | Index of first occurrence of maximum of values. |
Series.idxmin([axis, out, skipna]) | Index of first occurrence of minimum of values. |
Series.isin(values) | Return boolean vector showing whether each element in the Series is contained in values |
Series.last(offset) | Convenience method for subsetting final periods of time series data |
Series.reindex([index, method, level, ...]) | Conform Series to new index with optional filling logic |
Series.reindex_like(other[, method, limit]) | Reindex Series to match index of another Series, optionally with filling logic |
Series.rename(mapper[, inplace]) | Alter Series index using dict or function |
Series.reset_index([level, drop, name, inplace]) | Analogous to the DataFrame.reset_index function, see docstring there. |
Series.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
Series.take(indices[, axis]) | Analogous to ndarray.take, return Series corresponding to requested indices |
Series.tail([n]) | Returns last n rows of Series |
Series.truncate([before, after, copy]) | Truncate a sorted DataFrame / Series before and/or after the given index values |
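The sketch below illustrates a few of the label-based selection methods on a small Series with invented labels. The `fill_value` keyword to `reindex` is assumed to be among the arguments elided by the `...` in the signature above.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# reindex conforms to a new set of labels; labels not present become NaN.
conformed = s.reindex(["c", "a", "z"])

# With fill_value, missing labels get the given value instead of NaN.
conformed_filled = s.reindex(["c", "a", "z"], fill_value=0)

# head returns the first n entries; drop returns a new object without the labels.
first_two = s.head(2)
without_b = s.drop(["b"])

print(conformed)
print(conformed_filled)
```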
Missing data handling
Series.dropna() | Return Series without null values |
Series.fillna([value, method, inplace, limit]) | Fill NA/NaN values using the specified method |
Series.interpolate([method]) | Interpolate missing values (after the first valid value) |
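A minimal sketch contrasting `dropna` and `fillna` with a constant value, using invented data. Both return new objects and leave the original untouched unless `inplace` is requested.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

# Count missing entries using the isnull detector.
n_missing = int(s.isnull().sum())

# dropna removes the missing entries and keeps the original labels.
dropped = s.dropna()

# fillna with a scalar replaces every missing entry by that value.
filled = s.fillna(0.0)

print(n_missing)
print(dropped)
print(filled)
```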
Reshaping, sorting
Series.argsort([axis, kind, order]) | Overrides ndarray.argsort. |
Series.order([na_last, ascending, kind]) | Sorts Series object, by value, maintaining index-value link |
Series.reorder_levels(order) | Rearrange index levels using input order. |
Series.sort([axis, kind, order]) | Sort values and index labels by value, in place. |
Series.sort_index([ascending]) | Sort object by labels (along an axis) |
Series.sortlevel([level, ascending]) | Sort Series with MultiIndex by chosen level. |
Series.swaplevel(i, j[, copy]) | Swap levels i and j in a MultiIndex |
Series.unstack([level]) | Unstack, a.k.a. pivot, a Series with MultiIndex |
Combining / joining / merging
Series.append(to_append[, verify_integrity]) | Concatenate two or more Series. The indexes must not overlap |
Series.replace(to_replace[, value, method, ...]) | Replace arbitrary values in a Series |
Series.update(other) | Modify Series in place using non-NA values from the passed Series |
Plotting
Series.hist([by, ax, grid, xlabelsize, ...]) | Draw histogram of the input series using matplotlib |
Series.plot(series[, label, kind, ...]) | Plot the input series with the index on the x-axis using matplotlib |
Serialization / IO / Conversion
Series.from_csv(path[, sep, parse_dates, ...]) | Read delimited file into Series |
Series.load(path) | |
Series.save(path) | |
Series.to_csv(path[, index, sep, na_rep, ...]) | Write Series to a comma-separated values (csv) file |
Series.to_dict() | Convert Series to {label -> value} dict |
Series.to_sparse([kind, fill_value]) | Convert Series to SparseSeries |
Series.to_string([buf, na_rep, ...]) | Render a string representation of the Series |
DataFrame
Attributes and underlying data
Axes
- index: row labels
- columns: column labels
DataFrame.as_matrix([columns]) | Convert the frame to its NumPy-array matrix representation. |
DataFrame.dtypes | |
DataFrame.get_dtype_counts() | |
DataFrame.values | Convert the frame to its NumPy-array matrix representation. |
DataFrame.axes | |
DataFrame.ndim | |
DataFrame.shape | |
Conversion / Constructors
DataFrame.__init__([data, index, columns, ...]) | Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). |
DataFrame.astype(dtype) | Cast object to input numpy.dtype |
DataFrame.convert_objects([convert_dates]) | Attempt to infer better dtype for object columns |
DataFrame.copy([deep]) | Make a copy of this object |
Indexing, iteration
DataFrame.head([n]) | Returns first n rows of DataFrame |
DataFrame.ix | |
DataFrame.insert(loc, column, value) | Insert column into DataFrame at specified location. |
DataFrame.__iter__() | Iterate over columns of the frame. |
DataFrame.iteritems() | Iterator over (column, series) pairs |
DataFrame.iterrows() | Iterate over rows of DataFrame as (index, Series) pairs |
DataFrame.itertuples([index]) | Iterate over rows of DataFrame as tuples, optionally including the index value |
DataFrame.lookup(row_labels, col_labels) | Label-based “fancy indexing” function for DataFrame. |
DataFrame.pop(item) | Return column and drop from frame. |
DataFrame.tail([n]) | Returns last n rows of DataFrame |
DataFrame.xs(key[, axis, level, copy]) | Returns a cross-section (row(s) or column(s)) from the DataFrame. |
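The iteration methods differ in what they yield. The sketch below, on a small invented frame, shows that plain iteration gives column labels, `iterrows` gives (index, Series) pairs, and `itertuples` gives one tuple per row with the index value first. `xs` is used to pull out a single row by label.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

# Iterating over the frame itself yields the column labels.
column_labels = list(df)

# iterrows yields (index label, row as Series) pairs.
row_totals = [(label, row["a"] + row["b"]) for label, row in df.iterrows()]

# itertuples yields one tuple per row, index value first.
row_tuples = [tuple(t) for t in df.itertuples()]

# xs returns a cross-section; here the row labelled "r1" as a Series.
first_row = df.xs("r1")

print(column_labels)
print(row_totals)
print(row_tuples)
print(first_row)
```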
Binary operator functions
DataFrame.add(other[, axis, level, fill_value]) | Binary operator add with support to substitute a fill_value for missing data |
DataFrame.div(other[, axis, level, fill_value]) | Binary operator divide with support to substitute a fill_value for missing data |
DataFrame.mul(other[, axis, level, fill_value]) | Binary operator multiply with support to substitute a fill_value for missing data |
DataFrame.sub(other[, axis, level, fill_value]) | Binary operator subtract with support to substitute a fill_value for missing data |
DataFrame.radd(other[, axis, level, fill_value]) | Binary operator radd with support to substitute a fill_value for missing data |
DataFrame.rdiv(other[, axis, level, fill_value]) | Binary operator rdivide with support to substitute a fill_value for missing data |
DataFrame.rmul(other[, axis, level, fill_value]) | Binary operator rmultiply with support to substitute a fill_value for missing data |
DataFrame.rsub(other[, axis, level, fill_value]) | Binary operator rsubtract with support to substitute a fill_value for missing data |
DataFrame.combine(other, func[, fill_value]) | Combine two DataFrame objects using the given function and do not propagate NaN values |
DataFrame.combineAdd(other) | Add two DataFrame objects and do not propagate NaN values |
DataFrame.combine_first(other) | Combine two DataFrame objects and default to non-null values in the calling frame |
DataFrame.combineMult(other) | Multiply two DataFrame objects and do not propagate NaN values |
Function application, GroupBy
DataFrame.apply(func[, axis, broadcast, ...]) | Applies function along input axis of DataFrame. |
DataFrame.applymap(func) | Apply a function to a DataFrame that is intended to operate elementwise |
DataFrame.groupby([by, axis, level, ...]) | Group series using mapper (dict or key function), apply given function to group |
Computations / Descriptive Stats
DataFrame.abs() | Return an object with absolute value taken. |
DataFrame.any([axis, bool_only, skipna, level]) | Return whether any element is True over requested axis. |
DataFrame.clip([upper, lower]) | Trim values at input threshold(s) |
DataFrame.clip_lower(threshold) | Trim values below threshold |
DataFrame.clip_upper(threshold) | Trim values above threshold |
DataFrame.corr([method, min_periods]) | Compute pairwise correlation of columns, excluding NA/null values |
DataFrame.corrwith(other[, axis, drop]) | Compute pairwise correlation between rows or columns of two DataFrame objects |
DataFrame.count([axis, level, numeric_only]) | Return Series with number of non-NA/null observations over requested axis |
DataFrame.cov([min_periods]) | Compute pairwise covariance of columns, excluding NA/null values |
DataFrame.cummax([axis, skipna]) | Return DataFrame of cumulative max over requested axis. |
DataFrame.cummin([axis, skipna]) | Return DataFrame of cumulative min over requested axis. |
DataFrame.cumprod([axis, skipna]) | Return cumulative product over requested axis as DataFrame |
DataFrame.cumsum([axis, skipna]) | Return DataFrame of cumulative sums over requested axis. |
DataFrame.describe([percentile_width]) | Generate various summary statistics of each column, excluding NaN values |
DataFrame.diff([periods]) | 1st discrete difference of object |
DataFrame.kurt([axis, skipna, level]) | Return unbiased kurtosis over requested axis. |
DataFrame.mad([axis, skipna, level]) | Return mean absolute deviation over requested axis. |
DataFrame.max([axis, skipna, level]) | Return maximum over requested axis. |
DataFrame.mean([axis, skipna, level]) | Return mean over requested axis. |
DataFrame.median([axis, skipna, level]) | Return median over requested axis. |
DataFrame.min([axis, skipna, level]) | Return minimum over requested axis. |
DataFrame.pct_change([periods, fill_method, ...]) | Percent change over given number of periods |
DataFrame.prod([axis, skipna, level]) | Return product over requested axis. |
DataFrame.quantile([q, axis]) | Return values at the given quantile over requested axis |
DataFrame.rank([axis, numeric_only, method, ...]) | Compute numerical data ranks (1 through n) along axis. |
DataFrame.skew([axis, skipna, level]) | Return unbiased skewness over requested axis. |
DataFrame.sum([axis, numeric_only, skipna, ...]) | Return sum over requested axis. |
DataFrame.std([axis, skipna, level, ddof]) | Return standard deviation over requested axis. |
DataFrame.var([axis, skipna, level, ddof]) | Return variance over requested axis. |
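Most of the reductions above take an `axis` argument. The sketch below, with invented data, shows the default column-wise reduction, a row-wise reduction with `axis=1`, a cumulative variant, and the pairwise column correlation.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]})

# Default axis: one result per column.
col_means = df.mean()

# axis=1: one result per row.
row_sums = df.sum(axis=1)

# Cumulative sum down each column; the result has the same shape as df.
running = df.cumsum()

# Pairwise correlation of columns; "b" is exactly 2 * "a" here.
corr = df.corr()

print(col_means)
print(row_sums)
print(running)
print(corr)
```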
Reindexing / Selection / Label manipulation
DataFrame.add_prefix(prefix) | Concatenate prefix string with column names. |
DataFrame.add_suffix(suffix) | Concatenate suffix string with column names. |
DataFrame.align(other[, join, axis, level, ...]) | Align two DataFrame objects on their index and columns with the specified join method |
DataFrame.drop(labels[, axis, level]) | Return new object with labels in requested axis removed |
DataFrame.drop_duplicates([cols, take_last, ...]) | Return DataFrame with duplicate rows removed, optionally only considering certain columns |
DataFrame.duplicated([cols, take_last]) | Return boolean Series denoting duplicate rows, optionally only considering certain columns |
DataFrame.filter([items, like, regex]) | Restrict frame’s columns to set of items or wildcard |
DataFrame.first(offset) | Convenience method for subsetting initial periods of time series data |
DataFrame.head([n]) | Returns first n rows of DataFrame |
DataFrame.idxmax([axis, skipna]) | Return index of first occurrence of maximum over requested axis. |
DataFrame.idxmin([axis, skipna]) | Return index of first occurrence of minimum over requested axis. |
DataFrame.last(offset) | Convenience method for subsetting final periods of time series data |
DataFrame.reindex([index, columns, method, ...]) | Conform DataFrame to new index with optional filling logic |
DataFrame.reindex_axis(labels[, axis, ...]) | Conform DataFrame to new index with optional filling logic |
DataFrame.reindex_like(other[, method, ...]) | Reindex DataFrame to match indices of another DataFrame, optionally with filling logic |
DataFrame.rename([index, columns, copy, inplace]) | Alter index and / or columns using input function or functions. |
DataFrame.reset_index([level, drop, ...]) | For DataFrame with multi-level index, return new DataFrame with the index levels moved into columns |
DataFrame.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
DataFrame.set_index(keys[, drop, append, ...]) | Set the DataFrame index (row labels) using one or more existing columns |
DataFrame.tail([n]) | Returns last n rows of DataFrame |
DataFrame.take(indices[, axis]) | Analogous to ndarray.take, return DataFrame corresponding to requested indices |
DataFrame.truncate([before, after, copy]) | Truncate a sorted DataFrame / Series before and/or after the given index values |
Missing data handling
DataFrame.dropna([axis, how, thresh, subset]) | Return object with labels on given axis omitted where alternately any or all of the data are missing |
DataFrame.fillna([value, method, axis, ...]) | Fill NA/NaN values using the specified method |
Reshaping, sorting, transposing
DataFrame.delevel(*args, **kwargs) | |
DataFrame.pivot([index, columns, values]) | Reshape data (produce a “pivot” table) based on column values. |
DataFrame.reorder_levels(order[, axis]) | Rearrange index levels using input order. |
DataFrame.sort([columns, column, axis, ...]) | Sort DataFrame either by labels (along either axis) or by the values in one or more columns |
DataFrame.sort_index([axis, by, ascending, ...]) | Sort DataFrame either by labels (along either axis) or by the values in one or more columns |
DataFrame.sortlevel([level, axis, ...]) | Sort multilevel index by chosen axis and primary level. |
DataFrame.swaplevel(i, j[, axis]) | Swap levels i and j in a MultiIndex on a particular axis |
DataFrame.stack([level, dropna]) | Pivot a level of the (possibly hierarchical) column labels |
DataFrame.unstack([level]) | Pivot a level of the (necessarily hierarchical) index labels |
DataFrame.T | Returns a DataFrame with the rows/columns switched. |
DataFrame.to_panel() | Transform long (stacked) format (DataFrame) into wide (3D, Panel) format |
DataFrame.transpose() | Returns a DataFrame with the rows/columns switched. |
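The reshaping methods are inverses of one another in simple cases. As a sketch with invented data, the example pivots a long table into a wide one, stacks the columns back into a hierarchical index, and unstacks again. Recent pandas releases may emit a warning about the `stack` implementation, but the result for this input is the same.

```python
import pandas as pd

# Long ("stacked") layout: one row per (date, item) observation.
long_format = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "item": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})

# pivot: unique "date" values become the index, "item" values the columns.
wide = long_format.pivot(index="date", columns="item", values="value")

# stack: move the column labels into an inner index level (gives a Series here).
stacked = wide.stack()

# unstack: pivot the inner index level back out into columns.
roundtrip = stacked.unstack()

# T / transpose: switch rows and columns.
transposed = wide.T

print(wide)
print(stacked)
print(transposed)
```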
Combining / joining / merging
DataFrame.append(other[, ignore_index, ...]) | Append columns of other to end of this frame’s columns and index, returning a new object. |
DataFrame.join(other[, on, how, lsuffix, ...]) | Join columns with other DataFrame either on index or on a key column |
DataFrame.merge(right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation |
DataFrame.replace(to_replace[, value, ...]) | Replace values given in ‘to_replace’ with ‘value’ or using ‘method’ |
DataFrame.update(other[, join, overwrite, ...]) | Modify DataFrame in place using non-NA values from the passed DataFrame |
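`join` returns a new frame, whereas `update` modifies the caller in place. The sketch below uses invented frames: an index-based outer join, followed by an update in which only the non-NA values of the patch are applied.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"y": [10, 20]}, index=["b", "c"])

# Index-on-index outer join: the union of row labels, NaN where a side is absent.
joined = left.join(right, how="outer")

base = pd.DataFrame({"x": [1.0, 2.0]}, index=["a", "b"])
patch = pd.DataFrame({"x": [np.nan, 99.0]}, index=["a", "b"])

# update works in place and returns None; NaN entries in patch are ignored.
base.update(patch)

print(joined)
print(base)
```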
Time series-related
DataFrame.asfreq(freq[, method, how, normalize]) | Convert all TimeSeries inside to specified frequency using DateOffset |
DataFrame.shift([periods, freq]) | Shift the index of the DataFrame by desired number of periods with an optional frequency |
DataFrame.first_valid_index() | Return label for first non-NA/null value |
DataFrame.last_valid_index() | Return label for last non-NA/null value |
DataFrame.resample(rule[, how, axis, ...]) | Convenience method for frequency conversion and resampling of regular time-series data. |
DataFrame.to_period([freq, axis, copy]) | Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency |
DataFrame.to_timestamp([freq, how, axis, copy]) | Cast to DatetimeIndex of timestamps, at beginning of period |
DataFrame.tz_convert(tz[, axis, copy]) | Convert TimeSeries to target time zone. |
DataFrame.tz_localize(tz[, axis, copy]) | Localize tz-naive TimeSeries to target time zone |
Plotting
DataFrame.boxplot([column, by, ax, ...]) | Make a box plot from DataFrame column/columns optionally grouped by one or more columns |
DataFrame.hist(data[, column, by, grid, ...]) | Draw histogram of the DataFrame’s series using matplotlib / pylab. |
DataFrame.plot([frame, x, y, subplots, ...]) | Make line or bar plot of DataFrame’s series with the index on the x-axis |
Serialization / IO / Conversion
DataFrame.from_csv(path[, header, sep, ...]) | Read delimited file into DataFrame |
DataFrame.from_dict(data[, orient, dtype]) | Construct DataFrame from dict of array-like or dicts |
DataFrame.from_items(items[, columns, orient]) | Convert (key, value) pairs to DataFrame. |
DataFrame.from_records(data[, index, ...]) | Convert structured or record ndarray to DataFrame |
DataFrame.info([verbose, buf, max_cols]) | Concise summary of a DataFrame, used in __repr__ when very large. |
DataFrame.load(path) | |
DataFrame.save(path) | |
DataFrame.to_csv(path_or_buf[, sep, na_rep, ...]) | Write DataFrame to a comma-separated values (csv) file |
DataFrame.to_dict([outtype]) | Convert DataFrame to dictionary. |
DataFrame.to_excel(excel_writer[, ...]) | Write DataFrame to an Excel sheet |
DataFrame.to_html([buf, columns, col_space, ...]) | Render a DataFrame as an HTML table. |
DataFrame.to_records([index]) | Convert DataFrame to record array. |
DataFrame.to_sparse([fill_value, kind]) | Convert to SparseDataFrame |
DataFrame.to_string([buf, columns, ...]) | Render a DataFrame to a console-friendly tabular output. |