API Reference
General functions
Data manipulations
| pivot_table(data[, values, rows, cols, ...]) | Create a spreadsheet-style pivot table as a DataFrame. |
| merge(left, right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation by columns or indexes. |
| concat(objs[, axis, join, join_axes, ...]) | Concatenate pandas objects along a particular axis with optional set logic along the other axes. |
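As a rough illustration of the join and concatenation functions listed above, the sketch below merges two small frames and stacks a frame on itself. The frames and their column names (`key`, `x`, `y`) are hypothetical sample data, not part of the reference.

```python
import pandas as pd

# Hypothetical sample frames; column names are illustrative only.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# Database-style inner join on the shared "key" column.
inner = pd.merge(left, right, on="key", how="inner")

# A left join keeps every row of `left`; unmatched rows get NaN in "y".
left_joined = pd.merge(left, right, on="key", how="left")

# Concatenate along the row axis; the original row labels are preserved.
stacked = pd.concat([left, left])
```

With `how="inner"` only keys present in both frames survive, while `how="left"` keeps the unmatched key `"c"` with a missing `y` value.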
Pickling
| load(path) | Load pickled pandas object (or any other pickled object) from the specified file path |
| save(obj, path) | Pickle (serialize) object to input file path |
File IO
| read_table(filepath_or_buffer[, sep, ...]) | Read general delimited file into DataFrame |
| read_csv(filepath_or_buffer[, sep, dialect, ...]) | Read CSV (comma-separated) file into DataFrame |
| ExcelFile.parse(sheetname[, header, ...]) | Read Excel table into DataFrame |
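A minimal sketch of reading delimited text with `read_csv`, assuming an in-memory buffer in place of a file path. The sample text and column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical comma-separated content held in memory instead of on disk.
csv_text = "name,score\nann,1.5\nbob,2.5\n"
frame = pd.read_csv(io.StringIO(csv_text))

# The same reader handles other delimiters through the `sep` argument.
tsv_text = "name\tscore\nann\t1.5\n"
tab_frame = pd.read_csv(io.StringIO(tsv_text), sep="\t")
```

Both calls accept either a path or a file-like buffer, which is what the `filepath_or_buffer` parameter name in the table refers to.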
HDFStore: PyTables (HDF5)
| HDFStore.put(key, value[, table, append, ...]) | Store object in HDFStore |
| HDFStore.get(key) | Retrieve pandas object stored in file |
Standard moving window functions
| rolling_count(arg, window[, freq, time_rule]) | Rolling count of number of non-NaN observations inside provided window. |
| rolling_sum(arg, window[, min_periods, ...]) | Moving sum |
| rolling_mean(arg, window[, min_periods, ...]) | Moving mean |
| rolling_median(arg, window[, min_periods, ...]) | Moving median; O(N log(window)) implementation using skip list |
| rolling_var(arg, window[, min_periods, ...]) | Unbiased moving variance |
| rolling_std(arg, window[, min_periods, ...]) | Unbiased moving standard deviation |
| rolling_corr(arg1, arg2, window[, ...]) | Moving sample correlation |
| rolling_cov(arg1, arg2, window[, ...]) | Unbiased moving covariance |
| rolling_skew(arg, window[, min_periods, ...]) | Unbiased moving skewness |
| rolling_kurt(arg, window[, min_periods, ...]) | Unbiased moving kurtosis |
| rolling_apply(arg, window, func[, ...]) | Generic moving function application |
| rolling_quantile(arg, window, quantile[, ...]) | Moving quantile |
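To make the `window` and `min_periods` arguments concrete, here is a pure-Python sketch of what a trailing moving mean computes. It is not the library's implementation, the helper name `rolling_mean_sketch` is hypothetical, and it assumes that `min_periods` defaults to the window size when omitted.

```python
import math

def rolling_mean_sketch(values, window, min_periods=None):
    """Illustrative trailing moving mean; not pandas' implementation."""
    if min_periods is None:
        # Assumption: require a full window unless told otherwise.
        min_periods = window
    out = []
    for i in range(len(values)):
        # Trailing window that ends at position i.
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        if len(valid) >= min_periods:
            out.append(sum(valid) / len(valid))
        else:
            out.append(float("nan"))
    return out

full_window = rolling_mean_sketch([1.0, 2.0, 3.0, 4.0], window=2)
relaxed = rolling_mean_sketch([1.0, 2.0, 3.0, 4.0], window=2, min_periods=1)
```

In this sketch the first element of `full_window` is NaN because only one observation is available, whereas `relaxed` reports a value from the first position onward.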
Standard expanding window functions
| expanding_count(arg[, freq, time_rule]) | Expanding count of number of non-NaN observations. |
| expanding_sum(arg[, min_periods, freq, ...]) | Expanding sum |
| expanding_mean(arg[, min_periods, freq, ...]) | Expanding mean |
| expanding_median(arg[, min_periods, freq, ...]) | Expanding median; O(N log(window)) implementation using skip list |
| expanding_var(arg[, min_periods, freq, ...]) | Unbiased expanding variance |
| expanding_std(arg[, min_periods, freq, ...]) | Unbiased expanding standard deviation |
| expanding_corr(arg1, arg2[, min_periods, ...]) | Expanding sample correlation |
| expanding_cov(arg1, arg2[, min_periods, ...]) | Unbiased expanding covariance |
| expanding_skew(arg[, min_periods, freq, ...]) | Unbiased expanding skewness |
| expanding_kurt(arg[, min_periods, freq, ...]) | Unbiased expanding kurtosis |
| expanding_apply(arg, func[, min_periods, ...]) | Generic expanding function application |
| expanding_quantile(arg, quantile[, ...]) | Expanding quantile |
Exponentially-weighted moving window functions
| ewma(arg[, com, span, min_periods, freq, ...]) | Exponentially-weighted moving average |
| ewmstd(arg[, com, span, min_periods, bias, ...]) | Exponentially-weighted moving std |
| ewmvar(arg[, com, span, min_periods, bias, ...]) | Exponentially-weighted moving variance |
| ewmcorr(arg1, arg2[, com, span, ...]) | Exponentially-weighted moving correlation |
| ewmcov(arg1, arg2[, com, span, min_periods, ...]) | Exponentially-weighted moving covariance |
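The `com` (center of mass) and `span` arguments are two ways of expressing the same decay. The sketch below assumes the usual relation `com = (span - 1) / 2` and the simple recursive form of exponential smoothing; the library may weight the first observations differently, so treat this as an illustration of the parameters rather than a reproduction of `ewma` output. The helper name `ewma_sketch` is hypothetical.

```python
def ewma_sketch(values, com=None, span=None):
    """Illustrative recursive exponential smoothing; not pandas' code."""
    if (com is None) == (span is None):
        raise ValueError("specify exactly one of com or span")
    if span is not None:
        # Assumed relation between span and center of mass.
        com = (span - 1) / 2.0
    alpha = 1.0 / (1.0 + com)  # weight given to the newest observation
    out = []
    prev = None
    for x in values:
        # The first value seeds the average; later values are blended in.
        prev = x if prev is None else (1.0 - alpha) * prev + alpha * x
        out.append(prev)
    return out

by_com = ewma_sketch([0.0, 1.0, 1.0], com=1.0)
by_span = ewma_sketch([0.0, 1.0, 1.0], span=3)
```

Under the assumed relation, `com=1.0` and `span=3` describe the same smoothing, so the two results agree.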
Series
Attributes and underlying data
Axes
- index: axis labels
| Series.values | Return Series as ndarray |
| Series.dtype | Data-type of the array’s elements. |
| Series.isnull(obj) | Replacement for numpy.isnan / -numpy.isfinite which is suitable for use on object arrays. |
| Series.notnull(obj) | Replacement for numpy.isfinite / -numpy.isnan which is suitable for use on object arrays. |
Conversion / Constructors
| Series.__init__([data, index, dtype, name, copy]) | One-dimensional ndarray with axis labels (including time series). |
| Series.astype(dtype) | See numpy.ndarray.astype |
| Series.copy([order]) | Return new Series with copy of underlying values |
Indexing, iteration
| Series.get(label[, default]) | Returns value occupying requested label, default to specified missing value if not present. |
| Series.ix | |
| Series.__iter__() | |
| Series.iteritems([index]) | Lazily iterate over (index, value) tuples |
Binary operator functions
| Series.add(other[, level, fill_value]) | Binary operator add with support to substitute a fill_value for missing data |
| Series.div(other[, level, fill_value]) | Binary operator divide with support to substitute a fill_value for missing data |
| Series.mul(other[, level, fill_value]) | Binary operator multiply with support to substitute a fill_value for missing data |
| Series.sub(other[, level, fill_value]) | Binary operator subtract with support to substitute a fill_value for missing data |
| Series.combine(other, func[, fill_value]) | Perform elementwise binary operation on two Series using given function |
| Series.combine_first(other) | Combine Series values, choosing the calling Series’s values first |
| Series.round([decimals, out]) | Return the Series with each element rounded to the given number of decimals. |
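The difference between the plain `+` operator and `Series.add` with a `fill_value` shows up when the two indexes do not match. The following sketch uses two small hypothetical Series to show it, along with `combine_first`.

```python
import pandas as pd

# Hypothetical Series whose indexes only partly overlap.
s1 = pd.Series([1.0, 2.0], index=["a", "b"])
s2 = pd.Series([10.0, 20.0], index=["b", "c"])

# Plain addition aligns on labels; labels missing on one side give NaN.
plain = s1 + s2

# fill_value substitutes 0 for the side that lacks a label.
filled = s1.add(s2, fill_value=0)

# combine_first keeps s1's values and fills the gaps from s2.
patched = s1.combine_first(s2)
```

Here `plain` has NaN at labels `"a"` and `"c"`, while `filled` has a value at every label in the union of the two indexes.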
Function application, GroupBy
| Series.apply(func[, convert_dtype, args]) | Invoke function on values of Series. Can be ufunc or Python function |
| Series.map(arg[, na_action]) | Map values of Series using input correspondence (which can be a dict, Series, or function) |
| Series.groupby([by, axis, level, as_index, ...]) | Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns |
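A short sketch of the three methods above on a hypothetical Series: `apply` with a Python function, `map` with a dict, and `groupby` with a key function that is called on each index label. The group names `"early"` and `"late"` are invented for the example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# apply: call a function on every value.
doubled = s.apply(lambda v: v * 2)

# map: look values up in a dict; values without an entry become NaN.
named = s.map({1: "one", 2: "two"})

# groupby with a key function: the function receives each index label.
grouped = s.groupby(lambda label: "early" if label in ("a", "b") else "late").sum()
```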
Computations / Descriptive Stats
| Series.abs() | Return an object with absolute value taken. |
| Series.any([axis, out]) | Returns True if any of the elements of the Series evaluate to True. |
| Series.autocorr() | Lag-1 autocorrelation |
| Series.between(left, right[, inclusive]) | Return boolean Series equivalent to left <= series <= right. |
| Series.clip([lower, upper, out]) | Trim values at input threshold(s) |
| Series.clip_lower(threshold) | Return copy of series with values below given value truncated |
| Series.clip_upper(threshold) | Return copy of series with values above given value truncated |
| Series.corr(other[, method]) | Compute correlation with another Series, excluding missing values |
| Series.count([level]) | Return number of non-NA/null observations in the Series |
| Series.cov(other) | Compute covariance with Series, excluding missing values |
| Series.cummax([axis, dtype, out, skipna]) | Cumulative max of values. |
| Series.cummin([axis, dtype, out, skipna]) | Cumulative min of values. |
| Series.cumprod([axis, dtype, out, skipna]) | Cumulative product of values. |
| Series.cumsum([axis, dtype, out, skipna]) | Cumulative sum of values. |
| Series.describe([percentile_width]) | Generate various summary statistics of Series, excluding NaN |
| Series.diff([periods]) | 1st discrete difference of object |
| Series.kurt([skipna, level]) | Return unbiased kurtosis of values |
| Series.mad([skipna, level]) | Return mean absolute deviation of values |
| Series.max([axis, out, skipna, level]) | Return maximum of values |
| Series.mean([axis, dtype, out, skipna, level]) | Return mean of values |
| Series.median([axis, dtype, out, skipna, level]) | Return median of values |
| Series.min([axis, out, skipna, level]) | Return minimum of values |
| Series.nunique() | Return count of unique elements in the Series |
| Series.pct_change([periods, fill_method, ...]) | Percent change over given number of periods |
| Series.prod([axis, dtype, out, skipna, level]) | Return product of values |
| Series.quantile([q]) | Return value at the given quantile, a la scoreatpercentile in scipy.stats |
| Series.rank([method, na_option, ascending]) | Compute data ranks (1 through n). |
| Series.skew([skipna, level]) | Return unbiased skewness of values |
| Series.std([axis, dtype, out, ddof, skipna, ...]) | Return standard deviation of values |
| Series.sum([axis, dtype, out, skipna, level]) | Return sum of values |
| Series.unique() | Return array of unique values in the Series. Significantly faster than numpy.unique |
| Series.var([axis, dtype, out, ddof, skipna, ...]) | Return variance of values |
| Series.value_counts() | Returns Series containing counts of unique values. |
Reindexing / Selection / Label manipulation
| Series.align(other[, join, level, copy, ...]) | Align two Series objects with the specified join method |
| Series.drop(labels[, axis, level]) | Return new object with labels in requested axis removed |
| Series.first(offset) | Convenience method for subsetting initial periods of time series data |
| Series.head([n]) | Returns first n rows of Series |
| Series.idxmax([axis, out, skipna]) | Index of first occurrence of maximum of values. |
| Series.idxmin([axis, out, skipna]) | Index of first occurrence of minimum of values. |
| Series.isin(values) | Return boolean vector showing whether each element in the Series is |
| Series.last(offset) | Convenience method for subsetting final periods of time series data |
| Series.reindex([index, method, level, ...]) | Conform Series to new index with optional filling logic, placing |
| Series.reindex_like(other[, method, limit]) | Reindex Series to match index of another Series, optionally with |
| Series.rename(mapper[, inplace]) | Alter Series index using dict or function |
| Series.reset_index([level, drop, name, inplace]) | Analogous to the DataFrame.reset_index function, see docstring there. |
| Series.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
| Series.take(indices[, axis]) | Analogous to ndarray.take, return Series corresponding to requested |
| Series.tail([n]) | Returns last n rows of Series |
| Series.truncate([before, after, copy]) | Truncate a sorted DataFrame / Series before and/or after some particular dates |
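The "optional filling logic" mentioned for `reindex` can be seen with a small example. This sketch uses a hypothetical integer-indexed Series and compares reindexing with and without `method="ffill"`, then drops a label.

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=[0, 2])

# Labels absent from the original index are introduced as NaN.
sparse = s.reindex([0, 1, 2, 3])

# With method="ffill", each new label takes the last preceding value.
carried = s.reindex([0, 1, 2, 3], method="ffill")

# drop returns a new Series without the given labels.
dropped = s.drop([0])
```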
Missing data handling
| Series.dropna() | Return Series without null values |
| Series.fillna([value, method, inplace, limit]) | Fill NA/NaN values using the specified method |
| Series.interpolate([method]) | Interpolate missing values (after the first valid value) |
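The three missing-data methods above behave differently on the same input. A minimal sketch on a hypothetical Series with one gap, using the default (linear) interpolation:

```python
import pandas as pd

s = pd.Series([1.0, float("nan"), 3.0])

without_gaps = s.dropna()      # removes the NaN entry
zero_filled = s.fillna(0.0)    # replaces NaN with a constant
interpolated = s.interpolate() # fills NaN from its neighbours
```

All three return new objects and leave `s` unchanged.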
Reshaping, sorting
| Series.argsort([axis, kind, order]) | Overrides ndarray.argsort. |
| Series.order([na_last, ascending, kind]) | Sorts Series object, by value, maintaining index-value link |
| Series.reorder_levels(order) | Rearrange index levels using input order. |
| Series.sort([axis, kind, order]) | Sort values and index labels by value, in place. |
| Series.sort_index([ascending]) | Sort object by labels (along an axis) |
| Series.sortlevel([level, ascending]) | Sort Series with MultiIndex by chosen level. Data will be |
| Series.swaplevel(i, j[, copy]) | Swap levels i and j in a MultiIndex |
| Series.unstack([level]) | Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame |
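As a sketch of the MultiIndex reshaping methods above, the example below builds a hypothetical two-level Series, unstacks the inner level into columns, and swaps the two levels.

```python
import pandas as pd

# Hypothetical two-level index: outer level x/y, inner level one/two.
index = pd.MultiIndex.from_tuples(
    [("x", "one"), ("x", "two"), ("y", "one"), ("y", "two")]
)
s = pd.Series([1, 2, 3, 4], index=index)

# By default the last index level becomes the columns of a DataFrame.
wide = s.unstack()

# swaplevel exchanges the order of the two levels in each label.
swapped = s.swaplevel(0, 1)
```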
Combining / joining / merging
| Series.append(to_append[, verify_integrity]) | Concatenate two or more Series. The indexes must not overlap |
| Series.replace(to_replace[, value, method, ...]) | Replace arbitrary values in a Series |
| Series.update(other) | Modify Series in place using non-NA values from passed Series |
Plotting
| Series.hist([ax, grid, xlabelsize, xrot, ...]) | Draw histogram of the input series using matplotlib |
| Series.plot(series[, label, kind, ...]) | Plot the input series with the index on the x-axis using matplotlib |
Serialization / IO / Conversion
| Series.from_csv(path[, sep, parse_dates, ...]) | Read delimited file into Series |
| Series.load(path) | |
| Series.save(path) | |
| Series.to_csv(path[, index, sep, na_rep, ...]) | Write Series to a comma-separated values (csv) file |
| Series.to_dict() | Convert Series to {label -> value} dict |
| Series.to_sparse([kind, fill_value]) | Convert Series to SparseSeries |
| Series.to_string([buf, na_rep, ...]) | Render a string representation of the Series |
DataFrame
Attributes and underlying data
Axes
- index: row labels
- columns: column labels
| DataFrame.as_matrix([columns]) | Convert the frame to its Numpy-array matrix representation. Columns |
| DataFrame.dtypes | |
| DataFrame.get_dtype_counts() | |
| DataFrame.values | Convert the frame to its Numpy-array matrix representation. Columns |
| DataFrame.axes | |
| DataFrame.ndim | |
| DataFrame.shape | |
Conversion / Constructors
| DataFrame.__init__([data, index, columns, ...]) | Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). |
| DataFrame.astype(dtype) | Cast object to input numpy.dtype |
| DataFrame.convert_objects() | Attempt to infer better dtype for object columns |
| DataFrame.copy([deep]) | Make a copy of this object |
Indexing, iteration
| DataFrame.head([n]) | Returns first n rows of DataFrame |
| DataFrame.ix | |
| DataFrame.insert(loc, column, value) | Insert column into DataFrame at specified location. Raises Exception if |
| DataFrame.__iter__() | Iterate over columns of the frame. |
| DataFrame.iteritems() | Iterator over (column, series) pairs |
| DataFrame.iterrows() | Iterate over rows of DataFrame as (index, Series) pairs |
| DataFrame.itertuples([index]) | Iterate over rows of DataFrame as tuples, with index value |
| DataFrame.lookup(row_labels, col_labels) | Label-based “fancy indexing” function for DataFrame. Given equal-length |
| DataFrame.pop(item) | Return column and drop from frame. |
| DataFrame.tail([n]) | Returns last n rows of DataFrame |
| DataFrame.xs(key[, axis, level, copy]) | Returns a cross-section (row(s) or column(s)) from the DataFrame. |
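The iteration methods above differ in what they yield. A minimal sketch on a hypothetical two-column frame, covering plain iteration, `iterrows`, `xs`, and `pop`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

# Iterating a DataFrame yields its column labels.
column_names = list(df)

# iterrows yields (index label, row as Series) pairs.
row_sums = {label: row.sum() for label, row in df.iterrows()}

# xs returns a cross-section; here, one row as a Series.
first_row = df.xs("r1")

# pop returns a column and removes it from the frame in place.
popped = df.pop("b")
```

After the `pop` call, `df` holds only column `"a"`.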
Binary operator functions
| DataFrame.add(other[, axis, level, fill_value]) | Binary operator add with support to substitute a fill_value for missing data in |
| DataFrame.div(other[, axis, level, fill_value]) | Binary operator divide with support to substitute a fill_value for missing data in |
| DataFrame.mul(other[, axis, level, fill_value]) | Binary operator multiply with support to substitute a fill_value for missing data in |
| DataFrame.sub(other[, axis, level, fill_value]) | Binary operator subtract with support to substitute a fill_value for missing data in |
| DataFrame.radd(other[, axis, level, fill_value]) | Binary operator radd with support to substitute a fill_value for missing data in |
| DataFrame.rdiv(other[, axis, level, fill_value]) | Binary operator rdivide with support to substitute a fill_value for missing data in |
| DataFrame.rmul(other[, axis, level, fill_value]) | Binary operator rmultiply with support to substitute a fill_value for missing data in |
| DataFrame.rsub(other[, axis, level, fill_value]) | Binary operator rsubtract with support to substitute a fill_value for missing data in |
| DataFrame.combine(other, func[, fill_value]) | Combine two DataFrame objects using the given function and do not propagate NaN values |
| DataFrame.combineAdd(other) | Add two DataFrame objects and do not propagate NaN values |
| DataFrame.combine_first(other) | Combine two DataFrame objects and default to non-null values in frame calling the method |
| DataFrame.combineMult(other) | Multiply two DataFrame objects and do not propagate NaN values |
Function application, GroupBy
| DataFrame.apply(func[, axis, broadcast, ...]) | Applies function along input axis of DataFrame. |
| DataFrame.applymap(func) | Apply a function to a DataFrame that is intended to operate elementwise |
| DataFrame.groupby([by, axis, level, ...]) | Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns |
Computations / Descriptive Stats
| DataFrame.abs() | Return an object with absolute value taken. |
| DataFrame.any([axis, bool_only, skipna, level]) | Return whether any element is True over requested axis. |
| DataFrame.clip([upper, lower]) | Trim values at input threshold(s) |
| DataFrame.clip_lower(threshold) | Trim values below threshold |
| DataFrame.clip_upper(threshold) | Trim values above threshold |
| DataFrame.corr([method]) | Compute pairwise correlation of columns, excluding NA/null values |
| DataFrame.corrwith(other[, axis, drop]) | Compute pairwise correlation between rows or columns of two DataFrame |
| DataFrame.count([axis, level, numeric_only]) | Return Series with number of non-NA/null observations over requested |
| DataFrame.cov() | Compute pairwise covariance of columns, excluding NA/null values |
| DataFrame.cummax([axis, skipna]) | Return DataFrame of cumulative max over requested axis. |
| DataFrame.cummin([axis, skipna]) | Return DataFrame of cumulative min over requested axis. |
| DataFrame.cumprod([axis, skipna]) | Return cumulative product over requested axis as DataFrame |
| DataFrame.cumsum([axis, skipna]) | Return DataFrame of cumulative sums over requested axis. |
| DataFrame.describe([percentile_width]) | Generate various summary statistics of each column, excluding |
| DataFrame.diff([periods]) | 1st discrete difference of object |
| DataFrame.kurt([axis, skipna, level]) | Return unbiased kurtosis over requested axis. |
| DataFrame.mad([axis, skipna, level]) | Return mean absolute deviation over requested axis. |
| DataFrame.max([axis, skipna, level]) | Return maximum over requested axis. |
| DataFrame.mean([axis, skipna, level]) | Return mean over requested axis. |
| DataFrame.median([axis, skipna, level]) | Return median over requested axis. |
| DataFrame.min([axis, skipna, level]) | Return minimum over requested axis. |
| DataFrame.pct_change([periods, fill_method, ...]) | Percent change over given number of periods |
| DataFrame.prod([axis, skipna, level]) | Return product over requested axis. |
| DataFrame.quantile([q, axis]) | Return values at the given quantile over requested axis, a la |
| DataFrame.rank([axis, numeric_only, method, ...]) | Compute numerical data ranks (1 through n) along axis. |
| DataFrame.skew([axis, skipna, level]) | Return unbiased skewness over requested axis. |
| DataFrame.sum([axis, numeric_only, skipna, ...]) | Return sum over requested axis. |
| DataFrame.std([axis, skipna, level, ddof]) | Return standard deviation over requested axis. |
| DataFrame.var([axis, skipna, level, ddof]) | Return variance over requested axis. |
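Most of the statistics above share the `axis` and `skipna` arguments. The sketch below, on a hypothetical frame with one missing value, shows how they change the result of `sum` and `mean`.

```python
import math
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, float("nan")], "b": [4.0, 5.0, 6.0]})

# axis=0 reduces down the rows, giving one value per column.
col_sums = df.sum(axis=0)

# axis=1 reduces across the columns, giving one value per row.
row_means = df.mean(axis=1)

# skipna=False lets a missing value propagate into the result.
strict = df.sum(axis=0, skipna=False)
```

By default missing values are skipped, so the last row mean uses only the value in column `"b"`.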
Reindexing / Selection / Label manipulation
| DataFrame.add_prefix(prefix) | Concatenate prefix string with column names. |
| DataFrame.add_suffix(suffix) | Concatenate suffix string with column names. |
| DataFrame.align(other[, join, axis, level, ...]) | Align two DataFrame objects on their index and columns with the |
| DataFrame.drop(labels[, axis, level]) | Return new object with labels in requested axis removed |
| DataFrame.drop_duplicates([cols, take_last, ...]) | Return DataFrame with duplicate rows removed, optionally only |
| DataFrame.duplicated([cols, take_last]) | Return boolean Series denoting duplicate rows, optionally only |
| DataFrame.filter([items, like, regex]) | Restrict frame’s columns to set of items or wildcard |
| DataFrame.first(offset) | Convenience method for subsetting initial periods of time series data |
| DataFrame.head([n]) | Returns first n rows of DataFrame |
| DataFrame.idxmax([axis, skipna]) | Return index of first occurrence of maximum over requested axis. |
| DataFrame.idxmin([axis, skipna]) | Return index of first occurrence of minimum over requested axis. |
| DataFrame.last(offset) | Convenience method for subsetting final periods of time series data |
| DataFrame.reindex([index, columns, method, ...]) | Conform DataFrame to new index with optional filling logic, placing |
| DataFrame.reindex_axis(labels[, axis, ...]) | Conform DataFrame to new index with optional filling logic, placing |
| DataFrame.reindex_like(other[, method, ...]) | Reindex DataFrame to match indices of another DataFrame, optionally |
| DataFrame.rename([index, columns, copy, inplace]) | Alter index and / or columns using input function or functions. |
| DataFrame.reset_index([level, drop, ...]) | For DataFrame with multi-level index, return new DataFrame with |
| DataFrame.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
| DataFrame.set_index(keys[, drop, append, ...]) | Set the DataFrame index (row labels) using one or more existing |
| DataFrame.tail([n]) | Returns last n rows of DataFrame |
| DataFrame.take(indices[, axis]) | Analogous to ndarray.take, return DataFrame corresponding to requested |
| DataFrame.truncate([before, after, copy]) | Function truncate a sorted DataFrame / Series before and/or after |
Missing data handling
| DataFrame.dropna([axis, how, thresh, subset]) | Return object with labels on given axis omitted where alternately any |
| DataFrame.fillna([value, method, axis, ...]) | Fill NA/NaN values using the specified method |
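The `how` and `subset` arguments of `dropna` control which rows count as missing. A small sketch on a hypothetical frame:

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"a": [1.0, nan, nan], "b": [1.0, 2.0, nan]})

# Default how="any": drop a row if any of its values is missing.
any_dropped = df.dropna()

# how="all": drop a row only if every value is missing.
all_dropped = df.dropna(how="all")

# subset: only the listed columns are checked for missing values.
subset_dropped = df.dropna(subset=["b"])

# fillna replaces missing values instead of dropping rows.
filled = df.fillna(0.0)
```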
Reshaping, sorting, transposing
| DataFrame.delevel(*args, **kwargs) | |
| DataFrame.pivot([index, columns, values]) | Reshape data (produce a “pivot” table) based on column values. |
| DataFrame.reorder_levels(order[, axis]) | Rearrange index levels using input order. |
| DataFrame.sort([columns, column, axis, ...]) | Sort DataFrame either by labels (along either axis) or by the values in |
| DataFrame.sort_index([axis, by, ascending, ...]) | Sort DataFrame either by labels (along either axis) or by the values in |
| DataFrame.sortlevel([level, axis, ascending]) | Sort multilevel index by chosen axis and primary level. |
| DataFrame.swaplevel(i, j[, axis]) | Swap levels i and j in a MultiIndex on a particular axis |
| DataFrame.stack([level, dropna]) | Pivot a level of the (possibly hierarchical) column labels, returning a |
| DataFrame.unstack([level]) | Pivot a level of the (necessarily hierarchical) index labels, returning |
| DataFrame.T | Returns a DataFrame with the rows/columns switched. If the DataFrame is |
| DataFrame.to_panel() | Transform long (stacked) format (DataFrame) into wide (3D, Panel) |
| DataFrame.transpose() | Returns a DataFrame with the rows/columns switched. If the DataFrame is |
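The reshaping methods above can be chained to move between long and wide layouts. This sketch pivots a hypothetical long-format frame, stacks it back, and transposes it; the column names `date`, `item`, and `value` are invented for the example, and the arguments to `pivot` are passed by keyword.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, item) observation.
long_frame = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "item": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})

# pivot: one row per date, one column per item.
wide = long_frame.pivot(index="date", columns="item", values="value")

# stack: move the column labels back into an inner index level.
stacked = wide.stack()

# T: swap rows and columns.
flipped = wide.T
```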
Combining / joining / merging
| DataFrame.append(other[, ignore_index, ...]) | Append rows of other to the end of this frame, returning a new object. |
| DataFrame.join(other[, on, how, lsuffix, ...]) | Join columns with other DataFrame either on index or on a key |
| DataFrame.merge(right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation by |
| DataFrame.replace(to_replace[, value, ...]) | Replace values given in ‘to_replace’ with ‘value’ or using ‘method’ |
| DataFrame.update(other[, join, overwrite, ...]) | Modify DataFrame in place using non-NA values from passed |
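To contrast `join`, which returns a new frame, with `update`, which modifies the caller in place, here is a minimal sketch on hypothetical index-aligned frames.

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"y": [10.0, 30.0]}, index=["a", "c"])

# join aligns on the index; a left join keeps only left's labels.
joined = left.join(right, how="left")

# An outer join keeps the union of both indexes.
outer = left.join(right, how="outer")

# update overwrites values in place, but only where the patch is non-NA.
target = pd.DataFrame({"v": [1.0, 2.0]}, index=["a", "b"])
patch = pd.DataFrame({"v": [float("nan"), 9.0]}, index=["a", "b"])
target.update(patch)
```

The NaN in `patch` leaves the existing value at label `"a"` untouched.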
Time series-related
| DataFrame.asfreq(freq[, method, how]) | Convert all TimeSeries inside to specified frequency using DateOffset |
| DataFrame.shift([periods, freq]) | Shift the index of the DataFrame by desired number of periods with an |
| DataFrame.first_valid_index() | Return label for first non-NA/null value |
| DataFrame.last_valid_index() | Return label for last non-NA/null value |
| DataFrame.resample(rule[, how, axis, ...]) | Convenience method for frequency conversion and resampling of regular time-series data. |
| DataFrame.to_period([freq, axis, copy]) | Convert DataFrame from DatetimeIndex to PeriodIndex with desired |
| DataFrame.to_timestamp([freq, how, axis, copy]) | Cast to DatetimeIndex of timestamps, at beginning of period |
| DataFrame.tz_convert(tz[, axis, copy]) | Convert TimeSeries to target time zone. If it is time zone naive, it |
| DataFrame.tz_localize(tz[, axis, copy]) | Localize tz-naive TimeSeries to target time zone |
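`shift` behaves differently depending on whether `freq` is given. The sketch below assumes a hypothetical daily-indexed frame built with `pd.date_range` and also shows `tz_localize`.

```python
import pandas as pd

index = pd.date_range("2012-01-01", periods=3, freq="D")
df = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=index)

# Without freq, the values move down one row and the index stays put.
lagged = df.shift(1)

# With freq, the index itself moves and the values are unchanged.
moved = df.shift(1, freq="D")

# tz_localize attaches a time zone to a tz-naive index.
localized = df.tz_localize("UTC")
```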
Plotting
| DataFrame.boxplot([column, by, ax, ...]) | Make a box plot from DataFrame column/columns optionally grouped |
| DataFrame.hist(data[, grid, xlabelsize, ...]) | Draw histogram of the DataFrame’s series using matplotlib / pylab. |
| DataFrame.plot([frame, x, y, subplots, ...]) | Make line or bar plot of DataFrame’s series with the index on the x-axis |
Serialization / IO / Conversion
| DataFrame.from_csv(path[, header, sep, ...]) | Read delimited file into DataFrame |
| DataFrame.from_dict(data[, orient, dtype]) | Construct DataFrame from dict of array-like or dicts |
| DataFrame.from_items(items[, columns, orient]) | Convert (key, value) pairs to DataFrame. The keys will be the axis |
| DataFrame.from_records(data[, index, ...]) | Convert structured or record ndarray to DataFrame |
| DataFrame.info([verbose, buf]) | Concise summary of a DataFrame, used in __repr__ when very large. |
| DataFrame.load(path) | |
| DataFrame.save(path) | |
| DataFrame.to_csv(path_or_buf[, sep, na_rep, ...]) | Write DataFrame to a comma-separated values (csv) file |
| DataFrame.to_dict([outtype]) | Convert DataFrame to dictionary. |
| DataFrame.to_excel(excel_writer[, ...]) | Write DataFrame to an Excel sheet |
| DataFrame.to_html([buf, columns, col_space, ...]) | Render a DataFrame as an HTML table. |
| DataFrame.to_records([index]) | Convert DataFrame to record array. Index will be put in the |
| DataFrame.to_sparse([fill_value, kind]) | Convert to SparseDataFrame |
| DataFrame.to_string([buf, columns, ...]) | Render a DataFrame to a console-friendly tabular output. |
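A round trip through two of the conversion methods above can serve as a quick check that nothing is lost. This sketch writes a hypothetical frame to an in-memory CSV buffer (standing in for a file path), reads it back with `read_csv`, and rebuilds the frame from its dict form.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

# to_dict gives {column -> {index label -> value}} by default.
as_dict = df.to_dict()

# Write CSV to an in-memory buffer, then read it back.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# from_dict rebuilds a frame from the nested dict.
rebuilt = pd.DataFrame.from_dict(as_dict)
```

Passing `index_col=0` tells the reader that the first CSV column holds the row labels written by `to_csv`.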