API Reference¶
Input/Output¶
Pickling¶
read_pickle(path) | Load pickled pandas object (or any other pickled object) from the specified file path |
Flat File¶
read_table(filepath_or_buffer[, sep, ...]) | Read general delimited file into DataFrame |
read_csv(filepath_or_buffer[, sep, dialect, ...]) | Read CSV (comma-separated) file into DataFrame |
read_fwf(filepath_or_buffer[, colspecs, widths]) | Read a table of fixed-width formatted lines into DataFrame |
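As a minimal sketch of the flat-file readers above, the following parses delimited text with read_csv. The in-memory text and the column names are invented for illustration; a real call would usually pass a file path instead of a buffer.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for a file on disk.
csv_text = "name,score\nalice,1.5\nbob,2.0\n"

# read_csv accepts a path or any file-like buffer.
df = pd.read_csv(io.StringIO(csv_text))

# The same reader handles other delimiters through `sep`.
tsv_text = csv_text.replace(",", "\t")
df_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df)
print(df.equals(df_tab))
```

Both calls should yield the same two-column DataFrame, since only the delimiter differs.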
Clipboard¶
read_clipboard(**kwargs) | Read text from clipboard and pass to read_table. |
Excel¶
read_excel(io, sheetname, **kwds) | Read an Excel table into a pandas DataFrame |
ExcelFile.parse(sheetname[, header, ...]) | Read an Excel table into DataFrame |
HTML¶
read_html(io[, match, flavor, header, ...]) | Read HTML tables into a list of DataFrame objects. |
HDFStore: PyTables (HDF5)¶
read_hdf(path_or_buf, key, **kwargs) | Read from the store, close it if we opened it |
HDFStore.put(key, value[, format, append]) | Store object in HDFStore |
HDFStore.append(key, value[, format, ...]) | Append to Table in file. Node must already exist and be Table |
HDFStore.get(key) | Retrieve pandas object stored in file |
HDFStore.select(key[, where, start, stop, ...]) | Retrieve pandas object stored in file, optionally based on where |
SQL¶
read_sql(sql, con[, index_col, ...]) | Returns a DataFrame corresponding to the result set of the query |
read_frame(sql, con[, index_col, ...]) | Returns a DataFrame corresponding to the result set of the query |
write_frame(frame, name, con[, flavor, ...]) | Write records stored in a DataFrame to a SQL database. |
Google BigQuery¶
read_gbq(query[, project_id, ...]) | Load data from Google BigQuery. |
to_gbq(dataframe, destination_table[, ...]) | Write a DataFrame to a Google BigQuery table. |
STATA¶
read_stata(filepath_or_buffer[, ...]) | Read Stata file into DataFrame |
StataReader.data([convert_dates, ...]) | Reads observations from Stata file, converting them into a dataframe |
StataReader.data_label() | Returns data label of Stata file |
StataReader.value_labels() | Returns a dict, associating each variable name with a dict, associating |
StataReader.variable_labels() | Returns variable labels as a dict, associating each variable name |
StataWriter.write_file() |
General functions¶
Data manipulations¶
melt(frame[, id_vars, value_vars, var_name, ...]) | “Unpivots” a DataFrame from wide format to long format, optionally leaving |
pivot_table(data[, values, rows, cols, ...]) | Create a spreadsheet-style pivot table as a DataFrame. The levels in the |
crosstab(rows, cols[, values, rownames, ...]) | Compute a simple cross-tabulation of two (or more) factors. |
cut(x, bins[, right, labels, retbins, ...]) | Return indices of half-open bins to which each value of x belongs. |
qcut(x, q[, labels, retbins, precision]) | Quantile-based discretization function. |
merge(left, right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation by |
concat(objs[, axis, join, join_axes, ...]) | Concatenate pandas objects along a particular axis with optional set logic along the other axes. |
get_dummies(data[, prefix, prefix_sep, dummy_na]) | Convert categorical variable into dummy/indicator variables |
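The data manipulation functions above can be illustrated with a short sketch. The frames, keys, and bin labels below are made up for the example and are not part of the reference.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Database-style inner join on the shared "key" column.
joined = pd.merge(left, right, how="inner", on="key")

# Stack the two frames vertically; a column missing from one input is NaN there.
stacked = pd.concat([left, right], axis=0)

# Bin values into the half-open intervals (0, 18], (18, 65], (65, 120].
ages = [5, 18, 40, 70]
binned = pd.cut(ages, bins=[0, 18, 65, 120], labels=["child", "adult", "senior"])

print(joined)
print(stacked.shape)
print(list(binned))
```

Because bins are closed on the right by default, the value 18 lands in the first bin.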
Top-level missing data¶
isnull(obj) | Detect missing values (NaN in numeric arrays, None/NaN in object arrays) |
notnull(obj) | Replacement for numpy.isfinite / -numpy.isnan which is suitable for use on object arrays. |
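A small sketch of the two top-level missing-data helpers, using invented values. It shows that they treat NaN in a numeric Series and None in an object Series alike.

```python
import pandas as pd

numeric = pd.Series([1.0, float("nan"), 3.0])
objects = pd.Series(["a", None, "c"], dtype=object)

# Element-wise masks; isnull and notnull are complements of each other.
null_mask = pd.isnull(numeric)
valid_mask = pd.notnull(objects)

print(null_mask.tolist())   # [False, True, False]
print(valid_mask.tolist())  # [True, False, True]

# Scalars are accepted as well.
print(pd.isnull(None))      # True
```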
Top-level dealing with datetimes¶
to_datetime(arg[, errors, dayfirst, utc, ...]) | Convert argument to datetime |
to_timedelta(arg[, box, unit]) | Convert argument to timedelta |
date_range([start, end, periods, freq, tz, ...]) | Return a fixed frequency datetime index, with day (calendar) as the default |
bdate_range([start, end, periods, freq, tz, ...]) | Return a fixed frequency datetime index, with business day as the default |
period_range([start, end, periods, freq, name]) | Return a fixed frequency PeriodIndex, with day (calendar) as the default |
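The sketch below exercises the datetime helpers listed above. The dates are arbitrary, and the monthly frequency alias "M" for periods is an assumption that may differ between pandas versions.

```python
import pandas as pd

# Parse strings into timestamps.
stamps = pd.to_datetime(["2014-01-01", "2014-01-03"])

# Five consecutive calendar days starting on a Wednesday.
days = pd.date_range(start="2014-01-01", periods=5, freq="D")

# Three business days starting on Friday 2014-01-03; the weekend is skipped.
bdays = pd.bdate_range(start="2014-01-03", periods=3)

# Three monthly periods; the result is a PeriodIndex, not a DatetimeIndex.
months = pd.period_range(start="2014-01", periods=3, freq="M")

print(days[-1], bdays[1], months[-1])
```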
Top-level evaluation¶
eval(expr[, parser, engine, truediv, ...]) | Evaluate a Python expression as a string using various backends. |
Standard moving window functions¶
rolling_count(arg, window[, freq, center, ...]) | Rolling count of number of non-NaN observations inside provided window. |
rolling_sum(arg, window[, min_periods, ...]) | Moving sum |
rolling_mean(arg, window[, min_periods, ...]) | Moving mean |
rolling_median(arg, window[, min_periods, ...]) | O(N log(window)) implementation using skip list |
rolling_var(arg, window[, min_periods, ...]) | Unbiased moving variance |
rolling_std(arg, window[, min_periods, ...]) | Unbiased moving standard deviation |
rolling_min(arg, window[, min_periods, ...]) | Moving min of 1d array of dtype=float64 along axis=0 ignoring NaNs. |
rolling_max(arg, window[, min_periods, ...]) | Moving max of 1d array of dtype=float64 along axis=0 ignoring NaNs. |
rolling_corr(arg1, arg2, window[, ...]) | Moving sample correlation |
rolling_corr_pairwise(df, window[, min_periods]) | Computes pairwise rolling correlation matrices as Panel whose items are |
rolling_cov(arg1, arg2, window[, ...]) | Unbiased moving covariance |
rolling_skew(arg, window[, min_periods, ...]) | Unbiased moving skewness |
rolling_kurt(arg, window[, min_periods, ...]) | Unbiased moving kurtosis |
rolling_apply(arg, window, func[, ...]) | Generic moving function application |
rolling_quantile(arg, window, quantile[, ...]) | Moving quantile |
rolling_window(arg[, window, win_type, ...]) | Applies a moving window of type window_type and size window on the data. |
Standard expanding window functions¶
expanding_count(arg[, freq, center, time_rule]) | Expanding count of number of non-NaN observations. |
expanding_sum(arg[, min_periods, freq, ...]) | Expanding sum |
expanding_mean(arg[, min_periods, freq, ...]) | Expanding mean |
expanding_median(arg[, min_periods, freq, ...]) | O(N log(window)) implementation using skip list |
expanding_var(arg[, min_periods, freq, ...]) | Unbiased expanding variance |
expanding_std(arg[, min_periods, freq, ...]) | Unbiased expanding standard deviation |
expanding_min(arg[, min_periods, freq, ...]) | Moving min of 1d array of dtype=float64 along axis=0 ignoring NaNs. |
expanding_max(arg[, min_periods, freq, ...]) | Moving max of 1d array of dtype=float64 along axis=0 ignoring NaNs. |
expanding_corr(arg1, arg2[, min_periods, ...]) | Expanding sample correlation |
expanding_corr_pairwise(df[, min_periods]) | Computes pairwise expanding correlation matrices as Panel whose items are |
expanding_cov(arg1, arg2[, min_periods, ...]) | Unbiased expanding covariance |
expanding_skew(arg[, min_periods, freq, ...]) | Unbiased expanding skewness |
expanding_kurt(arg[, min_periods, freq, ...]) | Unbiased expanding kurtosis |
expanding_apply(arg, func[, min_periods, ...]) | Generic expanding function application |
expanding_quantile(arg, quantile[, ...]) | Expanding quantile |
Exponentially-weighted moving window functions¶
ewma(arg[, com, span, halflife, ...]) | Exponentially-weighted moving average |
ewmstd(arg[, com, span, halflife, ...]) | Exponentially-weighted moving std |
ewmvar(arg[, com, span, halflife, ...]) | Exponentially-weighted moving variance |
ewmcorr(arg1, arg2[, com, span, halflife, ...]) | Exponentially-weighted moving correlation |
ewmcov(arg1, arg2[, com, span, halflife, ...]) | Exponentially-weighted moving covariance |
Series¶
Constructor¶
Series([data, index, dtype, name, copy, ...]) | One-dimensional ndarray with axis labels (including time series). |
Attributes and underlying data¶
Axes
- index: axis labels
Series.values | Return Series as ndarray |
Series.dtype | |
Series.isnull() | Return a boolean same-sized object indicating if the values are null |
Series.notnull() | Return a boolean same-sized object indicating if the values are |
Conversion¶
Series.astype(dtype[, copy, raise_on_error]) | Cast object to input numpy.dtype |
Series.copy([deep]) | Make a copy of this object |
Series.isnull() | Return a boolean same-sized object indicating if the values are null |
Series.notnull() | Return a boolean same-sized object indicating if the values are |
Indexing, iteration¶
Series.get(label[, default]) | Returns value occupying requested label, default to specified missing value if not present. |
Series.at | |
Series.iat | |
Series.ix | |
Series.loc | |
Series.iloc | |
Series.__iter__() | |
Series.iteritems() | Lazily iterate over (index, value) tuples |
For more information on .at, .iat, .ix, .loc, and .iloc, see the indexing documentation.
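As a brief illustration of the accessors above, this sketch contrasts label-based and position-based lookup on a toy Series. The data is invented; .ix is left out because it is not available in recent pandas releases.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s.loc["b"]          # label-based lookup
by_position = s.iloc[0]        # integer-position lookup
scalar_label = s.at["c"]       # fast scalar access by label
scalar_position = s.iat[1]     # fast scalar access by position
fallback = s.get("z", -1)      # default returned when the label is absent

# Label slices include both endpoints, unlike positional slices.
label_slice = s.loc["a":"b"]
position_slice = s.iloc[0:1]

# iteritems() in the table above yields the same (index, value) pairs
# as zipping the index with the values.
pairs = list(zip(s.index, s))

print(by_label, by_position, fallback)
print(label_slice.tolist(), position_slice.tolist())
```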
Binary operator functions¶
Series.add(other[, level, fill_value, axis]) | Binary operator add with support to substitute a fill_value for missing data |
Series.sub(other[, level, fill_value, axis]) | Binary operator sub with support to substitute a fill_value for missing data |
Series.mul(other[, level, fill_value, axis]) | Binary operator mul with support to substitute a fill_value for missing data |
Series.div(other[, level, fill_value, axis]) | Binary operator truediv with support to substitute a fill_value for missing data |
Series.truediv(other[, level, fill_value, axis]) | Binary operator truediv with support to substitute a fill_value for missing data |
Series.floordiv(other[, level, fill_value, axis]) | Binary operator floordiv with support to substitute a fill_value for missing data |
Series.mod(other[, level, fill_value, axis]) | Binary operator mod with support to substitute a fill_value for missing data |
Series.pow(other[, level, fill_value, axis]) | Binary operator pow with support to substitute a fill_value for missing data |
Series.radd(other[, level, fill_value, axis]) | Binary operator radd with support to substitute a fill_value for missing data |
Series.rsub(other[, level, fill_value, axis]) | Binary operator rsub with support to substitute a fill_value for missing data |
Series.rmul(other[, level, fill_value, axis]) | Binary operator rmul with support to substitute a fill_value for missing data |
Series.rdiv(other[, level, fill_value, axis]) | Binary operator rtruediv with support to substitute a fill_value for missing data |
Series.rtruediv(other[, level, fill_value, axis]) | Binary operator rtruediv with support to substitute a fill_value for missing data |
Series.rfloordiv(other[, level, fill_value, ...]) | Binary operator rfloordiv with support to substitute a fill_value for missing data |
Series.rmod(other[, level, fill_value, axis]) | Binary operator rmod with support to substitute a fill_value for missing data |
Series.rpow(other[, level, fill_value, axis]) | Binary operator rpow with support to substitute a fill_value for missing data |
Series.combine(other, func[, fill_value]) | Perform elementwise binary operation on two Series using given function |
Series.combine_first(other) | Combine Series values, choosing the calling Series’s values |
Series.round([decimals, out]) | Return the Series with each element rounded to the given number of decimals. |
Series.lt(other) | |
Series.gt(other) | |
Series.le(other) | |
Series.ge(other) | |
Series.ne(other) | |
Series.eq(other) |
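The fill_value behaviour described for the flexible operators can be sketched as follows, with two invented Series whose indexes only partly overlap.

```python
import pandas as pd

a = pd.Series([1.0, 2.0], index=["x", "y"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# The plain operator aligns on the union of labels; unmatched labels give NaN.
plain = a + b

# The method form substitutes fill_value for a label missing on one side.
filled = a.add(b, fill_value=0)

print(plain)
print(filled)
```

Here only "y" is present on both sides, so the plain sum is NaN at "x" and "z", while the filled version keeps the single available value.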
Function application, GroupBy¶
Series.apply(func[, convert_dtype, args]) | Invoke function on values of Series. Can be ufunc (a NumPy function |
Series.map(arg[, na_action]) | Map values of Series using input correspondence (which can be |
Series.groupby([by, axis, level, as_index, ...]) | Group series using mapper (dict or key function, apply given function |
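A minimal sketch of the three methods above on an invented Series. The lambda and the mapping dict are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "b", "a", "b"])

# apply: call a function on every value.
doubled = s.apply(lambda v: v * 2)

# map: translate values through a dict; values without an entry become NaN.
named = s.map({1: "one", 2: "two"})

# groupby: group by the index labels, then aggregate each group.
totals = s.groupby(level=0).sum()

print(doubled.tolist())
print(named.tolist())
print(totals.to_dict())
```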
Computations / Descriptive Stats¶
Series.abs() | Return an object with absolute value taken. |
Series.any([axis, out]) | Returns True if any of the elements of the Series evaluate to True. |
Series.autocorr() | Lag-1 autocorrelation |
Series.between(left, right[, inclusive]) | Return boolean Series equivalent to left <= series <= right. NA values |
Series.clip([lower, upper, out]) | Trim values at input threshold(s) |
Series.clip_lower(threshold) | Return copy of the input with values below given value truncated |
Series.clip_upper(threshold) | Return copy of input with values above given value truncated |
Series.corr(other[, method, min_periods]) | Compute correlation with other Series, excluding missing values |
Series.count([level]) | Return number of non-NA/null observations in the Series |
Series.cov(other[, min_periods]) | Compute covariance with Series, excluding missing values |
Series.cummax([axis, dtype, out, skipna]) | Return cumulative max over requested axis. |
Series.cummin([axis, dtype, out, skipna]) | Return cumulative min over requested axis. |
Series.cumprod([axis, dtype, out, skipna]) | Return cumulative prod over requested axis. |
Series.cumsum([axis, dtype, out, skipna]) | Return cumulative sum over requested axis. |
Series.describe([percentile_width]) | Generate various summary statistics of Series, excluding NaN |
Series.diff([periods]) | 1st discrete difference of object |
Series.kurt([axis, skipna, level, numeric_only]) | Return unbiased kurtosis over requested axis |
Series.mad([axis, skipna, level]) | Return the mean absolute deviation of the values for the requested axis |
Series.max([axis, skipna, level, numeric_only]) | This method returns the maximum of the values in the object. |
Series.mean([axis, skipna, level, numeric_only]) | Return the mean of the values for the requested axis |
Series.median([axis, skipna, level, ...]) | Return the median of the values for the requested axis |
Series.min([axis, skipna, level, numeric_only]) | This method returns the minimum of the values in the object. |
Series.mode() | Returns the mode(s) of the dataset. |
Series.nunique() | Return count of unique elements in the Series |
Series.pct_change([periods, fill_method, ...]) | Percent change over given number of periods |
Series.prod([axis, skipna, level, numeric_only]) | Return the product of the values for the requested axis |
Series.quantile([q]) | Return value at the given quantile, a la scoreatpercentile in |
Series.rank([method, na_option, ascending]) | Compute data ranks (1 through n). |
Series.skew([axis, skipna, level, numeric_only]) | Return unbiased skew over requested axis |
Series.std([axis, skipna, level, ddof]) | Return unbiased standard deviation over requested axis |
Series.sum([axis, skipna, level, numeric_only]) | Return the sum of the values for the requested axis |
Series.unique() | Return array of unique values in the Series. Significantly faster than |
Series.var([axis, skipna, level, ddof]) | Return unbiased variance over requested axis |
Series.value_counts([normalize, sort, ...]) | Returns Series containing counts of unique values. The resulting Series |
Reindexing / Selection / Label manipulation¶
Series.align(other[, join, axis, level, ...]) | Align two objects on their axes with the |
Series.drop(labels[, axis, level, inplace]) | Return new object with labels in requested axis removed |
Series.first(offset) | Convenience method for subsetting initial periods of time series data |
Series.head([n]) | Returns first n rows |
Series.idxmax([axis, out, skipna]) | Index of first occurrence of maximum of values. |
Series.idxmin([axis, out, skipna]) | Index of first occurrence of minimum of values. |
Series.isin(values) | Return a boolean Series showing whether each element |
Series.last(offset) | Convenience method for subsetting final periods of time series data |
Series.reindex([index]) | Conform Series to new index with optional filling logic, placing |
Series.reindex_like(other[, method, copy, limit]) | Return an object with indices matching those of other |
Series.rename([index]) | Alter axes input function or functions. |
Series.reset_index([level, drop, name, inplace]) | Analogous to the DataFrame.reset_index function, see docstring there. |
Series.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
Series.take(indices[, axis, convert]) | Analogous to ndarray.take, return Series corresponding to requested |
Series.tail([n]) | Returns last n rows |
Series.truncate([before, after, axis, copy]) | Truncates a sorted NDFrame before and/or after some particular |
Missing data handling¶
Series.dropna([axis, inplace]) | Return Series without null values |
Series.fillna([value, method, axis, ...]) | Fill NA/NaN values using the specified method |
Series.interpolate([method, axis, limit, ...]) | Interpolate values according to different methods. |
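The three missing-data methods can be compared side by side on a toy Series. The values are invented, and interpolate is called with its default (linear) method.

```python
import pandas as pd

s = pd.Series([1.0, float("nan"), 3.0])

dropped = s.dropna()            # remove the missing entry
zero_filled = s.fillna(0)       # replace it with a constant
interpolated = s.interpolate()  # estimate it from its neighbours

print(dropped.tolist())       # [1.0, 3.0]
print(zero_filled.tolist())   # [1.0, 0.0, 3.0]
print(interpolated.tolist())  # [1.0, 2.0, 3.0]
```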
Reshaping, sorting¶
Series.argsort([axis, kind, order]) | Overrides ndarray.argsort. |
Series.order([na_last, ascending, kind]) | Sorts Series object, by value, maintaining index-value link |
Series.reorder_levels(order) | Rearrange index levels using input order. |
Series.sort([axis, kind, order, ascending]) | Sort values and index labels by value, in place. |
Series.sort_index([ascending]) | Sort object by labels (along an axis) |
Series.sortlevel([level, ascending]) | Sort Series with MultiIndex by chosen level. Data will be |
Series.swaplevel(i, j[, copy]) | Swap levels i and j in a MultiIndex |
Series.unstack([level]) | Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame |
Combining / joining / merging¶
Series.append(to_append[, verify_integrity]) | Concatenate two or more Series. The indexes must not overlap |
Series.replace([to_replace, value, inplace, ...]) | Replace values given in ‘to_replace’ with ‘value’. |
Series.update(other) | Modify Series in place using non-NA values from passed |
String handling¶
Series.str can be used to access the values of the series as strings and apply several methods to it. Due to implementation details the methods show up here as methods of the StringMethods class.
StringMethods.cat([others, sep, na_rep]) | Concatenate arrays of strings with given separator |
StringMethods.center(width) | “Center” strings, filling left and right side with additional whitespace |
StringMethods.contains(pat[, case, flags, na]) | Check whether given pattern is contained in each string in the array |
StringMethods.count(pat[, flags]) | Count occurrences of pattern in each string |
StringMethods.decode(encoding[, errors]) | Decode character string to unicode using indicated encoding |
StringMethods.encode(encoding[, errors]) | Encode character string to some other encoding using indicated encoding |
StringMethods.endswith(pat[, na]) | Return boolean array indicating whether each string ends with passed |
StringMethods.extract(pat[, flags]) | Find groups in each string using passed regular expression |
StringMethods.findall(pat[, flags]) | Find all occurrences of pattern or regular expression |
StringMethods.get(i) | Extract element from lists, tuples, or strings in each element in the array |
StringMethods.join(sep) | Join lists contained as elements in array, a la str.join |
StringMethods.len() | Compute length of each string in array. |
StringMethods.lower() | Convert strings in array to lowercase |
StringMethods.lstrip([to_strip]) | Strip whitespace (including newlines) from left side of each string in the |
StringMethods.match(pat[, flags]) | Deprecated: Find groups in each string using passed regular expression. |
StringMethods.pad(width[, side]) | Pad strings with whitespace |
StringMethods.repeat(repeats) | Duplicate each string in the array by indicated number of times |
StringMethods.replace(pat, repl[, n, case, ...]) | Replace occurrences of pattern in each string with some other string |
StringMethods.rstrip([to_strip]) | Strip whitespace (including newlines) from right side of each string in the |
StringMethods.slice([start, stop, step]) | Slice substrings from each element in array |
StringMethods.slice_replace([i, j]) | Slice substrings from each element in array |
StringMethods.split([pat, n]) | Split each string (a la re.split) in array by given pattern, propagating NA |
StringMethods.startswith(pat[, na]) | Return boolean array indicating whether each string starts with passed |
StringMethods.strip([to_strip]) | Strip whitespace (including newlines) from each string in the array |
StringMethods.title() | Convert strings to titlecased version |
StringMethods.upper() | Convert strings in array to uppercase |
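A short sketch of chaining several of the string methods above through the Series.str accessor. The sample strings are invented; missing entries pass through as NA unless a method takes an na argument.

```python
import pandas as pd

s = pd.Series(["  Apple ", "banana", None])

# Methods are vectorized and can be chained; the None entry stays missing.
cleaned = s.str.strip().str.lower()

# na=False makes the missing entry count as "no match" in the boolean result.
has_an = cleaned.str.contains("an", na=False)

# Split each string on a separator and take the first piece.
first_part = pd.Series(["a-b", "c-d"]).str.split("-").str.get(0)

print(cleaned.tolist())
print(has_an.tolist())
print(first_part.tolist())
```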
Plotting¶
Series.hist([by, ax, grid, xlabelsize, ...]) | Draw histogram of the input series using matplotlib |
Series.plot(series[, label, kind, ...]) | Plot the input series with the index on the x-axis using matplotlib |
Serialization / IO / Conversion¶
Series.from_csv(path[, sep, parse_dates, ...]) | Read delimited file into Series |
Series.to_pickle(path) | Pickle (serialize) object to input file path |
Series.to_csv(path[, index, sep, na_rep, ...]) | Write Series to a comma-separated values (csv) file |
Series.to_dict() | Convert Series to {label -> value} dict |
Series.to_frame([name]) | Convert Series to DataFrame |
Series.to_hdf(path_or_buf, key, **kwargs) | Write the object to an HDF5 file using HDFStore |
Series.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
Series.to_sparse([kind, fill_value]) | Convert Series to SparseSeries |
Series.to_string([buf, na_rep, ...]) | Render a string representation of the Series |
Series.to_clipboard([excel, sep]) | Attempt to write text representation of object to the system clipboard |
DataFrame¶
Constructor¶
DataFrame([data, index, columns, dtype, copy]) | Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). |
Attributes and underlying data¶
Axes
- index: row labels
- columns: column labels
DataFrame.as_matrix([columns]) | Convert the frame to its Numpy-array matrix representation. Columns |
DataFrame.dtypes | |
DataFrame.get_dtype_counts() | Return the counts of dtypes in this frame |
DataFrame.values | Numpy representation of NDFrame |
DataFrame.axes | |
DataFrame.ndim | Number of axes / array dimensions |
DataFrame.shape |
Conversion¶
DataFrame.astype(dtype[, copy, raise_on_error]) | Cast object to input numpy.dtype |
DataFrame.convert_objects([convert_dates, ...]) | Attempt to infer better dtype for object columns |
DataFrame.copy([deep]) | Make a copy of this object |
DataFrame.isnull() | Return a boolean same-sized object indicating if the values are null |
DataFrame.notnull() | Return a boolean same-sized object indicating if the values are |
Indexing, iteration¶
DataFrame.head([n]) | Returns first n rows |
DataFrame.at | |
DataFrame.iat | |
DataFrame.ix | |
DataFrame.loc | |
DataFrame.iloc | |
DataFrame.insert(loc, column, value[, ...]) | Insert column into DataFrame at specified location. |
DataFrame.__iter__() | Iterate over info axis |
DataFrame.iteritems() | Iterator over (column, series) pairs |
DataFrame.iterrows() | Iterate over rows of DataFrame as (index, Series) pairs. |
DataFrame.itertuples([index]) | Iterate over rows of DataFrame as tuples, with index value |
DataFrame.lookup(row_labels, col_labels) | Label-based “fancy indexing” function for DataFrame. |
DataFrame.pop(item) | Return item and drop from frame. |
DataFrame.tail([n]) | Returns last n rows |
DataFrame.xs(key[, axis, level, copy, ...]) | Returns a cross-section (row(s) or column(s)) from the DataFrame. |
DataFrame.isin(values) | Return boolean DataFrame showing whether each element in the |
DataFrame.query(expr, **kwargs) | Query the columns of a frame with a boolean expression. |
For more information on .at, .iat, .ix, .loc, and .iloc, see the indexing documentation.
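The following sketch shows a few of the DataFrame indexing and iteration entries above on an invented frame. The row labels and column names are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["r1", "r2", "r3"])

by_label = df.loc["r2", "b"]    # row label, column label
by_position = df.iloc[0, 1]     # row position, column position

# query evaluates a boolean expression against the column names;
# it selects the same rows as the equivalent boolean mask.
queried = df.query("a > 1")
masked = df[df["a"] > 1]

# isin returns a boolean frame of the same shape.
membership = df.isin([1, 6])

# iterrows yields (index label, row as Series) pairs.
row_sums = {label: int(row.sum()) for label, row in df.iterrows()}

print(by_label, by_position)
print(queried)
print(row_sums)
```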
Binary operator functions¶
DataFrame.add(other[, axis, level, fill_value]) | Binary operator add with support to substitute a fill_value for missing data in |
DataFrame.sub(other[, axis, level, fill_value]) | Binary operator sub with support to substitute a fill_value for missing data in |
DataFrame.mul(other[, axis, level, fill_value]) | Binary operator mul with support to substitute a fill_value for missing data in |
DataFrame.div(other[, axis, level, fill_value]) | Binary operator truediv with support to substitute a fill_value for missing data in |
DataFrame.truediv(other[, axis, level, ...]) | Binary operator truediv with support to substitute a fill_value for missing data in |
DataFrame.floordiv(other[, axis, level, ...]) | Binary operator floordiv with support to substitute a fill_value for missing data in |
DataFrame.mod(other[, axis, level, fill_value]) | Binary operator mod with support to substitute a fill_value for missing data in |
DataFrame.pow(other[, axis, level, fill_value]) | Binary operator pow with support to substitute a fill_value for missing data in |
DataFrame.radd(other[, axis, level, fill_value]) | Binary operator radd with support to substitute a fill_value for missing data in |
DataFrame.rsub(other[, axis, level, fill_value]) | Binary operator rsub with support to substitute a fill_value for missing data in |
DataFrame.rmul(other[, axis, level, fill_value]) | Binary operator rmul with support to substitute a fill_value for missing data in |
DataFrame.rdiv(other[, axis, level, fill_value]) | Binary operator rtruediv with support to substitute a fill_value for missing data in |
DataFrame.rtruediv(other[, axis, level, ...]) | Binary operator rtruediv with support to substitute a fill_value for missing data in |
DataFrame.rfloordiv(other[, axis, level, ...]) | Binary operator rfloordiv with support to substitute a fill_value for missing data in |
DataFrame.rmod(other[, axis, level, fill_value]) | Binary operator rmod with support to substitute a fill_value for missing data in |
DataFrame.rpow(other[, axis, level, fill_value]) | Binary operator rpow with support to substitute a fill_value for missing data in |
DataFrame.lt(other[, axis, level]) | Wrapper for flexible comparison methods lt |
DataFrame.gt(other[, axis, level]) | Wrapper for flexible comparison methods gt |
DataFrame.le(other[, axis, level]) | Wrapper for flexible comparison methods le |
DataFrame.ge(other[, axis, level]) | Wrapper for flexible comparison methods ge |
DataFrame.ne(other[, axis, level]) | Wrapper for flexible comparison methods ne |
DataFrame.eq(other[, axis, level]) | Wrapper for flexible comparison methods eq |
DataFrame.combine(other, func[, fill_value, ...]) | Combine two DataFrame objects using func and do not propagate NaN values, so if for a |
DataFrame.combineAdd(other) | Add two DataFrame objects and do not propagate |
DataFrame.combine_first(other) | Combine two DataFrame objects and default to non-null values in frame |
DataFrame.combineMult(other) | Multiply two DataFrame objects and do not propagate NaN values, so if |
Function application, GroupBy¶
DataFrame.apply(func[, axis, broadcast, ...]) | Applies function along input axis of DataFrame. |
DataFrame.applymap(func) | Apply a function to a DataFrame that is intended to operate |
DataFrame.groupby([by, axis, level, ...]) | Group series using mapper (dict or key function, apply given function |
Computations / Descriptive Stats¶
DataFrame.abs() | Return an object with absolute value taken. |
DataFrame.any([axis, bool_only, skipna, level]) | Return whether any element is True over requested axis. |
DataFrame.clip([lower, upper, out]) | Trim values at input threshold(s) |
DataFrame.clip_lower(threshold) | Return copy of the input with values below given value truncated |
DataFrame.clip_upper(threshold) | Return copy of input with values above given value truncated |
DataFrame.corr([method, min_periods]) | Compute pairwise correlation of columns, excluding NA/null values |
DataFrame.corrwith(other[, axis, drop]) | Compute pairwise correlation between rows or columns of two DataFrame |
DataFrame.count([axis, level, numeric_only]) | Return Series with number of non-NA/null observations over requested |
DataFrame.cov([min_periods]) | Compute pairwise covariance of columns, excluding NA/null values |
DataFrame.cummax([axis, dtype, out, skipna]) | Return cumulative max over requested axis. |
DataFrame.cummin([axis, dtype, out, skipna]) | Return cumulative min over requested axis. |
DataFrame.cumprod([axis, dtype, out, skipna]) | Return cumulative prod over requested axis. |
DataFrame.cumsum([axis, dtype, out, skipna]) | Return cumulative sum over requested axis. |
DataFrame.describe([percentile_width]) | Generate various summary statistics of each column, excluding |
DataFrame.diff([periods]) | 1st discrete difference of object |
DataFrame.eval(expr, **kwargs) | Evaluate an expression in the context of the calling DataFrame |
DataFrame.kurt([axis, skipna, level, ...]) | Return unbiased kurtosis over requested axis |
DataFrame.mad([axis, skipna, level]) | Return the mean absolute deviation of the values for the requested axis |
DataFrame.max([axis, skipna, level, ...]) | This method returns the maximum of the values in the object. |
DataFrame.mean([axis, skipna, level, ...]) | Return the mean of the values for the requested axis |
DataFrame.median([axis, skipna, level, ...]) | Return the median of the values for the requested axis |
DataFrame.min([axis, skipna, level, ...]) | This method returns the minimum of the values in the object. |
DataFrame.mode([axis, numeric_only]) | Gets the mode of each element along the axis selected. |
DataFrame.pct_change([periods, fill_method, ...]) | Percent change over given number of periods |
DataFrame.prod([axis, skipna, level, ...]) | Return the product of the values for the requested axis |
DataFrame.quantile([q, axis, numeric_only]) | Return values at the given quantile over requested axis, a la |
DataFrame.rank([axis, numeric_only, method, ...]) | Compute numerical data ranks (1 through n) along axis. |
DataFrame.skew([axis, skipna, level, ...]) | Return unbiased skew over requested axis |
DataFrame.sum([axis, skipna, level, ...]) | Return the sum of the values for the requested axis |
DataFrame.std([axis, skipna, level, ddof]) | Return unbiased standard deviation over requested axis |
DataFrame.var([axis, skipna, level, ddof]) | Return unbiased variance over requested axis |
Reindexing / Selection / Label manipulation¶
DataFrame.add_prefix(prefix) | Concatenate prefix string with column labels. |
DataFrame.add_suffix(suffix) | Concatenate suffix string with column labels. |
DataFrame.align(other[, join, axis, level, ...]) | Align two objects on their axes with the |
DataFrame.drop(labels[, axis, level, inplace]) | Return new object with labels in requested axis removed |
DataFrame.drop_duplicates([cols, take_last, ...]) | Return DataFrame with duplicate rows removed, optionally only |
DataFrame.duplicated([cols, take_last]) | Return boolean Series denoting duplicate rows, optionally only |
DataFrame.filter([items, like, regex, axis]) | Restrict the info axis to set of items or wildcard |
DataFrame.first(offset) | Convenience method for subsetting initial periods of time series data |
DataFrame.head([n]) | Returns first n rows |
DataFrame.idxmax([axis, skipna]) | Return index of first occurrence of maximum over requested axis. |
DataFrame.idxmin([axis, skipna]) | Return index of first occurrence of minimum over requested axis. |
DataFrame.last(offset) | Convenience method for subsetting final periods of time series data |
DataFrame.reindex([index, columns]) | Conform DataFrame to new index with optional filling logic, placing |
DataFrame.reindex_axis(labels[, axis, ...]) | Conform input object to new index with optional filling logic, |
DataFrame.reindex_like(other[, method, ...]) | Return an object with indices matching those of other |
DataFrame.rename([index, columns]) | Alter axes input function or functions. |
DataFrame.reset_index([level, drop, ...]) | For DataFrame with multi-level index, return new DataFrame with |
DataFrame.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
DataFrame.set_index(keys[, drop, append, ...]) | Set the DataFrame index (row labels) using one or more existing |
DataFrame.tail([n]) | Returns last n rows |
DataFrame.take(indices[, axis, convert, is_copy]) | Analogous to ndarray.take |
DataFrame.truncate([before, after, axis, copy]) | Truncates a sorted NDFrame before and/or after some particular |
Missing data handling¶
DataFrame.dropna([axis, how, thresh, ...]) | Return object with labels on given axis omitted where alternately any |
DataFrame.fillna([value, method, axis, ...]) | Fill NA/NaN values using the specified method |
DataFrame.replace([to_replace, value, ...]) | Replace values given in ‘to_replace’ with ‘value’. |
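As a rough sketch of the three missing-data methods above (the frame is hypothetical, and a recent pandas release is assumed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

complete_rows = df.dropna()              # keep only rows with no missing values
not_all_missing = df.dropna(how="all")   # drop rows in which every value is missing
filled = df.fillna(0.0)                  # replace NaN with a constant
swapped = df.replace(3.0, -1.0)          # replace one value with another
```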
Reshaping, sorting, transposing¶
DataFrame.delevel(*args, **kwargs) | |
DataFrame.pivot([index, columns, values]) | Reshape data (produce a “pivot” table) based on column values. |
DataFrame.reorder_levels(order[, axis]) | Rearrange index levels using input order. |
DataFrame.sort([columns, column, axis, ...]) | Sort DataFrame either by labels (along either axis) or by the values in |
DataFrame.sort_index([axis, by, ascending, ...]) | Sort DataFrame either by labels (along either axis) or by the values in |
DataFrame.sortlevel([level, axis, ...]) | Sort multilevel index by chosen axis and primary level. |
DataFrame.swaplevel(i, j[, axis]) | Swap levels i and j in a MultiIndex on a particular axis |
DataFrame.stack([level, dropna]) | Pivot a level of the (possibly hierarchical) column labels, returning a |
DataFrame.unstack([level]) | Pivot a level of the (necessarily hierarchical) index labels, returning |
DataFrame.T | Transpose index and columns |
DataFrame.to_panel() | Transform long (stacked) format (DataFrame) into wide (3D, Panel) |
DataFrame.transpose() | Transpose index and columns |
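The reshaping methods above can be illustrated with a small long-format table. This is a sketch under assumptions: the column names `date`, `key` and `value` are invented, and a recent pandas release is assumed (deprecated or removed entries such as `delevel` and `to_panel` are deliberately not used).

```python
import pandas as pd

long_df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "key": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})

# One row per date, one column per key.
wide = long_df.pivot(index="date", columns="key", values="value")

# Move the column labels into an inner index level (a Series with a two-level index).
stacked = wide.stack()

# Move the inner index level back into the columns.
roundtrip = stacked.unstack()

# Swap rows and columns.
flipped = wide.T
```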
Combining / joining / merging¶
DataFrame.append(other[, ignore_index, ...]) | Append rows of other to the end of this frame, returning a new object. |
DataFrame.join(other[, on, how, lsuffix, ...]) | Join columns with other DataFrame either on index or on a key |
DataFrame.merge(right[, how, on, left_on, ...]) | Merge DataFrame objects by performing a database-style join operation by |
DataFrame.update(other[, join, overwrite, ...]) | Modify DataFrame in place using non-NA values from passed |
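A minimal sketch of `merge`, `join` and `update`, assuming a recent pandas release and made-up frames. `DataFrame.append` is left out because it is not available in every pandas version.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Database-style inner join on a shared column: only keys "b" and "c" survive.
merged = left.merge(right, on="key", how="inner")

# Index-based left join: every left key is kept, and "a" gets NaN for y.
joined = left.set_index("key").join(right.set_index("key"), how="left")

# update works in place: non-NA values from the other frame overwrite the target.
target = pd.DataFrame({"v": [1.0, np.nan, 3.0]})
target.update(pd.DataFrame({"v": [np.nan, 2.0, 30.0]}))
```

Note that `update` returns nothing and modifies `target`, whereas `merge` and `join` return new frames.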
Time series-related¶
DataFrame.asfreq(freq[, method, how, normalize]) | Convert all TimeSeries inside to specified frequency using DateOffset |
DataFrame.shift([periods, freq, axis]) | Shift index by desired number of periods with an optional time freq |
DataFrame.first_valid_index() | Return label for first non-NA/null value |
DataFrame.last_valid_index() | Return label for last non-NA/null value |
DataFrame.resample(rule[, how, axis, ...]) | Convenience method for frequency conversion and resampling of regular time-series data. |
DataFrame.to_period([freq, axis, copy]) | Convert DataFrame from DatetimeIndex to PeriodIndex with desired |
DataFrame.to_timestamp([freq, how, axis, copy]) | Cast to DatetimeIndex of timestamps, at beginning of period |
DataFrame.tz_convert(tz[, axis, copy]) | Convert TimeSeries to target time zone. If it is time zone naive, it |
DataFrame.tz_localize(tz[, axis, copy, ...]) | Localize tz-naive TimeSeries to target time zone |
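The time series methods above might be used as in the following sketch. It assumes a recent pandas release with time zone data installed; the `price` column is hypothetical, and `resample` is omitted because its calling convention has changed across versions.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2014-01-01", periods=4, freq="D")
df = pd.DataFrame({"price": [np.nan, 10.0, 11.0, 12.0]}, index=idx)

# Move the values down by one period; the index itself is unchanged.
shifted = df.shift(1)

# Label of the first row that holds a non-missing value.
first_label = df.first_valid_index()

# Attach a time zone to the naive index, then convert to another zone.
localized = df.tz_localize("UTC")
converted = localized.tz_convert("America/New_York")

# Replace the DatetimeIndex with a monthly PeriodIndex.
monthly = df.to_period("M")
```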
Plotting¶
DataFrame.boxplot([column, by, ax, ...]) | Make a box plot from DataFrame column/columns optionally grouped |
DataFrame.hist(data[, column, by, grid, ...]) | Draw histogram of the DataFrame’s series using matplotlib / pylab. |
DataFrame.plot([frame, x, y, subplots, ...]) | Make line, bar, or scatter plots of DataFrame series with the index on the x-axis |
Serialization / IO / Conversion¶
DataFrame.from_csv(path[, header, sep, ...]) | Read delimited file into DataFrame |
DataFrame.from_dict(data[, orient, dtype]) | Construct DataFrame from dict of array-like or dicts |
DataFrame.from_items(items[, columns, orient]) | Convert (key, value) pairs to DataFrame. The keys will be the axis |
DataFrame.from_records(data[, index, ...]) | Convert structured or record ndarray to DataFrame |
DataFrame.info([verbose, buf, max_cols]) | Concise summary of a DataFrame. |
DataFrame.to_pickle(path) | Pickle (serialize) object to input file path |
DataFrame.to_csv(path_or_buf[, sep, na_rep, ...]) | Write DataFrame to a comma-separated values (csv) file |
DataFrame.to_hdf(path_or_buf, key, **kwargs) | Write the object to an HDF5 file using HDFStore |
DataFrame.to_dict([outtype]) | Convert DataFrame to dictionary. |
DataFrame.to_excel(excel_writer[, ...]) | Write DataFrame to an Excel sheet |
DataFrame.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
DataFrame.to_html([buf, columns, col_space, ...]) | Render a DataFrame as an HTML table. |
DataFrame.to_latex([buf, columns, ...]) | Render a DataFrame to a tabular environment table. |
DataFrame.to_stata(fname[, convert_dates, ...]) | Write DataFrame to a Stata binary dta file |
DataFrame.to_records([index, convert_datetime64]) | Convert DataFrame to record array. Index will be put in the |
DataFrame.to_sparse([fill_value, kind]) | Convert to SparseDataFrame |
DataFrame.to_string([buf, columns, ...]) | Render a DataFrame to a console-friendly tabular output. |
DataFrame.to_clipboard([excel, sep]) | Attempt to write text representation of object to the system clipboard |
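A short sketch of a few of the conversion methods above, assuming a recent pandas release. It writes to an in-memory buffer rather than a file, and reads the text back with `read_csv`; the frame is invented for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Write CSV text to a buffer, then parse it back into a frame.
buf = io.StringIO()
df.to_csv(buf, index=False)
restored = pd.read_csv(io.StringIO(buf.getvalue()))

# Nested dict keyed by column, then by row label.
as_dict = df.to_dict()

# NumPy record array with one field per column (index left out here).
records = df.to_records(index=False)

# Build a frame from a dict of array-likes.
from_dict = pd.DataFrame.from_dict({"a": [1, 2], "b": ["x", "y"]})
```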
Panel¶
Constructor¶
Panel([data, items, major_axis, minor_axis, ...]) | Represents wide format panel data, stored as 3-dimensional array |
Attributes and underlying data¶
Axes
- items: axis 0; each item corresponds to a DataFrame contained inside
- major_axis: axis 1; the index (rows) of each of the DataFrames
- minor_axis: axis 2; the columns of each of the DataFrames
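The three-axis layout described above can be pictured with plain NumPy. This is only an analogy, not the Panel API: Panel is not available in recent pandas releases, so the sketch stacks a hypothetical dict of equally shaped DataFrames into a 3-dimensional array whose axes play the roles of items, major_axis and minor_axis.

```python
import numpy as np
import pandas as pd

rows = ["r0", "r1", "r2"]      # would be the major_axis
cols = ["c0", "c1"]            # would be the minor_axis
frames = {                     # keys would be the items
    "item1": pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=rows, columns=cols),
    "item2": pd.DataFrame([[10, 20], [30, 40], [50, 60]], index=rows, columns=cols),
}

# Axis 0 indexes the frames, axis 1 their rows, axis 2 their columns.
cube = np.stack([frames[name].values for name in ("item1", "item2")])

# Value at (item "item2", row "r2", column "c0").
value = cube[1, 2, 0]
```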
Panel.values | Numpy representation of NDFrame |
Panel.axes | index(es) of the NDFrame |
Panel.ndim | Number of axes / array dimensions |
Panel.shape | tuple of axis dimensions |
Conversion¶
Panel.astype(dtype[, copy, raise_on_error]) | Cast object to input numpy.dtype |
Panel.copy([deep]) | Make a copy of this object |
Panel.isnull() | Return a boolean same-sized object indicating if the values are null |
Panel.notnull() | Return a boolean same-sized object indicating if the values are |
Getting and setting¶
Panel.get_value(*args) | Quickly retrieve single value at (item, major, minor) location |
Panel.set_value(*args) | Quickly set single value at (item, major, minor) location |
Indexing, iteration, slicing¶
Panel.at | |
Panel.iat | |
Panel.ix | |
Panel.loc | |
Panel.iloc | |
Panel.__iter__() | Iterate over info axis |
Panel.iteritems() | Iterate over (label, values) on info axis |
Panel.pop(item) | Return item and drop from frame. |
Panel.xs(key[, axis, copy]) | Return slice of panel along selected axis |
Panel.major_xs(key[, copy]) | Return slice of panel along major axis |
Panel.minor_xs(key[, copy]) | Return slice of panel along minor axis |
For more information on .at, .iat, .ix, .loc, and .iloc, see the indexing documentation.
Binary operator functions¶
Panel.add(other[, axis]) | Wrapper method for add |
Panel.sub(other[, axis]) | Wrapper method for sub |
Panel.mul(other[, axis]) | Wrapper method for mul |
Panel.div(other[, axis]) | Wrapper method for truediv |
Panel.truediv(other[, axis]) | Wrapper method for truediv |
Panel.floordiv(other[, axis]) | Wrapper method for floordiv |
Panel.mod(other[, axis]) | Wrapper method for mod |
Panel.pow(other[, axis]) | Wrapper method for pow |
Panel.radd(other[, axis]) | Wrapper method for radd |
Panel.rsub(other[, axis]) | Wrapper method for rsub |
Panel.rmul(other[, axis]) | Wrapper method for rmul |
Panel.rdiv(other[, axis]) | Wrapper method for rtruediv |
Panel.rtruediv(other[, axis]) | Wrapper method for rtruediv |
Panel.rfloordiv(other[, axis]) | Wrapper method for rfloordiv |
Panel.rmod(other[, axis]) | Wrapper method for rmod |
Panel.rpow(other[, axis]) | Wrapper method for rpow |
Panel.lt(other) | Wrapper for comparison method lt |
Panel.gt(other) | Wrapper for comparison method gt |
Panel.le(other) | Wrapper for comparison method le |
Panel.ge(other) | Wrapper for comparison method ge |
Panel.ne(other) | Wrapper for comparison method ne |
Panel.eq(other) | Wrapper for comparison method eq |
Function application, GroupBy¶
Panel.apply(func[, axis]) | Apply |
Panel.groupby(function[, axis]) | Group data on given axis, returning GroupBy object |
Computations / Descriptive Stats¶
Panel.abs() | Return an object with absolute value taken. |
Panel.clip([lower, upper, out]) | Trim values at input threshold(s) |
Panel.clip_lower(threshold) | Return copy of the input with values below given value truncated |
Panel.clip_upper(threshold) | Return copy of input with values above given value truncated |
Panel.count([axis]) | Return number of observations over requested axis. |
Panel.cummax([axis, dtype, out, skipna]) | Return cumulative max over requested axis. |
Panel.cummin([axis, dtype, out, skipna]) | Return cumulative min over requested axis. |
Panel.cumprod([axis, dtype, out, skipna]) | Return cumulative prod over requested axis. |
Panel.cumsum([axis, dtype, out, skipna]) | Return cumulative sum over requested axis. |
Panel.max([axis, skipna, level, numeric_only]) | This method returns the maximum of the values in the object. |
Panel.mean([axis, skipna, level, numeric_only]) | Return the mean of the values for the requested axis |
Panel.median([axis, skipna, level, numeric_only]) | Return the median of the values for the requested axis |
Panel.min([axis, skipna, level, numeric_only]) | This method returns the minimum of the values in the object. |
Panel.pct_change([periods, fill_method, ...]) | Percent change over given number of periods |
Panel.prod([axis, skipna, level, numeric_only]) | Return the product of the values for the requested axis |
Panel.skew([axis, skipna, level, numeric_only]) | Return unbiased skew over requested axis |
Panel.sum([axis, skipna, level, numeric_only]) | Return the sum of the values for the requested axis |
Panel.std([axis, skipna, level, ddof]) | Return unbiased standard deviation over requested axis |
Panel.var([axis, skipna, level, ddof]) | Return unbiased variance over requested axis |
Reindexing / Selection / Label manipulation¶
Panel.add_prefix(prefix) | Concatenate prefix string with panel items names. |
Panel.add_suffix(suffix) | Concatenate suffix string with panel items names |
Panel.drop(labels[, axis, level, inplace]) | Return new object with labels in requested axis removed |
Panel.filter([items, like, regex, axis]) | Restrict the info axis to set of items or wildcard |
Panel.first(offset) | Convenience method for subsetting initial periods of time series data |
Panel.last(offset) | Convenience method for subsetting final periods of time series data |
Panel.reindex([items, major_axis, minor_axis]) | Conform Panel to new index with optional filling logic, placing |
Panel.reindex_axis(labels[, axis, method, ...]) | Conform input object to new index with optional filling logic, |
Panel.reindex_like(other[, method, copy, limit]) | Return an object with indices matching those of other |
Panel.rename([items, major_axis, minor_axis]) | Alter axes using input function or functions. |
Panel.select(crit[, axis]) | Return data corresponding to axis labels matching criteria |
Panel.take(indices[, axis, convert, is_copy]) | Analogous to ndarray.take |
Panel.truncate([before, after, axis, copy]) | Truncates a sorted NDFrame before and/or after some particular |
Missing data handling¶
Panel.dropna([axis, how, inplace]) | Drop 2D from panel, holding passed axis constant |
Panel.fillna([value, method, axis, inplace, ...]) | Fill NA/NaN values using the specified method |
Reshaping, sorting, transposing¶
Panel.sort_index([axis, ascending]) | Sort object by labels (along an axis) |
Panel.swaplevel(i, j[, axis]) | Swap levels i and j in a MultiIndex on a particular axis |
Panel.transpose(*args, **kwargs) | Permute the dimensions of the Panel |
Panel.swapaxes(axis1, axis2[, copy]) | Interchange axes and swap values axes appropriately |
Panel.conform(frame[, axis]) | Conform input DataFrame to align with chosen axis pair. |
Combining / joining / merging¶
Panel.join(other[, how, lsuffix, rsuffix]) | Join items with other Panel either on major and minor axes column |
Panel.update(other[, join, overwrite, ...]) | Modify Panel in place using non-NA values from passed |
Time series-related¶
Panel.asfreq(freq[, method, how, normalize]) | Convert all TimeSeries inside to specified frequency using DateOffset |
Panel.shift(lags[, freq, axis]) | Shift major or minor axis by specified number of leads/lags. |
Panel.resample(rule[, how, axis, ...]) | Convenience method for frequency conversion and resampling of regular time-series data. |
Panel.tz_convert(tz[, axis, copy]) | Convert TimeSeries to target time zone. If it is time zone naive, it |
Panel.tz_localize(tz[, axis, copy, infer_dst]) | Localize tz-naive TimeSeries to target time zone |
Serialization / IO / Conversion¶
Panel.from_dict(data[, intersect, orient, dtype]) | Construct Panel from dict of DataFrame objects |
Panel.to_pickle(path) | Pickle (serialize) object to input file path |
Panel.to_excel(path[, na_rep, engine]) | Write each DataFrame in Panel to a separate Excel sheet |
Panel.to_hdf(path_or_buf, key, **kwargs) | Write the object to an HDF5 file using HDFStore |
Panel.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
Panel.to_sparse([fill_value, kind]) | Convert to SparsePanel |
Panel.to_frame([filter_observations]) | Transform wide format into long (stacked) format as DataFrame |
Panel.to_clipboard([excel, sep]) | Attempt to write text representation of object to the system clipboard |
Index¶
Many of these methods, or variants of them, are also available on the objects that contain an index (Series/DataFrame). In most cases, use those rather than calling the Index methods directly.
Index | Immutable ndarray implementing an ordered, sliceable set. |
Modifying and Computations¶
Index.copy([names, name, dtype, deep]) | Make a copy of this object. |
Index.delete(loc) | Make new Index with passed location deleted |
Index.diff(other) | Compute sorted set difference of two Index objects |
Index.drop(labels) | Make new Index with passed list of labels deleted |
Index.equals(other) | Determines if two Index objects contain the same elements. |
Index.identical(other) | Similar to equals, but check that other comparable attributes are |
Index.insert(loc, item) | Make new Index inserting new item at location |
Index.order([return_indexer, ascending]) | Return sorted copy of Index |
Index.reindex(target[, method, level, ...]) | For Index, simply returns the new index and the results of |
Index.repeat(repeats[, axis]) | Repeat elements of an array. |
Index.set_names(names[, inplace]) | Set new names on index. |
Index.unique() | Return array of unique values in the Index. Significantly faster than |
Conversion¶
Index.astype(dtype) | |
Index.tolist() | Overridden version of ndarray.tolist |
Index.to_datetime([dayfirst]) | For an Index containing strings or datetime.datetime objects, attempt |
Index.to_series() | return a series with both index and values equal to the index keys |
Sorting¶
Index.argsort(*args, **kwargs) | See docstring for ndarray.argsort |
Index.order([return_indexer, ascending]) | Return sorted copy of Index |
Index.sort(*args, **kwargs) |
Time-specific operations¶
Index.shift([periods, freq]) | Shift Index containing datetime objects by input number of periods and |
Combining / joining / merging¶
Index.append(other) | Append a collection of Index objects together |
Index.intersection(other) | Form the intersection of two Index objects. Sortedness of the result is |
Index.join(other[, how, level, return_indexers]) | Internal API method. Compute join_index and indexers to conform data |
Index.union(other) | Form the union of two Index objects and sort if possible |
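The set-like combining methods above can be sketched as follows, assuming a recent pandas release and two small integer indexes chosen for illustration:

```python
import pandas as pd

a = pd.Index([3, 1, 2])
b = pd.Index([2, 3, 4])

both = a.intersection(b)   # labels present in both indexes
either = a.union(b)        # labels present in at least one, sorted when possible
combined = a.append(b)     # plain concatenation, duplicates kept
```

Unlike `union`, `append` does not deduplicate, so `combined` has six entries.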
Selecting¶
Index.get_indexer(target[, method, limit]) | Compute indexer and mask for new index given the current index. |
Index.get_indexer_non_unique(target, **kwargs) | return an indexer suitable for taking from a non unique index |
Index.get_level_values(level) | Return vector of label values for requested level, equal to the length |
Index.get_loc(key) | Get integer location for requested label |
Index.get_value(series, key) | Fast lookup of value from 1-dimensional ndarray. |
Index.isin(values) | Compute boolean array of whether each index value is found in the |
Index.slice_indexer([start, end, step]) | For an ordered Index, compute the slice indexer for input labels and |
Index.slice_locs([start, end]) | For an ordered Index, compute the slice locations for input labels |
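A minimal sketch of the label lookup methods above, assuming a recent pandas release and a small sorted index of strings:

```python
import pandas as pd

idx = pd.Index(["a", "b", "c", "d"])

# Integer position of a single label.
pos = idx.get_loc("c")

# Positions for several labels at once; labels that are not found map to -1.
indexer = idx.get_indexer(["b", "z", "d"])

# Boolean array marking which index values appear in the given collection.
mask = idx.isin(["a", "d"])

# Start and stop positions for a label slice; the end label is included.
start, stop = idx.slice_locs("b", "c")
```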
Properties¶
Index.is_monotonic | |
Index.is_numeric() |
DatetimeIndex¶
DatetimeIndex | Immutable ndarray of datetime64 data, represented internally as int64, and |
Time/Date Components¶
DatetimeIndex.year | |
DatetimeIndex.month | |
DatetimeIndex.day | |
DatetimeIndex.hour | |
DatetimeIndex.minute | |
DatetimeIndex.second | |
DatetimeIndex.microsecond | |
DatetimeIndex.nanosecond | |
DatetimeIndex.date | Returns numpy array of datetime.date. |
DatetimeIndex.time | Returns numpy array of datetime.time. |
DatetimeIndex.dayofyear | |
DatetimeIndex.weekofyear | |
DatetimeIndex.week | |
DatetimeIndex.dayofweek | |
DatetimeIndex.weekday | |
DatetimeIndex.quarter |
Selecting¶
DatetimeIndex.indexer_at_time(time[, asof]) | Select values at particular time of day (e.g. |
DatetimeIndex.indexer_between_time(...[, ...]) | Select values between particular times of day (e.g., 9:00-9:30AM) |
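The component attributes and time-of-day selectors above can be illustrated with a sketch like this one. It assumes a recent pandas release; the timestamps are arbitrary, and `datetime.time` objects are passed to the selectors.

```python
from datetime import time

import pandas as pd

stamps = pd.DatetimeIndex(
    ["2014-03-01 09:15", "2014-03-01 16:45", "2014-03-02 09:20"]
)

# Element-wise components of each timestamp.
hours = list(stamps.hour)
weekdays = list(stamps.dayofweek)   # Monday is 0

# Integer positions of entries that fall inside a time-of-day window.
morning = stamps.indexer_between_time(time(9, 0), time(9, 30))

# Integer positions of entries at exactly one time of day.
closing = stamps.indexer_at_time(time(16, 45))
```

Both selectors return positions rather than labels, so they can be used to take rows from an object that shares this index.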
Time-specific operations¶
DatetimeIndex.normalize() | Return DatetimeIndex with times normalized to midnight. Length is unaltered |
DatetimeIndex.snap([freq]) | Snap time stamps to nearest occurring frequency |
DatetimeIndex.tz_convert(tz) | Convert DatetimeIndex from one time zone to another (using pytz) |
DatetimeIndex.tz_localize(tz[, infer_dst]) | Localize tz-naive DatetimeIndex to given time zone (using pytz) |
Conversion¶
DatetimeIndex.to_datetime([dayfirst]) | |
DatetimeIndex.to_period([freq]) | Cast to PeriodIndex at a particular frequency |
DatetimeIndex.to_pydatetime() | Return DatetimeIndex as object ndarray of datetime.datetime objects |
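A sketch of the time-specific and conversion methods above, assuming a recent pandas release with time zone data installed; the timestamps and zone names are chosen only for illustration.

```python
import pandas as pd

stamps = pd.DatetimeIndex(["2014-03-01 09:15", "2014-03-02 16:45"])

# Same dates with the time of day reset to midnight.
midnight = stamps.normalize()

# Attach a zone to the naive index, then express the same instants in another zone.
utc = stamps.tz_localize("UTC")
tokyo = utc.tz_convert("Asia/Tokyo")

# Daily periods covering each timestamp.
periods = stamps.to_period("D")

# Object array of standard-library datetime.datetime values.
native = stamps.to_pydatetime()
```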
GroupBy¶
GroupBy objects are returned by groupby calls: pandas.DataFrame.groupby(), pandas.Series.groupby(), etc.
Indexing, iteration¶
GroupBy.__iter__() | Groupby iterator |
GroupBy.groups | dict {group name -> group labels} |
GroupBy.indices | dict {group name -> group indices} |
GroupBy.get_group(name[, obj]) | Constructs NDFrame from group with provided name |
Function application¶
GroupBy.apply(func, *args, **kwargs) | Apply function and combine results together in an intelligent way. |
GroupBy.aggregate(func, *args, **kwargs) | |
GroupBy.transform(func, *args, **kwargs) |
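The iteration and function-application entries above can be sketched as follows, assuming a recent pandas release and a made-up frame with a grouping column `g` and a value column `v`:

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b", "b"], "v": [1.0, 3.0, 2.0, 6.0]})
grouped = df.groupby("g")

# Iterating yields (group name, sub-frame) pairs.
sizes = {name: len(part) for name, part in grouped}

# Pull out a single group by name.
group_a = grouped.get_group("a")

# aggregate reduces each group to one value.
totals = grouped["v"].aggregate("sum")

# transform returns a result aligned with the original rows.
demeaned = grouped["v"].transform(lambda s: s - s.mean())

# apply runs an arbitrary function per group and combines the results.
spans = grouped["v"].apply(lambda s: s.max() - s.min())
```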
Computations / Descriptive Stats¶
GroupBy.mean() | Compute mean of groups, excluding missing values |
GroupBy.median() | Compute median of groups, excluding missing values |
GroupBy.std([ddof]) | Compute standard deviation of groups, excluding missing values |
GroupBy.var([ddof]) | Compute variance of groups, excluding missing values |
GroupBy.ohlc() | Compute open, high, low and close values of each group, excluding missing values
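As a rough sketch of the descriptive statistics above (a recent pandas release is assumed, and the frame is hypothetical), each method returns one row per group:

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b", "b"], "v": [1.0, 3.0, 2.0, 6.0]})
grouped = df.groupby("g")["v"]

means = grouped.mean()
medians = grouped.median()
variances = grouped.var()    # sample variance (ddof=1 by default)

# First, maximum, minimum and last value of each group,
# in columns named open, high, low and close.
bars = grouped.ohlc()
```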