pandas.DataFrame

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
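
As a minimal sketch of the label alignment described above (the frames `left` and `right` and their labels are made up for illustration), adding two frames whose labels only partly overlap produces the union of both axes, with NaN wherever a label is missing from either operand:

```python
import pandas as pd

# Two illustrative frames whose row and column labels only partly overlap.
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
right = pd.DataFrame({"b": [10, 20], "c": [30, 40]}, index=["y", "z"])

# Addition aligns on both row and column labels; any cell whose label is
# missing from either operand becomes NaN.
total = left + right

# Selecting a single column returns a Series, as with a dict-like container.
column_b = left["b"]
```

Only the cell at row `y`, column `b` exists in both inputs, so it is the only non-NaN entry in `total`.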

Parameters:

data : numpy ndarray (structured or homogeneous), dict, or DataFrame

Dict can contain Series, arrays, constants, or list-like objects

index : Index or array-like

Index to use for resulting frame. Defaults to np.arange(n) if the input data carries no indexing information and no index is provided

columns : Index or array-like

Column labels to use for resulting frame. Defaults to np.arange(n) if no column labels are provided

dtype : dtype, default None

Data type to force, otherwise infer

copy : boolean, default False

Copy data from inputs. Only affects DataFrame / 2d ndarray input
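
The parameters above can be sketched as follows. The array `arr` and the labels are illustrative assumptions, not part of the original documentation:

```python
import numpy as np
import pandas as pd

# Illustrative 3x2 input array.
arr = np.arange(6).reshape(3, 2)

# With no index or columns given, both axes default to 0..n-1.
default_axes = pd.DataFrame(arr)

# Explicit row labels, column labels, and a forced dtype.
labeled = pd.DataFrame(arr, index=["r0", "r1", "r2"],
                       columns=["a", "b"], dtype="float64")

# copy=True requests an independent copy of the 2d ndarray input,
# so a later change to the array does not show up in the frame.
copied = pd.DataFrame(arr, copy=True)
arr[0, 0] = 99
```

Whether a frame built with `copy=False` shares memory with its input can depend on the pandas version and the dtype, so the sketch makes no claim about `default_axes` after the mutation.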

See also

DataFrame.from_records: constructor from tuples, also record arrays
DataFrame.from_dict: from dicts of Series, arrays, or dicts
DataFrame.from_items: from sequence of (key, value) pairs

pandas.read_csv, pandas.read_table, pandas.read_clipboard
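
A brief sketch of two of the alternative constructors listed above, using made-up records and keys. `from_items` is omitted here because it may not be available in every pandas version:

```python
import pandas as pd

# from_records: one tuple per row, with column names supplied separately.
records = [(1, "a"), (2, "b")]
from_records = pd.DataFrame.from_records(records, columns=["num", "letter"])

# from_dict with orient="index": each outer dict key becomes a row label.
by_row = pd.DataFrame.from_dict(
    {"r0": {"num": 1}, "r1": {"num": 2}}, orient="index"
)
```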

Examples

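>>> import numpy as np
>>> from pandas import DataFrame, Series, date_range
>>> index = date_range('2000-01-01', periods=3)  # illustrative index
>>> ts1 = Series([1.0, 2.0, 3.0], index=index)   # illustrative data
>>> ts2 = Series([4.0, 5.0, 6.0], index=index)
>>> d = {'col1': ts1, 'col2': ts2}
>>> df = DataFrame(data=d, index=index)
>>> df2 = DataFrame(np.random.randn(10, 5))
>>> df3 = DataFrame(np.random.randn(10, 5),
...                 columns=['a', 'b', 'c', 'd', 'e'])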

Attributes

`T`	Transpose index and columns
`at`	Fast label-based scalar accessor
`axes`	Return a list with the row axis labels and column axis labels as the only members.
`blocks`	Internal property, property synonym for as_blocks()
`dtypes`	Return the dtypes in this object.
`empty`	True if NDFrame is entirely empty [no items], meaning any of the axes are of length 0.
`ftypes`	Return the ftypes (indication of sparse/dense and dtype) in this object.
`iat`	Fast integer location scalar accessor.
`iloc`	Purely integer-location based indexing for selection by position.
`is_copy`
`ix`	A primarily label-location based indexer, with integer position fallback.
`loc`	Purely label-location based indexer for selection by label.
`ndim`	Number of axes / array dimensions
`shape`	Return a tuple representing the dimensionality of the DataFrame.
`size`	number of elements in the NDFrame
`style`	Property returning a Styler object containing methods for building a styled HTML representation of the DataFrame.
`values`	Numpy representation of NDFrame
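
To illustrate how the indexers and shape attributes in the table relate, here is a small sketch on a made-up frame (the labels and values are assumptions for the example):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]},
                  index=["x", "y", "z"])

by_label = df.loc["y", "b"]      # label-based selection
by_position = df.iloc[1, 1]      # integer-position selection of the same cell
scalar_label = df.at["z", "a"]   # fast scalar access by label
scalar_pos = df.iat[2, 0]        # fast scalar access by integer position

rows, cols = df.shape            # number of rows and columns
transposed = df.T                # index and columns swapped
```

`loc` and `at` take labels while `iloc` and `iat` take integer positions. `at` and `iat` are restricted to a single scalar.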

Methods

`abs`()	Return an object with absolute value taken; only applicable to objects that are all numeric.
`add`(other[, axis, level, fill_value])	Addition of dataframe and other, element-wise (binary operator add).
`add_prefix`(prefix)	Concatenate prefix string with the column labels.
`add_suffix`(suffix)	Concatenate suffix string with the column labels.
`agg`(func[, axis])	Aggregate using callable, string, dict, or list of string/callables
`aggregate`(func[, axis])	Aggregate using callable, string, dict, or list of string/callables
`align`(other[, join, axis, level, copy, ...])	Align two objects on their axes with the specified join method for each axis Index.
`all`([axis, bool_only, skipna, level])	Return whether all elements are True over requested axis
`any`([axis, bool_only, skipna, level])	Return whether any element is True over requested axis
`append`(other[, ignore_index, verify_integrity])	Append rows of other to the end of this frame, returning a new object.
`apply`(func[, axis, broadcast, raw, reduce, args])	Applies function along input axis of DataFrame.
`applymap`(func)	Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame.
`as_blocks`([copy])	Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.
`as_matrix`([columns])	Convert the frame to its Numpy-array representation.
`asfreq`(freq[, method, how, normalize, ...])	Convert TimeSeries to specified frequency.
`asof`(where[, subset])	The last row without any NaN is taken (or the last row without NaN considering only the subset of columns in the case of a DataFrame).
`assign`(**kwargs)	Assign new columns to a DataFrame, returning a new object (a copy) with all the original columns in addition to the new ones.
`astype`(dtype[, copy, errors])	Cast object to input numpy.dtype
`at_time`(time[, asof])	Select values at particular time of day (e.g., 9:30 AM).
`between_time`(start_time, end_time[, ...])	Select values between particular times of the day (e.g., 9:00-9:30 AM).
`bfill`([axis, inplace, limit, downcast])	Synonym for `DataFrame.fillna(method='bfill')`
`bool`()	Return the bool of a single element PandasObject.
`boxplot`([column, by, ax, fontsize, rot, ...])	Make a box plot from DataFrame column optionally grouped by some columns or other inputs.
`clip`([lower, upper, axis])	Trim values at input threshold(s).
`clip_lower`(threshold[, axis])	Return copy of the input with values below given value(s) truncated.
`clip_upper`(threshold[, axis])	Return copy of input with values above given value(s) truncated.
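`combine`(other, func[, fill_value, overwrite])	Add two DataFrame objects and do not propagate NaN values, so if for a (column, time) one frame is missing a value, it will default to the other frame’s value (which might be NaN as well).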
`combine_first`(other)	Combine two DataFrame objects and default to non-null values in frame calling the method.
`compound`([axis, skipna, level])	Return the compound percentage of the values for the requested axis
`consolidate`([inplace])	DEPRECATED: consolidate will be an internal implementation only.
`convert_objects`([convert_dates, ...])	Deprecated.
`copy`([deep])	Make a copy of this object’s data.
`corr`([method, min_periods])	Compute pairwise correlation of columns, excluding NA/null values
`corrwith`(other[, axis, drop])	Compute pairwise correlation between rows or columns of two DataFrame objects.
`count`([axis, level, numeric_only])	Return Series with number of non-NA/null observations over requested axis.
`cov`([min_periods])	Compute pairwise covariance of columns, excluding NA/null values
`cummax`([axis, skipna])	Return cumulative max over requested axis.
`cummin`([axis, skipna])	Return cumulative minimum over requested axis.
`cumprod`([axis, skipna])	Return cumulative product over requested axis.
`cumsum`([axis, skipna])	Return cumulative sum over requested axis.
`describe`([percentiles, include, exclude])	Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding `NaN` values.
`diff`([periods, axis])	1st discrete difference of object
`div`(other[, axis, level, fill_value])	Floating division of dataframe and other, element-wise (binary operator truediv).
`divide`(other[, axis, level, fill_value])	Floating division of dataframe and other, element-wise (binary operator truediv).
`dot`(other)	Matrix multiplication with DataFrame or Series objects
`drop`(labels[, axis, level, inplace, errors])	Return new object with labels in requested axis removed.
`drop_duplicates`([subset, keep, inplace])	Return DataFrame with duplicate rows removed, optionally only considering certain columns.
`dropna`([axis, how, thresh, subset, inplace])	Return object with labels on given axis omitted where alternately any or all of the data are missing.
`duplicated`([subset, keep])	Return boolean Series denoting duplicate rows, optionally only considering certain columns.
`eq`(other[, axis, level])	Wrapper for flexible comparison methods eq
`equals`(other)	Determines if two NDFrame objects contain the same elements.
`eval`(expr[, inplace])	Evaluate an expression in the context of the calling DataFrame instance.
`ewm`([com, span, halflife, alpha, ...])	Provides exponential weighted functions
`expanding`([min_periods, freq, center, axis])	Provides expanding transformations.
`ffill`([axis, inplace, limit, downcast])	Synonym for `DataFrame.fillna(method='ffill')`
`fillna`([value, method, axis, inplace, ...])	Fill NA/NaN values using the specified method
`filter`([items, like, regex, axis])	Subset rows or columns of dataframe according to labels in the specified index.
`first`(offset)	Convenience method for subsetting initial periods of time series data based on a date offset.
`first_valid_index`()	Return label for first non-NA/null value
`floordiv`(other[, axis, level, fill_value])	Integer division of dataframe and other, element-wise (binary operator floordiv).
`from_csv`(path[, header, sep, index_col, ...])	Read CSV file (DISCOURAGED, please use `pandas.read_csv()` instead).
`from_dict`(data[, orient, dtype])	Construct DataFrame from dict of array-like or dicts
`from_items`(items[, columns, orient])	Convert (key, value) pairs to DataFrame.
`from_records`(data[, index, exclude, ...])	Convert structured or record ndarray to DataFrame
`ge`(other[, axis, level])	Wrapper for flexible comparison methods ge
`get`(key[, default])	Get item from object for given key (DataFrame column, Panel slice, etc.).
`get_dtype_counts`()	Return the counts of dtypes in this object.
`get_ftype_counts`()	Return the counts of ftypes in this object.
`get_value`(index, col[, takeable])	Quickly retrieve single value at passed column and index
`get_values`()	same as values (but handles sparseness conversions)
`groupby`([by, axis, level, as_index, sort, ...])	Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns.
`gt`(other[, axis, level])	Wrapper for flexible comparison methods gt
`head`([n])	Returns first n rows
`hist`(data[, column, by, grid, xlabelsize, ...])	Draw histogram of the DataFrame’s series using matplotlib / pylab.
`idxmax`([axis, skipna])	Return index of first occurrence of maximum over requested axis.
`idxmin`([axis, skipna])	Return index of first occurrence of minimum over requested axis.
`info`([verbose, buf, max_cols, memory_usage, ...])	Concise summary of a DataFrame.
`insert`(loc, column, value[, allow_duplicates])	Insert column into DataFrame at specified location.
`interpolate`([method, axis, limit, inplace, ...])	Interpolate values according to different methods.
`isin`(values)	Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
`isnull`()	Return a boolean same-sized object indicating if the values are null.
`items`()	Iterator over (column name, Series) pairs.
`iteritems`()	Iterator over (column name, Series) pairs.
`iterrows`()	Iterate over DataFrame rows as (index, Series) pairs.
`itertuples`([index, name])	Iterate over DataFrame rows as namedtuples, with index value as first element of the tuple.
`join`(other[, on, how, lsuffix, rsuffix, sort])	Join columns with other DataFrame either on index or on a key column.
`keys`()	Get the ‘info axis’ (see Indexing for more)
`kurt`([axis, skipna, level, numeric_only])	Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
`kurtosis`([axis, skipna, level, numeric_only])	Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
`last`(offset)	Convenience method for subsetting final periods of time series data based on a date offset.
`last_valid_index`()	Return label for last non-NA/null value
`le`(other[, axis, level])	Wrapper for flexible comparison methods le
`lookup`(row_labels, col_labels)	Label-based “fancy indexing” function for DataFrame.
`lt`(other[, axis, level])	Wrapper for flexible comparison methods lt
`mad`([axis, skipna, level])	Return the mean absolute deviation of the values for the requested axis
`mask`(cond[, other, inplace, axis, level, ...])	Return an object of same shape as self and whose corresponding entries are from self where cond is False and otherwise are from other.
`max`([axis, skipna, level, numeric_only])	This method returns the maximum of the values in the object.
`mean`([axis, skipna, level, numeric_only])	Return the mean of the values for the requested axis
`median`([axis, skipna, level, numeric_only])	Return the median of the values for the requested axis
`melt`([id_vars, value_vars, var_name, ...])	“Unpivots” a DataFrame from wide format to long format, optionally leaving identifier variables set.
`memory_usage`([index, deep])	Memory usage of DataFrame columns.
`merge`(right[, how, on, left_on, right_on, ...])	Merge DataFrame objects by performing a database-style join operation by columns or indexes.
`min`([axis, skipna, level, numeric_only])	This method returns the minimum of the values in the object.
`mod`(other[, axis, level, fill_value])	Modulo of dataframe and other, element-wise (binary operator mod).
`mode`([axis, numeric_only])	Gets the mode(s) of each element along the axis selected.
`mul`(other[, axis, level, fill_value])	Multiplication of dataframe and other, element-wise (binary operator mul).
`multiply`(other[, axis, level, fill_value])	Multiplication of dataframe and other, element-wise (binary operator mul).
`ne`(other[, axis, level])	Wrapper for flexible comparison methods ne
`nlargest`(n, columns[, keep])	Get the rows of a DataFrame sorted by the n largest values of columns.
`notnull`()	Return a boolean same-sized object indicating if the values are not null.
`nsmallest`(n, columns[, keep])	Get the rows of a DataFrame sorted by the n smallest values of columns.
`nunique`([axis, dropna])	Return Series with number of distinct observations over requested axis.
`pct_change`([periods, fill_method, limit, freq])	Percent change over given number of periods.
`pipe`(func, \*args, \*\*kwargs)	Apply func(self, \*args, \*\*kwargs)
`pivot`([index, columns, values])	Reshape data (produce a “pivot” table) based on column values.
`pivot_table`(data[, values, index, columns, ...])	Create a spreadsheet-style pivot table as a DataFrame.
`plot`	alias of `FramePlotMethods`
`pop`(item)	Return item and drop from frame.
`pow`(other[, axis, level, fill_value])	Exponential power of dataframe and other, element-wise (binary operator pow).
`prod`([axis, skipna, level, numeric_only])	Return the product of the values for the requested axis
`product`([axis, skipna, level, numeric_only])	Return the product of the values for the requested axis
`quantile`([q, axis, numeric_only, interpolation])	Return values at the given quantile over requested axis, a la numpy.percentile.
`query`(expr[, inplace])	Query the columns of a frame with a boolean expression.
`radd`(other[, axis, level, fill_value])	Addition of dataframe and other, element-wise (binary operator radd).
`rank`([axis, method, numeric_only, ...])	Compute numerical data ranks (1 through n) along axis.
`rdiv`(other[, axis, level, fill_value])	Floating division of dataframe and other, element-wise (binary operator rtruediv).
`reindex`([index, columns])	Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
`reindex_axis`(labels[, axis, method, level, ...])	Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
`reindex_like`(other[, method, copy, limit, ...])	Return an object with matching indices to myself.
`rename`([index, columns])	Alter axes using input function or functions.
`rename_axis`(mapper[, axis, copy, inplace])	Alter index and / or columns using input function or functions.
`reorder_levels`(order[, axis])	Rearrange index levels using input order.
`replace`([to_replace, value, inplace, limit, ...])	Replace values given in ‘to_replace’ with ‘value’.
`resample`(rule[, how, axis, fill_method, ...])	Convenience method for frequency conversion and resampling of time series.
`reset_index`([level, drop, inplace, ...])	For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc.
`rfloordiv`(other[, axis, level, fill_value])	Integer division of dataframe and other, element-wise (binary operator rfloordiv).
`rmod`(other[, axis, level, fill_value])	Modulo of dataframe and other, element-wise (binary operator rmod).
`rmul`(other[, axis, level, fill_value])	Multiplication of dataframe and other, element-wise (binary operator rmul).
`rolling`(window[, min_periods, freq, center, ...])	Provides rolling window calculations.
`round`([decimals])	Round a DataFrame to a variable number of decimal places.
`rpow`(other[, axis, level, fill_value])	Exponential power of dataframe and other, element-wise (binary operator rpow).
`rsub`(other[, axis, level, fill_value])	Subtraction of dataframe and other, element-wise (binary operator rsub).
`rtruediv`(other[, axis, level, fill_value])	Floating division of dataframe and other, element-wise (binary operator rtruediv).
`sample`([n, frac, replace, weights, ...])	Returns a random sample of items from an axis of object.
`select`(crit[, axis])	Return data corresponding to axis labels matching criteria
`select_dtypes`([include, exclude])	Return a subset of a DataFrame including/excluding columns based on their `dtype`.
`sem`([axis, skipna, level, ddof, numeric_only])	Return unbiased standard error of the mean over requested axis.
`set_axis`(axis, labels)	Public version of axis assignment
`set_index`(keys[, drop, append, inplace, ...])	Set the DataFrame index (row labels) using one or more existing columns.
`set_value`(index, col, value[, takeable])	Put single value at passed column and index
`shift`([periods, freq, axis])	Shift index by desired number of periods with an optional time freq
`skew`([axis, skipna, level, numeric_only])	Return unbiased skew over requested axis
`slice_shift`([periods, axis])	Equivalent to shift without copying data.
`sort_index`([axis, level, ascending, ...])	Sort object by labels (along an axis)
`sort_values`(by[, axis, ascending, inplace, ...])	Sort by the values along either axis
`sortlevel`([level, axis, ascending, inplace, ...])	DEPRECATED: use `DataFrame.sort_index()`
`squeeze`([axis])	Squeeze length 1 dimensions.
`stack`([level, dropna])	Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
`std`([axis, skipna, level, ddof, numeric_only])	Return sample standard deviation over requested axis.
`sub`(other[, axis, level, fill_value])	Subtraction of dataframe and other, element-wise (binary operator sub).
`subtract`(other[, axis, level, fill_value])	Subtraction of dataframe and other, element-wise (binary operator sub).
`sum`([axis, skipna, level, numeric_only])	Return the sum of the values for the requested axis
`swapaxes`(axis1, axis2[, copy])	Interchange axes and swap values axes appropriately
`swaplevel`([i, j, axis])	Swap levels i and j in a MultiIndex on a particular axis
`tail`([n])	Returns last n rows
`take`(indices[, axis, convert, is_copy])	Analogous to ndarray.take
`to_clipboard`([excel, sep])	Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.
`to_csv`([path_or_buf, sep, na_rep, ...])	Write DataFrame to a comma-separated values (csv) file
`to_dense`()	Return dense representation of NDFrame (as opposed to sparse)
`to_dict`([orient])	Convert DataFrame to dictionary.
`to_excel`(excel_writer[, sheet_name, na_rep, ...])	Write DataFrame to an excel sheet
`to_feather`(fname)	write out the binary feather-format for DataFrames
`to_gbq`(destination_table, project_id[, ...])	Write a DataFrame to a Google BigQuery table.
`to_hdf`(path_or_buf, key, **kwargs)	Write the contained data to an HDF5 file using HDFStore.
`to_html`([buf, columns, col_space, header, ...])	Render a DataFrame as an HTML table.
`to_json`([path_or_buf, orient, date_format, ...])	Convert the object to a JSON string.
`to_latex`([buf, columns, col_space, header, ...])	Render an object to a tabular environment table.
`to_msgpack`([path_or_buf, encoding])	msgpack (serialize) object to input file path
`to_panel`()	Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
`to_period`([freq, axis, copy])	Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed).
`to_pickle`(path[, compression])	Pickle (serialize) object to input file path.
`to_records`([index, convert_datetime64])	Convert DataFrame to record array.
`to_sparse`([fill_value, kind])	Convert to SparseDataFrame
`to_sql`(name, con[, flavor, schema, ...])	Write records stored in a DataFrame to a SQL database.
`to_stata`(fname[, convert_dates, ...])	Write the DataFrame to a Stata binary dta file.
`to_string`([buf, columns, col_space, header, ...])	Render a DataFrame to a console-friendly tabular output.
`to_timestamp`([freq, how, axis, copy])	Cast to DatetimeIndex of timestamps, at beginning of period
`to_xarray`()	Return an xarray object from the pandas object.
`transform`(func, \*args, \*\*kwargs)	Call function producing a like-indexed NDFrame
`transpose`(\*args, \*\*kwargs)	Transpose index and columns
`truediv`(other[, axis, level, fill_value])	Floating division of dataframe and other, element-wise (binary operator truediv).
`truncate`([before, after, axis, copy])	Truncates a sorted NDFrame before and/or after some particular index value.
`tshift`([periods, freq, axis])	Shift the time index, using the index’s frequency if available.
`tz_convert`(tz[, axis, level, copy])	Convert tz-aware axis to target time zone.
`tz_localize`(tz[, axis, level, copy, ambiguous])	Localize tz-naive TimeSeries to target time zone.
`unstack`([level, fill_value])	Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
`update`(other[, join, overwrite, ...])	Modify DataFrame in place using non-NA values from passed DataFrame.
`var`([axis, skipna, level, ddof, numeric_only])	Return unbiased variance over requested axis.
`where`(cond[, other, inplace, axis, level, ...])	Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
`xs`(key[, axis, level, drop_level])	Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
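
The `where` and `mask` entries in the table are mirror images of each other. A minimal sketch on an illustrative frame (the values and the replacement `0` are assumptions chosen for the example):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})
positive = df > 0  # boolean condition with the same shape as df

# where: keep entries where the condition is True, otherwise use 0.
kept = df.where(positive, 0)

# mask: keep entries where the condition is False, otherwise use 0.
hidden = df.mask(positive, 0)
```

Because every entry lands in exactly one of the two results, `kept + hidden` reproduces the original frame in this example.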
