pandas.DataFrame
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects. This is the primary pandas data structure.
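Label alignment in arithmetic is easiest to see on two small frames whose labels only partly overlap. The sketch below uses made-up frame names and values. The result covers the union of the row and column labels, with NaN wherever either operand lacks a value. add with fill_value=0 instead treats a value missing on only one side as 0.

```python
import pandas as pd

# Hypothetical frames whose row and column labels only partly overlap
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
right = pd.DataFrame({"b": [10, 20], "c": [30, 40]}, index=["y", "z"])

# The + operator aligns on both axes: the result has the union of the labels,
# and any cell missing from either operand becomes NaN
total = left + right
print(total)

# add() with fill_value=0 substitutes 0 when only one side is missing;
# cells missing from both operands stay NaN
filled = left.add(right, fill_value=0)
print(filled)

# Dict-like access: a single column label returns a Series
print(type(total["b"]).__name__)  # Series
```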
Parameters:
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
Dict can contain Series, arrays, constants, or list-like objects.
index : Index or array-like
Index to use for the resulting frame. Defaults to np.arange(n) if no indexing information is part of the input data and no index is provided.
columns : Index or array-like
Column labels to use for the resulting frame. Defaults to np.arange(n) if no column labels are provided.
dtype : dtype, default None
Data type to force; if None, the dtype is inferred.
copy : boolean, default False
Copy data from inputs. Only affects DataFrame / 2d ndarray input.
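The sketch below shows how index, columns and dtype behave, using made-up values. With no labels, both axes fall back to integer labels 0..n-1. dtype forces a single type. When data is a dict of Series, index picks which rows are taken from the aligned Series, and in what order. The copy parameter is left out because its behaviour depends on the pandas version.

```python
import numpy as np
import pandas as pd

# No index or columns supplied: both axes get default integer labels 0..n-1
frame = pd.DataFrame(np.zeros((2, 3)))
print(list(frame.index), list(frame.columns))  # [0, 1] [0, 1, 2]

# dtype forces every column to one type instead of inferring int64
floats = pd.DataFrame([[1, 2], [3, 4]], dtype="float64")
print(floats.dtypes.tolist())

# Dict of Series (sample data): the Series are aligned on their labels, and
# index selects and orders the rows; labels absent from a Series become NaN
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5], index=["a", "b"])
sub = pd.DataFrame({"s1": s1, "s2": s2}, index=["c", "a"])
print(sub)
```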
See also
DataFrame.from_records
- constructor from tuples, also record arrays
DataFrame.from_dict
- from dicts of Series, arrays, or dicts
DataFrame.from_items
- from sequence of (key, value) pairs
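Two of the alternative constructors listed above can be sketched as follows, with placeholder names, ages and cities as data. from_records builds one row per tuple. from_dict with orient="index" turns the outer keys of a nested dict into row labels; the default orient="columns" makes the keys column labels instead. from_items is not shown, since it was deprecated in pandas 0.23.

```python
import pandas as pd

# from_records: each tuple (placeholder data) becomes one row
records = [("alice", 30), ("bob", 25)]
by_rows = pd.DataFrame.from_records(records, columns=["name", "age"])
print(by_rows)

# from_dict with orient="index": outer keys become row labels,
# inner keys become column labels
nested = {
    "alice": {"age": 30, "city": "Oslo"},
    "bob": {"age": 25, "city": "Lima"},
}
by_keys = pd.DataFrame.from_dict(nested, orient="index")
print(by_keys)

# Default orient="columns": the dict keys become column labels
by_cols = pd.DataFrame.from_dict({"age": [30, 25]})
print(by_cols)
```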
Examples
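>>> import numpy as np
>>> import pandas as pd
>>> from pandas import DataFrame
>>> index = pd.date_range('2017-01-01', periods=3)  # sample index for illustration
>>> ts1, ts2 = pd.Series(np.random.randn(3), index=index), pd.Series(np.random.randn(3), index=index)
>>> d = {'col1': ts1, 'col2': ts2}
>>> df = DataFrame(data=d, index=index)
>>> df2 = DataFrame(np.random.randn(10, 5))
>>> df3 = DataFrame(np.random.randn(10, 5),
...                 columns=['a', 'b', 'c', 'd', 'e'])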
Attributes
T - Transpose index and columns.
at - Fast label-based scalar accessor.
axes - Return a list with the row axis labels and column axis labels as the only members.
blocks - Internal property, property synonym for as_blocks().
dtypes - Return the dtypes in this object.
empty - True if NDFrame is entirely empty [no items], meaning any of the axes are of length 0.
ftypes - Return the ftypes (indication of sparse/dense and dtype) in this object.
iat - Fast integer location scalar accessor.
iloc - Purely integer-location based indexing for selection by position.
is_copy
ix - A primarily label-location based indexer, with integer position fallback.
loc - Purely label-location based indexer for selection by label.
ndim - Number of axes / array dimensions.
shape - Return a tuple representing the dimensionality of the DataFrame.
size - Number of elements in the NDFrame.
style - Property returning a Styler object containing methods for building a styled HTML representation of the DataFrame.
values - Numpy representation of NDFrame.
Methods
abs() - Return an object with the absolute value of each element; only applicable to objects that are all numeric.
add(other[, axis, level, fill_value]) - Addition of dataframe and other, element-wise (binary operator add).
add_prefix(prefix) - Prefix the column labels with the string prefix.
add_suffix(suffix) - Suffix the column labels with the string suffix.
agg(func[, axis]) - Aggregate using callable, string, dict, or list of string/callables.
aggregate(func[, axis]) - Aggregate using callable, string, dict, or list of string/callables.
align(other[, join, axis, level, copy, …]) - Align two objects on their axes with the specified join method for each axis Index.
all([axis, bool_only, skipna, level]) - Return whether all elements are True over requested axis.
any([axis, bool_only, skipna, level]) - Return whether any element is True over requested axis.
append(other[, ignore_index, verify_integrity]) - Append rows of other to the end of this frame, returning a new object.
apply(func[, axis, broadcast, raw, reduce, args]) - Apply a function along an input axis of the DataFrame.
applymap(func) - Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame.
as_blocks([copy]) - Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.
as_matrix([columns]) - Convert the frame to its Numpy-array representation.
asfreq(freq[, method, how, normalize, …]) - Convert TimeSeries to specified frequency.
asof(where[, subset]) - Return the last row without any NaN as of where (or the last row without NaN considering only the subset of columns, in the case of a DataFrame).
assign(**kwargs) - Assign new columns to a DataFrame, returning a new object (a copy) with all the original columns in addition to the new ones.
astype(dtype[, copy, errors]) - Cast object to input numpy.dtype.
at_time(time[, asof]) - Select values at particular time of day (e.g. 9:30AM).
between_time(start_time, end_time[, …]) - Select values between particular times of the day (e.g., 9:00-9:30 AM).
bfill([axis, inplace, limit, downcast]) - Synonym for DataFrame.fillna(method='bfill').
bool() - Return the bool of a single element PandasObject.
boxplot([column, by, ax, fontsize, rot, …]) - Make a box plot from DataFrame columns, optionally grouped by some columns or other inputs.
clip([lower, upper, axis]) - Trim values at input threshold(s).
clip_lower(threshold[, axis]) - Return copy of the input with values below given value(s) truncated.
clip_upper(threshold[, axis]) - Return copy of the input with values above given value(s) truncated.
combine(other, func[, fill_value, overwrite]) - Combine with another DataFrame column by column, using func to merge elements; fill_value replaces missing values before func is applied.
combine_first(other) - Combine two DataFrame objects and default to non-null values in frame calling the method.
compound([axis, skipna, level]) - Return the compound percentage of the values for the requested axis.
consolidate([inplace]) - DEPRECATED: consolidate will be an internal implementation only.
convert_objects([convert_dates, …]) - Deprecated.
copy([deep]) - Make a copy of this object's data.
corr([method, min_periods]) - Compute pairwise correlation of columns, excluding NA/null values.
corrwith(other[, axis, drop]) - Compute pairwise correlation between rows or columns of two DataFrame objects.
count([axis, level, numeric_only]) - Return Series with number of non-NA/null observations over requested axis.
cov([min_periods]) - Compute pairwise covariance of columns, excluding NA/null values.
cummax([axis, skipna]) - Return cumulative max over requested axis.
cummin([axis, skipna]) - Return cumulative minimum over requested axis.
cumprod([axis, skipna]) - Return cumulative product over requested axis.
cumsum([axis, skipna]) - Return cumulative sum over requested axis.
describe([percentiles, include, exclude]) - Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.
diff([periods, axis]) - First discrete difference of object.
div(other[, axis, level, fill_value]) - Floating division of dataframe and other, element-wise (binary operator truediv).
divide(other[, axis, level, fill_value]) - Floating division of dataframe and other, element-wise (binary operator truediv).
dot(other) - Matrix multiplication with DataFrame or Series objects.
drop(labels[, axis, level, inplace, errors]) - Return new object with labels in requested axis removed.
drop_duplicates([subset, keep, inplace]) - Return DataFrame with duplicate rows removed, optionally only considering certain columns.
dropna([axis, how, thresh, subset, inplace]) - Return object with labels on the given axis omitted where any (or, with how='all', all) of the data are missing.
duplicated([subset, keep]) - Return boolean Series denoting duplicate rows, optionally only considering certain columns.
eq(other[, axis, level]) - Wrapper for flexible comparison methods eq.
equals(other) - Determine if two NDFrame objects contain the same elements.
eval(expr[, inplace]) - Evaluate an expression in the context of the calling DataFrame instance.
ewm([com, span, halflife, alpha, …]) - Provide exponential weighted functions.
expanding([min_periods, freq, center, axis]) - Provide expanding transformations.
ffill([axis, inplace, limit, downcast]) - Synonym for DataFrame.fillna(method='ffill').
fillna([value, method, axis, inplace, …]) - Fill NA/NaN values using the specified method.
filter([items, like, regex, axis]) - Subset rows or columns of dataframe according to labels in the specified index.
first(offset) - Convenience method for subsetting initial periods of time series data based on a date offset.
first_valid_index() - Return label for first non-NA/null value.
floordiv(other[, axis, level, fill_value]) - Integer division of dataframe and other, element-wise (binary operator floordiv).
from_csv(path[, header, sep, index_col, …]) - Read CSV file (DISCOURAGED, please use pandas.read_csv() instead).
from_dict(data[, orient, dtype]) - Construct DataFrame from dict of array-like or dicts.
from_items(items[, columns, orient]) - Convert (key, value) pairs to DataFrame.
from_records(data[, index, exclude, …]) - Convert structured or record ndarray to DataFrame.
ge(other[, axis, level]) - Wrapper for flexible comparison methods ge.
get(key[, default]) - Get item from object for given key (DataFrame column, Panel slice, etc.).
get_dtype_counts() - Return the counts of dtypes in this object.
get_ftype_counts() - Return the counts of ftypes in this object.
get_value(index, col[, takeable]) - Quickly retrieve single value at passed column and index.
get_values() - Same as values (but handles sparseness conversions).
groupby([by, axis, level, as_index, sort, …]) - Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns.
gt(other[, axis, level]) - Wrapper for flexible comparison methods gt.
head([n]) - Return the first n rows.
hist(data[, column, by, grid, xlabelsize, …]) - Draw histogram of the DataFrame's series using matplotlib / pylab.
idxmax([axis, skipna]) - Return index of first occurrence of maximum over requested axis.
idxmin([axis, skipna]) - Return index of first occurrence of minimum over requested axis.
info([verbose, buf, max_cols, memory_usage, …]) - Concise summary of a DataFrame.
insert(loc, column, value[, allow_duplicates]) - Insert column into DataFrame at specified location.
interpolate([method, axis, limit, inplace, …]) - Interpolate values according to different methods.
isin(values) - Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
isnull() - Return a boolean same-sized object indicating if the values are null.
items() - Iterator over (column name, Series) pairs.
iteritems() - Iterator over (column name, Series) pairs.
iterrows() - Iterate over DataFrame rows as (index, Series) pairs.
itertuples([index, name]) - Iterate over DataFrame rows as namedtuples, with index value as first element of the tuple.
join(other[, on, how, lsuffix, rsuffix, sort]) - Join columns with other DataFrame either on index or on a key column.
keys() - Get the 'info axis' (see Indexing for more).
kurt([axis, skipna, level, numeric_only]) - Return unbiased kurtosis over requested axis using Fisher's definition of kurtosis (kurtosis of normal == 0.0).
kurtosis([axis, skipna, level, numeric_only]) - Return unbiased kurtosis over requested axis using Fisher's definition of kurtosis (kurtosis of normal == 0.0).
last(offset) - Convenience method for subsetting final periods of time series data based on a date offset.
last_valid_index() - Return label for last non-NA/null value.
le(other[, axis, level]) - Wrapper for flexible comparison methods le.
lookup(row_labels, col_labels) - Label-based "fancy indexing" function for DataFrame.
lt(other[, axis, level]) - Wrapper for flexible comparison methods lt.
mad([axis, skipna, level]) - Return the mean absolute deviation of the values for the requested axis.
mask(cond[, other, inplace, axis, level, …]) - Return an object of same shape as self and whose corresponding entries are from self where cond is False and otherwise are from other.
max([axis, skipna, level, numeric_only]) - Return the maximum of the values in the object.
mean([axis, skipna, level, numeric_only]) - Return the mean of the values for the requested axis.
median([axis, skipna, level, numeric_only]) - Return the median of the values for the requested axis.
melt([id_vars, value_vars, var_name, …]) - "Unpivot" a DataFrame from wide format to long format, optionally leaving identifier variables set.
memory_usage([index, deep]) - Memory usage of DataFrame columns.
merge(right[, how, on, left_on, right_on, …]) - Merge DataFrame objects by performing a database-style join operation by columns or indexes.
min([axis, skipna, level, numeric_only]) - Return the minimum of the values in the object.
mod(other[, axis, level, fill_value]) - Modulo of dataframe and other, element-wise (binary operator mod).
mode([axis, numeric_only]) - Get the mode(s) of each element along the axis selected.
mul(other[, axis, level, fill_value]) - Multiplication of dataframe and other, element-wise (binary operator mul).
multiply(other[, axis, level, fill_value]) - Multiplication of dataframe and other, element-wise (binary operator mul).
ne(other[, axis, level]) - Wrapper for flexible comparison methods ne.
nlargest(n, columns[, keep]) - Get the rows of a DataFrame sorted by the n largest values of columns.
notnull() - Return a boolean same-sized object indicating if the values are not null.
nsmallest(n, columns[, keep]) - Get the rows of a DataFrame sorted by the n smallest values of columns.
nunique([axis, dropna]) - Return Series with number of distinct observations over requested axis.
pct_change([periods, fill_method, limit, freq]) - Percent change over given number of periods.
pipe(func, *args, **kwargs) - Apply func(self, *args, **kwargs).
pivot([index, columns, values]) - Reshape data (produce a "pivot" table) based on column values.
pivot_table(data[, values, index, columns, …]) - Create a spreadsheet-style pivot table as a DataFrame.
plot - Alias of FramePlotMethods.
pop(item) - Return item and drop from frame.
pow(other[, axis, level, fill_value]) - Exponential power of dataframe and other, element-wise (binary operator pow).
prod([axis, skipna, level, numeric_only]) - Return the product of the values for the requested axis.
product([axis, skipna, level, numeric_only]) - Return the product of the values for the requested axis.
quantile([q, axis, numeric_only, interpolation]) - Return values at the given quantile over requested axis, a la numpy.percentile.
query(expr[, inplace]) - Query the columns of a frame with a boolean expression.
radd(other[, axis, level, fill_value]) - Addition of dataframe and other, element-wise (binary operator radd).
rank([axis, method, numeric_only, …]) - Compute numerical data ranks (1 through n) along axis.
rdiv(other[, axis, level, fill_value]) - Floating division of dataframe and other, element-wise (binary operator rtruediv).
reindex([index, columns]) - Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
reindex_axis(labels[, axis, method, level, …]) - Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
reindex_like(other[, method, copy, limit, …]) - Return an object with matching indices as other.
rename([index, columns]) - Alter axes labels using input function or functions.
rename_axis(mapper[, axis, copy, inplace]) - Alter index and / or columns using input function or functions.
reorder_levels(order[, axis]) - Rearrange index levels using input order.
replace([to_replace, value, inplace, limit, …]) - Replace values given in ‘to_replace’ with ‘value’.
resample(rule[, how, axis, fill_method, …]) - Convenience method for frequency conversion and resampling of time series.
reset_index([level, drop, inplace, …]) - For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc.
rfloordiv(other[, axis, level, fill_value]) - Integer division of dataframe and other, element-wise (binary operator rfloordiv).
rmod(other[, axis, level, fill_value]) - Modulo of dataframe and other, element-wise (binary operator rmod).
rmul(other[, axis, level, fill_value]) - Multiplication of dataframe and other, element-wise (binary operator rmul).
rolling(window[, min_periods, freq, center, …]) - Provide rolling window calculations.
round([decimals]) - Round a DataFrame to a variable number of decimal places.
rpow(other[, axis, level, fill_value]) - Exponential power of dataframe and other, element-wise (binary operator rpow).
rsub(other[, axis, level, fill_value]) - Subtraction of dataframe and other, element-wise (binary operator rsub).
rtruediv(other[, axis, level, fill_value]) - Floating division of dataframe and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, weights, …]) - Return a random sample of items from an axis of object.
select(crit[, axis]) - Return data corresponding to axis labels matching criteria.
select_dtypes([include, exclude]) - Return a subset of a DataFrame including/excluding columns based on their dtype.
sem([axis, skipna, level, ddof, numeric_only]) - Return unbiased standard error of the mean over requested axis.
set_axis(axis, labels) - Public version of axis assignment.
set_index(keys[, drop, append, inplace, …]) - Set the DataFrame index (row labels) using one or more existing columns.
set_value(index, col, value[, takeable]) - Put single value at passed column and index.
shift([periods, freq, axis]) - Shift index by desired number of periods with an optional time freq.
skew([axis, skipna, level, numeric_only]) - Return unbiased skew over requested axis.
slice_shift([periods, axis]) - Equivalent to shift without copying data.
sort_index([axis, level, ascending, …]) - Sort object by labels (along an axis).
sort_values(by[, axis, ascending, inplace, …]) - Sort by the values along either axis.
sortlevel([level, axis, ascending, inplace, …]) - DEPRECATED: use DataFrame.sort_index().
squeeze([axis]) - Squeeze length 1 dimensions.
stack([level, dropna]) - Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
std([axis, skipna, level, ddof, numeric_only]) - Return sample standard deviation over requested axis.
sub(other[, axis, level, fill_value]) - Subtraction of dataframe and other, element-wise (binary operator sub).
subtract(other[, axis, level, fill_value]) - Subtraction of dataframe and other, element-wise (binary operator sub).
sum([axis, skipna, level, numeric_only]) - Return the sum of the values for the requested axis.
swapaxes(axis1, axis2[, copy]) - Interchange axes and swap values axes appropriately.
swaplevel([i, j, axis]) - Swap levels i and j in a MultiIndex on a particular axis.
tail([n]) - Return the last n rows.
take(indices[, axis, convert, is_copy]) - Analogous to ndarray.take.
to_clipboard([excel, sep]) - Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.
to_csv([path_or_buf, sep, na_rep, …]) - Write DataFrame to a comma-separated values (csv) file.
to_dense() - Return dense representation of NDFrame (as opposed to sparse).
to_dict([orient]) - Convert DataFrame to dictionary.
to_excel(excel_writer[, sheet_name, na_rep, …]) - Write DataFrame to an Excel sheet.
to_feather(fname) - Write out the binary feather-format for DataFrames.
to_gbq(destination_table, project_id[, …]) - Write a DataFrame to a Google BigQuery table.
to_hdf(path_or_buf, key, **kwargs) - Write the contained data to an HDF5 file using HDFStore.
to_html([buf, columns, col_space, header, …]) - Render a DataFrame as an HTML table.
to_json([path_or_buf, orient, date_format, …]) - Convert the object to a JSON string.
to_latex([buf, columns, col_space, header, …]) - Render an object to a LaTeX tabular environment table.
to_msgpack([path_or_buf, encoding]) - Serialize object to input file path using msgpack.
to_panel() - Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
to_period([freq, axis, copy]) - Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed).
to_pickle(path[, compression]) - Pickle (serialize) object to input file path.
to_records([index, convert_datetime64]) - Convert DataFrame to record array.
to_sparse([fill_value, kind]) - Convert to SparseDataFrame.
to_sql(name, con[, flavor, schema, …]) - Write records stored in a DataFrame to a SQL database.
to_stata(fname[, convert_dates, …]) - Write the DataFrame to a Stata binary dta file.
to_string([buf, columns, col_space, header, …]) - Render a DataFrame to a console-friendly tabular output.
to_timestamp([freq, how, axis, copy]) - Cast to DatetimeIndex of timestamps, at beginning of period.
to_xarray() - Return an xarray object from the pandas object.
transform(func, *args, **kwargs) - Call function producing a like-indexed NDFrame.
transpose(*args, **kwargs) - Transpose index and columns.
truediv(other[, axis, level, fill_value]) - Floating division of dataframe and other, element-wise (binary operator truediv).
truncate([before, after, axis, copy]) - Truncate a sorted NDFrame before and/or after some particular index value.
tshift([periods, freq, axis]) - Shift the time index, using the index's frequency if available.
tz_convert(tz[, axis, level, copy]) - Convert tz-aware axis to target time zone.
tz_localize(tz[, axis, level, copy, ambiguous]) - Localize tz-naive TimeSeries to target time zone.
unstack([level, fill_value]) - Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
update(other[, join, overwrite, …]) - Modify DataFrame in place using non-NA values from passed DataFrame.
var([axis, skipna, level, ddof, numeric_only]) - Return unbiased variance over requested axis.
where(cond[, other, inplace, axis, level, …]) - Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
xs(key[, axis, level, drop_level]) - Return a cross-section (row(s) or column(s)) from the Series/DataFrame.
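The sketch below shows a few of the listed methods working together on hypothetical store and sales data; the column names and values are invented for the example. merge performs a left join on a shared column. groupby with agg summarizes each group. sort_values ranks the result. loc and iloc then select by label and by position, respectively.

```python
import pandas as pd

# Hypothetical sales records and a store-to-region lookup table
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "units": [3, 5, 2, 7, 4],
})
stores = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["north", "south", "north"],
})

# merge: database-style left join on the shared "store" column
joined = sales.merge(stores, on="store", how="left")

# groupby + agg: total and mean units per region
summary = joined.groupby("region")["units"].agg(["sum", "mean"])

# sort_values: order regions by total units, largest first
ranked = summary.sort_values("sum", ascending=False)
print(ranked)

# loc selects by label, iloc by integer position
top_region = ranked.index[0]
print(ranked.loc[top_region, "sum"], ranked.iloc[0, 0])
```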