pandas.DataFrame
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Parameters:
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
    Dict can contain Series, arrays, constants, or list-like objects.
index : Index or array-like
    Index to use for the resulting frame. Defaults to np.arange(n) if no indexing information is part of the input data and no index is provided.
columns : Index or array-like
    Column labels to use for the resulting frame. Defaults to np.arange(n) if no column labels are provided.
dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, the dtype is inferred.
copy : boolean, default False
    Copy data from inputs. Only affects DataFrame / 2d ndarray input.
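The following is a minimal sketch of how the index and columns parameters interact with the input data. The array contents and the label values ('r1', 'a', 'u', and so on) are illustrative assumptions, not part of the reference text.

```python
import numpy as np
import pandas as pd

# A 2 x 3 homogeneous ndarray that carries no labels of its own.
arr = np.array([[1, 2, 3], [4, 5, 6]])

# No index or columns given: both axes fall back to 0..n-1 integer labels.
default_df = pd.DataFrame(arr)
print(list(default_df.index))    # [0, 1]
print(list(default_df.columns))  # [0, 1, 2]

# Explicit labels for both axes (label values are illustrative).
labeled_df = pd.DataFrame(arr, index=['r1', 'r2'], columns=['a', 'b', 'c'])
print(labeled_df.loc['r2', 'b'])  # 5

# For dict input, the keys become column labels. A Series value contributes
# its own index as row labels, and a constant is repeated for every row.
from_dict = pd.DataFrame({'x': pd.Series([10, 20], index=['u', 'v']), 'y': 0})
print(list(from_dict.index))  # ['u', 'v']
print(from_dict['y'].tolist())  # [0, 0]
```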
See also
DataFrame.from_records : constructor from tuples, also record arrays
DataFrame.from_dict : from dicts of Series, arrays, or dicts
DataFrame.from_items : from sequence of (key, value) pairs
Examples
Constructing DataFrame from a dictionary.
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
Notice that the inferred dtype is int64.
>>> df.dtypes
col1    int64
col2    int64
dtype: object
To enforce a single dtype:
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object
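Because the constructor accepts only a single dtype, per-column types have to be set after construction. One possible approach, sketched here under the assumption that astype is given a column-to-dtype mapping, is:

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}

# The constructor's dtype argument applies one dtype to every column.
df = pd.DataFrame(data=d, dtype=np.int8)

# astype with a {column: dtype} mapping casts each column separately.
# The choice of int8 and float64 here is purely illustrative.
mixed = pd.DataFrame(data=d).astype({'col1': np.int8, 'col2': np.float64})
print(mixed.dtypes)
```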
Constructing DataFrame from numpy ndarray:
>>> df2 = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
...                    columns=['a', 'b', 'c', 'd', 'e'])
>>> df2
   a  b  c  d  e
0  2  8  8  3  4
1  4  2  9  0  9
2  1  0  7  8  0
3  5  1  7  1  3
4  6  0  2  4  2
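The class description states that arithmetic aligns on both row and column labels and that a DataFrame behaves like a dict of Series. The sketch below illustrates both points; the frames left and right and their labels are made up for the illustration.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])
right = pd.DataFrame({'b': [10, 20], 'c': [30, 40]}, index=['y', 'z'])

# Addition matches cells by (row label, column label), not by position.
# Labels present in only one operand produce NaN in the result.
total = left + right
print(total)

# Only ('y', 'b') exists in both frames: 4 + 10.
print(total.loc['y', 'b'])  # 14.0

# add() with fill_value treats a value missing from one side as 0.
# Cells missing from both sides stay NaN.
filled = left.add(right, fill_value=0)
print(filled.loc['x', 'a'])  # 1.0

# Dict-like access: indexing by column label returns a Series,
# and membership tests check the column labels.
col = left['a']
print(type(col).__name__)  # Series
print('a' in left)         # True
```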
Attributes
T : Transpose index and columns
at : Fast label-based scalar accessor
axes : Return a list with the row axis labels and column axis labels as the only members.
blocks : Internal property, property synonym for as_blocks()
dtypes : Return the dtypes in this object.
empty : True if NDFrame is entirely empty [no items], meaning any of the axes are of length 0.
ftypes : Return the ftypes (indication of sparse/dense and dtype) in this object.
iat : Fast integer location scalar accessor.
iloc : Purely integer-location based indexing for selection by position.
is_copy
ix : A primarily label-location based indexer, with integer position fallback.
loc : Purely label-location based indexer for selection by label.
ndim : Number of axes / array dimensions
shape : Return a tuple representing the dimensionality of the DataFrame.
size : Number of elements in the NDFrame
style : Property returning a Styler object containing methods for building a styled HTML representation of the DataFrame.
values : Numpy representation of NDFrame
Methods
abs() : Return an object with absolute value taken; only applicable to objects that are all numeric.
add(other[, axis, level, fill_value]) : Addition of dataframe and other, element-wise (binary operator add).
add_prefix(prefix) : Concatenate prefix string with panel items names.
add_suffix(suffix) : Concatenate suffix string with panel items names.
agg(func[, axis]) : Aggregate using callable, string, dict, or list of string/callables
aggregate(func[, axis]) : Aggregate using callable, string, dict, or list of string/callables
align(other[, join, axis, level, copy, ...]) : Align two objects on their axes with the
all([axis, bool_only, skipna, level]) : Return whether all elements are True over requested axis
any([axis, bool_only, skipna, level]) : Return whether any element is True over requested axis
append(other[, ignore_index, verify_integrity]) : Append rows of other to the end of this frame, returning a new object.
apply(func[, axis, broadcast, raw, reduce, args]) : Applies function along input axis of DataFrame.
applymap(func) : Apply a function to a DataFrame that is intended to operate elementwise, i.e.
as_blocks([copy]) : Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.
as_matrix([columns]) : Convert the frame to its Numpy-array representation.
asfreq(freq[, method, how, normalize, ...]) : Convert TimeSeries to specified frequency.
asof(where[, subset]) : The last row without any NaN is taken (or the last row without
assign(**kwargs) : Assign new columns to a DataFrame, returning a new object (a copy) with all the original columns in addition to the new ones.
astype(dtype[, copy, errors]) : Cast a pandas object to a specified dtype dtype.
at_time(time[, asof]) : Select values at particular time of day (e.g.
between_time(start_time, end_time[, ...]) : Select values between particular times of the day (e.g., 9:00-9:30 AM).
bfill([axis, inplace, limit, downcast]) : Synonym for DataFrame.fillna(method='bfill')
bool() : Return the bool of a single element PandasObject.
boxplot([column, by, ax, fontsize, rot, ...]) : Make a box plot from DataFrame column optionally grouped by some columns or
clip([lower, upper, axis, inplace]) : Trim values at input threshold(s).
clip_lower(threshold[, axis, inplace]) : Return copy of the input with values below given value(s) truncated.
clip_upper(threshold[, axis, inplace]) : Return copy of input with values above given value(s) truncated.
combine(other, func[, fill_value, overwrite]) : Add two DataFrame objects and do not propagate NaN values, so if for a
combine_first(other) : Combine two DataFrame objects and default to non-null values in frame calling the method.
compound([axis, skipna, level]) : Return the compound percentage of the values for the requested axis
consolidate([inplace]) : DEPRECATED: consolidate will be an internal implementation only.
convert_objects([convert_dates, ...]) : Deprecated.
copy([deep]) : Make a copy of this objects data.
corr([method, min_periods]) : Compute pairwise correlation of columns, excluding NA/null values
corrwith(other[, axis, drop]) : Compute pairwise correlation between rows or columns of two DataFrame objects.
count([axis, level, numeric_only]) : Return Series with number of non-NA/null observations over requested axis.
cov([min_periods]) : Compute pairwise covariance of columns, excluding NA/null values
cummax([axis, skipna]) : Return cumulative max over requested axis.
cummin([axis, skipna]) : Return cumulative minimum over requested axis.
cumprod([axis, skipna]) : Return cumulative product over requested axis.
cumsum([axis, skipna]) : Return cumulative sum over requested axis.
describe([percentiles, include, exclude]) : Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
diff([periods, axis]) : 1st discrete difference of object
div(other[, axis, level, fill_value]) : Floating division of dataframe and other, element-wise (binary operator truediv).
divide(other[, axis, level, fill_value]) : Floating division of dataframe and other, element-wise (binary operator truediv).
dot(other) : Matrix multiplication with DataFrame or Series objects
drop([labels, axis, index, columns, level, ...]) : Return new object with labels in requested axis removed.
drop_duplicates([subset, keep, inplace]) : Return DataFrame with duplicate rows removed, optionally only
dropna([axis, how, thresh, subset, inplace]) : Return object with labels on given axis omitted where alternately any
duplicated([subset, keep]) : Return boolean Series denoting duplicate rows, optionally only
eq(other[, axis, level]) : Wrapper for flexible comparison methods eq
equals(other) : Determines if two NDFrame objects contain the same elements.
eval(expr[, inplace]) : Evaluate an expression in the context of the calling DataFrame instance.
ewm([com, span, halflife, alpha, ...]) : Provides exponential weighted functions
expanding([min_periods, freq, center, axis]) : Provides expanding transformations.
ffill([axis, inplace, limit, downcast]) : Synonym for DataFrame.fillna(method='ffill')
fillna([value, method, axis, inplace, ...]) : Fill NA/NaN values using the specified method
filter([items, like, regex, axis]) : Subset rows or columns of dataframe according to labels in the specified index.
first(offset) : Convenience method for subsetting initial periods of time series data based on a date offset.
first_valid_index() : Return index for first non-NA/null value.
floordiv(other[, axis, level, fill_value]) : Integer division of dataframe and other, element-wise (binary operator floordiv).
from_csv(path[, header, sep, index_col, ...]) : Read CSV file (DEPRECATED, please use pandas.read_csv() instead).
from_dict(data[, orient, dtype]) : Construct DataFrame from dict of array-like or dicts
from_items(items[, columns, orient]) : Convert (key, value) pairs to DataFrame.
from_records(data[, index, exclude, ...]) : Convert structured or record ndarray to DataFrame
ge(other[, axis, level]) : Wrapper for flexible comparison methods ge
get(key[, default]) : Get item from object for given key (DataFrame column, Panel slice, etc.).
get_dtype_counts() : Return the counts of dtypes in this object.
get_ftype_counts() : Return the counts of ftypes in this object.
get_value(index, col[, takeable]) : Quickly retrieve single value at passed column and index
get_values() : same as values (but handles sparseness conversions)
groupby([by, axis, level, as_index, sort, ...]) : Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns.
gt(other[, axis, level]) : Wrapper for flexible comparison methods gt
head([n]) : Return the first n rows.
hist(data[, column, by, grid, xlabelsize, ...]) : Draw histogram of the DataFrame’s series using matplotlib / pylab.
idxmax([axis, skipna]) : Return index of first occurrence of maximum over requested axis.
idxmin([axis, skipna]) : Return index of first occurrence of minimum over requested axis.
infer_objects() : Attempt to infer better dtypes for object columns.
info([verbose, buf, max_cols, memory_usage, ...]) : Concise summary of a DataFrame.
insert(loc, column, value[, allow_duplicates]) : Insert column into DataFrame at specified location.
interpolate([method, axis, limit, inplace, ...]) : Interpolate values according to different methods.
isin(values) : Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
isna() : Return a boolean same-sized object indicating if the values are NA.
isnull() : Return a boolean same-sized object indicating if the values are NA.
items() : Iterator over (column name, Series) pairs.
iteritems() : Iterator over (column name, Series) pairs.
iterrows() : Iterate over DataFrame rows as (index, Series) pairs.
itertuples([index, name]) : Iterate over DataFrame rows as namedtuples, with index value as first element of the tuple.
join(other[, on, how, lsuffix, rsuffix, sort]) : Join columns with other DataFrame either on index or on a key column.
keys() : Get the ‘info axis’ (see Indexing for more)
kurt([axis, skipna, level, numeric_only]) : Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
kurtosis([axis, skipna, level, numeric_only]) : Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).
last(offset) : Convenience method for subsetting final periods of time series data based on a date offset.
last_valid_index() : Return index for last non-NA/null value.
le(other[, axis, level]) : Wrapper for flexible comparison methods le
lookup(row_labels, col_labels) : Label-based “fancy indexing” function for DataFrame.
lt(other[, axis, level]) : Wrapper for flexible comparison methods lt
mad([axis, skipna, level]) : Return the mean absolute deviation of the values for the requested axis
mask(cond[, other, inplace, axis, level, ...]) : Return an object of same shape as self and whose corresponding entries are from self where cond is False and otherwise are from other.
max([axis, skipna, level, numeric_only]) : This method returns the maximum of the values in the object.
mean([axis, skipna, level, numeric_only]) : Return the mean of the values for the requested axis
median([axis, skipna, level, numeric_only]) : Return the median of the values for the requested axis
melt([id_vars, value_vars, var_name, ...]) : “Unpivots” a DataFrame from wide format to long format, optionally
memory_usage([index, deep]) : Memory usage of DataFrame columns.
merge(right[, how, on, left_on, right_on, ...]) : Merge DataFrame objects by performing a database-style join operation by columns or indexes.
min([axis, skipna, level, numeric_only]) : This method returns the minimum of the values in the object.
mod(other[, axis, level, fill_value]) : Modulo of dataframe and other, element-wise (binary operator mod).
mode([axis, numeric_only]) : Gets the mode(s) of each element along the axis selected.
mul(other[, axis, level, fill_value]) : Multiplication of dataframe and other, element-wise (binary operator mul).
multiply(other[, axis, level, fill_value]) : Multiplication of dataframe and other, element-wise (binary operator mul).
ne(other[, axis, level]) : Wrapper for flexible comparison methods ne
nlargest(n, columns[, keep]) : Get the rows of a DataFrame sorted by the n largest values of columns.
notna() : Return a boolean same-sized object indicating if the values are not NA.
notnull() : Return a boolean same-sized object indicating if the values are not NA.
nsmallest(n, columns[, keep]) : Get the rows of a DataFrame sorted by the n smallest values of columns.
nunique([axis, dropna]) : Return Series with number of distinct observations over requested axis.
pct_change([periods, fill_method, limit, freq]) : Percent change over given number of periods.
pipe(func, *args, **kwargs) : Apply func(self, *args, **kwargs)
pivot([index, columns, values]) : Reshape data (produce a “pivot” table) based on column values.
pivot_table([values, index, columns, ...]) : Create a spreadsheet-style pivot table as a DataFrame.
plot : alias of FramePlotMethods
pop(item) : Return item and drop from frame.
pow(other[, axis, level, fill_value]) : Exponential power of dataframe and other, element-wise (binary operator pow).
prod([axis, skipna, level, numeric_only]) : Return the product of the values for the requested axis
product([axis, skipna, level, numeric_only]) : Return the product of the values for the requested axis
quantile([q, axis, numeric_only, interpolation]) : Return values at the given quantile over requested axis, a la numpy.percentile.
query(expr[, inplace]) : Query the columns of a frame with a boolean expression.
radd(other[, axis, level, fill_value]) : Addition of dataframe and other, element-wise (binary operator radd).
rank([axis, method, numeric_only, ...]) : Compute numerical data ranks (1 through n) along axis.
rdiv(other[, axis, level, fill_value]) : Floating division of dataframe and other, element-wise (binary operator rtruediv).
reindex([labels, index, columns, axis, ...]) : Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
reindex_axis(labels[, axis, method, level, ...]) : Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
reindex_like(other[, method, copy, limit, ...]) : Return an object with matching indices to myself.
rename([mapper, index, columns, axis, copy, ...]) : Alter axes labels.
rename_axis(mapper[, axis, copy, inplace]) : Alter the name of the index or columns.
reorder_levels(order[, axis]) : Rearrange index levels using input order.
replace([to_replace, value, inplace, limit, ...]) : Replace values given in ‘to_replace’ with ‘value’.
resample(rule[, how, axis, fill_method, ...]) : Convenience method for frequency conversion and resampling of time series.
reset_index([level, drop, inplace, ...]) : For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc.
rfloordiv(other[, axis, level, fill_value]) : Integer division of dataframe and other, element-wise (binary operator rfloordiv).
rmod(other[, axis, level, fill_value]) : Modulo of dataframe and other, element-wise (binary operator rmod).
rmul(other[, axis, level, fill_value]) : Multiplication of dataframe and other, element-wise (binary operator rmul).
rolling(window[, min_periods, freq, center, ...]) : Provides rolling window calculations.
round([decimals]) : Round a DataFrame to a variable number of decimal places.
rpow(other[, axis, level, fill_value]) : Exponential power of dataframe and other, element-wise (binary operator rpow).
rsub(other[, axis, level, fill_value]) : Subtraction of dataframe and other, element-wise (binary operator rsub).
rtruediv(other[, axis, level, fill_value]) : Floating division of dataframe and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, weights, ...]) : Returns a random sample of items from an axis of object.
select(crit[, axis]) : Return data corresponding to axis labels matching criteria
select_dtypes([include, exclude]) : Return a subset of a DataFrame including/excluding columns based on their dtype.
sem([axis, skipna, level, ddof, numeric_only]) : Return unbiased standard error of the mean over requested axis.
set_axis(labels[, axis, inplace]) : Assign desired index to given axis
set_index(keys[, drop, append, inplace, ...]) : Set the DataFrame index (row labels) using one or more existing columns.
set_value(index, col, value[, takeable]) : Put single value at passed column and index
shift([periods, freq, axis]) : Shift index by desired number of periods with an optional time freq
skew([axis, skipna, level, numeric_only]) : Return unbiased skew over requested axis
slice_shift([periods, axis]) : Equivalent to shift without copying data.
sort_index([axis, level, ascending, ...]) : Sort object by labels (along an axis)
sort_values(by[, axis, ascending, inplace, ...]) : Sort by the values along either axis
sortlevel([level, axis, ascending, inplace, ...]) : DEPRECATED: use DataFrame.sort_index()
squeeze([axis]) : Squeeze length 1 dimensions.
stack([level, dropna]) : Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
std([axis, skipna, level, ddof, numeric_only]) : Return sample standard deviation over requested axis.
sub(other[, axis, level, fill_value]) : Subtraction of dataframe and other, element-wise (binary operator sub).
subtract(other[, axis, level, fill_value]) : Subtraction of dataframe and other, element-wise (binary operator sub).
sum([axis, skipna, level, numeric_only]) : Return the sum of the values for the requested axis
swapaxes(axis1, axis2[, copy]) : Interchange axes and swap values axes appropriately
swaplevel([i, j, axis]) : Swap levels i and j in a MultiIndex on a particular axis
tail([n]) : Return the last n rows.
take(indices[, axis, convert, is_copy]) : Return the elements in the given positional indices along an axis.
to_clipboard([excel, sep]) : Attempt to write text representation of object to the system clipboard This can be pasted into Excel, for example.
to_csv([path_or_buf, sep, na_rep, ...]) : Write DataFrame to a comma-separated values (csv) file
to_dense() : Return dense representation of NDFrame (as opposed to sparse)
to_dict([orient, into]) : Convert DataFrame to dictionary.
to_excel(excel_writer[, sheet_name, na_rep, ...]) : Write DataFrame to an excel sheet
to_feather(fname) : write out the binary feather-format for DataFrames
to_gbq(destination_table, project_id[, ...]) : Write a DataFrame to a Google BigQuery table.
to_hdf(path_or_buf, key, **kwargs) : Write the contained data to an HDF5 file using HDFStore.
to_html([buf, columns, col_space, header, ...]) : Render a DataFrame as an HTML table.
to_json([path_or_buf, orient, date_format, ...]) : Convert the object to a JSON string.
to_latex([buf, columns, col_space, header, ...]) : Render an object to a tabular environment table.
to_msgpack([path_or_buf, encoding]) : msgpack (serialize) object to input file path
to_panel() : Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
to_parquet(fname[, engine, compression]) : Write a DataFrame to the binary parquet format.
to_period([freq, axis, copy]) : Convert DataFrame from DatetimeIndex to PeriodIndex with desired
to_pickle(path[, compression, protocol]) : Pickle (serialize) object to input file path.
to_records([index, convert_datetime64]) : Convert DataFrame to record array.
to_sparse([fill_value, kind]) : Convert to SparseDataFrame
to_sql(name, con[, flavor, schema, ...]) : Write records stored in a DataFrame to a SQL database.
to_stata(fname[, convert_dates, ...]) : A class for writing Stata binary dta files from array-like objects
to_string([buf, columns, col_space, header, ...]) : Render a DataFrame to a console-friendly tabular output.
to_timestamp([freq, how, axis, copy]) : Cast to DatetimeIndex of timestamps, at beginning of period
to_xarray() : Return an xarray object from the pandas object.
transform(func, *args, **kwargs) : Call function producing a like-indexed NDFrame
transpose(*args, **kwargs) : Transpose index and columns
truediv(other[, axis, level, fill_value]) : Floating division of dataframe and other, element-wise (binary operator truediv).
truncate([before, after, axis, copy]) : Truncates a sorted DataFrame/Series before and/or after some particular index value.
tshift([periods, freq, axis]) : Shift the time index, using the index’s frequency if available.
tz_convert(tz[, axis, level, copy]) : Convert tz-aware axis to target time zone.
tz_localize(tz[, axis, level, copy, ambiguous]) : Localize tz-naive TimeSeries to target time zone.
unstack([level, fill_value]) : Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
update(other[, join, overwrite, ...]) : Modify DataFrame in place using non-NA values from passed DataFrame.
var([axis, skipna, level, ddof, numeric_only]) : Return unbiased variance over requested axis.
where(cond[, other, inplace, axis, level, ...]) : Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
xs(key[, axis, level, drop_level]) : Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
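As a rough illustration of a few entries from the attribute and method listings (shape, ndim, size, T, loc, iloc, at, iat, where, and mask), the sketch below uses a small made-up frame. It covers only attributes and methods that remain available in current pandas releases; the data and labels are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, -2, 3], 'b': [-4, 5, -6]}, index=['x', 'y', 'z'])

# Dimensionality attributes.
print(df.shape, df.ndim, df.size)  # (3, 2) 2 6

# T swaps the row and column labels.
print(df.T.shape)  # (2, 3)

# loc selects by label, iloc by integer position;
# at and iat are their scalar-only counterparts.
print(df.loc['y', 'b'])  # 5
print(df.iloc[1, 1])     # 5
print(df.at['z', 'a'])   # 3
print(df.iat[2, 0])      # 3

# where keeps entries where cond is True and takes `other` elsewhere;
# mask does the opposite, so the two results are complementary.
cond = df > 0
kept = df.where(cond, 0)
masked = df.mask(cond, 0)
print(kept['a'].tolist())    # [1, 0, 3]
print(masked['a'].tolist())  # [0, -2, 0]

# With other=0, adding the two complementary results restores the original.
restored = kept + masked
print(bool((restored == df).all().all()))  # True
```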