pandas.DataFrame
- class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
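The label alignment described above can be illustrated with a minimal sketch. The frame names (`left`, `right`) and their contents are made up for illustration; the point is only that `+` matches cells by row and column label rather than by position.

```python
import pandas as pd

# Two frames whose row and column labels only partly overlap.
left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["x", "y"])
right = pd.DataFrame({"b": [10.0, 20.0], "c": [30.0, 40.0]}, index=["y", "z"])

# Addition aligns on both axes; a cell missing from either operand becomes NaN.
total = left + right

# Dict-like access: indexing by a column label returns that column as a Series.
col_b = left["b"]
```

Here only the cell at row "y", column "b" exists in both operands, so it is the only non-missing value in `total`.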
Parameters
data : numpy ndarray (structured or homogeneous), dict, or DataFrame
Dict can contain Series, arrays, constants, or list-like objects.
index : Index or array-like
Index to use for the resulting frame. Defaults to np.arange(n) if the input data carries no indexing information and no index is provided.
columns : Index or array-like
Column labels to use for the resulting frame. Defaults to np.arange(n) if no column labels are provided.
dtype : dtype, default None
Data type to force; otherwise inferred.
copy : boolean, default False
Copy data from inputs. Only affects DataFrame / 2d ndarray input.
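A small sketch of how these parameters interact, using an illustrative 3x2 array (the variable names are assumptions, not part of the API). How a frame built without `copy=True` shares memory with its input can differ between pandas versions, so the sketch only demonstrates the explicit `copy=True` case.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4], [5, 6]])

# No index or columns supplied: both default to integer labels 0..n-1.
default_labels = pd.DataFrame(values)

# Explicit row and column labels, plus a forced dtype.
labeled = pd.DataFrame(
    values, index=["r0", "r1", "r2"], columns=["a", "b"], dtype="float64"
)

# copy=True makes the frame independent of the input array,
# so a later change to the array does not show up in the frame.
independent = pd.DataFrame(values, copy=True)
values[0, 0] = 99
```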
See also
- DataFrame.from_records: constructor from tuples, also record arrays
- DataFrame.from_dict: from dicts of Series, arrays, or dicts
- DataFrame.from_csv: from CSV files
- DataFrame.from_items: from sequence of (key, value) pairs
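As a hedged illustration of two of the alternate constructors listed above, the sketch below builds small frames with `from_records` and `from_dict`. The sample data is invented, and the other two constructors are left out because their availability depends on the installed pandas version.

```python
import pandas as pd

# from_records: one tuple per row; column names are passed separately.
records = [(1, "a"), (2, "b")]
by_rows = pd.DataFrame.from_records(records, columns=["num", "letter"])

# from_dict with the default orient="columns": each key becomes a column.
by_cols = pd.DataFrame.from_dict({"num": [1, 2], "letter": ["a", "b"]})

# orient="index": each key becomes a row label instead.
by_index = pd.DataFrame.from_dict({"r0": [1, 2], "r1": [3, 4]}, orient="index")
```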
Examples
>>> d = {'col1': ts1, 'col2': ts2}
>>> df = DataFrame(data=d, index=index)
>>> df2 = DataFrame(np.random.randn(10, 5))
>>> df3 = DataFrame(np.random.randn(10, 5),
...                 columns=['a', 'b', 'c', 'd', 'e'])
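The example above leaves `ts1`, `ts2`, and `index` undefined. A self-contained variant might look like the following, where those three names are filled with hypothetical stand-ins (two short Series over a three-day date range); the original does not say what they contain.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the names the example leaves undefined.
index = pd.date_range("2000-01-01", periods=3)
ts1 = pd.Series([1.0, 2.0, 3.0], index=index)
ts2 = pd.Series([4.0, 5.0, 6.0], index=index)

# Dict of Series: each key becomes a column, aligned on the given index.
d = {"col1": ts1, "col2": ts2}
df = pd.DataFrame(data=d, index=index)

# 2-D ndarray input, with default and with explicit column labels.
df2 = pd.DataFrame(np.random.randn(10, 5))
df3 = pd.DataFrame(np.random.randn(10, 5), columns=["a", "b", "c", "d", "e"])
```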
Attributes
- T: Transpose index and columns
- at: Fast label-based scalar accessor
- axes
- blocks: Internal property, property synonym for as_blocks()
- dtypes: Return the dtypes in this object
- empty: True if NDFrame is entirely empty [no items]
- ftypes: Return the ftypes (indication of sparse/dense and dtype) in this object.
- iat: Fast integer location scalar accessor.
- iloc: Purely integer-location based indexing for selection by position.
- ix: A primarily label-location based indexer, with integer position fallback.
- loc: Purely label-location based indexer for selection by label.
- ndim: Number of axes / array dimensions
- shape
- size: number of elements in the NDFrame
- values: Numpy representation of NDFrame
- is_copy
Methods
- abs(): Return an object with absolute value taken.
- add(other[, axis, level, fill_value]): Binary operator add with support to substitute a fill_value for missing data in
- add_prefix(prefix): Concatenate prefix string with panel items names.
- add_suffix(suffix): Concatenate suffix string with panel items names
- align(other[, join, axis, level, copy, ...]): Align two objects on their axes with the
- all([axis, bool_only, skipna, level]): Return whether all elements are True over requested axis
- any([axis, bool_only, skipna, level]): Return whether any element is True over requested axis
- append(other[, ignore_index, verify_integrity]): Append rows of other to the end of this frame, returning a new object.
- apply(func[, axis, broadcast, raw, reduce, args]): Applies function along input axis of DataFrame.
- applymap(func): Apply a function to a DataFrame that is intended to operate elementwise, i.e.
- as_blocks(): Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.
- as_matrix([columns]): Convert the frame to its Numpy-array representation.
- asfreq(freq[, method, how, normalize]): Convert all TimeSeries inside to specified frequency using DateOffset objects.
- assign(**kwargs): Assign new columns to a DataFrame, returning a new object (a copy) with all the original columns in addition to the new ones.
- astype(dtype[, copy, raise_on_error]): Cast object to input numpy.dtype
- at_time(time[, asof]): Select values at particular time of day (e.g.
- between_time(start_time, end_time[, ...]): Select values between particular times of the day (e.g., 9:00-9:30 AM)
- bfill([axis, inplace, limit, downcast]): Synonym for NDFrame.fillna(method='bfill')
- bool(): Return the bool of a single element PandasObject
- boxplot([column, by, ax, fontsize, rot, ...]): Make a box plot from DataFrame column optionally grouped by some columns or
- clip([lower, upper, out]): Trim values at input threshold(s)
- clip_lower(threshold): Return copy of the input with values below given value truncated
- clip_upper(threshold): Return copy of input with values above given value truncated
- combine(other, func[, fill_value, overwrite]): Add two DataFrame objects and do not propagate NaN values, so if for a
- combineAdd(other): Add two DataFrame objects and do not propagate
- combineMult(other): Multiply two DataFrame objects and do not propagate NaN values, so if
- combine_first(other): Combine two DataFrame objects and default to non-null values in frame calling the method.
- compound([axis, skipna, level]): Return the compound percentage of the values for the requested axis
- consolidate([inplace]): Compute NDFrame with “consolidated” internals (data of each dtype grouped together in a single ndarray).
- convert_objects([convert_dates, ...]): Attempt to infer better dtype for object columns
- copy([deep]): Make a copy of this object
- corr([method, min_periods]): Compute pairwise correlation of columns, excluding NA/null values
- corrwith(other[, axis, drop]): Compute pairwise correlation between rows or columns of two DataFrame objects.
- count([axis, level, numeric_only]): Return Series with number of non-NA/null observations over requested axis.
- cov([min_periods]): Compute pairwise covariance of columns, excluding NA/null values
- cummax([axis, dtype, out, skipna]): Return cumulative max over requested axis.
- cummin([axis, dtype, out, skipna]): Return cumulative min over requested axis.
- cumprod([axis, dtype, out, skipna]): Return cumulative prod over requested axis.
- cumsum([axis, dtype, out, skipna]): Return cumulative sum over requested axis.
- describe([percentile_width, percentiles, ...]): Generate various summary statistics, excluding NaN values.
- diff([periods]): 1st discrete difference of object
- div(other[, axis, level, fill_value]): Binary operator truediv with support to substitute a fill_value for missing data in
- divide(other[, axis, level, fill_value]): Binary operator truediv with support to substitute a fill_value for missing data in
- dot(other): Matrix multiplication with DataFrame or Series objects
- drop(labels[, axis, level, inplace]): Return new object with labels in requested axis removed
- drop_duplicates(*args, **kwargs): Return DataFrame with duplicate rows removed, optionally only
- dropna([axis, how, thresh, subset, inplace]): Return object with labels on given axis omitted where alternately any
- duplicated(*args, **kwargs): Return boolean Series denoting duplicate rows, optionally only
- eq(other[, axis, level]): Wrapper for flexible comparison methods eq
- equals(other): Determines if two NDFrame objects contain the same elements.
- eval(expr, **kwargs): Evaluate an expression in the context of the calling DataFrame instance.
- ffill([axis, inplace, limit, downcast]): Synonym for NDFrame.fillna(method='ffill')
- fillna([value, method, axis, inplace, ...]): Fill NA/NaN values using the specified method
- filter([items, like, regex, axis]): Restrict the info axis to set of items or wildcard
- first(offset): Convenience method for subsetting initial periods of time series data
- first_valid_index(): Return label for first non-NA/null value
- floordiv(other[, axis, level, fill_value]): Binary operator floordiv with support to substitute a fill_value for missing data in
- from_csv(path[, header, sep, index_col, ...]): Read delimited file into DataFrame
- from_dict(data[, orient, dtype]): Construct DataFrame from dict of array-like or dicts
- from_items(items[, columns, orient]): Convert (key, value) pairs to DataFrame.
- from_records(data[, index, exclude, ...]): Convert structured or record ndarray to DataFrame
- ge(other[, axis, level]): Wrapper for flexible comparison methods ge
- get(key[, default]): Get item from object for given key (DataFrame column, Panel slice, etc.).
- get_dtype_counts(): Return the counts of dtypes in this object
- get_ftype_counts(): Return the counts of ftypes in this object
- get_value(index, col[, takeable]): Quickly retrieve single value at passed column and index
- get_values(): same as values (but handles sparseness conversions)
- groupby([by, axis, level, as_index, sort, ...]): Group series using mapper (dict or key function, apply given function
- gt(other[, axis, level]): Wrapper for flexible comparison methods gt
- head([n]): Returns first n rows
- hist(data[, column, by, grid, xlabelsize, ...]): Draw histogram of the DataFrame’s series using matplotlib / pylab.
- icol(i)
- idxmax([axis, skipna]): Return index of first occurrence of maximum over requested axis.
- idxmin([axis, skipna]): Return index of first occurrence of minimum over requested axis.
- iget_value(i, j)
- info([verbose, buf, max_cols, memory_usage, ...]): Concise summary of a DataFrame.
- insert(loc, column, value[, allow_duplicates]): Insert column into DataFrame at specified location.
- interpolate([method, axis, limit, inplace, ...]): Interpolate values according to different methods.
- irow(i[, copy])
- isin(values): Return boolean DataFrame showing whether each element in the DataFrame is contained in values.
- isnull(): Return a boolean same-sized object indicating if the values are null
- iteritems(): Iterator over (column, series) pairs
- iterkv(*args, **kwargs): iteritems alias used to get around 2to3. Deprecated
- iterrows(): Iterate over rows of DataFrame as (index, Series) pairs.
- itertuples([index]): Iterate over rows of DataFrame as tuples, with index value
- join(other[, on, how, lsuffix, rsuffix, sort]): Join columns with other DataFrame either on index or on a key column.
- keys(): Get the ‘info axis’ (see Indexing for more)
- kurt([axis, skipna, level, numeric_only]): Return unbiased kurtosis over requested axis using Fisher's definition of kurtosis (kurtosis of normal == 0.0).
- kurtosis([axis, skipna, level, numeric_only]): Return unbiased kurtosis over requested axis using Fisher's definition of kurtosis (kurtosis of normal == 0.0).
- last(offset): Convenience method for subsetting final periods of time series data
- last_valid_index(): Return label for last non-NA/null value
- le(other[, axis, level]): Wrapper for flexible comparison methods le
- load(path): Deprecated.
- lookup(row_labels, col_labels): Label-based “fancy indexing” function for DataFrame.
- lt(other[, axis, level]): Wrapper for flexible comparison methods lt
- mad([axis, skipna, level]): Return the mean absolute deviation of the values for the requested axis
- mask(cond): Returns copy whose values are replaced with nan if the
- max([axis, skipna, level, numeric_only]): This method returns the maximum of the values in the object.
- mean([axis, skipna, level, numeric_only]): Return the mean of the values for the requested axis
- median([axis, skipna, level, numeric_only]): Return the median of the values for the requested axis
- memory_usage([index]): Memory usage of DataFrame columns.
- merge(right[, how, on, left_on, right_on, ...]): Merge DataFrame objects by performing a database-style join operation by columns or indexes.
- min([axis, skipna, level, numeric_only]): This method returns the minimum of the values in the object.
- mod(other[, axis, level, fill_value]): Binary operator mod with support to substitute a fill_value for missing data in
- mode([axis, numeric_only]): Gets the mode of each element along the axis selected.
- mul(other[, axis, level, fill_value]): Binary operator mul with support to substitute a fill_value for missing data in
- multiply(other[, axis, level, fill_value]): Binary operator mul with support to substitute a fill_value for missing data in
- ne(other[, axis, level]): Wrapper for flexible comparison methods ne
- notnull(): Return a boolean same-sized object indicating if the values are
- pct_change([periods, fill_method, limit, freq]): Percent change over given number of periods.
- pivot([index, columns, values]): Reshape data (produce a “pivot” table) based on column values.
- pivot_table(data[, values, index, columns, ...]): Create a spreadsheet-style pivot table as a DataFrame.
- plot(data[, x, y, kind, ax, subplots, ...]): Make plots of DataFrame using matplotlib / pylab.
- pop(item): Return item and drop from frame.
- pow(other[, axis, level, fill_value]): Binary operator pow with support to substitute a fill_value for missing data in
- prod([axis, skipna, level, numeric_only]): Return the product of the values for the requested axis
- product([axis, skipna, level, numeric_only]): Return the product of the values for the requested axis
- quantile([q, axis, numeric_only]): Return values at the given quantile over requested axis, a la numpy.percentile.
- query(expr, **kwargs): Query the columns of a frame with a boolean expression.
- radd(other[, axis, level, fill_value]): Binary operator radd with support to substitute a fill_value for missing data in
- rank([axis, numeric_only, method, ...]): Compute numerical data ranks (1 through n) along axis.
- rdiv(other[, axis, level, fill_value]): Binary operator rtruediv with support to substitute a fill_value for missing data in
- reindex([index, columns]): Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
- reindex_axis(labels[, axis, method, level, ...]): Conform input object to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.
- reindex_like(other[, method, copy, limit]): Return an object with matching indices to myself
- rename([index, columns]): Alter axes input function or functions.
- rename_axis(mapper[, axis, copy, inplace]): Alter index and / or columns using input function or functions.
- reorder_levels(order[, axis]): Rearrange index levels using input order.
- replace([to_replace, value, inplace, limit, ...]): Replace values given in ‘to_replace’ with ‘value’.
- resample(rule[, how, axis, fill_method, ...]): Convenience method for frequency conversion and resampling of regular time-series data.
- reset_index([level, drop, inplace, ...]): For DataFrame with multi-level index, return new DataFrame with labeling information in the columns under the index names, defaulting to ‘level_0’, ‘level_1’, etc.
- rfloordiv(other[, axis, level, fill_value]): Binary operator rfloordiv with support to substitute a fill_value for missing data in
- rmod(other[, axis, level, fill_value]): Binary operator rmod with support to substitute a fill_value for missing data in
- rmul(other[, axis, level, fill_value]): Binary operator rmul with support to substitute a fill_value for missing data in
- rpow(other[, axis, level, fill_value]): Binary operator rpow with support to substitute a fill_value for missing data in
- rsub(other[, axis, level, fill_value]): Binary operator rsub with support to substitute a fill_value for missing data in
- rtruediv(other[, axis, level, fill_value]): Binary operator rtruediv with support to substitute a fill_value for missing data in
- save(path): Deprecated.
- select(crit[, axis]): Return data corresponding to axis labels matching criteria
- select_dtypes([include, exclude]): Return a subset of a DataFrame including/excluding columns based on their dtype.
- sem([axis, skipna, level, ddof, numeric_only]): Return unbiased standard error of the mean over requested axis.
- set_axis(axis, labels): public version of axis assignment
- set_index(keys[, drop, append, inplace, ...]): Set the DataFrame index (row labels) using one or more existing columns.
- set_value(index, col, value[, takeable]): Put single value at passed column and index
- shift([periods, freq, axis]): Shift index by desired number of periods with an optional time freq
- skew([axis, skipna, level, numeric_only]): Return unbiased skew over requested axis
- slice_shift([periods, axis]): Equivalent to shift without copying data.
- sort([columns, axis, ascending, inplace, ...]): Sort DataFrame either by labels (along either axis) or by the values in
- sort_index([axis, by, ascending, inplace, ...]): Sort DataFrame either by labels (along either axis) or by the values in
- sortlevel([level, axis, ascending, inplace, ...]): Sort multilevel index by chosen axis and primary level.
- squeeze(): squeeze length 1 dimensions
- stack([level, dropna]): Pivot a level of the (possibly hierarchical) column labels, returning a DataFrame (or Series in the case of an object with a single level of column labels) having a hierarchical index with a new inner-most level of row labels.
- std([axis, skipna, level, ddof, numeric_only]): Return unbiased standard deviation over requested axis.
- sub(other[, axis, level, fill_value]): Binary operator sub with support to substitute a fill_value for missing data in
- subtract(other[, axis, level, fill_value]): Binary operator sub with support to substitute a fill_value for missing data in
- sum([axis, skipna, level, numeric_only]): Return the sum of the values for the requested axis
- swapaxes(axis1, axis2[, copy]): Interchange axes and swap values axes appropriately
- swaplevel(i, j[, axis]): Swap levels i and j in a MultiIndex on a particular axis
- tail([n]): Returns last n rows
- take(indices[, axis, convert, is_copy]): Analogous to ndarray.take
- to_clipboard([excel, sep]): Attempt to write text representation of object to the system clipboard. This can be pasted into Excel, for example.
- to_csv([path_or_buf, sep, na_rep, ...]): Write DataFrame to a comma-separated values (csv) file
- to_dense(): Return dense representation of NDFrame (as opposed to sparse)
- to_dict(*args, **kwargs): Convert DataFrame to dictionary.
- to_excel(excel_writer[, sheet_name, na_rep, ...]): Write DataFrame to an Excel sheet
- to_gbq(destination_table[, project_id, ...]): Write a DataFrame to a Google BigQuery table.
- to_hdf(path_or_buf, key, **kwargs): activate the HDFStore
- to_html([buf, columns, col_space, colSpace, ...]): Render a DataFrame as an HTML table.
- to_json([path_or_buf, orient, date_format, ...]): Convert the object to a JSON string.
- to_latex([buf, columns, col_space, ...]): Render a DataFrame to a tabular environment table.
- to_msgpack([path_or_buf]): msgpack (serialize) object to input file path
- to_panel(): Transform long (stacked) format (DataFrame) into wide (3D, Panel) format.
- to_period([freq, axis, copy]): Convert DataFrame from DatetimeIndex to PeriodIndex with desired
- to_pickle(path): Pickle (serialize) object to input file path
- to_records([index, convert_datetime64]): Convert DataFrame to record array.
- to_sparse([fill_value, kind]): Convert to SparseDataFrame
- to_sql(name, con[, flavor, schema, ...]): Write records stored in a DataFrame to a SQL database.
- to_stata(fname[, convert_dates, ...]): A class for writing Stata binary dta files from array-like objects
- to_string([buf, columns, col_space, ...]): Render a DataFrame to a console-friendly tabular output.
- to_timestamp([freq, how, axis, copy]): Cast to DatetimeIndex of timestamps, at beginning of period
- to_wide(*args, **kwargs)
- transpose(): Transpose index and columns
- truediv(other[, axis, level, fill_value]): Binary operator truediv with support to substitute a fill_value for missing data in
- truncate([before, after, axis, copy]): Truncates a sorted NDFrame before and/or after some particular dates.
- tshift([periods, freq, axis]): Shift the time index, using the index’s frequency if available
- tz_convert(tz[, axis, level, copy]): Convert the axis to target time zone.
- tz_localize(*args, **kwargs): Localize tz-naive TimeSeries to target time zone
- unstack([level]): Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.
- update(other[, join, overwrite, ...]): Modify DataFrame in place using non-NA values from passed DataFrame.
- var([axis, skipna, level, ddof, numeric_only]): Return unbiased variance over requested axis.
- where(cond[, other, inplace, axis, level, ...]): Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
- xs(key[, axis, level, copy, drop_level]): Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
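To make a few of the attributes and methods listed above concrete, here is a short sketch on an invented three-row frame. It is restricted to a handful of entries (`shape`, `T`, `loc`, `iloc`, `at`, `isnull`, `fillna`, `assign`, `groupby`, `sum`); the column names and values are assumptions chosen for illustration, and several other listed entries are marked deprecated or may not exist in every pandas release.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"group": ["a", "a", "b"], "x": [1.0, np.nan, 3.0], "y": [10, 20, 30]},
    index=["r0", "r1", "r2"],
)

# Attributes: shape is (rows, columns); T swaps index and columns.
n_rows, n_cols = df.shape
transposed = df.T

# Label-based versus position-based selection of the same cell.
by_label = df.loc["r2", "y"]     # row label, column label
by_position = df.iloc[2, 2]      # row position, column position
scalar = df.at["r0", "x"]        # scalar access by label

# Missing-data helpers: boolean mask of nulls, then fill them with a value.
null_mask = df.isnull()
filled = df.fillna(0.0)

# assign returns a new frame with the extra column; df itself is unchanged.
with_total = filled.assign(total=filled["x"] + filled["y"])

# Group rows by a column and sum the numeric columns within each group.
sums = with_total.groupby("group")[["x", "y", "total"]].sum()
```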