pandas.DataFrame
- class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
- Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
- Parameters:
- data : numpy ndarray (structured or homogeneous), dict, or DataFrame
- Dict can contain Series, arrays, constants, or list-like objects
- index : Index or array-like
- Index to use for the resulting frame. Defaults to np.arange(n) if the input data carries no indexing information and no index is provided
- columns : Index or array-like
- Column labels to use for the resulting frame. Defaults to np.arange(n) if no column labels are provided
- dtype : dtype, default None
- Data type to force; otherwise inferred
- copy : boolean, default False
- Copy data from inputs. Only affects DataFrame / 2d ndarray input
- See also
- DataFrame.from_records
- constructor from tuples, also record arrays
- DataFrame.from_dict
- from dicts of Series, arrays, or dicts
- DataFrame.from_csv
- from CSV files
- DataFrame.from_items
- from sequence of (key, value) pairs
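The parameter descriptions above can be illustrated with a small sketch. The variable names and values here are made up for illustration, and the behavior shown (integer default labels, a forced dtype, an independent copy of a 2d ndarray) is what the parameter list describes rather than an exhaustive account of the constructor.

```python
import numpy as np
import pandas as pd

# data as a dict of list-like objects; no index is given,
# so the row labels default to 0..n-1.
data = {"a": [1, 2, 3], "b": np.array([0.5, 1.5, 2.5])}
df = pd.DataFrame(data)
default_index = list(df.index)

# data as a 2d ndarray with no labels: both axes default to integer ranges.
arr = np.arange(6).reshape(3, 2)
df_arr = pd.DataFrame(arr)

# Explicit index and columns, with dtype forced instead of inferred.
df_labeled = pd.DataFrame(
    arr, index=["x", "y", "z"], columns=["p", "q"], dtype=float
)

# copy=True copies the 2d ndarray input, so a later change to the
# array does not show up in the frame.
df_copy = pd.DataFrame(arr, copy=True)
arr[0, 0] = 99

print(default_index)
print(df_labeled.dtypes)
print(df_copy.iloc[0, 0])
```

Passing `copy=True` is the conservative choice whenever the input array will be modified afterwards.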
- Examples
>>> import numpy as np
>>> from pandas import DataFrame, Series, date_range
>>> index = date_range('2000-01-01', periods=3)  # illustrative data
>>> ts1 = Series([1.0, 2.0, 3.0], index=index)
>>> ts2 = Series([4.0, 5.0, 6.0], index=index)
>>> d = {'col1': ts1, 'col2': ts2}
>>> df = DataFrame(data=d, index=index)
>>> df2 = DataFrame(np.random.randn(10, 5))
>>> df3 = DataFrame(np.random.randn(10, 5),
...                 columns=['a', 'b', 'c', 'd', 'e'])
- Attributes
- T - Transpose index and columns
- at
- axes
- blocks - Internal property, property synonym for as_blocks()
- dtypes - Return the dtypes in this object
- empty - True if NDFrame is entirely empty [no items]
- ftypes - Return the ftypes (indication of sparse/dense and dtype)
- iat
- iloc
- ix
- loc
- ndim - Number of axes / array dimensions
- shape
- size - Number of elements in the NDFrame
- values - NumPy representation of NDFrame
- is_copy
- Methods
- abs() - Return an object with absolute value taken.
- add(other[, axis, level, fill_value]) - Binary operator add with support to substitute a fill_value for missing data in
- add_prefix(prefix) - Concatenate prefix string with panel items names.
- add_suffix(suffix) - Concatenate suffix string with panel items names
- align(other[, join, axis, level, copy, ...]) - Align two objects on their axes with the
- all([axis, bool_only, skipna, level]) - Return whether all elements are True over requested axis
- any([axis, bool_only, skipna, level]) - Return whether any element is True over requested axis
- append(other[, ignore_index, verify_integrity]) - Append rows of other to the end of this frame, returning a new object.
- apply(func[, axis, broadcast, raw, reduce, args]) - Applies function along input axis of DataFrame.
- applymap(func) - Apply a function to a DataFrame that is intended to operate
- as_blocks() - Convert the frame to a dict of dtype -> Constructor Types that each has
- as_matrix([columns]) - Convert the frame to its NumPy-array representation.
- asfreq(freq[, method, how, normalize]) - Convert all TimeSeries inside to specified frequency using DateOffset
- astype(dtype[, copy, raise_on_error]) - Cast object to input numpy.dtype
- at_time(time[, asof]) - Select values at particular time of day (e.g.
- between_time(start_time, end_time[, ...]) - Select values between particular times of the day (e.g., 9:00-9:30 AM)
- bfill([axis, inplace, limit, downcast]) - Synonym for NDFrame.fillna(method='bfill')
- bool() - Return the bool of a single element PandasObject
- boxplot([column, by, ax, fontsize, rot, ...]) - Make a box plot from DataFrame column optionally grouped by some columns or
- clip([lower, upper, out]) - Trim values at input threshold(s)
- clip_lower(threshold) - Return copy of the input with values below given value truncated
- clip_upper(threshold) - Return copy of input with values above given value truncated
- combine(other, func[, fill_value, overwrite]) - Add two DataFrame objects and do not propagate NaN values, so if for a
- combineAdd(other) - Add two DataFrame objects and do not propagate
- combineMult(other) - Multiply two DataFrame objects and do not propagate NaN values, so if
- combine_first(other) - Combine two DataFrame objects and default to non-null values in frame
- compound([axis, skipna, level]) - Return the compound percentage of the values for the requested axis
- consolidate([inplace]) - Compute NDFrame with “consolidated” internals (data of each dtype
- convert_objects([convert_dates, ...]) - Attempt to infer better dtype for object columns
- copy([deep]) - Make a copy of this object
- corr([method, min_periods]) - Compute pairwise correlation of columns, excluding NA/null values
- corrwith(other[, axis, drop]) - Compute pairwise correlation between rows or columns of two DataFrame
- count([axis, level, numeric_only]) - Return Series with number of non-NA/null observations over requested
- cov([min_periods]) - Compute pairwise covariance of columns, excluding NA/null values
- cummax([axis, dtype, out, skipna]) - Return cumulative max over requested axis.
- cummin([axis, dtype, out, skipna]) - Return cumulative min over requested axis.
- cumprod([axis, dtype, out, skipna]) - Return cumulative prod over requested axis.
- cumsum([axis, dtype, out, skipna]) - Return cumulative sum over requested axis.
- describe([percentile_width, percentiles, ...]) - Generate various summary statistics, excluding NaN values.
- diff([periods]) - 1st discrete difference of object
- div(other[, axis, level, fill_value]) - Binary operator truediv with support to substitute a fill_value for missing data in
- divide(other[, axis, level, fill_value]) - Binary operator truediv with support to substitute a fill_value for missing data in
- dot(other) - Matrix multiplication with DataFrame or Series objects
- drop(labels[, axis, level, inplace]) - Return new object with labels in requested axis removed
- drop_duplicates(*args, **kwargs) - Return DataFrame with duplicate rows removed, optionally only
- dropna([axis, how, thresh, subset, inplace]) - Return object with labels on given axis omitted where alternately any
- duplicated(*args, **kwargs) - Return boolean Series denoting duplicate rows, optionally only
- eq(other[, axis, level]) - Wrapper for flexible comparison methods eq
- equals(other) - Determines if two NDFrame objects contain the same elements. NaNs in the
- eval(expr, **kwargs) - Evaluate an expression in the context of the calling DataFrame
- ffill([axis, inplace, limit, downcast]) - Synonym for NDFrame.fillna(method='ffill')
- fillna([value, method, axis, inplace, ...]) - Fill NA/NaN values using the specified method
- filter([items, like, regex, axis]) - Restrict the info axis to set of items or wildcard
- first(offset) - Convenience method for subsetting initial periods of time series data
- first_valid_index() - Return label for first non-NA/null value
- floordiv(other[, axis, level, fill_value]) - Binary operator floordiv with support to substitute a fill_value for missing data in
- from_csv(path[, header, sep, index_col, ...]) - Read delimited file into DataFrame
- from_dict(data[, orient, dtype]) - Construct DataFrame from dict of array-like or dicts
- from_items(items[, columns, orient]) - Convert (key, value) pairs to DataFrame. The keys will be the axis
- from_records(data[, index, exclude, ...]) - Convert structured or record ndarray to DataFrame
- ge(other[, axis, level]) - Wrapper for flexible comparison methods ge
- get(key[, default]) - Get item from object for given key (DataFrame column, Panel slice,
- get_dtype_counts() - Return the counts of dtypes in this object
- get_ftype_counts() - Return the counts of ftypes in this object
- get_value(index, col[, takeable]) - Quickly retrieve single value at passed column and index
- get_values() - Same as values (but handles sparseness conversions)
- groupby([by, axis, level, as_index, sort, ...]) - Group series using mapper (dict or key function, apply given function
- gt(other[, axis, level]) - Wrapper for flexible comparison methods gt
- head([n]) - Returns first n rows
- hist(data[, column, by, grid, xlabelsize, ...]) - Draw histogram of the DataFrame’s series using matplotlib / pylab.
- icol(i)
- idxmax([axis, skipna]) - Return index of first occurrence of maximum over requested axis.
- idxmin([axis, skipna]) - Return index of first occurrence of minimum over requested axis.
- iget_value(i, j)
- info([verbose, buf, max_cols, memory_usage, ...]) - Concise summary of a DataFrame.
- insert(loc, column, value[, allow_duplicates]) - Insert column into DataFrame at specified location.
- interpolate([method, axis, limit, inplace, ...]) - Interpolate values according to different methods.
- irow(i[, copy])
- isin(values) - Return boolean DataFrame showing whether each element in the
- isnull() - Return a boolean same-sized object indicating if the values are null
- iteritems() - Iterator over (column, series) pairs
- iterkv(*args, **kwargs) - iteritems alias used to get around 2to3. Deprecated
- iterrows() - Iterate over rows of DataFrame as (index, Series) pairs.
- itertuples([index]) - Iterate over rows of DataFrame as tuples, with index value
- join(other[, on, how, lsuffix, rsuffix, sort]) - Join columns with other DataFrame either on index or on a key
- keys() - Get the ‘info axis’ (see Indexing for more)
- kurt([axis, skipna, level, numeric_only]) - Return unbiased kurtosis over requested axis
- kurtosis([axis, skipna, level, numeric_only]) - Return unbiased kurtosis over requested axis
- last(offset) - Convenience method for subsetting final periods of time series data
- last_valid_index() - Return label for last non-NA/null value
- le(other[, axis, level]) - Wrapper for flexible comparison methods le
- load(path) - Deprecated.
- lookup(row_labels, col_labels) - Label-based “fancy indexing” function for DataFrame.
- lt(other[, axis, level]) - Wrapper for flexible comparison methods lt
- mad([axis, skipna, level]) - Return the mean absolute deviation of the values for the requested axis
- mask(cond) - Returns copy whose values are replaced with nan if the
- max([axis, skipna, level, numeric_only]) - This method returns the maximum of the values in the object.
- mean([axis, skipna, level, numeric_only]) - Return the mean of the values for the requested axis
- median([axis, skipna, level, numeric_only]) - Return the median of the values for the requested axis
- memory_usage([index]) - Memory usage of DataFrame columns.
- merge(right[, how, on, left_on, right_on, ...]) - Merge DataFrame objects by performing a database-style join operation by
- min([axis, skipna, level, numeric_only]) - This method returns the minimum of the values in the object.
- mod(other[, axis, level, fill_value]) - Binary operator mod with support to substitute a fill_value for missing data in
- mode([axis, numeric_only]) - Gets the mode of each element along the axis selected.
- mul(other[, axis, level, fill_value]) - Binary operator mul with support to substitute a fill_value for missing data in
- multiply(other[, axis, level, fill_value]) - Binary operator mul with support to substitute a fill_value for missing data in
- ne(other[, axis, level]) - Wrapper for flexible comparison methods ne
- notnull() - Return a boolean same-sized object indicating if the values are not null
- pct_change([periods, fill_method, limit, freq]) - Percent change over given number of periods.
- pivot([index, columns, values]) - Reshape data (produce a “pivot” table) based on column values.
- pivot_table(*args, **kwargs) - Create a spreadsheet-style pivot table as a DataFrame. The levels in the
- plot(data[, x, y, kind, ax, subplots, ...]) - Make plots of DataFrame using matplotlib / pylab.
- pop(item) - Return item and drop from frame.
- pow(other[, axis, level, fill_value]) - Binary operator pow with support to substitute a fill_value for missing data in
- prod([axis, skipna, level, numeric_only]) - Return the product of the values for the requested axis
- product([axis, skipna, level, numeric_only]) - Return the product of the values for the requested axis
- quantile([q, axis, numeric_only]) - Return values at the given quantile over requested axis, a la numpy.percentile.
- query(expr, **kwargs) - Query the columns of a frame with a boolean expression.
- radd(other[, axis, level, fill_value]) - Binary operator radd with support to substitute a fill_value for missing data in
- rank([axis, numeric_only, method, ...]) - Compute numerical data ranks (1 through n) along axis.
- rdiv(other[, axis, level, fill_value]) - Binary operator rtruediv with support to substitute a fill_value for missing data in
- reindex([index, columns]) - Conform DataFrame to new index with optional filling logic, placing
- reindex_axis(labels[, axis, method, level, ...]) - Conform input object to new index with optional filling logic,
- reindex_like(other[, method, copy, limit]) - Return an object with matching indices to myself
- rename([index, columns]) - Alter axes input function or functions.
- rename_axis(mapper[, axis, copy, inplace]) - Alter index and / or columns using input function or functions.
- reorder_levels(order[, axis]) - Rearrange index levels using input order.
- replace([to_replace, value, inplace, limit, ...]) - Replace values given in 'to_replace' with 'value'.
- resample(rule[, how, axis, fill_method, ...]) - Convenience method for frequency conversion and resampling of regular time-series data.
- reset_index([level, drop, inplace, ...]) - For DataFrame with multi-level index, return new DataFrame with
- rfloordiv(other[, axis, level, fill_value]) - Binary operator rfloordiv with support to substitute a fill_value for missing data in
- rmod(other[, axis, level, fill_value]) - Binary operator rmod with support to substitute a fill_value for missing data in
- rmul(other[, axis, level, fill_value]) - Binary operator rmul with support to substitute a fill_value for missing data in
- rpow(other[, axis, level, fill_value]) - Binary operator rpow with support to substitute a fill_value for missing data in
- rsub(other[, axis, level, fill_value]) - Binary operator rsub with support to substitute a fill_value for missing data in
- rtruediv(other[, axis, level, fill_value]) - Binary operator rtruediv with support to substitute a fill_value for missing data in
- save(path) - Deprecated.
- select(crit[, axis]) - Return data corresponding to axis labels matching criteria
- select_dtypes([include, exclude]) - Return a subset of a DataFrame including/excluding columns based on
- sem([axis, skipna, level, ddof]) - Return unbiased standard error of the mean over requested axis.
- set_axis(axis, labels) - Public version of axis assignment
- set_index(keys[, drop, append, inplace, ...]) - Set the DataFrame index (row labels) using one or more existing
- set_value(index, col, value[, takeable]) - Put single value at passed column and index
- shift([periods, freq, axis]) - Shift index by desired number of periods with an optional time freq
- skew([axis, skipna, level, numeric_only]) - Return unbiased skew over requested axis
- slice_shift([periods, axis]) - Equivalent to shift without copying data.
- sort([columns, axis, ascending, inplace, ...]) - Sort DataFrame either by labels (along either axis) or by the values in
- sort_index([axis, by, ascending, inplace, ...]) - Sort DataFrame either by labels (along either axis) or by the values in
- sortlevel([level, axis, ascending, inplace, ...]) - Sort multilevel index by chosen axis and primary level.
- squeeze() - Squeeze length 1 dimensions
- stack([level, dropna]) - Pivot a level of the (possibly hierarchical) column labels, returning a
- std([axis, skipna, level, ddof]) - Return unbiased standard deviation over requested axis.
- sub(other[, axis, level, fill_value]) - Binary operator sub with support to substitute a fill_value for missing data in
- subtract(other[, axis, level, fill_value]) - Binary operator sub with support to substitute a fill_value for missing data in
- sum([axis, skipna, level, numeric_only]) - Return the sum of the values for the requested axis
- swapaxes(axis1, axis2[, copy]) - Interchange axes and swap values axes appropriately
- swaplevel(i, j[, axis]) - Swap levels i and j in a MultiIndex on a particular axis
- tail([n]) - Returns last n rows
- take(indices[, axis, convert, is_copy]) - Analogous to ndarray.take
- to_clipboard([excel, sep]) - Attempt to write text representation of object to the system clipboard
- to_csv(*args, **kwargs) - Write DataFrame to a comma-separated values (csv) file
- to_dense() - Return dense representation of NDFrame (as opposed to sparse)
- to_dict(*args, **kwargs) - Convert DataFrame to dictionary.
- to_excel(*args, **kwargs) - Write DataFrame to an Excel sheet
- to_gbq(destination_table[, project_id, ...]) - Write a DataFrame to a Google BigQuery table.
- to_hdf(path_or_buf, key, **kwargs) - Activate the HDFStore
- to_html([buf, columns, col_space, colSpace, ...]) - Render a DataFrame as an HTML table.
- to_json([path_or_buf, orient, date_format, ...]) - Convert the object to a JSON string.
- to_latex([buf, columns, col_space, ...]) - Render a DataFrame to a tabular environment table. You can splice
- to_msgpack([path_or_buf]) - msgpack (serialize) object to input file path
- to_panel() - Transform long (stacked) format (DataFrame) into wide (3D, Panel)
- to_period([freq, axis, copy]) - Convert DataFrame from DatetimeIndex to PeriodIndex with desired
- to_pickle(path) - Pickle (serialize) object to input file path
- to_records([index, convert_datetime64]) - Convert DataFrame to record array. Index will be put in the
- to_sparse([fill_value, kind]) - Convert to SparseDataFrame
- to_sql(name, con[, flavor, schema, ...]) - Write records stored in a DataFrame to a SQL database.
- to_stata(fname[, convert_dates, ...]) - A class for writing Stata binary dta files from array-like objects
- to_string([buf, columns, col_space, ...]) - Render a DataFrame to a console-friendly tabular output.
- to_timestamp([freq, how, axis, copy]) - Cast to DatetimeIndex of timestamps, at beginning of period
- to_wide(*args, **kwargs)
- transpose() - Transpose index and columns
- truediv(other[, axis, level, fill_value]) - Binary operator truediv with support to substitute a fill_value for missing data in
- truncate([before, after, axis, copy]) - Truncates a sorted NDFrame before and/or after some particular
- tshift([periods, freq, axis]) - Shift the time index, using the index’s frequency if available
- tz_convert(tz[, axis, level, copy]) - Convert the axis to target time zone.
- tz_localize(*args, **kwargs) - Localize tz-naive TimeSeries to target time zone
- unstack([level]) - Pivot a level of the (necessarily hierarchical) index labels, returning
- update(other[, join, overwrite, ...]) - Modify DataFrame in place using non-NA values from passed
- var([axis, skipna, level, ddof]) - Return unbiased variance over requested axis.
- where(cond[, other, inplace, axis, level, ...]) - Return an object of same shape as self and whose corresponding
- xs(key[, axis, level, copy, drop_level]) - Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
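The class description notes that arithmetic aligns on both row and column labels, and the flexible operators such as add accept a fill_value for missing data. The following is a minimal sketch of how the two interact; the frames `left` and `right` and their labels are invented for illustration.

```python
import pandas as pd

left = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["r1", "r2"])
right = pd.DataFrame({"a": [10.0, 20.0]}, index=["r2", "r3"])

# The + operator aligns on the union of row and column labels;
# any cell missing from either operand becomes NaN.
plain = left + right

# add() with fill_value substitutes 0 for a value missing on one side only.
# A cell missing on both sides still comes out as NaN.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

Here only the cell at row `r2`, column `a` exists in both frames, so it is the only non-NaN cell in `plain`, while `filled` keeps every value present in at least one operand.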