pandas.DataFrame
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
Parameters
data
Dict can contain Series, arrays, constants, dataclass or list-like objects. If data is a dict, column order follows insertion order.
Changed in version 0.25.0: If data is a list of dicts, column order follows insertion order.
index
Index to use for the resulting frame. Defaults to RangeIndex if the input data carries no indexing information and no index is provided.
columns
Column labels to use for the resulting frame. Defaults to RangeIndex (0, 1, 2, …, n) if no column labels are provided.
dtype
Data type to force. Only a single dtype is allowed. If None, infer.
copy
Copy data from inputs. Only affects DataFrame / 2d ndarray input.
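As a minimal sketch of how the constructor arguments described above interact, the following builds frames from a 2-D ndarray with and without explicit labels. The row labels `r0` to `r2`, the column labels `a` and `b`, and all variable names are illustrative assumptions, not part of this reference.

```python
import numpy as np
import pandas as pd

# 2-D ndarray input: with no labels given, both axes default to a RangeIndex.
arr = np.array([[1, 2], [3, 4], [5, 6]])
unlabeled = pd.DataFrame(arr)

# Explicit row and column labels, plus a single forced dtype.
labeled = pd.DataFrame(
    arr,
    index=["r0", "r1", "r2"],  # hypothetical row labels
    columns=["a", "b"],        # hypothetical column labels
    dtype="float64",
)

# copy=True makes the frame independent of the source array.
independent = pd.DataFrame(arr, copy=True)
arr[0, 0] = 99  # mutate the source after construction

print(unlabeled.columns.tolist())
print(labeled.dtypes)
print(independent.iloc[0, 0])  # still 1, because the data was copied
```

Whether a frame built without `copy=True` shares memory with the source array depends on the pandas version, so the sketch only relies on the explicit copy.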
See also
DataFrame.from_records
Constructor from tuples, also record arrays.
DataFrame.from_dict
From dicts of Series, arrays, or dicts.
read_csv
Read a comma-separated values (csv) file into DataFrame.
read_table
Read general delimited file into DataFrame.
read_clipboard
Read text from clipboard into DataFrame.
Examples
Constructing DataFrame from a dictionary.
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
Notice that the inferred dtype is int64.
>>> df.dtypes
col1    int64
col2    int64
dtype: object
To enforce a single dtype:
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object
Constructing DataFrame from numpy ndarray:
>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
...                    columns=['a', 'b', 'c'])
>>> df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
Constructing DataFrame from dataclass:
>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
   x  y
0  0  0
1  0  3
2  2  3
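Constructing DataFrame from a dict of Series, and label alignment in arithmetic. This is an illustrative sketch rather than part of the reference: the labels, values, and variable names are assumptions chosen to show that rows and columns are matched by label, not by position.

```python
import pandas as pd

# Two Series with partially overlapping (hypothetical) row labels.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0], index=["b", "c"])

# Dict of Series: columns follow insertion order, rows are the union of labels.
left = pd.DataFrame({"x": s1, "y": s2})

# A second frame sharing only some row and column labels with `left`.
right = pd.DataFrame({"y": [100.0, 200.0], "z": [1.0, 1.0]}, index=["c", "d"])

# Addition aligns on both axes; a cell present in only one operand becomes NaN.
total = left + right
print(total)
```

Only the cell at row `c`, column `y` exists in both operands, so it is the only non-missing value in `total`.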
Attributes
at
Access a single value for a row/column label pair.
attrs
Dictionary of global attributes of this dataset.
axes
Return a list representing the axes of the DataFrame.
columns
The column labels of the DataFrame.
dtypes
Return the dtypes in the DataFrame.
empty
Indicator whether DataFrame is empty.
flags
Get the properties associated with this pandas object.
iat
Access a single value for a row/column pair by integer position.
iloc
Purely integer-location based indexing for selection by position.
index
The index (row labels) of the DataFrame.
loc
Access a group of rows and columns by label(s) or a boolean array.
ndim
Return an int representing the number of axes / array dimensions.
shape
Return a tuple representing the dimensionality of the DataFrame.
size
Return an int representing the number of elements in this object.
style
Returns a Styler object.
values
Return a NumPy representation of the DataFrame.
T
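The attributes listed above can be tried out on a small frame. The sketch below is only an illustration; the column names, row labels, and variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],  # hypothetical row labels
)

print(df.shape)  # (number of rows, number of columns)
print(df.ndim)   # number of axes
print(df.size)   # number of elements
print(df.empty)

# Label-based versus position-based access to a single value.
by_label = df.at["b", "col2"]
by_position = df.iat[1, 1]

# Label-based versus position-based selection of several rows.
rows_by_label = df.loc[["a", "c"], "col1"]
rows_by_position = df.iloc[[0, 2], 0]

# T is the transpose: row labels become column labels and vice versa.
transposed = df.T
```

`at` and `iat` address one cell, while `loc` and `iloc` can return whole rows, columns, or sub-frames.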
Methods
abs()
Return a Series/DataFrame with absolute numeric value of each element.
add(other[, axis, level, fill_value])
Get Addition of dataframe and other, element-wise (binary operator add).
add_prefix(prefix)
Prefix labels with string prefix.
add_suffix(suffix)
Suffix labels with string suffix.
agg([func, axis])
Aggregate using one or more operations over the specified axis.
aggregate([func, axis])
Aggregate using one or more operations over the specified axis.
align(other[, join, axis, level, copy, …])
Align two objects on their axes with the specified join method.
all([axis, bool_only, skipna, level])
Return whether all elements are True, potentially over an axis.
any([axis, bool_only, skipna, level])
Return whether any element is True, potentially over an axis.
append(other[, ignore_index, …])
Append rows of other to the end of caller, returning a new object.
apply(func[, axis, raw, result_type, args])
Apply a function along an axis of the DataFrame.
applymap(func[, na_action])
Apply a function to a DataFrame elementwise.
asfreq(freq[, method, how, normalize, …])
Convert TimeSeries to specified frequency.
asof(where[, subset])
Return the last row(s) without any NaNs before where.
assign(**kwargs)
Assign new columns to a DataFrame.
astype(dtype[, copy, errors])
Cast a pandas object to a specified dtype.
at_time(time[, asof, axis])
Select values at particular time of day (e.g., 9:30AM).
backfill([axis, inplace, limit, downcast])
Synonym for DataFrame.fillna() with method='bfill'.
between_time(start_time, end_time[, …])
Select values between particular times of the day (e.g., 9:00-9:30 AM).
bfill([axis, inplace, limit, downcast])
Synonym for DataFrame.fillna() with method='bfill'.
bool()
Return the bool of a single element Series or DataFrame.
boxplot([column, by, ax, fontsize, rot, …])
Make a box plot from DataFrame columns.
clip([lower, upper, axis, inplace])
Trim values at input threshold(s).
combine(other, func[, fill_value, overwrite])
Perform column-wise combine with another DataFrame.
combine_first(other)
Update null elements with value in the same location in other.
compare(other[, align_axis, keep_shape, …])
Compare to another DataFrame and show the differences.
convert_dtypes([infer_objects, …])
Convert columns to best possible dtypes using dtypes supporting pd.NA.
copy([deep])
Make a copy of this object’s indices and data.
corr([method, min_periods])
Compute pairwise correlation of columns, excluding NA/null values.
corrwith(other[, axis, drop, method])
Compute pairwise correlation.
count([axis, level, numeric_only])
Count non-NA cells for each column or row.
cov([min_periods, ddof])
Compute pairwise covariance of columns, excluding NA/null values.
cummax([axis, skipna])
Return cumulative maximum over a DataFrame or Series axis.
cummin([axis, skipna])
Return cumulative minimum over a DataFrame or Series axis.
cumprod([axis, skipna])
Return cumulative product over a DataFrame or Series axis.
cumsum([axis, skipna])
Return cumulative sum over a DataFrame or Series axis.
describe([percentiles, include, exclude, …])
Generate descriptive statistics.
diff([periods, axis])
First discrete difference of element.
div(other[, axis, level, fill_value])
Get Floating division of dataframe and other, element-wise (binary operator truediv).
divide(other[, axis, level, fill_value])
Get Floating division of dataframe and other, element-wise (binary operator truediv).
dot(other)
Compute the matrix multiplication between the DataFrame and other.
drop([labels, axis, index, columns, level, …])
Drop specified labels from rows or columns.
drop_duplicates([subset, keep, inplace, …])
Return DataFrame with duplicate rows removed.
droplevel(level[, axis])
Return DataFrame with requested index / column level(s) removed.
dropna([axis, how, thresh, subset, inplace])
Remove missing values.
duplicated([subset, keep])
Return boolean Series denoting duplicate rows.
eq(other[, axis, level])
Get Equal to of dataframe and other, element-wise (binary operator eq).
equals(other)
Test whether two objects contain the same elements.
eval(expr[, inplace])
Evaluate a string describing operations on DataFrame columns.
ewm([com, span, halflife, alpha, …])
Provide exponential weighted (EW) functions.
expanding([min_periods, center, axis])
Provide expanding transformations.
explode(column[, ignore_index])
Transform each element of a list-like to a row, replicating index values.
ffill([axis, inplace, limit, downcast])
Synonym for DataFrame.fillna() with method='ffill'.
fillna([value, method, axis, inplace, …])
Fill NA/NaN values using the specified method.
filter([items, like, regex, axis])
Subset the dataframe rows or columns according to the specified index labels.
first(offset)
Select initial periods of time series data based on a date offset.
first_valid_index()
Return index for first non-NA/null value.
floordiv(other[, axis, level, fill_value])
Get Integer division of dataframe and other, element-wise (binary operator floordiv).
from_dict(data[, orient, dtype, columns])
Construct DataFrame from dict of array-like or dicts.
from_records(data[, index, exclude, …])
Convert structured or record ndarray to DataFrame.
ge(other[, axis, level])
Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).
get(key[, default])
Get item from object for given key (ex: DataFrame column).
groupby([by, axis, level, as_index, sort, …])
Group DataFrame using a mapper or by a Series of columns.
gt(other[, axis, level])
Get Greater than of dataframe and other, element-wise (binary operator gt).
head([n])
Return the first n rows.
hist([column, by, grid, xlabelsize, xrot, …])
Make a histogram of the DataFrame’s columns.
idxmax([axis, skipna])
Return index of first occurrence of maximum over requested axis.
idxmin([axis, skipna])
Return index of first occurrence of minimum over requested axis.
infer_objects()
Attempt to infer better dtypes for object columns.
info([verbose, buf, max_cols, memory_usage, …])
Print a concise summary of a DataFrame.
insert(loc, column, value[, allow_duplicates])
Insert column into DataFrame at specified location.
interpolate([method, axis, limit, inplace, …])
Fill NaN values using an interpolation method.
isin(values)
Whether each element in the DataFrame is contained in values.
isna()
Detect missing values.
isnull()
Detect missing values.
items()
Iterate over (column name, Series) pairs.
iteritems()
Iterate over (column name, Series) pairs.
iterrows()
Iterate over DataFrame rows as (index, Series) pairs.
itertuples([index, name])
Iterate over DataFrame rows as namedtuples.
join(other[, on, how, lsuffix, rsuffix, sort])
Join columns of another DataFrame.
keys()
Get the ‘info axis’ (see Indexing for more).
kurt([axis, skipna, level, numeric_only])
Return unbiased kurtosis over requested axis.
kurtosis([axis, skipna, level, numeric_only])
Return unbiased kurtosis over requested axis.
last(offset)
Select final periods of time series data based on a date offset.
last_valid_index()
Return index for last non-NA/null value.
le(other[, axis, level])
Get Less than or equal to of dataframe and other, element-wise (binary operator le).
lookup(row_labels, col_labels)
(DEPRECATED) Label-based “fancy indexing” function for DataFrame.
lt(other[, axis, level])
Get Less than of dataframe and other, element-wise (binary operator lt).
mad([axis, skipna, level])
Return the mean absolute deviation of the values over the requested axis.
mask(cond[, other, inplace, axis, level, …])
Replace values where the condition is True.
max([axis, skipna, level, numeric_only])
Return the maximum of the values over the requested axis.
mean([axis, skipna, level, numeric_only])
Return the mean of the values over the requested axis.
median([axis, skipna, level, numeric_only])
Return the median of the values over the requested axis.
melt([id_vars, value_vars, var_name, …])
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
memory_usage([index, deep])
Return the memory usage of each column in bytes.
merge(right[, how, on, left_on, right_on, …])
Merge DataFrame or named Series objects with a database-style join.
min([axis, skipna, level, numeric_only])
Return the minimum of the values over the requested axis.
mod(other[, axis, level, fill_value])
Get Modulo of dataframe and other, element-wise (binary operator mod).
mode([axis, numeric_only, dropna])
Get the mode(s) of each element along the selected axis.
mul(other[, axis, level, fill_value])
Get Multiplication of dataframe and other, element-wise (binary operator mul).
multiply(other[, axis, level, fill_value])
Get Multiplication of dataframe and other, element-wise (binary operator mul).
ne(other[, axis, level])
Get Not equal to of dataframe and other, element-wise (binary operator ne).
nlargest(n, columns[, keep])
Return the first n rows ordered by columns in descending order.
notna()
Detect existing (non-missing) values.
notnull()
Detect existing (non-missing) values.
nsmallest(n, columns[, keep])
Return the first n rows ordered by columns in ascending order.
nunique([axis, dropna])
Count distinct observations over requested axis.
pad([axis, inplace, limit, downcast])
Synonym for DataFrame.fillna() with method='ffill'.
pct_change([periods, fill_method, limit, freq])
Percentage change between the current and a prior element.
pipe(func, *args, **kwargs)
Apply func(self, *args, **kwargs).
pivot([index, columns, values])
Return reshaped DataFrame organized by given index / column values.
pivot_table([values, index, columns, …])
Create a spreadsheet-style pivot table as a DataFrame.
plot
alias of pandas.plotting._core.PlotAccessor
pop(item)
Return item and drop from frame.
pow(other[, axis, level, fill_value])
Get Exponential power of dataframe and other, element-wise (binary operator pow).
prod([axis, skipna, level, numeric_only, …])
Return the product of the values over the requested axis.
product([axis, skipna, level, numeric_only, …])
Return the product of the values over the requested axis.
quantile([q, axis, numeric_only, interpolation])
Return values at the given quantile over requested axis.
query(expr[, inplace])
Query the columns of a DataFrame with a boolean expression.
radd(other[, axis, level, fill_value])
Get Addition of dataframe and other, element-wise (binary operator radd).
rank([axis, method, numeric_only, …])
Compute numerical data ranks (1 through n) along axis.
rdiv(other[, axis, level, fill_value])
Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
reindex([labels, index, columns, axis, …])
Conform Series/DataFrame to new index with optional filling logic.
reindex_like(other[, method, copy, limit, …])
Return an object with matching indices as other object.
rename([mapper, index, columns, axis, copy, …])
Alter axes labels.
rename_axis([mapper, index, columns, axis, …])
Set the name of the axis for the index or columns.
reorder_levels(order[, axis])
Rearrange index levels using input order.
replace([to_replace, value, inplace, limit, …])
Replace values given in to_replace with value.
resample(rule[, axis, closed, label, …])
Resample time-series data.
reset_index([level, drop, inplace, …])
Reset the index, or a level of it.
rfloordiv(other[, axis, level, fill_value])
Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
rmod(other[, axis, level, fill_value])
Get Modulo of dataframe and other, element-wise (binary operator rmod).
rmul(other[, axis, level, fill_value])
Get Multiplication of dataframe and other, element-wise (binary operator rmul).
rolling(window[, min_periods, center, …])
Provide rolling window calculations.
round([decimals])
Round a DataFrame to a variable number of decimal places.
rpow(other[, axis, level, fill_value])
Get Exponential power of dataframe and other, element-wise (binary operator rpow).
rsub(other[, axis, level, fill_value])
Get Subtraction of dataframe and other, element-wise (binary operator rsub).
rtruediv(other[, axis, level, fill_value])
Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
sample([n, frac, replace, weights, …])
Return a random sample of items from an axis of object.
select_dtypes([include, exclude])
Return a subset of the DataFrame’s columns based on the column dtypes.
sem([axis, skipna, level, ddof, numeric_only])
Return unbiased standard error of the mean over requested axis.
set_axis(labels[, axis, inplace])
Assign desired index to given axis.
set_flags(*[, copy, allows_duplicate_labels])
Return a new object with updated flags.
set_index(keys[, drop, append, inplace, …])
Set the DataFrame index using existing columns.
shift([periods, freq, axis, fill_value])
Shift index by desired number of periods with an optional time freq.
skew([axis, skipna, level, numeric_only])
Return unbiased skew over requested axis.
slice_shift([periods, axis])
(DEPRECATED) Equivalent to shift without copying data.
sort_index([axis, level, ascending, …])
Sort object by labels (along an axis).
sort_values(by[, axis, ascending, inplace, …])
Sort by the values along either axis.
sparse
alias of pandas.core.arrays.sparse.accessor.SparseFrameAccessor
squeeze([axis])
Squeeze 1 dimensional axis objects into scalars.
stack([level, dropna])
Stack the prescribed level(s) from columns to index.
std([axis, skipna, level, ddof, numeric_only])
Return sample standard deviation over requested axis.
sub(other[, axis, level, fill_value])
Get Subtraction of dataframe and other, element-wise (binary operator sub).
subtract(other[, axis, level, fill_value])
Get Subtraction of dataframe and other, element-wise (binary operator sub).
sum([axis, skipna, level, numeric_only, …])
Return the sum of the values over the requested axis.
swapaxes(axis1, axis2[, copy])
Interchange axes and swap values axes appropriately.
swaplevel([i, j, axis])
Swap levels i and j in a MultiIndex on a particular axis.
tail([n])
Return the last n rows.
take(indices[, axis, is_copy])
Return the elements in the given positional indices along an axis.
to_clipboard([excel, sep])
Copy object to the system clipboard.
to_csv([path_or_buf, sep, na_rep, …])
Write object to a comma-separated values (csv) file.
to_dict([orient, into])
Convert the DataFrame to a dictionary.
to_excel(excel_writer[, sheet_name, na_rep, …])
Write object to an Excel sheet.
to_feather(path, **kwargs)
Write a DataFrame to the binary Feather format.
to_gbq(destination_table[, project_id, …])
Write a DataFrame to a Google BigQuery table.
to_hdf(path_or_buf, key[, mode, complevel, …])
Write the contained data to an HDF5 file using HDFStore.
to_html([buf, columns, col_space, header, …])
Render a DataFrame as an HTML table.
to_json([path_or_buf, orient, date_format, …])
Convert the object to a JSON string.
to_latex([buf, columns, col_space, header, …])
Render object to a LaTeX tabular, longtable, or nested table/tabular.
to_markdown([buf, mode, index, storage_options])
Print DataFrame in Markdown-friendly format.
to_numpy([dtype, copy, na_value])
Convert the DataFrame to a NumPy array.
to_parquet([path, engine, compression, …])
Write a DataFrame to the binary parquet format.
to_period([freq, axis, copy])
Convert DataFrame from DatetimeIndex to PeriodIndex.
to_pickle(path[, compression, protocol, …])
Pickle (serialize) object to file.
to_records([index, column_dtypes, index_dtypes])
Convert DataFrame to a NumPy record array.
to_sql(name, con[, schema, if_exists, …])
Write records stored in a DataFrame to a SQL database.
to_stata(path[, convert_dates, write_index, …])
Export DataFrame object to Stata dta format.
to_string([buf, columns, col_space, header, …])
Render a DataFrame to a console-friendly tabular output.
to_timestamp([freq, how, axis, copy])
Cast to DatetimeIndex of timestamps, at beginning of period.
to_xarray()
Return an xarray object from the pandas object.
transform(func[, axis])
Call func on self producing a DataFrame with transformed values.
transpose(*args[, copy])
Transpose index and columns.
truediv(other[, axis, level, fill_value])
Get Floating division of dataframe and other, element-wise (binary operator truediv).
truncate([before, after, axis, copy])
Truncate a Series or DataFrame before and after some index value.
tshift([periods, freq, axis])
(DEPRECATED) Shift the time index, using the index’s frequency if available.
tz_convert(tz[, axis, level, copy])
Convert tz-aware axis to target time zone.
tz_localize(tz[, axis, level, copy, …])
Localize tz-naive index of a Series or DataFrame to target time zone.
unstack([level, fill_value])
Pivot a level of the (necessarily hierarchical) index labels.
update(other[, join, overwrite, …])
Modify in place using non-NA values from another DataFrame.
value_counts([subset, normalize, sort, …])
Return a Series containing counts of unique rows in the DataFrame.
var([axis, skipna, level, ddof, numeric_only])
Return unbiased variance over requested axis.
where(cond[, other, inplace, axis, level, …])
Replace values where the condition is False.
xs(key[, axis, level, drop_level])
Return cross-section from the Series/DataFrame.
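A few of the methods listed above can be combined as in the following sketch. It is only an illustration under assumed data: the column names `group`, `value`, and `doubled`, and all variable names, are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],  # hypothetical grouping key
        "value": [1.0, np.nan, 3.0, 4.0],
    }
)

# fillna and assign return new objects; the original frame is left unchanged.
filled = df.fillna({"value": 0.0}).assign(doubled=lambda d: d["value"] * 2)

# where replaces values where the condition is False,
# mask replaces values where the condition is True.
numeric = filled[["value", "doubled"]]
kept = numeric.where(numeric > 2, other=0.0)
hidden = numeric.mask(numeric > 2, other=0.0)

# groupby followed by an aggregation yields one result per group.
per_group = filled.groupby("group")["value"].sum()

print(filled)
print(per_group)
```

Chaining works here because each of these methods returns a new object by default instead of modifying the caller.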