pandas.api.typing.SeriesGroupBy.describe
- SeriesGroupBy.describe(percentiles=None, include=None, exclude=None)
Generate descriptive statistics.
Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

- Parameters:
- percentiles : list-like of numbers, optional
The percentiles to include in the output. All should fall between 0 and 1. The default, None, will automatically return the 25th, 50th, and 75th percentiles.
- include : ‘all’, list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored for Series. Here are the options:
  - ‘all’ : All columns of the input will be included in the output.
  - A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'.
  - None (default) : The result will include all numeric columns.
- exclude : list-like of dtypes or None (default), optional
A black list of data types to omit from the result. Ignored for Series. Here are the options:
  - A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'.
  - None (default) : The result will exclude nothing.
- Returns:
- Series or DataFrame
Summary statistics of the Series or DataFrame provided.
See also
DataFrame.count : Count number of non-NA/null observations.
DataFrame.max : Maximum of the values in the object.
DataFrame.min : Minimum of the values in the object.
DataFrame.mean : Mean of the values.
DataFrame.std : Standard deviation of the observations.
DataFrame.select_dtypes : Subset of a DataFrame including/excluding columns based on their dtype.
Notes
For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.

For object data (e.g. strings), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency.

If multiple object values have the highest count, then the
top result will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the DataFrame consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

Examples
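For object data, the per-group summary can be sketched as follows. The values and group keys here are made up for illustration; each group is chosen so that one value is strictly the most common, which avoids the arbitrary tie-breaking described in the notes.

```python
import pandas as pd

# Illustrative data (not from the pandas docs): six strings in two groups.
s = pd.Series(["a", "a", "b", "c", "c", "c"])
keys = ["x", "x", "x", "y", "y", "y"]

# One row per group; the columns are the object-data statistics.
result = s.groupby(keys).describe()
print(result)

# Individual statistics can be read by group label and column name.
top_x = result.loc["x", "top"]    # most common value in group "x"
freq_x = result.loc["x", "freq"]  # how many times that value occurs
print(top_x, freq_x)
```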
Describing a numeric Series.

>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64
>>> s.groupby([1, 1, 2, 2]).describe()
   count  mean       std  min   25%  50%   75%  max
1    2.0   1.5  0.707107  1.0  1.25  1.5  1.75  2.0
2    2.0   3.5  0.707107  3.0  3.25  3.5  3.75  4.0
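
The percentiles parameter can be exercised on the same kind of grouped Series. This is a minimal sketch: the choice of 0.1 and 0.9 is arbitrary, and depending on the pandas version the 50% column may still be added alongside the requested percentiles.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Request the 10th and 90th percentiles instead of the default 25/50/75.
result = s.groupby([1, 1, 2, 2]).describe(percentiles=[0.1, 0.9])
print(result)

# The result is a DataFrame with one row per group key.
p10_group1 = result.loc[1, "10%"]
p90_group2 = result.loc[2, "90%"]
print(p10_group1, p90_group2)
```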