pandas.Series.describe
- Series.describe(percentiles=None, include=None, exclude=None)
- Generate descriptive statistics.
- Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
- Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.
- Parameters:
- percentiles : list-like of numbers, optional
- The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
- include : 'all', list-like of dtypes or None (default), optional
- A white list of data types to include in the result. Ignored for Series. Here are the options:
- 'all' : All columns of the input will be included in the output.
- A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the built-in object type (the former alias numpy.object was removed in NumPy 1.24). Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'.
- None (default) : The result will include all numeric columns. 
 
- exclude : list-like of dtypes or None (default), optional
- A black list of data types to omit from the result. Ignored for Series. Here are the options:
- A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the built-in object type (the former alias numpy.object was removed in NumPy 1.24). Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'.
- None (default) : The result will exclude nothing. 
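A minimal sketch of the percentiles parameter, assuming a recent pandas release. The range(11) data and the 10th/90th choice are arbitrary illustrations. Passing custom percentiles replaces the default quartiles, and a value outside [0, 1] is rejected with a ValueError.

```python
import pandas as pd

s = pd.Series(range(11))  # illustrative data: 0, 1, ..., 10

# Ask for the 10th and 90th percentiles instead of the default quartiles.
summary = s.describe(percentiles=[0.1, 0.9])
print(summary)

# Percentiles must lie in [0, 1]; 1.5 is out of range.
try:
    s.describe(percentiles=[1.5])
    rejected = False
except ValueError:
    rejected = True
print("out-of-range percentile rejected:", rejected)
```

The include and exclude parameters only matter for DataFrame input; for a Series, as here, they are ignored.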
 
 
- Returns:
- Series or DataFrame
- Summary statistics of the Series or DataFrame provided.
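A short sketch of the return type, assuming pandas is installed; the column names are illustrative only. Describing a Series yields a Series, while describing a DataFrame yields a DataFrame whose columns are the columns that were analyzed.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"numeric": [1, 2, 3], "object": ["a", "b", "c"]})

series_summary = s.describe()  # Series in, Series out
frame_summary = df.describe()  # DataFrame in, DataFrame out

print(type(series_summary).__name__)
print(type(frame_summary).__name__)
# With the default include=None, only the numeric column is summarized.
print(list(frame_summary.columns))
```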
 
- See also
- DataFrame.count
- Count number of non-NA/null observations. 
- DataFrame.max
- Maximum of the values in the object. 
- DataFrame.min
- Minimum of the values in the object. 
- DataFrame.mean
- Mean of the values. 
- DataFrame.std
- Standard deviation of the observations. 
- DataFrame.select_dtypes
- Subset of a DataFrame including/excluding columns based on their dtype. 
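The rows of the numeric summary line up with the reductions listed above. The sketch below uses the Series counterparts of those DataFrame methods on an illustrative Series with one missing value, and checks that each describe() row equals the standalone result, with NaN excluded throughout.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, np.nan])  # illustrative data with one NaN
summary = s.describe()

# Standalone reductions; all of them skip NaN by default.
checks = {
    "count": s.count(),
    "mean": s.mean(),
    "std": s.std(),  # sample standard deviation (ddof=1)
    "min": s.min(),
    "max": s.max(),
}
for name, value in checks.items():
    print(f"{name}: describe={summary[name]} standalone={value}")
```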
- Notes
- For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
- For object data (e.g. strings), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency.
- For timestamp data, the result’s index will include count, mean, min, the lower, 50 and upper percentiles, and max (std is not reported), as in the timestamp example below.
- If multiple object values have the highest count, then the top result will be arbitrarily chosen from among those with the highest count; freq reports that highest count either way.
- For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the DataFrame consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.
- The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.
- Examples
- Describing a numeric Series.
>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64
- Describing an object (string) Series.
>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object
- Describing a timestamp Series.
>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe()
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object
- Describing a DataFrame. By default only numeric fields are returned.
>>> df = pd.DataFrame({'categorical': pd.Categorical(['d', 'e', 'f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                    })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0
- Describing all columns of a DataFrame regardless of data type.
>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN
- Describing a column from a DataFrame by accessing it as an attribute.
>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64
- Including only numeric columns in a DataFrame description.
>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0
- Including only string columns in a DataFrame description.
>>> df.describe(include=[object])
       object
count       3
unique      3
top         a
freq        1
- Including only categorical columns from a DataFrame description.
>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1
- Excluding numeric columns from a DataFrame description.
>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              f      a
freq             1      1
- Excluding object columns from a DataFrame description.
>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
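Two behaviours from the notes are not exercised by the examples above: a DataFrame with no numeric columns, and a tie for the most common value. The sketch below uses illustrative data and assumes a recent pandas release. Because the tied value reported as top should not be relied on, only freq (the shared highest count) is checked exactly.

```python
import pandas as pd

# No numeric columns: describe() falls back to summarizing the
# object and categorical columns instead of returning an empty result.
df = pd.DataFrame({
    "letters": ["x", "y", "y", "z", "z"],
    "grade": pd.Categorical(["a", "b", "b", "c", "a"]),
})
summary = df.describe()
print(summary)

# "y" and "z" tie for the highest count in "letters"; either may be
# reported as top, but freq is the shared count of 2.
print(summary.loc["top", "letters"], summary.loc["freq", "letters"])
```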