pandas.DataFrame.rank#
- DataFrame.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)[source]#
Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the ranks of those values.
- Parameters:
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking. For Series this parameter is unused and defaults to 0.
- method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
average: average rank of the group
min: lowest rank in the group
max: highest rank in the group
first: ranks assigned in order they appear in the array
dense: like ‘min’, but rank always increases by 1 between groups.
- numeric_onlybool, default False
For DataFrame objects, rank only numeric columns if set to True.
Changed in version 2.0.0: The default value of
numeric_only
is nowFalse
.- na_option{‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
keep: assign NaN rank to NaN values
top: assign lowest rank to NaN values
bottom: assign highest rank to NaN values
- ascendingbool, default True
Whether or not the elements should be ranked in ascending order.
- pctbool, default False
Whether or not to display the returned rankings in percentile form.
- Returns:
- same type as caller
Return a Series or DataFrame with data ranks as values.
See also
core.groupby.DataFrameGroupBy.rank
Rank of values within each group.
core.groupby.SeriesGroupBy.rank
Rank of values within each group.
Examples
>>> df = pd.DataFrame( ... data={ ... "Animal": ["cat", "penguin", "dog", "spider", "snake"], ... "Number_legs": [4, 2, 4, 8, np.nan], ... } ... ) >>> df Animal Number_legs 0 cat 4.0 1 penguin 2.0 2 dog 4.0 3 spider 8.0 4 snake NaN
Ties are assigned the mean of the ranks (by default) for the group.
>>> s = pd.Series(range(5), index=list("abcde")) >>> s["d"] = s["b"] >>> s.rank() a 1.0 b 2.5 c 4.0 d 2.5 e 5.0 dtype: float64
The following example shows how the method behaves with the above parameters:
default_rank: this is the default behaviour obtained without using any parameter.
max_rank: setting
method = 'max'
the records that have the same values are ranked using the highest rank (e.g.: since ‘cat’ and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)NA_bottom: choosing
na_option = 'bottom'
, if there are records with NaN values they are placed at the bottom of the ranking.pct_rank: when setting
pct = True
, the ranking is expressed as percentile rank.
>>> df["default_rank"] = df["Number_legs"].rank() >>> df["max_rank"] = df["Number_legs"].rank(method="max") >>> df["NA_bottom"] = df["Number_legs"].rank(na_option="bottom") >>> df["pct_rank"] = df["Number_legs"].rank(pct=True) >>> df Animal Number_legs default_rank max_rank NA_bottom pct_rank 0 cat 4.0 2.5 3.0 2.5 0.625 1 penguin 2.0 1.0 1.0 1.0 0.250 2 dog 4.0 2.5 3.0 2.5 0.625 3 spider 8.0 4.0 4.0 4.0 1.000 4 snake NaN NaN NaN 5.0 NaN