pandas.DataFrame.rank¶
-
DataFrame.
rank
(self, axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)[source]¶ Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the ranks of those values.
Parameters: - axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking.
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):
- average: average rank of the group
- min: lowest rank in the group
- max: highest rank in the group
- first: ranks assigned in order they appear in the array
- dense: like ‘min’, but rank always increases by 1 between groups
- numeric_only : bool, optional
For DataFrame objects, rank only numeric columns if set to True.
- na_option : {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:
- keep: assign NaN rank to NaN values
- top: assign smallest rank to NaN values if ascending
- bottom: assign highest rank to NaN values if ascending
- ascending : bool, default True
Whether or not the elements should be ranked in ascending order.
- pct : bool, default False
Whether or not to display the returned rankings in percentile form.
Returns: - same type as caller
Return a Series or DataFrame with data ranks as values.
See also
core.groupby.GroupBy.rank
- Rank of values within each group.
Examples
>>> df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog', ... 'spider', 'snake'], ... 'Number_legs': [4, 2, 4, 8, np.nan]}) >>> df Animal Number_legs 0 cat 4.0 1 penguin 2.0 2 dog 4.0 3 spider 8.0 4 snake NaN
The following example shows how the method behaves with the above parameters:
- default_rank: this is the default behaviour obtained without using any parameter.
- max_rank: setting
method = 'max'
the records that have the same values are ranked using the highest rank (e.g.: since ‘cat’ and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.) - NA_bottom: choosing
na_option = 'bottom'
, if there are records with NaN values they are placed at the bottom of the ranking. - pct_rank: when setting
pct = True
, the ranking is expressed as percentile rank.
>>> df['default_rank'] = df['Number_legs'].rank() >>> df['max_rank'] = df['Number_legs'].rank(method='max') >>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom') >>> df['pct_rank'] = df['Number_legs'].rank(pct=True) >>> df Animal Number_legs default_rank max_rank NA_bottom pct_rank 0 cat 4.0 2.5 3.0 2.5 0.625 1 penguin 2.0 1.0 1.0 1.0 0.250 2 dog 4.0 2.5 3.0 2.5 0.625 3 spider 8.0 4.0 4.0 4.0 1.000 4 snake NaN NaN NaN 5.0 NaN