pandas.DataFrame.sort_values

DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

Sort by the values along either axis.

Parameters:
by : str or list of str

Name or list of names to sort by.

  • if axis is 0 or ‘index’ then by may contain index levels and/or column labels.

  • if axis is 1 or ‘columns’ then by may contain column levels and/or index labels.
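As a minimal sketch of the axis=0 case, the example below mixes an index level name and a column label in by. The frame, the level name "group", and the column "score" are invented for illustration.

```python
import pandas as pd

# Hypothetical data: a named index level ("group") and one column ("score").
df = pd.DataFrame(
    {"score": [2, 1, 2, 1]},
    index=pd.Index(["b", "a", "a", "b"], name="group"),
)

# With axis=0 (the default), `by` may mix index level names and column labels.
# Rows are ordered by the "group" level first, then by the "score" column.
out = df.sort_values(by=["group", "score"])
print(out)
```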

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Axis to be sorted.
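To illustrate axis=1, here is a small sketch in which by names an index label (a row) and the columns are reordered by the values in that row. The data and labels are made up for the example.

```python
import pandas as pd

# Hypothetical data with row labels "x" and "y".
df = pd.DataFrame(
    {"a": [3, 10], "b": [1, 30], "c": [2, 20]},
    index=["x", "y"],
)

# With axis=1, `by` refers to a row label; columns are reordered so that
# the values in row "x" are ascending (b=1, c=2, a=3).
by_row_x = df.sort_values(by="x", axis=1)
print(by_row_x)
```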

ascending : bool or list of bool, default True

Sort ascending vs. descending. Specify a list for multiple sort orders. If this is a list of bools, it must match the length of by.
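A short sketch of passing a list to ascending, one entry per name in by. The column names "team" and "score" are placeholders chosen for the example.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"team": ["x", "y", "x", "y"], "score": [1, 4, 3, 2]}
)

# One bool per entry in `by`: "team" ascending, "score" descending within each team.
out = df.sort_values(by=["team", "score"], ascending=[True, False])
print(out)
```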

inplace : bool, default False

If True, perform operation in-place.

kind : {‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, default ‘quicksort’

Choice of sorting algorithm. See also numpy.sort() for more information. mergesort and stable are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.
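The sketch below shows what stability means in practice when sorting on a single column: with kind="stable", rows that tie on the sort column keep their original relative order. The data is invented for the example.

```python
import pandas as pd

# Hypothetical data: "key" contains ties, "label" records the original order.
df = pd.DataFrame(
    {"key": [2, 1, 2, 1], "label": ["a", "b", "c", "d"]}
)

# A stable sort keeps tied rows in their original relative order:
# "b" stays before "d" (key == 1) and "a" stays before "c" (key == 2).
out = df.sort_values(by="key", kind="stable")
print(out)
```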

na_position : {‘first’, ‘last’}, default ‘last’

‘first’ puts NaNs at the beginning; ‘last’ puts NaNs at the end.

ignore_index : bool, default False

If True, the resulting axis will be labeled 0, 1, …, n - 1.
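For comparison, this sketch sorts the same made-up frame with and without ignore_index. By default the original labels travel with their rows; with ignore_index=True the result is relabeled from 0.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"n": [3, 1, 2]})

# Default: the original index labels move with their rows.
kept = df.sort_values(by="n")

# ignore_index=True: the result gets a fresh 0..n-1 index.
relabeled = df.sort_values(by="n", ignore_index=True)

print(list(kept.index))
print(list(relabeled.index))
```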

key : callable, optional

Apply the key function to the values before sorting. This is similar to the key argument in the builtin sorted() function, with the notable difference that this key function should be vectorized. It should expect a Series and return a Series with the same shape as the input. It will be applied to each column in by independently.
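A minimal sketch of a vectorized key applied to more than one column: the function receives each column in by as a Series, one at a time, and returns a Series of the same length. The column names and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data with mixed-case strings in both sort columns.
df = pd.DataFrame(
    {"first": ["bob", "Alice", "alice"], "last": ["Z", "b", "A"]}
)

# The key is called once per column in `by`; each call lowercases that
# column, so both sort levels are compared case-insensitively.
out = df.sort_values(by=["first", "last"], key=lambda s: s.str.lower())
print(out)
```

The key only affects the comparison; the returned frame still holds the original, unmodified values.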

Returns:
DataFrame or None

DataFrame with sorted values or None if inplace=True.
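The return value described above can be checked with a small sketch: with inplace=True the method returns None and reorders the calling frame itself. The frame is invented for the example.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"n": [3, 1, 2]})

# inplace=True sorts `df` itself; the call returns None,
# so its result should not be assigned back to `df`.
result = df.sort_values(by="n", inplace=True)

print(result)
print(list(df["n"]))
```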

See also

DataFrame.sort_index

Sort a DataFrame by the index.

Series.sort_values

Similar method for a Series.

Examples

>>> df = pd.DataFrame(
...     {
...         "col1": ["A", "A", "B", np.nan, "D", "C"],
...         "col2": [2, 1, 9, 8, 7, 4],
...         "col3": [0, 1, 9, 4, 2, 3],
...         "col4": ["a", "B", "c", "D", "e", "F"],
...     }
... )
>>> df
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F

Sort by a single column

In this case, we are sorting the rows according to values in col1:

>>> df.sort_values(by=["col1"])
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
5    C     4     3    F
4    D     7     2    e
3  NaN     8     4    D

Sort by multiple columns

You can also provide multiple columns to the by argument, as shown below. In this example, the rows are first sorted according to col1, and then the rows that have an identical value in col1 are sorted according to col2.

>>> df.sort_values(by=["col1", "col2"])
  col1  col2  col3 col4
1    A     1     1    B
0    A     2     0    a
2    B     9     9    c
5    C     4     3    F
4    D     7     2    e
3  NaN     8     4    D

Sort in descending order

The sort order can be reversed using the ascending argument, as shown below:

>>> df.sort_values(by="col1", ascending=False)
  col1  col2  col3 col4
4    D     7     2    e
5    C     4     3    F
2    B     9     9    c
0    A     2     0    a
1    A     1     1    B
3  NaN     8     4    D

Placing any NA first

Note that in the above example, the rows that contain an NA value in col1 are placed at the end of the DataFrame. This behavior can be modified via the na_position argument, as shown below:

>>> df.sort_values(by="col1", ascending=False, na_position="first")
  col1  col2  col3 col4
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F
2    B     9     9    c
0    A     2     0    a
1    A     1     1    B

Customized sort order

The key argument allows further customization of the sorting behaviour. For example, you may want to ignore letter case when sorting strings:

>>> df.sort_values(by="col4", key=lambda col: col.str.lower())
  col1  col2  col3 col4
0    A     2     0    a
1    A     1     1    B
2    B     9     9    c
3  NaN     8     4    D
4    D     7     2    e
5    C     4     3    F

Another typical example is natural sorting. This can be done using the natsort package, which returns the index positions that would put the values in natural order, as shown below:

>>> df = pd.DataFrame(
...     {
...         "time": ["0hr", "128hr", "72hr", "48hr", "96hr"],
...         "value": [10, 20, 30, 40, 50],
...     }
... )
>>> df
    time  value
0    0hr     10
1  128hr     20
2   72hr     30
3   48hr     40
4   96hr     50
>>> from natsort import index_natsorted
>>> index_natsorted(df["time"])
[0, 3, 2, 4, 1]
>>> df.sort_values(
...     by="time",
...     key=lambda x: np.argsort(index_natsorted(x)),
... )
    time  value
0    0hr     10
3   48hr     40
2   72hr     30
4   96hr     50
1  128hr     20