pandas.core.groupby.GroupBy.apply
GroupBy.apply(func, *args, **kwargs)

Apply function func group-wise and combine the results together.

The function passed to apply must take a dataframe as its first argument and return a dataframe, a series or a scalar. apply will then take care of combining the results back together into a single dataframe or series. apply is therefore a highly flexible grouping method.

The downside of this flexibility is that apply can be quite a bit slower than more specific methods. Pandas offers a wide range of methods that will be much faster than apply for their specific purposes, so try to use them before reaching for apply.

Parameters:
func : function
    A callable that takes a dataframe as its first argument and returns a dataframe, a series or a scalar. In addition, the callable may take positional and keyword arguments.
args, kwargs : tuple and dict
    Optional positional and keyword arguments to pass to func.

Returns:
applied : Series or DataFrame
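
As a minimal sketch of how the extra positional and keyword arguments reach func, the snippet below forwards one of each through apply. The helper name scaled_range and its parameters factor and offset are hypothetical and exist only for illustration. The value columns are selected explicitly so that the grouping column is not passed to func.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

def scaled_range(group, factor, offset=0):
    # Hypothetical helper: `group` is the sub-DataFrame for one key of 'A';
    # `factor` and `offset` arrive via apply's *args and **kwargs.
    return (group['C'].max() - group['B'].min()) * factor + offset

# 2 is forwarded positionally as `factor`, offset=1 as a keyword argument.
result = df.groupby('A')[['B', 'C']].apply(scaled_range, 2, offset=1)
print(result)
```

Because func returns a scalar here, the combined result is a series indexed by the group keys.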
See also
pipe : Apply function to the full GroupBy object instead of to each group.
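
The distinction between apply and pipe can be sketched as follows, assuming a small frame like the one used in the Examples section. With apply, the function is called once per group with that group's rows. With pipe, it is called once with the GroupBy object itself. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# apply: the lambda receives each group's sub-DataFrame in turn.
per_group = df.groupby('A')[['B', 'C']].apply(
    lambda x: x['C'].max() - x['B'].min()
)

# pipe: the lambda receives the GroupBy object once and uses
# group-wise aggregations on it directly.
whole = df.groupby('A').pipe(lambda gb: gb['C'].max() - gb['B'].min())

print(per_group)
print(whole)
```

Both calls produce the same per-group values in this case; they differ in what the function is handed.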
Notes
In the current implementation, apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side effects, as they will take effect twice for the first group.

Examples
>>> import pandas as pd
>>> df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})
>>> g = df.groupby('A')
From df above we can see that g has two groups, a and b. Calling apply in various ways, we can get different grouping results:

Example 1: Below, the function passed to apply takes a dataframe as its argument and returns a dataframe. apply combines the result for each group together into a new dataframe:

>>> g.apply(lambda x: x / x.sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0

Example 2: The function passed to apply takes a dataframe as its argument and returns a series. apply combines the result for each group together into a new dataframe:

>>> g.apply(lambda x: x.max() - x.min())
   B  C
A
a  1  2
b  0  0
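
The per-group range from Example 2 can also be written with built-in aggregations, which is the kind of more specific method recommended over apply where one exists. This is a sketch rather than a benchmark; the explicit column selection is an assumption made so that only the numeric columns B and C are aggregated.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

# Select the value columns explicitly so the grouping column 'A'
# is not part of what gets aggregated.
grouped = df.groupby('A')[['B', 'C']]

# Built-in aggregations instead of calling a Python function per group.
spread = grouped.max() - grouped.min()
print(spread)
```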

Example 3: The function passed to apply takes a dataframe as its argument and returns a scalar. apply combines the result for each group together into a series, including setting the index as appropriate:

>>> g.apply(lambda x: x.C.max() - x.B.min())
A
a    5
b    2
dtype: int64
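
The side-effect caveat from the Notes can be observed with a function that records each call. The names seen and logged_sum are hypothetical. No output is shown because the number of calls for the first group depends on the pandas version in use.

```python
import pandas as pd

df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1, 2, 3], 'C': [4, 6, 5]})

seen = []  # hypothetical log of the row labels each call received

def logged_sum(group):
    # Side effect: record which rows this call was given.
    seen.append(list(group.index))
    return group['B'].sum()

totals = df.groupby('A')[['B', 'C']].apply(logged_sum)

# Every group shows up at least once; whether the first group ([0, 1])
# shows up twice depends on the pandas version in use.
print(seen)
print(totals)
```

If func must have side effects, keeping them idempotent avoids surprises from a repeated call.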