pandas.core.groupby.DataFrameGroupBy.apply
- DataFrameGroupBy.apply(func, *args, include_groups=True, **kwargs)
Apply function func group-wise and combine the results together.
The function passed to apply must take a DataFrame as its first argument and return a DataFrame, Series or scalar. apply will then take care of combining the results back together into a single DataFrame or Series. apply is therefore a highly flexible grouping method.
While apply is a very flexible method, its downside is that using it can be quite a bit slower than using more specific methods like agg or transform. pandas offers a wide range of methods that will be much faster than using apply for their specific purposes, so try to use them before reaching for apply.
- Parameters:
- func : callable
A callable that takes a dataframe as its first argument, and returns a dataframe, a series or a scalar. In addition the callable may take positional and keyword arguments.
- *args : tuple
Optional positional arguments to pass to func.
- include_groups : bool, default True
When True, will attempt to apply func to the groupings in the case that they are columns of the DataFrame. If this raises a TypeError, the result will be computed with the groupings excluded. When False, the groupings will be excluded when applying func.
Added in version 2.2.0.
Deprecated since version 2.2.0: Setting include_groups to True is deprecated. Only the value False will be allowed in a future version of pandas.
- **kwargs : dict
Optional keyword arguments to pass to func.
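The forwarding of *args and **kwargs described above can be sketched as follows. The helper scaled_range and its factor and offset parameters are hypothetical names introduced only for illustration. The value columns are selected explicitly so that the example does not depend on the include_groups setting.

```python
import pandas as pd

df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})


def scaled_range(group, factor, offset=0):
    """Hypothetical helper: per-column spread, scaled and then shifted."""
    return (group.max() - group.min()) * factor + offset


# Selecting ["B", "C"] keeps the grouping column "A" out of `group`.
# The positional 2 is forwarded via *args, offset=1 via **kwargs.
result = df.groupby("A")[["B", "C"]].apply(scaled_range, 2, offset=1)
print(result)
```

Because scaled_range returns a Series for each group, the combined result should be a DataFrame with one row per group.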
- Returns:
- Series or DataFrame
A pandas object with the result of applying func to each group.
See also
pipe
Apply function to the full GroupBy object instead of to each group.
aggregate
Apply aggregate function to the GroupBy object.
transform
Apply function column-by-column to the GroupBy object.
Series.apply
Apply a function to a Series.
DataFrame.apply
Apply a function to each row or column of a DataFrame.
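As a rough illustration of the advice to prefer more specific methods, the sketch below computes the same results once with apply and once with agg or transform. The variable names are illustrative only, and no timing claims are made here; the speed difference depends on the data and the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})
grouped = df.groupby("A")[["B", "C"]]

# Reduction (one row per group): apply with a lambda vs. the built-in "sum".
via_apply = grouped.apply(lambda x: x.sum())
via_agg = grouped.agg("sum")

# Like-indexed result (one row per input row): apply vs. transform.
share_apply = df.groupby("A", group_keys=False)[["B", "C"]].apply(
    lambda x: x / x.sum()
)
share_transform = df[["B", "C"]] / grouped.transform("sum")

print(via_agg)
print(share_transform)
```

In both pairs the specific method expresses the intent directly and avoids calling a Python function once per group.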
Notes
Changed in version 1.3.0: The resulting dtype will reflect the return value of the passed func, see the examples below.
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
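One way to stay clear of the mutation caveat above is to have the function return a new object instead of modifying its argument. The following is a minimal sketch under that assumption; add_share and the share column are hypothetical names used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})


def add_share(group):
    # Avoid in-place changes such as group["share"] = ... inside the function.
    # DataFrame.assign returns a new DataFrame and leaves `group` unchanged.
    return group.assign(share=group["B"] / group["B"].sum())


result = df.groupby("A", group_keys=False)[["B", "C"]].apply(add_share)
print(result)
```

The original df keeps its three columns, and the derived column appears only in the returned result.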
Examples
>>> df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})
>>> g1 = df.groupby("A", group_keys=False)
>>> g2 = df.groupby("A", group_keys=True)
Notice that g1 and g2 have two groups, a and b, and only differ in their group_keys argument. Calling apply in various ways, we can get different grouping results:
Example 1: below the function passed to apply takes a DataFrame as its argument and returns a DataFrame. apply combines the result for each group together into a new DataFrame:
>>> g1[["B", "C"]].apply(lambda x: x / x.sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0
In the above, the groups are not part of the index. We can have them included by using g2 where group_keys=True:
>>> g2[["B", "C"]].apply(lambda x: x / x.sum())
            B    C
A
a 0  0.333333  0.4
  1  0.666667  0.6
b 2  1.000000  1.0
Example 2: The function passed to apply takes a DataFrame as its argument and returns a Series. apply combines the result for each group together into a new DataFrame.
Changed in version 1.3.0: The resulting dtype will reflect the return value of the passed func.
>>> g1[["B", "C"]].apply(lambda x: x.astype(float).max() - x.min())
     B    C
A
a  1.0  2.0
b  0.0  0.0
>>> g2[["B", "C"]].apply(lambda x: x.astype(float).max() - x.min())
     B    C
A
a  1.0  2.0
b  0.0  0.0
The group_keys argument has no effect here because the result is not like-indexed (i.e. a transform) when compared to the input.
Example 3: The function passed to apply takes a DataFrame as its argument and returns a scalar. apply combines the result for each group together into a Series, including setting the index as appropriate:
>>> g1.apply(lambda x: x.C.max() - x.B.min(), include_groups=False)
A
a    5
b    2
dtype: int64
Example 4: The function passed to apply returns None for one of the groups. This group is filtered from the result:
>>> g1.apply(lambda x: None if x.iloc[0, 0] == 3 else x, include_groups=False)
   B  C
0  1  4
1  2  6