pandas.DataFrame.apply#
- DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine=None, engine_kwargs=None, **kwargs)[source]#
Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is either the DataFrame’s index (
axis=0
) or the DataFrame’s columns (axis=1
). By default (result_type=None
), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument. The return type of the applied function is inferred based on the first computed result obtained after applying the function to a Series object.- Parameters:
- funcfunction
Function to apply to each column or row.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the function is applied:
0 or ‘index’: apply function to each column.
1 or ‘columns’: apply function to each row.
- rawbool, default False
Determines if row or column is passed as a Series or ndarray object:
False
: passes each row or column as a Series to the function.True
: the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
- result_type{‘expand’, ‘reduce’, ‘broadcast’, None}, default None
These only act when
axis=1
(columns):‘expand’ : list-like results will be turned into columns.
‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.
The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.
- argstuple
Positional arguments to pass to func in addition to the array/series.
- by_rowFalse or “compat”, default “compat”
Only has an effect when
func
is a listlike or dictlike of funcs and the func isn’t a string. If “compat”, will if possible first translate the func into pandas methods (e.g.Series().apply(np.sum)
will be translated toSeries().sum()
). If that doesn’t work, will try call to apply again withby_row=True
and if that fails, will call apply again withby_row=False
(backward compatible). If False, the funcs will be passed the whole Series at once.Added in version 2.1.0.
- enginedecorator or {‘python’, ‘numba’}, optional
Choose the execution engine to use. If not provided the function will be executed by the regular Python interpreter.
Other options include JIT compilers such Numba and Bodo, which in some cases can speed up the execution. To use an executor you can provide the decorators
numba.jit
,numba.njit
orbodo.jit
. You can also provide the decorator with parameters, likenumba.jit(nogit=True)
.Not all functions can be executed with all execution engines. In general, JIT compilers will require type stability in the function (no variable should change data type during the execution). And not all pandas and NumPy APIs are supported. Check the engine documentation [1] and [2] for limitations.
Warning
String parameters will stop being supported in a future pandas version.
Added in version 2.2.0.
- engine_kwargsdict
Pass keyword arguments to the engine. This is currently only used by the numba engine, see the documentation for the engine argument for more information.
- **kwargs
Additional keyword arguments to pass as keywords arguments to func.
- Returns:
- Series or DataFrame
Result of applying
func
along the given axis of the DataFrame.
See also
DataFrame.map
For elementwise operations.
DataFrame.aggregate
Only perform aggregating type operations.
DataFrame.transform
Only perform transforming type operations.
Notes
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
References
[2]Bodo documentation <https://docs.bodo.ai/latest/>/
Examples
>>> df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"]) >>> df A B 0 4 9 1 4 9 2 4 9
Using a numpy universal function (in this case the same as
np.sqrt(df)
):>>> df.apply(np.sqrt) A B 0 2.0 3.0 1 2.0 3.0 2 2.0 3.0
Using a reducing function on either axis
>>> df.apply(np.sum, axis=0) A 12 B 27 dtype: int64
>>> df.apply(np.sum, axis=1) 0 13 1 13 2 13 dtype: int64
Returning a list-like will result in a Series
>>> df.apply(lambda x: [1, 2], axis=1) 0 [1, 2] 1 [1, 2] 2 [1, 2] dtype: object
Passing
result_type='expand'
will expand list-like results to columns of a Dataframe>>> df.apply(lambda x: [1, 2], axis=1, result_type="expand") 0 1 0 1 2 1 1 2 2 1 2
Returning a Series inside the function is similar to passing
result_type='expand'
. The resulting column names will be the Series index.>>> df.apply(lambda x: pd.Series([1, 2], index=["foo", "bar"]), axis=1) foo bar 0 1 2 1 1 2 2 1 2
Passing
result_type='broadcast'
will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.>>> df.apply(lambda x: [1, 2], axis=1, result_type="broadcast") A B 0 1 2 1 1 2 2 1 2
Advanced users can speed up their code by using a Just-in-time (JIT) compiler with
apply
. The main JIT compilers available for pandas are Numba and Bodo. In general, JIT compilation is only possible when the function passed toapply
has type stability (variables in the function do not change their type during the execution).>>> import bodo >>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit)
Note that JIT compilation is only recommended for functions that take a significant amount of time to run. Fast functions are unlikely to run faster with JIT compilation.