pandas.DataFrame.assign

DataFrame.assign(**kwargs)

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

Parameters:

**kwargs (callable, Series, scalar, array-like, or dict): The column names are the keywords. If a value is callable, it is computed on the DataFrame and the result is assigned to the new column. The callable must not change the input DataFrame (though pandas does not check this). If a value is not callable (e.g. a Series, scalar, array, or dict), it is simply assigned. See the Notes section for details on alignment and broadcasting.

Returns:

DataFrame: A new DataFrame with the new columns in addition to all the existing columns.

See also

DataFrame.loc: Select a subset of a DataFrame by labels.
DataFrame.iloc: Select a subset of a DataFrame by positions.

Notes

Multiple columns can be assigned within a single assign call. Items in **kwargs are computed and assigned in order, so later items may refer to columns that were created or modified by earlier items in the same call.

Non-callable values (Series, arrays, scalars) follow the same alignment and broadcasting rules as DataFrame.__setitem__(). See that method’s documentation for details.
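The alignment and broadcasting rules for non-callable values can be sketched as follows. This is a minimal illustration that reuses the temperature frame from the Examples section; the humidity, unit, and rank columns and their values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

# Hypothetical data: the labels are in a different order than df's index,
# "Berkeley" is missing, and "Seattle" does not exist in df.
partial = pd.Series({"Seattle": 50.0, "Portland": 70.0})

result = df.assign(
    humidity=partial,   # Series: aligned on index labels, not on position
    unit="celsius",     # scalar: broadcast to every row
    rank=[2, 1],        # list: assigned by position, length must match
)
print(result)
```

In this sketch the Series value for "Portland" lands on the "Portland" row, "Berkeley" has no match and becomes NaN, and the unmatched "Seattle" entry is dropped rather than added as a new row.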

Examples

>>> df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0

When the value is a callable, it is evaluated on df:

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

Alternatively, the same result can be achieved by directly referencing an existing Series or sequence:

>>> df.assign(temp_f=df["temp_c"] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

or by using pandas.col():

>>> df.assign(temp_f=pd.col("temp_c") * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
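The forms above give the same result here because they all operate on df itself. In a method chain they can differ: a callable receives the intermediate frame, while a direct reference is computed from the original df and then aligned by index label. The sketch below is one way to see this; the reset_index and rename steps and the celsius and fahrenheit names are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

# Callable: evaluated on the intermediate frame, which has a renamed column
# and a fresh 0, 1 index.
by_callable = (
    df.reset_index(drop=True)
    .rename(columns={"temp_c": "celsius"})
    .assign(fahrenheit=lambda x: x["celsius"] * 9 / 5 + 32)
)

# Direct reference: computed from the original df, so the Series still carries
# the city labels. None of them match the 0, 1 index, so every row is NaN.
by_reference = df.reset_index(drop=True).assign(
    fahrenheit=df["temp_c"] * 9 / 5 + 32
)

print(by_callable)
print(by_reference)
```

For this reason a callable is usually the safer choice when assign follows a step that filters, renames, or reindexes the frame.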

You can create multiple columns in a single assign call, where one column depends on another defined earlier in the same call:

>>> df.assign(
...     temp_f=lambda x: x["temp_c"] * 9 / 5 + 32,
...     temp_k=lambda x: (x["temp_f"] + 459.67) * 5 / 9,
... )
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
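Because items are evaluated in keyword order, a later callable also sees a column that an earlier item has overwritten, and reversing the order of dependent items fails. The following sketch illustrates both cases; the 10-degree shift is an arbitrary value picked for the example.

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

shifted = df.assign(
    temp_c=lambda x: x["temp_c"] + 10,          # overwrites temp_c first
    temp_f=lambda x: x["temp_c"] * 9 / 5 + 32,  # sees the overwritten temp_c
)
print(shifted)

# With the order reversed, temp_k is computed before temp_f exists,
# so the column lookup raises a KeyError.
order_error = None
try:
    df.assign(
        temp_k=lambda x: (x["temp_f"] + 459.67) * 5 / 9,
        temp_f=lambda x: x["temp_c"] * 9 / 5 + 32,
    )
except KeyError as exc:
    order_error = exc
print(type(order_error).__name__)
```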

A dict value is aligned to the DataFrame’s index by its keys. Index labels not present in the dict are filled with NaN, and dict keys not present in the index ("Seattle" below) are dropped:

>>> df.assign(temp_k={"Portland": 290.15, "Berkeley": 298.15, "Seattle": 285.0})
          temp_c  temp_k
Portland    17.0  290.15
Berkeley    25.0  298.15
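As stated in the summary, assign returns a new object and overwrites existing columns that are re-assigned. A short check of both points, using replacement values made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

# Re-assigning an existing column name replaces its values in the result.
updated = df.assign(temp_c=[18.0, 26.0])

print(updated)  # temp_c holds the new values
print(df)       # the original frame still holds 17.0 and 25.0
```

The overwritten column keeps its position, so no duplicate temp_c column is appended to the result.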