pandas.DataFrame.copy#

DataFrame.copy(deep=True)[source]#

Make a copy of this object’s indices and data.

When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).

When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

Note

The deep=False behaviour as described above will change in pandas 3.0. Copy-on-Write will be enabled by default, which means that the “shallow” copy is that is returned with deep=False will still avoid making an eager copy, but changes to the data of the original will no longer be reflected in the shallow copy (or vice versa). Instead, it makes use of a lazy (deferred) copy mechanism that will copy the data only when any changes to the original or shallow copy is made.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Parameters:
deepbool, default True

Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.

Returns:
Series or DataFrame

Object type matches caller.

Notes

When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).

While Index objects are copied when deep=True, the underlying numpy array is not copied for performance reasons. Since Index is immutable, the underlying data can be safely shared and a copy is not needed.

Since pandas is not thread safe, see the gotchas when copying in a threading environment.

Copy-on-Write protects shallow copies against accidental modifications. This means that any changes to the copied data would make a new copy of the data upon write (and vice versa). Changes made to either the original or copied variable would not be reflected in the counterpart. See Copy_on_Write for more information.

Examples

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64
>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int64

Shallow copy versus default (deep) copy:

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)

Shallow copy shares index with original, the data is a view of the original.

>>> s is shallow
False
>>> s.values is shallow.values
False
>>> s.index is shallow.index
False

Deep copy has own copy of data and index.

>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False

The shallow copy is protected against updating the original object as well. Thus, updates will only reflect in one of both objects.

>>> s.iloc[0] = 3
>>> shallow.iloc[1] = 4
>>> s
a    3
b    2
dtype: int64
>>> shallow
a    1
b    4
dtype: int64
>>> deep
a    1
b    2
dtype: int64

Note that when copying an object containing Python objects, a deep copy will copy the data, but will not do so recursively. Updating a nested data object will be reflected in the deep copy.

>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0    [10, 2]
1     [3, 4]
dtype: object
>>> deep
0    [10, 2]
1     [3, 4]
dtype: object