pandas.util.hash_pandas_object

pandas.util.hash_pandas_object(obj, index=True, encoding='utf8', hash_key='0123456789123456', categorize=True)

Return a data hash of the Index/Series/DataFrame.

The hash is computed from the underlying data values: one hash per element for an Index or Series, and one hash per row for a DataFrame. When hashing a Series or DataFrame, the index is optionally included.

Parameters:
obj : Index, Series, or DataFrame

The pandas object to hash.

index : bool, default True

Include the index in the hash (if Series/DataFrame). When True, each row's hash depends on both its values and its index label, so the same value under a different index label produces a different hash.

encoding : str, default 'utf8'

Encoding applied to string data and to hash_key before hashing.

hash_key : str, default _default_hash_key

Key used when hashing string values; it is encoded with encoding. The default, _default_hash_key, is '0123456789123456'.

categorize : bool, default True

Whether to categorize object arrays before hashing. This is more efficient when the array contains duplicate values.
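
The effect of hash_key and categorize can be checked with a short sketch. The sample Series and the alternative key "abcdefghijklmnop" are illustrative assumptions, not part of the original documentation; the alternative key is deliberately 16 characters long, like the default.

```python
import pandas as pd

# Object-dtype strings with repeated values: the case `categorize` targets.
s = pd.Series(["x", "y", "x", "z", "y"], dtype=object)

default = pd.util.hash_pandas_object(s, index=False)
no_categorize = pd.util.hash_pandas_object(s, index=False, categorize=False)

# Hypothetical alternative key, same length as the default key.
other_key = pd.util.hash_pandas_object(
    s, index=False, hash_key="abcdefghijklmnop"
)

# `categorize` is expected to change only how the work is done, not the hashes.
same_with_and_without_categorize = default.equals(no_categorize)

# A different key is expected to change the hashes of string values.
key_changes_hashes = not default.equals(other_key)

print(same_with_and_without_categorize, key_changes_hashes)
```

With index=False, repeated values such as the two "x" entries also receive identical hashes.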

Returns:
Series of uint64

Same length as the object.

See also

util.hash_array

Return a hash of the given array.

util.hash_tuples

Hash a MultiIndex or listlike-of-tuples efficiently.

Examples

>>> pd.util.hash_pandas_object(pd.Series([1, 2, 3]))
0    14639053686158035780
1     3869563279212530728
2      393322362522515241
dtype: uint64
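
For a DataFrame, the result holds one hash per row rather than one per cell. The following is a minimal sketch of the shape of the return value; the two-column frame is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
row_hashes = pd.util.hash_pandas_object(df)

n_hashes = len(row_hashes)                       # one hash per row
hash_dtype = row_hashes.dtype                    # unsigned 64-bit integers
keeps_index = row_hashes.index.equals(df.index)  # result is aligned to df's index

print(n_hashes, hash_dtype, keeps_index)
```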

By default, the hash includes the index, so the same value at a different index position will produce a different hash:

>>> df1 = pd.DataFrame({"a": ["a", "b", "c"]})
>>> df2 = pd.DataFrame({"a": ["b", "a", "c"]})
>>> pd.util.hash_pandas_object(df1)
0     4578374827886788867
1    17338122309987883691
2     5473791562133574857
dtype: uint64
>>> pd.util.hash_pandas_object(df2)
0     8168238220198793318
1    14044658390916132862
2     5473791562133574857
dtype: uint64

Set index=False to hash only the values. In this case, the same value always produces the same hash regardless of its position:

>>> pd.util.hash_pandas_object(df1, index=False)
0     5694802365760992243
1     2797248057711234736
2    18202460376300699891
dtype: uint64
>>> pd.util.hash_pandas_object(df2, index=False)
0     2797248057711234736
1     5694802365760992243
2    18202460376300699891
dtype: uint64
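
Because index=False makes each hash depend only on the row's values, the hashes can be used to check whether two frames contain the same rows in a different order. This is a sketch of one possible use, not a documented pandas recipe; it reuses df1 and df2 from above.

```python
import pandas as pd

df1 = pd.DataFrame({"a": ["a", "b", "c"]})
df2 = pd.DataFrame({"a": ["b", "a", "c"]})

h1 = pd.util.hash_pandas_object(df1, index=False)
h2 = pd.util.hash_pandas_object(df2, index=False)

# The rows are ordered differently, so the hashes differ position by position.
same_order = h1.equals(h2)

# Sorting the hashes removes the ordering, so the same set of rows compares equal.
same_rows = sorted(h1.tolist()) == sorted(h2.tolist())

print(same_order, same_rows)
```

Matching hashes indicate equal rows only up to the possibility of hash collisions, so an exact comparison is still needed where correctness is critical.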