pandas.util.hash_pandas_object
- pandas.util.hash_pandas_object(obj, index=True, encoding='utf8', hash_key='0123456789123456', categorize=True)
Return a data hash of the Index/Series/DataFrame.
One hash is computed per row from the underlying data values (for a DataFrame, the values of all columns in a row are combined into a single hash). The index is optionally included when hashing a Series or DataFrame.
- Parameters:
- obj : Index, Series, or DataFrame
The pandas object to hash.
- index : bool, default True
Include the index in the hash (if Series/DataFrame). When True, the hash for each row depends on both the value and the index label, so the same value at a different index position will produce a different hash.
- encoding : str, default 'utf8'
Encoding used for the data and the hash key when they are strings.
- hash_key : str, default _default_hash_key
Key used when hashing string data. The default is '0123456789123456'. The key must be exactly 16 bytes long once encoded.
- categorize : bool, default True
Whether to first categorize object arrays before hashing. This is more efficient when the array contains duplicate values.
- Returns:
- Series of uint64
Same length as the object.
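Because the result holds one uint64 value per row, it can be reduced to a single fingerprint for the whole object. The sketch below shows one possible way to do that and is not part of pandas: `frame_digest` is a hypothetical helper name, and feeding the raw hash bytes to SHA-256 is an assumption made purely for illustration.

```python
import hashlib

import pandas as pd


def frame_digest(obj, index=True):
    """Hypothetical helper: collapse the per-row hashes into one hex digest."""
    row_hashes = pd.util.hash_pandas_object(obj, index=index)
    # row_hashes is a uint64 Series with one entry per row of obj.
    return hashlib.sha256(row_hashes.to_numpy().tobytes()).hexdigest()


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

row_hashes = pd.util.hash_pandas_object(df)
digest_original = frame_digest(df)
digest_copy = frame_digest(df.copy())
# Reversing the rows reorders the per-row hashes, so the digest changes.
digest_reordered = frame_digest(df.iloc[::-1])
```

Note that a digest built this way is sensitive to row order. Sorting the object first would be one way to get an order-independent fingerprint, if that is what the use case needs.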
See also
util.hash_array : Return a hash of the given array.
util.hash_tuples : Hash a MultiIndex or listlike-of-tuples efficiently.
Examples
>>> pd.util.hash_pandas_object(pd.Series([1, 2, 3]))
0    14639053686158035780
1     3869563279212530728
2      393322362522515241
dtype: uint64
By default, the hash includes the index, so the same value at a different index position will produce a different hash:
>>> df1 = pd.DataFrame({"a": ["a", "b", "c"]})
>>> df2 = pd.DataFrame({"a": ["b", "a", "c"]})
>>> pd.util.hash_pandas_object(df1)
0     4578374827886788867
1    17338122309987883691
2     5473791562133574857
dtype: uint64
>>> pd.util.hash_pandas_object(df2)
0     8168238220198793318
1    14044658390916132862
2     5473791562133574857
dtype: uint64
Set index=False to hash only the values. In this case, the same value always produces the same hash regardless of its position:
>>> pd.util.hash_pandas_object(df1, index=False)
0     5694802365760992243
1     2797248057711234736
2    18202460376300699891
dtype: uint64
>>> pd.util.hash_pandas_object(df2, index=False)
0     2797248057711234736
1     5694802365760992243
2    18202460376300699891
dtype: uint64
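The hash_key and categorize parameters can be explored in the same way. The following is a minimal sketch, assuming object-dtype string data. The alternative key "abcdefghijklmnop" is an arbitrary 16-character value chosen for illustration, and the variable names are not part of the pandas API.

```python
import pandas as pd

# Object-dtype strings, with a repeated value at positions 0 and 2.
s = pd.Series(["a", "b", "a"], dtype=object)

default_hashes = pd.util.hash_pandas_object(s, index=False)

# Assumption: an arbitrary alternative key, 16 bytes long when encoded as utf8.
rekeyed_hashes = pd.util.hash_pandas_object(
    s, index=False, hash_key="abcdefghijklmnop"
)

# Skip the categorization step for comparison.
uncategorized = pd.util.hash_pandas_object(s, index=False, categorize=False)

# Equal strings hash equally under the same key.
same_value_same_hash = bool(default_hashes.iloc[0] == default_hashes.iloc[2])
# A different key is expected to produce different hashes for string data.
key_changes_hash = not default_hashes.equals(rekeyed_hashes)
# categorize is a performance option; the hashes themselves should match.
categorize_is_transparent = default_hashes.equals(uncategorized)
```

In this sketch, hashes are only comparable across runs or machines when the same hash_key and encoding are used, which is the reason to leave both at their defaults unless there is a specific need to change them.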