pandas.DataFrame.to_hdf

DataFrame.to_hdf(path_or_buf, *, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=<no_default>, data_columns=None, errors='strict', encoding='UTF-8')

Write the contained data to an HDF5 file using HDFStore.

Hierarchical Data Format (HDF) is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects.

To add another DataFrame or Series to an existing HDF file, use append mode and a different key.
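As a minimal sketch of storing several objects in one file, the example below writes a DataFrame and a Series under separate keys. It assumes the optional PyTables dependency is installed; the function name `write_two_objects`, the file name, and the keys "df" and "s" are illustrative choices, not part of the API.

```python
import os
import tempfile

import pandas as pd


def write_two_objects(path):
    """Store a DataFrame and a Series in one HDF5 file under separate keys."""
    df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
    s = pd.Series([1.5, 2.5, 3.5])

    # mode="w" creates the file (or replaces an existing one).
    df.to_hdf(path, key="df", mode="w")
    # The default mode="a" keeps the existing "df" group and adds "s".
    s.to_hdf(path, key="s")

    # List the groups now present in the file.
    with pd.HDFStore(path, mode="r") as store:
        keys = sorted(store.keys())
    return keys, pd.read_hdf(path, key="df"), pd.read_hdf(path, key="s")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        keys, df_back, s_back = write_two_objects(os.path.join(tmp, "store.h5"))
    print(keys)
    print(df_back)
    print(s_back)
```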

Note

Files produced by this method use a pandas-specific layout on top of PyTables and are intended to be read back with read_hdf() or HDFStore. They are valid HDF5 files but are not a general-purpose interchange format for arbitrary HDF5 consumers.

Warning

One can store a subclass of DataFrame or Series to HDF5, but the type of the subclass is lost upon storing.

For more information see the user guide.

Parameters:

path_or_buf : str or pandas.HDFStore

File path or HDFStore object.

key : str

Identifier for the group in the store.

mode : {‘a’, ‘w’, ‘r+’}, default ‘a’

Mode to open file:

‘w’: write, a new file is created (an existing file with the same name would be deleted).
‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.
‘r+’: similar to ‘a’, but the file must already exist.
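The difference between ‘w’ and ‘a’ can be sketched as follows: ‘a’ keeps groups already in the file, while ‘w’ starts from an empty file. The helper name `demonstrate_modes` and the keys are assumptions made for illustration, and PyTables is assumed to be installed.

```python
import os
import tempfile

import pandas as pd


def demonstrate_modes(path):
    """Return the keys in the file after an 'a' write and after a 'w' write."""
    first = pd.DataFrame({"x": [1, 2]})
    second = pd.DataFrame({"y": [3.0, 4.0]})

    first.to_hdf(path, key="first", mode="w")    # create the file
    second.to_hdf(path, key="second", mode="a")  # add a group, keep "first"
    with pd.HDFStore(path, mode="r") as store:
        after_append = sorted(store.keys())

    # mode="w" deletes the existing file, so "first" is gone afterwards.
    second.to_hdf(path, key="second", mode="w")
    with pd.HDFStore(path, mode="r") as store:
        after_write = sorted(store.keys())

    return after_append, after_write


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demonstrate_modes(os.path.join(tmp, "modes.h5")))
```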

complevel : {0-9}, default None

Specifies a compression level for data. A value of 0 or None disables compression.

complib : {‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’

Specifies the compression library to use. This has no effect unless complevel is set to a value greater than 0; passing complib alone emits a UserWarning and writes the data uncompressed. The following Blosc compressors are also supported (default if no compressor is specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library that is not available raises a ValueError.
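To illustrate how complevel and complib work together, the sketch below writes the same frame with and without zlib compression and compares file sizes. The function name `write_compressed`, the file names, and the all-zero test data are assumptions chosen so the effect is easy to see; real data may compress far less. PyTables is assumed to be installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_compressed(directory):
    """Write one frame uncompressed and zlib-compressed; return sizes and data."""
    # Highly repetitive data, chosen only to make the size difference obvious.
    df = pd.DataFrame({"value": np.zeros(100_000)})

    plain = os.path.join(directory, "plain.h5")
    packed = os.path.join(directory, "packed.h5")

    df.to_hdf(plain, key="df", mode="w")
    # Both arguments are given: complib selects the library, complevel > 0
    # switches compression on.
    df.to_hdf(packed, key="df", mode="w", complevel=9, complib="zlib")

    restored = pd.read_hdf(packed, key="df")
    return os.path.getsize(plain), os.path.getsize(packed), restored


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plain_size, packed_size, _ = write_compressed(tmp)
    print(f"uncompressed: {plain_size} bytes, compressed: {packed_size} bytes")
```

Compression is transparent on read: read_hdf() needs no extra arguments to load the compressed file.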

append : bool, default False

For Table formats, append the input data to the existing table. The object stored at key (if any) must already be in 'table' format; appending to a 'fixed' object raises ValueError. When creating a new key with append=True, format defaults to 'table'. Each append must use exactly the same columns, in the same order, as the existing table.
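A minimal sketch of appending rows to an existing table is shown below. The function name `append_rows`, the key "data", and the sample values are illustrative assumptions; both frames deliberately share the same columns in the same order. PyTables is assumed to be installed.

```python
import os
import tempfile

import pandas as pd


def append_rows(path):
    """Create a table-format object, then append a second batch of rows."""
    first = pd.DataFrame({"A": [1, 2, 3], "B": [0.1, 0.2, 0.3]})
    more = pd.DataFrame({"A": [4, 5], "B": [0.4, 0.5]})

    # The initial write must use format="table" for later appends to work.
    first.to_hdf(path, key="data", mode="w", format="table")
    # append=True adds the rows to the table already stored at "data".
    more.to_hdf(path, key="data", mode="a", format="table", append=True)

    return pd.read_hdf(path, key="data")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(append_rows(os.path.join(tmp, "append.h5")))
```

Each batch keeps its own index values, so the combined frame may contain repeated index labels unless the caller assigns a unique index before writing.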

format : {‘fixed’, ‘table’, None}, default ‘fixed’

Possible values:

‘fixed’: Fixed format. Fast writing/reading. Neither appendable nor searchable.
‘table’: Table format. Writes a PyTables Table structure, which may perform worse but allows more flexible operations such as searching or selecting subsets of the data.
If None, pd.get_option(‘io.hdf.default_format’) is checked, followed by fallback to “fixed”.
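The practical difference between the formats can be sketched with a partial read: an object written with format='table' can be filtered on disk through the where and columns arguments of read_hdf(). The function name `read_subset` and the sample data are assumptions for illustration, and PyTables is assumed to be installed.

```python
import os
import tempfile

import pandas as pd


def read_subset(path):
    """Write a table-format frame and read back only some rows and columns."""
    df = pd.DataFrame({"A": range(10), "B": [i * 0.5 for i in range(10)]})
    df.to_hdf(path, key="df", mode="w", format="table")

    # The index is queryable in table format; only column "A" is loaded.
    return pd.read_hdf(path, key="df", where="index >= 7", columns=["A"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_subset(os.path.join(tmp, "table.h5")))
```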

index : bool, default True

Write DataFrame index as a column.

min_itemsize : dict or int, optional

Map column names to minimum string sizes for columns.

nan_rep : Any, optional

How to represent null values as str. Not allowed with append=True.

dropna : bool, default False, optional

Remove missing values.

Deprecated since version 3.1.0: The dropna keyword is deprecated and will be removed in a future version. Use DataFrame.dropna() before writing instead.

data_columns : list of columns or True, optional

List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. See Query via data columns for more information. Applicable only to format='table'.
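As a rough sketch of an on-disk query, the example below declares column "B" as a data column and then filters on it while reading. The function name `query_by_column`, the column names, and the threshold are illustrative assumptions; PyTables is assumed to be installed.

```python
import os
import tempfile

import pandas as pd


def query_by_column(path):
    """Write a frame with "B" as a data column, then filter on "B" at read time."""
    df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10.0, 20.0, 30.0, 40.0]})

    # data_columns requires format="table"; "B" becomes queryable on disk.
    df.to_hdf(path, key="df", mode="w", format="table", data_columns=["B"])

    # Only rows satisfying the condition are loaded into memory.
    return pd.read_hdf(path, key="df", where="B > 20")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(query_by_column(os.path.join(tmp, "query.h5")))
```

Column "A" is not a data column in this sketch, so it is stored and returned but cannot be used in a where expression.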

errors : str, default ‘strict’

Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.

encoding : str, default “UTF-8”

Set character encoding.