pandas.DataFrame.to_parquet#
- DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)[source]#
Write a DataFrame to the binary parquet format.
This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.
- Parameters:
- pathstr, path object, file-like object, or None, default None
String, path object (implementing
os.PathLike[str]
), or file-like object implementing a binarywrite()
function. If None, the result is returned as bytes. If a string or path, it will be used as Root Directory path when writing a partitioned dataset.Changed in version 1.2.0.
Previously this was “fname”
- engine{‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’
Parquet library to use. If ‘auto’, then the option
io.parquet.engine
is used. The defaultio.parquet.engine
behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.- compressionstr or None, default ‘snappy’
Name of the compression to use. Use
None
for no compression. The supported compression methods actually depend on which engine is used. For ‘pyarrow’, ‘snappy’, ‘gzip’, ‘brotli’, ‘lz4’, ‘zstd’ are all supported. For ‘fastparquet’, only ‘gzip’ and ‘snappy’ are supported.- indexbool, default None
If
True
, include the dataframe’s index(es) in the file output. IfFalse
, they will not be written to the file. IfNone
, similar toTrue
the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.- partition_colslist, optional, default None
Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.
- storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to
urllib.request.Request
as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded tofsspec.open
. Please seefsspec
andurllib
for more details, and for more examples on storage options refer here.New in version 1.2.0.
- **kwargs
Additional arguments passed to the parquet library. See pandas io for more details.
- Returns:
- bytes if no path argument is provided else None
See also
read_parquet
Read a parquet file.
DataFrame.to_orc
Write an orc file.
DataFrame.to_csv
Write a csv file.
DataFrame.to_sql
Write to a sql table.
DataFrame.to_hdf
Write to hdf.
Notes
This function requires either the fastparquet or pyarrow library.
Examples
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]}) >>> df.to_parquet('df.parquet.gzip', ... compression='gzip') >>> pd.read_parquet('df.parquet.gzip') col1 col2 0 1 3 1 2 4
If you want to get a buffer to the parquet content you can use a io.BytesIO object, as long as you don’t use partition_cols, which creates multiple files.
>>> import io >>> f = io.BytesIO() >>> df.to_parquet(f) >>> f.seek(0) 0 >>> content = f.read()