pandas.DataFrame.to_parquet
DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)
Write a DataFrame to the binary parquet format.

This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.

Parameters
path : str, path object, file-like object, or None, default None
    String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function. If None, the result is returned as bytes; see the bytes example below. If a string or path, it will be used as the root directory path when writing a partitioned dataset.

    Changed in version 1.2.0: Previously this was “fname”.
engine : {‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’
    Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable. An explicit engine choice is shown in the Examples below.
compression : {‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’
    Name of the compression to use. Use None for no compression. See the compression examples below.
index : bool, default None
    If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, the behavior is similar to True: the dataframe’s index(es) will be saved. However, instead of being saved as values, a RangeIndex will be stored as a range in the metadata, so it requires little space and is faster. Other indexes will be included as columns in the file output. See the index=False example below.
partition_cols : list, optional, default None
    Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string. A partitioned write is sketched in the Examples below.
storage_options : dict, optional
    Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://” and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer to the user guide. A remote-write sketch appears in the Examples below.

    New in version 1.2.0.
**kwargs
    Additional arguments passed to the parquet library. See pandas io for more details. An engine-specific option is sketched in the Examples below.
 
Returns
    bytes if no path argument is provided else None
 
See also

read_parquet
    Read a parquet file.
DataFrame.to_orc
    Write an ORC file.
DataFrame.to_csv
    Write a CSV file.
DataFrame.to_sql
    Write to a SQL table.
DataFrame.to_hdf
    Write to HDF.
Notes

This function requires either the fastparquet or pyarrow library.

Examples

>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')
>>> pd.read_parquet('df.parquet.gzip')
   col1  col2
0     1     3
1     2     4

If you want to get a buffer to the parquet content you can use an io.BytesIO object, as long as you don’t use partition_cols, which creates multiple files.

>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()
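Because path defaults to None, calling to_parquet with no path at all returns the parquet content directly as bytes, which can then be re-read from an in-memory buffer:

>>> raw = df.to_parquet()              # path=None: parquet bytes are returned
>>> pd.read_parquet(io.BytesIO(raw))
   col1  col2
0     1     3
1     2     4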
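To bypass the io.parquet.engine option and pin a backend explicitly, pass engine; a minimal sketch, assuming pyarrow is installed:

>>> df.to_parquet('df_pyarrow.parquet', engine='pyarrow')
>>> pd.read_parquet('df_pyarrow.parquet')
   col1  col2
0     1     3
1     2     4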
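Compression can also be disabled, or switched to another of the listed codecs; the ‘brotli’ line below assumes the optional brotli bindings are installed:

>>> df.to_parquet('df_plain.parquet', compression=None)   # no compression
>>> df.to_parquet('df.parquet.br', compression='brotli')  # needs brotli bindings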
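Passing index=False drops the dataframe’s index from the file, so the data round-trips with a fresh RangeIndex:

>>> df_indexed = df.set_index('col1')
>>> df_indexed.to_parquet('df_noindex.parquet', index=False)
>>> pd.read_parquet('df_noindex.parquet')
   col2
0     3
1     4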
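With partition_cols, path must name a directory: one subdirectory per partition value (e.g. col1=1/) is created beneath it, each holding its own file, which is why a single BytesIO buffer cannot be used. A minimal sketch, assuming the pyarrow engine; the directory name dataset_root is arbitrary:

>>> df.to_parquet('dataset_root', partition_cols=['col1'])
>>> round_tripped = pd.read_parquet('dataset_root')  # reassembles the partitions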
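For remote paths, storage_options is handed to fsspec; the sketch below assumes the optional s3fs package is installed and uses a hypothetical bucket name:

>>> df.to_parquet(
...     's3://my-bucket/df.parquet',       # hypothetical bucket
...     storage_options={'anon': False},   # forwarded to fsspec/s3fs
... )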
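Extra keyword arguments are forwarded untouched to the chosen engine. For instance, with the pyarrow engine, row_group_size is accepted by pyarrow.parquet.write_table (an engine option, not part of the pandas signature):

>>> df.to_parquet('df_rowgroups.parquet', engine='pyarrow',
...               row_group_size=1_000)  # forwarded to pyarrow.parquet.write_table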