pandas.read_parquet
- pandas.read_parquet(path, engine='auto', columns=None, storage_options=None, use_nullable_dtypes=_NoDefault.no_default, dtype_backend=_NoDefault.no_default, **kwargs)
Load a parquet object from the file path, returning a DataFrame.
- Parameters
- path : str, path object or file-like object
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary read() function. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.parquet. A file URL can also be a path to a directory that contains multiple partitioned parquet files. Both pyarrow and fastparquet support paths to directories as well as file URLs. A directory path could be: file://localhost/path/to/tables or s3://bucket/partition_dir.
- engine : {‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’
Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
- columns : list, default None
If not None, only these columns will be read from the file.
- storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://” and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details; for more examples of storage options, refer to the pandas I/O user guide.
New in version 1.3.0.
- use_nullable_dtypes : bool, default False
If True, use dtypes that use pd.NA as the missing value indicator for the resulting DataFrame (only applicable for the pyarrow engine). As new dtypes that support pd.NA are added in the future, the output with this option will change to use those dtypes. Note: this is an experimental option, and its behaviour (e.g. additional supported dtypes) may change without notice.
Deprecated since version 2.0.
- dtype_backend : {“numpy_nullable”, “pyarrow”}, defaults to NumPy-backed DataFrames
Back-end data type applied to the resulting DataFrame. When “numpy_nullable” is set, nullable dtypes are used for all dtypes that have a nullable implementation; when “pyarrow” is set, pyarrow is used for all dtypes. If neither is set, the DataFrame is backed by NumPy arrays.
The dtype_backends are still experimental.
New in version 2.0.
- **kwargs
Any additional kwargs are passed to the engine.
- Returns
- DataFrame