pandas.read_parquet

pandas.read_parquet(path, engine='auto', columns=None, use_nullable_dtypes=False, **kwargs)

Load a parquet object from the file path, returning a DataFrame.

Parameters

path : str, path object or file-like object

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.parquet. A file URL can also be a path to a directory that contains multiple partitioned parquet files. Both pyarrow and fastparquet support paths to directories as well as file URLs. A directory path could be: file://localhost/path/to/tables or s3://bucket/partition_dir

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a binary read() method, such as a file handle (e.g. via the builtin open function in 'rb' mode) or BytesIO.

engine : {‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’

Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.

columns : list, default None

If not None, only these columns will be read from the file.

use_nullable_dtypes : bool, default False

If True, use dtypes that use pd.NA as the missing value indicator for the resulting DataFrame (only applicable for engine="pyarrow"). As new dtypes that support pd.NA are added in the future, the output with this option will change to use those dtypes. Note: this is an experimental option, and behaviour (e.g. additional supported dtypes) may change without notice.

New in version 1.2.0.

**kwargs

Any additional kwargs are passed to the engine.

Returns

DataFrame
