pandas.read_orc
- pandas.read_orc(path, columns=None, dtype_backend=<no_default>, filesystem=None, **kwargs)
Load an ORC object from the file path, returning a DataFrame.
This function reads an ORC (Optimized Row Columnar) file into a pandas DataFrame using the pyarrow.orc library. ORC is a columnar storage format that provides efficient compression and fast retrieval for analytical workloads. read_orc can read a subset of columns, work with different filesystems (local storage, cloud storage via fsspec, or a pyarrow filesystem), and return data using different dtype backends, including numpy_nullable and pyarrow.
- Parameters:
- path : str, path object, or file-like object
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary read() function. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.orc.
- columns : list, default None
If not None, only these columns will be read from the file. Output always follows the ordering of the file and not the columns list. This mirrors the original behaviour of pyarrow.orc.ORCFile.read().
- dtype_backend : {‘numpy_nullable’, ‘pyarrow’}
Back-end data type applied to the resultant DataFrame (still experimental). If not specified, the default behavior is to not use nullable data types. If specified, the behavior is as follows:
"numpy_nullable": returns nullable-dtype-backed DataFrame
"pyarrow": returns pyarrow-backed nullable ArrowDtype DataFrame
Added in version 2.0.
- filesystem : fsspec or pyarrow filesystem, default None
Filesystem object to use when reading the ORC file.
Added in version 2.1.0.
- **kwargs
Any additional kwargs are passed to pyarrow.
- Returns:
- DataFrame
DataFrame based on the ORC file.
See also
read_csv
Read a comma-separated values (csv) file into a pandas DataFrame.
read_excel
Read an Excel file into a pandas DataFrame.
read_spss
Read an SPSS file into a pandas DataFrame.
read_sas
Load a SAS file into a pandas DataFrame.
read_feather
Load a feather-format object into a pandas DataFrame.
Notes
Before using this function you should read the user guide about ORC and install optional dependencies.
If path is a URI pointing to a local or remote file (e.g. one starting with “s3://”), pandas will attempt to read the file using a pyarrow.fs filesystem. You can also pass a pyarrow or fsspec filesystem object into the filesystem keyword to override this behavior.
Examples
>>> result = pd.read_orc("example_pa.orc")