pandas.DataFrame.to_orc
- DataFrame.to_orc(path=None, *, engine='pyarrow', index=None, engine_kwargs=None)
- Write a DataFrame to the ORC format.
- New in version 1.5.0.
- Parameters:
- path : str, file-like object or None, default None
- If a string, it is used as the root directory path when writing a partitioned dataset. A file-like object is any object with a write() method, such as a file handle (for example, one returned by the built-in open function). If path is None, a bytes object is returned.
- engine : {‘pyarrow’}, default ‘pyarrow’
- ORC library to use. Pyarrow must be >= 7.0.0. 
- index : bool, optional
- If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to infer, the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.
- engine_kwargs : dict[str, Any] or None, default None
- Additional keyword arguments passed to pyarrow.orc.write_table().
 
- Returns:
- bytes if no path argument is provided else None
 
- Raises:
- NotImplementedError
- Dtype of one or more columns is category, unsigned integers, interval, period or sparse. 
- ValueError
- engine is not pyarrow. 
 
- See also
- read_orc
- Read an ORC file.
- DataFrame.to_parquet
- Write a parquet file. 
- DataFrame.to_csv
- Write a csv file. 
- DataFrame.to_sql
- Write to a sql table. 
- DataFrame.to_hdf
- Write to hdf. 
- Notes
- Before using this function you should read the user guide about ORC and install optional dependencies.
- This function requires the pyarrow library.
- For supported dtypes please refer to supported ORC features in Arrow. 
- Currently timezones in datetime columns are not preserved when a dataframe is converted into ORC files. 
- Examples
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [4, 3]})
>>> df.to_orc('df.orc')
>>> pd.read_orc('df.orc')
   col1  col2
0     1     4
1     2     3
- If you want to get a buffer to the ORC content you can write it to io.BytesIO
>>> import io
>>> b = io.BytesIO(df.to_orc())
>>> b.seek(0)
0
>>> content = b.read()