pandas.read_stata

pandas.read_stata(filepath_or_buffer, convert_dates=True, convert_categoricals=True, index_col=None, convert_missing=False, preserve_dtypes=True, columns=None, order_categoricals=True, chunksize=None, iterator=False, storage_options=None)

Read Stata file into DataFrame.

Parameters
filepath_or_buffer : str, path object or file-like object

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.dta.

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method that returns bytes, such as a file handle opened in binary mode (e.g. via the builtin open function with mode 'rb') or BytesIO.
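As a minimal sketch of the file-like case, the snippet below writes a small frame to an in-memory binary buffer and reads it back. The column names and values are made up for illustration; the buffer stands in for any object with a binary read() method.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]})

# Write the frame to an in-memory binary buffer instead of a file on disk.
buf = io.BytesIO()
df.to_stata(buf, write_index=False)
buf.seek(0)  # rewind so the reader starts at the beginning

# The buffer is passed in place of a path.
roundtrip = pd.read_stata(buf)
print(roundtrip)
```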

convert_dates : bool, default True

Convert Stata date variables to pandas datetime values.
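The effect of this flag can be sketched with a round trip. The example assumes a hypothetical column named when, written with the Stata daily date format ("td") through DataFrame.to_stata. With conversion disabled, the column comes back as a number of days elapsed since the Stata epoch, 1960-01-01.

```python
import io

import pandas as pd

# Hypothetical frame with a single date column.
df = pd.DataFrame({"when": pd.to_datetime(["2020-01-01", "2020-01-02"])})

buf = io.BytesIO()
df.to_stata(buf, write_index=False, convert_dates={"when": "td"})
payload = buf.getvalue()

# Default behaviour: the date variable is converted to datetime values.
parsed = pd.read_stata(io.BytesIO(payload), convert_dates=True)

# Conversion disabled: the raw elapsed-day counts are returned instead.
raw = pd.read_stata(io.BytesIO(payload), convert_dates=False)

# Days between the Stata epoch and the first date, computed for comparison.
expected_days = (pd.Timestamp("2020-01-01") - pd.Timestamp("1960-01-01")).days
print(parsed["when"].iloc[0], raw["when"].iloc[0], expected_days)
```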

convert_categoricals : bool, default True

Read value labels and convert columns to Categorical/Factor variables.

index_col : str, optional

Column to set as index.
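A short illustration, assuming a hypothetical file whose id variable should become the index:

```python
import io

import pandas as pd

# Hypothetical sample data written to an in-memory buffer.
df = pd.DataFrame({"id": [10, 20, 30], "value": [1.0, 2.0, 3.0]})
buf = io.BytesIO()
df.to_stata(buf, write_index=False)
payload = buf.getvalue()

# The named column is removed from the data and used as the index.
indexed = pd.read_stata(io.BytesIO(payload), index_col="id")
print(indexed)
```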

convert_missing : bool, default False

Flag indicating whether to convert missing values to their Stata representations. If False, missing values are replaced with nan. If True, columns containing missing values are returned with object data types and missing values are represented by StataMissingValue objects.
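The two behaviours can be compared side by side. This sketch assumes a hypothetical float column x with one missing entry, written with DataFrame.to_stata so that the gap is stored as a Stata missing value.

```python
import io

import numpy as np
import pandas as pd
from pandas.io.stata import StataMissingValue

# Hypothetical column with one missing entry.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})
buf = io.BytesIO()
df.to_stata(buf, write_index=False)
payload = buf.getvalue()

# Default: the Stata missing value is replaced with NaN.
as_nan = pd.read_stata(io.BytesIO(payload))

# convert_missing=True: the column becomes object dtype and the
# missing entry is represented by a StataMissingValue instance.
as_objects = pd.read_stata(io.BytesIO(payload), convert_missing=True)

print(as_nan["x"].tolist())
print(as_objects["x"].dtype, type(as_objects["x"].iloc[1]).__name__)
```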

preserve_dtypes : bool, default True

Preserve Stata datatypes. If False, numeric data are upcast to pandas default types for foreign data (float64 or int64).
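To see the upcasting, the sketch below stores an int8 column and a float32 column (hypothetical names small and ratio) and reads them back both ways.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame using narrow numeric types.
df = pd.DataFrame(
    {
        "small": np.array([1, 2, 3], dtype=np.int8),
        "ratio": np.array([0.5, 1.5, 2.5], dtype=np.float32),
    }
)
buf = io.BytesIO()
df.to_stata(buf, write_index=False)
payload = buf.getvalue()

# Default: the narrow storage types from the file are kept.
preserved = pd.read_stata(io.BytesIO(payload))

# preserve_dtypes=False: numeric columns are upcast to int64 / float64.
upcast = pd.read_stata(io.BytesIO(payload), preserve_dtypes=False)

print(preserved.dtypes)
print(upcast.dtypes)
```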

columns : list or None

Columns to retain. Columns will be returned in the given order. None returns all columns.
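For example, assuming a hypothetical file with variables a, b and c, selecting two of them in a different order might look like this:

```python
import io

import pandas as pd

# Hypothetical three-column frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
buf = io.BytesIO()
df.to_stata(buf, write_index=False)
payload = buf.getvalue()

# Only the requested columns are kept, in the order requested.
subset = pd.read_stata(io.BytesIO(payload), columns=["c", "a"])
print(subset)
```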

order_categoricals : bool, default True

Flag indicating whether converted categorical data are ordered.
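The interaction between convert_categoricals and order_categoricals can be sketched as follows. The grade column and its labels are hypothetical; DataFrame.to_stata stores the categorical as integer codes plus value labels.

```python
import io

import pandas as pd

# Hypothetical labelled variable with an explicit category order.
df = pd.DataFrame(
    {"grade": pd.Categorical(["low", "high", "low"], categories=["low", "high"])}
)
buf = io.BytesIO()
df.to_stata(buf, write_index=False)
payload = buf.getvalue()

# Default: value labels are applied and the categorical is ordered.
ordered = pd.read_stata(io.BytesIO(payload))

# Labels are still applied, but the categorical is unordered.
unordered = pd.read_stata(io.BytesIO(payload), order_categoricals=False)

# No label conversion: the underlying integer codes are returned.
codes = pd.read_stata(io.BytesIO(payload), convert_categoricals=False)

print(ordered["grade"].cat.ordered, unordered["grade"].cat.ordered)
print(codes["grade"].tolist())
```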

chunksize : int, default None

Return a StataReader object for iteration; each iteration yields a DataFrame chunk with the given number of rows.

iterator : bool, default False

Return StataReader object.
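A sketch of incremental reading with iterator=True, using a hypothetical five-row column n. The reader is used as a context manager so that it is closed when the block exits, and StataReader.read is called with an explicit row count.

```python
import io

import pandas as pd

# Hypothetical five-row frame.
df = pd.DataFrame({"n": range(5)})
buf = io.BytesIO()
df.to_stata(buf, write_index=False)
payload = buf.getvalue()

with pd.read_stata(io.BytesIO(payload), iterator=True) as reader:
    head = reader.read(nrows=2)  # first two rows
    rest = reader.read()         # whatever remains after that

print(len(head), len(rest))
```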

Returns
DataFrame or StataReader

See also

io.stata.StataReader

Low-level reader for Stata data files.

DataFrame.to_stata

Export Stata data files.

Notes

Categorical variables read through an iterator may not have the same categories and dtype in every chunk. This occurs when a variable stored in a DTA file is associated with an incomplete set of value labels, that is, labels that cover only a strict subset of the values.

Examples

Read a Stata dta file:

>>> df = pd.read_stata('filename.dta')

Read a Stata dta file in chunks of 10,000 rows:

>>> itr = pd.read_stata('filename.dta', chunksize=10000)
>>> for chunk in itr:
...     do_something(chunk)