pandas.read_stata
pandas.read_stata(filepath_or_buffer, convert_dates=True, convert_categoricals=True, encoding=None, index_col=None, convert_missing=False, preserve_dtypes=True, columns=None, order_categoricals=True, chunksize=None, iterator=False)
Read Stata file into DataFrame
Parameters:
filepath_or_buffer : string or file-like object
Path to a .dta file, or an object implementing a binary read() function
convert_dates : boolean, defaults to True
Convert Stata date variables to pandas datetime values
convert_categoricals : boolean, defaults to True
Read value labels and convert columns to Categorical/Factor variables
encoding : string or None
Encoding used to parse the file. None defaults to latin-1.
index_col : string, optional, default: None
Column to set as index
convert_missing : boolean, defaults to False
Flag indicating whether to convert missing values to their Stata representations. If False, missing values are replaced with NaN. If True, columns containing missing values are returned with object data types and missing values are represented by StataMissingValue objects.
preserve_dtypes : boolean, defaults to True
Preserve Stata datatypes. If False, numeric data are upcast to pandas default types for foreign data (float64 or int64)
columns : list or None
Columns to retain. Columns will be returned in the given order. None returns all columns
order_categoricals : boolean, defaults to True
Flag indicating whether converted categorical data are ordered.
chunksize : int, default None
Return a StataReader object for iteration; each iteration returns a chunk with the given number of rows
iterator : boolean, default False
Return a StataReader object instead of a DataFrame
Returns: DataFrame or StataReader
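Which of the two return types you get depends on iterator and chunksize. The following is a minimal sketch, not taken from the pandas documentation: the sample data, the file name sample.dta and the variable names are hypothetical, and it assumes a pandas version in which DataFrame.to_stata is available and StataReader can be used as a context manager.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader

# Hypothetical sample data, written to a temporary .dta file for the demo.
df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "score": [0.5, 1.5, 2.5, 3.5, 4.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.dta")
    df.to_stata(path, write_index=False)

    # Default call: the whole file is loaded into a DataFrame.
    whole = pd.read_stata(path)

    # iterator=True: a StataReader is returned; read(n) pulls the next n rows.
    with pd.read_stata(path, iterator=True) as reader:
        is_reader = isinstance(reader, StataReader)
        first_two = reader.read(2)

    # chunksize=2: iterating the reader yields DataFrames of up to 2 rows.
    with pd.read_stata(path, chunksize=2) as reader:
        chunk_lengths = [len(chunk) for chunk in reader]
```

The with blocks make sure the underlying file is closed before the temporary directory is removed.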
Examples
Read a Stata dta file:
>>> df = pandas.read_stata('filename.dta')
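Read only some columns, optionally upcasting numeric types. This is an illustrative sketch under stated assumptions: the column names and the file name types.dta are made up, and the file is first created with DataFrame.to_stata so the example can run on its own.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data with deliberately small numeric types.
df = pd.DataFrame(
    {
        "visits": np.array([1, 2, 3], dtype=np.int8),
        "ratio": np.array([0.25, 0.5, 0.75], dtype=np.float32),
        "weight": [60.0, 72.5, 81.0],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "types.dta")
    df.to_stata(path, write_index=False)

    # Keep two of the three columns, returned in the order requested.
    subset = pd.read_stata(path, columns=["ratio", "visits"])

    # Same selection, but numeric columns are upcast to float64 / int64
    # rather than keeping the types stored in the file.
    upcast = pd.read_stata(
        path, columns=["ratio", "visits"], preserve_dtypes=False
    )
```

Selecting columns at read time avoids a separate reindexing step after loading.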
Read a Stata dta file in 10,000 line chunks:
>>> itr = pandas.read_stata('filename.dta', chunksize=10000)
>>> for chunk in itr:
...     do_something(chunk)
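Control how missing values and value labels are decoded. The sketch below illustrates the convert_missing, convert_categoricals and order_categoricals parameters on a small round trip. The data, the column names and the file name labelled.dta are hypothetical, and the behaviour shown is what a recent pandas release is assumed to produce.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from pandas.io.stata import StataMissingValue

# Hypothetical data: one float column with a gap, one labelled column.
df = pd.DataFrame(
    {
        "income": [10.0, np.nan, 30.0],
        "size": pd.Categorical(
            ["small", "large", "small"], categories=["small", "large"]
        ),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labelled.dta")
    df.to_stata(path, write_index=False)

    # Defaults: missing values become NaN, value labels become an
    # ordered Categorical.
    default = pd.read_stata(path)

    # Keep Stata's missing-value codes as StataMissingValue objects;
    # the affected column is returned with object dtype.
    with_missing = pd.read_stata(path, convert_missing=True)

    # Skip the value labels and return the underlying integer codes.
    raw_codes = pd.read_stata(path, convert_categoricals=False)

    # Apply the labels but leave the resulting Categorical unordered.
    unordered = pd.read_stata(path, order_categoricals=False)

# Flag which rows hold a Stata missing value in the object column.
is_stata_missing = [
    isinstance(value, StataMissingValue) for value in with_missing["income"]
]
```

Keeping StataMissingValue objects is mainly useful when the distinction between Stata's different missing codes matters; otherwise the NaN default is easier to work with.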