pandas.io.parsers.read_fwf
- pandas.io.parsers.read_fwf(filepath_or_buffer, colspecs=None, widths=None, **kwds)
Read a table of fixed-width formatted lines into a DataFrame.
Also supports optionally iterating over the file or breaking it into chunks.
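As a rough sketch of chunked reading, the snippet below reads a small in-memory table four rows at a time. The sample data, the column names n and half, and the chunk size are made up for illustration, and the call goes through the top-level pandas.read_fwf alias, which is assumed here to behave like the function documented on this page.

```python
import io

import pandas as pd

# Hypothetical data: 10 lines, each a 4-character integer field
# followed by a 6-character float field (right-aligned, space-padded).
data = "".join("{:>4d}{:>6.1f}\n".format(i, i * 0.5) for i in range(10))

# Passing chunksize returns an iterable reader instead of a DataFrame.
reader = pd.read_fwf(
    io.StringIO(data),
    widths=[4, 6],
    header=None,            # the sample data has no header row
    names=["n", "half"],    # illustrative column names
    chunksize=4,
)

# Each item yielded by the reader is a DataFrame with at most 4 rows.
sizes = [len(chunk) for chunk in reader]
print(sizes)
```

Processing one chunk at a time keeps memory use bounded by the chunk size, which is the usual reason to prefer chunksize over reading the whole file at once.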
Parameters : filepath_or_buffer : string or file handle / StringIO. The string could be
a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
colspecs : a list of pairs (tuples), giving the extents
of the fixed-width fields of each line as half-open intervals (i.e., [from, to[ ).
widths : a list of field widths, which can be used instead of
‘colspecs’ if the intervals are contiguous.
lineterminator : string (length 1), default None
Character to break file into lines. Only valid with C parser
quotechar : string
The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
quoting : int
Controls whether quotes should be recognized. Values are taken from csv.QUOTE_* values. Acceptable values are 0, 1, 2, and 3 for QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONNUMERIC, and QUOTE_NONE, respectively.
skipinitialspace : boolean, default False
Skip spaces after delimiter
escapechar : string
dtype : Type name or dict of column -> type
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32}
compression : {‘gzip’, ‘bz2’, None}, default None
For on-the-fly decompression of on-disk data
dialect : string or csv.Dialect instance, default None
If None, defaults to the Excel dialect. Ignored if sep is longer than 1 character. See the csv.Dialect documentation for more details.
header : int, default 0 if names parameter not specified,
Row to use for the column labels of the parsed DataFrame. Specify None if there is no header row. Can be a list of integers that specify row locations for a multi-index on the columns, e.g. [0,1,3]. Intervening rows that are not specified (e.g. 2 in this example) are skipped.
skiprows : list-like or integer
Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file
index_col : int or sequence or False, default None
Column to use as the row labels of the DataFrame. If a sequence is given, a MultiIndex is used. If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas not to use the first column as the index (row names).
names : array-like
List of column names to use. If file contains no header row, then you should explicitly pass header=None
prefix : string or None (default)
Prefix to add to column numbers when there is no header, e.g. ‘X’ for X0, X1, ...
na_values : list-like or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values
true_values : list
Values to consider as True
false_values : list
Values to consider as False
keep_default_na : bool, default True
If na_values are specified and keep_default_na is False, the default NaN values are overridden; otherwise, na_values are appended to the defaults.
parse_dates : boolean, list of ints or names, list of lists, or dict
If True -> try parsing the index. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
keep_date_col : boolean, default False
If True and parse_dates specifies combining multiple columns then keep the original columns.
date_parser : function
Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion.
dayfirst : boolean, default False
DD/MM format dates, international and European format
thousands : str, default None
Thousands separator
comment : str, default None
Indicates that the remainder of the line should not be parsed. Does not support line commenting (will return an empty line).
decimal : str, default ‘.’
Character to recognize as decimal point. E.g. use ‘,’ for European data
nrows : int, default None
Number of rows of file to read. Useful for reading pieces of large files
iterator : boolean, default False
Return TextFileReader object
chunksize : int, default None
Return TextFileReader object for iteration
skipfooter : int, default 0
Number of lines at the bottom of the file to skip
converters : dict, optional
Dict of functions for converting values in certain columns. Keys can either be integers or column labels
verbose : boolean, default False
Indicate number of NA values placed in non-numeric columns
delimiter : string, default None
Alternative argument name for sep. Regular expressions are accepted.
encoding : string, default None
Encoding to use for UTF when reading/writing (ex. ‘utf-8’)
squeeze : boolean, default False
If the parsed data only contains one column then return a Series
na_filter : boolean, default True
Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file
usecols : array-like
Return a subset of the columns. Results in much faster parsing time and lower memory usage.
mangle_dupe_cols : boolean, default True
Duplicate columns will be specified as ‘X.0’...‘X.N’, rather than ‘X’...‘X’
tupleize_cols : boolean, default False
Leave a list of tuples on columns as is (default is to convert to a MultiIndex on the columns)
Returns : result : DataFrame or TextParser
Also, ‘delimiter’ is used to specify the filler character of the
fields if it is not spaces (e.g., ‘~’).
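The following is a minimal sketch of how colspecs, widths, and a non-space filler character relate to one another. The sample lines, the column names id, name, and score, and the use of the top-level pandas.read_fwf alias are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical data: three contiguous fields of 4, 6, and 5 characters.
data = (
    "0001alice   9.5\n"
    "0002bob    10.0\n"
)
names = ["id", "name", "score"]  # illustrative column names

# colspecs gives each field as a half-open interval [from, to[.
df_colspecs = pd.read_fwf(
    io.StringIO(data),
    colspecs=[(0, 4), (4, 10), (10, 15)],
    header=None,
    names=names,
)

# Because the fields are contiguous, widths describes the same layout.
df_widths = pd.read_fwf(
    io.StringIO(data),
    widths=[4, 6, 5],
    header=None,
    names=names,
)

# Same layout, but padded with '~' instead of spaces; delimiter names
# the filler character to strip from each field.
data_tilde = data.replace(" ", "~")
df_tilde = pd.read_fwf(
    io.StringIO(data_tilde),
    widths=[4, 6, 5],
    delimiter="~",
    header=None,
    names=names,
)

print(df_colspecs)
print(df_tilde)
```

Note that the id field is parsed as an integer here, so its leading zeros are dropped; passing a dtype or converters entry for that column would be one way to keep it as text.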