pandas.Series.str.extract
Series.str.extract(pat, flags=0, expand=True)
For each subject string in the Series, extract groups from the first match of regular expression pat.
Parameters:
pat : string
    Regular expression pattern with capturing groups.
flags : int, default 0 (no flags)
    re module flags, e.g. re.IGNORECASE.
expand : bool, default True
    - If True, return a DataFrame with one column per capture group.
    - If False, return a Series or Index if there is one capture group, or a DataFrame if there are multiple capture groups.
    New in version 0.18.0.
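As a small illustration of the flags parameter, the sketch below passes re.IGNORECASE so that the character class also matches an uppercase letter. The sample data and the variable names (case_sensitive, case_insensitive) are made up for this example.

```python
import re

import pandas as pd

# Hypothetical sample data: the first value starts with an uppercase letter.
s = pd.Series(["A1", "b2", "c3"])

# Without flags, [ab] does not match "A", so the first row has no match.
case_sensitive = s.str.extract(r"([ab])(\d)")

# With re.IGNORECASE, [ab] also matches "A"; the captured text keeps its case.
case_insensitive = s.str.extract(r"([ab])(\d)", flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```

Flags can be combined with the bitwise OR operator, as with the re module itself, e.g. re.IGNORECASE | re.MULTILINE.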
Returns: DataFrame with one row for each subject string, and one column for
each group. Any capture group names in regular expression pat will
be used for column names; otherwise capture group numbers will be
used. The dtype of each result column is always object, even when
no match is found. If expand=False and pat has only one capture group,
then return a Series (if subject is a Series) or Index (if subject
is an Index).
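The return-type rules above can be checked directly. The following is a minimal sketch, not part of the original reference; the variable names and sample values are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, once as a Series and once as an Index.
s = pd.Series(["a1", "b2", "c3"])
idx = pd.Index(["a1", "b2", "c3"])

one_group = r"[ab](\d)"
two_groups = r"([ab])(\d)"

as_frame = s.str.extract(one_group, expand=True)       # DataFrame with one column
as_series = s.str.extract(one_group, expand=False)     # Series (subject is a Series)
as_index = idx.str.extract(one_group, expand=False)    # Index (subject is an Index)
still_frame = s.str.extract(two_groups, expand=False)  # DataFrame (two groups)

for result in (as_frame, as_series, as_index, still_frame):
    print(type(result).__name__)
```

Passing expand=True explicitly keeps the result type independent of the number of groups, which may be easier to work with in code that handles arbitrary patterns.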
See also
extractall : returns all matches (not just the first match)
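To make the difference between the two methods concrete, here is a short sketch on made-up data in which one subject string contains two matches. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: the first string contains two matches.
s = pd.Series(["a1a2", "b1", "c1"])

# extract: one row per subject string, groups from the first match only.
first_only = s.str.extract(r"[ab](\d)", expand=True)

# extractall: one row per match, indexed by (original index, match number);
# subject strings with no match produce no rows.
every_match = s.str.extractall(r"[ab](\d)")

print(first_only)
print(every_match)
```

Here extract keeps the row for "c1" as NaN, while extractall omits it and returns two rows for "a1a2".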
Examples
A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN.
>>> import pandas as pd
>>> s = pd.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
     0    1
0    a    1
1    b    2
2  NaN  NaN
A pattern may contain optional groups.
>>> s.str.extract(r'([ab])?(\d)')
     0  1
0    a  1
1    b  2
2  NaN  3
Named groups will become column names in the result.
>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2    NaN   NaN
A pattern with one group will return a DataFrame with one column if expand=True.
>>> s.str.extract(r'[ab](\d)', expand=True)
     0
0    1
1    2
2  NaN
A pattern with one group will return a Series if expand=False.
>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object