Table Of Contents


Enter search terms or a module, class or function name.


Series.str.extract(pat, flags=0, expand=True)[source]

For each subject string in the Series, extract groups from the first match of regular expression pat.


pat : string

Regular expression pattern with capturing groups

flags : int, default 0 (no flags)

re module flags, e.g. re.IGNORECASE

expand : bool, default True

  • If True, return DataFrame.
  • If False, return Series/Index/DataFrame.

New in version 0.18.0.

DataFrame with one row for each subject string, and one column for
each group. Any capture group names in regular expression pat will
be used for column names; otherwise capture group numbers will be
used. The dtype of each result column is always object, even when
no match is found. If expand=False and pat has only one capture group,
then return a Series (if subject is a Series) or Index (if subject
is an Index).

See also

returns all matches (not just the first match)


A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN.

>>> s = Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)')
     0    1
0    a    1
1    b    2
2  NaN  NaN

A pattern may contain optional groups.

>>> s.str.extract(r'([ab])?(\d)')
     0  1
0    a  1
1    b  2
2  NaN  3

Named groups will become column names in the result.

>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2    NaN   NaN

A pattern with one group will return a DataFrame with one column if expand=True.

>>> s.str.extract(r'[ab](\d)', expand=True)
0    1
1    2
2  NaN

A pattern with one group will return a Series if expand=False.

>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object
Scroll To Top