pandas.Series.str.extract

Series.str.extract(pat, flags=0)

Extract capture groups from the first match of the regular expression pat in each string of the Series.
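
The per-element lookup can be modeled with the standard re module. This is a simplified sketch, not the pandas implementation: extract_groups is a hypothetical helper, and None stands in for the NaN that pandas places in non-matching rows.

```python
import re
from typing import List, Optional, Tuple


def extract_groups(strings: List[str], pat: str) -> List[Tuple[Optional[str], ...]]:
    """Return the capture groups of the first match of pat in each string.

    Strings with no match yield a tuple of None, one entry per group,
    standing in for the NaN row in the pandas result.
    """
    regex = re.compile(pat)
    empty = (None,) * regex.groups  # one placeholder per capture group
    rows = []
    for s in strings:
        m = regex.search(s)  # first match anywhere in the string
        rows.append(m.groups() if m else empty)
    return rows


print(extract_groups(['a1', 'b2', 'c3'], r'([ab])(\d)'))
# [('a', '1'), ('b', '2'), (None, None)]
```

Using search rather than match is what lets a match start anywhere in the string, as the optional-group example further down relies on.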

Parameters:

pat : string

Regular expression pattern containing capture groups

flags : int, default 0 (no flags)

re module flags, e.g. re.IGNORECASE
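
Because flags is an ordinary re flag value, several flags can be combined with | (for example re.IGNORECASE | re.MULTILINE). The sketch below assumes the flags are simply passed on when pat is compiled; first_groups is a hypothetical helper used only to show the effect of re.IGNORECASE.

```python
import re


def first_groups(s: str, pat: str, flags: int = 0):
    """Groups of the first match of pat in s, or None if nothing matches."""
    m = re.compile(pat, flags).search(s)
    return m.groups() if m else None


print(first_groups('A1', r'([ab])(\d)'))                 # None: 'A' is not in [ab]
print(first_groups('A1', r'([ab])(\d)', re.IGNORECASE))  # ('A', '1')
```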

Returns:

extracted groups : Series (one group) or DataFrame (multiple groups)

Note that the dtype of the result is always object, even when no match is found and the result contains only NaN values.
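
Whether a Series or a DataFrame comes back depends only on the number of capture groups in pat, which re exposes as Pattern.groups. The sketch below models the signature documented here (later pandas releases add an expand argument that changes the one-group case). result_kind is a hypothetical helper, and rejecting a pattern with no capture groups is this sketch's own choice.

```python
import re


def result_kind(pat: str) -> str:
    """Predict the container returned for this pattern."""
    n_groups = re.compile(pat).groups  # number of capturing groups
    if n_groups == 0:
        raise ValueError("pattern contains no capture groups")
    return "Series" if n_groups == 1 else "DataFrame"


print(result_kind(r'[ab](\d)'))      # Series
print(result_kind(r'([ab])(\d)'))    # DataFrame
print(result_kind(r'(?:[ab])(\d)'))  # Series: (?:...) does not capture
```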

Examples

A pattern with one group will return a Series. Non-matches will be NaN.

>>> Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)')
0      1
1      2
2    NaN
dtype: object

A pattern with more than one group will return a DataFrame.

>>> Series(['a1', 'b2', 'c3']).str.extract(r'([ab])(\d)')
     0    1
0    a    1
1    b    2
2  NaN  NaN

A pattern may contain optional groups.

>>> Series(['a1', 'b2', 'c3']).str.extract(r'([ab])?(\d)')
     0  1
0    a  1
1    b  2
2  NaN  3
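
Row 2 is NaN in column 0 but 3 in column 1 because the pattern as a whole matches while the optional group does not take part in the match. The standard re module shows the same thing: Match.groups() reports None for a non-participating group, which corresponds to the NaN above.

```python
import re

regex = re.compile(r'([ab])?(\d)')
results = {}
for s in ['a1', 'b2', 'c3']:
    m = regex.search(s)
    # A group that did not take part in the match is reported as None
    results[s] = m.groups() if m else None

print(results)
# {'a1': ('a', '1'), 'b2': ('b', '2'), 'c3': (None, '3')}
```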

Named groups will become column names in the result.

>>> Series(['a1', 'b2', 'c3']).str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2    NaN   NaN
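
The column labels can be derived from the compiled pattern: Pattern.groupindex maps each group name to its 1-based group number. In the sketch below, column_labels is a hypothetical helper; labeling an unnamed group by its 0-based position is an assumption chosen to match the unnamed-pattern examples above, including when named and unnamed groups are mixed.

```python
import re


def column_labels(pat: str) -> list:
    """Label each capture group by its name, or by its 0-based position if unnamed."""
    regex = re.compile(pat)
    # groupindex maps group name -> 1-based group number; invert it
    names = {num: name for name, num in regex.groupindex.items()}
    return [names.get(i + 1, i) for i in range(regex.groups)]


print(column_labels(r'(?P<letter>[ab])(?P<digit>\d)'))  # ['letter', 'digit']
print(column_labels(r'([ab])(\d)'))                     # [0, 1]
```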