pandas.Series.str.extract
Series.str.extract(pat, flags=0, **kwargs)
Find capture groups in each string of the Series using the passed regular expression.
Parameters:
pat : string
    Pattern or regular expression
flags : int, default 0 (no flags)
    re module flags, e.g. re.IGNORECASE
Returns:
extracted groups : Series (one group) or DataFrame (multiple groups)
Note that dtype of the result is always object, even when no match is found and the result is a Series or DataFrame containing only NaN values.
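Because the extracted values are strings rather than numbers, a numeric group usually needs an explicit conversion before arithmetic. The following is a minimal sketch, assuming a pandas version that provides pd.to_numeric (a separate function that this page does not describe); the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the third entry does not match the pattern.
s = pd.Series(['a1', 'b2', 'c3'], dtype=object)

# Two groups, so the result is a DataFrame with columns 0 and 1.
extracted = s.str.extract(r'([ab])(\d)')

# Column 1 holds the strings '1' and '2' plus NaN for the non-match.
# pd.to_numeric (an assumption: not part of str.extract) converts them
# to floats and keeps NaN for the row that did not match.
digits = pd.to_numeric(extracted[1])

print(digits)
```

The conversion is kept as a separate step so that non-matching rows remain visible as NaN instead of being dropped silently.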
Examples
A pattern with one group will return a Series. Non-matches will be NaN.
>>> Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)')
0      1
1      2
2    NaN
dtype: object
A pattern with more than one group will return a DataFrame.
>>> Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)')
     0    1
0    a    1
1    b    2
2  NaN  NaN
A pattern may contain optional groups.
>>> Series(['a1', 'b2', 'c3']).str.extract('([ab])?(\d)')
     0  1
0    a  1
1    b  2
2  NaN  3
Named groups will become column names in the result.
>>> Series(['a1', 'b2', 'c3']).str.extract('(?P<letter>[ab])(?P<digit>\d)')
  letter digit
0      a     1
1      b     2
2    NaN   NaN
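None of the examples above exercises the flags parameter. The sketch below shows one way it might be used, passing re.IGNORECASE so that the character class [ab] also matches an uppercase letter. The sample Series is hypothetical, and the pattern uses two named groups so that the result is a DataFrame.

```python
import re

import pandas as pd

# Hypothetical sample data: the first entry starts with an uppercase letter.
s = pd.Series(['A1', 'b2', 'c3'], dtype=object)

pattern = r'(?P<letter>[ab])(?P<digit>\d)'

# Without flags, 'A1' does not match because [ab] is case-sensitive.
strict = s.str.extract(pattern)

# With re.IGNORECASE, 'A1' matches and 'A' is captured as written.
relaxed = s.str.extract(pattern, flags=re.IGNORECASE)

print(strict)
print(relaxed)
```

In both results, 'c3' yields NaN in every column, since 'c' is outside the character class regardless of case.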