pandas.Series.str.extractall
Series.str.extractall(pat, flags=0)
For each subject string in the Series, extract groups from all matches of regular expression pat. When each subject string in the Series has exactly one match, extractall(pat).xs(0, level='match') is the same as extract(pat).
New in version 0.18.0.
Parameters: pat : string
Regular expression pattern with capturing groups
flags : int, default 0 (no flags)
re module flags, e.g. re.IGNORECASE
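As a minimal sketch of how flags changes the result, the snippet below runs the same pattern with and without re.IGNORECASE. The sample Series, its index labels, and the variable names strict and relaxed are illustrative assumptions, not part of the original documentation.

```python
import re

import pandas as pd

# Illustrative data: mixed-case letters followed by digits.
s = pd.Series(["a1A2", "B1", "c1"], index=["X", "Y", "Z"])

# Default flags=0: only a lowercase "a" or "b" followed by a digit matches.
strict = s.str.extractall(r"[ab](\d)")

# re.IGNORECASE: uppercase "A" and "B" are matched as well.
relaxed = s.str.extractall(r"[ab](\d)", flags=re.IGNORECASE)

print(strict)
print(relaxed)
```

With the default flags, only subject "X" contributes a row; with re.IGNORECASE, "X" contributes two rows and "Y" one. Extracted values are strings, not integers.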
Returns: A DataFrame with one row for each match, and one column for each group. Its rows have a MultiIndex whose first levels come from the index of the subject Series. The last level is named 'match' and numbers the matches within each subject string, starting at 0. Any capture group names in regular expression pat will be used for column names; otherwise capture group numbers will be used.
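The shape of the returned DataFrame can be inspected directly. The following is a small sketch under assumed sample data; the names result and digits_for_a are hypothetical and chosen only for illustration.

```python
import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
result = s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")

# The last index level is named "match"; earlier levels come from s.index.
print(result.index.names)

# Named capture groups become the column names.
print(list(result.columns))

# Position of each match within its subject string, starting at 0.
print(result.index.get_level_values("match").tolist())

# Select all matches for one subject by its original index label.
digits_for_a = result.loc["A", "digit"]
print(digits_for_a.tolist())
```

Selecting on the first index level, as in result.loc["A", "digit"], is one way to collect every match that came from a single subject string.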
See also
extract : returns first match only (not all matches)
Examples
A pattern with one group will return a DataFrame with one column. Indices with no matches will not appear in the result.
>>> from pandas import Series
>>> s = Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
>>> s.str.extractall(r"[ab](\d)")
         0
  match
A 0      1
  1      2
B 0      1
Capture group names are used for column names of the result.
>>> s.str.extractall(r"[ab](?P<digit>\d)")
        digit
  match
A 0         1
  1         2
B 0         1
A pattern with two groups will return a DataFrame with two columns.
>>> s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")
        letter digit
  match
A 0          a     1
  1          a     2
B 0          b     1
Optional groups that do not match are NaN in the result.
>>> s.str.extractall(r"(?P<letter>[ab])?(?P<digit>\d)")
        letter digit
  match
A 0          a     1
  1          a     2
B 0          b     1
C 0        NaN     1
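The relationship to extract stated in the description can be checked on a Series where every subject string has exactly one match. This is a sketch under assumed data; the variable names are hypothetical, and expand=True is passed explicitly so that extract returns a DataFrame regardless of the pandas version's default.

```python
import pandas as pd

# Every subject string contains exactly one letter-digit pair.
s = pd.Series(["a1", "b2", "c3"], index=["A", "B", "C"])
pat = r"(?P<letter>[abc])(?P<digit>\d)"

# Keep only match number 0 and drop the "match" index level.
first_of_all = s.str.extractall(pat).xs(0, level="match")

# extract returns the first match of each subject string.
first_only = s.str.extract(pat, expand=True)

print(first_of_all)
print(first_only)
```

Both frames are indexed by the original labels and hold the same letter and digit values. If any subject string had no match, extract would keep its row filled with NaN, while extractall would omit it.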