pandas.Series.str.split

Series.str.split(pat=None, n=-1, expand=False)

Split strings around given separator/delimiter.

Split each string in the caller’s values by the given pattern, propagating NaN values. Equivalent to applying str.split() to each element.
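As a minimal sketch of the element-wise behaviour described above, the snippet below compares the result with the built-in str.split applied to each string, and checks that a missing value stays missing. The sample strings and variable names are illustrative only.

```python
import pandas as pd

values = ["this is good text", "but this is even better"]
s = pd.Series(values)

# Element-wise result from pandas, converted to plain lists.
from_pandas = [list(parts) for parts in s.str.split()]

# The same operation with the built-in str.split on each string.
from_builtin = [v.split() for v in values]

print(from_pandas == from_builtin)

# A missing value is propagated rather than raising an error.
with_missing = pd.Series(["a b", None]).str.split()
print(pd.isna(with_missing[1]))
```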

Parameters:

pat : str, optional

String or regular expression to split on. If not specified, split on whitespace.

n : int, default -1 (all)

Limit the number of splits in the output. None, 0 and -1 are all interpreted as "return all splits".

expand : bool, default False

Expand the split strings into separate columns.

If True, return DataFrame/MultiIndex expanding dimensionality.

If False, return Series/Index containing lists of strings.

Returns:

Series, Index, DataFrame or MultiIndex

Type matches caller unless expand=True (see Notes).
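The return types can be checked directly. The following is a small sketch with made-up sample data; it only inspects the type of each result and assumes nothing beyond the signature documented here.

```python
import pandas as pd

s = pd.Series(["a b", "c d"])
idx = pd.Index(["a b", "c d"])

# Without expand, the result has the same kind of container as the caller.
series_result = s.str.split()
index_result = idx.str.split()

# With expand=True, the dimensionality grows by one.
frame_result = s.str.split(expand=True)
multi_result = idx.str.split(expand=True)

for name, obj in [
    ("Series, expand=False", series_result),
    ("Index, expand=False", index_result),
    ("Series, expand=True", frame_result),
    ("Index, expand=True", multi_result),
]:
    print(name, "->", type(obj).__name__)
```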

See also

str.split: Standard library version of this method.
Series.str.get_dummies: Split each string into dummy variables.
Series.str.partition: Split string on a separator, returning the before, separator, and after components.

Notes

The handling of the n keyword depends on the number of found splits:

If found splits > n, make only the first n splits.
If found splits <= n, make all splits.
If expand=True and a row has fewer than n found splits, append None as padding up to n.

If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.
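The handling of n described in these notes can be sketched with rows that contain different numbers of separators. The data is hypothetical, and n is passed by keyword so the call does not depend on argument order.

```python
import pandas as pd

s = pd.Series(["a b c d", "a b", "a"])

# n=2 makes at most two splits per row, so at most three pieces.
limited = s.str.split(" ", n=2)
limited_lists = [list(parts) for parts in limited]
print(limited_lists)

# With expand=True, rows with fewer pieces are padded with missing
# values so that every row has the same number of columns.
wide = s.str.split(" ", n=2, expand=True)
print(wide.shape)
print(int(wide.isna().sum().sum()))
```

Here the first row has more than n separators and is cut off after two splits, while the other two rows are split completely and then padded in the expanded result.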

Examples

>>> s = pd.Series(["this is good text", "but this is even better"])

By default, split returns an object of the same size whose elements are lists of the split pieces.

>>> s.str.split()
0           [this, is, good, text]
1    [but, this, is, even, better]
dtype: object
>>> s.str.split("random")
0          [this is good text]
1    [but this is even better]
dtype: object

When using expand=True, the split elements will expand out into separate columns.

For a Series, the return type is DataFrame.

>>> s.str.split(expand=True)
      0     1     2     3       4
0  this    is  good  text    None
1   but  this    is  even  better
>>> s.str.split(" is ", expand=True)
          0            1
0      this    good text
1  but this  even better

For an Index, the return type is MultiIndex.

>>> i = pd.Index(["ba 100 001", "ba 101 002", "ba 102 003"])
>>> i.str.split(expand=True)
MultiIndex(levels=[['ba'], ['100', '101', '102'], ['001', '002', '003']],
       labels=[[0, 0, 0], [0, 1, 2], [0, 1, 2]])

Parameter n can be used to limit the number of splits in the output.

>>> s.str.split("is", n=1)
0          [th,  is good text]
1    [but th,  is even better]
dtype: object
>>> s.str.split("is", n=1, expand=True)
        0                1
0      th     is good text
1  but th   is even better

If NaN is present, it is propagated throughout the columns during the split.

>>> s = pd.Series(["this is good text", "but this is even better", np.nan])
>>> s.str.split(n=3, expand=True)
      0     1     2            3
0  this    is  good         text
1   but  this    is  even better
2   NaN   NaN   NaN          NaN
