pandas.Series.str.split
Series.str.split(pat=None, n=-1, expand=False)

Split strings around given separator/delimiter.
Split each string in the caller’s values by the given pattern, propagating NaN values. Equivalent to str.split().

Parameters:
pat : str, optional
String or regular expression to split on. If not specified, split on whitespace.
n : int, default -1 (all)
Limit the number of splits in the output. None, 0 and -1 are all interpreted as "return all splits".
expand : bool, default False
Expand the split strings into separate columns.
- If True, return DataFrame/MultiIndex expanding dimensionality.
- If False, return Series/Index containing lists of strings.
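A minimal sketch of how the parameters above interact, assuming a pandas version in which a pat longer than one character is treated as a regular expression and None, 0 and -1 for n all mean "no limit". The sample strings are invented for illustration.

```python
import pandas as pd

s = pd.Series(["a,b;c", "d;e"])

# A multi-character pattern such as "[,;]" is treated as a regular
# expression, so both commas and semicolons act as separators.
by_regex = s.str.split("[,;]")
print(by_regex.tolist())

# None, 0 and -1 for n all mean "return all splits".
words = pd.Series(["one two three"])
for n in (None, 0, -1):
    print(n, words.str.split(n=n).tolist())

# expand=False (the default) keeps one list per row in a Series;
# expand=True spreads the pieces over DataFrame columns.
print(type(words.str.split()).__name__)              # Series
print(type(words.str.split(expand=True)).__name__)   # DataFrame
```

A single-character pat such as "," is split on literally.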
Returns: Series, Index, DataFrame or MultiIndex
Type matches caller unless expand=True (see Notes).

See also
str.split : Standard library version of this method.
Series.str.get_dummies : Split each string into dummy variables.
Series.str.partition : Split string on a separator, returning the before, separator, and after components.
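The difference between split and Series.str.partition can be seen in a small sketch; the strings are hypothetical. Both cut at the first "=" here, but partition keeps the separator as its own column and always returns three columns.

```python
import pandas as pd

s = pd.Series(["key=value=more", "plain"])

# Series.str.partition splits at the first occurrence of sep and always
# returns three columns: before, the separator itself, and after.
parts = s.str.partition("=")
print(parts)

# split with n=1 and expand=True also cuts at the first occurrence,
# but the separator is dropped, so there are only two columns.
halves = s.str.split("=", n=1, expand=True)
print(halves)
```

For "plain", partition fills the missing pieces with empty strings, while split pads the second column with a missing value.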
Notes
The handling of the n keyword depends on the number of found splits:
- If found splits > n, make only the first n splits.
- If found splits <= n, make all splits.
- If, for a certain row, the number of found splits < n, append None for padding up to n if expand=True.
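The rules above can be checked with a short sketch on made-up data. The padding value may be shown as None or NaN depending on the pandas version and dtype, so the check uses pd.isna rather than comparing to None directly.

```python
import pandas as pd

s = pd.Series(["a b c d", "e"])

# With n=2, the first row has more than 2 separators, so only the first
# two splits are made ("c d" stays together). The second row has no
# separator at all, so it is padded with missing values when expand=True.
df = s.str.split(n=2, expand=True)
print(df)

# Without expand, no padding happens: each row just holds its own list.
print([len(parts) for parts in s.str.split(n=2)])
```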
If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.

Examples
>>> s = pd.Series(["this is good text", "but this is even better"])
By default, split returns an object of the same size containing lists of the split elements.
>>> s.str.split()
0           [this, is, good, text]
1    [but, this, is, even, better]
dtype: object
>>> s.str.split("random")
0          [this is good text]
1    [but this is even better]
dtype: object
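To make the "Equivalent to str.split()" statement concrete, here is a minimal sketch comparing the accessor with a list comprehension over the standard library method; the NaN entry shows the propagation. The comparison loop is illustrative only and is not how pandas implements the method.

```python
import numpy as np
import pandas as pd

values = ["this is good text", "but this is even better", np.nan]
s = pd.Series(values)

# Element-wise equivalent using the standard library's str.split:
# strings are split, missing values are passed through unchanged.
manual = [v.split() if isinstance(v, str) else v for v in values]

result = s.str.split()
for got, expected in zip(result, manual):
    if isinstance(expected, list):
        assert list(got) == expected
    else:
        assert pd.isna(got)  # NaN propagated rather than split
```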
When using expand=True, the split elements expand out into separate columns. For a Series object, the output type is DataFrame.
>>> s.str.split(expand=True)
      0     1     2     3       4
0  this    is  good  text    None
1   but  this    is  even  better
>>> s.str.split(" is ", expand=True)
          0            1
0      this    good text
1  but this  even better
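A common use of the expanded DataFrame is assigning its columns back into an existing table. The sketch below assumes a reasonably recent pandas; the "name", "first" and "rest" columns are hypothetical.

```python
import pandas as pd

# Hypothetical table with a single "name" column.
df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Mathison Turing"]})

# n=1 guarantees at most two pieces per row, so the expanded result
# has exactly two columns that can be assigned in one step.
df[["first", "rest"]] = df["name"].str.split(n=1, expand=True)
print(df)
```

Limiting n here keeps the column count fixed even when some names contain more than one space.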
For an Index object, the output type is MultiIndex.
>>> i = pd.Index(["ba 100 001", "ba 101 002", "ba 102 003"])
>>> i.str.split(expand=True)
MultiIndex(levels=[['ba'], ['100', '101', '102'], ['001', '002', '003']],
           labels=[[0, 0, 0], [0, 1, 2], [0, 1, 2]])
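A short sketch of working with the resulting MultiIndex: each split position becomes one level, which can be read back with get_level_values. The printed repr of the MultiIndex itself differs between pandas versions, so the sketch inspects levels instead.

```python
import pandas as pd

i = pd.Index(["ba 100 001", "ba 101 002", "ba 102 003"])
mi = i.str.split(expand=True)

# Each split position becomes one level of the MultiIndex.
print(mi.nlevels)                         # 3
print(mi.get_level_values(1).tolist())    # ['100', '101', '102']

# Each entry is a tuple holding one piece per level.
print(mi.tolist()[0])                     # ('ba', '100', '001')
```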
Parameter n can be used to limit the number of splits in the output.
>>> s.str.split("is", n=1)
0          [th,  is good text]
1    [but th,  is even better]
dtype: object
>>> s.str.split("is", n=1, expand=True)
        0                1
0      th     is good text
1  but th   is even better
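As a sketch of how n caps the output, the loop below counts the pieces per row for increasing n on the example Series. n splits produce at most n + 1 pieces; a row with fewer separators than n is simply split completely.

```python
import pandas as pd

s = pd.Series(["this is good text", "but this is even better"])

# n splits produce at most n + 1 pieces per row; once n reaches the
# number of separators in a row, that row is fully split.
for n in range(1, 5):
    lengths = [len(parts) for parts in s.str.split(n=n)]
    print(n, lengths)
```

The first row has only three spaces, so it stops growing at four pieces while the second row reaches five.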
If NaN is present, it is propagated throughout the columns during the split.
>>> s = pd.Series(["this is good text", "but this is even better", np.nan])
>>> s.str.split(n=3, expand=True)
      0     1     2            3
0  this    is  good         text
1   but  this    is  even better
2   NaN   NaN   NaN          NaN
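A minimal sketch of handling the propagated NaN, using the same Series: without expand the missing entry stays a scalar NaN rather than a list, and with expand=True the entirely missing row can be dropped with dropna(how="all"). Dropping is just one possible choice; filling is another.

```python
import numpy as np
import pandas as pd

s = pd.Series(["this is good text", "but this is even better", np.nan])

# Without expand, the missing value stays a scalar NaN instead of a list.
lists = s.str.split()
print(lists.isna().tolist())

# With expand=True the whole row is missing, so rows that could not be
# split are easy to drop before further processing.
cols = s.str.split(n=3, expand=True)
clean = cols.dropna(how="all")
print(clean.shape)
```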