pandas.get_dummies#

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)[source]#

Convert categorical variable into dummy/indicator variables.

Each variable is converted in as many 0/1 variables as there are different values. Columns in the output are each named after a value; if the input is a DataFrame, the name of the original variable is prepended to the value.

Parameters:
dataarray-like, Series, or DataFrame

Data of which to get dummy indicators.

prefixstr, list of str, or dict of str, default None

String to append DataFrame column names. Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Alternatively, prefix can be a dictionary mapping column names to prefixes.

prefix_sepstr, default ‘_’

If appending prefix, separator/delimiter to use. Or pass a list or dictionary as with prefix.

dummy_nabool, default False

Add a column to indicate NaNs, if False NaNs are ignored.

columnslist-like, default None

Column names in the DataFrame to be encoded. If columns is None then all the columns with object, string, or category dtype will be converted.

sparsebool, default False

Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).

drop_firstbool, default False

Whether to get k-1 dummies out of k categorical levels by removing the first level.

dtypedtype, default bool

Data type for new columns. Only a single dtype is allowed.

Returns:
DataFrame

Dummy-coded data. If data contains other columns than the dummy-coded one(s), these will be prepended, unaltered, to the result.

See also

Series.str.get_dummies

Convert Series of strings to dummy codes.

from_dummies()

Convert dummy codes to categorical DataFrame.

Notes

Reference the user guide for more examples.

Examples

>>> s = pd.Series(list('abca'))
>>> pd.get_dummies(s)
       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True
3   True  False  False
>>> s1 = ['a', 'b', np.nan]
>>> pd.get_dummies(s1)
       a      b
0   True  False
1  False   True
2  False  False
>>> pd.get_dummies(s1, dummy_na=True)
       a      b    NaN
0   True  False  False
1  False   True  False
2  False  False   True
>>> df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
...                    'C': [1, 2, 3]})
>>> pd.get_dummies(df, prefix=['col1', 'col2'])
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1    True   False   False    True   False
1  2   False    True    True   False   False
2  3    True   False   False   False    True
>>> pd.get_dummies(pd.Series(list('abcaa')))
       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True
3   True  False  False
4   True  False  False
>>> pd.get_dummies(pd.Series(list('abcaa')), drop_first=True)
       b      c
0  False  False
1   True  False
2  False   True
3  False  False
4  False  False
>>> pd.get_dummies(pd.Series(list('abc')), dtype=float)
     a    b    c
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0