pandas.crosstab#
- pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)[source]#
- Compute a simple cross tabulation of two (or more) factors. - By default, computes a frequency table of the factors unless an array of values and an aggregation function are passed. - Parameters:
- indexarray-like, Series, or list of arrays/Series
- Values to group by in the rows. 
- columnsarray-like, Series, or list of arrays/Series
- Values to group by in the columns. 
- valuesarray-like, optional
- Array of values to aggregate according to the factors. Requires aggfunc be specified. 
- rownamessequence, default None
- If passed, must match number of row arrays passed. 
- colnamessequence, default None
- If passed, must match number of column arrays passed. 
- aggfuncfunction, optional
- If specified, requires values be specified as well. 
- marginsbool, default False
- Add row/column margins (subtotals). 
- margins_namestr, default ‘All’
- Name of the row/column that will contain the totals when margins is True. 
- dropnabool, default True
- Do not include columns whose entries are all NaN. 
- normalizebool, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False
- Normalize by dividing all values by the sum of values. - If passed ‘all’ or True, will normalize over all values. 
- If passed ‘index’ will normalize over each row. 
- If passed ‘columns’ will normalize over each column. 
- If margins is True, will also normalize margin values. 
 
 
- Returns:
- DataFrame
- Cross tabulation of the data. 
 
 - See also - DataFrame.pivot
- Reshape data based on column values. 
- pivot_table
- Create a pivot table as a DataFrame. 
 - Notes - Any Series passed will have their name attributes used unless row or column names for the cross-tabulation are specified. - Any input passed containing Categorical data will have all of its categories included in the cross-tabulation, even if the actual data does not contain any instances of a particular category. - In the event that there aren’t overlapping indexes an empty DataFrame will be returned. - Reference the user guide for more examples. - Examples - >>> a = np.array( ... [ ... "foo", ... "foo", ... "foo", ... "foo", ... "bar", ... "bar", ... "bar", ... "bar", ... "foo", ... "foo", ... "foo", ... ], ... dtype=object, ... ) >>> b = np.array( ... [ ... "one", ... "one", ... "one", ... "two", ... "one", ... "one", ... "one", ... "two", ... "two", ... "two", ... "one", ... ], ... dtype=object, ... ) >>> c = np.array( ... [ ... "dull", ... "dull", ... "shiny", ... "dull", ... "dull", ... "shiny", ... "shiny", ... "dull", ... "shiny", ... "shiny", ... "shiny", ... ], ... dtype=object, ... ) >>> pd.crosstab(a, [b, c], rownames=["a"], colnames=["b", "c"]) b one two c dull shiny dull shiny a bar 1 2 1 0 foo 2 2 1 2 - Here ‘c’ and ‘f’ are not represented in the data and will not be shown in the output because dropna is True by default. Set dropna=False to preserve categories with no data. - >>> foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"]) >>> bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"]) >>> pd.crosstab(foo, bar) col_0 d e row_0 a 1 0 b 0 1 >>> pd.crosstab(foo, bar, dropna=False) col_0 d e f row_0 a 1 0 0 b 0 1 0 c 0 0 0