What’s new in 1.0.0 (January 29, 2020)#
These are the changes in pandas 1.0.0. See Release notes for a full changelog including other versions of pandas.
Note
The pandas 1.0 release removed a lot of functionality that was deprecated in previous releases (see below for an overview). It is recommended to first upgrade to pandas 0.25 and to ensure your code is working without warnings, before upgrading to pandas 1.0.
New deprecation policy#
Starting with pandas 1.0.0, pandas will adopt a variant of SemVer to version releases. Briefly,
- Deprecations will be introduced in minor releases (e.g. 1.1.0, 1.2.0, 2.1.0, …) 
- Deprecations will be enforced in major releases (e.g. 1.0.0, 2.0.0, 3.0.0, …) 
- API-breaking changes will be made only in major releases (except for experimental features) 
See Version policy for more.
Enhancements#
Using Numba in rolling.apply and expanding.apply#
We’ve added an engine keyword to Rolling.apply() and Expanding.apply()
that allows the user to execute the routine using Numba instead of Cython.
Using the Numba engine can yield significant performance gains if the apply function can operate on numpy arrays and
the data set is larger (1 million rows or greater). For more details, see
rolling apply documentation (GH 28987, GH 30936)
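As a minimal sketch of the engine keyword, the snippet below applies a small reduction over a rolling window. The function name window_range and the sample data are illustrative, not part of pandas. The Numba engine is used with raw=True so that each window is passed as a NumPy array, and the sketch falls back to the Cython engine when Numba is not installed.

```python
import numpy as np
import pandas as pd


def window_range(values):
    # With raw=True each window arrives as a NumPy array, not a Series.
    return values.max() - values.min()


# Fall back to the Cython engine when Numba is not available.
try:
    import numba  # noqa: F401
    engine = "numba"
except ImportError:
    engine = "cython"

s = pd.Series(np.arange(10, dtype="float64"))
result = s.rolling(3).apply(window_range, raw=True, engine=engine)
print(result)
```

On a series this small the compilation overhead of Numba will likely outweigh any gain; the benefit described above is expected only for large inputs.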
Defining custom windows for rolling operations#
We’ve added a pandas.api.indexers.BaseIndexer() class that allows users to define how
window bounds are created during rolling operations. Users can define their own get_window_bounds
method on a pandas.api.indexers.BaseIndexer() subclass that will generate the start and end
indices used for each window during the rolling aggregation. For more details and example usage, see
the custom window rolling documentation
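The following sketch shows one way such a subclass might look. ForwardWindowIndexer is a hypothetical name chosen for this example: it builds windows that start at the current row and look forward by window_size rows. The extra step parameter is an assumption, included only so that the sketch tolerates later pandas versions that pass it.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class ForwardWindowIndexer(BaseIndexer):
    """Hypothetical indexer: each window covers the current row and the rows after it."""

    def get_window_bounds(self, num_values=0, min_periods=None, center=None,
                          closed=None, step=None):
        # start[i] and end[i] delimit the half-open window [start[i], end[i]).
        start = np.arange(num_values, dtype="int64")
        end = np.minimum(start + self.window_size, num_values).astype("int64")
        return start, end


s = pd.Series([1, 2, 3, 4])
indexer = ForwardWindowIndexer(window_size=2)
result = s.rolling(indexer, min_periods=1).sum()
print(result)
```

Passing min_periods=1 keeps the last, shorter window from being reported as missing.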
Converting to markdown#
We’ve added to_markdown() for creating a markdown table (GH 11052)
In [1]: df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
In [2]: print(df.to_markdown())
|    |   A |   B |
|:---|----:|----:|
| a  |   1 |   1 |
| a  |   2 |   2 |
| b  |   3 |   3 |
Experimental new features#
Experimental NA scalar to denote missing values#
A new pd.NA value (singleton) is introduced to represent scalar missing
values. Up to now, pandas used several values to represent missing data: np.nan is used for this for float data, np.nan or
None for object-dtype data and pd.NaT for datetime-like data. The
goal of pd.NA is to provide a “missing” indicator that can be used
consistently across data types. pd.NA is currently used by the nullable integer and boolean
data types and the new string data type (GH 28095).
Warning
Experimental: the behaviour of pd.NA can still change without warning.
For example, creating a Series using the nullable integer dtype:
In [3]: s = pd.Series([1, 2, None], dtype="Int64")
In [4]: s
Out[4]: 
0       1
1       2
2    <NA>
Length: 3, dtype: Int64
In [5]: s[2]
Out[5]: <NA>
Compared to np.nan, pd.NA behaves differently in certain operations.
In addition to arithmetic operations, pd.NA also propagates as “missing”
or “unknown” in comparison operations:
In [6]: np.nan > 1
Out[6]: False
In [7]: pd.NA > 1
Out[7]: <NA>
For logical operations, pd.NA follows the rules of the
three-valued logic (or
Kleene logic). For example:
In [8]: pd.NA | True
Out[8]: True
For more, see NA section in the user guide on missing data.
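The three-valued rules can be checked directly on the scalar. This short sketch (variable names are illustrative) contrasts the cases where the other operand already decides the result with the cases where the result stays missing.

```python
import pandas as pd

# The result is known whenever the other operand already decides it.
or_true = pd.NA | True      # True, whatever the missing value would have been
and_false = pd.NA & False   # False, whatever the missing value would have been

# Otherwise the outcome depends on the unknown value, so it stays missing.
or_false = pd.NA | False
and_true = pd.NA & True

print(or_true, and_false, or_false, and_true)
```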
Dedicated string data type#
We’ve added StringDtype, an extension type dedicated to string data.
Previously, strings were typically stored in object-dtype NumPy arrays. (GH 29975)
Warning
StringDtype is currently considered experimental. The implementation
and parts of the API may change without warning.
The 'string' extension type solves several issues with object-dtype NumPy arrays:
- You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.
- object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn’t a clear way to select just text while excluding non-text, but still object-dtype columns.
- When reading code, the contents of an object dtype array are less clear than string.
In [9]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype())
Out[9]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string
You can use the alias "string" as well.
In [10]: s = pd.Series(['abc', None, 'def'], dtype="string")
In [11]: s
Out[11]: 
0     abc
1    <NA>
2     def
Length: 3, dtype: string
The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.
In [12]: s.str.upper()
Out[12]: 
0     ABC
1    <NA>
2     DEF
Length: 3, dtype: string
In [13]: s.str.split('b', expand=True).dtypes
Out[13]: 
0    string[python]
1    string[python]
Length: 2, dtype: object
String accessor methods returning integers will return a value with Int64Dtype
In [14]: s.str.count("a")
Out[14]: 
0       1
1    <NA>
2       0
Length: 3, dtype: Int64
We recommend explicitly using the string data type when working with strings.
See Text data types for more.
Boolean data type with missing values support#
We’ve added BooleanDtype / BooleanArray, an extension
type dedicated to boolean data that can hold missing values. With the default
bool data type, which is based on a bool-dtype NumPy array, a column can only hold
True or False, and not missing values. The new BooleanArray
can store missing values as well by keeping track of them in a separate mask.
(GH 29555, GH 30095, GH 31131)
In [15]: pd.Series([True, False, None], dtype=pd.BooleanDtype())
Out[15]: 
0     True
1    False
2     <NA>
Length: 3, dtype: boolean
You can use the alias "boolean" as well.
In [16]: s = pd.Series([True, False, None], dtype="boolean")
In [17]: s
Out[17]: 
0     True
1    False
2     <NA>
Length: 3, dtype: boolean
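As a small illustration of how missing values behave in a "boolean" Series, the sketch below combines the series with scalars and then fills the missing entry explicitly. It assumes the element-wise operators follow the same three-valued logic described for pd.NA above.

```python
import pandas as pd

s = pd.Series([True, False, None], dtype="boolean")

either = s | True          # True everywhere: the scalar decides the result
both = s & True            # the missing entry stays missing
filled = s.fillna(False)   # replace the missing entry explicitly

print(either.tolist(), both.tolist(), filled.tolist())
```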
Method convert_dtypes to ease use of supported extension dtypes#
In order to encourage use of the extension dtypes StringDtype,
BooleanDtype, Int64Dtype, Int32Dtype, etc., that support pd.NA, the
methods DataFrame.convert_dtypes() and Series.convert_dtypes()
have been introduced. (GH 29752) (GH 30929)
Example:
In [18]: df = pd.DataFrame({'x': ['abc', None, 'def'],
   ....:                    'y': [1, 2, np.nan],
   ....:                    'z': [True, False, True]})
   ....: 
In [19]: df
Out[19]: 
      x    y      z
0   abc  1.0   True
1  None  2.0  False
2   def  NaN   True
[3 rows x 3 columns]
In [20]: df.dtypes
Out[20]: 
x     object
y    float64
z       bool
Length: 3, dtype: object
In [21]: converted = df.convert_dtypes()
In [22]: converted
Out[22]: 
      x     y      z
0   abc     1   True
1  <NA>     2  False
2   def  <NA>   True
[3 rows x 3 columns]
In [23]: converted.dtypes
Out[23]: 
x    string[python]
y             Int64
z           boolean
Length: 3, dtype: object
This is especially useful after reading in data using readers such as read_csv()
and read_excel().
See here for a description.
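A minimal sketch of the reader use case: the CSV text below is made-up sample data, read from an in-memory buffer rather than a file. After read_csv() the integer column with a gap comes back as float64, and convert_dtypes() moves it to a nullable integer dtype.

```python
import io

import pandas as pd

# Made-up sample data with one missing value in "x" and one in "y".
csv_text = "x,y,z\nabc,1,True\n,2,False\ndef,,True\n"

raw = pd.read_csv(io.StringIO(csv_text))
converted = raw.convert_dtypes()

print(raw.dtypes)
print(converted.dtypes)
```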
Other enhancements#
- DataFrame.to_string() added the max_colwidth parameter to control when wide columns are truncated (GH 9784)
- Added the na_value argument to Series.to_numpy(), Index.to_numpy() and DataFrame.to_numpy() to control the value used for missing data (GH 30322)
- MultiIndex.from_product() infers level names from inputs if not explicitly provided (GH 27292)
- DataFrame.to_latex() now accepts caption and label arguments (GH 25436)
- DataFrames with nullable integer, the new string dtype and period data type can now be converted to pyarrow (>=0.15.0), which means that it is supported in writing to the Parquet file format when using the pyarrow engine (GH 28368). Full roundtrip to parquet (writing and reading back in with to_parquet() / read_parquet()) is supported starting with pyarrow >= 0.16 (GH 20612).
- to_parquet() now appropriately handles the schema argument for user defined schemas in the pyarrow engine. (GH 30270)
- DataFrame.to_json() now accepts an indent integer argument to enable pretty printing of JSON output (GH 12004)
- read_stata() can read Stata 119 dta files. (GH 28250)
- Implemented pandas.core.window.Window.var() and pandas.core.window.Window.std() functions (GH 26597)
- Added encoding argument to DataFrame.to_string() for non-ascii text (GH 28766)
- Added encoding argument to DataFrame.to_html() for non-ascii text (GH 28663)
- Styler.background_gradient() now accepts vmin and vmax arguments (GH 12145)
- Styler.format() added the na_rep parameter to help format the missing values (GH 21527, GH 28358)
- read_excel() can now read binary Excel (.xlsb) files by passing engine='pyxlsb'. For more details and example usage, see the Binary Excel files documentation. Closes GH 8540.
- The partition_cols argument in DataFrame.to_parquet() now accepts a string (GH 27117)
- pandas.read_json() now parses NaN, Infinity and -Infinity (GH 12213)
- The DataFrame constructor now preserves the ExtensionArray dtype when passed an ExtensionArray (GH 11363)
- DataFrame.sort_values() and Series.sort_values() have gained an ignore_index keyword to be able to reset index after sorting (GH 30114)
- DataFrame.sort_index() and Series.sort_index() have gained an ignore_index keyword to reset index (GH 30114)
- DataFrame.drop_duplicates() has gained an ignore_index keyword to reset index (GH 30114)
- Added new writer for exporting Stata dta files in versions 118 and 119, StataWriterUTF8. These file formats support exporting strings containing Unicode characters. Format 119 supports data sets with more than 32,767 variables (GH 23573, GH 30959)
- Series.map() now accepts collections.abc.Mapping subclasses as a mapper (GH 29733)
- Added an experimental attrs for storing global metadata about a dataset (GH 29062)
- Timestamp.fromisocalendar() is now compatible with python 3.8 and above (GH 28115)
- DataFrame.to_pickle() and read_pickle() now accept URL (GH 30163)
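A few of the enhancements listed above can be tried together in a short sketch. The data and variable names are illustrative only; the keywords shown (ignore_index, na_value, indent) and the name inference in MultiIndex.from_product() are the ones named in the list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# ignore_index=True resets the index to 0..n-1 after sorting.
sorted_df = df.sort_values("a", ignore_index=True)

# na_value controls what missing entries become in the NumPy result.
nullable = pd.Series([1, None], dtype="Int64")
as_float = nullable.to_numpy(dtype="float64", na_value=np.nan)

# indent pretty-prints the JSON output.
pretty = df.to_json(indent=2)

# Level names are inferred from the named inputs.
mi = pd.MultiIndex.from_product(
    [pd.Index([1, 2], name="x"), pd.Index(["a", "b"], name="y")]
)

print(sorted_df)
print(as_float)
print(pretty)
print(mi.names)
```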
Backwards incompatible API changes#
Avoid using names from MultiIndex.levels#
As part of a larger refactor to MultiIndex the level names are now
stored separately from the levels (GH 27242). We recommend using
MultiIndex.names to access the names, and Index.set_names()
to update the names.
For backwards compatibility, you can still access the names via the levels.
In [24]: mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])
In [25]: mi.levels[0].name
Out[25]: 'x'
However, it is no longer possible to update the names of the MultiIndex
via the level.
In [26]: mi.levels[0].name = "new name"
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[26], line 1
----> 1 mi.levels[0].name = "new name"
File ~/work/pandas/pandas/pandas/core/indexes/base.py:1675, in Index.name(self, value)
   1671 @name.setter
   1672 def name(self, value: Hashable) -> None:
   1673     if self._no_setting_name:
   1674         # Used in MultiIndex.levels to avoid silently ignoring name updates.
-> 1675         raise RuntimeError(
   1676             "Cannot set name on a level of a MultiIndex. Use "
   1677             "'MultiIndex.set_names' instead."
   1678         )
   1679     maybe_extract_name(value, None, type(self))
   1680     self._name = value
RuntimeError: Cannot set name on a level of a MultiIndex. Use 'MultiIndex.set_names' instead.
In [27]: mi.names
Out[27]: FrozenList(['x', 'y'])
To update, use MultiIndex.set_names, which returns a new MultiIndex.
In [28]: mi2 = mi.set_names("new name", level=0)
In [29]: mi2.names
Out[29]: FrozenList(['new name', 'y'])
New repr for IntervalArray#
pandas.arrays.IntervalArray adopts a new __repr__ in accordance with other array classes (GH 25022)
pandas 0.25.x
In [1]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
Out[2]:
IntervalArray([(0, 1], (2, 3]],
              closed='right',
              dtype='interval[int64]')
pandas 1.0.0
In [30]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
Out[30]: 
<IntervalArray>
[(0, 1], (2, 3]]
Length: 2, dtype: interval[int64, right]
DataFrame.rename now only accepts one positional argument#
DataFrame.rename() would previously accept positional arguments that would lead
to ambiguous or undefined behavior. From pandas 1.0, only the very first argument, which
maps labels to their new names along the default axis, is allowed to be passed by position
(GH 29136).
pandas 0.25.x
In [1]: df = pd.DataFrame([[1]])
In [2]: df.rename({0: 1}, {0: 2})
Out[2]:
FutureWarning: ...Use named arguments to resolve ambiguity...
   2
1  1
pandas 1.0.0
In [3]: df.rename({0: 1}, {0: 2})
Traceback (most recent call last):
...
TypeError: rename() takes from 1 to 2 positional arguments but 3 were given
Note that errors will now be raised when conflicting or potentially ambiguous arguments are provided.
pandas 0.25.x
In [4]: df.rename({0: 1}, index={0: 2})
Out[4]:
   0
1  1
In [5]: df.rename(mapper={0: 1}, index={0: 2})
Out[5]:
   0
2  1
pandas 1.0.0
In [6]: df.rename({0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'
In [7]: df.rename(mapper={0: 1}, index={0: 2})
Traceback (most recent call last):
...
TypeError: Cannot specify both 'mapper' and any of 'index' or 'columns'
You can still change the axis along which the first positional argument is applied by
supplying the axis keyword argument.
In [31]: df.rename({0: 1})
Out[31]: 
   0
1  1
[1 rows x 1 columns]
In [32]: df.rename({0: 1}, axis=1)
Out[32]: 
   1
0  1
[1 rows x 1 columns]
If you would like to update both the index and column labels, be sure to use the respective keywords.
In [33]: df.rename(index={0: 1}, columns={0: 2})
Out[33]: 
   2
1  1
[1 rows x 1 columns]
Extended verbose info output for DataFrame#
DataFrame.info() now shows line numbers for the columns summary (GH 17304)
pandas 0.25.x
In [1]: df = pd.DataFrame({"int_col": [1, 2, 3],
...                    "text_col": ["a", "b", "c"],
...                    "float_col": [0.0, 0.1, 0.2]})
In [2]: df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
int_col      3 non-null int64
text_col     3 non-null object
float_col    3 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 152.0+ bytes
pandas 1.0.0
In [34]: df = pd.DataFrame({"int_col": [1, 2, 3],
   ....:                    "text_col": ["a", "b", "c"],
   ....:                    "float_col": [0.0, 0.1, 0.2]})
   ....: 
In [35]: df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   int_col    3 non-null      int64  
 1   text_col   3 non-null      object 
 2   float_col  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
pandas.array() inference changes#
pandas.array() now infers pandas’ new extension types in several cases (GH 29791):
- String data (including missing values) now returns a - arrays.StringArray.
- Integer data (including missing values) now returns a - arrays.IntegerArray.
- Boolean data (including missing values) now returns the new - arrays.BooleanArray
pandas 0.25.x
In [1]: pd.array(["a", None])
Out[1]:
<PandasArray>
['a', None]
Length: 2, dtype: object
In [2]: pd.array([1, None])
Out[2]:
<PandasArray>
[1, None]
Length: 2, dtype: object
pandas 1.0.0
In [36]: pd.array(["a", None])
Out[36]: 
<StringArray>
['a', <NA>]
Length: 2, dtype: string
In [37]: pd.array([1, None])
Out[37]: 
<IntegerArray>
[1, <NA>]
Length: 2, dtype: Int64
As a reminder, you can specify the dtype to disable all inference.
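For instance, the following sketch contrasts the inferred result with an explicit object dtype. The variable names are illustrative.

```python
import pandas as pd

# Without a dtype, integer data with a missing value is inferred as nullable integer.
inferred = pd.array([1, None])

# With an explicit dtype, no inference takes place and None is kept as-is.
explicit = pd.array([1, None], dtype=object)

print(inferred.dtype, explicit.dtype)
```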
arrays.IntegerArray now uses pandas.NA#
arrays.IntegerArray now uses pandas.NA rather than
numpy.nan as its missing value marker (GH 29964).
pandas 0.25.x
In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64
In [3]: a[2]
Out[3]:
nan
pandas 1.0.0
In [38]: a = pd.array([1, 2, None], dtype="Int64")
In [39]: a
Out[39]: 
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64
In [40]: a[2]
Out[40]: <NA>
This has a few API-breaking consequences.
Converting to a NumPy ndarray
When converting to a NumPy array missing values will be pd.NA, which cannot
be converted to a float. So calling np.asarray(integer_array, dtype="float")
will now raise.
pandas 0.25.x
In [1]: np.asarray(a, dtype="float")
Out[1]:
array([ 1.,  2., nan])
pandas 1.0.0
In [41]: np.asarray(a, dtype="float")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[41], line 1
----> 1 np.asarray(a, dtype="float")
File ~/work/pandas/pandas/pandas/core/arrays/masked.py:575, in BaseMaskedArray.__array__(self, dtype)
    570 def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
    571     """
    572     the array interface, return my values
    573     We return an object array here to preserve our scalar values
    574     """
--> 575     return self.to_numpy(dtype=dtype)
File ~/work/pandas/pandas/pandas/core/arrays/masked.py:487, in BaseMaskedArray.to_numpy(self, dtype, copy, na_value)
    481 if self._hasna:
    482     if (
    483         dtype != object
    484         and not is_string_dtype(dtype)
    485         and na_value is libmissing.NA
    486     ):
--> 487         raise ValueError(
    488             f"cannot convert to '{dtype}'-dtype NumPy array "
    489             "with missing values. Specify an appropriate 'na_value' "
    490             "for this dtype."
    491         )
    492     # don't pass copy to astype -> always need a copy since we are mutating
    493     with warnings.catch_warnings():
ValueError: cannot convert to 'float64'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.
Use arrays.IntegerArray.to_numpy() with an explicit na_value instead.
In [42]: a.to_numpy(dtype="float", na_value=np.nan)
Out[42]: array([ 1.,  2., nan])
Reductions can return pd.NA
When performing a reduction such as a sum with skipna=False, the result
will now be pd.NA instead of np.nan in presence of missing values
(GH 30958).
pandas 0.25.x
In [1]: pd.Series(a).sum(skipna=False)
Out[1]:
nan
pandas 1.0.0
In [43]: pd.Series(a).sum(skipna=False)
Out[43]: <NA>
value_counts returns a nullable integer dtype
Series.value_counts() with a nullable integer dtype now returns a nullable
integer dtype for the values.
pandas 0.25.x
In [1]: pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
Out[1]:
dtype('int64')
pandas 1.0.0
In [44]: pd.Series([2, 1, 1, None], dtype="Int64").value_counts().dtype
Out[44]: Int64Dtype()
See Experimental NA scalar to denote missing values for more on the differences between pandas.NA
and numpy.nan.
arrays.IntegerArray comparisons return arrays.BooleanArray#
Comparison operations on an arrays.IntegerArray now return an
arrays.BooleanArray rather than a NumPy array (GH 29964).
pandas 0.25.x
In [1]: a = pd.array([1, 2, None], dtype="Int64")
In [2]: a
Out[2]:
<IntegerArray>
[1, 2, NaN]
Length: 3, dtype: Int64
In [3]: a > 1
Out[3]:
array([False,  True, False])
pandas 1.0.0
In [45]: a = pd.array([1, 2, None], dtype="Int64")
In [46]: a > 1
Out[46]: 
<BooleanArray>
[False, True, <NA>]
Length: 3, dtype: boolean
Note that missing values now propagate, rather than always comparing unequal
like numpy.nan. See Experimental NA scalar to denote missing values for more.
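Because the comparison result can now contain missing entries, code that uses it as a filter may want to decide explicitly how those entries are treated. The sketch below treats them as False by filling the mask first; this is one possible choice rather than a required pattern.

```python
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

mask = s > 1                      # boolean dtype, missing where s is missing
selected = s[mask.fillna(False)]  # treat "unknown" as "do not select"

print(mask)
print(selected)
```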
By default Categorical.min() now returns the minimum instead of np.nan#
When a Categorical contains np.nan,
Categorical.min() no longer returns np.nan by default (skipna=True) (GH 25303)
pandas 0.25.x
In [1]: pd.Categorical([1, 2, np.nan], ordered=True).min()
Out[1]: nan
pandas 1.0.0
In [47]: pd.Categorical([1, 2, np.nan], ordered=True).min()
Out[47]: 1
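If the missing value should still propagate into the result, skipna=False can be passed explicitly, as in this short sketch:

```python
import numpy as np
import pandas as pd

cat = pd.Categorical([1, 2, np.nan], ordered=True)

default_min = cat.min()             # missing values are skipped by default
strict_min = cat.min(skipna=False)  # the missing value propagates

print(default_min, strict_min)
```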
Default dtype of empty pandas.Series#
Initialising an empty pandas.Series without specifying a dtype now raises a DeprecationWarning
(GH 17261). The default dtype will change from float64 to object in future releases so that it is
consistent with the behaviour of DataFrame and Index.
pandas 1.0.0
In [1]: pd.Series()
Out[1]:
DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
Series([], dtype: float64)
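Passing the dtype explicitly avoids the warning and makes the result independent of the future default. A minimal sketch:

```python
import pandas as pd

# State the intended dtype instead of relying on the default.
empty_float = pd.Series(dtype="float64")
empty_object = pd.Series(dtype=object)

print(empty_float.dtype, empty_object.dtype)
```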
Result dtype inference changes for resample operations#
The rules for the result dtype in DataFrame.resample() aggregations have changed for extension types (GH 31359).
Previously, pandas would attempt to convert the result back to the original dtype, falling back to the usual
inference rules if that was not possible. Now, pandas will only return a result of the original dtype if the
scalar values in the result are instances of the extension dtype’s scalar type.
In [48]: df = pd.DataFrame({"A": ['a', 'b']}, dtype='category',
   ....:                   index=pd.date_range('2000', periods=2))
   ....: 
In [49]: df
Out[49]: 
            A
2000-01-01  a
2000-01-02  b
[2 rows x 1 columns]
pandas 0.25.x
In [1]: df.resample("2D").agg(lambda x: 'a').A.dtype
Out[1]:
CategoricalDtype(categories=['a', 'b'], ordered=False)
pandas 1.0.0
In [50]: df.resample("2D").agg(lambda x: 'a').A.dtype
Out[50]: dtype('O')
This fixes an inconsistency between resample and groupby.
This also fixes a potential bug, where the values of the result might change
depending on how the results are cast back to the original dtype.
pandas 0.25.x
In [1]: df.resample("2D").agg(lambda x: 'c')
Out[1]:
     A
0  NaN
pandas 1.0.0
In [51]: df.resample("2D").agg(lambda x: 'c')
Out[51]: 
            A
2000-01-01  c
[1 rows x 1 columns]
Increased minimum version for Python#
pandas 1.0.0 supports Python 3.6.1 and higher (GH 29212).
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated (GH 29766, GH 29723). If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| numpy | 1.13.3 | X | |
| pytz | 2015.4 | X | |
| python-dateutil | 2.6.1 | X | |
| bottleneck | 1.2.1 | | |
| numexpr | 2.6.2 | | |
| pytest (dev) | 4.0.2 | | |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed | 
|---|---|---|
| beautifulsoup4 | 4.6.0 | |
| fastparquet | 0.3.2 | X | 
| gcsfs | 0.2.2 | |
| lxml | 3.8.0 | |
| matplotlib | 2.2.2 | |
| numba | 0.46.0 | X | 
| openpyxl | 2.5.7 | X | 
| pyarrow | 0.13.0 | X | 
| pymysql | 0.7.1 | |
| pytables | 3.4.2 | |
| s3fs | 0.3.0 | X | 
| scipy | 0.19.0 | |
| sqlalchemy | 1.1.4 | |
| xarray | 0.8.2 | |
| xlrd | 1.1.0 | |
| xlsxwriter | 0.9.8 | |
| xlwt | 1.2.0 | |
See Dependencies and Optional dependencies for more.
Build changes#
pandas has added a pyproject.toml file and will no longer include
cythonized files in the source distribution uploaded to PyPI (GH 28341, GH 20775). If you’re installing
a built distribution (wheel) or via conda, this shouldn’t have any effect on you. If you’re building pandas from
source, you should no longer need to install Cython into your build environment before calling pip install pandas.
Other API changes#
- DataFrameGroupBy.transform() and SeriesGroupBy.transform() now raise on invalid operation names (GH 27489)
- pandas.api.types.infer_dtype() will now return “integer-na” for integer and np.nan mix (GH 27283)
- MultiIndex.from_arrays() will no longer infer names from arrays if names=None is explicitly provided (GH 27292)
- In order to improve tab-completion, pandas does not include most deprecated attributes when introspecting a pandas object using dir (e.g. dir(df)). To see which attributes are excluded, see an object’s _deprecations attribute, for example pd.DataFrame._deprecations (GH 28805).
- The returned dtype of unique() now matches the input dtype. (GH 27874)
- Changed the default configuration value for options.plotting.matplotlib.register_converters from True to "auto" (GH 18720). Now, pandas custom formatters will only be applied to plots created by pandas, through plot(). Previously, pandas’ formatters would be applied to all plots created after a plot(). See units registration for more.
- Series.dropna() has dropped its **kwargs argument in favor of a single how parameter. Supplying anything else than how to **kwargs raised a TypeError previously (GH 29388)
- When testing pandas, the new minimum required version of pytest is 5.0.1 (GH 29664)
- Series.str.__iter__() was deprecated and will be removed in future releases (GH 28277).
- Added <NA> to the list of default NA values for read_csv() (GH 30821)
Documentation improvements#
- Added new section on Scaling to large datasets (GH 28315). 
- Added sub-section on Query MultiIndex for HDF5 datasets (GH 28791). 
Deprecations#
- Series.item() and Index.item() have been undeprecated (GH 29250)
- Index.set_value has been deprecated. For a given index idx, array arr, value in idx of idx_val and a new value of val, idx.set_value(arr, idx_val, val) is equivalent to arr[idx.get_loc(idx_val)] = val, which should be used instead (GH 28621).
- is_extension_type() is deprecated, is_extension_array_dtype() should be used instead (GH 29457)
- eval() keyword argument “truediv” is deprecated and will be removed in a future version (GH 29812)
- DateOffset.isAnchored() and DateOffset.onOffset() are deprecated and will be removed in a future version, use DateOffset.is_anchored() and DateOffset.is_on_offset() instead (GH 30340)
- pandas.tseries.frequencies.get_offset is deprecated and will be removed in a future version, use pandas.tseries.frequencies.to_offset instead (GH 4205)
- Categorical.take_nd() and CategoricalIndex.take_nd() are deprecated, use Categorical.take() and CategoricalIndex.take() instead (GH 27745)
- The parameter numeric_only of Categorical.min() and Categorical.max() is deprecated and replaced with skipna (GH 25303)
- The parameter label in lreshape() has been deprecated and will be removed in a future version (GH 29742)
- pandas.core.index has been deprecated and will be removed in a future version, the public classes are available in the top-level namespace (GH 19711)
- pandas.json_normalize() is now exposed in the top-level namespace. Usage of json_normalize as pandas.io.json.json_normalize is now deprecated and it is recommended to use json_normalize as pandas.json_normalize() instead (GH 27586).
- The numpy argument of pandas.read_json() is deprecated (GH 28512).
- DataFrame.to_stata(), DataFrame.to_feather(), and DataFrame.to_parquet() argument “fname” is deprecated, use “path” instead (GH 23574)
- The deprecated internal attributes _start, _stop and _step of RangeIndex now raise a FutureWarning instead of a DeprecationWarning (GH 26581)
- The pandas.util.testing module has been deprecated. Use the public API in pandas.testing documented at Assertion functions (GH 16232).
- pandas.SparseArray has been deprecated. Use pandas.arrays.SparseArray (arrays.SparseArray) instead. (GH 30642)
- The parameter is_copy of Series.take() and DataFrame.take() has been deprecated and will be removed in a future version. (GH 27357)
- Support for multi-dimensional indexing (e.g. index[:, None]) on an Index is deprecated and will be removed in a future version, convert to a numpy array before indexing instead (GH 30588)
- The pandas.np submodule is now deprecated. Import numpy directly instead (GH 30296)
- The pandas.datetime class is now deprecated. Import from datetime instead (GH 30610)
- diff will raise a TypeError rather than implicitly losing the dtype of extension types in the future. Convert to the correct dtype before calling diff instead (GH 31025)
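Several of the replacements named in the list above can be exercised in one short sketch. The sample data is illustrative; the calls used (pandas.json_normalize, Index.get_loc, pandas.tseries.frequencies.to_offset, pandas.testing) are the ones the list recommends.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm  # public replacement for pandas.util.testing
from pandas.tseries.frequencies import to_offset  # replacement for get_offset

# Top-level json_normalize instead of pandas.io.json.json_normalize.
flat = pd.json_normalize([{"a": {"b": 1}}])

# Index.get_loc plus plain assignment instead of Index.set_value.
idx = pd.Index(["x", "y"])
arr = np.array([10, 20])
arr[idx.get_loc("y")] = 99

offset = to_offset("2D")

# Assertion helpers come from pandas.testing.
tm.assert_index_equal(idx, pd.Index(["x", "y"]))

print(flat.columns.tolist(), arr, offset)
```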
Selecting Columns from a Grouped DataFrame
When selecting columns from a DataFrameGroupBy object, passing individual keys (or a tuple of keys) inside single brackets is deprecated;
a list of items should be used instead (GH 23566). For example:
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": np.random.randn(8),
    "C": np.random.randn(8),
})
g = df.groupby('A')
# single key, returns SeriesGroupBy
g['B']
# tuple of single key, returns SeriesGroupBy
g[('B',)]
# tuple of multiple keys, returns DataFrameGroupBy, raises FutureWarning
g[('B', 'C')]
# multiple keys passed directly, returns DataFrameGroupBy, raises FutureWarning
# (implicitly converts the passed strings into a single tuple)
g['B', 'C']
# proper way, returns DataFrameGroupBy
g[['B', 'C']]
Removal of prior version deprecations/changes#
Removed SparseSeries and SparseDataFrame
SparseSeries, SparseDataFrame and the DataFrame.to_sparse method
have been removed (GH 28425). We recommend using a Series or
DataFrame with sparse values instead.
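A minimal sketch of the recommended replacement, using arrays.SparseArray as the values of an ordinary Series. The sample values are illustrative.

```python
import pandas as pd

# A regular Series backed by sparse values replaces SparseSeries.
sparse_values = pd.arrays.SparseArray([0, 0, 1, 0])
s = pd.Series(sparse_values)

density = s.sparse.density   # fraction of values that are actually stored
dense = s.sparse.to_dense()  # back to a regular dense Series

print(s.dtype, density)
```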
Matplotlib unit registration
Previously, pandas would register converters with matplotlib as a side effect of importing pandas (GH 18720).
This changed the output of plots made with matplotlib after pandas was imported, even if you were using
matplotlib directly rather than plot().
To use pandas formatters with a matplotlib plot, specify
In [1]: import pandas as pd
In [2]: pd.options.plotting.matplotlib.register_converters = True
Note that plots created by DataFrame.plot() and Series.plot() do register the converters
automatically. The only behavior change is when plotting a date-like object via matplotlib.pyplot.plot
or matplotlib.Axes.plot. See Custom formatters for timeseries plots for more.
Other removals
- Removed the previously deprecated keyword “index” from - read_stata(),- StataReader, and- StataReader.read(), use “index_col” instead (GH 17328)
- Removed - StataReader.datamethod, use- StataReader.read()instead (GH 9493)
- Removed - pandas.plotting._matplotlib.tsplot, use- Series.plot()instead (GH 19980)
- pandas.tseries.converter.registerhas been moved to- pandas.plotting.register_matplotlib_converters()(GH 18307)
- Series.plot()no longer accepts positional arguments, pass keyword arguments instead (GH 30003)
- DataFrame.hist()and- Series.hist()no longer allows- figsize="default", specify figure size by passinig a tuple instead (GH 30003)
- Floordiv of integer-dtyped array by - Timedeltanow raises- TypeError(GH 21036)
- TimedeltaIndexand- DatetimeIndexno longer accept non-nanosecond dtype strings like “timedelta64” or “datetime64”, use “timedelta64[ns]” and “datetime64[ns]” instead (GH 24806)
- Changed the default “skipna” argument in - pandas.api.types.infer_dtype()from- Falseto- True(GH 24050)
- Removed - Series.ixand- DataFrame.ix(GH 26438)
- Removed - Index.summary(GH 18217)
- Removed the previously deprecated keyword “fastpath” from the - Indexconstructor (GH 23110)
- Removed - Series.get_value,- Series.set_value,- DataFrame.get_value,- DataFrame.set_value(GH 17739)
- Removed - Series.compoundand- DataFrame.compound(GH 26405)
- Changed the default “inplace” argument in - DataFrame.set_index()and- Series.set_axis()from- Noneto- False(GH 27600)
- Removed - Series.cat.categorical,- Series.cat.index,- Series.cat.name(GH 24751)
- Removed the previously deprecated keyword “box” from - to_datetime()and- to_timedelta(); in addition these now always returns- DatetimeIndex,- TimedeltaIndex,- Index,- Series, or- DataFrame(GH 24486)
- to_timedelta(), Timedelta, and TimedeltaIndex no longer allow “M”, “y”, or “Y” for the “unit” argument (GH 23264)
- Removed the previously deprecated keyword “time_rule” from (non-public) offsets.generate_range, which has been moved to core.arrays._ranges.generate_range() (GH 24157)
- DataFrame.loc() or Series.loc() with listlike indexers and missing labels will no longer reindex (GH 17295)
- DataFrame.to_excel() and Series.to_excel() with non-existent columns will no longer reindex (GH 17295)
- Removed the previously deprecated keyword “join_axes” from concat(); use reindex_like on the result instead (GH 22318)
- Removed the previously deprecated keyword “by” from DataFrame.sort_index(), use DataFrame.sort_values() instead (GH 10726)
- Removed support for nested renaming in DataFrame.aggregate(), Series.aggregate(), core.groupby.DataFrameGroupBy.aggregate(), core.groupby.SeriesGroupBy.aggregate(), core.window.rolling.Rolling.aggregate() (GH 18529)
- Passing datetime64 data to TimedeltaIndex or timedelta64 data to DatetimeIndex now raises TypeError (GH 23539, GH 23937)
- Passing int64 values to DatetimeIndex and a timezone now interprets the values as nanosecond timestamps in UTC, not wall times in the given timezone (GH 24559)
- A tuple passed to DataFrame.groupby() is now exclusively treated as a single key (GH 18314)
- Removed Index.contains, use key in index instead (GH 30103)
- Addition and subtraction of int or integer-arrays is no longer allowed in Timestamp, DatetimeIndex, TimedeltaIndex, use obj + n * obj.freq instead of obj + n (GH 22535)
- Removed Series.ptp (GH 21614)
- Removed Series.from_array (GH 18258)
- Removed DataFrame.from_items (GH 18458)
- Removed DataFrame.as_matrix, Series.as_matrix (GH 18458)
- Removed Series.asobject (GH 18477)
- Removed DataFrame.as_blocks, Series.as_blocks, DataFrame.blocks, Series.blocks (GH 17656)
- pandas.Series.str.cat() now defaults to aligning others, using join='left' (GH 27611)
- pandas.Series.str.cat() does not accept list-likes within list-likes anymore (GH 27611)
- Series.where() with Categorical dtype (or DataFrame.where() with Categorical column) no longer allows setting new categories (GH 24114)
- Removed the previously deprecated keywords “start”, “end”, and “periods” from the DatetimeIndex, TimedeltaIndex, and PeriodIndex constructors; use date_range(), timedelta_range(), and period_range() instead (GH 23919)
- Removed the previously deprecated keyword “verify_integrity” from the DatetimeIndex and TimedeltaIndex constructors (GH 23919)
- Removed the previously deprecated keyword “fastpath” from pandas.core.internals.blocks.make_block (GH 19265)
- Removed the previously deprecated keyword “dtype” from Block.make_block_same_class() (GH 19434)
- Removed ExtensionArray._formatting_values. Use ExtensionArray._formatter instead. (GH 23601)
- Removed MultiIndex.to_hierarchical (GH 21613)
- Removed MultiIndex.labels, use MultiIndex.codes instead (GH 23752)
- Removed the previously deprecated keyword “labels” from the MultiIndex constructor, use “codes” instead (GH 23752)
- Removed MultiIndex.set_labels, use MultiIndex.set_codes() instead (GH 23752)
- Removed the previously deprecated keyword “labels” from MultiIndex.set_codes(), MultiIndex.copy(), MultiIndex.drop(), use “codes” instead (GH 23752)
- Removed support for legacy HDF5 formats (GH 29787)
- Passing a dtype alias (e.g. ‘datetime64[ns, UTC]’) to DatetimeTZDtype is no longer allowed, use DatetimeTZDtype.construct_from_string() instead (GH 23990)
- Removed the previously deprecated keyword “skip_footer” from read_excel(); use “skipfooter” instead (GH 18836)
- read_excel() no longer allows an integer value for the parameter usecols, instead pass a list of integers from 0 to usecols inclusive (GH 23635)
- Removed the previously deprecated keyword “convert_datetime64” from DataFrame.to_records() (GH 18902)
- Removed IntervalIndex.from_intervals in favor of the IntervalIndex constructor (GH 19263)
- Changed the default “keep_tz” argument in DatetimeIndex.to_series() from None to True (GH 23739)
- Removed api.types.is_period and api.types.is_datetimetz (GH 23917)
- Ability to read pickles containing Categorical instances created with pre-0.16 version of pandas has been removed (GH 27538)
- Removed pandas.tseries.plotting.tsplot (GH 18627)
- Removed the previously deprecated keywords “reduce” and “broadcast” from DataFrame.apply() (GH 18577)
- Removed the previously deprecated assert_raises_regex function in pandas._testing (GH 29174)
- Removed the previously deprecated FrozenNDArray class in pandas.core.indexes.frozen (GH 29335)
- Removed the previously deprecated keyword “nthreads” from read_feather(), use “use_threads” instead (GH 23053)
- Removed Index.is_lexsorted_for_tuple (GH 29305)
- Removed support for nested renaming in DataFrame.aggregate(), Series.aggregate(), core.groupby.DataFrameGroupBy.aggregate(), core.groupby.SeriesGroupBy.aggregate(), core.window.rolling.Rolling.aggregate() (GH 29608)
- Removed Series.valid; use Series.dropna() instead (GH 18800)
- Removed DataFrame.is_copy, Series.is_copy (GH 18812)
- Removed DataFrame.get_ftype_counts, Series.get_ftype_counts (GH 18243)
- Removed DataFrame.ftypes, Series.ftypes, Series.ftype (GH 26744)
- Removed Index.get_duplicates, use idx[idx.duplicated()].unique() instead (GH 20239)
- Removed Series.clip_upper, Series.clip_lower, DataFrame.clip_upper, DataFrame.clip_lower (GH 24203)
- Removed the ability to alter DatetimeIndex.freq, TimedeltaIndex.freq, or PeriodIndex.freq (GH 20772)
- Removed DatetimeIndex.offset (GH 20730)
- Removed DatetimeIndex.asobject, TimedeltaIndex.asobject, PeriodIndex.asobject, use astype(object) instead (GH 29801)
- Removed the previously deprecated keyword “order” from factorize() (GH 19751)
- Removed the previously deprecated keyword “encoding” from read_stata() and DataFrame.to_stata() (GH 21400)
- Changed the default “sort” argument in concat() from None to False (GH 20613)
- Removed the previously deprecated keyword “raise_conflict” from DataFrame.update(), use “errors” instead (GH 23585)
- Removed the previously deprecated keyword “n” from DatetimeIndex.shift(), TimedeltaIndex.shift(), PeriodIndex.shift(), use “periods” instead (GH 22458)
- Removed the previously deprecated keywords “how”, “fill_method”, and “limit” from DataFrame.resample() (GH 30139)
- Passing an integer to Series.fillna() or DataFrame.fillna() with timedelta64[ns] dtype now raises TypeError (GH 24694)
- Passing multiple axes to DataFrame.dropna() is no longer supported (GH 20995)
- Removed Series.nonzero, use to_numpy().nonzero() instead (GH 24048)
- Passing floating dtype codes to Categorical.from_codes() is no longer supported, pass codes.astype(np.int64) instead (GH 21775)
- Removed the previously deprecated keyword “pat” from Series.str.partition() and Series.str.rpartition(), use “sep” instead (GH 23767)
- Removed Series.put (GH 27106)
- Removed Series.real, Series.imag (GH 27106)
- Removed Series.to_dense, DataFrame.to_dense (GH 26684)
- Removed Index.dtype_str, use str(index.dtype) instead (GH 27106)
- Categorical.ravel() returns a Categorical instead of a ndarray (GH 27199)
- The ‘outer’ method on NumPy ufuncs, e.g. np.subtract.outer operating on Series objects is no longer supported, and will raise NotImplementedError (GH 27198)
- Removed Series.get_dtype_counts and DataFrame.get_dtype_counts (GH 27145)
- Changed the default “fill_value” argument in Categorical.take() from True to False (GH 20841)
- Changed the default value for the raw argument in Series.rolling().apply(), DataFrame.rolling().apply(), Series.expanding().apply(), and DataFrame.expanding().apply() from None to False (GH 20584)
- Removed deprecated behavior of Series.argmin() and Series.argmax(), use Series.idxmin() and Series.idxmax() for the old behavior (GH 16955)
- Passing a tz-aware datetime.datetime or Timestamp into the Timestamp constructor with the tz argument now raises a ValueError (GH 23621)
- Removed Series.base, Index.base, Categorical.base, Series.flags, Index.flags, PeriodArray.flags, Series.strides, Index.strides, Series.itemsize, Index.itemsize, Series.data, Index.data (GH 20721)
- Changed Timedelta.resolution() to match the behavior of the standard library datetime.timedelta.resolution, for the old behavior, use Timedelta.resolution_string() (GH 26839)
- Removed Timestamp.weekday_name, DatetimeIndex.weekday_name, and Series.dt.weekday_name (GH 18164)
- Removed the previously deprecated keyword “errors” in Timestamp.tz_localize(), DatetimeIndex.tz_localize(), and Series.tz_localize() (GH 22644)
- Changed the default “ordered” argument in CategoricalDtype from None to False (GH 26336)
- Series.set_axis() and DataFrame.set_axis() now require “labels” as the first argument and “axis” as an optional named parameter (GH 30089)
- Removed to_msgpack, read_msgpack, DataFrame.to_msgpack, Series.to_msgpack (GH 27103)
- Removed Series.compress (GH 21930)
- Removed the previously deprecated keyword “fill_value” from Categorical.fillna(), use “value” instead (GH 19269)
- Removed the previously deprecated keyword “data” from andrews_curves(), use “frame” instead (GH 6956)
- Removed the previously deprecated keyword “data” from parallel_coordinates(), use “frame” instead (GH 6956)
- Removed the previously deprecated keyword “colors” from parallel_coordinates(), use “color” instead (GH 6956)
- Removed the previously deprecated keywords “verbose” and “private_key” from read_gbq() (GH 30200)
- Calling np.array and np.asarray on tz-aware Series and DatetimeIndex will now return an object array of tz-aware Timestamp (GH 24596)
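Several of the removals above name a replacement, and others have a well-known equivalent. The sketch below collects a few of them on a small, made-up frame; the data and variable names are illustrative assumptions, not part of the release notes.

```python
import pandas as pd

# Illustrative data; labels and values are assumptions for this sketch.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]}, index=["x", "y", "z"])

# .ix was removed: use .loc for label-based and .iloc for positional access.
by_label = df.loc["y", "A"]
by_position = df.iloc[1, 0]

# get_value / set_value were removed: the .at accessor covers scalar access.
df.at["z", "B"] = 60.0
scalar = df.at["z", "B"]

# as_matrix was removed: to_numpy() returns the underlying values as an ndarray.
matrix = df.to_numpy()

# Index.contains was removed: use the `in` operator.
has_x = "x" in df.index

# Index.get_duplicates was removed: the notes suggest this expression instead.
idx = pd.Index(["a", "b", "a", "c", "b"])
duplicates = idx[idx.duplicated()].unique()

# Series.nonzero was removed: go through to_numpy() first.
ser = pd.Series([0, 3, 0, 7])
nonzero_positions = ser.to_numpy().nonzero()[0]
```

Running code like this under pandas 0.25 first, as recommended at the top of these notes, surfaces the corresponding deprecation warnings before the upgrade.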
Performance improvements#
- Performance improvement in DataFrame arithmetic and comparison operations with scalars (GH 24990, GH 29853)
- Performance improvement in indexing with a non-unique IntervalIndex (GH 27489)
- Performance improvement in MultiIndex.is_monotonic (GH 27495)
- Performance improvement in cut() when bins is an IntervalIndex (GH 27668)
- Performance improvement when initializing a DataFrame using a range (GH 30171)
- Performance improvement in DataFrame.corr() when method is "spearman" (GH 28139)
- Performance improvement in DataFrame.replace() when provided a list of values to replace (GH 28099)
- Performance improvement in DataFrame.select_dtypes() by using vectorization instead of iterating over a loop (GH 28317)
- Performance improvement in Categorical.searchsorted() and CategoricalIndex.searchsorted() (GH 28795)
- Performance improvement when comparing a Categorical with a scalar and the scalar is not found in the categories (GH 29750)
- Performance improvement when checking whether values in a Categorical are equal to, greater than or equal to, or greater than a given scalar. The improvement is not present when checking whether the Categorical is less than, or less than or equal to, the scalar (GH 29820)
- Performance improvement in Index.equals() and MultiIndex.equals() (GH 29134)
- Performance improvement in infer_dtype() when skipna is True (GH 28814)
Bug fixes#
Categorical#
- Added a test asserting that fillna() raises the correct ValueError message when the value isn’t a value from categories (GH 13628)
- Bug in Categorical.astype() where NaN values were handled incorrectly when casting to int (GH 28406)
- DataFrame.reindex() with a CategoricalIndex would fail when the targets contained duplicates, and wouldn’t fail if the source contained duplicates (GH 28107)
- Bug in Categorical.astype() not allowing for casting to extension dtypes (GH 28668)
- Bug where merge() was unable to join on categorical and extension dtype columns (GH 28668)
- Categorical.searchsorted() and CategoricalIndex.searchsorted() now work on unordered categoricals also (GH 21667)
- Added a test asserting that roundtripping to parquet with DataFrame.to_parquet() or read_parquet() will preserve Categorical dtypes for string types (GH 27955)
- Changed the error message in Categorical.remove_categories() to always show the invalid removals as a set (GH 28669)
- Using date accessors on a categorical dtyped Series of datetimes was not returning an object of the same type as if one used the str.() / dt.() on a Series of that type. E.g. when accessing Series.dt.tz_localize() on a Categorical with duplicate entries, the accessor was skipping duplicates (GH 27952)
- Bug in DataFrame.replace() and Series.replace() that would give incorrect results on categorical data (GH 26988)
- Bug where calling Categorical.min() or Categorical.max() on an empty Categorical would raise a numpy exception (GH 30227)
- The following methods now also correctly output values for unobserved categories when called through groupby(..., observed=False) (GH 17605)
  - core.groupby.SeriesGroupBy.count()
  - core.groupby.SeriesGroupBy.size()
  - core.groupby.SeriesGroupBy.nunique()
  - core.groupby.SeriesGroupBy.nth()
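As a rough illustration of the last item, the sketch below groups by a categorical key that declares a category ("c") with no rows. The frame and column names are assumptions made up for this example.

```python
import pandas as pd

# "c" is a declared category that never occurs in the data (assumed example).
key = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"])
df = pd.DataFrame({"key": key, "val": [1, 2, 3]})

grouped = df.groupby("key", observed=False)["val"]

# With observed=False, the unobserved category appears in the result
# with a zero count or size rather than being dropped.
counts = grouped.count()
sizes = grouped.size()
```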
Datetimelike#
- Bug in Series.__setitem__() incorrectly casting np.timedelta64("NaT") to np.datetime64("NaT") when inserting into a Series with datetime64 dtype (GH 27311)
- Bug in Series.dt() property lookups when the underlying data is read-only (GH 27529)
- Bug in HDFStore.__getitem__ incorrectly reading tz attribute created in Python 2 (GH 26443)
- Bug in to_datetime() where passing arrays of malformed str with errors="coerce" could incorrectly lead to raising ValueError (GH 28299)
- Bug in core.groupby.SeriesGroupBy.nunique() where NaT values were interfering with the count of unique values (GH 27951)
- Bug in Timestamp subtraction when subtracting a Timestamp from a np.datetime64 object incorrectly raising TypeError (GH 28286)
- Addition and subtraction of integer or integer-dtype arrays with Timestamp will now raise NullFrequencyError instead of ValueError (GH 28268)
- Bug in Series and DataFrame with integer dtype failing to raise TypeError when adding or subtracting a np.datetime64 object (GH 28080)
- Bug in Series.astype(), Index.astype(), and DataFrame.astype() failing to handle NaT when casting to an integer dtype (GH 28492)
- Bug in Week with weekday incorrectly raising AttributeError instead of TypeError when adding or subtracting an invalid type (GH 28530)
- Bug in DataFrame arithmetic operations when operating with a Series with dtype 'timedelta64[ns]' (GH 28049)
- Bug in core.groupby.generic.SeriesGroupBy.apply() raising ValueError when a column in the original DataFrame is a datetime and the column labels are not standard integers (GH 28247)
- Bug in pandas._config.localization.get_locales() where the locales -a encodes the locales list as windows-1252 (GH 23638, GH 24760, GH 27368)
- Bug in Series.var() failing to raise TypeError when called with timedelta64[ns] dtype (GH 28289)
- Bug in DatetimeIndex.strftime() and Series.dt.strftime() where NaT was converted to the string 'NaT' instead of np.nan (GH 29578)
- Bug in masking datetime-like arrays with a boolean mask of an incorrect length not raising an IndexError (GH 30308)
- Bug in Timestamp.resolution being a property instead of a class attribute (GH 29910)
- Bug in pandas.to_datetime() when called with None raising TypeError instead of returning NaT (GH 30011)
- Bug in pandas.to_datetime() failing for deques when using cache=True (the default) (GH 29403)
- Bug in Series.item() with datetime64 or timedelta64 dtype, DatetimeIndex.item(), and TimedeltaIndex.item() returning an integer instead of a Timestamp or Timedelta (GH 30175)
- Bug in DatetimeIndex addition when adding a non-optimized DateOffset incorrectly dropping timezone information (GH 30336)
- Bug in DataFrame.drop() where attempting to drop non-existent values from a DatetimeIndex would yield a confusing error message (GH 30399)
- Bug in DataFrame.append() where the timezone-awareness of new data was removed (GH 30238)
- Bug in Series.cummin() and Series.cummax() with timezone-aware dtype incorrectly dropping its timezone (GH 15553)
- Bug in DatetimeArray, TimedeltaArray, and PeriodArray where inplace addition and subtraction did not actually operate inplace (GH 24115)
- Bug in pandas.to_datetime() when called with Series storing IntegerArray raising TypeError instead of returning Series (GH 30050)
- Bug in date_range() with custom business hours as freq and given number of periods (GH 30593)
- Bug in PeriodIndex comparisons incorrectly casting integers to Period objects, inconsistent with the Period comparison behavior (GH 30722)
- Bug in DatetimeIndex.insert() raising a ValueError instead of a TypeError when trying to insert a timezone-aware Timestamp into a timezone-naive DatetimeIndex, or vice-versa (GH 30806)
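Two of the fixes above are easy to observe directly: strftime() on missing timestamps, and item() on datetime data. The following is a minimal sketch with made-up dates, assuming the fixed behavior described in the notes.

```python
import pandas as pd

# One valid date and one missing value (illustrative data).
ser = pd.Series(pd.to_datetime(["2020-01-29", None]))

# NaT is formatted as a missing value, not as the literal string 'NaT'.
formatted = ser.dt.strftime("%Y-%m-%d")

# item() on datetime64 data gives back a Timestamp rather than an integer.
single = pd.Series(pd.to_datetime(["2020-01-29"])).item()
```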
Timedelta#
- Bug in subtracting a TimedeltaIndex or TimedeltaArray from a np.datetime64 object (GH 29558)
Timezones#
Numeric#
- Bug in DataFrame.quantile() with zero-column DataFrame incorrectly raising (GH 23925)
- Bug in DataFrame flex inequality comparison methods (DataFrame.lt(), DataFrame.le(), DataFrame.gt(), DataFrame.ge()) with object-dtype and complex entries failing to raise TypeError like their Series counterparts (GH 28079)
- Bug in DataFrame logical operations (&, |, ^) not matching Series behavior by filling NA values (GH 28741)
- Bug in DataFrame.interpolate() where specifying axis by name references variable before it is assigned (GH 29142)
- Bug in Series.var() not computing the right value with a nullable integer dtype series because the ddof argument was not passed through (GH 29128)
- Improved error message when using frac > 1 and replace=False (GH 27451)
- Bug in numeric indexes that made it possible to instantiate an Int64Index, UInt64Index, or Float64Index with an invalid dtype (e.g. datetime-like) (GH 29539)
- Bug in UInt64Index precision loss while constructing from a list with values in the np.uint64 range (GH 29526)
- Bug in NumericIndex construction that caused indexing to fail when integers in the np.uint64 range were used (GH 28023)
- Bug in NumericIndex construction that caused UInt64Index to be cast to Float64Index when integers in the np.uint64 range were used to index a DataFrame (GH 28279)
- Bug in Series.interpolate() where using method="index" with an unsorted index would return incorrect results (GH 21037)
- Bug in DataFrame.round() where a DataFrame with a CategoricalIndex of IntervalIndex columns would incorrectly raise a TypeError (GH 30063)
- Bug in Series.pct_change() and DataFrame.pct_change() when there are duplicated indices (GH 30463)
- Bug in DataFrame cumulative operations (e.g. cumsum, cummax) incorrectly casting to object-dtype (GH 19296)
- Bug in DataFrame.diff raising an IndexError when one of the columns was a nullable integer dtype (GH 30967)
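For the Series.var() fix with nullable integers, a small worked case shows what honoring ddof means. The values below are an assumption chosen so the arithmetic is easy to check by hand: the mean is 2.5 and the squared deviations sum to 5.

```python
import pandas as pd

# Nullable integer data (illustrative values).
ser = pd.Series([1, 2, 3, 4], dtype="Int64")

# ddof=0 divides the summed squared deviations by n: 5 / 4.
population_var = ser.var(ddof=0)

# ddof=1 (the default) divides by n - 1: 5 / 3.
sample_var = ser.var(ddof=1)
```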
Conversion#
Strings#
- Calling Series.str.isalnum() (and other “ismethods”) on an empty Series would return an object dtype instead of bool (GH 29624)
Interval#
- Bug in IntervalIndex.get_indexer() where a Categorical or CategoricalIndex target would incorrectly raise a TypeError (GH 30063)
- Bug in pandas.core.dtypes.cast.infer_dtype_from_scalar where passing pandas_dtype=True did not infer IntervalDtype (GH 30337)
- Bug in Series constructor where constructing a Series from a list of Interval objects resulted in object dtype instead of IntervalDtype (GH 23563)
- Bug in IntervalDtype where the kind attribute was incorrectly set as None instead of "O" (GH 30568)
- Bug in IntervalIndex, IntervalArray, and Series with interval data where equality comparisons were incorrect (GH 24112)
Indexing#
- Bug in assignment using a reverse slicer (GH 26939) 
- Bug in DataFrame.explode() where the frame would be duplicated in the presence of duplicates in the index (GH 28010)
- Bug in reindexing a PeriodIndex() with another type of index that contained a Period (GH 28323, GH 28337)
- Fix assignment of column via .loc with numpy non-ns datetime type (GH 27395)
- Bug in Float64Index.astype() where np.inf was not handled properly when casting to an integer dtype (GH 28475)
- Index.union() could fail when the left contained duplicates (GH 28257)
- Bug where indexing with .loc did not work when the index was a CategoricalIndex with non-string categories (GH 17569, GH 30225)
- Index.get_indexer_non_unique() could fail with TypeError in some cases, such as when searching for ints in a string index (GH 28257)
- Bug in Float64Index.get_loc() incorrectly raising TypeError instead of KeyError (GH 29189)
- Bug in DataFrame.loc() with incorrect dtype when setting Categorical value in 1-row DataFrame (GH 25495)
- Bug where MultiIndex.get_loc() could not find missing values when the input included missing values (GH 19132)
- Bug in Series.__setitem__() incorrectly assigning values with boolean indexer when the length of new data matches the number of True values and new data is not a Series or an np.array (GH 30567)
- Bug in indexing with a PeriodIndex incorrectly accepting integers representing years, use e.g. ser.loc["2007"] instead of ser.loc[2007] (GH 30763)
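The last two items can be sketched as follows. The series, index range, and values are assumptions for illustration: a monthly PeriodIndex is selected by year using a string, and a boolean indexer is assigned a plain list whose length matches the number of True entries.

```python
import pandas as pd

# Three years of monthly periods, starting January 2006 (illustrative data).
ser = pd.Series(range(36), index=pd.period_range("2006-01", periods=36, freq="M"))

# Select a whole year with a string label, as the notes recommend,
# rather than with the integer 2007.
year_2007 = ser.loc["2007"]

# Boolean-indexer assignment from a plain list with one value per True entry.
values = pd.Series([1, 2, 3])
values[[True, False, True]] = [10, 30]
```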
Missing#
MultiIndex#
- Constructor for MultiIndex verifies that the given sortorder is compatible with the actual lexsort_depth if verify_integrity parameter is True (the default) (GH 28735)
- Series and MultiIndex .drop with a MultiIndex now raise an exception if the labels are not found in the given level (GH 8594)
IO#
- read_csv() now accepts binary mode file buffers when using the Python csv engine (GH 23779)
- Bug in DataFrame.to_json() where using a Tuple as a column or index value and using orient="columns" or orient="index" would produce invalid JSON (GH 20500)
- Improve infinity parsing. read_csv() now interprets Infinity, +Infinity, -Infinity as floating point values (GH 10065)
- Bug in DataFrame.to_csv() where values were truncated when the length of na_rep was shorter than the text input data. (GH 25099)
- Bug in DataFrame.to_string() where values were truncated using display options instead of outputting the full content (GH 9784)
- Bug in DataFrame.to_json() where a datetime column label would not be written out in ISO format with orient="table" (GH 28130)
- Bug in DataFrame.to_parquet() where writing to GCS would fail with engine='fastparquet' if the file did not already exist (GH 28326)
- Bug in read_hdf() closing stores that it didn’t open when Exceptions are raised (GH 28699)
- Bug in DataFrame.read_json() where using orient="index" would not maintain the order (GH 28557)
- Bug in DataFrame.to_html() where the length of the formatters argument was not verified (GH 28469)
- Bug in DataFrame.read_excel() with engine='ods' when sheet_name argument references a non-existent sheet (GH 27676)
- Bug in pandas.io.formats.style.Styler() formatting for floating values not displaying decimals correctly (GH 13257)
- Bug in DataFrame.to_html() when using formatters=<list> and max_cols together. (GH 25955)
- Bug in Styler.background_gradient() not able to work with dtype Int64 (GH 28869)
- Bug in DataFrame.to_clipboard() which did not work reliably in ipython (GH 22707)
- Bug in read_json() where default encoding was not set to utf-8 (GH 29565)
- Bug in PythonParser where str and bytes were being mixed when dealing with the decimal field (GH 29650)
- read_gbq() now accepts progress_bar_type to display progress bar while the data downloads. (GH 29857)
- Bug in pandas.io.json.json_normalize() where a missing value in the location specified by record_path would raise a TypeError (GH 30148)
- read_excel() now accepts binary data (GH 15914)
- Bug in read_csv() in which encoding handling was limited to just the string utf-16 for the C engine (GH 24130)
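Two of the read_csv() changes above can be tried with in-memory buffers. This is a small sketch; the CSV contents and column names are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# The Infinity spellings listed in the notes are parsed as floating point values.
csv_text = "value\nInfinity\n-Infinity\n+Infinity\n1.5\n"
parsed = pd.read_csv(io.StringIO(csv_text))

# A binary-mode buffer is accepted by the Python csv engine.
binary_buffer = io.BytesIO(b"a,b\n1,2\n3,4\n")
from_bytes = pd.read_csv(binary_buffer, engine="python")
```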
Plotting#
- Bug in Series.plot() not able to plot boolean values (GH 23719)
- Bug in DataFrame.plot() not able to plot when no rows (GH 27758)
- Bug in DataFrame.plot() producing incorrect legend markers when plotting multiple series on the same axis (GH 18222)
- Bug in DataFrame.plot() when kind='box' and data contains datetime or timedelta data. These types are now automatically dropped (GH 22799)
- Bug in DataFrame.plot.line() and DataFrame.plot.area() producing wrong xlim on the x-axis (GH 27686, GH 25160, GH 24784)
- Bug where DataFrame.boxplot() would not accept a color parameter like DataFrame.plot.box() (GH 26214)
- Bug in the xticks argument being ignored for DataFrame.plot.bar() (GH 14119)
- set_option() now validates that the plot backend provided to 'plotting.backend' implements the backend when the option is set, rather than when a plot is created (GH 28163)
- DataFrame.plot() now allows a backend keyword argument for changing between backends in one session (GH 28619).
- Bug in color validation incorrectly raising for non-color styles (GH 29122).
- Allow DataFrame.plot.scatter() to plot objects and datetime type data (GH 18755, GH 30391)
- Bug in DataFrame.hist() where xrot=0 did not work with by and subplots (GH 30288).
GroupBy/resample/rolling#
- Bug in core.groupby.DataFrameGroupBy.apply() only showing output from a single group when function returns an Index (GH 28652)
- Bug in DataFrame.groupby() with multiple groups where an IndexError would be raised if any group contained all NA values (GH 20519)
- Bug in pandas.core.resample.Resampler.size() and pandas.core.resample.Resampler.count() returning wrong dtype when used with an empty Series or DataFrame (GH 28427)
- Bug in DataFrame.rolling() not allowing for rolling over datetimes when axis=1 (GH 28192)
- Bug in DataFrame.rolling() not allowing rolling over multi-index levels (GH 15584).
- Bug in DataFrame.rolling() not allowing rolling on monotonic decreasing time indexes (GH 19248).
- Bug in DataFrame.groupby() not offering selection by column name when axis=1 (GH 27614)
- Bug in core.groupby.DataFrameGroupBy.agg() not able to use lambda function with named aggregation (GH 27519)
- Bug in DataFrame.groupby() losing column name information when grouping by a categorical column (GH 28787)
- Remove error raised due to duplicated input functions in named aggregation in DataFrame.groupby() and Series.groupby(). Previously an error was raised if the same function was applied to the same column; this is now allowed if the newly assigned names are different. (GH 28426)
- core.groupby.SeriesGroupBy.value_counts() now handles the case where the Grouper produces empty groups (GH 28479)
- Bug in core.window.rolling.Rolling.quantile() ignoring interpolation keyword argument when used within a groupby (GH 28779)
- Bug in DataFrame.groupby() where any, all, nunique and transform functions would incorrectly handle duplicate column labels (GH 21668)
- Bug in core.groupby.DataFrameGroupBy.agg() with timezone-aware datetime64 column incorrectly casting results to the original dtype (GH 29641)
- Bug in DataFrame.groupby() when using axis=1 and having a single level columns index (GH 30208)
- Bug in DataFrame.groupby() when using nunique on axis=1 (GH 30253)
- Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() with multiple list-like q value and integer column names (GH 30289)
- Bug in DataFrameGroupBy.pct_change() and SeriesGroupBy.pct_change() causing a TypeError when fill_method is None (GH 30463)
- Bug in Rolling.count() and Expanding.count() where the min_periods argument was ignored (GH 26996)
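The two named-aggregation items above (lambda support, and the same function applied twice to one column under different output names) can be combined in one call. The frame and the output names below are hypothetical, chosen only for this sketch.

```python
import pandas as pd

# Illustrative data: two groups with a single value column.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 4, 10]})

result = df.groupby("key").agg(
    total=("val", "sum"),
    # Same column and function again, allowed because the output name differs.
    total_again=("val", "sum"),
    # A lambda used as a named aggregation.
    spread=("val", lambda s: s.max() - s.min()),
)
```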
Reshaping#
- Bug in DataFrame.apply() that caused incorrect output with empty DataFrame (GH 28202, GH 21959)
- Bug in DataFrame.stack() not handling non-unique indexes correctly when creating MultiIndex (GH 28301)
- Bug in pivot_table() not returning correct type float when margins=True and aggfunc='mean' (GH 24893)
- Bug where merge_asof() could not use datetime.timedelta for the tolerance kwarg (GH 28098)
- Bug in merge() not appending suffixes correctly with MultiIndex (GH 28518)
- Fix to ensure all int dtypes can be used in merge_asof() when using a tolerance value. Previously every non-int64 type would raise an erroneous MergeError (GH 28870).
- Better error message in get_dummies() when columns isn’t a list-like value (GH 28383)
- Bug in Index.join() that caused infinite recursion error for mismatched MultiIndex name orders. (GH 25760, GH 28956)
- Bug in Series.pct_change() where supplying an anchored frequency would throw a ValueError (GH 28664)
- Bug where DataFrame.equals() returned True incorrectly in some cases when two DataFrames had the same columns in different orders (GH 28839)
- Bug in DataFrame.replace() where the dtype of a non-numeric replacer was not respected (GH 26632)
- Bug in melt() where supplying mixed strings and numeric values for id_vars or value_vars would incorrectly raise a ValueError (GH 29718)
- Dtypes are now preserved when transposing a DataFrame where each column is the same extension dtype (GH 30091)
- Bug in merge_asof() merging on a tz-aware left_index and right_on a tz-aware column (GH 29864)
- Improved error message and docstring in cut() and qcut() when labels=True (GH 13318)
- Bug in missing fill_na parameter to DataFrame.unstack() with list of levels (GH 30740)
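As a sketch of the merge_asof() tolerance fix, the example below passes a standard-library datetime.timedelta as the tolerance. The timestamps and column names are assumptions made up for illustration.

```python
import datetime

import pandas as pd

# Two small, time-sorted frames (illustrative data).
left = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01 00:00:05", "2020-01-01 00:00:30"]),
        "left_val": [1, 2],
    }
)
right = pd.DataFrame(
    {
        "time": pd.to_datetime(["2020-01-01 00:00:03", "2020-01-01 00:00:10"]),
        "right_val": [10, 20],
    }
)

# Backward match on "time", keeping only matches at most 5 seconds old.
merged = pd.merge_asof(
    left, right, on="time", tolerance=datetime.timedelta(seconds=5)
)
```

The first left row finds a match 2 seconds earlier. The second row's closest earlier match is 20 seconds old, which exceeds the tolerance, so its right_val is missing.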
Sparse#
- Bug in SparseDataFrame arithmetic operations incorrectly casting inputs to float (GH 28107)
- Bug in DataFrame.sparse returning a Series when there was a column named sparse rather than the accessor (GH 30758)
- Fixed operator.xor() with a boolean-dtype SparseArray. Now returns a sparse result, rather than object dtype (GH 31025)
ExtensionArray#
Other#
- Trying to set the display.precision, display.max_rows or display.max_columns using set_option() to anything but a None or a positive int will raise a ValueError (GH 23348)
- Using DataFrame.replace() with overlapping keys in a nested dictionary will no longer raise, now matching the behavior of a flat dictionary (GH 27660)
- DataFrame.to_csv() and Series.to_csv() now support dicts as compression argument with key 'method' being the compression method and others as additional compression options when the compression method is 'zip'. (GH 26023)
- Bug in Series.diff() where a boolean series would incorrectly raise a TypeError (GH 17294)
- Series.append() will no longer raise a TypeError when passed a tuple of Series (GH 28410)
- Fix corrupted error message when calling pandas.libs._json.encode() on a 0d array (GH 18878)
- Backtick quoting in DataFrame.query() and DataFrame.eval() can now also be used to use invalid identifiers like names that start with a digit, are python keywords, or are using single character operators. (GH 27017)
- Bug in pd.core.util.hashing.hash_pandas_object where arrays containing tuples were incorrectly treated as non-hashable (GH 28969)
- Bug in DataFrame.append() that raised IndexError when appending with empty list (GH 28769)
- Fix AbstractHolidayCalendar to return correct results for years after 2030 (now goes up to 2200) (GH 27790)
- Fixed IntegerArray returning inf rather than NaN for operations dividing by 0 (GH 27398)
- Fixed pow operations for IntegerArray when the other value is 0 or 1 (GH 29997)
- Bug in Series.count() raising if use_inf_as_na is enabled (GH 29478)
- Bug in Index where a non-hashable name could be set without raising TypeError (GH 29069)
- Bug in DataFrame constructor when passing a 2D ndarray and an extension dtype (GH 12513)
- Bug in DataFrame.to_csv() when supplied a series with a dtype="string" and a na_rep, the na_rep was being truncated to 2 characters. (GH 29975)
- Bug where DataFrame.itertuples() would incorrectly determine whether or not namedtuples could be used for dataframes of 255 columns (GH 28282)
- Handle nested NumPy object arrays in testing.assert_series_equal() for ExtensionArray implementations (GH 30841)
- Bug in Index constructor incorrectly allowing 2-dimensional input arrays (GH 13601, GH 27125)
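The backtick-quoting change for DataFrame.query() can be illustrated with two awkward column names, one starting with a digit and one that is a Python keyword. The frame below is a made-up example.

```python
import pandas as pd

# Column names that are not valid Python identifiers (illustrative data).
df = pd.DataFrame({"1st_score": [1, 5, 9], "class": ["a", "b", "c"]})

# Backticks let query() refer to both columns by name.
selected = df.query("`1st_score` > 3 and `class` == 'b'")
```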
Contributors#
A total of 308 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Aaditya Panikath + 
- Abdullah İhsan Seçer 
- Abhijeet Krishnan + 
- Adam J. Stewart 
- Adam Klaum + 
- Addison Lynch 
- Aivengoe + 
- Alastair James + 
- Albert Villanova del Moral 
- Alex Kirko + 
- Alfredo Granja + 
- Allen Downey 
- Alp Arıbal + 
- Andreas Buhr + 
- Andrew Munch + 
- Andy 
- Angela Ambroz + 
- Aniruddha Bhattacharjee + 
- Ankit Dhankhar + 
- Antonio Andraues Jr + 
- Arda Kosar + 
- Asish Mahapatra + 
- Austin Hackett + 
- Avi Kelman + 
- AyowoleT + 
- Bas Nijholt + 
- Ben Thayer 
- Bharat Raghunathan 
- Bhavani Ravi 
- Bhuvana KA + 
- Big Head 
- Blake Hawkins + 
- Bobae Kim + 
- Brett Naul 
- Brian Wignall 
- Bruno P. Kinoshita + 
- Bryant Moscon + 
- Cesar H + 
- Chris Stadler 
- Chris Zimmerman + 
- Christopher Whelan 
- Clemens Brunner 
- Clemens Tolboom + 
- Connor Charles + 
- Daniel Hähnke + 
- Daniel Saxton 
- Darin Plutchok + 
- Dave Hughes 
- David Stansby 
- DavidRosen + 
- Dean + 
- Deepan Das + 
- Deepyaman Datta 
- DorAmram + 
- Dorothy Kabarozi + 
- Drew Heenan + 
- Eliza Mae Saret + 
- Elle + 
- Endre Mark Borza + 
- Eric Brassell + 
- Eric Wong + 
- Eunseop Jeong + 
- Eyden Villanueva + 
- Felix Divo 
- ForTimeBeing + 
- Francesco Truzzi + 
- Gabriel Corona + 
- Gabriel Monteiro + 
- Galuh Sahid + 
- Georgi Baychev + 
- Gina 
- GiuPassarelli + 
- Grigorios Giannakopoulos + 
- Guilherme Leite + 
- Guilherme Salomé + 
- Gyeongjae Choi + 
- Harshavardhan Bachina + 
- Harutaka Kawamura + 
- Hassan Kibirige 
- Hielke Walinga 
- Hubert 
- Hugh Kelley + 
- Ian Eaves + 
- Ignacio Santolin + 
- Igor Filippov + 
- Irv Lustig 
- Isaac Virshup + 
- Ivan Bessarabov + 
- JMBurley + 
- Jack Bicknell + 
- Jacob Buckheit + 
- Jan Koch 
- Jan Pipek + 
- Jan Škoda + 
- Jan-Philip Gehrcke 
- Jasper J.F. van den Bosch + 
- Javad + 
- Jeff Reback 
- Jeremy Schendel 
- Jeroen Kant + 
- Jesse Pardue + 
- Jethro Cao + 
- Jiang Yue 
- Jiaxiang + 
- Jihyung Moon + 
- Jimmy Callin 
- Jinyang Zhou + 
- Joao Victor Martinelli + 
- Joaq Almirante + 
- John G Evans + 
- John Ward + 
- Jonathan Larkin + 
- Joris Van den Bossche 
- Josh Dimarsky + 
- Joshua Smith + 
- Josiah Baker + 
- Julia Signell + 
- Jung Dong Ho + 
- Justin Cole + 
- Justin Zheng 
- Kaiqi Dong 
- Karthigeyan + 
- Katherine Younglove + 
- Katrin Leinweber 
- Kee Chong Tan + 
- Keith Kraus + 
- Kevin Nguyen + 
- Kevin Sheppard 
- Kisekka David + 
- Koushik + 
- Kyle Boone + 
- Kyle McCahill + 
- Laura Collard, PhD + 
- LiuSeeker + 
- Louis Huynh + 
- Lucas Scarlato Astur + 
- Luiz Gustavo + 
- Luke + 
- Luke Shepard + 
- MKhalusova + 
- Mabel Villalba 
- Maciej J + 
- Mak Sze Chun 
- Manu NALEPA + 
- Marc 
- Marc Garcia 
- Marco Gorelli + 
- Marco Neumann + 
- Martin Winkel + 
- Martina G. Vilas + 
- Mateusz + 
- Matthew Roeschke 
- Matthew Tan + 
- Max Bolingbroke 
- Max Chen + 
- MeeseeksMachine 
- Miguel + 
- MinGyo Jung + 
- Mohamed Amine ZGHAL + 
- Mohit Anand + 
- MomIsBestFriend + 
- Naomi Bonnin + 
- Nathan Abel + 
- Nico Cernek + 
- Nigel Markey + 
- Noritada Kobayashi + 
- Oktay Sabak + 
- Oliver Hofkens + 
- Oluokun Adedayo + 
- Osman + 
- Oğuzhan Öğreden + 
- Pandas Development Team + 
- Patrik Hlobil + 
- Paul Lee + 
- Paul Siegel + 
- Petr Baev + 
- Pietro Battiston 
- Prakhar Pandey + 
- Puneeth K + 
- Raghav + 
- Rajat + 
- Rajhans Jadhao + 
- Rajiv Bharadwaj + 
- Rik-de-Kort + 
- Roei.r 
- Rohit Sanjay + 
- Ronan Lamy + 
- Roshni + 
- Roymprog + 
- Rushabh Vasani + 
- Ryan Grout + 
- Ryan Nazareth 
- Samesh Lakhotia + 
- Samuel Sinayoko 
- Samyak Jain + 
- Sarah Donehower + 
- Sarah Masud + 
- Saul Shanabrook + 
- Scott Cole + 
- SdgJlbl + 
- Seb + 
- Sergei Ivko + 
- Shadi Akiki 
- Shorokhov Sergey 
- Siddhesh Poyarekar + 
- Sidharthan Nair + 
- Simon Gibbons 
- Simon Hawkins 
- Simon-Martin Schröder + 
- Sofiane Mahiou + 
- Sourav kumar + 
- Souvik Mandal + 
- Soyoun Kim + 
- Sparkle Russell-Puleri + 
- Srinivas Reddy Thatiparthy (శ్రీనివాస్ రెడ్డి తాటిపర్తి) 
- Stuart Berg + 
- Sumanau Sareen 
- Szymon Bednarek + 
- Tambe Tabitha Achere + 
- Tan Tran 
- Tang Heyi + 
- Tanmay Daripa + 
- Tanya Jain 
- Terji Petersen 
- Thomas Li + 
- Tirth Jain + 
- Tola A + 
- Tom Augspurger 
- Tommy Lynch + 
- Tomoyuki Suzuki + 
- Tony Lorenzo 
- Unprocessable + 
- Uwe L. Korn 
- Vaibhav Vishal 
- Victoria Zdanovskaya + 
- Vijayant + 
- Vishwak Srinivasan + 
- WANG Aiyong 
- Wenhuan 
- Wes McKinney 
- Will Ayd 
- Will Holmgren 
- William Ayd 
- William Blan + 
- Wouter Overmeire 
- Wuraola Oyewusi + 
- YaOzI + 
- Yash Shukla + 
- Yu Wang + 
- Yusei Tahara + 
- alexander135 + 
- alimcmaster1 
- avelineg + 
- bganglia + 
- bolkedebruin 
- bravech + 
- chinhwee + 
- cruzzoe + 
- dalgarno + 
- daniellebrown + 
- danielplawrence 
- est271 + 
- francisco souza + 
- ganevgv + 
- garanews + 
- gfyoung 
- h-vetinari 
- hasnain2808 + 
- ianzur + 
- jalbritt + 
- jbrockmendel 
- jeschwar + 
- jlamborn324 + 
- joy-rosie + 
- kernc 
- killerontherun1 
- krey + 
- lexy-lixinyu + 
- lucyleeow + 
- lukasbk + 
- maheshbapatu + 
- mck619 + 
- nathalier 
- naveenkaushik2504 + 
- nlepleux + 
- nrebena 
- ohad83 + 
- pilkibun 
- pqzx + 
- proost + 
- pv8493013j + 
- qudade + 
- rhstanton + 
- rmunjal29 + 
- sangarshanan + 
- sardonick + 
- saskakarsi + 
- shaido987 + 
- ssikdar1 
- steveayers124 + 
- tadashigaki + 
- timcera + 
- tlaytongoogle + 
- tobycheese 
- tonywu1999 + 
- tsvikas + 
- yogendrasoni + 
- zys5945 +