.. _whatsnew_0210: Version 0.21.0 (October 27, 2017) --------------------------------- {{ header }} .. ipython:: python :suppress: from pandas import * # noqa F401, F403 This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version. Highlights include: - Integration with `Apache Parquet `__, including a new top-level :func:`read_parquet` function and :meth:`DataFrame.to_parquet` method, see :ref:`here `. - New user-facing :class:`pandas.api.types.CategoricalDtype` for specifying categoricals independent of the data, see :ref:`here `. - The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames is now consistent and no longer depends on whether `bottleneck `__ is installed, and ``sum`` and ``prod`` on empty Series now return NaN instead of 0, see :ref:`here `. - Compatibility fixes for pypy, see :ref:`here `. - Additions to the ``drop``, ``reindex`` and ``rename`` API to make them more consistent, see :ref:`here `. - Addition of the new methods ``DataFrame.infer_objects`` (see :ref:`here `) and ``GroupBy.pipe`` (see :ref:`here `). - Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a ``KeyError`` in a future version, see :ref:`here `. Check the :ref:`API Changes ` and :ref:`deprecations ` before updating. .. contents:: What's new in v0.21.0 :local: :backlinks: none :depth: 2 .. _whatsnew_0210.enhancements: New features ~~~~~~~~~~~~ .. _whatsnew_0210.enhancements.parquet: Integration with Apache Parquet file format ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Integration with `Apache Parquet `__, including a new top-level :func:`read_parquet` function and :meth:`DataFrame.to_parquet` method, see :ref:`here ` (:issue:`15838`, :issue:`17438`).
`Apache Parquet `__ provides a cross-language, binary file format for reading and writing data frames efficiently. Parquet is designed to faithfully serialize and de-serialize ``DataFrame`` s, supporting all of the pandas dtypes, including extension dtypes such as datetime with timezones. This functionality depends on either the `pyarrow `__ or `fastparquet `__ library. For more details, see :ref:`the IO docs on Parquet `. .. _whatsnew_0210.enhancements.infer_objects: Method ``infer_objects`` type conversion ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The :meth:`DataFrame.infer_objects` and :meth:`Series.infer_objects` methods have been added to perform dtype inference on object columns, replacing some of the functionality of the deprecated ``convert_objects`` method. See the documentation :ref:`here ` for more details. (:issue:`11221`) This method only performs soft conversions on object columns, converting Python objects to native types, but not any coercive conversions. For example: .. ipython:: python df = pd.DataFrame({'A': [1, 2, 3], 'B': np.array([1, 2, 3], dtype='object'), 'C': ['1', '2', '3']}) df.dtypes df.infer_objects().dtypes Note that column ``'C'`` was not converted - only scalar numeric types will be converted to a new type. Other types of conversion should be accomplished using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedelta`). .. ipython:: python df = df.infer_objects() df['C'] = pd.to_numeric(df['C'], errors='coerce') df.dtypes .. _whatsnew_0210.enhancements.attribute_access: Improved warnings when attempting to create columns ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ New users are often puzzled by the relationship between column operations and attribute access on ``DataFrame`` instances (:issue:`7175`). One specific instance of this confusion is attempting to create a new column by setting an attribute on the ``DataFrame``: .. 
code-block:: ipython In [1]: df = pd.DataFrame({'one': [1., 2., 3.]}) In [2]: df.two = [4, 5, 6] This does not raise any obvious exceptions, but also does not create a new column: .. code-block:: ipython In [3]: df Out[3]: one 0 1.0 1 2.0 2 3.0 Setting a list-like data structure into a new attribute now raises a ``UserWarning`` about the potential for unexpected behavior. See :ref:`Attribute Access `. .. _whatsnew_0210.enhancements.drop_api: Method ``drop`` now also accepts index/columns keywords ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The :meth:`~DataFrame.drop` method has gained ``index``/``columns`` keywords as an alternative to specifying the ``axis``. This is similar to the behavior of ``reindex`` (:issue:`12392`). For example: .. ipython:: python df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=['A', 'B', 'C', 'D']) df df.drop(['B', 'C'], axis=1) # the following is now equivalent df.drop(columns=['B', 'C']) .. _whatsnew_0210.enhancements.rename_reindex_axis: Methods ``rename``, ``reindex`` now also accept axis keyword ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The :meth:`DataFrame.rename` and :meth:`DataFrame.reindex` methods have gained the ``axis`` keyword to specify the axis to target with the operation (:issue:`12392`). Here's ``rename``: .. ipython:: python df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) df.rename(str.lower, axis='columns') df.rename(id, axis='index') And ``reindex``: .. ipython:: python df.reindex(['A', 'B', 'C'], axis='columns') df.reindex([0, 1, 3], axis='index') The "index, columns" style continues to work as before. .. ipython:: python df.rename(index=id, columns=str.lower) df.reindex(index=[0, 1, 3], columns=['A', 'B', 'C']) We *highly* encourage using named arguments to avoid confusion when using either style. .. 
_whatsnew_0210.enhancements.categorical_dtype: ``CategoricalDtype`` for specifying categoricals ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :class:`pandas.api.types.CategoricalDtype` has been added to the public API and expanded to include the ``categories`` and ``ordered`` attributes. A ``CategoricalDtype`` can be used to specify the set of categories and orderedness of an array, independent of the data. This can be useful, for example, when converting string data to a ``Categorical`` (:issue:`14711`, :issue:`15078`, :issue:`16015`, :issue:`17643`): .. ipython:: python from pandas.api.types import CategoricalDtype s = pd.Series(['a', 'b', 'c', 'a']) # strings dtype = CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True) s.astype(dtype) One place that deserves special mention is :func:`read_csv`. Previously, with ``dtype={'col': 'category'}``, the returned values and categories would always be strings. .. ipython:: python :suppress: from io import StringIO .. ipython:: python data = 'A,B\na,1\nb,2\nc,3' pd.read_csv(StringIO(data), dtype={'B': 'category'}).B.cat.categories Notice the "object" dtype. With a ``CategoricalDtype`` of all numerics, datetimes, or timedeltas, we can automatically convert to the correct type: .. ipython:: python dtype = {'B': CategoricalDtype([1, 2, 3])} pd.read_csv(StringIO(data), dtype=dtype).B.cat.categories The values have been correctly interpreted as integers. The ``.dtype`` property of a ``Categorical``, ``CategoricalIndex`` or a ``Series`` with categorical type will now return an instance of ``CategoricalDtype``. While the repr has changed, ``str(CategoricalDtype())`` is still the string ``'category'``. We'll take this moment to remind users that the *preferred* way to detect categorical data is to use :func:`pandas.api.types.is_categorical_dtype`, and not ``str(dtype) == 'category'``. See the :ref:`CategoricalDtype docs ` for more. ..
_whatsnew_0210.enhancements.GroupBy_pipe: ``GroupBy`` objects now have a ``pipe`` method ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series``, that allows functions that take a ``GroupBy`` to be composed in a clean, readable syntax. (:issue:`17871`) For a concrete example of combining ``.groupby`` and ``.pipe``, imagine having a DataFrame with columns for stores, products, revenue and sold quantity. We'd like to do a groupwise calculation of *prices* (i.e. revenue/quantity) per store and per product. We could do this in a multi-step operation, but expressing it in terms of piping can make the code more readable. First we set up the data: .. ipython:: python import numpy as np n = 1000 df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n), 'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3' ], n), 'Revenue': (np.random.random(n) * 50 + 10).round(2), 'Quantity': np.random.randint(1, 10, size=n)}) df.head(2) Now, to find prices per store/product, we can simply do: .. ipython:: python (df.groupby(['Store', 'Product']) .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum()) .unstack().round(2)) See the :ref:`documentation ` for more. .. _whatsnew_0210.enhancements.rename_categories: ``Categorical.rename_categories`` accepts a dict-like ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :meth:`~Series.cat.rename_categories` now accepts a dict-like argument for ``new_categories``. The previous categories are looked up in the dictionary's keys and replaced if found. The behavior of missing and extra keys is the same as in :meth:`DataFrame.rename`. .. ipython:: python c = pd.Categorical(['a', 'a', 'b']) c.rename_categories({"a": "eh", "b": "bee"}) .. warning:: To assist with upgrading pandas, ``rename_categories`` treats ``Series`` as list-like. Typically, Series are considered to be dict-like (e.g. in ``.rename``, ``.map``).
In a future version of pandas ``rename_categories`` will change to treat them as dict-like. Follow the warning message's recommendations for writing future-proof code. .. code-block:: ipython In [33]: c.rename_categories(pd.Series([0, 1], index=['a', 'c'])) FutureWarning: Treating Series 'new_categories' as a list-like and using the values. In a future version, 'rename_categories' will treat Series like a dictionary. For dict-like, use 'new_categories.to_dict()' For list-like, use 'new_categories.values'. Out[33]: [0, 0, 1] Categories (2, int64): [0, 1] .. _whatsnew_0210.enhancements.other: Other enhancements ^^^^^^^^^^^^^^^^^^ New functions or methods """""""""""""""""""""""" - :meth:`~pandas.core.resample.Resampler.nearest` is added to support nearest-neighbor upsampling (:issue:`17496`). - :class:`~pandas.Index` has added support for a ``to_frame`` method (:issue:`15230`). New keywords """""""""""" - Added a ``skipna`` parameter to :func:`~pandas.api.types.infer_dtype` to support type inference in the presence of missing values (:issue:`17059`). - :func:`Series.to_dict` and :func:`DataFrame.to_dict` now support an ``into`` keyword which allows you to specify the ``collections.Mapping`` subclass that you would like returned. The default is ``dict``, which is backwards compatible. (:issue:`16122`) - :func:`Series.set_axis` and :func:`DataFrame.set_axis` now support the ``inplace`` parameter. (:issue:`14636`) - :func:`Series.to_pickle` and :func:`DataFrame.to_pickle` have gained a ``protocol`` parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL `__ - :func:`read_feather` has gained the ``nthreads`` parameter for multi-threaded operations (:issue:`16359`) - :func:`DataFrame.clip()` and :func:`Series.clip()` have gained an ``inplace`` argument. (:issue:`15388`) - :func:`crosstab` has gained a ``margins_name`` parameter to define the name of the row / column that will contain the totals when ``margins=True``. 
(:issue:`15972`) - :func:`read_json` now accepts a ``chunksize`` parameter that can be used when ``lines=True``. If ``chunksize`` is passed, ``read_json`` now returns an iterator which reads in ``chunksize`` lines with each iteration. (:issue:`17048`) - :func:`read_json` and :func:`~DataFrame.to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`17798`) Various enhancements """""""""""""""""""" - Improved the import time of pandas by about 2.25x. (:issue:`16764`) - Support for `PEP 519 -- Adding a file system path protocol `_ on most readers (e.g. :func:`read_csv`) and writers (e.g. :meth:`DataFrame.to_csv`) (:issue:`13823`). - Added a ``__fspath__`` method to ``pd.HDFStore``, ``pd.ExcelFile``, and ``pd.ExcelWriter`` to work properly with the file system path protocol (:issue:`13823`). - The ``validate`` argument for :func:`merge` now checks whether a merge is one-to-one, one-to-many, many-to-one, or many-to-many. If a merge is found not to be an example of the specified merge type, an exception of type ``MergeError`` will be raised. For more, see :ref:`here ` (:issue:`16270`) - Added support for `PEP 518 `_ (``pyproject.toml``) to the build system (:issue:`16745`) - :func:`RangeIndex.append` now returns a ``RangeIndex`` object when possible (:issue:`16212`) - :func:`Series.rename_axis` and :func:`DataFrame.rename_axis` with ``inplace=True`` now return ``None`` while renaming the axis inplace. (:issue:`15704`) - :func:`api.types.infer_dtype` now infers decimals. (:issue:`15690`) - :func:`DataFrame.select_dtypes` now accepts scalar values for include/exclude as well as list-like. (:issue:`16855`) - :func:`date_range` now accepts 'YS' in addition to 'AS' as an alias for start of year. (:issue:`9313`) - :func:`date_range` now accepts 'Y' in addition to 'A' as an alias for end of year. (:issue:`9313`) - :func:`DataFrame.add_prefix` and :func:`DataFrame.add_suffix` now accept strings containing the '%' character.
(:issue:`17151`) - Read/write methods that infer compression (:func:`read_csv`, :func:`read_table`, :func:`read_pickle`, and :meth:`~DataFrame.to_pickle`) can now infer from path-like objects, such as ``pathlib.Path``. (:issue:`17206`) - :func:`read_sas` now recognizes much more of the most frequently used date (datetime) formats in SAS7BDAT files. (:issue:`15871`) - :func:`DataFrame.items` and :func:`Series.items` are now present in both Python 2 and 3 and are lazy in all cases. (:issue:`13918`, :issue:`17213`) - :meth:`pandas.io.formats.style.Styler.where` has been implemented as a convenience for :meth:`pandas.io.formats.style.Styler.applymap`. (:issue:`17474`) - :func:`MultiIndex.is_monotonic_decreasing` has been implemented. Previously it returned ``False`` in all cases. (:issue:`16554`) - :func:`read_excel` raises ``ImportError`` with a better message if ``xlrd`` is not installed. (:issue:`17613`) - :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names. (:issue:`14207`) - :func:`Series.reindex`, :func:`DataFrame.reindex`, :func:`Index.get_indexer` now support list-like argument for ``tolerance``. (:issue:`17367`) .. _whatsnew_0210.api_breaking: Backwards incompatible API changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. _whatsnew_0210.api_breaking.deps: Dependencies have increased minimum versions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We have updated our minimum supported versions of dependencies (:issue:`15206`, :issue:`15543`, :issue:`15214`).
If installed, we now require:

+--------------+-----------------+----------+
| Package      | Minimum Version | Required |
+==============+=================+==========+
| Numpy        | 1.9.0           |    X     |
+--------------+-----------------+----------+
| Matplotlib   | 1.4.3           |          |
+--------------+-----------------+----------+
| Scipy        | 0.14.0          |          |
+--------------+-----------------+----------+
| Bottleneck   | 1.0.0           |          |
+--------------+-----------------+----------+

Additionally, support has been dropped for Python 3.4 (:issue:`15251`). .. _whatsnew_0210.api_breaking.bottleneck: Sum/prod of all-NaN or empty Series/DataFrames is now consistently NaN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. note:: The changes described here have been partially reverted. See the :ref:`v0.22.0 Whatsnew ` for more. The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames no longer depends on whether `bottleneck `__ is installed, and the return value of ``sum`` and ``prod`` on an empty Series has changed (:issue:`9422`, :issue:`15507`). Calling ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, will result in ``NaN``. See the :ref:`docs `. .. ipython:: python s = pd.Series([np.nan]) Previously WITHOUT ``bottleneck`` installed: .. code-block:: ipython In [2]: s.sum() Out[2]: np.nan Previously WITH ``bottleneck``: .. code-block:: ipython In [2]: s.sum() Out[2]: 0.0 New behavior, without regard to the bottleneck installation: .. ipython:: python s.sum() Note that this also changes the sum of an empty ``Series``. Previously this always returned 0 regardless of a ``bottleneck`` installation: .. code-block:: ipython In [1]: pd.Series([]).sum() Out[1]: 0 but for consistency with the all-NaN case, this was changed to return NaN as well: .. ipython:: python :okwarning: pd.Series([]).sum() ..
_whatsnew_0210.api_breaking.loc: Indexing with a list with missing labels is deprecated ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Previously, selecting with a list of labels, where one or more labels were missing, would always succeed, returning ``NaN`` for missing labels. This will now show a ``FutureWarning``. In the future this will raise a ``KeyError`` (:issue:`15747`). This warning will trigger on a ``DataFrame`` or a ``Series`` for using ``.loc[]`` or ``[[]]`` when passing a list-of-labels with at least 1 missing label. See the :ref:`deprecation docs `. .. ipython:: python s = pd.Series([1, 2, 3]) s Previous behavior: .. code-block:: ipython In [4]: s.loc[[1, 2, 3]] Out[4]: 1 2.0 2 3.0 3 NaN dtype: float64 Current behavior: .. code-block:: ipython In [4]: s.loc[[1, 2, 3]] Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative. See the documentation here: https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike Out[4]: 1 2.0 2 3.0 3 NaN dtype: float64 The idiomatic way to achieve selecting potentially not-found elements is via ``.reindex()``: .. ipython:: python s.reindex([1, 2, 3]) Selection with all keys found is unchanged. .. ipython:: python s.loc[[1, 2]] .. _whatsnew_0210.api.na_changes: NA naming changes ^^^^^^^^^^^^^^^^^ In order to promote more consistency across the pandas API, we have added additional top-level functions :func:`isna` and :func:`notna` that are aliases for :func:`isnull` and :func:`notnull`. The naming scheme is now more consistent with methods like ``.dropna()`` and ``.fillna()``. Furthermore, in all cases where ``.isnull()`` and ``.notnull()`` methods are defined, there are now additional methods named ``.isna()`` and ``.notna()``. These are included for the classes ``Categorical``, ``Index``, ``Series``, and ``DataFrame`` (:issue:`15001`).
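
As a rough illustration of the aliases described above, the following sketch checks that the new spellings return the same results as the old ones. The sample ``Series`` and the variable names are made up for this example and are not part of the release notes.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Hypothetical sample data with one missing value.
   s = pd.Series([1.0, np.nan, 3.0])

   # Top-level functions: isna/notna mirror isnull/notnull.
   top_level_match = (pd.isna(s).equals(pd.isnull(s))
                      and pd.notna(s).equals(pd.notnull(s)))

   # Method forms are available alongside the older spellings.
   method_match = (s.isna().equals(s.isnull())
                   and s.notna().equals(s.notnull()))

   # Boolean masks as plain Python lists for easy inspection.
   missing_mask = s.isna().tolist()
   present_mask = s.notna().tolist()

   print(top_level_match, method_match)
   print(missing_mask)
   print(present_mask)

Either spelling can be used; the ``isna``/``notna`` names simply line up with ``.dropna()`` and ``.fillna()``.
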
The configuration option ``pd.options.mode.use_inf_as_null`` is deprecated, and ``pd.options.mode.use_inf_as_na`` is added as a replacement. .. _whatsnew_0210.api_breaking.iteration_scalars: Iteration of Series/Index will now return Python scalars ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Previously, when using certain iteration methods for a ``Series`` with dtype ``int`` or ``float``, you would receive a ``numpy`` scalar, e.g. a ``np.int64``, rather than a Python ``int``. Issue (:issue:`10904`) corrected this for ``Series.tolist()`` and ``list(Series)``. This change makes all iteration methods consistent, in particular, for ``__iter__()`` and ``.map()``; note that this only affects int/float dtypes. (:issue:`13236`, :issue:`13258`, :issue:`14216`). .. ipython:: python s = pd.Series([1, 2, 3]) s Previously: .. code-block:: ipython In [2]: type(list(s)[0]) Out[2]: numpy.int64 New behavior: .. ipython:: python type(list(s)[0]) Furthermore this will now correctly box the results of iteration for :func:`DataFrame.to_dict` as well. .. ipython:: python d = {'a': [1], 'b': ['b']} df = pd.DataFrame(d) Previously: .. code-block:: ipython In [8]: type(df.to_dict()['a'][0]) Out[8]: numpy.int64 New behavior: .. ipython:: python type(df.to_dict()['a'][0]) .. _whatsnew_0210.api_breaking.loc_with_index: Indexing with a Boolean Index ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Previously, when passing a boolean ``Index`` to ``.loc``, if the index of the ``Series/DataFrame`` had ``boolean`` labels, you would get a label-based selection, potentially duplicating result labels, rather than a boolean indexing selection (where ``True`` selects elements). This was inconsistent with how a boolean numpy array indexes. The new behavior is to act like a boolean numpy array indexer. (:issue:`17738`) Previous behavior: .. ipython:: python s = pd.Series([1, 2, 3], index=[False, True, False]) s ..
code-block:: ipython In [59]: s.loc[pd.Index([True, False, True])] Out[59]: True 2 False 1 False 3 True 2 dtype: int64 Current behavior: .. ipython:: python s.loc[pd.Index([True, False, True])] Furthermore, previously if you had an index that was non-numeric (e.g. strings), then a boolean Index would raise a ``KeyError``. This will now be treated as a boolean indexer. Previous behavior: .. ipython:: python s = pd.Series([1, 2, 3], index=['a', 'b', 'c']) s .. code-block:: ipython In [39]: s.loc[pd.Index([True, False, True])] KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]" Current behavior: .. ipython:: python s.loc[pd.Index([True, False, True])] .. _whatsnew_0210.api_breaking.period_index_resampling: ``PeriodIndex`` resampling ^^^^^^^^^^^^^^^^^^^^^^^^^^ In previous versions of pandas, resampling a ``Series``/``DataFrame`` indexed by a ``PeriodIndex`` returned a ``DatetimeIndex`` in some cases (:issue:`12884`). Resampling to a multiplied frequency now returns a ``PeriodIndex`` (:issue:`15944`). As a minor enhancement, resampling a ``PeriodIndex`` can now handle ``NaT`` values (:issue:`13224`). Previous behavior: .. code-block:: ipython In [1]: pi = pd.period_range('2017-01', periods=12, freq='M') In [2]: s = pd.Series(np.arange(12), index=pi) In [3]: resampled = s.resample('2Q').mean() In [4]: resampled Out[4]: 2017-03-31 1.0 2017-09-30 5.5 2018-03-31 10.0 Freq: 2Q-DEC, dtype: float64 In [5]: resampled.index Out[5]: DatetimeIndex(['2017-03-31', '2017-09-30', '2018-03-31'], dtype='datetime64[ns]', freq='2Q-DEC') New behavior: .. ipython:: python pi = pd.period_range('2017-01', periods=12, freq='M') s = pd.Series(np.arange(12), index=pi) resampled = s.resample('2Q').mean() resampled resampled.index Upsampling and calling ``.ohlc()`` previously returned a ``Series``, basically identical to calling ``.asfreq()``. OHLC upsampling now returns a DataFrame with columns ``open``, ``high``, ``low`` and ``close`` (:issue:`13083`).
This is consistent with downsampling and ``DatetimeIndex`` behavior. Previous behavior: .. code-block:: ipython In [1]: pi = pd.period_range(start='2000-01-01', freq='D', periods=10) In [2]: s = pd.Series(np.arange(10), index=pi) In [3]: s.resample('H').ohlc() Out[3]: 2000-01-01 00:00 0.0 ... 2000-01-10 23:00 NaN Freq: H, Length: 240, dtype: float64 In [4]: s.resample('M').ohlc() Out[4]: open high low close 2000-01 0 9 0 9 New behavior: .. ipython:: python pi = pd.period_range(start='2000-01-01', freq='D', periods=10) s = pd.Series(np.arange(10), index=pi) s.resample('H').ohlc() s.resample('M').ohlc() .. _whatsnew_0210.api_breaking.pandas_eval: Improved error handling during item assignment in pd.eval ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :func:`eval` will now raise a ``ValueError`` when item assignment malfunctions, or inplace operations are specified, but there is no item assignment in the expression (:issue:`16732`) .. ipython:: python arr = np.array([1, 2, 3]) Previously, if you attempted the following expression, you would get a not very helpful error message: .. code-block:: ipython In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True) ... IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices This is a very long way of saying numpy arrays don't support string-item indexing. With this change, the error message is now this: .. code-block:: python In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True) ... ValueError: Cannot assign expression output to target It also used to be possible to evaluate expressions inplace, even if there was no item assignment: .. code-block:: ipython In [4]: pd.eval("1 + 2", target=arr, inplace=True) Out[4]: 3 However, this input does not make much sense because the output is not being assigned to the target. Now, a ``ValueError`` will be raised when such an input is passed in: .. 
code-block:: ipython In [4]: pd.eval("1 + 2", target=arr, inplace=True) ... ValueError: Cannot operate inplace if there is no assignment .. _whatsnew_0210.api_breaking.dtype_conversions: Dtype conversions ^^^^^^^^^^^^^^^^^ Previously, assignments, ``.where()`` and ``.fillna()`` with a ``bool`` value would coerce to the same type (e.g. int / float), or raise for datetimelikes. These will now preserve the bools with ``object`` dtypes. (:issue:`16821`). .. ipython:: python s = pd.Series([1, 2, 3]) .. code-block:: python In [5]: s[1] = True In [6]: s Out[6]: 0 1 1 1 2 3 dtype: int64 New behavior: .. ipython:: python s[1] = True s Previously, an assignment to a datetimelike with a non-datetimelike would coerce the non-datetimelike item being assigned (:issue:`14145`). .. ipython:: python s = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')]) .. code-block:: python In [1]: s[1] = 1 In [2]: s Out[2]: 0 2011-01-01 00:00:00.000000000 1 1970-01-01 00:00:00.000000001 dtype: datetime64[ns] These now coerce to ``object`` dtype. .. ipython:: python s[1] = 1 s - Inconsistent behavior in ``.where()`` with datetimelikes which would raise rather than coerce to ``object`` (:issue:`16402`) - Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`) .. _whatsnew_210.api.multiindex_single: MultiIndex constructor with a single level ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``MultiIndex`` constructors no longer squeeze a MultiIndex with all length-one levels down to a regular ``Index``. This affects all the ``MultiIndex`` constructors. (:issue:`17178`) Previous behavior: .. code-block:: ipython In [2]: pd.MultiIndex.from_tuples([('a',), ('b',)]) Out[2]: Index(['a', 'b'], dtype='object') Length 1 levels are no longer special-cased. They behave exactly as if you had length 2+ levels, so a :class:`MultiIndex` is always returned from all of the ``MultiIndex`` constructors: ..
ipython:: python pd.MultiIndex.from_tuples([('a',), ('b',)]) .. _whatsnew_0210.api.utc_localization_with_series: UTC localization with Series ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Previously, :func:`to_datetime` did not localize datetime ``Series`` data when ``utc=True`` was passed. Now, :func:`to_datetime` will correctly localize ``Series`` with a ``datetime64[ns, UTC]`` dtype to be consistent with how list-like and ``Index`` data are handled. (:issue:`6415`). Previous behavior: .. ipython:: python s = pd.Series(['20130101 00:00:00'] * 3) .. code-block:: ipython In [12]: pd.to_datetime(s, utc=True) Out[12]: 0 2013-01-01 1 2013-01-01 2 2013-01-01 dtype: datetime64[ns] New behavior: .. ipython:: python pd.to_datetime(s, utc=True) Additionally, DataFrames with datetime columns that were parsed by :func:`read_sql_table` and :func:`read_sql_query` will also be localized to UTC only if the original SQL columns were timezone aware datetime columns. .. _whatsnew_0210.api.consistency_of_range_functions: Consistency of range functions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In previous versions, there were some inconsistencies between the various range functions: :func:`date_range`, :func:`bdate_range`, :func:`period_range`, :func:`timedelta_range`, and :func:`interval_range`. (:issue:`17471`). One of the inconsistent behaviors occurred when the ``start``, ``end`` and ``periods`` parameters were all specified, potentially leading to ambiguous ranges. When all three parameters were passed, ``interval_range`` ignored the ``periods`` parameter, ``period_range`` ignored the ``end`` parameter, and the other range functions raised. To promote consistency among the range functions, and avoid potentially ambiguous ranges, ``interval_range`` and ``period_range`` will now raise when all three parameters are passed. Previous behavior: ..
code-block:: ipython In [2]: pd.interval_range(start=0, end=4, periods=6) Out[2]: IntervalIndex([(0, 1], (1, 2], (2, 3]] closed='right', dtype='interval[int64]') In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q') Out[3]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1', '2018Q2'], dtype='period[Q-DEC]', freq='Q-DEC') New behavior: .. code-block:: ipython In [2]: pd.interval_range(start=0, end=4, periods=6) --------------------------------------------------------------------------- ValueError: Of the three parameters: start, end, and periods, exactly two must be specified In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q') --------------------------------------------------------------------------- ValueError: Of the three parameters: start, end, and periods, exactly two must be specified Additionally, the endpoint parameter ``end`` was not included in the intervals produced by ``interval_range``. However, all other range functions include ``end`` in their output. To promote consistency among the range functions, ``interval_range`` will now include ``end`` as the right endpoint of the final interval, except if ``freq`` is specified in a way which skips ``end``. Previous behavior: .. code-block:: ipython In [4]: pd.interval_range(start=0, end=4) Out[4]: IntervalIndex([(0, 1], (1, 2], (2, 3]] closed='right', dtype='interval[int64]') New behavior: .. ipython:: python pd.interval_range(start=0, end=4) .. _whatsnew_0210.api.mpl_converters: No automatic Matplotlib converters ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ pandas no longer registers our ``date``, ``time``, ``datetime``, ``datetime64``, and ``Period`` converters with matplotlib when pandas is imported. Matplotlib plot methods (``plt.plot``, ``ax.plot``, ...), will not nicely format the x-axis for ``DatetimeIndex`` or ``PeriodIndex`` values. 
You must explicitly register these methods: pandas built-in ``Series.plot`` and ``DataFrame.plot`` *will* register these converters on first-use (:issue:`17710`). .. note:: This change has been temporarily reverted in pandas 0.21.1, for more details see :ref:`here `. .. _whatsnew_0210.api: Other API changes ^^^^^^^^^^^^^^^^^ - The Categorical constructor no longer accepts a scalar for the ``categories`` keyword. (:issue:`16022`) - Accessing a non-existent attribute on a closed :class:`~pandas.HDFStore` will now raise an ``AttributeError`` rather than a ``ClosedFileError`` (:issue:`16301`) - :func:`read_csv` now issues a ``UserWarning`` if the ``names`` parameter contains duplicates (:issue:`17095`) - :func:`read_csv` now treats ``'null'`` and ``'n/a'`` strings as missing values by default (:issue:`16471`, :issue:`16078`) - :class:`pandas.HDFStore`'s string representation is now faster and less detailed. For the previous behavior, use ``pandas.HDFStore.info()``. (:issue:`16503`). - Compression defaults in HDF stores now follow pytables standards. The default is no compression, and if ``complib`` is missing and ``complevel`` > 0, ``zlib`` is used (:issue:`15943`) - ``Index.get_indexer_non_unique()`` now returns an ndarray indexer rather than an ``Index``; this is consistent with ``Index.get_indexer()`` (:issue:`16819`) - Removed the ``@slow`` decorator from ``pandas._testing``, which caused issues for some downstream packages' test suites. Use ``@pytest.mark.slow`` instead, which achieves the same thing (:issue:`16850`) - Moved definition of ``MergeError`` to the ``pandas.errors`` module. - The signature of :func:`Series.set_axis` and :func:`DataFrame.set_axis` has been changed from ``set_axis(axis, labels)`` to ``set_axis(labels, axis=0)``, for consistency with the rest of the API.
The old signature is deprecated and will show a ``FutureWarning`` (:issue:`14636`) - :func:`Series.argmin` and :func:`Series.argmax` will now raise a ``TypeError`` when used with ``object`` dtypes, instead of a ``ValueError`` (:issue:`13595`) - :class:`Period` is now immutable, and will now raise an ``AttributeError`` when a user tries to assign a new value to the ``ordinal`` or ``freq`` attributes (:issue:`17116`). - :func:`to_datetime` when passed a tz-aware ``origin=`` kwarg will now raise a more informative ``ValueError`` rather than a ``TypeError`` (:issue:`16842`) - :func:`to_datetime` now raises a ``ValueError`` when format includes ``%W`` or ``%U`` without also including day of the week and calendar year (:issue:`16774`) - Renamed non-functional ``index`` to ``index_col`` in :func:`read_stata` to improve API consistency (:issue:`16342`) - Bug in :func:`DataFrame.drop` caused boolean labels ``False`` and ``True`` to be treated as labels 0 and 1 respectively when dropping indices from a numeric index. This will now raise a ValueError (:issue:`16877`) - Restricted DateOffset keyword arguments. Previously, ``DateOffset`` subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (:issue:`17176`). .. _whatsnew_0210.deprecations: Deprecations ~~~~~~~~~~~~ - :meth:`DataFrame.from_csv` and :meth:`Series.from_csv` have been deprecated in favor of :func:`read_csv()` (:issue:`4191`) - :func:`read_excel()` has deprecated ``sheetname`` in favor of ``sheet_name`` for consistency with ``.to_excel()`` (:issue:`10559`). - :func:`read_excel()` has deprecated ``parse_cols`` in favor of ``usecols`` for consistency with :func:`read_csv` (:issue:`4988`) - :func:`read_csv()` has deprecated the ``tupleize_cols`` argument. Column tuples will always be converted to a ``MultiIndex`` (:issue:`17060`) - :meth:`DataFrame.to_csv` has deprecated the ``tupleize_cols`` argument. 
MultiIndex columns will always be written as rows in the CSV file (:issue:`17060`) - The ``convert`` parameter has been deprecated in the ``.take()`` method, as it was not being respected (:issue:`16948`) - ``pd.options.html.border`` has been deprecated in favor of ``pd.options.display.html.border`` (:issue:`15793`). - :func:`SeriesGroupBy.nth` has deprecated ``True`` in favor of ``'all'`` for its kwarg ``dropna`` (:issue:`11038`). - :func:`DataFrame.as_blocks` is deprecated, as it exposes the internal implementation (:issue:`17302`) - ``pd.TimeGrouper`` is deprecated in favor of :class:`pandas.Grouper` (:issue:`16747`) - ``cdate_range`` has been deprecated in favor of :func:`bdate_range`, which has gained ``weekmask`` and ``holidays`` parameters for building custom frequency date ranges. See the :ref:`documentation ` for more details (:issue:`17596`) - Passing ``categories`` or ``ordered`` kwargs to :func:`Series.astype` is deprecated, in favor of passing a :ref:`CategoricalDtype ` (:issue:`17636`) - ``.get_value`` and ``.set_value`` on ``Series``, ``DataFrame``, ``Panel``, ``SparseSeries``, and ``SparseDataFrame`` are deprecated in favor of using ``.iat[]`` or ``.at[]`` accessors (:issue:`15269`) - Passing a non-existent column in ``.to_excel(..., columns=)`` is deprecated and will raise a ``KeyError`` in the future (:issue:`17295`) - ``raise_on_error`` parameter to :func:`Series.where`, :func:`Series.mask`, :func:`DataFrame.where`, :func:`DataFrame.mask` is deprecated, in favor of ``errors=`` (:issue:`14968`) - Using :meth:`DataFrame.rename_axis` and :meth:`Series.rename_axis` to alter index or column *labels* is now deprecated in favor of using ``.rename``. ``rename_axis`` may still be used to alter the name of the index or columns (:issue:`17833`). - :meth:`~DataFrame.reindex_axis` has been deprecated in favor of :meth:`~DataFrame.reindex`. See :ref:`here ` for more (:issue:`17833`). ..
_whatsnew_0210.deprecations.select: Series.select and DataFrame.select ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The :meth:`Series.select` and :meth:`DataFrame.select` methods are deprecated in favor of using ``df.loc[labels.map(crit)]`` (:issue:`12401`) .. ipython:: python df = pd.DataFrame({'A': [1, 2, 3]}, index=['foo', 'bar', 'baz']) .. code-block:: ipython In [3]: df.select(lambda x: x in ['bar', 'baz']) FutureWarning: select is deprecated and will be removed in a future release. You can use .loc[crit] as a replacement Out[3]: A bar 2 baz 3 .. ipython:: python df.loc[df.index.map(lambda x: x in ['bar', 'baz'])] .. _whatsnew_0210.deprecations.argmin_min: Series.argmax and Series.argmin ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The behavior of :func:`Series.argmax` and :func:`Series.argmin` has been deprecated in favor of :func:`Series.idxmax` and :func:`Series.idxmin`, respectively (:issue:`16830`). For compatibility with NumPy arrays, ``pd.Series`` implements ``argmax`` and ``argmin``. Since pandas 0.13.0, ``argmax`` has been an alias for :meth:`pandas.Series.idxmax`, and ``argmin`` has been an alias for :meth:`pandas.Series.idxmin`. They return the *label* of the maximum or minimum, rather than the *position*. We've deprecated the current behavior of ``Series.argmax`` and ``Series.argmin``. Using either of these will emit a ``FutureWarning``. Use :meth:`Series.idxmax` if you want the label of the maximum. Use ``Series.values.argmax()`` if you want the position of the maximum. Likewise for the minimum. In a future release ``Series.argmax`` and ``Series.argmin`` will return the position of the maximum or minimum. ..
_whatsnew_0210.prior_deprecations: Removal of prior version deprecations/changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - :func:`read_excel()` has dropped the ``has_index_names`` parameter (:issue:`10967`) - The ``pd.options.display.height`` configuration has been dropped (:issue:`3663`) - The ``pd.options.display.line_width`` configuration has been dropped (:issue:`2881`) - The ``pd.options.display.mpl_style`` configuration has been dropped (:issue:`12190`) - ``Index`` has dropped the ``.sym_diff()`` method in favor of ``.symmetric_difference()`` (:issue:`12591`) - ``Categorical`` has dropped the ``.order()`` and ``.sort()`` methods in favor of ``.sort_values()`` (:issue:`12882`) - :func:`eval` and :func:`DataFrame.eval` have changed the default of ``inplace`` from ``None`` to ``False`` (:issue:`11149`) - The function ``get_offset_name`` has been dropped in favor of the ``.freqstr`` attribute for an offset (:issue:`11834`) - pandas no longer tests for compatibility with hdf5-files created with pandas < 0.11 (:issue:`17404`). .. _whatsnew_0210.performance: Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ - Improved performance of instantiating :class:`SparseDataFrame` (:issue:`16773`) - :attr:`Series.dt` no longer performs frequency inference, yielding a large speedup when accessing the attribute (:issue:`17210`) - Improved performance of :meth:`~Series.cat.set_categories` by not materializing the values (:issue:`17508`) - :attr:`Timestamp.microsecond` no longer re-computes on attribute access (:issue:`17331`) - Improved performance of the :class:`CategoricalIndex` for data that is already categorical dtype (:issue:`17513`) - Improved performance of :meth:`RangeIndex.min` and :meth:`RangeIndex.max` by using ``RangeIndex`` properties to perform the computations (:issue:`17607`) .. _whatsnew_0210.docs: Documentation changes ~~~~~~~~~~~~~~~~~~~~~ - Several ``NaT`` method docstrings (e.g. 
:func:`NaT.ctime`) were incorrect (:issue:`17327`) - The documentation has had references to versions < v0.17 removed and cleaned up (:issue:`17442`, :issue:`17404` & :issue:`17504`) .. _whatsnew_0210.bug_fixes: Bug fixes ~~~~~~~~~ Conversion ^^^^^^^^^^ - Bug in assignment against datetime-like data with ``int`` that could incorrectly convert to datetime-like (:issue:`14145`) - Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype that could keep ``int64`` dtype (:issue:`14001`) - Fixed the return type of ``IntervalIndex.is_non_overlapping_monotonic`` to be a Python ``bool`` for consistency with similar attributes/methods. Previously returned a ``numpy.bool_``. (:issue:`17237`) - Bug in ``IntervalIndex.is_non_overlapping_monotonic`` when intervals are closed on both sides and overlap at a point (:issue:`16560`) - Bug in :func:`Series.fillna` returning a frame when ``inplace=True`` and ``value`` is a dict (:issue:`16156`) - Bug in :attr:`Timestamp.weekday_name` returning a UTC-based weekday name when localized to a timezone (:issue:`17354`) - Bug in ``Timestamp.replace`` when replacing ``tzinfo`` around DST changes (:issue:`15683`) - Bug in ``Timedelta`` construction and arithmetic that would not propagate the ``Overflow`` exception (:issue:`17367`) - Bug in :meth:`~DataFrame.astype` converting to object dtype when passed extension type classes (``DatetimeTZDtype``, ``CategoricalDtype``) rather than instances. Now a ``TypeError`` is raised when a class is passed (:issue:`17780`). - Bug in :meth:`to_numeric` in which elements were not always being coerced to numeric when ``errors='coerce'`` (:issue:`17007`, :issue:`17125`) - Bug in ``DataFrame`` and ``Series`` constructors where ``range`` objects are converted to ``int32`` dtype on Windows instead of ``int64`` (:issue:`16804`) Indexing ^^^^^^^^ - When called with a null slice (e.g. ``df.iloc[:]``), the ``.iloc`` and ``.loc`` indexers return a shallow copy of the original object.
Previously they returned the original object. (:issue:`13873`). - When called on an unsorted ``MultiIndex``, the ``loc`` indexer now will raise ``UnsortedIndexError`` only if proper slicing is used on non-sorted levels (:issue:`16734`). - Fixes regression in 0.20.3 when indexing with a string on a ``TimedeltaIndex`` (:issue:`16896`). - Fixed :func:`TimedeltaIndex.get_loc` handling of ``np.timedelta64`` inputs (:issue:`16909`). - Fix :func:`MultiIndex.sort_index` ordering when ``ascending`` argument is a list, but not all levels are specified, or are in a different order (:issue:`16934`). - Fixes bug where indexing with ``np.inf`` caused an ``OverflowError`` to be raised (:issue:`16957`) - Bug in reindexing on an empty ``CategoricalIndex`` (:issue:`16770`) - Fixes ``DataFrame.loc`` for setting with alignment and tz-aware ``DatetimeIndex`` (:issue:`16889`) - Avoids ``IndexError`` when passing an Index or Series to ``.iloc`` with older numpy (:issue:`17193`) - Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`) - Bug in ``.iloc`` when used with inplace addition or assignment and an int indexer on a ``MultiIndex`` causing the wrong indexes to be read from and written to (:issue:`17148`) - Bug in ``.isin()`` in which checking membership in empty ``Series`` objects raised an error (:issue:`16991`) - Bug in ``CategoricalIndex`` reindexing in which specified indices containing duplicates were not being respected (:issue:`17323`) - Bug in intersection of ``RangeIndex`` with negative step (:issue:`17296`) - Bug in ``IntervalIndex`` where performing a scalar lookup fails for included right endpoints of non-overlapping monotonic decreasing indexes (:issue:`16417`, :issue:`17271`) - Bug in :meth:`DataFrame.first_valid_index` and :meth:`DataFrame.last_valid_index` when no valid entry (:issue:`17400`) - Bug in :func:`Series.rename` when called with a callable, incorrectly alters the name of the ``Series``, rather than the name of the 
``Index``. (:issue:`17407`) - Bug in :func:`String.str_get` raises ``IndexError`` instead of inserting NaNs when using a negative index. (:issue:`17704`) IO ^^ - Bug in :func:`read_hdf` when reading a timezone aware index from ``fixed`` format HDFStore (:issue:`17618`) - Bug in :func:`read_csv` in which columns were not being thoroughly de-duplicated (:issue:`17060`) - Bug in :func:`read_csv` in which specified column names were not being thoroughly de-duplicated (:issue:`17095`) - Bug in :func:`read_csv` in which non integer values for the header argument generated an unhelpful / unrelated error message (:issue:`16338`) - Bug in :func:`read_csv` in which memory management issues in exception handling, under certain conditions, would cause the interpreter to segfault (:issue:`14696`, :issue:`16798`). - Bug in :func:`read_csv` when called with ``low_memory=False`` in which a CSV with at least one column > 2GB in size would incorrectly raise a ``MemoryError`` (:issue:`16798`). - Bug in :func:`read_csv` when called with a single-element list ``header`` would return a ``DataFrame`` of all NaN values (:issue:`7757`) - Bug in :meth:`DataFrame.to_csv` defaulting to 'ascii' encoding in Python 3, instead of 'utf-8' (:issue:`17097`) - Bug in :func:`read_stata` where value labels could not be read when using an iterator (:issue:`16923`) - Bug in :func:`read_stata` where the index was not set (:issue:`16342`) - Bug in :func:`read_html` where import check fails when run in multiple threads (:issue:`16928`) - Bug in :func:`read_csv` where automatic delimiter detection caused a ``TypeError`` to be thrown when a bad line was encountered rather than the correct error message (:issue:`13374`) - Bug in :meth:`DataFrame.to_html` with ``notebook=True`` where DataFrames with named indices or non-MultiIndex indices had undesired horizontal or vertical alignment for column or row labels, respectively (:issue:`16792`) - Bug in :meth:`DataFrame.to_html` in which there was no validation of 
the ``justify`` parameter (:issue:`17527`) - Bug in :func:`HDFStore.select` when reading a contiguous mixed-data table featuring VLArray (:issue:`17021`) - Bug in :func:`to_json` where several conditions (including objects with unprintable symbols, objects with deep recursion, overlong labels) caused segfaults instead of raising the appropriate exception (:issue:`14256`) Plotting ^^^^^^^^ - Bug in plotting methods using ``secondary_y`` and ``fontsize`` not setting secondary axis font size (:issue:`12565`) - Bug when plotting ``timedelta`` and ``datetime`` dtypes on y-axis (:issue:`16953`) - Line plots no longer assume monotonic x data when calculating xlims, they show the entire lines now even for unsorted x data. (:issue:`11310`, :issue:`11471`) - With matplotlib 2.0.0 and above, calculation of x limits for line plots is left to matplotlib, so that its new default settings are applied. (:issue:`15495`) - Bug in ``Series.plot.bar`` or ``DataFrame.plot.bar`` with ``y`` not respecting user-passed ``color`` (:issue:`16822`) - Bug causing ``plotting.parallel_coordinates`` to reset the random seed when using random colors (:issue:`17525`) GroupBy/resample/rolling ^^^^^^^^^^^^^^^^^^^^^^^^ - Bug in ``DataFrame.resample(...).size()`` where an empty ``DataFrame`` did not return a ``Series`` (:issue:`14962`) - Bug in :func:`infer_freq` causing indices with 2-day gaps during the working week to be wrongly inferred as business daily (:issue:`16624`) - Bug in ``.rolling(...).quantile()`` which incorrectly used different defaults than :func:`Series.quantile()` and :func:`DataFrame.quantile()` (:issue:`9413`, :issue:`16211`) - Bug in ``groupby.transform()`` that would coerce boolean dtypes back to float (:issue:`16875`) - Bug in ``Series.resample(...).apply()`` where an empty ``Series`` modified the source index and did not return the name of a ``Series`` (:issue:`14313`) - Bug in ``.rolling(...).apply(...)`` with a ``DataFrame`` with a ``DatetimeIndex``, a ``window`` of a 
timedelta-convertible and ``min_periods >= 1`` (:issue:`15305`) - Bug in ``DataFrame.groupby`` where index and column keys were not recognized correctly when the number of keys equaled the number of elements on the groupby axis (:issue:`16859`) - Bug in ``groupby.nunique()`` with ``TimeGrouper`` which could not handle ``NaT`` correctly (:issue:`17575`) - Bug in ``DataFrame.groupby`` where a single level selection from a ``MultiIndex`` unexpectedly sorts (:issue:`17537`) - Bug in ``DataFrame.groupby`` where a spurious warning was raised when a ``Grouper`` object was used to override an ambiguous column name (:issue:`17383`) - Bug in ``TimeGrouper`` where results differed when passed as a list versus as a scalar (:issue:`17530`) Sparse ^^^^^^ - Bug in ``SparseSeries`` raising ``AttributeError`` when a dictionary is passed in as data (:issue:`16905`) - Bug in :func:`SparseDataFrame.fillna` not filling all NaNs when frame was instantiated from SciPy sparse matrix (:issue:`16112`) - Bug in :func:`SparseSeries.unstack` and :func:`SparseDataFrame.stack` (:issue:`16614`, :issue:`15045`) - Bug in :func:`make_sparse` treating two numeric/boolean values with the same bits as the same when array ``dtype`` is ``object`` (:issue:`17574`) - :func:`SparseArray.all` and :func:`SparseArray.any` are now implemented to handle ``SparseArray``, these were used but not implemented (:issue:`17570`) Reshaping ^^^^^^^^^ - Joining/Merging with a non unique ``PeriodIndex`` raised a ``TypeError`` (:issue:`16871`) - Bug in :func:`crosstab` where non-aligned series of integers were cast to float (:issue:`17005`) - Bug in merging with categorical dtypes with datetimelikes that incorrectly raised a ``TypeError`` (:issue:`16900`) - Bug when using :func:`isin` on a large object series and large comparison array (:issue:`16012`) - Fixes regression from 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`) - Fixes dtype of result with integer dtype input, from
:func:`pivot_table` when called with ``margins=True`` (:issue:`17013`) - Bug in :func:`crosstab` where passing two ``Series`` with the same name raised a ``KeyError`` (:issue:`13279`) - :func:`Series.argmin`, :func:`Series.argmax`, and their counterparts on ``DataFrame`` and groupby objects work correctly with floating point data that contains infinite values (:issue:`13595`). - Bug in :func:`unique` where checking a tuple of strings raised a ``TypeError`` (:issue:`17108`) - Bug in :func:`concat` where order of result index was unpredictable if it contained non-comparable elements (:issue:`17344`) - Fixes regression when sorting by multiple columns on a ``datetime64`` dtype ``Series`` with ``NaT`` values (:issue:`16836`) - Bug in :func:`pivot_table` where the result's columns did not preserve the categorical dtype of ``columns`` when ``dropna`` was ``False`` (:issue:`17842`) - Bug in ``DataFrame.drop_duplicates`` where dropping with non-unique column names raised a ``ValueError`` (:issue:`17836`) - Bug in :func:`unstack` which, when called on a list of levels, would discard the ``fillna`` argument (:issue:`13971`) - Bug in the alignment of ``range`` objects and other list-likes with ``DataFrame`` leading to operations being performed row-wise instead of column-wise (:issue:`17901`) Numeric ^^^^^^^ - Bug in ``.clip()`` with ``axis=1`` when a list-like is passed for ``threshold``; previously this raised ``ValueError`` (:issue:`15390`) - :func:`Series.clip()` and :func:`DataFrame.clip()` now treat NA values for upper and lower arguments as ``None`` instead of raising ``ValueError`` (:issue:`17276`).
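The NA handling for ``clip`` bounds noted above can be illustrated with a minimal sketch. The sample values are invented for illustration, and the behavior shown is what recent pandas releases are expected to do (an NA bound is ignored, as if ``None`` had been passed); it is not a reproduction of the original fix.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not taken from the release notes).
s = pd.Series([1, 5, 10])

# An NA lower bound is treated like None, so only the upper bound applies.
clipped = s.clip(lower=np.nan, upper=6)

# Passing no lower bound at all should give the same result.
reference = s.clip(upper=6)

print(clipped.tolist())
print(clipped.equals(reference))
```

Treating an NA bound as "no bound" lets a threshold be computed elsewhere (and possibly come back missing) without having to special-case the call to ``clip``.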
Categorical ^^^^^^^^^^^ - Bug in :func:`Series.isin` when called with a categorical (:issue:`16639`) - Bug in the categorical constructor with empty values and categories causing the ``.categories`` to be an empty ``Float64Index`` rather than an empty ``Index`` with object dtype (:issue:`17248`) - Bug in categorical operations with :ref:`Series.cat ` not preserving the original Series' name (:issue:`17509`) - Bug in :func:`DataFrame.merge` failing for categorical columns with boolean/int data types (:issue:`17187`) - Bug in constructing a ``Categorical``/``CategoricalDtype`` when the specified ``categories`` are of categorical type (:issue:`17884`). .. _whatsnew_0210.pypy: PyPy ^^^^ - Compatibility with PyPy in :func:`read_csv` with ``usecols=[]`` and :func:`read_json` (:issue:`17351`) - Split tests into cases for CPython and PyPy where needed, which highlights the fragility of index matching with ``float('nan')``, ``np.nan`` and ``NAT`` (:issue:`17351`) - Fix :func:`DataFrame.memory_usage` to support PyPy. Objects on PyPy do not have a fixed size, so an approximation is used instead (:issue:`17228`) Other ^^^^^ - Bug where some inplace operators were not being wrapped and produced a copy when invoked (:issue:`12962`) - Bug in :func:`eval` where the ``inplace`` parameter was being incorrectly handled (:issue:`16732`) .. _whatsnew_0.21.0.contributors: Contributors ~~~~~~~~~~~~ .. contributors:: v0.20.3..v0.21.0