.. _whatsnew_0210:
Version 0.21.0 (October 27, 2017)
---------------------------------
{{ header }}
.. ipython:: python
:suppress:
from pandas import * # noqa F401, F403
This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.
Highlights include:
- Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` function and :meth:`DataFrame.to_parquet` method, see :ref:`here <whatsnew_0210.enhancements.parquet>`.
- New user-facing :class:`pandas.api.types.CategoricalDtype` for specifying
  categoricals independent of the data, see :ref:`here <whatsnew_0210.enhancements.categorical_dtype>`.
- The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames is now consistent and no longer depends on whether `bottleneck <https://bottleneck.readthedocs.io>`__ is installed, and ``sum`` and ``prod`` on empty Series now return NaN instead of 0, see :ref:`here <whatsnew_0210.api_breaking.bottleneck>`.
- Compatibility fixes for pypy, see :ref:`here <whatsnew_0210.pypy>`.
- Additions to the ``drop``, ``reindex`` and ``rename`` API to make them more consistent, see :ref:`here <whatsnew_0210.enhancements.drop_api>`.
- Addition of the new methods ``DataFrame.infer_objects`` (see :ref:`here <whatsnew_0210.enhancements.infer_objects>`) and ``GroupBy.pipe`` (see :ref:`here <whatsnew_0210.enhancements.GroupBy_pipe>`).
- Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a ``KeyError`` in a future version, see :ref:`here <whatsnew_0210.api_breaking.loc>`.
Check the :ref:`API Changes <whatsnew_0210.api_breaking>` and :ref:`deprecations <whatsnew_0210.deprecations>` before updating.
.. contents:: What's new in v0.21.0
:local:
:backlinks: none
:depth: 2
.. _whatsnew_0210.enhancements:
New features
~~~~~~~~~~~~
.. _whatsnew_0210.enhancements.parquet:
Integration with Apache Parquet file format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Integration with `Apache Parquet <https://parquet.apache.org/>`__, including a new top-level :func:`read_parquet` function and :meth:`DataFrame.to_parquet` method, see :ref:`here <io.parquet>` (:issue:`15838`, :issue:`17438`).
`Apache Parquet `__ provides a cross-language, binary file format for reading and writing data frames efficiently.
Parquet is designed to faithfully serialize and de-serialize ``DataFrame`` s, supporting all of the pandas
dtypes, including extension dtypes such as datetime with timezones.
This functionality depends on either the `pyarrow <http://arrow.apache.org/docs/python/>`__ or `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__ library.
For more details, see :ref:`the IO docs on Parquet <io.parquet>`.
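As a quick illustration, a round trip might look like this (a minimal sketch; it assumes one of the engines above is installed and writes to a local file):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
   # write the DataFrame to a Parquet file using the default ('auto') engine
   df.to_parquet('example.parquet')
   # read it back; dtypes round trip faithfully
   result = pd.read_parquet('example.parquet')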
.. _whatsnew_0210.enhancements.infer_objects:
Method ``infer_objects`` type conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :meth:`DataFrame.infer_objects` and :meth:`Series.infer_objects`
methods have been added to perform dtype inference on object columns, replacing
some of the functionality of the deprecated ``convert_objects``
method. See the documentation :ref:`here <basics.object_conversion>`
for more details. (:issue:`11221`)
This method only performs soft conversions on object columns, converting Python objects
to native types, but not any coercive conversions. For example:
.. ipython:: python
df = pd.DataFrame({'A': [1, 2, 3],
'B': np.array([1, 2, 3], dtype='object'),
'C': ['1', '2', '3']})
df.dtypes
df.infer_objects().dtypes
Note that column ``'C'`` was not converted - only scalar numeric types
will be converted to a new type. Other types of conversion should be accomplished
using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedelta`).
.. ipython:: python
df = df.infer_objects()
df['C'] = pd.to_numeric(df['C'], errors='coerce')
df.dtypes
.. _whatsnew_0210.enhancements.attribute_access:
Improved warnings when attempting to create columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
New users are often puzzled by the relationship between column operations and
attribute access on ``DataFrame`` instances (:issue:`7175`). One specific
instance of this confusion is attempting to create a new column by setting an
attribute on the ``DataFrame``:
.. code-block:: ipython
In [1]: df = pd.DataFrame({'one': [1., 2., 3.]})
In [2]: df.two = [4, 5, 6]
This does not raise any obvious exceptions, but also does not create a new column:
.. code-block:: ipython
In [3]: df
Out[3]:
one
0 1.0
1 2.0
2 3.0
Setting a list-like data structure into a new attribute now raises a ``UserWarning`` about the potential for unexpected behavior. See :ref:`Attribute Access <indexing.attribute_access>`.
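To actually create a new column, use item assignment with brackets; attribute access is only for reading existing columns (a minimal sketch):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({'one': [1., 2., 3.]})
   df['two'] = [4, 5, 6]   # item assignment creates the column
   df.two                  # attribute access now works for the existing column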
.. _whatsnew_0210.enhancements.drop_api:
Method ``drop`` now also accepts index/columns keywords
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :meth:`~DataFrame.drop` method has gained ``index``/``columns`` keywords as an
alternative to specifying the ``axis``. This is similar to the behavior of ``reindex``
(:issue:`12392`).
For example:
.. ipython:: python
df = pd.DataFrame(np.arange(8).reshape(2, 4),
columns=['A', 'B', 'C', 'D'])
df
df.drop(['B', 'C'], axis=1)
# the following is now equivalent
df.drop(columns=['B', 'C'])
.. _whatsnew_0210.enhancements.rename_reindex_axis:
Methods ``rename``, ``reindex`` now also accept axis keyword
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :meth:`DataFrame.rename` and :meth:`DataFrame.reindex` methods have gained
the ``axis`` keyword to specify the axis to target with the operation
(:issue:`12392`).
Here's ``rename``:
.. ipython:: python
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df.rename(str.lower, axis='columns')
df.rename(id, axis='index')
And ``reindex``:
.. ipython:: python
df.reindex(['A', 'B', 'C'], axis='columns')
df.reindex([0, 1, 3], axis='index')
The "index, columns" style continues to work as before.
.. ipython:: python
df.rename(index=id, columns=str.lower)
df.reindex(index=[0, 1, 3], columns=['A', 'B', 'C'])
We *highly* encourage using named arguments to avoid confusion when using either
style.
.. _whatsnew_0210.enhancements.categorical_dtype:
``CategoricalDtype`` for specifying categoricals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:class:`pandas.api.types.CategoricalDtype` has been added to the public API and
expanded to include the ``categories`` and ``ordered`` attributes. A
``CategoricalDtype`` can be used to specify the set of categories and
orderedness of an array, independent of the data. This can be useful for example,
when converting string data to a ``Categorical`` (:issue:`14711`,
:issue:`15078`, :issue:`16015`, :issue:`17643`):
.. ipython:: python
from pandas.api.types import CategoricalDtype
s = pd.Series(['a', 'b', 'c', 'a']) # strings
dtype = CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)
s.astype(dtype)
One place that deserves special mention is in :func:`read_csv`. Previously, with
``dtype={'col': 'category'}``, the returned values and categories would always
be strings.
.. ipython:: python
:suppress:
from io import StringIO
.. ipython:: python
data = 'A,B\na,1\nb,2\nc,3'
pd.read_csv(StringIO(data), dtype={'B': 'category'}).B.cat.categories
Notice the "object" dtype.
With a ``CategoricalDtype`` of all numerics, datetimes, or
timedeltas, we can automatically convert to the correct type.
.. ipython:: python
dtype = {'B': CategoricalDtype([1, 2, 3])}
pd.read_csv(StringIO(data), dtype=dtype).B.cat.categories
The values have been correctly interpreted as integers.
The ``.dtype`` property of a ``Categorical``, ``CategoricalIndex`` or a
``Series`` with categorical type will now return an instance of
``CategoricalDtype``. While the repr has changed, ``str(CategoricalDtype())`` is
still the string ``'category'``. We'll take this moment to remind users that the
*preferred* way to detect categorical data is to use
:func:`pandas.api.types.is_categorical_dtype`, and not ``str(dtype) == 'category'``.
See the :ref:`CategoricalDtype docs <categorical.categoricaldtype>` for more.
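For example, a minimal sketch of the preferred dtype check:

.. code-block:: python

   import pandas as pd
   from pandas.api.types import is_categorical_dtype

   s = pd.Series(['a', 'b', 'a'], dtype='category')
   s.dtype                     # an instance of CategoricalDtype
   str(s.dtype) == 'category'  # still True, but not the preferred check
   is_categorical_dtype(s)     # True -- the preferred way to detect categoricals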
.. _whatsnew_0210.enhancements.GroupBy_pipe:
``GroupBy`` objects now have a ``pipe`` method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``GroupBy`` objects now have a ``pipe`` method, similar to the one on
``DataFrame`` and ``Series``, that allows functions that take a
``GroupBy`` to be composed in a clean, readable syntax. (:issue:`17871`)
For a concrete example on combining ``.groupby`` and ``.pipe``, imagine having a
DataFrame with columns for stores, products, revenue and sold quantity. We'd like to
do a groupwise calculation of *prices* (i.e. revenue/quantity) per store and per product.
We could do this in a multi-step operation, but expressing it in terms of piping can make the
code more readable.
First we set the data:
.. ipython:: python
import numpy as np
n = 1000
df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
'Product': np.random.choice(['Product_1',
'Product_2',
'Product_3'
], n),
'Revenue': (np.random.random(n) * 50 + 10).round(2),
'Quantity': np.random.randint(1, 10, size=n)})
df.head(2)
Now, to find prices per store/product, we can simply do:
.. ipython:: python
(df.groupby(['Store', 'Product'])
.pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
.unstack().round(2))
See the :ref:`documentation <groupby.pipe>` for more.
.. _whatsnew_0210.enhancements.rename_categories:
``Categorical.rename_categories`` accepts a dict-like
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:meth:`~Series.cat.rename_categories` now accepts a dict-like argument for
``new_categories``. The previous categories are looked up in the dictionary's
keys and replaced if found. The behavior of missing and extra keys is the same
as in :meth:`DataFrame.rename`.
.. ipython:: python
c = pd.Categorical(['a', 'a', 'b'])
c.rename_categories({"a": "eh", "b": "bee"})
.. warning::
To assist with upgrading pandas, ``rename_categories`` treats ``Series`` as
list-like. Typically, Series are considered to be dict-like (e.g. in
``.rename``, ``.map``). In a future version of pandas ``rename_categories``
will change to treat them as dict-like. Follow the warning message's
recommendations for writing future-proof code.
.. code-block:: ipython
In [33]: c.rename_categories(pd.Series([0, 1], index=['a', 'c']))
FutureWarning: Treating Series 'new_categories' as a list-like and using the values.
In a future version, 'rename_categories' will treat Series like a dictionary.
For dict-like, use 'new_categories.to_dict()'
For list-like, use 'new_categories.values'.
Out[33]:
[0, 0, 1]
Categories (2, int64): [0, 1]
.. _whatsnew_0210.enhancements.other:
Other enhancements
^^^^^^^^^^^^^^^^^^
New functions or methods
""""""""""""""""""""""""
- :meth:`.Resampler.nearest` is added to support nearest-neighbor upsampling (:issue:`17496`).
- :class:`~pandas.Index` has added support for a ``to_frame`` method (:issue:`15230`).
New keywords
""""""""""""
- Added a ``skipna`` parameter to :func:`~pandas.api.types.infer_dtype` to
support type inference in the presence of missing values (:issue:`17059`).
- :func:`Series.to_dict` and :func:`DataFrame.to_dict` now support an ``into`` keyword which allows you to specify the ``collections.Mapping`` subclass that you would like returned. The default is ``dict``, which is backwards compatible; see the sketch after this list. (:issue:`16122`)
- :func:`Series.set_axis` and :func:`DataFrame.set_axis` now support the ``inplace`` parameter. (:issue:`14636`)
- :func:`Series.to_pickle` and :func:`DataFrame.to_pickle` have gained a ``protocol`` parameter (:issue:`16252`). By default, this parameter is set to `HIGHEST_PROTOCOL <https://docs.python.org/3/library/pickle.html#data-stream-format>`__.
- :func:`read_feather` has gained the ``nthreads`` parameter for multi-threaded operations (:issue:`16359`)
- :func:`DataFrame.clip()` and :func:`Series.clip()` have gained an ``inplace`` argument. (:issue:`15388`)
- :func:`crosstab` has gained a ``margins_name`` parameter to define the name of the row / column that will contain the totals when ``margins=True``. (:issue:`15972`)
- :func:`read_json` now accepts a ``chunksize`` parameter that can be used when ``lines=True``. If ``chunksize`` is passed, read_json now returns an iterator which reads in ``chunksize`` lines with each iteration. (:issue:`17048`)
- :func:`read_json` and :func:`~DataFrame.to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`17798`)
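As an illustration of the new ``into`` keyword above, a minimal sketch using ``collections.OrderedDict``:

.. code-block:: python

   from collections import OrderedDict

   import pandas as pd

   df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
   # return an OrderedDict instead of the default dict
   df.to_dict(into=OrderedDict)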
Various enhancements
""""""""""""""""""""
- Improved the import time of pandas by about 2.25x. (:issue:`16764`)
- Support for `PEP 519 -- Adding a file system path protocol
  <https://www.python.org/dev/peps/pep-0519/>`_ on most readers (e.g.
:func:`read_csv`) and writers (e.g. :meth:`DataFrame.to_csv`) (:issue:`13823`).
- Added a ``__fspath__`` method to ``pd.HDFStore``, ``pd.ExcelFile``,
and ``pd.ExcelWriter`` to work properly with the file system path protocol (:issue:`13823`).
- The ``validate`` argument for :func:`merge` now checks whether a merge is one-to-one, one-to-many, many-to-one, or many-to-many. If a merge is found not to be an example of the specified merge type, an exception of type ``MergeError`` will be raised; see the sketch after this list. For more, see :ref:`here <merging.validation>` (:issue:`16270`)
- Added support for `PEP 518 <https://www.python.org/dev/peps/pep-0518/>`_ (``pyproject.toml``) to the build system (:issue:`16745`)
- :func:`RangeIndex.append` now returns a ``RangeIndex`` object when possible (:issue:`16212`)
- :func:`Series.rename_axis` and :func:`DataFrame.rename_axis` with ``inplace=True`` now return ``None`` while renaming the axis inplace. (:issue:`15704`)
- :func:`api.types.infer_dtype` now infers decimals. (:issue:`15690`)
- :func:`DataFrame.select_dtypes` now accepts scalar values for include/exclude as well as list-like. (:issue:`16855`)
- :func:`date_range` now accepts 'YS' in addition to 'AS' as an alias for start of year. (:issue:`9313`)
- :func:`date_range` now accepts 'Y' in addition to 'A' as an alias for end of year. (:issue:`9313`)
- :func:`DataFrame.add_prefix` and :func:`DataFrame.add_suffix` now accept strings containing the '%' character. (:issue:`17151`)
- Read/write methods that infer compression (:func:`read_csv`, :func:`read_table`, :func:`read_pickle`, and :meth:`~DataFrame.to_pickle`) can now infer from path-like objects, such as ``pathlib.Path``. (:issue:`17206`)
- :func:`read_sas` now recognizes much more of the most frequently used date (datetime) formats in SAS7BDAT files. (:issue:`15871`)
- :func:`DataFrame.items` and :func:`Series.items` are now present in both Python 2 and 3 and are lazy in all cases. (:issue:`13918`, :issue:`17213`)
- :meth:`pandas.io.formats.style.Styler.where` has been implemented as a convenience for :meth:`pandas.io.formats.style.Styler.applymap`. (:issue:`17474`)
- :func:`MultiIndex.is_monotonic_decreasing` has been implemented. Previously returned ``False`` in all cases. (:issue:`16554`)
- :func:`read_excel` raises ``ImportError`` with a better message if ``xlrd`` is not installed. (:issue:`17613`)
- :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names. (:issue:`14207`)
- :func:`Series.reindex`, :func:`DataFrame.reindex`, :func:`Index.get_indexer` now support list-like argument for ``tolerance``. (:issue:`17367`)
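As an illustration of the ``validate`` argument mentioned above, a minimal sketch of guarding a merge that is expected to be one-to-one:

.. code-block:: python

   import pandas as pd

   left = pd.DataFrame({'key': [1, 2], 'L': ['a', 'b']})
   right = pd.DataFrame({'key': [1, 2], 'R': ['x', 'y']})

   # passes validation: each key appears at most once on both sides
   merged = pd.merge(left, right, on='key', validate='one_to_one')
   # a duplicated key on either side would instead raise MergeError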
.. _whatsnew_0210.api_breaking:
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. _whatsnew_0210.api_breaking.deps:
Dependencies have increased minimum versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We have updated our minimum supported versions of dependencies (:issue:`15206`, :issue:`15543`, :issue:`15214`).
If installed, we now require:
+--------------+-----------------+----------+
| Package | Minimum Version | Required |
+==============+=================+==========+
| Numpy | 1.9.0 | X |
+--------------+-----------------+----------+
| Matplotlib | 1.4.3 | |
+--------------+-----------------+----------+
| Scipy | 0.14.0 | |
+--------------+-----------------+----------+
| Bottleneck | 1.0.0 | |
+--------------+-----------------+----------+
Additionally, support has been dropped for Python 3.4 (:issue:`15251`).
.. _whatsnew_0210.api_breaking.bottleneck:
Sum/prod of all-NaN or empty Series/DataFrames is now consistently NaN
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. note::
The changes described here have been partially reverted. See
the :ref:`v0.22.0 Whatsnew <whatsnew_0220>` for more.
The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames no longer depends on
whether `bottleneck <https://bottleneck.readthedocs.io>`__ is installed, and the return value of ``sum`` and ``prod`` on an empty Series has changed (:issue:`9422`, :issue:`15507`).
Calling ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of a ``DataFrame``, will result in ``NaN``. See the :ref:`docs <missing_data.numeric_sum>`.
.. ipython:: python
s = pd.Series([np.nan])
Previously WITHOUT ``bottleneck`` installed:
.. code-block:: ipython
In [2]: s.sum()
Out[2]: np.nan
Previously WITH ``bottleneck``:
.. code-block:: ipython
In [2]: s.sum()
Out[2]: 0.0
New behavior, without regard to the bottleneck installation:
.. ipython:: python
s.sum()
Note that this also changes the sum of an empty ``Series``. Previously this always returned 0 regardless of a ``bottleneck`` installation:
.. code-block:: ipython
In [1]: pd.Series([]).sum()
Out[1]: 0
but for consistency with the all-NaN case, this was changed to return ``NaN`` as well:

.. code-block:: ipython

   In [2]: pd.Series([]).sum()
   Out[2]: nan
.. _whatsnew_0210.api_breaking.loc:
Indexing with a list with missing labels is deprecated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, selecting with a list of labels, where one or more labels were missing, would always succeed, returning ``NaN`` for missing labels.
This will now show a ``FutureWarning``. In the future this will raise a ``KeyError`` (:issue:`15747`).
This warning will trigger on a ``DataFrame`` or a ``Series`` for using ``.loc[]`` or ``[[]]`` when passing a list of labels with at least one missing label.
.. ipython:: python
s = pd.Series([1, 2, 3])
s
Previous behavior
.. code-block:: ipython
In [4]: s.loc[[1, 2, 3]]
Out[4]:
1 2.0
2 3.0
3 NaN
dtype: float64
Current behavior
.. code-block:: ipython
In [4]: s.loc[[1, 2, 3]]
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
Out[4]:
1 2.0
2 3.0
3 NaN
dtype: float64
The idiomatic way to achieve selecting potentially not-found elements is via ``.reindex()``
.. ipython:: python
s.reindex([1, 2, 3])
Selection with all keys found is unchanged.
.. ipython:: python
s.loc[[1, 2]]
.. _whatsnew_0210.api.na_changes:
NA naming changes
^^^^^^^^^^^^^^^^^
In order to promote more consistency among the pandas API, we have added additional top-level
functions :func:`isna` and :func:`notna` that are aliases for :func:`isnull` and :func:`notnull`.
The naming scheme is now more consistent with methods like ``.dropna()`` and ``.fillna()``. Furthermore,
in all cases where ``.isnull()`` and ``.notnull()`` methods are defined, these have additional methods
named ``.isna()`` and ``.notna()``; these are included for the classes ``Categorical``,
``Index``, ``Series``, and ``DataFrame`` (:issue:`15001`).
The configuration option ``pd.options.mode.use_inf_as_null`` is deprecated, and ``pd.options.mode.use_inf_as_na`` is added as a replacement.
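For example, the new names are drop-in aliases for the old ones:

.. code-block:: python

   import numpy as np
   import pandas as pd

   s = pd.Series([1, np.nan, 3])
   pd.isna(s)    # identical to pd.isnull(s)
   s.notna()     # identical to s.notnull()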
.. _whatsnew_0210.api_breaking.iteration_scalars:
Iteration of Series/Index will now return Python scalars
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, when using certain iteration methods for a ``Series`` with dtype ``int`` or ``float``, you would receive a ``numpy`` scalar, e.g. a ``np.int64``, rather than a Python ``int``. Issue (:issue:`10904`) corrected this for ``Series.tolist()`` and ``list(Series)``. This change makes all iteration methods consistent, in particular, for ``__iter__()`` and ``.map()``; note that this only affects int/float dtypes. (:issue:`13236`, :issue:`13258`, :issue:`14216`).
.. ipython:: python
s = pd.Series([1, 2, 3])
s
Previously:
.. code-block:: ipython
In [2]: type(list(s)[0])
Out[2]: numpy.int64
New behavior:
.. ipython:: python
type(list(s)[0])
Furthermore this will now correctly box the results of iteration for :func:`DataFrame.to_dict` as well.
.. ipython:: python
d = {'a': [1], 'b': ['b']}
df = pd.DataFrame(d)
Previously:
.. code-block:: ipython
In [8]: type(df.to_dict()['a'][0])
Out[8]: numpy.int64
New behavior:
.. ipython:: python
type(df.to_dict()['a'][0])
.. _whatsnew_0210.api_breaking.loc_with_index:
Indexing with a Boolean Index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously when passing a boolean ``Index`` to ``.loc``, if the index of the ``Series``/``DataFrame`` had ``boolean`` labels,
you would get a label-based selection, potentially duplicating result labels, rather than a boolean indexing selection
(where ``True`` selects elements). This was inconsistent with how a boolean numpy array indexes. The new behavior is to
act like a boolean numpy array indexer. (:issue:`17738`)
Previous behavior:
.. ipython:: python
s = pd.Series([1, 2, 3], index=[False, True, False])
s
.. code-block:: ipython
In [59]: s.loc[pd.Index([True, False, True])]
Out[59]:
True 2
False 1
False 3
True 2
dtype: int64
Current behavior
.. ipython:: python
s.loc[pd.Index([True, False, True])]
Furthermore, previously if you had an index that was non-numeric (e.g. strings), then a boolean Index would raise a ``KeyError``.
This will now be treated as a boolean indexer.
Previous behavior:
.. ipython:: python
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s
.. code-block:: ipython
In [39]: s.loc[pd.Index([True, False, True])]
KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]"
Current behavior
.. ipython:: python
s.loc[pd.Index([True, False, True])]
.. _whatsnew_0210.api_breaking.period_index_resampling:
``PeriodIndex`` resampling
^^^^^^^^^^^^^^^^^^^^^^^^^^
In previous versions of pandas, resampling a ``Series``/``DataFrame`` indexed by a ``PeriodIndex`` returned a ``DatetimeIndex`` in some cases (:issue:`12884`). Resampling to a multiplied frequency now returns a ``PeriodIndex`` (:issue:`15944`). As a minor enhancement, resampling a ``PeriodIndex`` can now handle ``NaT`` values (:issue:`13224`)
Previous behavior:
.. code-block:: ipython
In [1]: pi = pd.period_range('2017-01', periods=12, freq='M')
In [2]: s = pd.Series(np.arange(12), index=pi)
In [3]: resampled = s.resample('2Q').mean()
In [4]: resampled
Out[4]:
2017-03-31 1.0
2017-09-30 5.5
2018-03-31 10.0
Freq: 2Q-DEC, dtype: float64
In [5]: resampled.index
Out[5]: DatetimeIndex(['2017-03-31', '2017-09-30', '2018-03-31'], dtype='datetime64[ns]', freq='2Q-DEC')
New behavior:
.. code-block:: ipython
In [1]: pi = pd.period_range('2017-01', periods=12, freq='M')
In [2]: s = pd.Series(np.arange(12), index=pi)
In [3]: resampled = s.resample('2Q').mean()
In [4]: resampled
Out[4]:
2017Q1 2.5
2017Q3 8.5
Freq: 2Q-DEC, dtype: float64
In [5]: resampled.index
Out[5]: PeriodIndex(['2017Q1', '2017Q3'], dtype='period[2Q-DEC]')
Upsampling and calling ``.ohlc()`` previously returned a ``Series``, basically identical to calling ``.asfreq()``. OHLC upsampling now returns a DataFrame with columns ``open``, ``high``, ``low`` and ``close`` (:issue:`13083`). This is consistent with downsampling and ``DatetimeIndex`` behavior.
Previous behavior:
.. code-block:: ipython
In [1]: pi = pd.period_range(start='2000-01-01', freq='D', periods=10)
In [2]: s = pd.Series(np.arange(10), index=pi)
In [3]: s.resample('H').ohlc()
Out[3]:
2000-01-01 00:00 0.0
...
2000-01-10 23:00 NaN
Freq: H, Length: 240, dtype: float64
In [4]: s.resample('M').ohlc()
Out[4]:
open high low close
2000-01 0 9 0 9
New behavior:
.. code-block:: ipython
In [56]: pi = pd.period_range(start='2000-01-01', freq='D', periods=10)
In [57]: s = pd.Series(np.arange(10), index=pi)
In [58]: s.resample('H').ohlc()
Out[58]:
open high low close
2000-01-01 00:00 0.0 0.0 0.0 0.0
2000-01-01 01:00 NaN NaN NaN NaN
2000-01-01 02:00 NaN NaN NaN NaN
2000-01-01 03:00 NaN NaN NaN NaN
2000-01-01 04:00 NaN NaN NaN NaN
... ... ... ... ...
2000-01-10 19:00 NaN NaN NaN NaN
2000-01-10 20:00 NaN NaN NaN NaN
2000-01-10 21:00 NaN NaN NaN NaN
2000-01-10 22:00 NaN NaN NaN NaN
2000-01-10 23:00 NaN NaN NaN NaN
[240 rows x 4 columns]
In [59]: s.resample('M').ohlc()
Out[59]:
open high low close
2000-01 0 9 0 9
[1 rows x 4 columns]
.. _whatsnew_0210.api_breaking.pandas_eval:
Improved error handling during item assignment in pd.eval
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:func:`eval` will now raise a ``ValueError`` when item assignment malfunctions, or
when inplace operations are specified but there is no item assignment in the expression (:issue:`16732`).
.. ipython:: python
arr = np.array([1, 2, 3])
Previously, if you attempted the following expression, you would get a not very helpful error message:
.. code-block:: ipython
In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`)
and integer or boolean arrays are valid indices
This is a very long way of saying numpy arrays don't support string-item indexing. With this
change, the error message is now this:
.. code-block:: ipython
In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True)
...
ValueError: Cannot assign expression output to target
It also used to be possible to evaluate expressions inplace, even if there was no item assignment:
.. code-block:: ipython
In [4]: pd.eval("1 + 2", target=arr, inplace=True)
Out[4]: 3
However, this input does not make much sense because the output is not being assigned to
the target. Now, a ``ValueError`` will be raised when such an input is passed in:
.. code-block:: ipython
In [4]: pd.eval("1 + 2", target=arr, inplace=True)
...
ValueError: Cannot operate inplace if there is no assignment
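For reference, a valid inplace assignment targets an object that supports item assignment; with :meth:`DataFrame.eval` the frame itself is the target (a minimal sketch):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({'b': [1, 2], 'c': [3, 4]})
   # the expression assigns, and the target supports item assignment
   df.eval("a = b + c", inplace=True)
   df  # now has a new column 'a'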
.. _whatsnew_0210.api_breaking.dtype_conversions:
Dtype conversions
^^^^^^^^^^^^^^^^^
Previously assignments, ``.where()`` and ``.fillna()`` with a ``bool`` assignment, would coerce to the same type (e.g. int / float), or raise for datetimelikes. These will now preserve the bools with ``object`` dtypes. (:issue:`16821`).
.. ipython:: python
s = pd.Series([1, 2, 3])
Previous behavior:

.. code-block:: ipython
In [5]: s[1] = True
In [6]: s
Out[6]:
0 1
1 1
2 3
dtype: int64
New behavior:
.. code-block:: ipython
In [7]: s[1] = True
In [8]: s
Out[8]:
0 1
1 True
2 3
Length: 3, dtype: object
Previously, assignment to a datetimelike ``Series`` with a non-datetimelike would coerce the
non-datetime-like item being assigned (:issue:`14145`).
.. ipython:: python
s = pd.Series([pd.Timestamp('2011-01-01'), pd.Timestamp('2012-01-01')])
Previous behavior:

.. code-block:: ipython
In [1]: s[1] = 1
In [2]: s
Out[2]:
0 2011-01-01 00:00:00.000000000
1 1970-01-01 00:00:00.000000001
dtype: datetime64[ns]
These now coerce to ``object`` dtype.
.. code-block:: ipython
In [1]: s[1] = 1
In [2]: s
Out[2]:
0 2011-01-01 00:00:00
1 1
dtype: object
- Inconsistent behavior in ``.where()`` with datetimelikes which would raise rather than coerce to ``object`` (:issue:`16402`)
- Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)
.. _whatsnew_210.api.multiindex_single:
MultiIndex constructor with a single level
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``MultiIndex`` constructors no longer squeeze a MultiIndex with all
length-one levels down to a regular ``Index``. This affects all the
``MultiIndex`` constructors. (:issue:`17178`)
Previous behavior:
.. code-block:: ipython
In [2]: pd.MultiIndex.from_tuples([('a',), ('b',)])
Out[2]: Index(['a', 'b'], dtype='object')
Length 1 levels are no longer special-cased. They behave exactly as if you had
length 2+ levels, so a :class:`MultiIndex` is always returned from all of the
``MultiIndex`` constructors:
.. ipython:: python
pd.MultiIndex.from_tuples([('a',), ('b',)])
.. _whatsnew_0210.api.utc_localization_with_series:
UTC localization with Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, :func:`to_datetime` did not localize datetime ``Series`` data when ``utc=True`` was passed. Now, :func:`to_datetime` will correctly localize ``Series`` with a ``datetime64[ns, UTC]`` dtype to be consistent with how list-like and ``Index`` data are handled. (:issue:`6415`).
Previous behavior
.. ipython:: python
s = pd.Series(['20130101 00:00:00'] * 3)
.. code-block:: ipython
In [12]: pd.to_datetime(s, utc=True)
Out[12]:
0 2013-01-01
1 2013-01-01
2 2013-01-01
dtype: datetime64[ns]
New behavior
.. ipython:: python
pd.to_datetime(s, utc=True)
Additionally, DataFrames with datetime columns that were parsed by :func:`read_sql_table` and :func:`read_sql_query` will also be localized to UTC only if the original SQL columns were timezone aware datetime columns.
.. _whatsnew_0210.api.consistency_of_range_functions:
Consistency of range functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In previous versions, there were some inconsistencies between the various range functions: :func:`date_range`, :func:`bdate_range`, :func:`period_range`, :func:`timedelta_range`, and :func:`interval_range`. (:issue:`17471`).
One of the inconsistent behaviors occurred when the ``start``, ``end`` and ``periods`` parameters were all specified, potentially leading to ambiguous ranges. When all three parameters were passed, ``interval_range`` ignored the ``periods`` parameter, ``period_range`` ignored the ``end`` parameter, and the other range functions raised. To promote consistency among the range functions, and avoid potentially ambiguous ranges, ``interval_range`` and ``period_range`` will now raise when all three parameters are passed.
Previous behavior:
.. code-block:: ipython
In [2]: pd.interval_range(start=0, end=4, periods=6)
Out[2]:
IntervalIndex([(0, 1], (1, 2], (2, 3]]
closed='right',
dtype='interval[int64]')
In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q')
Out[3]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1', '2018Q2'], dtype='period[Q-DEC]', freq='Q-DEC')
New behavior:
.. code-block:: ipython
In [2]: pd.interval_range(start=0, end=4, periods=6)
---------------------------------------------------------------------------
ValueError: Of the three parameters: start, end, and periods, exactly two must be specified
In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q')
---------------------------------------------------------------------------
ValueError: Of the three parameters: start, end, and periods, exactly two must be specified
Additionally, the endpoint parameter ``end`` was not included in the intervals produced by ``interval_range``. However, all other range functions include ``end`` in their output. To promote consistency among the range functions, ``interval_range`` will now include ``end`` as the right endpoint of the final interval, except if ``freq`` is specified in a way which skips ``end``.
Previous behavior:
.. code-block:: ipython
In [4]: pd.interval_range(start=0, end=4)
Out[4]:
IntervalIndex([(0, 1], (1, 2], (2, 3]]
closed='right',
dtype='interval[int64]')
New behavior:
.. ipython:: python
pd.interval_range(start=0, end=4)
.. _whatsnew_0210.api.mpl_converters:
No automatic Matplotlib converters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pandas no longer registers our ``date``, ``time``, ``datetime``,
``datetime64``, and ``Period`` converters with matplotlib when pandas is
imported. Matplotlib plot methods (``plt.plot``, ``ax.plot``, ...) will not
nicely format the x-axis for ``DatetimeIndex`` or ``PeriodIndex`` values. You
must explicitly register these converters:
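A minimal sketch of explicit registration, assuming the 0.21.x module layout (later pandas versions expose ``pandas.plotting.register_matplotlib_converters`` instead):

.. code-block:: python

   # register pandas' date/datetime/Period converters with matplotlib
   from pandas.tseries import converter

   converter.register()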
pandas built-in ``Series.plot`` and ``DataFrame.plot`` *will* register these
converters on first-use (:issue:`17710`).
.. note::
This change has been temporarily reverted in pandas 0.21.1,
for more details see :ref:`here `.
.. _whatsnew_0210.api:
Other API changes
^^^^^^^^^^^^^^^^^
- The Categorical constructor no longer accepts a scalar for the ``categories`` keyword. (:issue:`16022`)
- Accessing a non-existent attribute on a closed :class:`~pandas.HDFStore` will now
raise an ``AttributeError`` rather than a ``ClosedFileError`` (:issue:`16301`)
- :func:`read_csv` now issues a ``UserWarning`` if the ``names`` parameter contains duplicates (:issue:`17095`)
- :func:`read_csv` now treats ``'null'`` and ``'n/a'`` strings as missing values by default (:issue:`16471`, :issue:`16078`)
- :class:`pandas.HDFStore`'s string representation is now faster and less detailed. For the previous behavior, use ``pandas.HDFStore.info()``. (:issue:`16503`).
- Compression defaults in HDF stores now follow pytables standards. The default is no compression; if ``complevel`` > 0 but ``complib`` is missing, ``zlib`` is used (:issue:`15943`)
- ``Index.get_indexer_non_unique()`` now returns an ndarray indexer rather than an ``Index``; this is consistent with ``Index.get_indexer()`` (:issue:`16819`)
- Removed the ``@slow`` decorator from ``pandas._testing``, which caused issues for some downstream packages' test suites. Use ``@pytest.mark.slow`` instead, which achieves the same thing (:issue:`16850`)
- Moved definition of ``MergeError`` to the ``pandas.errors`` module.
- The signature of :func:`Series.set_axis` and :func:`DataFrame.set_axis` has been changed from ``set_axis(axis, labels)`` to ``set_axis(labels, axis=0)``, for consistency with the rest of the API. The old signature is deprecated and will show a ``FutureWarning``; see the sketch after this list (:issue:`14636`)
- :func:`Series.argmin` and :func:`Series.argmax` will now raise a ``TypeError`` when used with ``object`` dtypes, instead of a ``ValueError`` (:issue:`13595`)
- :class:`Period` is now immutable, and will now raise an ``AttributeError`` when a user tries to assign a new value to the ``ordinal`` or ``freq`` attributes (:issue:`17116`).
- :func:`to_datetime` when passed a tz-aware ``origin=`` kwarg will now raise a more informative ``ValueError`` rather than a ``TypeError`` (:issue:`16842`)
- :func:`to_datetime` now raises a ``ValueError`` when format includes ``%W`` or ``%U`` without also including day of the week and calendar year (:issue:`16774`)
- Renamed non-functional ``index`` to ``index_col`` in :func:`read_stata` to improve API consistency (:issue:`16342`)
- Bug in :func:`DataFrame.drop` caused boolean labels ``False`` and ``True`` to be treated as labels 0 and 1 respectively when dropping indices from a numeric index. This will now raise a ``ValueError`` (:issue:`16877`)
- Restricted DateOffset keyword arguments. Previously, ``DateOffset`` subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (:issue:`17176`).
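As an illustration of the new ``set_axis`` signature mentioned above, a minimal sketch (``inplace=False`` is passed explicitly, since the default is changing):

.. code-block:: python

   import pandas as pd

   s = pd.Series([1, 2, 3])
   # new signature: labels first, then the axis
   s.set_axis(['a', 'b', 'c'], axis=0, inplace=False)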
.. _whatsnew_0210.deprecations:
Deprecations
~~~~~~~~~~~~
- :meth:`DataFrame.from_csv` and :meth:`Series.from_csv` have been deprecated in favor of :func:`read_csv()` (:issue:`4191`)
- :func:`read_excel()` has deprecated ``sheetname`` in favor of ``sheet_name`` for consistency with ``.to_excel()`` (:issue:`10559`).
- :func:`read_excel()` has deprecated ``parse_cols`` in favor of ``usecols`` for consistency with :func:`read_csv` (:issue:`4988`)
- :func:`read_csv()` has deprecated the ``tupleize_cols`` argument. Column tuples will always be converted to a ``MultiIndex`` (:issue:`17060`)
- :meth:`DataFrame.to_csv` has deprecated the ``tupleize_cols`` argument. MultiIndex columns will be always written as rows in the CSV file (:issue:`17060`)
- The ``convert`` parameter has been deprecated in the ``.take()`` method, as it was not being respected (:issue:`16948`)
- ``pd.options.html.border`` has been deprecated in favor of ``pd.options.display.html.border`` (:issue:`15793`).
- :func:`SeriesGroupBy.nth` has deprecated ``True`` in favor of ``'all'`` for its kwarg ``dropna`` (:issue:`11038`).
- :func:`DataFrame.as_blocks` is deprecated, as this is exposing the internal implementation (:issue:`17302`)
- ``pd.TimeGrouper`` is deprecated in favor of :class:`pandas.Grouper` (:issue:`16747`)
- ``cdate_range`` has been deprecated in favor of :func:`bdate_range`, which has gained ``weekmask`` and ``holidays`` parameters for building custom frequency date ranges; see the sketch after this list. See the :ref:`documentation <timeseries.custom-freq-ranges>` for more details (:issue:`17596`)
- passing ``categories`` or ``ordered`` kwargs to :func:`Series.astype` is deprecated, in favor of passing a :ref:`CategoricalDtype <whatsnew_0210.enhancements.categorical_dtype>` (:issue:`17636`)
- ``.get_value`` and ``.set_value`` on ``Series``, ``DataFrame``, ``Panel``, ``SparseSeries``, and ``SparseDataFrame`` are deprecated in favor of using ``.iat[]`` or ``.at[]`` accessors (:issue:`15269`)
- Passing a non-existent column in ``.to_excel(..., columns=)`` is deprecated and will raise a ``KeyError`` in the future (:issue:`17295`)
- ``raise_on_error`` parameter to :func:`Series.where`, :func:`Series.mask`, :func:`DataFrame.where`, :func:`DataFrame.mask` is deprecated, in favor of ``errors=`` (:issue:`14968`)
- Using :meth:`DataFrame.rename_axis` and :meth:`Series.rename_axis` to alter index or column *labels* is now deprecated in favor of using ``.rename``. ``rename_axis`` may still be used to alter the name of the index or columns (:issue:`17833`).
- :meth:`~DataFrame.reindex_axis` has been deprecated in favor of :meth:`~DataFrame.reindex`. See :ref:`here <whatsnew_0210.enhancements.rename_reindex_axis>` for more (:issue:`17833`).
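As an illustration of the ``bdate_range`` replacement for ``cdate_range`` mentioned above, a minimal sketch (``freq='C'`` selects custom business days, which is what enables ``weekmask`` and ``holidays``):

.. code-block:: python

   import pandas as pd

   # business days restricted to Mondays, Wednesdays and Fridays
   pd.bdate_range(start='2017-01-01', end='2017-01-31',
                  freq='C', weekmask='Mon Wed Fri')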
.. _whatsnew_0210.deprecations.select:
Series.select and DataFrame.select
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The :meth:`Series.select` and :meth:`DataFrame.select` methods are deprecated in favor of using ``df.loc[labels.map(crit)]`` (:issue:`12401`)
.. ipython:: python
df = pd.DataFrame({'A': [1, 2, 3]}, index=['foo', 'bar', 'baz'])
.. code-block:: ipython
In [3]: df.select(lambda x: x in ['bar', 'baz'])
FutureWarning: select is deprecated and will be removed in a future release. You can use .loc[crit] as a replacement
Out[3]:
A
bar 2
baz 3
.. ipython:: python
df.loc[df.index.map(lambda x: x in ['bar', 'baz'])]
.. _whatsnew_0210.deprecations.argmin_min:
Series.argmax and Series.argmin
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The behavior of :func:`Series.argmax` and :func:`Series.argmin` has been deprecated in favor of :func:`Series.idxmax` and :func:`Series.idxmin`, respectively (:issue:`16830`).
For compatibility with NumPy arrays, ``pd.Series`` implements ``argmax`` and
``argmin``. Since pandas 0.13.0, ``argmax`` has been an alias for
:meth:`pandas.Series.idxmax`, and ``argmin`` has been an alias for
:meth:`pandas.Series.idxmin`. They return the *label* of the maximum or minimum,
rather than the *position*.
We've deprecated the current behavior of ``Series.argmax`` and
``Series.argmin``. Using either of these will emit a ``FutureWarning``. Use
:meth:`Series.idxmax` if you want the label of the maximum. Use
``Series.values.argmax()`` if you want the position of the maximum. Likewise for
the minimum. In a future release ``Series.argmax`` and ``Series.argmin`` will
return the position of the maximum or minimum.
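A minimal sketch of the replacements:

.. code-block:: python

   import pandas as pd

   s = pd.Series([1, 3, 2], index=['a', 'b', 'c'])
   s.idxmax()         # 'b' -- the *label* of the maximum
   s.values.argmax()  # 1  -- the *position* of the maximum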
.. _whatsnew_0210.prior_deprecations:
Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- :func:`read_excel()` has dropped the ``has_index_names`` parameter (:issue:`10967`)
- The ``pd.options.display.height`` configuration has been dropped (:issue:`3663`)
- The ``pd.options.display.line_width`` configuration has been dropped (:issue:`2881`)
- The ``pd.options.display.mpl_style`` configuration has been dropped (:issue:`12190`)
- ``Index`` has dropped the ``.sym_diff()`` method in favor of ``.symmetric_difference()`` (:issue:`12591`)
- ``Categorical`` has dropped the ``.order()`` and ``.sort()`` methods in favor of ``.sort_values()`` (:issue:`12882`)
- :func:`eval` and :func:`DataFrame.eval` have changed the default of ``inplace`` from ``None`` to ``False`` (:issue:`11149`)
- The function ``get_offset_name`` has been dropped in favor of the ``.freqstr`` attribute for an offset (:issue:`11834`)
- pandas no longer tests for compatibility with hdf5-files created with pandas < 0.11 (:issue:`17404`).
.. _whatsnew_0210.performance:
Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Improved performance of instantiating :class:`SparseDataFrame` (:issue:`16773`)
- :attr:`Series.dt` no longer performs frequency inference, yielding a large speedup when accessing the attribute (:issue:`17210`)
- Improved performance of :meth:`~Series.cat.set_categories` by not materializing the values (:issue:`17508`)
- :attr:`Timestamp.microsecond` no longer re-computes on attribute access (:issue:`17331`)
- Improved performance of the :class:`CategoricalIndex` for data that is already categorical dtype (:issue:`17513`)
- Improved performance of :meth:`RangeIndex.min` and :meth:`RangeIndex.max` by using ``RangeIndex`` properties to perform the computations (:issue:`17607`)
.. _whatsnew_0210.docs:
Documentation changes
~~~~~~~~~~~~~~~~~~~~~
- Several ``NaT`` method docstrings (e.g. :func:`NaT.ctime`) were incorrect (:issue:`17327`)
- The documentation has had references to versions < v0.17 removed and cleaned up (:issue:`17442`, :issue:`17404` & :issue:`17504`)
.. _whatsnew_0210.bug_fixes:
Bug fixes
~~~~~~~~~
Conversion
^^^^^^^^^^
- Bug in assignment against datetime-like data with ``int`` may incorrectly convert to datetime-like (:issue:`14145`)
- Bug in assignment against ``int64`` data with ``np.ndarray`` with ``float64`` dtype may keep ``int64`` dtype (:issue:`14001`)
- Fixed the return type of ``IntervalIndex.is_non_overlapping_monotonic`` to be a Python ``bool`` for consistency with similar attributes/methods. Previously returned a ``numpy.bool_``. (:issue:`17237`)
- Bug in ``IntervalIndex.is_non_overlapping_monotonic`` when intervals are closed on both sides and overlap at a point (:issue:`16560`)
- Bug in :func:`Series.fillna` returns frame when ``inplace=True`` and ``value`` is dict (:issue:`16156`)
- Bug in :attr:`Timestamp.weekday_name` returning a UTC-based weekday name when localized to a timezone (:issue:`17354`)
- Bug in ``Timestamp.replace`` when replacing ``tzinfo`` around DST changes (:issue:`15683`)
- Bug in ``Timedelta`` construction and arithmetic that would not propagate the ``Overflow`` exception (:issue:`17367`)
- Bug in :meth:`~DataFrame.astype` converting to object dtype when passed extension type classes (``DatetimeTZDtype``, ``CategoricalDtype``) rather than instances. Now a ``TypeError`` is raised when a class is passed (:issue:`17780`).
- Bug in :meth:`to_numeric` in which elements were not always being coerced to numeric when ``errors='coerce'`` (:issue:`17007`, :issue:`17125`)
- Bug in ``DataFrame`` and ``Series`` constructors where ``range`` objects are converted to ``int32`` dtype on Windows instead of ``int64`` (:issue:`16804`)
Indexing
^^^^^^^^
- When called with a null slice (e.g. ``df.iloc[:]``), the ``.iloc`` and ``.loc`` indexers return a shallow copy of the original object. Previously they returned the original object. (:issue:`13873`).
- When called on an unsorted ``MultiIndex``, the ``loc`` indexer now will raise ``UnsortedIndexError`` only if proper slicing is used on non-sorted levels (:issue:`16734`).
- Fixes regression in 0.20.3 when indexing with a string on a ``TimedeltaIndex`` (:issue:`16896`).
- Fixed :func:`TimedeltaIndex.get_loc` handling of ``np.timedelta64`` inputs (:issue:`16909`).
- Fix :func:`MultiIndex.sort_index` ordering when ``ascending`` argument is a list, but not all levels are specified, or are in a different order (:issue:`16934`).
- Fixes bug where indexing with ``np.inf`` caused an ``OverflowError`` to be raised (:issue:`16957`)
- Bug in reindexing on an empty ``CategoricalIndex`` (:issue:`16770`)
- Fixes ``DataFrame.loc`` for setting with alignment and tz-aware ``DatetimeIndex`` (:issue:`16889`)
- Avoids ``IndexError`` when passing an Index or Series to ``.iloc`` with older numpy (:issue:`17193`)
- Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)
- Bug in ``.iloc`` when used with inplace addition or assignment and an int indexer on a ``MultiIndex`` causing the wrong indexes to be read from and written to (:issue:`17148`)
- Bug in ``.isin()`` in which checking membership in empty ``Series`` objects raised an error (:issue:`16991`)
- Bug in ``CategoricalIndex`` reindexing in which specified indices containing duplicates were not being respected (:issue:`17323`)
- Bug in intersection of ``RangeIndex`` with negative step (:issue:`17296`)
- Bug in ``IntervalIndex`` where performing a scalar lookup fails for included right endpoints of non-overlapping monotonic decreasing indexes (:issue:`16417`, :issue:`17271`)
- Bug in :meth:`DataFrame.first_valid_index` and :meth:`DataFrame.last_valid_index` when no valid entry (:issue:`17400`)
- Bug in :func:`Series.rename` when called with a callable, incorrectly alters the name of the ``Series``, rather than the name of the ``Index``. (:issue:`17407`)
- Bug in :meth:`Series.str.get` raises ``IndexError`` instead of inserting NaNs when using a negative index. (:issue:`17704`)
IO
^^
- Bug in :func:`read_hdf` when reading a timezone aware index from ``fixed`` format HDFStore (:issue:`17618`)
- Bug in :func:`read_csv` in which columns were not being thoroughly de-duplicated (:issue:`17060`)
- Bug in :func:`read_csv` in which specified column names were not being thoroughly de-duplicated (:issue:`17095`)
- Bug in :func:`read_csv` in which non integer values for the header argument generated an unhelpful / unrelated error message (:issue:`16338`)
- Bug in :func:`read_csv` in which memory management issues in exception handling, under certain conditions, would cause the interpreter to segfault (:issue:`14696`, :issue:`16798`).
- Bug in :func:`read_csv` when called with ``low_memory=False`` in which a CSV with at least one column > 2GB in size would incorrectly raise a ``MemoryError`` (:issue:`16798`).
- Bug in :func:`read_csv` when called with a single-element list ``header`` would return a ``DataFrame`` of all NaN values (:issue:`7757`)
- Bug in :meth:`DataFrame.to_csv` defaulting to 'ascii' encoding in Python 3, instead of 'utf-8' (:issue:`17097`)
- Bug in :func:`read_stata` where value labels could not be read when using an iterator (:issue:`16923`)
- Bug in :func:`read_stata` where the index was not set (:issue:`16342`)
- Bug in :func:`read_html` where import check fails when run in multiple threads (:issue:`16928`)
- Bug in :func:`read_csv` where automatic delimiter detection caused a ``TypeError`` to be thrown when a bad line was encountered rather than the correct error message (:issue:`13374`)
- Bug in :meth:`DataFrame.to_html` with ``notebook=True`` where DataFrames with named indices or non-MultiIndex indices had undesired horizontal or vertical alignment for column or row labels, respectively (:issue:`16792`)
- Bug in :meth:`DataFrame.to_html` in which there was no validation of the ``justify`` parameter (:issue:`17527`)
- Bug in :func:`HDFStore.select` when reading a contiguous mixed-data table featuring VLArray (:issue:`17021`)
- Bug in :func:`to_json` where several conditions (including objects with unprintable symbols, objects with deep recursion, overlong labels) caused segfaults instead of raising the appropriate exception (:issue:`14256`)
Plotting
^^^^^^^^
- Bug in plotting methods using ``secondary_y`` and ``fontsize`` not setting secondary axis font size (:issue:`12565`)
- Bug when plotting ``timedelta`` and ``datetime`` dtypes on y-axis (:issue:`16953`)
- Line plots no longer assume monotonic x data when calculating xlims; they now show the entire line even for unsorted x data. (:issue:`11310`, :issue:`11471`)
- With matplotlib 2.0.0 and above, calculation of x limits for line plots is left to matplotlib, so that its new default settings are applied. (:issue:`15495`)
- Bug in ``Series.plot.bar`` or ``DataFrame.plot.bar`` with ``y`` not respecting user-passed ``color`` (:issue:`16822`)
- Bug causing ``plotting.parallel_coordinates`` to reset the random seed when using random colors (:issue:`17525`)
GroupBy/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in ``DataFrame.resample(...).size()`` where an empty ``DataFrame`` did not return a ``Series`` (:issue:`14962`)
- Bug in :func:`infer_freq` causing indices with 2-day gaps during the working week to be wrongly inferred as business daily (:issue:`16624`)
- Bug in ``.rolling(...).quantile()`` which incorrectly used different defaults than :func:`Series.quantile()` and :func:`DataFrame.quantile()` (:issue:`9413`, :issue:`16211`)
- Bug in ``groupby.transform()`` that would coerce boolean dtypes back to float (:issue:`16875`)
- Bug in ``Series.resample(...).apply()`` where an empty ``Series`` modified the source index and did not return the name of a ``Series`` (:issue:`14313`)
- Bug in ``.rolling(...).apply(...)`` with a ``DataFrame`` with a ``DatetimeIndex``, a ``window`` of a timedelta-convertible and ``min_periods >= 1`` (:issue:`15305`)
- Bug in ``DataFrame.groupby`` where index and column keys were not recognized correctly when the number of keys equaled the number of elements on the groupby axis (:issue:`16859`)
- Bug in ``groupby.nunique()`` with ``TimeGrouper`` which cannot handle ``NaT`` correctly (:issue:`17575`)
- Bug in ``DataFrame.groupby`` where a single level selection from a ``MultiIndex`` unexpectedly sorts (:issue:`17537`)
- Bug in ``DataFrame.groupby`` where spurious warning is raised when ``Grouper`` object is used to override ambiguous column name (:issue:`17383`)
- Bug in ``TimeGrouper`` differs when passed as a list and as a scalar (:issue:`17530`)
Sparse
^^^^^^
- Bug in ``SparseSeries`` raises ``AttributeError`` when a dictionary is passed in as data (:issue:`16905`)
- Bug in :func:`SparseDataFrame.fillna` not filling all NaNs when frame was instantiated from SciPy sparse matrix (:issue:`16112`)
- Bug in :func:`SparseSeries.unstack` and :func:`SparseDataFrame.stack` (:issue:`16614`, :issue:`15045`)
- Bug in :func:`make_sparse` treating numeric/boolean data with the same bit representation as identical when the array ``dtype`` is ``object`` (:issue:`17574`)
- :func:`SparseArray.all` and :func:`SparseArray.any` are now implemented to handle ``SparseArray``, these were used but not implemented (:issue:`17570`)
Reshaping
^^^^^^^^^
- Joining/Merging with a non unique ``PeriodIndex`` raised a ``TypeError`` (:issue:`16871`)
- Bug in :func:`crosstab` where non-aligned series of integers were cast to float (:issue:`17005`)
- Bug in merging with categorical dtypes with datetimelikes incorrectly raised a ``TypeError`` (:issue:`16900`)
- Bug when using :func:`isin` on a large object series and large comparison array (:issue:`16012`)
- Fixes regression from 0.20, :func:`Series.aggregate` and :func:`DataFrame.aggregate` allow dictionaries as return values again (:issue:`16741`)
- Fixes dtype of result with integer dtype input, from :func:`pivot_table` when called with ``margins=True`` (:issue:`17013`)
- Bug in :func:`crosstab` where passing two ``Series`` with the same name raised a ``KeyError`` (:issue:`13279`)
- :func:`Series.argmin`, :func:`Series.argmax`, and their counterparts on ``DataFrame`` and groupby objects work correctly with floating point data that contains infinite values (:issue:`13595`).
- Bug in :func:`unique` where checking a tuple of strings raised a ``TypeError`` (:issue:`17108`)
- Bug in :func:`concat` where order of result index was unpredictable if it contained non-comparable elements (:issue:`17344`)
- Fixes regression when sorting by multiple columns on a ``datetime64`` dtype ``Series`` with ``NaT`` values (:issue:`16836`)
- Bug in :func:`pivot_table` where the result's columns did not preserve the categorical dtype of ``columns`` when ``dropna`` was ``False`` (:issue:`17842`)
- Bug in ``DataFrame.drop_duplicates`` where dropping with non-unique column names raised a ``ValueError`` (:issue:`17836`)
- Bug in :func:`unstack` which, when called on a list of levels, would discard the ``fillna`` argument (:issue:`13971`)
- Bug in the alignment of ``range`` objects and other list-likes with ``DataFrame`` leading to operations being performed row-wise instead of column-wise (:issue:`17901`)
Numeric
^^^^^^^
- Bug in ``.clip()`` with ``axis=1`` and a list-like for ``threshold`` is passed; previously this raised ``ValueError`` (:issue:`15390`)
- :func:`Series.clip()` and :func:`DataFrame.clip()` now treat NA values for upper and lower arguments as ``None`` instead of raising ``ValueError`` (:issue:`17276`).
Categorical
^^^^^^^^^^^
- Bug in :func:`Series.isin` when called with a categorical (:issue:`16639`)
- Bug in the categorical constructor with empty values and categories causing the ``.categories`` to be an empty ``Float64Index`` rather than an empty ``Index`` with object dtype (:issue:`17248`)
- Bug in categorical operations with :ref:`Series.cat ` not preserving the original Series' name (:issue:`17509`)
- Bug in :func:`DataFrame.merge` failing for categorical columns with boolean/int data types (:issue:`17187`)
- Bug in constructing a ``Categorical``/``CategoricalDtype`` when the specified ``categories`` are of categorical type (:issue:`17884`).
.. _whatsnew_0210.pypy:
PyPy
^^^^
- Compatibility with PyPy in :func:`read_csv` with ``usecols=[]`` and
:func:`read_json` (:issue:`17351`)
- Split tests into cases for CPython and PyPy where needed, which highlights the fragility
of index matching with ``float('nan')``, ``np.nan`` and ``NaT`` (:issue:`17351`)
- Fix :func:`DataFrame.memory_usage` to support PyPy. Objects on PyPy do not have a fixed size,
so an approximation is used instead (:issue:`17228`)
Other
^^^^^
- Bug where some inplace operators were not being wrapped and produced a copy when invoked (:issue:`12962`)
- Bug in :func:`eval` where the ``inplace`` parameter was being incorrectly handled (:issue:`16732`)
.. _whatsnew_0.21.0.contributors:
Contributors
~~~~~~~~~~~~
.. contributors:: v0.20.3..v0.21.0