.. _whatsnew_0150:

Version 0.15.0 (October 18, 2014)
---------------------------------

{{ header }}

This is a major release from 0.14.1 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes.
We recommend that all users upgrade to this version.

.. warning::

   pandas >= 0.15.0 will no longer support compatibility with NumPy versions <
   1.7.0. If you want to use the latest versions of pandas, please upgrade to
   NumPy >= 1.7.0 (:issue:`7711`)

Highlights include:

- The ``Categorical`` type was integrated as a first-class pandas type, see :ref:`here `
- New scalar type ``Timedelta``, and a new index type ``TimedeltaIndex``, see :ref:`here `
- New datetimelike properties accessor ``.dt`` for Series, see :ref:`Datetimelike Properties `
- New DataFrame default display for ``df.info()`` to include memory usage, see :ref:`Memory Usage `
- ``read_csv`` will now by default ignore blank lines when parsing, see :ref:`here `
- API change in using Indexes in set operations, see :ref:`here `
- Enhancements in the handling of timezones, see :ref:`here `
- A lot of improvements to the rolling and expanding moment functions, see :ref:`here `
- Internal refactoring of the ``Index`` class to no longer sub-class ``ndarray``, see :ref:`Internal Refactoring `
- Dropping support for ``PyTables`` less than version 3.0.0, and ``numexpr`` less than version 2.1 (:issue:`7990`)
- Split indexing documentation into :ref:`Indexing and Selecting Data ` and :ref:`MultiIndex / Advanced Indexing `
- Split out string methods documentation into :ref:`Working with Text Data `
- Check the :ref:`API Changes ` and :ref:`deprecations ` before updating
- :ref:`Other Enhancements `
- :ref:`Performance Improvements `
- :ref:`Bug Fixes `

.. warning::

   In 0.15.0 ``Index`` has internally been refactored to no longer sub-class ``ndarray``
   but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. This change allows very easy
   sub-classing and creation of new index types. This should be a transparent change
   with only very limited API implications (See the :ref:`Internal Refactoring `)

.. warning::

   The refactoring in :class:`~pandas.Categorical` changed the two argument
   constructor from "codes/labels and levels" to "values and levels (now called
   'categories')". This can lead to subtle bugs. If you use
   :class:`~pandas.Categorical` directly, please audit your code before updating to
   this pandas version and change it to use the
   :meth:`~pandas.Categorical.from_codes` constructor. See more on ``Categorical``
   :ref:`here `

New features
~~~~~~~~~~~~

.. _whatsnew_0150.cat:

Categoricals in Series/DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`~pandas.Categorical` can now be included in ``Series`` and ``DataFrames`` and gained new
methods to manipulate. Thanks to Jan Schulz for much of this API/implementation.
(:issue:`3943`, :issue:`5313`, :issue:`5314`, :issue:`7444`, :issue:`7839`, :issue:`7848`,
:issue:`7864`, :issue:`7914`, :issue:`7768`, :issue:`8006`, :issue:`3678`, :issue:`8075`,
:issue:`8076`, :issue:`8143`, :issue:`8453`, :issue:`8518`).

For full docs, see the :ref:`categorical introduction ` and the
:ref:`API documentation `.

.. ipython:: python

    df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                       "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})

    df["grade"] = df["raw_grade"].astype("category")
    df["grade"]

    # Rename the categories
    df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

    # Reorder the categories and simultaneously add the missing categories
    df["grade"] = df["grade"].cat.set_categories(["very bad", "bad",
                                                  "medium", "good", "very good"])
    df["grade"]
    df.sort_values("grade")
    df.groupby("grade", observed=False).size()

- ``pandas.core.group_agg`` and ``pandas.core.factor_agg`` were removed. As an alternative, construct a dataframe and use ``df.groupby().agg()``.
- Supplying "codes/labels and levels" to the :class:`~pandas.Categorical` constructor is not supported anymore. Supplying two arguments to the constructor is now interpreted as "values and levels (now called 'categories')". Please change your code to use the :meth:`~pandas.Categorical.from_codes` constructor.
- The ``Categorical.labels`` attribute was renamed to ``Categorical.codes`` and is read only. If you want to manipulate codes, please use one of the :ref:`API methods on Categoricals `.
- The ``Categorical.levels`` attribute is renamed to ``Categorical.categories``.

.. _whatsnew_0150.timedeltaindex:

TimedeltaIndex/scalar
^^^^^^^^^^^^^^^^^^^^^

We introduce a new scalar type ``Timedelta``, which is a subclass of ``datetime.timedelta``, and behaves in a similar manner,
but allows compatibility with ``np.timedelta64`` types as well as a host of custom representation, parsing, and attributes.
This type is very similar to how ``Timestamp`` works for ``datetimes``, providing a convenient API box for the type. See the :ref:`docs `.
(:issue:`3009`, :issue:`4533`, :issue:`8209`, :issue:`8187`, :issue:`8190`, :issue:`7869`, :issue:`7661`, :issue:`8345`, :issue:`8471`)

.. warning::

   ``Timedelta`` scalars (and ``TimedeltaIndex``) component fields are *not the same* as the component
   fields on a ``datetime.timedelta`` object. For example, ``.seconds`` on a ``datetime.timedelta``
   object returns the total number of seconds combined between ``hours``, ``minutes`` and ``seconds``.
   In contrast, the pandas ``Timedelta`` breaks out hours, minutes, microseconds and nanoseconds separately.

   .. code-block:: ipython

      # Timedelta accessor
      In [9]: tds = pd.Timedelta('31 days 5 min 3 sec')

      In [10]: tds.minutes
      Out[10]: 5L

      In [11]: tds.seconds
      Out[11]: 3L

      # datetime.timedelta accessor
      # this is 5 minutes * 60 + 3 seconds
      In [12]: tds.to_pytimedelta().seconds
      Out[12]: 303

   **Note**: this is no longer true starting from v0.16.0, where full compatibility
   with ``datetime.timedelta`` is introduced. See the :ref:`0.16.0 whatsnew entry `

.. warning::

   Prior to 0.15.0 ``pd.to_timedelta`` would return a ``Series`` for list-like/Series input,
   and a ``np.timedelta64`` for scalar input. It will now return a ``TimedeltaIndex`` for
   list-like input, ``Series`` for Series input, and ``Timedelta`` for scalar input.

   The arguments to ``pd.to_timedelta`` are now ``(arg, unit='ns', box=True, coerce=False)``;
   previously they were ``(arg, box=True, unit='ns')``. The new order is more logical.

Construct a scalar

.. ipython:: python

    pd.Timedelta('1 days 06:05:01.00003')
    pd.Timedelta('15.5us')
    pd.Timedelta('1 hour 15.5us')

    # negative Timedeltas have this string repr
    # to be more consistent with datetime.timedelta conventions
    pd.Timedelta('-1us')

    # a NaT
    pd.Timedelta('nan')
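As a quick illustration of the new return types described in the warning above, here is a minimal sketch (input values are illustrative):

.. code-block:: python

    import pandas as pd

    # scalar input now yields a Timedelta scalar
    pd.to_timedelta('1 days 06:05:01.00003')

    # list-like input now yields a TimedeltaIndex
    pd.to_timedelta(['1 days', '2 days'])

    # Series input still yields a Series (of dtype timedelta64[ns])
    pd.to_timedelta(pd.Series(['1 days', '2 days']))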
Access fields for a ``Timedelta``

.. ipython:: python

    td = pd.Timedelta('1 hour 3m 15.5us')
    td.seconds
    td.microseconds
    td.nanoseconds

Construct a ``TimedeltaIndex``

.. ipython:: python
    :suppress:

    import datetime

.. ipython:: python

    pd.TimedeltaIndex(['1 days', '1 days, 00:00:05',
                       np.timedelta64(2, 'D'),
                       datetime.timedelta(days=2, seconds=2)])

Constructing a ``TimedeltaIndex`` with a regular range

.. ipython:: python

    pd.timedelta_range('1 days', periods=5, freq='D')

.. code-block:: ipython

    In [20]: pd.timedelta_range(start='1 days', end='2 days', freq='30T')
    Out[20]:
    TimedeltaIndex(['1 days 00:00:00', '1 days 00:30:00', '1 days 01:00:00',
                    '1 days 01:30:00', '1 days 02:00:00', '1 days 02:30:00',
                    '1 days 03:00:00', '1 days 03:30:00', '1 days 04:00:00',
                    '1 days 04:30:00', '1 days 05:00:00', '1 days 05:30:00',
                    '1 days 06:00:00', '1 days 06:30:00', '1 days 07:00:00',
                    '1 days 07:30:00', '1 days 08:00:00', '1 days 08:30:00',
                    '1 days 09:00:00', '1 days 09:30:00', '1 days 10:00:00',
                    '1 days 10:30:00', '1 days 11:00:00', '1 days 11:30:00',
                    '1 days 12:00:00', '1 days 12:30:00', '1 days 13:00:00',
                    '1 days 13:30:00', '1 days 14:00:00', '1 days 14:30:00',
                    '1 days 15:00:00', '1 days 15:30:00', '1 days 16:00:00',
                    '1 days 16:30:00', '1 days 17:00:00', '1 days 17:30:00',
                    '1 days 18:00:00', '1 days 18:30:00', '1 days 19:00:00',
                    '1 days 19:30:00', '1 days 20:00:00', '1 days 20:30:00',
                    '1 days 21:00:00', '1 days 21:30:00', '1 days 22:00:00',
                    '1 days 22:30:00', '1 days 23:00:00', '1 days 23:30:00',
                    '2 days 00:00:00'],
                   dtype='timedelta64[ns]', freq='30T')

You can now use a ``TimedeltaIndex`` as the index of a pandas object

.. ipython:: python

    s = pd.Series(np.arange(5),
                  index=pd.timedelta_range('1 days', periods=5, freq='s'))
    s

You can select with partial string selections

.. ipython:: python

    s['1 day 00:00:02']
    s['1 day':'1 day 00:00:02']

Finally, the combination of ``TimedeltaIndex`` with ``DatetimeIndex`` allows certain combination operations that are ``NaT`` preserving:

.. ipython:: python

    tdi = pd.TimedeltaIndex(['1 days', pd.NaT, '2 days'])
    tdi.tolist()
    dti = pd.date_range('20130101', periods=3)
    dti.tolist()
    (dti + tdi).tolist()
    (dti - tdi).tolist()

- Prior to v0.15.0, iterating over a ``Series`` of ``timedelta64[ns]`` (e.g. ``list(Series(...))``) returned ``np.timedelta64`` objects for each element. These are now wrapped in ``Timedelta``.

.. _whatsnew_0150.memory:

Memory usage
^^^^^^^^^^^^

Implemented methods to find memory usage of a DataFrame. See the :ref:`FAQ ` for more. (:issue:`6852`).

A new display option ``display.memory_usage`` (see :ref:`options`) sets the default behavior of the ``memory_usage`` argument in the ``df.info()`` method. By default ``display.memory_usage`` is ``True``.

.. ipython:: python

    dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]',
              'complex128', 'object', 'bool']
    n = 5000
    data = {t: np.random.randint(100, size=n).astype(t) for t in dtypes}
    df = pd.DataFrame(data)
    df['categorical'] = df['object'].astype('category')

    df.info()

Additionally :meth:`~pandas.DataFrame.memory_usage` is an available method for a dataframe object which returns the memory usage of each column.

.. ipython:: python

    df.memory_usage(index=True)
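The display option can also be toggled globally through the standard options API; a minimal sketch:

.. code-block:: python

    import pandas as pd

    # turn the memory-usage line of df.info() off and back on
    pd.set_option("display.memory_usage", False)
    pd.set_option("display.memory_usage", True)

    # query the current setting
    pd.get_option("display.memory_usage")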
.. _whatsnew_0150.dt:

Series.dt accessor
^^^^^^^^^^^^^^^^^^

``Series`` has gained an accessor to succinctly return datetime-like properties for the *values* of the Series, if it is a datetime/period-like Series. (:issue:`7207`)
This will return a Series, indexed like the existing Series. See the :ref:`docs `

.. ipython:: python

    # datetime
    s = pd.Series(pd.date_range('20130101 09:10:12', periods=4))
    s
    s.dt.hour
    s.dt.second
    s.dt.day
    s.dt.freq

This enables nice expressions like this:

.. ipython:: python

    s[s.dt.day == 2]

You can easily produce tz aware transformations:

.. ipython:: python

    stz = s.dt.tz_localize('US/Eastern')
    stz
    stz.dt.tz

You can also chain these types of operations:

.. ipython:: python

    s.dt.tz_localize('UTC').dt.tz_convert('US/Eastern')

The ``.dt`` accessor works for period and timedelta dtypes.

.. ipython:: python

    # period
    s = pd.Series(pd.period_range('20130101', periods=4, freq='D'))
    s
    s.dt.year
    s.dt.day

.. ipython:: python

    # timedelta
    s = pd.Series(pd.timedelta_range('1 day 00:00:05', periods=4, freq='s'))
    s
    s.dt.days
    s.dt.seconds
    s.dt.components

.. _whatsnew_0150.tz:

Timezone handling improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- ``tz_localize(None)`` for a tz-aware ``Timestamp`` and ``DatetimeIndex`` now removes the timezone while keeping the local time; previously this resulted in ``Exception`` or ``TypeError`` (:issue:`7812`)

  .. code-block:: ipython

      In [58]: ts = pd.Timestamp('2014-08-01 09:00', tz='US/Eastern')

      In [59]: ts
      Out[59]: Timestamp('2014-08-01 09:00:00-0400', tz='US/Eastern')

      In [60]: ts.tz_localize(None)
      Out[60]: Timestamp('2014-08-01 09:00:00')

      In [61]: didx = pd.date_range(start='2014-08-01 09:00', freq='H',
         ....:                      periods=10, tz='US/Eastern')

      In [62]: didx
      Out[62]:
      DatetimeIndex(['2014-08-01 09:00:00-04:00', '2014-08-01 10:00:00-04:00',
                     '2014-08-01 11:00:00-04:00', '2014-08-01 12:00:00-04:00',
                     '2014-08-01 13:00:00-04:00', '2014-08-01 14:00:00-04:00',
                     '2014-08-01 15:00:00-04:00', '2014-08-01 16:00:00-04:00',
                     '2014-08-01 17:00:00-04:00', '2014-08-01 18:00:00-04:00'],
                    dtype='datetime64[ns, US/Eastern]', freq='H')

      In [63]: didx.tz_localize(None)
      Out[63]:
      DatetimeIndex(['2014-08-01 09:00:00', '2014-08-01 10:00:00',
                     '2014-08-01 11:00:00', '2014-08-01 12:00:00',
                     '2014-08-01 13:00:00', '2014-08-01 14:00:00',
                     '2014-08-01 15:00:00', '2014-08-01 16:00:00',
                     '2014-08-01 17:00:00', '2014-08-01 18:00:00'],
                    dtype='datetime64[ns]', freq=None)

- ``tz_localize`` now accepts the ``ambiguous`` keyword, which allows for passing an array of bools indicating whether the date belongs in DST or not, ``'NaT'`` for setting transition times to ``NaT``, ``'infer'`` for inferring DST/non-DST, and ``'raise'`` (default) for an ``AmbiguousTimeError`` to be raised. See :ref:`the docs` for more details (:issue:`7943`). A short sketch follows this list.
- ``DataFrame.tz_localize`` and ``DataFrame.tz_convert`` now accept an optional ``level`` argument for localizing a specific level of a MultiIndex (:issue:`7846`)
- ``Timestamp.tz_localize`` and ``Timestamp.tz_convert`` now raise ``TypeError`` in error cases, rather than ``Exception`` (:issue:`8025`)
- A timeseries/index localized to UTC when inserted into a Series/DataFrame will preserve the UTC timezone (rather than being a naive ``datetime64[ns]``) as ``object`` dtype (:issue:`8411`)
- ``Timestamp.__repr__`` displays ``dateutil.tz.tzoffset`` info (:issue:`7907`)
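A minimal sketch of the ``ambiguous`` keyword (the date and timezone are chosen for illustration: clocks fall back in 'US/Eastern' on 2014-11-02, so 01:30 local time occurs twice):

.. code-block:: python

    import pandas as pd

    naive = pd.DatetimeIndex(['2014-11-02 01:30'])

    # an array of bools: True means treat the time as DST (the first occurrence)
    naive.tz_localize('US/Eastern', ambiguous=[True])

    # or mark ambiguous times as NaT
    naive.tz_localize('US/Eastern', ambiguous='NaT')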
.. _whatsnew_0150.roll:

Rolling/expanding moments improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :func:`rolling_min`, :func:`rolling_max`, :func:`rolling_cov`, and :func:`rolling_corr` now return objects with all ``NaN`` when ``len(arg) < min_periods <= window`` rather than raising. (This makes all rolling functions consistent in this behavior.) (:issue:`7766`)

  Prior to 0.15.0

  .. ipython:: python

      s = pd.Series([10, 11, 12, 13])

  .. code-block:: ipython

      In [15]: pd.rolling_min(s, window=10, min_periods=5)
      ValueError: min_periods (5) must be <= window (4)

  New behavior

  .. code-block:: ipython

      In [4]: pd.rolling_min(s, window=10, min_periods=5)
      Out[4]:
      0   NaN
      1   NaN
      2   NaN
      3   NaN
      dtype: float64

- :func:`rolling_max`, :func:`rolling_min`, :func:`rolling_sum`, :func:`rolling_mean`, :func:`rolling_median`, :func:`rolling_std`, :func:`rolling_var`, :func:`rolling_skew`, :func:`rolling_kurt`, :func:`rolling_quantile`, :func:`rolling_cov`, :func:`rolling_corr`, :func:`rolling_corr_pairwise`, :func:`rolling_window`, and :func:`rolling_apply` with ``center=True`` previously would return a result of the same structure as the input ``arg`` with ``NaN`` in the final ``(window-1)/2`` entries.

  Now the final ``(window-1)/2`` entries of the result are calculated as if the input ``arg`` were followed by ``(window-1)/2`` ``NaN`` values (or with shrinking windows, in the case of :func:`rolling_apply`). (:issue:`7925`, :issue:`8269`)

  Prior behavior (note final value is ``NaN``):

  .. code-block:: ipython

      In [7]: pd.rolling_sum(Series(range(4)), window=3, min_periods=0, center=True)
      Out[7]:
      0     1
      1     3
      2     6
      3   NaN
      dtype: float64

  New behavior (note final value is ``5 = sum([2, 3, NaN])``):

  .. code-block:: ipython

      In [7]: pd.rolling_sum(pd.Series(range(4)), window=3,
         ....:               min_periods=0, center=True)
      Out[7]:
      0    1
      1    3
      2    6
      3    5
      dtype: float64

- :func:`rolling_window` now normalizes the weights properly in rolling mean mode (``mean=True``) so that the calculated weighted means (e.g. 'triang', 'gaussian') are distributed about the same means as those calculated without weighting (i.e. 'boxcar'). See :ref:`the note on normalization ` for further details. (:issue:`7618`)

  .. ipython:: python

      s = pd.Series([10.5, 8.8, 11.4, 9.7, 9.3])

  Behavior prior to 0.15.0:

  .. code-block:: ipython

      In [39]: pd.rolling_window(s, window=3, win_type='triang', center=True)
      Out[39]:
      0         NaN
      1    6.583333
      2    6.883333
      3    6.683333
      4         NaN
      dtype: float64

  New behavior

  .. code-block:: ipython

      In [10]: pd.rolling_window(s, window=3, win_type='triang', center=True)
      Out[10]:
      0        NaN
      1      9.875
      2     10.325
      3     10.025
      4        NaN
      dtype: float64

- Removed ``center`` argument from all :func:`expanding_ ` functions (see :ref:`list `), as the results produced when ``center=True`` did not make much sense. (:issue:`7925`)

- Added optional ``ddof`` argument to :func:`expanding_cov` and :func:`rolling_cov`. The default value of ``1`` is backwards-compatible. (:issue:`8279`)

- Documented the ``ddof`` argument to :func:`expanding_var`, :func:`expanding_std`, :func:`rolling_var`, and :func:`rolling_std`. These functions' support of a ``ddof`` argument (with a default value of ``1``) was previously undocumented. (:issue:`8064`)

- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr` now interpret ``min_periods`` in the same manner that the :func:`rolling_*()` and :func:`expanding_*()` functions do: a given result entry will be ``NaN`` if the (expanding, in this case) window does not contain at least ``min_periods`` values. The previous behavior was to set to ``NaN`` the ``min_periods`` entries starting with the first non-``NaN`` value. (:issue:`7977`)

  Prior behavior (note values start at index ``2``, which is ``min_periods`` after index ``0`` (the index of the first non-empty value)):

  .. ipython:: python

      s = pd.Series([1, None, None, None, 2, 3])
  .. code-block:: ipython

      In [51]: pd.ewma(s, com=3., min_periods=2)
      Out[51]:
      0         NaN
      1         NaN
      2    1.000000
      3    1.000000
      4    1.571429
      5    2.189189
      dtype: float64

  New behavior (note values start at index ``4``, the location of the 2nd (since ``min_periods=2``) non-empty value):

  .. code-block:: ipython

      In [2]: pd.ewma(s, com=3., min_periods=2)
      Out[2]:
      0         NaN
      1         NaN
      2         NaN
      3         NaN
      4    1.759644
      5    2.383784
      dtype: float64

- :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr` now have an optional ``adjust`` argument, just like :func:`ewma` does, affecting how the weights are calculated. The default value of ``adjust`` is ``True``, which is backwards-compatible. See :ref:`Exponentially weighted moment functions ` for details. (:issue:`7911`)

- :func:`ewma`, :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, :func:`ewmcov`, and :func:`ewmcorr` now have an optional ``ignore_na`` argument. When ``ignore_na=False`` (the default), missing values are taken into account in the weights calculation. When ``ignore_na=True`` (which reproduces the pre-0.15.0 behavior), missing values are ignored in the weights calculation. (:issue:`7543`)

  .. code-block:: ipython

      In [7]: pd.ewma(pd.Series([None, 1., 8.]), com=2.)
      Out[7]:
      0    NaN
      1    1.0
      2    5.2
      dtype: float64

      In [8]: pd.ewma(pd.Series([1., None, 8.]), com=2.,
         ....:        ignore_na=True)  # pre-0.15.0 behavior
      Out[8]:
      0    1.0
      1    1.0
      2    5.2
      dtype: float64

      In [9]: pd.ewma(pd.Series([1., None, 8.]), com=2.,
         ....:        ignore_na=False)  # new default
      Out[9]:
      0    1.000000
      1    1.000000
      2    5.846154
      dtype: float64

  .. warning::

     By default (``ignore_na=False``) the :func:`ewm*()` functions' weights calculation
     in the presence of missing values is different than in pre-0.15.0 versions.
     To reproduce the pre-0.15.0 calculation of weights in the presence of missing values,
     one must specify explicitly ``ignore_na=True``.

- Bug in :func:`expanding_cov`, :func:`expanding_corr`, :func:`rolling_cov`, :func:`rolling_corr`, :func:`ewmcov`, and :func:`ewmcorr` returning results with columns sorted by name and producing an error for non-unique columns; now handles non-unique columns and returns columns in original order (except for the case of two DataFrames with ``pairwise=False``, where behavior is unchanged) (:issue:`7542`)
- Bug in :func:`rolling_count` and :func:`expanding_*()` functions unnecessarily producing an error message for zero-length data (:issue:`8056`)
- Bug in :func:`rolling_apply` and :func:`expanding_apply` interpreting ``min_periods=0`` as ``min_periods=1`` (:issue:`8080`)
- Bug in :func:`expanding_std` and :func:`expanding_var` for a single value producing a confusing error message (:issue:`7900`)
- Bug in :func:`rolling_std` and :func:`rolling_var` for a single value producing ``0`` rather than ``NaN`` (:issue:`7900`)
- Bug in :func:`ewmstd`, :func:`ewmvol`, :func:`ewmvar`, and :func:`ewmcov` calculation of de-biasing factors when ``bias=False`` (the default). Previously an incorrect constant factor was used, based on ``adjust=True``, ``ignore_na=True``, and an infinite number of observations. Now a different factor is used for each entry, based on the actual weights (analogous to the usual ``N/(N-1)`` factor). In particular, for a single point a value of ``NaN`` is returned when ``bias=False``, whereas previously a value of (approximately) ``0`` was returned.

  For example, consider the following pre-0.15.0 results for ``ewmvar(..., bias=False)``, and the corresponding debiasing factors:

  .. ipython:: python

      s = pd.Series([1., 2., 0., 4.])
  .. code-block:: ipython

      In [89]: pd.ewmvar(s, com=2., bias=False)
      Out[89]:
      0   -2.775558e-16
      1    3.000000e-01
      2    9.556787e-01
      3    3.585799e+00
      dtype: float64

      In [90]: pd.ewmvar(s, com=2., bias=False) / pd.ewmvar(s, com=2., bias=True)
      Out[90]:
      0    1.25
      1    1.25
      2    1.25
      3    1.25
      dtype: float64

  Note that entry ``0`` is approximately 0, and the debiasing factors are a constant 1.25. By comparison, the following 0.15.0 results have a ``NaN`` for entry ``0``, and the debiasing factors are decreasing (towards 1.25):

  .. code-block:: ipython

      In [14]: pd.ewmvar(s, com=2., bias=False)
      Out[14]:
      0         NaN
      1    0.500000
      2    1.210526
      3    4.089069
      dtype: float64

      In [15]: pd.ewmvar(s, com=2., bias=False) / pd.ewmvar(s, com=2., bias=True)
      Out[15]:
      0         NaN
      1    2.083333
      2    1.583333
      3    1.425439
      dtype: float64

  See :ref:`Exponentially weighted moment functions ` for details. (:issue:`7912`)

.. _whatsnew_0150.sql:

Improvements in the SQL IO module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Added support for a ``chunksize`` parameter to the ``to_sql`` function. This allows a DataFrame to be written in chunks, avoiding packet-size overflow errors (:issue:`8062`).
- Added support for a ``chunksize`` parameter to the ``read_sql`` function. Specifying this argument will return an iterator through chunks of the query result (:issue:`2908`).
- Added support for writing ``datetime.date`` and ``datetime.time`` object columns with ``to_sql`` (:issue:`6932`).
- Added support for specifying a ``schema`` to read from/write to with ``read_sql_table`` and ``to_sql`` (:issue:`7441`, :issue:`7952`). For example:

  .. code-block:: python

      df.to_sql('table', engine, schema='other_schema')  # noqa F821
      pd.read_sql_table('table', engine, schema='other_schema')  # noqa F821

- Added support for writing ``NaN`` values with ``to_sql`` (:issue:`2754`).
- Added support for writing datetime64 columns with ``to_sql`` for all database flavors (:issue:`7103`).
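A minimal sketch of the two new ``chunksize`` parameters, assuming an in-memory SQLite engine via SQLAlchemy (the table name ``'demo'`` is illustrative):

.. code-block:: python

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine('sqlite:///:memory:')
    df = pd.DataFrame({'a': range(10000)})

    # write in batches of 1000 rows to stay under packet-size limits
    df.to_sql('demo', engine, chunksize=1000, index=False)

    # read back lazily: an iterator yielding DataFrames of up to 1000 rows
    for chunk in pd.read_sql('SELECT * FROM demo', engine, chunksize=1000):
        print(len(chunk))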
.. _whatsnew_0150.api:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0150.api_breaking:

Breaking changes
^^^^^^^^^^^^^^^^

API changes related to ``Categorical`` (see :ref:`here ` for more details):

- The ``Categorical`` constructor with two arguments changed from "codes/labels and levels" to "values and levels (now called 'categories')". This can lead to subtle bugs. If you use :class:`~pandas.Categorical` directly, please audit your code by changing it to use the :meth:`~pandas.Categorical.from_codes` constructor.

  An old function call like (prior to 0.15.0):

  .. code-block:: python

      pd.Categorical([0, 1, 0, 2, 1], levels=['a', 'b', 'c'])

  will have to be adapted to the following to keep the same behaviour:

  .. code-block:: ipython

      In [2]: pd.Categorical.from_codes([0, 1, 0, 2, 1], categories=['a', 'b', 'c'])
      Out[2]:
      [a, b, a, c, b]
      Categories (3, object): [a, b, c]

API changes related to the introduction of the ``Timedelta`` scalar (see :ref:`above ` for more details):

- Prior to 0.15.0 :func:`to_timedelta` would return a ``Series`` for list-like/Series input, and a ``np.timedelta64`` for scalar input. It will now return a ``TimedeltaIndex`` for list-like input, ``Series`` for Series input, and ``Timedelta`` for scalar input.

For API changes related to the rolling and expanding functions, see the detailed overview :ref:`above `.

Other notable API changes:

- Consistency when indexing with ``.loc`` and a list-like indexer when no values are found.

  .. ipython:: python

      df = pd.DataFrame([['a'], ['b']], index=[1, 2])
      df

  In prior versions there was a difference in these two constructs:

  - ``df.loc[[3]]`` would return a frame reindexed by 3 (with all ``np.nan`` values)
  - ``df.loc[[3], :]`` would raise ``KeyError``.

  Both will now raise a ``KeyError``. The rule is that *at least 1* indexer must be found when using a list-like and ``.loc`` (:issue:`7999`)

  Furthermore in prior versions these were also different:

  - ``df.loc[[1, 3]]`` would return a frame reindexed by [1, 3]
  - ``df.loc[[1, 3], :]`` would raise ``KeyError``.

  Both will now return a frame reindexed by [1, 3]. E.g.

  .. code-block:: ipython

      In [3]: df.loc[[1, 3]]
      Out[3]:
           0
      1    a
      3  NaN

      In [4]: df.loc[[1, 3], :]
      Out[4]:
           0
      1    a
      3  NaN

  This can also be seen in multi-axis indexing with a ``Panel``.

  .. code-block:: python

      >>> p = pd.Panel(np.arange(2 * 3 * 4).reshape(2, 3, 4),
      ...              items=['ItemA', 'ItemB'],
      ...              major_axis=[1, 2, 3],
      ...              minor_axis=['A', 'B', 'C', 'D'])
      >>> p
      Dimensions: 2 (items) x 3 (major_axis) x 4 (minor_axis)
      Items axis: ItemA to ItemB
      Major_axis axis: 1 to 3
      Minor_axis axis: A to D

  The following would raise ``KeyError`` prior to 0.15.0:

  .. code-block:: ipython

      In [5]: p.loc[['ItemA', 'ItemD'], :, 'D']
      Out[5]:
         ItemA  ItemD
      1      3    NaN
      2      7    NaN
      3     11    NaN

  Furthermore, ``.loc`` will raise if no values are found in a MultiIndex with a list-like indexer:

  .. ipython:: python
      :okexcept:

      s = pd.Series(np.arange(3, dtype='int64'),
                    index=pd.MultiIndex.from_product([['A'], ['foo', 'bar', 'baz']],
                                                     names=['one', 'two'])
                    ).sort_index()
      s

      try:
          s.loc[['D']]
      except KeyError as e:
          print("KeyError: " + str(e))

- Assigning values to ``None`` now considers the dtype when choosing an 'empty' value (:issue:`7941`).

  Previously, assigning to ``None`` in numeric containers changed the dtype to object (or errored, depending on the call). It now uses ``NaN``:

  .. ipython:: python

      s = pd.Series([1., 2., 3.])
      s.loc[0] = None
      s

  ``NaT`` is now used similarly for datetime containers.

  For object containers, we now preserve ``None`` values (previously these were converted to ``NaN`` values).

  .. ipython:: python

      s = pd.Series(["a", "b", "c"])
      s.loc[0] = None
      s

  To insert a ``NaN``, you must explicitly use ``np.nan``. See the :ref:`docs `.

- In prior versions, updating a pandas object inplace would not reflect in other python references to this object. (:issue:`8511`, :issue:`5104`)

  .. ipython:: python

      s = pd.Series([1, 2, 3])
      s2 = s
      s += 1.5

  Behavior prior to v0.15.0

  .. code-block:: ipython

      # the original object
      In [5]: s
      Out[5]:
      0    2.5
      1    3.5
      2    4.5
      dtype: float64

      # a reference to the original object
      In [7]: s2
      Out[7]:
      0    1
      1    2
      2    3
      dtype: int64

  This is now the correct behavior

  .. ipython:: python

      # the original object
      s

      # a reference to the original object
      s2

.. _whatsnew_0150.blanklines:

- Made both the C-based and Python engines for ``read_csv`` and ``read_table`` ignore empty lines in input as well as whitespace-filled lines, as long as ``sep`` is not whitespace. This is an API change that can be controlled by the keyword parameter ``skip_blank_lines``. See :ref:`the docs ` (:issue:`4466`)
- A timeseries/index localized to UTC when inserted into a Series/DataFrame will preserve the UTC timezone and be inserted as ``object`` dtype rather than being converted to a naive ``datetime64[ns]`` (:issue:`8411`).
- Bug in passing a ``DatetimeIndex`` with a timezone that was not being retained in DataFrame construction from a dict (:issue:`7822`)

  In prior versions this would drop the timezone, now it retains the timezone, but gives a column of ``object`` dtype:
  .. ipython:: python

      i = pd.date_range('1/1/2011', periods=3, freq='10s', tz='US/Eastern')
      i
      df = pd.DataFrame({'a': i})
      df
      df.dtypes

  Previously this would have yielded a column of ``datetime64`` dtype, but without timezone info.

  The behaviour of assigning a column to an existing dataframe as ``df['a'] = i`` remains unchanged (this already returned an ``object`` column with a timezone).

- When passing multiple levels to :meth:`~pandas.DataFrame.stack()`, it will now raise a ``ValueError`` when the levels aren't all level names or all level numbers (:issue:`7660`). See :ref:`Reshaping by stacking and unstacking `.
- Raise a ``ValueError`` in ``df.to_hdf`` with 'fixed' format, if ``df`` has non-unique columns, as the resulting file will be broken (:issue:`7761`)
- ``SettingWithCopy`` raise/warnings (according to the option ``mode.chained_assignment``) will now be issued when setting a value on a sliced mixed-dtype DataFrame using chained-assignment. (:issue:`7845`, :issue:`7950`)

  .. code-block:: ipython

      In [1]: df = pd.DataFrame(np.arange(0, 9), columns=['count'])

      In [2]: df['group'] = 'b'

      In [3]: df.iloc[0:5]['group'] = 'a'
      /usr/local/bin/ipython:1: SettingWithCopyWarning:
      A value is trying to be set on a copy of a slice from a DataFrame.
      Try using .loc[row_indexer,col_indexer] = value instead

      See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

- ``merge``, ``DataFrame.merge``, and ``ordered_merge`` now return the same type as the ``left`` argument (:issue:`7737`).
- Previously an enlargement with a mixed-dtype frame would act unlike ``.append``, which preserves dtypes (related :issue:`2578`, :issue:`8176`):

  .. ipython:: python

      df = pd.DataFrame([[True, 1], [False, 2]],
                        columns=["female", "fitness"])
      df
      df.dtypes

      # dtypes are now preserved
      df.loc[2] = df.loc[1]
      df
      df.dtypes

- ``Series.to_csv()`` now returns a string when ``path=None``, matching the behaviour of ``DataFrame.to_csv()`` (:issue:`8215`).
- ``read_hdf`` now raises ``IOError`` when a file that doesn't exist is passed in. Previously, a new, empty file was created, and a ``KeyError`` raised (:issue:`7715`).
- ``DataFrame.info()`` now ends its output with a newline character (:issue:`8114`)
- Concatenating no objects will now raise a ``ValueError`` rather than a bare ``Exception``.
- Merge errors will now be sub-classes of ``ValueError`` rather than raw ``Exception`` (:issue:`8501`)
- ``DataFrame.plot`` and ``Series.plot`` keywords now have consistent orders (:issue:`8037`)

.. _whatsnew_0150.refactoring:

Internal refactoring
^^^^^^^^^^^^^^^^^^^^

In 0.15.0 ``Index`` has internally been refactored to no longer sub-class ``ndarray``
but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. This change allows very easy
sub-classing and creation of new index types. This should be a transparent change
with only very limited API implications (:issue:`5080`, :issue:`7439`, :issue:`7796`, :issue:`8024`, :issue:`8367`, :issue:`7997`, :issue:`8522`):

- you may need to unpickle pandas version < 0.15.0 pickles using ``pd.read_pickle`` rather than ``pickle.load``. See :ref:`pickle docs `
- when plotting with a ``PeriodIndex``, the matplotlib internal axes will now be arrays of ``Period`` rather than a ``PeriodIndex`` (this is similar to how a ``DatetimeIndex`` passes arrays of ``datetimes`` now)
- MultiIndexes will now raise similarly to other pandas objects w.r.t. truth testing, see :ref:`here ` (:issue:`7897`).
- When plotting a DatetimeIndex directly with matplotlib's ``plot`` function, the axis labels will no longer be formatted as dates but as integers (the internal representation of a ``datetime64``). **UPDATE** This is fixed in 0.15.1, see :ref:`here `.
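The core of the change can be seen directly; a minimal sketch (exact internals are version-dependent):

.. code-block:: python

    import numpy as np
    import pandas as pd

    idx = pd.Index([1, 2, 3])

    # an Index is no longer an ndarray subclass on >= 0.15.0 ...
    isinstance(idx, np.ndarray)  # False

    # ... but it still converts cheaply when an array is needed
    np.asarray(idx)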
.. _whatsnew_0150.deprecations:

Deprecations
^^^^^^^^^^^^

- The ``Categorical`` ``labels`` and ``levels`` attributes are deprecated and renamed to ``codes`` and ``categories``.
- The ``outtype`` argument to ``pd.DataFrame.to_dict`` has been deprecated in favor of ``orient``. (:issue:`7840`)
- The ``convert_dummies`` method has been deprecated in favor of ``get_dummies`` (:issue:`8140`)
- The ``infer_dst`` argument in ``tz_localize`` will be deprecated in favor of ``ambiguous`` to allow for more flexibility in dealing with DST transitions. Replace ``infer_dst=True`` with ``ambiguous='infer'`` for the same behavior (:issue:`7943`). See :ref:`the docs` for more details.
- The top-level ``pd.value_range`` has been deprecated and can be replaced by ``.describe()`` (:issue:`8481`)

.. _whatsnew_0150.index_set_ops:

- The ``Index`` set operations ``+`` and ``-`` were deprecated in order to provide these for numeric type operations on certain index types. ``+`` can be replaced by ``.union()`` or ``|``, and ``-`` by ``.difference()``. Further the method name ``Index.diff()`` is deprecated and can be replaced by ``Index.difference()`` (:issue:`8226`)

  .. code-block:: python

      # +
      pd.Index(['a', 'b', 'c']) + pd.Index(['b', 'c', 'd'])

      # should be replaced by
      pd.Index(['a', 'b', 'c']).union(pd.Index(['b', 'c', 'd']))

  .. code-block:: python

      # -
      pd.Index(['a', 'b', 'c']) - pd.Index(['b', 'c', 'd'])

      # should be replaced by
      pd.Index(['a', 'b', 'c']).difference(pd.Index(['b', 'c', 'd']))

- The ``infer_types`` argument to :func:`~pandas.read_html` now has no effect and is deprecated (:issue:`7762`, :issue:`7032`).

.. _whatsnew_0150.prior_deprecations:

Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Remove ``DataFrame.delevel`` method in favor of ``DataFrame.reset_index``

.. _whatsnew_0150.enhancements:

Enhancements
~~~~~~~~~~~~

Enhancements in the importing/exporting of Stata files:

- Added support for bool, uint8, uint16 and uint32 data types in ``to_stata`` (:issue:`7097`, :issue:`7365`)
- Added conversion option when importing Stata files (:issue:`8527`)
- ``DataFrame.to_stata`` and ``StataWriter`` check string length for compatibility with limitations imposed in dta files, where fixed-width strings must contain 244 or fewer characters. Attempting to write Stata dta files with strings longer than 244 characters raises a ``ValueError``. (:issue:`7858`)
- ``read_stata`` and ``StataReader`` can import missing data information into a ``DataFrame`` by setting the argument ``convert_missing`` to ``True``. When using this option, missing values are returned as ``StataMissingValue`` objects and columns containing missing values have ``object`` data type. (:issue:`8045`)

Enhancements in the plotting functions:

- Added ``layout`` keyword to ``DataFrame.plot``. You can pass a tuple of ``(rows, columns)``, one of which can be ``-1`` to automatically infer (:issue:`6667`, :issue:`8071`). A short sketch follows this list.
- Allow passing multiple axes to ``DataFrame.plot``, ``hist`` and ``boxplot`` (:issue:`5353`, :issue:`6970`, :issue:`7069`)
- Added support for ``c``, ``colormap`` and ``colorbar`` arguments for ``DataFrame.plot`` with ``kind='scatter'`` (:issue:`7780`)
- Histogram from ``DataFrame.plot`` with ``kind='hist'`` (:issue:`7809`), see :ref:`the docs`.
- Boxplot from ``DataFrame.plot`` with ``kind='box'`` (:issue:`7998`), see :ref:`the docs`.
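A small sketch combining the new ``layout`` keyword with the new ``kind='hist'`` (illustrative data; assumes matplotlib is available and a pandas version where ``kind='hist'`` supports ``subplots``):

.. code-block:: python

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

    # one histogram per column on a 2-row grid;
    # -1 lets pandas infer the number of grid columns
    df.plot(kind='hist', subplots=True, layout=(2, -1))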
Other:

- ``read_csv`` now has a keyword parameter ``float_precision`` which specifies which floating-point converter the C engine should use during parsing, see :ref:`here ` (:issue:`8002`, :issue:`8044`)
- Added ``searchsorted`` method to ``Series`` objects (:issue:`7447`) (see the sketch after this list)
- :func:`describe` on mixed-types DataFrames is more flexible. Type-based column filtering is now possible via the ``include``/``exclude`` arguments. See the :ref:`docs ` (:issue:`8164`).

  .. ipython:: python

      df = pd.DataFrame({'catA': ['foo', 'foo', 'bar'] * 8,
                         'catB': ['a', 'b', 'c', 'd'] * 6,
                         'numC': np.arange(24),
                         'numD': np.arange(24.) + .5})
      df.describe(include=["object"])
      df.describe(include=["number", "object"], exclude=["float"])

  Requesting all columns is possible with the shorthand 'all'

  .. ipython:: python

      df.describe(include='all')

  Without those arguments, ``describe`` will behave as before, including only numerical columns or, if none are, only categorical columns. See also the :ref:`docs `

- Added ``split`` as an option to the ``orient`` argument in ``pd.DataFrame.to_dict``. (:issue:`7840`)
- The ``get_dummies`` method can now be used on DataFrames. By default only categorical columns are encoded as 0's and 1's, while other columns are left untouched.

  .. ipython:: python

      df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
                         'C': [1, 2, 3]})
      pd.get_dummies(df)

- ``PeriodIndex`` supports ``resolution`` the same as ``DatetimeIndex`` (:issue:`7708`)
- ``pandas.tseries.holiday`` has added support for additional holidays and ways to observe holidays (:issue:`7070`)
- ``pandas.tseries.holiday.Holiday`` now supports a list of offsets in Python3 (:issue:`7070`)
- ``pandas.tseries.holiday.Holiday`` now supports a ``days_of_week`` parameter (:issue:`7070`)
- ``GroupBy.nth()`` now supports selecting multiple nth values (:issue:`7910`)

  .. ipython:: python

      business_dates = pd.date_range(start='4/1/2014', end='6/30/2014', freq='B')
      df = pd.DataFrame(1, index=business_dates, columns=['a', 'b'])
      # get the first, 4th, and last date index for each month
      df.groupby([df.index.year, df.index.month]).nth([0, 3, -1])

- ``Period`` and ``PeriodIndex`` support addition/subtraction with ``timedelta``-likes (:issue:`7966`)

  If a ``Period``'s freq is ``D``, ``H``, ``T``, ``S``, ``L``, ``U`` or ``N``, ``Timedelta``-likes can be added if the result can have the same freq. Otherwise, only the same ``offsets`` can be added.
  .. code-block:: ipython

      In [104]: idx = pd.period_range('2014-07-01 09:00', periods=5, freq='H')

      In [105]: idx
      Out[105]:
      PeriodIndex(['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00',
                   '2014-07-01 12:00', '2014-07-01 13:00'],
                  dtype='period[H]')

      In [106]: idx + pd.offsets.Hour(2)
      Out[106]:
      PeriodIndex(['2014-07-01 11:00', '2014-07-01 12:00', '2014-07-01 13:00',
                   '2014-07-01 14:00', '2014-07-01 15:00'],
                  dtype='period[H]')

      In [107]: idx + pd.Timedelta('120m')
      Out[107]:
      PeriodIndex(['2014-07-01 11:00', '2014-07-01 12:00', '2014-07-01 13:00',
                   '2014-07-01 14:00', '2014-07-01 15:00'],
                  dtype='period[H]')

      In [108]: idx = pd.period_range('2014-07', periods=5, freq='M')

      In [109]: idx
      Out[109]: PeriodIndex(['2014-07', '2014-08', '2014-09', '2014-10', '2014-11'], dtype='period[M]')

      In [110]: idx + pd.offsets.MonthEnd(3)
      Out[110]: PeriodIndex(['2014-10', '2014-11', '2014-12', '2015-01', '2015-02'], dtype='period[M]')

- Added experimental compatibility with ``openpyxl`` for versions >= 2.0. The ``DataFrame.to_excel`` method ``engine`` keyword now recognizes ``openpyxl1`` and ``openpyxl2``, which will explicitly require openpyxl v1 and v2 respectively, failing if the requested version is not available. The ``openpyxl`` engine is now a meta-engine that automatically uses whichever version of openpyxl is installed. (:issue:`7177`)
- ``DataFrame.fillna`` can now accept a ``DataFrame`` as a fill value (:issue:`8377`)
- Passing multiple levels to :meth:`~pandas.DataFrame.stack()` will now work when multiple level numbers are passed (:issue:`7660`). See :ref:`Reshaping by stacking and unstacking `.
- :func:`set_names`, :func:`set_labels`, and :func:`set_levels` methods now take an optional ``level`` keyword argument to allow modification of specific level(s) of a MultiIndex. Additionally :func:`set_names` now accepts a scalar string value when operating on an ``Index`` or on a specific level of a ``MultiIndex`` (:issue:`7792`)

  .. ipython:: python

      idx = pd.MultiIndex.from_product([['a'], range(3), list("pqr")],
                                       names=['foo', 'bar', 'baz'])
      idx.set_names('qux', level=0)
      idx.set_names(['qux', 'corge'], level=[0, 1])
      idx.set_levels(['a', 'b', 'c'], level='bar')
      idx.set_levels([['a', 'b', 'c'], [1, 2, 3]], level=[1, 2])

- ``Index.isin`` now supports a ``level`` argument to specify which index level to use for membership tests (:issue:`7892`, :issue:`7890`)

  .. code-block:: ipython

      In [1]: idx = pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']])

      In [2]: idx.values
      Out[2]: array([(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), (1, 'b'), (1, 'c')], dtype=object)

      In [3]: idx.isin(['a', 'c', 'e'], level=1)
      Out[3]: array([ True, False,  True,  True, False,  True], dtype=bool)

- ``Index`` now supports ``duplicated`` and ``drop_duplicates`` (:issue:`4060`)

  .. ipython:: python

      idx = pd.Index([1, 2, 3, 4, 1, 2])
      idx
      idx.duplicated()
      idx.drop_duplicates()

- Added a ``copy=True`` argument to ``pd.concat`` to enable pass-through of complete blocks (:issue:`8252`)
- Added support for numpy 1.8+ data types (``bool_``, ``int_``, ``float_``, ``string_``) for conversion to R dataframe (:issue:`8400`)
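As noted in the list above, ``Series`` gained a ``searchsorted`` method; a minimal sketch (values illustrative; semantics mirror ``np.searchsorted``):

.. code-block:: python

    import pandas as pd

    s = pd.Series([1, 2, 3, 5, 8])

    # index at which 4 would be inserted to keep s sorted
    s.searchsorted(4)          # 3

    # array-like input is also accepted
    s.searchsorted([0, 6, 9])  # array([0, 4, 5])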
.. _whatsnew_0150.performance:

Performance
~~~~~~~~~~~

- Performance improvements in ``DatetimeIndex.__iter__`` to allow faster iteration (:issue:`7683`)
- Performance improvements in ``Period`` creation (and ``PeriodIndex`` setitem) (:issue:`5155`)
- Improvements in Series.transform for significant performance gains (revised) (:issue:`6496`)
- Performance improvements in ``StataReader`` when reading large files (:issue:`8040`, :issue:`8073`)
- Performance improvements in ``StataWriter`` when writing large files (:issue:`8079`)
- Performance and memory usage improvements in multi-key ``groupby`` (:issue:`8128`)
- Performance improvements in groupby ``.agg`` and ``.apply``, where builtins max/min were not mapped to numpy/cythonized versions (:issue:`7722`)
- Performance improvement in writing to sql (``to_sql``) of up to 50% (:issue:`8208`).
- Performance benchmarking of groupby for large values of ngroups (:issue:`6787`)
- Performance improvement in ``CustomBusinessDay``, ``CustomBusinessMonth`` (:issue:`8236`)
- Performance improvement for ``MultiIndex.values`` for multi-level indexes containing datetimes (:issue:`8543`)

.. _whatsnew_0150.bug_fixes:

Bug fixes
~~~~~~~~~

- Bug in ``pivot_table`` when using margins and a dict ``aggfunc`` (:issue:`8349`)
- Bug in ``read_csv`` where ``squeeze=True`` would return a view (:issue:`8217`)
- Bug in checking of table name in ``read_sql`` in certain cases (:issue:`7826`).
- Bug in ``DataFrame.groupby`` where ``Grouper`` does not recognize level when frequency is specified (:issue:`7885`)
- Bug in MultiIndex dtypes getting mixed up when a DataFrame is saved to a SQL table (:issue:`8021`)
- Bug in ``Series`` zero-division with float and integer operand dtypes (:issue:`7785`)
- Bug in ``Series.astype("unicode")`` not calling ``unicode`` on the values correctly (:issue:`7758`)
- Bug in ``DataFrame.as_matrix()`` with mixed ``datetime64[ns]`` and ``timedelta64[ns]`` dtypes (:issue:`7778`)
- Bug in ``HDFStore.select_column()`` not preserving UTC timezone info when selecting a ``DatetimeIndex`` (:issue:`7777`)
- Bug in ``to_datetime`` when ``format='%Y%m%d'`` and ``coerce=True`` are specified, where previously an object array was returned (rather than a coerced time-series with ``NaT``) (:issue:`7930`)
- Bug in ``DatetimeIndex`` and ``PeriodIndex`` in-place addition and subtraction causing a different result than the non-inplace versions (:issue:`6527`)
- Bug in adding and subtracting ``PeriodIndex`` with ``PeriodIndex`` raising ``TypeError`` (:issue:`7741`)
- Bug in ``combine_first`` with ``PeriodIndex`` data raising ``TypeError`` (:issue:`3367`)
- Bug in MultiIndex slicing with missing indexers (:issue:`7866`)
- Bug in MultiIndex slicing with various edge cases (:issue:`8132`)
- Regression in MultiIndex indexing with a non-scalar type object (:issue:`7914`)
- Bug in ``Timestamp`` comparisons with ``==`` and ``int64`` dtype (:issue:`8058`)
- Bug in pickles containing ``DateOffset``, which could raise ``AttributeError`` when the ``normalize`` attribute was referenced internally (:issue:`7748`)
- Bug in ``Panel`` when using ``major_xs`` and ``copy=False`` is passed (deprecation warning fails because of missing ``warnings``) (:issue:`8152`).
- Bug in pickle deserialization that failed for pre-0.14.1 containers with duplicate items when trying to avoid ambiguity in matching block and manager items (when there is only one block there is no ambiguity) (:issue:`7794`)
- Bug in putting a ``PeriodIndex`` into a ``Series``, which would convert it to ``int64`` dtype rather than ``object`` of ``Period`` objects (:issue:`7932`)
- Bug in ``HDFStore`` iteration when passing a where (:issue:`8014`)
- Bug in ``DataFrameGroupby.transform`` when transforming with a passed non-sorted key (:issue:`8046`, :issue:`8430`)
- Bug in repeated timeseries line and area plots, which could result in ``ValueError`` or an incorrect kind (:issue:`7733`)
- Bug in inference in a ``MultiIndex`` with ``datetime.date`` inputs (:issue:`7888`)
- Bug in ``get`` where an ``IndexError`` would not cause the default value to be returned (:issue:`7725`)
- Bug in ``offsets.apply``, ``rollforward`` and ``rollback``, which could reset the nanosecond (:issue:`7697`)
- Bug in ``offsets.apply``, ``rollforward`` and ``rollback``, which could raise ``AttributeError`` if ``Timestamp`` has ``dateutil`` tzinfo (:issue:`7697`)
- Bug in sorting a MultiIndex frame with a ``Float64Index`` (:issue:`8017`)
- Bug in inconsistent panel setitem with a rhs of a ``DataFrame`` for alignment (:issue:`7763`)
- Bug in ``is_superperiod`` and ``is_subperiod`` not handling frequencies higher than ``S`` (:issue:`7760`, :issue:`7772`, :issue:`7803`)
- Bug in 32-bit platforms with ``Series.shift`` (:issue:`8129`)
- Bug in ``PeriodIndex.unique`` returning an int64 ``np.ndarray`` (:issue:`7540`)
- Bug in ``groupby.apply`` with a non-affecting mutation in the function (:issue:`8467`)
- Bug in ``DataFrame.reset_index``, where a ``MultiIndex`` containing a ``PeriodIndex`` or a ``DatetimeIndex`` with tz raised ``ValueError`` (:issue:`7746`, :issue:`7793`)
- Bug in ``DataFrame.plot`` with ``subplots=True``, which could draw unnecessary minor xticks and yticks (:issue:`7801`)
- Bug in ``StataReader``, which did not read variable labels in 117 files due to a difference between Stata documentation and implementation (:issue:`7816`)
- Bug in ``StataReader`` where strings were always converted to a 244-character fixed width irrespective of underlying string size (:issue:`7858`)
- Bug in ``DataFrame.plot`` and ``Series.plot``, which could ignore the ``rot`` and ``fontsize`` keywords (:issue:`7844`)
- Bug in ``DatetimeIndex.value_counts`` not preserving tz (:issue:`7735`)
- Bug in ``PeriodIndex.value_counts`` resulting in ``Int64Index`` (:issue:`7735`)
- Bug in ``DataFrame.join`` when doing a left join on index and there are multiple matches (:issue:`5391`)
- Bug in ``GroupBy.transform()`` where int groups with a transform that didn't preserve the index were incorrectly truncated (:issue:`7972`).
- Bug in ``groupby`` where callable objects without name attributes would take the wrong path, and produce a ``DataFrame`` instead of a ``Series`` (:issue:`7929`)
- Bug in ``groupby`` error message when a DataFrame grouping column is duplicated (:issue:`7511`)
- Bug in ``read_html`` where the ``infer_types`` argument forced coercion of date-likes incorrectly (:issue:`7762`, :issue:`7032`).
- Bug in ``Series.str.cat`` with an index which was filtered as to not include the first item (:issue:`7857`)
- Bug in ``Timestamp`` not parsing ``nanosecond`` from a string (:issue:`7878`)
- Bug in ``Timestamp`` giving incorrect results with a string offset and ``tz`` (:issue:`7833`)
- Bug in ``tslib.tz_convert`` and ``tslib.tz_convert_single``, which could return different results (:issue:`7798`)
- Bug in ``DatetimeIndex.intersection`` of non-overlapping timestamps with tz raising ``IndexError`` (:issue:`7880`)
- Bug in alignment with TimeOps and non-unique indexes (:issue:`8363`)
- Bug in ``GroupBy.filter()`` where fast path vs. slow path made the filter return a non-scalar value that appeared valid but wasn't (:issue:`7870`).
- Bug in ``date_range()``/``DatetimeIndex()`` when the timezone was inferred from input dates, yet incorrect times were returned when crossing DST boundaries (:issue:`7835`, :issue:`7901`).
- Bug in ``to_excel()`` where a negative sign was being prepended to positive infinity and was absent for negative infinity (:issue:`7949`)
- Bug in area plot drawing the legend with an incorrect ``alpha`` when ``stacked=True`` (:issue:`8027`)
- ``Period`` and ``PeriodIndex`` addition/subtraction with ``np.timedelta64`` resulting in incorrect internal representations (:issue:`7740`)
- Bug in ``Holiday`` with no offset or observance (:issue:`7987`)
- Bug in ``DataFrame.to_latex`` formatting when columns or index is a ``MultiIndex`` (:issue:`7982`).
- Bug in ``DateOffset`` around Daylight Savings Time producing unexpected results (:issue:`5175`).
- Bug in ``DataFrame.shift`` where empty columns would throw ``ZeroDivisionError`` on numpy 1.7 (:issue:`8019`)
- Bug in installation where ``html_encoding/*.html`` wasn't installed and therefore some tests were not running correctly (:issue:`7927`).
- Bug in ``read_html`` where ``bytes`` objects were not tested for in ``_read`` (:issue:`7927`).
- Bug in ``DataFrame.stack()`` when one of the column levels was a datelike (:issue:`8039`)
- Bug in broadcasting numpy scalars with ``DataFrame`` (:issue:`8116`)
- Bug in ``pivot_table`` performed with nameless ``index`` and ``columns`` raising ``KeyError`` (:issue:`8103`)
- Bug in ``DataFrame.plot(kind='scatter')`` drawing points and errorbars with different colors when the color is specified by the ``c`` keyword (:issue:`8081`)
- Bug in ``Float64Index`` where ``iat`` and ``at`` were not testing and were failing (:issue:`8092`).
- Bug in ``DataFrame.boxplot()`` where y-limits were not set correctly when producing multiple axes (:issue:`7528`, :issue:`5517`).
- Bug in ``read_csv`` where line comments were not handled correctly given a custom line terminator or ``delim_whitespace=True`` (:issue:`8122`).
- Bug in ``read_html`` where empty tables caused a ``StopIteration`` (:issue:`7575`)
- Bug in casting when setting a column in a same-dtype block (:issue:`7704`)
- Bug in accessing groups from a ``GroupBy`` when the original grouper was a tuple (:issue:`8121`).
- Bug in ``.at`` that would accept integer indexers on a non-integer index and do fallback (:issue:`7814`)
- Bug with kde plot and NaNs (:issue:`8182`)
- Bug in ``GroupBy.count`` with the float32 data type, where nan values were not excluded (:issue:`8169`).
- Bug with stacked barplots and NaNs (:issue:`8175`).
- Bug in resample with non-evenly divisible offsets (e.g. '7s') (:issue:`8371`)
- Bug in interpolation methods with the ``limit`` keyword when no values needed interpolating (:issue:`7173`).
- Bug where ``col_space`` was ignored in ``DataFrame.to_string()`` when ``header=False`` (:issue:`8230`).
- Bug with ``DatetimeIndex.asof`` incorrectly matching partial strings and returning the wrong date (:issue:`8245`).
- Bug in plotting methods modifying the global matplotlib rcParams (:issue:`8242`).
- Bug in ``DataFrame.__setitem__`` that caused errors when setting a dataframe column to a sparse array (:issue:`8131`)
- Bug where ``DataFrame.boxplot()`` failed when an entire column was empty (:issue:`8181`).
- Bug with mixed-up variables in ``radviz`` visualization (:issue:`8199`).
- Bug in ``to_clipboard`` that would clip long column data (:issue:`8305`)
- Bug in ``DataFrame`` terminal display: setting max_column/max_rows to zero did not trigger auto-resizing of dfs to fit terminal width/height (:issue:`7180`).
- Bug in OLS where running with "cluster" and "nw_lags" parameters did not work correctly, but also did not throw an error (:issue:`5884`).
- Bug in ``DataFrame.dropna`` that interpreted non-existent columns in the subset argument as the 'last column' (:issue:`8303`)
- Bug in ``Index.intersection`` on non-monotonic non-unique indexes (:issue:`8362`).
- Bug in masked series assignment where mismatching types would break alignment (:issue:`8387`)
- Bug in ``NDFrame.equals`` giving false negatives with ``dtype=object`` (:issue:`8437`)
- Bug in assignment with indexer where type diversity would break alignment (:issue:`8258`)
- Bug in ``NDFrame.loc`` indexing where row/column names were lost when the target was a list/ndarray (:issue:`6552`)
- Regression in ``NDFrame.loc`` indexing where rows/columns were converted to ``Float64Index`` if the target was an empty list/ndarray (:issue:`7774`)
- Bug in ``Series`` that allowed it to be indexed by a ``DataFrame``, which had unexpected results. Such indexing is no longer permitted (:issue:`8444`)
- Bug in item assignment of a ``DataFrame`` with MultiIndex columns where right-hand-side columns were not aligned (:issue:`7655`)
- Suppress FutureWarning generated by NumPy when comparing object arrays containing NaN for equality (:issue:`7065`)
- Bug in ``DataFrame.eval()`` where the dtype of the ``not`` operator (``~``) was not correctly inferred as ``bool``.

.. _whatsnew_0.15.0.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.14.1..v0.15.0