Version 0.18.1 (May 3, 2016)
----------------------------

This is a minor bug-fix release from 0.18.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.

Highlights include:

- ``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see :ref:`here `
- ``pd.to_datetime()`` has gained the ability to assemble dates from a ``DataFrame``, see :ref:`here `
- Method chaining improvements, see :ref:`here `.
- Custom business hour offset, see :ref:`here `.
- Many bug fixes in the handling of ``sparse``, see :ref:`here `
- Expanded the :ref:`Tutorials section ` with a feature on modern pandas, courtesy of `@TomAugsburger `__. (:issue:`13045`).

.. contents:: What's new in v0.18.1
    :local:
    :backlinks: none

.. _whatsnew_0181.new_features:

New features
~~~~~~~~~~~~

.. _whatsnew_0181.enhancements.custombusinesshour:

Custom business hour
^^^^^^^^^^^^^^^^^^^^

The ``CustomBusinessHour`` is a mixture of ``BusinessHour`` and ``CustomBusinessDay`` which
allows you to specify arbitrary holidays. For details,
see :ref:`Custom Business Hour ` (:issue:`11514`)

.. ipython:: python

   from pandas.tseries.offsets import CustomBusinessHour
   from pandas.tseries.holiday import USFederalHolidayCalendar

   bhour_us = CustomBusinessHour(calendar=USFederalHolidayCalendar())

Friday before MLK Day

.. ipython:: python

   import datetime

   dt = datetime.datetime(2014, 1, 17, 15)

   dt + bhour_us

Tuesday after MLK Day (Monday is skipped because it's a holiday)

.. ipython:: python

   dt + bhour_us * 2

.. _whatsnew_0181.deferred_ops:

Method ``.groupby(..)`` syntax with window and resample operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``.groupby(...)`` has been enhanced to provide convenient syntax when working with ``.rolling(..)``, ``.expanding(..)`` and ``.resample(..)`` per group, see (:issue:`12486`, :issue:`12738`).

You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys. These return another deferred object (similar to what ``.rolling()`` and ``.expanding()`` do on ungrouped pandas objects). You can then operate on these ``RollingGroupby`` objects in a similar manner.

Previously you would have to do this to get a rolling window mean per-group:

.. ipython:: python

   df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
   df

.. code-block:: ipython

   In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
   Out[1]:
   A
   1  0      NaN
      1      NaN
      2      NaN
      3      1.5
      4      2.5
      5      3.5
      6      4.5
      7      5.5
      8      6.5
      9      7.5
      10     8.5
      11     9.5
      12    10.5
      13    11.5
      14    12.5
      15    13.5
      16    14.5
      17    15.5
      18    16.5
      19    17.5
   2  20     NaN
      21     NaN
      22     NaN
      23    21.5
      24    22.5
      25    23.5
      26    24.5
      27    25.5
      28    26.5
      29    27.5
      30    28.5
      31    29.5
   3  32     NaN
      33     NaN
      34     NaN
      35    33.5
      36    34.5
      37    35.5
      38    36.5
      39    37.5
   Name: B, dtype: float64

Now you can do:

.. ipython:: python

   df.groupby("A").rolling(4).B.mean()

For ``.resample(..)`` type of operations, previously you would have to:

.. ipython:: python

   df = pd.DataFrame(
       {
           "date": pd.date_range(start="2016-01-01", periods=4, freq="W"),
           "group": [1, 1, 2, 2],
           "val": [5, 6, 7, 8],
       }
   ).set_index("date")

   df

.. code-block:: ipython

   In [1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
   Out[1]:
                     group  val
   group date
   1     2016-01-03      1    5
         2016-01-04      1    5
         2016-01-05      1    5
         2016-01-06      1    5
         2016-01-07      1    5
         2016-01-08      1    5
         2016-01-09      1    5
         2016-01-10      1    6
   2     2016-01-17      2    7
         2016-01-18      2    7
         2016-01-19      2    7
         2016-01-20      2    7
         2016-01-21      2    7
         2016-01-22      2    7
         2016-01-23      2    7
         2016-01-24      2    8

Now you can do:

.. code-block:: ipython

   In [1]: df.groupby("group").resample("1D").ffill()
   Out[1]:
                     group  val
   group date
   1     2016-01-03      1    5
         2016-01-04      1    5
         2016-01-05      1    5
         2016-01-06      1    5
         2016-01-07      1    5
         2016-01-08      1    5
         2016-01-09      1    5
         2016-01-10      1    6
   2     2016-01-17      2    7
         2016-01-18      2    7
         2016-01-19      2    7
         2016-01-20      2    7
         2016-01-21      2    7
         2016-01-22      2    7
         2016-01-23      2    7
         2016-01-24      2    8

.. _whatsnew_0181.enhancements.method_chain:

Method chaining improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following methods / indexers now accept a ``callable``. It is intended to make
these more useful in method chains, see the :ref:`documentation `.
(:issue:`11485`, :issue:`12533`)

- ``.where()`` and ``.mask()``
- ``.loc[]``, ``.iloc[]`` and ``.ix[]``
- ``[]`` indexing

Methods ``.where()`` and ``.mask()``
""""""""""""""""""""""""""""""""""""

These can accept a callable for the condition and ``other``
arguments.

.. ipython:: python

   df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
   df.where(lambda x: x > 4, lambda x: x + 10)

Methods ``.loc[]``, ``.iloc[]``, ``.ix[]``
""""""""""""""""""""""""""""""""""""""""""

These can accept a callable, and a tuple of callables as a slicer. The callable
can return a valid boolean indexer or anything which is valid for these indexers' input.

.. ipython:: python

   # callable returns bool indexer
   df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10]

   # callable returns list of labels
   df.loc[lambda x: [1, 2], lambda x: ["A", "B"]]

Indexing with ``[]``
""""""""""""""""""""

Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Panel.
The callable must return a valid input for ``[]`` indexing depending on its
class and index type.

.. ipython:: python

   df[lambda x: "A"]

Using these methods / indexers, you can chain data selection operations
without using a temporary variable.

.. ipython:: python

   bb = pd.read_csv("data/baseball.csv", index_col="id")
   (bb.groupby(["year", "team"]).sum(numeric_only=True).loc[lambda df: df.r > 100])

.. _whatsnew_0181.partial_string_indexing:

Partial string indexing on ``DatetimeIndex`` when part of a ``MultiIndex``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Partial string indexing now matches on ``DatetimeIndex`` when part of a ``MultiIndex`` (:issue:`10331`)

.. code-block:: ipython

   In [20]: dft2 = pd.DataFrame(
      ....:     np.random.randn(20, 1),
      ....:     columns=["A"],
      ....:     index=pd.MultiIndex.from_product(
      ....:         [pd.date_range("20130101", periods=10, freq="12H"), ["a", "b"]]
      ....:     ),
      ....: )
      ....:

   In [21]: dft2
   Out[21]:
                                 A
   2013-01-01 00:00:00 a  0.469112
                       b -0.282863
   2013-01-01 12:00:00 a -1.509059
                       b -1.135632
   2013-01-02 00:00:00 a  1.212112
   ...                         ...
   2013-01-04 12:00:00 b  0.271860
   2013-01-05 00:00:00 a -0.424972
                       b  0.567020
   2013-01-05 12:00:00 a  0.276232
                       b -1.087401

   [20 rows x 1 columns]

   In [22]: dft2.loc["2013-01-05"]
   Out[22]:
                                 A
   2013-01-05 00:00:00 a -0.424972
                       b  0.567020
   2013-01-05 12:00:00 a  0.276232
                       b -1.087401

   [4 rows x 1 columns]

On other levels

.. code-block:: ipython

   In [26]: idx = pd.IndexSlice

   In [27]: dft2 = dft2.swaplevel(0, 1).sort_index()

   In [28]: dft2
   Out[28]:
                                 A
   a 2013-01-01 00:00:00  0.469112
     2013-01-01 12:00:00 -1.509059
     2013-01-02 00:00:00  1.212112
     2013-01-02 12:00:00  0.119209
     2013-01-03 00:00:00 -0.861849
   ...                         ...
   b 2013-01-03 12:00:00  1.071804
     2013-01-04 00:00:00 -0.706771
     2013-01-04 12:00:00  0.271860
     2013-01-05 00:00:00  0.567020
     2013-01-05 12:00:00 -1.087401

   [20 rows x 1 columns]

   In [29]: dft2.loc[idx[:, "2013-01-05"], :]
   Out[29]:
                                 A
   a 2013-01-05 00:00:00 -0.424972
     2013-01-05 12:00:00  0.276232
   b 2013-01-05 00:00:00  0.567020
     2013-01-05 12:00:00 -1.087401

   [4 rows x 1 columns]

.. _whatsnew_0181.enhancements.assembling:

Assembling datetimes
^^^^^^^^^^^^^^^^^^^^

``pd.to_datetime()`` has gained the ability to assemble datetimes from a passed in ``DataFrame`` or a dict. (:issue:`8158`).

.. ipython:: python

   df = pd.DataFrame(
       {"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "hour": [2, 3]}
   )
   df

Assembling using the passed frame.

.. ipython:: python

   pd.to_datetime(df)

You can pass only the columns that you need to assemble.

.. ipython:: python

   pd.to_datetime(df[["year", "month", "day"]])

.. _whatsnew_0181.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

- ``pd.read_csv()`` now supports ``delim_whitespace=True`` for the Python engine (:issue:`12958`)
- ``pd.read_csv()`` now supports opening ZIP files that contain a single CSV, via extension inference or explicit ``compression='zip'`` (:issue:`12175`)
- ``pd.read_csv()`` now supports opening files using xz compression, via extension inference or explicit ``compression='xz'``; ``xz`` compression is also supported by ``DataFrame.to_csv`` in the same way (:issue:`11852`)
- ``pd.read_msgpack()`` now always gives writeable ndarrays even when compression is used (:issue:`12359`).
- ``pd.read_msgpack()`` now supports serializing and de-serializing categoricals with msgpack (:issue:`12573`)
- ``.to_json()`` now supports ``NDFrames`` that contain categorical and sparse data (:issue:`10778`)
- ``interpolate()`` now supports ``method='akima'`` (:issue:`7588`).
- ``pd.read_excel()`` now accepts path objects (e.g. ``pathlib.Path``, ``py.path.local``) for the file path, in line with other ``read_*`` functions (:issue:`12655`)
- Added ``.weekday_name`` property as a component to ``DatetimeIndex`` and the ``.dt`` accessor. (:issue:`11128`)

- ``Index.take`` now handles ``allow_fill`` and ``fill_value`` consistently (:issue:`12631`)

  .. ipython:: python

     idx = pd.Index([1.0, 2.0, 3.0, 4.0], dtype="float")

     # default, allow_fill=True, fill_value=None
     idx.take([2, -1])
     idx.take([2, -1], fill_value=True)

- ``Index`` now supports ``.str.get_dummies()`` which returns ``MultiIndex``, see :ref:`Creating Indicator Variables ` (:issue:`10008`, :issue:`10103`)

  .. ipython:: python

     idx = pd.Index(["a|b", "a|c", "b|c"])
     idx.str.get_dummies("|")

- ``pd.crosstab()`` has gained a ``normalize`` argument for normalizing frequency tables (:issue:`12569`). Examples in the updated docs :ref:`here `.
- ``.resample(..).interpolate()`` is now supported (:issue:`12925`)
- ``.isin()`` now accepts passed ``sets`` (:issue:`12988`)

.. _whatsnew_0181.sparse:

Sparse changes
~~~~~~~~~~~~~~

These changes conform sparse handling to return the correct types and work to make a smoother experience with indexing.

``SparseArray.take`` now returns a scalar for scalar input, ``SparseArray`` for others. Furthermore, it handles a negative indexer with the same rule as ``Index`` (:issue:`10560`, :issue:`12796`)

.. code-block:: python

   s = pd.SparseArray([np.nan, np.nan, 1, 2, 3, np.nan, 4, 5, np.nan, 6])
   s.take(0)
   s.take([1, 2, 3])

- Bug in ``SparseSeries[]`` indexing with ``Ellipsis`` raises ``KeyError`` (:issue:`9467`)
- Bug in ``SparseArray[]`` indexing with tuples not being handled properly (:issue:`12966`)
- Bug in ``SparseSeries.loc[]`` with list-like input raises ``TypeError`` (:issue:`10560`)
- Bug in ``SparseSeries.iloc[]`` with scalar input may raise ``IndexError`` (:issue:`10560`)
- Bug in ``SparseSeries.loc[]``, ``.iloc[]`` with ``slice`` returns ``SparseArray``, rather than ``SparseSeries`` (:issue:`10560`)
- Bug in ``SparseDataFrame.loc[]``, ``.iloc[]`` may result in dense ``Series``, rather than ``SparseSeries`` (:issue:`12787`)
- Bug in ``SparseArray`` addition ignores ``fill_value`` of right hand side (:issue:`12910`)
- Bug in ``SparseArray`` mod raises ``AttributeError`` (:issue:`12910`)
- Bug in ``SparseArray`` pow calculates ``1 ** np.nan`` as ``np.nan`` which must be 1 (:issue:`12910`)
- Bug in ``SparseArray`` comparison may output an incorrect result or raise ``ValueError`` (:issue:`12971`)
- Bug in ``SparseSeries.__repr__`` raises ``TypeError`` when it is longer than ``max_rows`` (:issue:`10560`)
- Bug in ``SparseSeries.shape`` ignores ``fill_value`` (:issue:`10452`)
- Bug in ``SparseSeries`` and ``SparseArray`` may have different ``dtype`` from its dense values (:issue:`12908`)
- Bug in ``SparseSeries.reindex`` incorrectly handles ``fill_value`` (:issue:`12797`)
- Bug in ``SparseArray.to_frame()`` results in ``DataFrame``, rather than ``SparseDataFrame`` (:issue:`9850`)
- Bug in ``SparseSeries.value_counts()`` does not count ``fill_value`` (:issue:`6749`)
- Bug in ``SparseArray.to_dense()`` does not preserve ``dtype`` (:issue:`10648`)
- Bug in ``SparseArray.to_dense()`` incorrectly handles ``fill_value`` (:issue:`12797`)
- Bug in ``pd.concat()`` of ``SparseSeries`` results in dense (:issue:`10536`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` incorrectly handles ``fill_value`` (:issue:`9765`)
- Bug in ``pd.concat()`` of ``SparseDataFrame`` may raise ``AttributeError`` (:issue:`12174`)
- Bug in ``SparseArray.shift()`` may raise ``NameError`` or ``TypeError`` (:issue:`12908`)

.. _whatsnew_0181.api:

API changes
~~~~~~~~~~~

.. _whatsnew_0181.api.groubynth:

Method ``.groupby(..).nth()`` changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The index in ``.groupby(..).nth()`` output is now more consistent when the ``as_index`` argument is passed (:issue:`11039`):

.. ipython:: python

   df = pd.DataFrame({"A": ["a", "b", "a"], "B": [1, 2, 3]})
   df

Previous behavior:

.. code-block:: ipython

   In [3]: df.groupby('A', as_index=True)['B'].nth(0)
   Out[3]:
   0    1
   1    2
   Name: B, dtype: int64

   In [4]: df.groupby('A', as_index=False)['B'].nth(0)
   Out[4]:
   0    1
   1    2
   Name: B, dtype: int64

New behavior:

.. ipython:: python

   df.groupby("A", as_index=True)["B"].nth(0)
   df.groupby("A", as_index=False)["B"].nth(0)

Furthermore, previously, a ``.groupby`` would always sort, regardless of whether ``sort=False`` was passed with ``.nth()``.

.. ipython:: python

   np.random.seed(1234)
   df = pd.DataFrame(np.random.randn(100, 2), columns=["a", "b"])
   df["c"] = np.random.randint(0, 4, 100)

Previous behavior:

.. code-block:: ipython

   In [4]: df.groupby('c', sort=True).nth(1)
   Out[4]:
             a         b
   c
   0 -0.334077  0.002118
   1  0.036142 -2.074978
   2 -0.720589  0.887163
   3  0.859588 -0.636524

   In [5]: df.groupby('c', sort=False).nth(1)
   Out[5]:
             a         b
   c
   0 -0.334077  0.002118
   1  0.036142 -2.074978
   2 -0.720589  0.887163
   3  0.859588 -0.636524

New behavior:

.. ipython:: python

   df.groupby("c", sort=True).nth(1)
   df.groupby("c", sort=False).nth(1)

.. _whatsnew_0181.numpy_compatibility:

NumPy function compatibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Compatibility between pandas array-like methods (e.g. ``sum`` and ``take``) and their ``numpy``
counterparts has been greatly increased by augmenting the signatures of the ``pandas`` methods so
as to accept arguments that can be passed in from ``numpy``, even if they are not necessarily
used in the ``pandas`` implementation (:issue:`12644`, :issue:`12638`, :issue:`12687`)

- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- Bug in numpy compatibility of ``np.round()`` on a ``Series`` (:issue:`12600`)

An example of this signature augmentation is illustrated below:

.. code-block:: python

   sp = pd.SparseDataFrame([1, 2, 3])
   sp

Previous behaviour:

.. code-block:: ipython

   In [2]: np.cumsum(sp, axis=0)
   ...
   TypeError: cumsum() takes at most 2 arguments (4 given)

New behaviour:

.. code-block:: python

   np.cumsum(sp, axis=0)

.. _whatsnew_0181.apply_resample:

Using ``.apply`` on GroupBy resampling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using ``apply`` on resampling groupby operations (using a ``pd.TimeGrouper``) now has the same output types as similar ``apply`` calls on other groupby operations. (:issue:`11742`).

.. ipython:: python

   df = pd.DataFrame(
       {"date": pd.to_datetime(["10/10/2000", "11/10/2000"]), "value": [10, 13]}
   )
   df

Previous behavior:

.. code-block:: ipython

   In [1]: df.groupby(pd.TimeGrouper(key='date',
      ...:                           freq='M')).apply(lambda x: x.value.sum())
   Out[1]:
   ...
   TypeError: cannot concatenate a non-NDFrame object

   # Output is a Series
   In [2]: df.groupby(pd.TimeGrouper(key='date',
      ...:                           freq='M')).apply(lambda x: x[['value']].sum())
   Out[2]:
   date
   2000-10-31  value    10
   2000-11-30  value    13
   dtype: int64

New behavior:

.. code-block:: ipython

   # Output is a Series
   In [55]: df.groupby(pd.TimeGrouper(key='date',
      ...:                            freq='M')).apply(lambda x: x.value.sum())
   Out[55]:
   date
   2000-10-31    10
   2000-11-30    13
   Freq: M, dtype: int64

   # Output is a DataFrame
   In [56]: df.groupby(pd.TimeGrouper(key='date',
      ...:                            freq='M')).apply(lambda x: x[['value']].sum())
   Out[56]:
               value
   date
   2000-10-31     10
   2000-11-30     13

.. _whatsnew_0181.read_csv_exceptions:

Changes in ``read_csv`` exceptions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to standardize the ``read_csv`` API for both the ``c`` and ``python`` engines, both will now raise an
``EmptyDataError``, a subclass of ``ValueError``, in response to empty columns or header (:issue:`12493`, :issue:`12506`)

Previous behaviour:

.. code-block:: ipython

   In [1]: import io

   In [2]: df = pd.read_csv(io.StringIO(''), engine='c')
   ...
   ValueError: No columns to parse from file

   In [3]: df = pd.read_csv(io.StringIO(''), engine='python')
   ...
   StopIteration

New behaviour:

.. code-block:: ipython

   In [1]: df = pd.read_csv(io.StringIO(''), engine='c')
   ...
   pandas.io.common.EmptyDataError: No columns to parse from file

   In [2]: df = pd.read_csv(io.StringIO(''), engine='python')
   ...
   pandas.io.common.EmptyDataError: No columns to parse from file

In addition to this error change, several others have been made as well:

- ``CParserError`` now sub-classes ``ValueError`` instead of just an ``Exception`` (:issue:`12551`)
- A ``CParserError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine cannot parse a column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when the ``c`` engine encounters a ``NaN`` value in an integer column (:issue:`12506`)
- A ``ValueError`` is now raised instead of a generic ``Exception`` in ``read_csv`` when ``true_values`` is specified, and the ``c`` engine encounters an element in a column containing unencodable bytes (:issue:`12506`)
- ``pandas.parser.OverflowError`` exception has been removed and has been replaced with Python's built-in ``OverflowError`` exception (:issue:`12506`)
- ``pd.read_csv()`` no longer allows a combination of strings and integers for the ``usecols`` parameter (:issue:`12678`)

.. _whatsnew_0181.api.to_datetime:

Method ``to_datetime`` error changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Fixed bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'``, or with non-convertible entries and ``errors='ignore'``. Furthermore, an ``OutOfBoundsDatetime`` exception will be raised when an out-of-range value is encountered for that unit when ``errors='raise'``. (:issue:`11758`, :issue:`13052`, :issue:`13059`)

Previous behaviour:

.. code-block:: ipython

   In [27]: pd.to_datetime(1420043460, unit='s', errors='coerce')
   Out[27]: NaT

   In [28]: pd.to_datetime(11111111, unit='D', errors='ignore')
   OverflowError: Python int too large to convert to C long

   In [29]: pd.to_datetime(11111111, unit='D', errors='raise')
   OverflowError: Python int too large to convert to C long

New behaviour:

.. code-block:: ipython

   In [2]: pd.to_datetime(1420043460, unit='s', errors='coerce')
   Out[2]: Timestamp('2014-12-31 16:31:00')

   In [3]: pd.to_datetime(11111111, unit='D', errors='ignore')
   Out[3]: 11111111

   In [4]: pd.to_datetime(11111111, unit='D', errors='raise')
   OutOfBoundsDatetime: cannot convert input with unit 'D'

.. _whatsnew_0181.api.other:

Other API changes
^^^^^^^^^^^^^^^^^

- ``.swaplevel()`` for ``Series``, ``DataFrame``, ``Panel``, and ``MultiIndex`` now features defaults for its first two parameters ``i`` and ``j`` that swap the two innermost levels of the index. (:issue:`12934`)
- ``.searchsorted()`` for ``Index`` and ``TimedeltaIndex`` now accept a ``sorter`` argument to maintain compatibility with numpy's ``searchsorted`` function (:issue:`12238`)
- ``Period`` and ``PeriodIndex`` now raise an ``IncompatibleFrequency`` error, which inherits from ``ValueError``, rather than a raw ``ValueError`` (:issue:`12615`)
- ``Series.apply`` for category dtype now applies the passed function to each of the ``.categories`` (and not the ``.codes``), and returns a ``category`` dtype if possible (:issue:`12473`)
- ``read_csv`` will now raise a ``TypeError`` if ``parse_dates`` is not a boolean, list, or dictionary (matches the doc-string) (:issue:`5636`)
- The default for ``.query()/.eval()`` is now ``engine=None``, which will use ``numexpr`` if it's installed; otherwise it will fall back to the ``python`` engine. This matches the pre-0.18.1 behavior when ``numexpr`` is installed; previously, if ``numexpr`` was not installed, ``.query()/.eval()`` would raise. (:issue:`12749`)
- ``pd.show_versions()`` now includes ``pandas_datareader`` version (:issue:`12740`)
- Provide proper ``__name__`` and ``__qualname__`` attributes for generic functions (:issue:`12021`)
- ``pd.concat(ignore_index=True)`` now uses ``RangeIndex`` as default (:issue:`12695`)
- ``pd.merge()`` and ``DataFrame.join()`` will show a ``UserWarning`` when merging/joining a single- with a multi-leveled dataframe (:issue:`9455`, :issue:`12219`)
- Compat with ``scipy`` > 0.17 for deprecated ``piecewise_polynomial`` interpolation method; support for the replacement ``from_derivatives`` method (:issue:`12887`)

.. _whatsnew_0181.deprecations:

Deprecations
^^^^^^^^^^^^

- The method name ``Index.sym_diff()`` is deprecated and can be replaced by ``Index.symmetric_difference()`` (:issue:`12591`)
- The method name ``Categorical.sort()`` is deprecated in favor of ``Categorical.sort_values()`` (:issue:`12882`)

.. _whatsnew_0181.performance:

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved speed of SAS reader (:issue:`12656`, :issue:`12961`)
- Performance improvements in ``.groupby(..).cumcount()`` (:issue:`11039`)
- Improved memory usage in ``pd.read_csv()`` when using ``skiprows=an_integer`` (:issue:`13005`)
- Improved performance of ``DataFrame.to_sql`` when checking case sensitivity for tables. Now only checks if table has been created correctly when table name is not lower case. (:issue:`12876`)
- Improved performance of ``Period`` construction and time series plotting (:issue:`12903`, :issue:`11831`).
- Improved performance of ``.str.encode()`` and ``.str.decode()`` methods (:issue:`13008`)
- Improved performance of ``to_numeric`` if input is numeric dtype (:issue:`12777`)
- Improved performance of sparse arithmetic with ``IntIndex`` (:issue:`13036`)

.. _whatsnew_0181.bug_fixes:

Bug fixes
~~~~~~~~~

- ``usecols`` parameter in ``pd.read_csv`` is now respected even when the lines of a CSV file are not even (:issue:`12203`)
- Bug in ``groupby.transform(..)`` when ``axis=1`` is specified with a non-monotonic ordered index (:issue:`12713`)
- Bug in ``Period`` and ``PeriodIndex`` creation raises ``KeyError`` if ``freq="Minute"`` is specified. Note that the "Minute" freq was deprecated in v0.17.0, and it is recommended to use ``freq="T"`` instead (:issue:`11854`)
- Bug in ``.resample(...).count()`` with a ``PeriodIndex`` always raising a ``TypeError`` (:issue:`12774`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` casting to a ``DatetimeIndex`` when empty (:issue:`12868`)
- Bug in ``.resample(...)`` with a ``PeriodIndex`` when resampling to an existing frequency (:issue:`12770`)
- Bug in printing data which contains ``Period`` with different ``freq`` raises ``ValueError`` (:issue:`12615`)
- Bug in ``Series`` construction with ``Categorical`` when ``dtype='category'`` is specified (:issue:`12574`)
- Bug where concatenation with a coercible dtype was too aggressive, resulting in different dtypes in output formatting when an object was longer than ``display.max_rows`` (:issue:`12411`, :issue:`12045`, :issue:`11594`, :issue:`10571`, :issue:`12211`)
- Bug in ``float_format`` option with option not being validated as a callable. (:issue:`12706`)
- Bug in ``GroupBy.filter`` when ``dropna=False`` and no groups fulfilled the criteria (:issue:`12768`)
- Bug in ``__name__`` of ``.cum*`` functions (:issue:`12021`)
- Bug in ``.astype()`` of a ``Float64Index/Int64Index`` to an ``Int64Index`` (:issue:`12881`)
- Bug in round tripping an integer based index in ``.to_json()/.read_json()`` when ``orient='index'`` (the default) (:issue:`12866`)
- Bug in plotting ``Categorical`` dtypes causes an error when attempting a stacked bar plot (:issue:`13019`)
- Compat with >= ``numpy`` 1.11 for ``NaT`` comparisons (:issue:`12969`)
- Bug in ``.drop()`` with a non-unique ``MultiIndex``. (:issue:`12701`)
- Bug in ``.concat`` of datetime tz-aware and naive DataFrames (:issue:`12467`)
- Bug in correctly raising a ``ValueError`` in ``.resample(..).fillna(..)`` when passing a non-string (:issue:`12952`)
- Bug fixes in various encoding and header processing issues in ``pd.read_sas()`` (:issue:`12659`, :issue:`12654`, :issue:`12647`, :issue:`12809`)
- Bug in ``pd.crosstab()`` where it would silently ignore ``aggfunc`` if ``values=None`` (:issue:`12569`).
- Potential segfault in ``DataFrame.to_json`` when serialising ``datetime.time`` (:issue:`11473`).
- Potential segfault in ``DataFrame.to_json`` when attempting to serialise 0d array (:issue:`11299`).
- Segfault in ``to_json`` when attempting to serialise a ``DataFrame`` or ``Series`` with non-ndarray values; now supports serialization of ``category``, ``sparse``, and ``datetime64[ns, tz]`` dtypes (:issue:`10778`).
- Bug in ``DataFrame.to_json`` with unsupported dtype not passed to default handler (:issue:`12554`).
- Bug in ``.align`` not returning the sub-class (:issue:`12983`)
- Bug in aligning a ``Series`` with a ``DataFrame`` (:issue:`13037`)
- Bug in ``ABCPanel`` in which ``Panel4D`` was not being considered as a valid instance of this generic type (:issue:`12810`)
- Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)
- Bug in ``Timestamp.__repr__`` that caused ``pprint`` to fail in nested structures (:issue:`12622`)
- Bug in ``Timedelta.min`` and ``Timedelta.max``, the properties now report the true minimum/maximum ``timedeltas`` as recognized by pandas. See the :ref:`documentation `.
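
The ``Timedelta.min`` / ``Timedelta.max`` item above can be checked with a
minimal sketch like the one below. It assumes a pandas version that includes
this fix; the variable names are illustrative only, and the exact bound values
are deliberately not asserted here.

.. code-block:: python

   import pandas as pd

   # Class-level bounds of the Timedelta type, as reported by pandas.
   lowest = pd.Timedelta.min
   highest = pd.Timedelta.max

   # An ordinary Timedelta should fall strictly between the two bounds.
   one_day = pd.Timedelta(days=1)
   within_bounds = lowest < one_day < highest

   print(lowest, highest, within_bounds)
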