Version 0.14.1 (July 11, 2014)#

This is a minor release from 0.14.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:
- New methods select_dtypes() to select columns based on the dtype and sem() to calculate the standard error of the mean.
- Support for dateutil timezones (see docs).
- Support for ignoring full line comments in the read_csv() text parser.
- New documentation section on Options and Settings.
- Lots of bug fixes.
Enhancements
API Changes
Performance Improvements
Experimental Changes
Bug Fixes

API changes#

Openpyxl now raises a ValueError on construction of the openpyxl writer instead of warning on pandas import (GH 7284).
For StringMethods.extract, when no match is found, the result - only containing NaN values - now also has dtype=object instead of float (GH 7242)
Period objects no longer raise a TypeError when compared using == with another object that isn’t a Period. Instead when comparing a Period with another object using == if the other object isn’t a Period False is returned. (GH 7376)
Previously, the behaviour on resetting the time or not in offsets.apply, rollforward and rollback operations differed between offsets. With the support of the normalize keyword for all offsets(see below) with a default value of False (preserve time), the behaviour changed for certain offsets (BusinessMonthBegin, MonthEnd, BusinessMonthEnd, CustomBusinessMonthEnd, BusinessYearBegin, LastWeekOfMonth, FY5253Quarter, LastWeekOfMonth, Easter):
```
In [6]: from pandas.tseries import offsets

In [7]: d = pd.Timestamp('2014-01-01 09:00')

# old behaviour < 0.14.1
In [8]: d + offsets.MonthEnd()
Out[8]: pd.Timestamp('2014-01-31 00:00:00')
```
Starting from 0.14.1 all offsets preserve time by default. The old behaviour can be obtained with normalize=True
```
# new behaviour
In [1]: d + offsets.MonthEnd()
Out[1]: Timestamp('2014-01-31 09:00:00')

In [2]: d + offsets.MonthEnd(normalize=True)
Out[2]: Timestamp('2014-01-31 00:00:00')
```
Note that for the other offsets the default behaviour did not change.
Add back #N/A N/A as a default NA value in text parsing, (regression from 0.12) (GH 5521)
Raise a TypeError on inplace-setting with a .where and a non np.nan value as this is inconsistent with a set-item expression like df[mask] = None (GH 7656)

Enhancements#

Add dropna argument to value_counts and nunique (GH 5569).
Add select_dtypes() method to allow selection of columns based on dtype (GH 7316). See the docs.

All offsets supports the normalize keyword to specify whether offsets.apply, rollforward and rollback resets the time (hour, minute, etc) or not (default False, preserves time) (GH 7156):

import pandas.tseries.offsets as offsets

day = offsets.Day()
day.apply(pd.Timestamp("2014-01-01 09:00"))

day = offsets.Day(normalize=True)
day.apply(pd.Timestamp("2014-01-01 09:00"))

PeriodIndex is represented as the same format as DatetimeIndex (GH 7601)
StringMethods now work on empty Series (GH 7242)
The file parsers read_csv and read_table now ignore line comments provided by the parameter comment, which accepts only a single character for the C reader. In particular, they allow for comments before file data begins (GH 2685)
Add NotImplementedError for simultaneous use of chunksize and nrows for read_csv() (GH 6774).
Tests for basic reading of public S3 buckets now exist (GH 7281).
read_html now sports an encoding argument that is passed to the underlying parser library. You can use this to read non-ascii encoded web pages (GH 7323).
read_excel now supports reading from URLs in the same way that read_csv does. (GH 6809)

Support for dateutil timezones, which can now be used in the same way as pytz timezones across pandas. (GH 4688)

In [3]: rng = pd.date_range(
   ...:     "3/6/2012 00:00", periods=10, freq="D", tz="dateutil/Europe/London"
   ...: )
   ...: 

In [4]: rng.tz
Out[4]: tzfile('/usr/share/zoneinfo/Europe/London')

See the docs.

Implemented sem (standard error of the mean) operation for Series, DataFrame, Panel, and Groupby (GH 6897)
Add nlargest and nsmallest to the Series groupby allowlist, which means you can now use these methods on a SeriesGroupBy object (GH 7053).
All offsets apply, rollforward and rollback can now handle np.datetime64, previously results in ApplyTypeError (GH 7452)
Period and PeriodIndex can contain NaT in its values (GH 7485)
Support pickling Series, DataFrame and Panel objects with non-unique labels along item axis (index, columns and items respectively) (GH 7370).
Improved inference of datetime/timedelta with mixed null objects. Regression from 0.13.1 in interpretation of an object Index with all null elements (GH 7431)

Performance#

Improvements in dtype inference for numeric operations involving yielding performance gains for dtypes: int64, timedelta64, datetime64 (GH 7223)
Improvements in Series.transform for significant performance gains (GH 6496)
Improvements in DataFrame.transform with ufuncs and built-in grouper functions for significant performance gains (GH 7383)
Regression in groupby aggregation of datetime64 dtypes (GH 7555)
Improvements in MultiIndex.from_product for large iterables (GH 7627)

Experimental#

pandas.io.data.Options has a new method, get_all_data method, and now consistently returns a MultiIndexed DataFrame (GH 5602)
io.gbq.read_gbq and io.gbq.to_gbq were refactored to remove the dependency on the Google bq.py command line client. This submodule now uses httplib2 and the Google apiclient and oauth2client API client libraries which should be more stable and, therefore, reliable than bq.py. See the docs. (GH 6937).

Bug fixes#

Bug in DataFrame.where with a symmetric shaped frame and a passed other of a DataFrame (GH 7506)
Bug in Panel indexing with a MultiIndex axis (GH 7516)
Regression in datetimelike slice indexing with a duplicated index and non-exact end-points (GH 7523)
Bug in setitem with list-of-lists and single vs mixed types (GH 7551:)
Bug in time ops with non-aligned Series (GH 7500)
Bug in timedelta inference when assigning an incomplete Series (GH 7592)
Bug in groupby .nth with a Series and integer-like column name (GH 7559)
Bug in Series.get with a boolean accessor (GH 7407)
Bug in value_counts where NaT did not qualify as missing (NaN) (GH 7423)
Bug in to_timedelta that accepted invalid units and misinterpreted ‘m/h’ (GH 7611, GH 6423)
Bug in line plot doesn’t set correct xlim if secondary_y=True (GH 7459)
Bug in grouped hist and scatter plots use old figsize default (GH 7394)
Bug in plotting subplots with DataFrame.plot, hist clears passed ax even if the number of subplots is one (GH 7391).
Bug in plotting subplots with DataFrame.boxplot with by kw raises ValueError if the number of subplots exceeds 1 (GH 7391).
Bug in subplots displays ticklabels and labels in different rule (GH 5897)
Bug in Panel.apply with a MultiIndex as an axis (GH 7469)
Bug in DatetimeIndex.insert doesn’t preserve name and tz (GH 7299)
Bug in DatetimeIndex.asobject doesn’t preserve name (GH 7299)
Bug in MultiIndex slicing with datetimelike ranges (strings and Timestamps), (GH 7429)
Bug in Index.min and max doesn’t handle nan and NaT properly (GH 7261)
Bug in PeriodIndex.min/max results in int (GH 7609)
Bug in resample where fill_method was ignored if you passed how (GH 2073)
Bug in TimeGrouper doesn’t exclude column specified by key (GH 7227)
Bug in DataFrame and Series bar and barh plot raises TypeError when bottom and left keyword is specified (GH 7226)
Bug in DataFrame.hist raises TypeError when it contains non numeric column (GH 7277)
Bug in Index.delete does not preserve name and freq attributes (GH 7302)
Bug in DataFrame.query()/eval where local string variables with the @ sign were being treated as temporaries attempting to be deleted (GH 7300).
Bug in Float64Index which didn’t allow duplicates (GH 7149).
Bug in DataFrame.replace() where truthy values were being replaced (GH 7140).
Bug in StringMethods.extract() where a single match group Series would use the matcher’s name instead of the group name (GH 7313).
Bug in isnull() when mode.use_inf_as_null == True where isnull wouldn’t test True when it encountered an inf/-inf (GH 7315).
Bug in inferred_freq results in None for eastern hemisphere timezones (GH 7310)
Bug in Easter returns incorrect date when offset is negative (GH 7195)
Bug in broadcasting with .div, integer dtypes and divide-by-zero (GH 7325)
Bug in CustomBusinessDay.apply raises NameError when np.datetime64 object is passed (GH 7196)
Bug in MultiIndex.append, concat and pivot_table don’t preserve timezone (GH 6606)
Bug in .loc with a list of indexers on a single-multi index level (that is not nested) (GH 7349)
Bug in Series.map when mapping a dict with tuple keys of different lengths (GH 7333)
Bug all StringMethods now work on empty Series (GH 7242)
Fix delegation of read_sql to read_sql_query when query does not contain ‘select’ (GH 7324).
Bug where a string column name assignment to a DataFrame with a Float64Index raised a TypeError during a call to np.isnan (GH 7366).
Bug where NDFrame.replace() didn’t correctly replace objects with Period values (GH 7379).
Bug in .ix getitem should always return a Series (GH 7150)
Bug in MultiIndex slicing with incomplete indexers (GH 7399)
Bug in MultiIndex slicing with a step in a sliced level (GH 7400)
Bug where negative indexers in DatetimeIndex were not correctly sliced (GH 7408)
Bug where NaT wasn’t repr’d correctly in a MultiIndex (GH 7406, GH 7409).
Bug where bool objects were converted to nan in convert_objects (GH 7416).
Bug in quantile ignoring the axis keyword argument (GH 7306)
Bug where nanops._maybe_null_out doesn’t work with complex numbers (GH 7353)
Bug in several nanops functions when axis==0 for 1-dimensional nan arrays (GH 7354)
Bug where nanops.nanmedian doesn’t work when axis==None (GH 7352)
Bug where nanops._has_infs doesn’t work with many dtypes (GH 7357)
Bug in StataReader.data where reading a 0-observation dta failed (GH 7369)
Bug in StataReader when reading Stata 13 (117) files containing fixed width strings (GH 7360)
Bug in StataWriter where encoding was ignored (GH 7286)
Bug in DatetimeIndex comparison doesn’t handle NaT properly (GH 7529)
Bug in passing input with tzinfo to some offsets apply, rollforward or rollback resets tzinfo or raises ValueError (GH 7465)
Bug in DatetimeIndex.to_period, PeriodIndex.asobject, PeriodIndex.to_timestamp doesn’t preserve name (GH 7485)
Bug in DatetimeIndex.to_period and PeriodIndex.to_timestamp handle NaT incorrectly (GH 7228)
Bug in offsets.apply, rollforward and rollback may return normal datetime (GH 7502)
Bug in resample raises ValueError when target contains NaT (GH 7227)
Bug in Timestamp.tz_localize resets nanosecond info (GH 7534)
Bug in DatetimeIndex.asobject raises ValueError when it contains NaT (GH 7539)
Bug in Timestamp.__new__ doesn’t preserve nanosecond properly (GH 7610)
Bug in Index.astype(float) where it would return an object dtype Index (GH 7464).
Bug in DataFrame.reset_index loses tz (GH 3950)
Bug in DatetimeIndex.freqstr raises AttributeError when freq is None (GH 7606)
Bug in GroupBy.size created by TimeGrouper raises AttributeError (GH 7453)
Bug in single column bar plot is misaligned (GH 7498).
Bug in area plot with tz-aware time series raises ValueError (GH 7471)
Bug in non-monotonic Index.union may preserve name incorrectly (GH 7458)
Bug in DatetimeIndex.intersection doesn’t preserve timezone (GH 4690)
Bug in rolling_var where a window larger than the array would raise an error(GH 7297)
Bug with last plotted timeseries dictating xlim (GH 2960)
Bug with secondary_y axis not being considered for timeseries xlim (GH 3490)
Bug in Float64Index assignment with a non scalar indexer (GH 7586)
Bug in pandas.core.strings.str_contains does not properly match in a case insensitive fashion when regex=False and case=False (GH 7505)
Bug in expanding_cov, expanding_corr, rolling_cov, and rolling_corr for two arguments with mismatched index (GH 7512)
Bug in to_sql taking the boolean column as text column (GH 7678)
Bug in grouped hist doesn’t handle rot kw and sharex kw properly (GH 7234)
Bug in .loc performing fallback integer indexing with object dtype indices (GH 7496)
Bug (regression) in PeriodIndex constructor when passed Series objects (GH 7701).

Contributors#

A total of 46 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

Andrew Rosenfeld
Andy Hayden
Benjamin Adams +
Benjamin M. Gross +
Brian Quistorff +
Brian Wignall +
DSM
Daniel Waeber
David Bew +
David Stephens
Jacob Schaer
Jan Schulz
John David Reaver
John W. O’Brien
Joris Van den Bossche
Julien Danjou +
K.-Michael Aye
Kevin Sheppard
Kyle Meyer
Matt Wittmann
Matthew Brett +
Michael Mueller +
Mortada Mehyar
Phillip Cloud
Rob Levy +
Schaer, Jacob C +
Stephan Hoyer
Thomas Kluyver
Todd Jennings
Tom Augspurger
TomAugspurger
bwignall
clham
dsm054 +
helger +
immerrr
jaimefrio
jreback
lexual
onesandzeroes
rockg
sanguineturtle +
seth-p +
sinhrks
unknown
yelite +