This is a minor release from 0.14.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
New methods select_dtypes() to select columns based on the dtype and sem() to calculate the standard error of the mean.
Support for dateutil timezones (see docs).
Support for ignoring full line comments in the read_csv() text parser.
New documentation section on Options and Settings.
Lots of bug fixes.
Enhancements
API Changes
Performance Improvements
Experimental Changes
Bug Fixes
Openpyxl now raises a ValueError on construction of the openpyxl writer instead of warning on pandas import (GH7284).
For StringMethods.extract, when no match is found, the result, which contains only NaN values, now has dtype=object instead of float (GH7242)
Period objects no longer raise a TypeError when compared using == with an object that isn’t a Period. Instead, such a comparison now returns False. (GH7376)
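A minimal sketch of the comparison behaviour described above, assuming pandas 0.14.1 or later is installed. The variable names are illustrative only.

```python
import pandas as pd

p = pd.Period("2014-01", freq="M")

# Comparing a Period with a non-Period object does not raise;
# the comparison evaluates to False.
equals_int = p == 5
equals_str = p == "not a period"

# Comparing two equal Periods still evaluates to True.
equals_period = p == pd.Period("2014-01", freq="M")

print(equals_int, equals_str, equals_period)
```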
Previously, whether the time was reset in offsets.apply, rollforward and rollback operations differed between offsets. With support for the normalize keyword in all offsets (see below), with a default value of False (preserve time), the behaviour has changed for certain offsets (BusinessMonthBegin, MonthEnd, BusinessMonthEnd, CustomBusinessMonthEnd, BusinessYearBegin, LastWeekOfMonth, FY5253Quarter, Easter):
In [6]: from pandas.tseries import offsets

In [7]: d = pd.Timestamp('2014-01-01 09:00')

# old behaviour < 0.14.1
In [8]: d + offsets.MonthEnd()
Out[8]: Timestamp('2014-01-31 00:00:00')
Starting from 0.14.1 all offsets preserve time by default. The old behaviour can be obtained with normalize=True
# new behaviour
In [1]: d + offsets.MonthEnd()
Out[1]: Timestamp('2014-01-31 09:00:00')

In [2]: d + offsets.MonthEnd(normalize=True)
Out[2]: Timestamp('2014-01-31 00:00:00')
Note that for the other offsets the default behaviour did not change.
Add back #N/A N/A as a default NA value in text parsing (regression from 0.12) (GH5521)
Raise a TypeError on inplace-setting with a .where and a non np.nan value, as this is inconsistent with a set-item expression like df[mask] = None (GH7656)
Add dropna argument to value_counts and nunique (GH5569).
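As a rough illustration of the dropna argument, the sketch below counts values in a small, made-up Series with and without missing values included. It assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 1.0, 2.0, np.nan])

# Default (dropna=True): the NaN entry is ignored.
counts_default = s.value_counts()
unique_default = s.nunique()

# dropna=False: NaN is counted as a value of its own.
counts_with_nan = s.value_counts(dropna=False)
unique_with_nan = s.nunique(dropna=False)

print(len(counts_default), unique_default)
print(len(counts_with_nan), unique_with_nan)
```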
Add select_dtypes() method to allow selection of columns based on dtype (GH7316). See the docs.
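The following sketch shows one way select_dtypes() might be used. The frame and its column names are invented for illustration, and the "number" selector is assumed to cover the integer and float columns only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2],          # integer column
        "b": [1.5, 2.5],      # float column
        "c": [True, False],   # boolean column
        "d": ["x", "y"],      # string column
    }
)

# Keep only the numeric (integer and float) columns.
numeric = df.select_dtypes(include=["number"])

# Keep everything except the numeric columns.
non_numeric = df.select_dtypes(exclude=["number"])

print(list(numeric.columns))
print(list(non_numeric.columns))
```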
All offsets now support the normalize keyword to specify whether offsets.apply, rollforward and rollback reset the time (hour, minute, etc.) or not (default False, preserves time) (GH7156):
import pandas as pd
import pandas.tseries.offsets as offsets

# default: the time of day is preserved
day = offsets.Day()
day.apply(pd.Timestamp("2014-01-01 09:00"))

# normalize=True: the time is reset to midnight
day = offsets.Day(normalize=True)
day.apply(pd.Timestamp("2014-01-01 09:00"))
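The same keyword applies to rollforward. The sketch below is only an illustration: the timestamp is made up, and it deliberately uses rollforward rather than apply.

```python
import pandas as pd
from pandas.tseries import offsets

d = pd.Timestamp("2014-01-15 09:00")

# Default (normalize=False): roll to month end, keep 09:00.
kept = offsets.MonthEnd().rollforward(d)

# normalize=True: roll to month end and reset the time to midnight.
reset = offsets.MonthEnd(normalize=True).rollforward(d)

print(kept)
print(reset)
```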
PeriodIndex is now represented in the same format as DatetimeIndex (GH7601)
StringMethods now work on empty Series (GH7242)
The file parsers read_csv and read_table now ignore line comments provided by the parameter comment, which accepts only a single character for the C reader. In particular, they allow for comments before file data begins (GH2685)
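A small sketch of full-line comment handling, using an in-memory buffer in place of a file. The sample data and the choice of "#" as the comment character are assumptions made for the example.

```python
import io

import pandas as pd

data = (
    "# comment before the header\n"
    "a,b\n"
    "1,2\n"
    "# comment between data rows\n"
    "3,4\n"
)

# Lines starting with the comment character are skipped entirely.
df = pd.read_csv(io.StringIO(data), comment="#")

print(df)
```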
Add NotImplementedError for simultaneous use of chunksize and nrows for read_csv() (GH6774).
Tests for basic reading of public S3 buckets now exist (GH7281).
read_html now sports an encoding argument that is passed to the underlying parser library. You can use this to read non-ascii encoded web pages (GH7323).
read_excel now supports reading from URLs in the same way that read_csv does. (GH6809)
Support for dateutil timezones, which can now be used in the same way as pytz timezones across pandas. (GH4688)
In [3]: rng = pd.date_range(
   ...:     "3/6/2012 00:00", periods=10, freq="D", tz="dateutil/Europe/London"
   ...: )
   ...:

In [4]: rng.tz
Out[4]: tzfile('/usr/share/zoneinfo/Europe/London')
See the docs.
Implemented sem (standard error of the mean) operation for Series, DataFrame, Panel, and Groupby (GH6897)
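For reference, the standard error of the mean is the sample standard deviation divided by the square root of the number of observations. The sketch below checks sem() against that formula on made-up data. It assumes the default sample standard deviation (ddof=1) and omits Panel.

```python
import math

import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Manual computation: sample standard deviation / sqrt(n).
manual = s.std() / math.sqrt(len(s))
builtin = s.sem()

# sem is also available on grouped data.
df = pd.DataFrame({"g": ["a", "a", "b", "b"], "v": [1.0, 3.0, 2.0, 6.0]})
by_group = df.groupby("g")["v"].sem()

print(manual, builtin)
print(by_group)
```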
Add nlargest and nsmallest to the Series groupby allowlist, which means you can now use these methods on a SeriesGroupBy object (GH7053).
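A brief sketch of nlargest on a SeriesGroupBy object. The frame, its column names and the choice of n=2 are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"g": ["a", "a", "a", "b", "b"], "v": [1, 5, 3, 2, 4]}
)

# Two largest values of "v" within each group of "g".
top2 = df.groupby("g")["v"].nlargest(2)

print(top2)
```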
The apply, rollforward and rollback methods of all offsets can now handle np.datetime64; previously this resulted in an ApplyTypeError (GH7452)
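A hedged sketch of passing np.datetime64 values to an offset. It uses rollforward and rollback only, and the date is made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.tseries import offsets

d64 = np.datetime64("2014-01-15T09:00")

# The np.datetime64 input is accepted and a Timestamp is returned.
fwd = offsets.MonthEnd().rollforward(d64)
back = offsets.MonthEnd().rollback(d64)

print(fwd)
print(back)
```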
Period and PeriodIndex can now contain NaT in their values (GH7485)
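A minimal sketch of a PeriodIndex holding a missing value, assuming monthly frequency and made-up dates. pd.isnull is used to locate the NaT entry.

```python
import pandas as pd

idx = pd.PeriodIndex(["2014-01", pd.NaT, "2014-03"], freq="M")

# Boolean mask marking the missing (NaT) position.
mask = pd.isnull(idx)

print(idx)
print(list(mask))
```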
Support pickling Series, DataFrame and Panel objects with non-unique labels along item axis (index, columns and items respectively) (GH7370).
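The round trip below sketches pickling a DataFrame whose column labels are not unique. The data is invented, and Series and Panel are not shown.

```python
import pickle

import pandas as pd

# Two columns share the label "a".
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "a"])

# Serialise to bytes and restore.
restored = pickle.loads(pickle.dumps(df))

print(restored)
```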
Improved inference of datetime/timedelta with mixed null objects. Regression from 0.13.1 in interpretation of an object Index with all null elements (GH7431)
Improvements in dtype inference for numeric operations, yielding performance gains for dtypes: int64, timedelta64, datetime64 (GH7223)
Improvements in Series.transform for significant performance gains (GH6496)
Improvements in DataFrame.transform with ufuncs and built-in grouper functions for significant performance gains (GH7383)
Regression in groupby aggregation of datetime64 dtypes (GH7555)
Improvements in MultiIndex.from_product for large iterables (GH7627)
pandas.io.data.Options has a new method, get_all_data, and now consistently returns a MultiIndexed DataFrame (GH5602)
io.gbq.read_gbq and io.gbq.to_gbq were refactored to remove the dependency on the Google bq.py command line client. This submodule now uses httplib2 and the Google apiclient and oauth2client API client libraries, which should be more stable and therefore more reliable than bq.py. See the docs. (GH6937)
Bug in DataFrame.where with a symmetric shaped frame and a passed other of a DataFrame (GH7506)
Bug in Panel indexing with a MultiIndex axis (GH7516)
Regression in datetimelike slice indexing with a duplicated index and non-exact end-points (GH7523)
Bug in setitem with list-of-lists and single vs mixed types (GH7551)
Bug in time ops with non-aligned Series (GH7500)
Bug in timedelta inference when assigning an incomplete Series (GH7592)
Bug in groupby .nth with a Series and integer-like column name (GH7559)
Bug in Series.get with a boolean accessor (GH7407)
Bug in value_counts where NaT did not qualify as missing (NaN) (GH7423)
Bug in to_timedelta that accepted invalid units and misinterpreted ‘m/h’ (GH7611, GH6423)
Bug in line plot doesn’t set correct xlim if secondary_y=True (GH7459)
Bug in grouped hist and scatter plots use old figsize default (GH7394)
Bug in plotting subplots with DataFrame.plot, hist clears passed ax even if the number of subplots is one (GH7391).
Bug in plotting subplots with DataFrame.boxplot with by kw raises ValueError if the number of subplots exceeds 1 (GH7391).
Bug where subplots displayed ticklabels and labels according to different rules (GH5897)
Bug in Panel.apply with a MultiIndex as an axis (GH7469)
Bug in DatetimeIndex.insert doesn’t preserve name and tz (GH7299)
Bug in DatetimeIndex.asobject doesn’t preserve name (GH7299)
Bug in MultiIndex slicing with datetimelike ranges (strings and Timestamps) (GH7429)
Bug in Index.min and max doesn’t handle nan and NaT properly (GH7261)
Bug in PeriodIndex.min/max results in int (GH7609)
Bug in resample where fill_method was ignored if you passed how (GH2073)
Bug in TimeGrouper doesn’t exclude column specified by key (GH7227)
Bug in DataFrame and Series bar and barh plot raises TypeError when bottom and left keyword is specified (GH7226)
Bug in DataFrame.hist raises TypeError when it contains non numeric column (GH7277)
Bug in Index.delete does not preserve name and freq attributes (GH7302)
Bug in DataFrame.query()/eval where local string variables with the @ sign were being treated as temporaries attempting to be deleted (GH7300).
Bug in Float64Index which didn’t allow duplicates (GH7149).
Bug in DataFrame.replace() where truthy values were being replaced (GH7140).
Bug in StringMethods.extract() where a single match group Series would use the matcher’s name instead of the group name (GH7313).
Bug in isnull() when mode.use_inf_as_null == True where isnull wouldn’t test True when it encountered an inf/-inf (GH7315).
Bug in inferred_freq results in None for eastern hemisphere timezones (GH7310)
Bug in Easter returns incorrect date when offset is negative (GH7195)
Bug in broadcasting with .div, integer dtypes and divide-by-zero (GH7325)
Bug in CustomBusinessDay.apply raises NameError when np.datetime64 object is passed (GH7196)
Bug in MultiIndex.append, concat and pivot_table don’t preserve timezone (GH6606)
Bug in .loc with a list of indexers on a single-multi index level (that is not nested) (GH7349)
Bug in Series.map when mapping a dict with tuple keys of different lengths (GH7333)
Bug where StringMethods did not work on empty Series (GH7242)
Fix delegation of read_sql to read_sql_query when query does not contain ‘select’ (GH7324).
Bug where a string column name assignment to a DataFrame with a Float64Index raised a TypeError during a call to np.isnan (GH7366).
Bug where NDFrame.replace() didn’t correctly replace objects with Period values (GH7379).
Bug in .ix getitem should always return a Series (GH7150)
Bug in MultiIndex slicing with incomplete indexers (GH7399)
Bug in MultiIndex slicing with a step in a sliced level (GH7400)
Bug where negative indexers in DatetimeIndex were not correctly sliced (GH7408)
Bug where NaT wasn’t repr’d correctly in a MultiIndex (GH7406, GH7409).
Bug where bool objects were converted to nan in convert_objects (GH7416).
Bug in quantile ignoring the axis keyword argument (GH7306)
Bug where nanops._maybe_null_out doesn’t work with complex numbers (GH7353)
Bug in several nanops functions when axis==0 for 1-dimensional nan arrays (GH7354)
Bug where nanops.nanmedian doesn’t work when axis==None (GH7352)
Bug where nanops._has_infs doesn’t work with many dtypes (GH7357)
Bug in StataReader.data where reading a 0-observation dta failed (GH7369)
Bug in StataReader when reading Stata 13 (117) files containing fixed width strings (GH7360)
Bug in StataWriter where encoding was ignored (GH7286)
Bug in DatetimeIndex comparison doesn’t handle NaT properly (GH7529)
Bug in passing input with tzinfo to some offsets apply, rollforward or rollback resets tzinfo or raises ValueError (GH7465)
Bug in DatetimeIndex.to_period, PeriodIndex.asobject, PeriodIndex.to_timestamp doesn’t preserve name (GH7485)
Bug in DatetimeIndex.to_period and PeriodIndex.to_timestamp handle NaT incorrectly (GH7228)
Bug in offsets.apply, rollforward and rollback may return normal datetime (GH7502)
Bug in resample raises ValueError when target contains NaT (GH7227)
Bug in Timestamp.tz_localize resets nanosecond info (GH7534)
Bug in DatetimeIndex.asobject raises ValueError when it contains NaT (GH7539)
Bug in Timestamp.__new__ doesn’t preserve nanosecond properly (GH7610)
Bug in Index.astype(float) where it would return an object dtype Index (GH7464).
Bug in DataFrame.reset_index loses tz (GH3950)
Bug in DatetimeIndex.freqstr raises AttributeError when freq is None (GH7606)
Bug in GroupBy.size created by TimeGrouper raises AttributeError (GH7453)
Bug in single column bar plot is misaligned (GH7498).
Bug in area plot with tz-aware time series raises ValueError (GH7471)
Bug in non-monotonic Index.union may preserve name incorrectly (GH7458)
Bug in DatetimeIndex.intersection doesn’t preserve timezone (GH4690)
Bug in rolling_var where a window larger than the array would raise an error (GH7297)
Bug with last plotted timeseries dictating xlim (GH2960)
Bug with secondary_y axis not being considered for timeseries xlim (GH3490)
Bug in Float64Index assignment with a non scalar indexer (GH7586)
Bug in pandas.core.strings.str_contains does not properly match in a case insensitive fashion when regex=False and case=False (GH7505)
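To illustrate the fixed behaviour, the sketch below performs a literal, case-insensitive match through the public Series.str.contains accessor rather than the internal str_contains function. The sample strings are made up.

```python
import pandas as pd

s = pd.Series(["Foo", "bar", "FOOD"])

# Literal (non-regex) match that ignores case.
matches = s.str.contains("foo", case=False, regex=False)

print(matches.tolist())
```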
Bug in expanding_cov, expanding_corr, rolling_cov, and rolling_corr for two arguments with mismatched index (GH7512)
Bug in to_sql taking the boolean column as text column (GH7678)
Bug in grouped hist doesn’t handle rot kw and sharex kw properly (GH7234)
Bug in .loc performing fallback integer indexing with object dtype indices (GH7496)
Bug (regression) in PeriodIndex constructor when passed Series objects (GH7701).
A total of 46 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Andrew Rosenfeld
Andy Hayden
Benjamin Adams +
Benjamin M. Gross +
Brian Quistorff +
Brian Wignall +
DSM
Daniel Waeber
David Bew +
David Stephens
Jacob Schaer
Jan Schulz
John David Reaver
John W. O’Brien
Joris Van den Bossche
Julien Danjou +
K.-Michael Aye
Kevin Sheppard
Kyle Meyer
Matt Wittmann
Matthew Brett +
Michael Mueller +
Mortada Mehyar
Phillip Cloud
Rob Levy +
Schaer, Jacob C +
Stephan Hoyer
Thomas Kluyver
Todd Jennings
Tom Augspurger
TomAugspurger
bwignall
clham
dsm054 +
helger +
immerrr
jaimefrio
jreback
lexual
onesandzeroes
rockg
sanguineturtle +
seth-p +
sinhrks
unknown
yelite +