Version 0.14.1 (July 11, 2014)¶
This is a minor release from 0.14.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
New methods select_dtypes() to select columns based on the dtype and sem() to calculate the standard error of the mean.
Support for dateutil timezones (see docs).
Support for ignoring full line comments in the read_csv() text parser.
New documentation section on Options and Settings.
Lots of bug fixes.
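As a rough illustration of the full-line comment support mentioned above, the sketch below parses a small in-memory CSV. The file contents and the choice of "#" as the comment character are assumptions made for the example, and it is written against the public read_csv signature rather than taken from the release itself.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV: full-line comments appear before the header
# and between data rows.
raw = io.StringIO(
    "# exported by a hypothetical logger\n"
    "a,b\n"
    "1,2\n"
    "# a full-line comment between rows\n"
    "3,4\n"
)

# comment="#" asks the parser to ignore everything from "#" to the end of
# the line; lines consisting only of a comment are skipped entirely.
df = pd.read_csv(raw, comment="#")
print(df)
```

The header is taken from the first line that is not a comment, so comments placed before the data do not shift the column names.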
API changes¶
Openpyxl now raises a ValueError on construction of the openpyxl writer instead of warning on pandas import (GH7284).
For StringMethods.extract, when no match is found, the result - only containing NaN values - now also has dtype=object instead of float (GH7242)
Period objects no longer raise a TypeError when compared using == with another object that isn’t a Period. Instead, comparing a Period with a non-Period object using == returns False. (GH7376)
Previously, the behaviour on resetting the time or not in offsets.apply, rollforward and rollback operations differed between offsets. With the support of the normalize keyword for all offsets (see below) with a default value of False (preserve time), the behaviour changed for certain offsets (BusinessMonthBegin, MonthEnd, BusinessMonthEnd, CustomBusinessMonthEnd, BusinessYearBegin, LastWeekOfMonth, FY5253Quarter, LastWeekOfMonth, Easter):
    In [6]: from pandas.tseries import offsets
    In [7]: d = pd.Timestamp('2014-01-01 09:00')
    # old behaviour < 0.14.1
    In [8]: d + offsets.MonthEnd()
    Out[8]: pd.Timestamp('2014-01-31 00:00:00')
Starting from 0.14.1 all offsets preserve time by default. The old behaviour can be obtained with normalize=True:
    # new behaviour
    In [1]: d + offsets.MonthEnd()
    Out[1]: Timestamp('2014-01-31 09:00:00')
    In [2]: d + offsets.MonthEnd(normalize=True)
    Out[2]: Timestamp('2014-01-31 00:00:00')
Note that for the other offsets the default behaviour did not change.
Add back #N/A N/A as a default NA value in text parsing, (regression from 0.12) (GH5521)
Raise a TypeError on inplace-setting with a .where and a non np.nan value as this is inconsistent with a set-item expression like df[mask] = None (GH7656)
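The Period comparison change listed above can be sketched as follows. The period value and the objects it is compared with are arbitrary choices for illustration; the snippet only exercises == and is not meant to cover the other comparison operators.

```python
import pandas as pd

p = pd.Period("2014-07", freq="M")

# Comparing two equal Period objects behaves as before.
same = p == pd.Period("2014-07", freq="M")

# Comparing with a non-Period object no longer raises TypeError;
# the comparison simply evaluates to False.
vs_string = p == "2014-07"
vs_number = p == 2014

print(same, vs_string, vs_number)
```

One practical consequence is that a Period can sit in a list alongside other objects and be searched with the in operator without raising.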
Enhancements¶
Add dropna argument to value_counts and nunique (GH5569).
Add select_dtypes() method to allow selection of columns based on dtype (GH7316). See the docs.
All offsets support the normalize keyword to specify whether offsets.apply, rollforward and rollback reset the time (hour, minute, etc) or not (default False, preserves time) (GH7156):
    import pandas as pd
    import pandas.tseries.offsets as offsets

    # default (normalize=False): the time of day is preserved
    day = offsets.Day()
    day.apply(pd.Timestamp("2014-01-01 09:00"))

    # normalize=True: the time of day is reset
    day = offsets.Day(normalize=True)
    day.apply(pd.Timestamp("2014-01-01 09:00"))
PeriodIndex is represented as the same format as DatetimeIndex (GH7601)
StringMethods now work on empty Series (GH7242)
The file parsers read_csv and read_table now ignore line comments provided by the parameter comment, which accepts only a single character for the C reader. In particular, they allow for comments before file data begins (GH2685)
Add NotImplementedError for simultaneous use of chunksize and nrows for read_csv() (GH6774).
Tests for basic reading of public S3 buckets now exist (GH7281).
read_html now sports an encoding argument that is passed to the underlying parser library. You can use this to read non-ascii encoded web pages (GH7323).
read_excel now supports reading from URLs in the same way that read_csv does. (GH6809)
Support for dateutil timezones, which can now be used in the same way as pytz timezones across pandas. (GH4688)
    In [3]: rng = pd.date_range(
       ...:     "3/6/2012 00:00", periods=10, freq="D", tz="dateutil/Europe/London"
       ...: )

    In [4]: rng.tz
    Out[4]: tzfile('/usr/share/zoneinfo/Europe/London')
See the docs.
Implemented sem (standard error of the mean) operation for Series, DataFrame, Panel, and Groupby (GH6897)
Add nlargest and nsmallest to the Series groupby allowlist, which means you can now use these methods on a SeriesGroupBy object (GH7053).
All offsets apply, rollforward and rollback can now handle np.datetime64, which previously resulted in ApplyTypeError (GH7452)
Period and PeriodIndex can contain NaT in their values (GH7485)
Support pickling Series, DataFrame and Panel objects with non-unique labels along item axis (index, columns and items respectively) (GH7370).
Improved inference of datetime/timedelta with mixed null objects. Regression from 0.13.1 in interpretation of an object Index with all null elements (GH7431)
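A few of the enhancements above (select_dtypes, the dropna argument and sem) can be tried on a small frame. The column names and values below are made up for the example, and the manual calculation assumes the default sample standard deviation (ddof=1); treat it as a sketch rather than a reference for the exact implementation.

```python
import math

import pandas as pd

# Hypothetical data: one text column, one integer column and one float
# column containing a missing value.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "count": [1, 2, 3, 4],
        "score": [1.0, 2.0, None, 4.0],
    }
)

# select_dtypes: keep only the floating-point columns.
floats = df.select_dtypes(include=["float64"])

# dropna=False treats the missing value as a category of its own.
counts_with_nan = df["score"].value_counts(dropna=False)
n_unique_with_nan = df["score"].nunique(dropna=False)
n_unique_without_nan = df["score"].nunique()

# sem: sample standard deviation divided by sqrt(n), ignoring missing values.
sem = df["score"].sem()
manual = df["score"].std() / math.sqrt(df["score"].count())

print(list(floats.columns), n_unique_with_nan, n_unique_without_nan)
print(sem, manual)
```

The manual value is computed only to show what sem reports; in practice calling sem() directly is enough.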
Performance¶
Improvements in dtype inference for numeric operations, yielding performance gains for dtypes: int64, timedelta64, datetime64 (GH7223)
Improvements in Series.transform for significant performance gains (GH6496)
Improvements in DataFrame.transform with ufuncs and built-in grouper functions for significant performance gains (GH7383)
Regression in groupby aggregation of datetime64 dtypes (GH7555)
Improvements in MultiIndex.from_product for large iterables (GH7627)
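For readers unfamiliar with MultiIndex.from_product, the sketch below shows what it builds from two small iterables. The level values and names are arbitrary, and the example says nothing about the timing of the improvement itself.

```python
import pandas as pd

letters = ["a", "b"]
numbers = range(3)

# The result contains one entry per combination of the input iterables,
# with the last iterable varying fastest.
mi = pd.MultiIndex.from_product([letters, numbers], names=["letter", "number"])

print(len(mi))
print(mi[0], mi[-1])
```

The number of entries is the product of the input lengths, which is why construction cost matters for large iterables.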
Experimental¶
pandas.io.data.Options has a new method, get_all_data, and now consistently returns a MultiIndexed DataFrame (GH5602)
io.gbq.read_gbq and io.gbq.to_gbq were refactored to remove the dependency on the Google bq.py command line client. This submodule now uses httplib2 and the Google apiclient and oauth2client API client libraries which should be more stable and, therefore, reliable than bq.py. See the docs. (GH6937).
Bug fixes¶
Bug in DataFrame.where with a symmetric shaped frame and a passed other of a DataFrame (GH7506)
Bug in Panel indexing with a MultiIndex axis (GH7516)
Regression in datetimelike slice indexing with a duplicated index and non-exact end-points (GH7523)
Bug in setitem with list-of-lists and single vs mixed types (GH7551)
Bug in time ops with non-aligned Series (GH7500)
Bug in timedelta inference when assigning an incomplete Series (GH7592)
Bug in groupby .nth with a Series and integer-like column name (GH7559)
Bug in Series.get with a boolean accessor (GH7407)
Bug in value_counts where NaT did not qualify as missing (NaN) (GH7423)
Bug in to_timedelta that accepted invalid units and misinterpreted ‘m/h’ (GH7611, GH6423)
Bug in line plot doesn’t set correct xlim if secondary_y=True (GH7459)
Bug in grouped hist and scatter plots use old figsize default (GH7394)
Bug in plotting subplots with DataFrame.plot, hist clears passed ax even if the number of subplots is one (GH7391).
Bug in plotting subplots with DataFrame.boxplot with by kw raises ValueError if the number of subplots exceeds 1 (GH7391).
Bug in subplots displays ticklabels and labels in different rule (GH5897)
Bug in Panel.apply with a MultiIndex as an axis (GH7469)
Bug in DatetimeIndex.insert doesn’t preserve name and tz (GH7299)
Bug in DatetimeIndex.asobject doesn’t preserve name (GH7299)
Bug in MultiIndex slicing with datetimelike ranges (strings and Timestamps), (GH7429)
Bug in Index.min and max doesn’t handle nan and NaT properly (GH7261)
Bug in PeriodIndex.min/max results in int (GH7609)
Bug in resample where fill_method was ignored if you passed how (GH2073)
Bug in TimeGrouper doesn’t exclude column specified by key (GH7227)
Bug in DataFrame and Series bar and barh plot raises TypeError when bottom and left keyword is specified (GH7226)
Bug in DataFrame.hist raises TypeError when it contains non numeric column (GH7277)
Bug in Index.delete does not preserve name and freq attributes (GH7302)
Bug in DataFrame.query()/eval where local string variables with the @ sign were being treated as temporaries attempting to be deleted (GH7300).
Bug in Float64Index which didn’t allow duplicates (GH7149).
Bug in DataFrame.replace() where truthy values were being replaced (GH7140).
Bug in StringMethods.extract() where a single match group Series would use the matcher’s name instead of the group name (GH7313).
Bug in isnull() when mode.use_inf_as_null == True where isnull wouldn’t test True when it encountered an inf/-inf (GH7315).
Bug in inferred_freq results in None for eastern hemisphere timezones (GH7310)
Bug in Easter returns incorrect date when offset is negative (GH7195)
Bug in broadcasting with .div, integer dtypes and divide-by-zero (GH7325)
Bug in CustomBusinessDay.apply raises NameError when np.datetime64 object is passed (GH7196)
Bug in MultiIndex.append, concat and pivot_table don’t preserve timezone (GH6606)
Bug in .loc with a list of indexers on a single-multi index level (that is not nested) (GH7349)
Bug in Series.map when mapping a dict with tuple keys of different lengths (GH7333)
Bug all StringMethods now work on empty Series (GH7242)
Fix delegation of read_sql to read_sql_query when query does not contain ‘select’ (GH7324).
Bug where a string column name assignment to a DataFrame with a Float64Index raised a TypeError during a call to np.isnan (GH7366).
Bug where NDFrame.replace() didn’t correctly replace objects with Period values (GH7379).
Bug in .ix getitem should always return a Series (GH7150)
Bug in MultiIndex slicing with incomplete indexers (GH7399)
Bug in MultiIndex slicing with a step in a sliced level (GH7400)
Bug where negative indexers in DatetimeIndex were not correctly sliced (GH7408)
Bug where NaT wasn’t repr’d correctly in a MultiIndex (GH7406, GH7409).
Bug where bool objects were converted to nan in convert_objects (GH7416).
Bug in quantile ignoring the axis keyword argument (GH7306)
Bug where nanops._maybe_null_out doesn’t work with complex numbers (GH7353)
Bug in several nanops functions when axis==0 for 1-dimensional nan arrays (GH7354)
Bug where nanops.nanmedian doesn’t work when axis==None (GH7352)
Bug where nanops._has_infs doesn’t work with many dtypes (GH7357)
Bug in StataReader.data where reading a 0-observation dta failed (GH7369)
Bug in StataReader when reading Stata 13 (117) files containing fixed width strings (GH7360)
Bug in StataWriter where encoding was ignored (GH7286)
Bug in DatetimeIndex comparison doesn’t handle NaT properly (GH7529)
Bug in passing input with tzinfo to some offsets apply, rollforward or rollback resets tzinfo or raises ValueError (GH7465)
Bug in DatetimeIndex.to_period, PeriodIndex.asobject, PeriodIndex.to_timestamp doesn’t preserve name (GH7485)
Bug in DatetimeIndex.to_period and PeriodIndex.to_timestamp handle NaT incorrectly (GH7228)
Bug in offsets.apply, rollforward and rollback may return normal datetime (GH7502)
Bug in resample raises ValueError when target contains NaT (GH7227)
Bug in Timestamp.tz_localize resets nanosecond info (GH7534)
Bug in DatetimeIndex.asobject raises ValueError when it contains NaT (GH7539)
Bug in Timestamp.__new__ doesn’t preserve nanosecond properly (GH7610)
Bug in Index.astype(float) where it would return an object dtype Index (GH7464).
Bug in DataFrame.reset_index loses tz (GH3950)
Bug in DatetimeIndex.freqstr raises AttributeError when freq is None (GH7606)
Bug in GroupBy.size created by TimeGrouper raises AttributeError (GH7453)
Bug in single column bar plot is misaligned (GH7498).
Bug in area plot with tz-aware time series raises ValueError (GH7471)
Bug in non-monotonic Index.union may preserve name incorrectly (GH7458)
Bug in DatetimeIndex.intersection doesn’t preserve timezone (GH4690)
Bug in rolling_var where a window larger than the array would raise an error (GH7297)
Bug with last plotted timeseries dictating xlim (GH2960)
Bug with secondary_y axis not being considered for timeseries xlim (GH3490)
Bug in Float64Index assignment with a non scalar indexer (GH7586)
Bug in pandas.core.strings.str_contains does not properly match in a case insensitive fashion when regex=False and case=False (GH7505)
Bug in expanding_cov, expanding_corr, rolling_cov, and rolling_corr for two arguments with mismatched index (GH7512)
Bug in to_sql taking the boolean column as text column (GH7678)
Bug in grouped hist doesn’t handle rot kw and sharex kw properly (GH7234)
Bug in .loc performing fallback integer indexing with object dtype indices (GH7496)
Bug (regression) in PeriodIndex constructor when passed Series objects (GH7701).
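As one concrete case from the list above, the str_contains fix (GH7505) concerns literal, case-insensitive matching. The sketch below assumes the fix is reached through the public Series.str.contains accessor rather than by calling pandas.core.strings.str_contains directly, and the sample strings are invented.

```python
import pandas as pd

s = pd.Series(["Apple", "banana", "pineAPPLE"])

# regex=False treats the pattern as a literal substring;
# case=False makes the match case insensitive.
mask = s.str.contains("APPLE", case=False, regex=False)

print(mask.tolist())
```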
Contributors¶
A total of 46 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Andrew Rosenfeld
Andy Hayden
Benjamin Adams +
Benjamin M. Gross +
Brian Quistorff +
Brian Wignall +
DSM
Daniel Waeber
David Bew +
David Stephens
Jacob Schaer
Jan Schulz
John David Reaver
John W. O’Brien
Joris Van den Bossche
Julien Danjou +
K.-Michael Aye
Kevin Sheppard
Kyle Meyer
Matt Wittmann
Matthew Brett +
Michael Mueller +
Mortada Mehyar
Phillip Cloud
Rob Levy +
Schaer, Jacob C +
Stephan Hoyer
Thomas Kluyver
Todd Jennings
Tom Augspurger
TomAugspurger
bwignall
clham
dsm054 +
helger +
immerrr
jaimefrio
jreback
lexual
onesandzeroes
rockg
sanguineturtle +
seth-p +
sinhrks
unknown
yelite +