Version 0.14.1 (July 11, 2014)
This is a minor release from 0.14.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- New methods select_dtypes() to select columns based on the dtype and sem() to calculate the standard error of the mean.
- Support for dateutil timezones (see docs). 
- Support for ignoring full line comments in the read_csv() text parser.
- New documentation section on Options and Settings. 
- Lots of bug fixes. 
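
As a rough illustration of the two new methods highlighted above, the sketch below selects columns by dtype and computes a standard error of the mean. The frame and its column names are hypothetical, and the code assumes a reasonably recent pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame(
    {
        "count": [1, 2, 3, 4],
        "price": [1.5, 2.5, 3.5, 4.5],
        "label": ["a", "b", "c", "d"],
        "flag": [True, False, True, False],
    }
)

# Keep only the numeric (integer and float) columns.
numeric = df.select_dtypes(include=[np.number])
print(list(numeric.columns))

# Keep everything except the numeric columns.
non_numeric = df.select_dtypes(exclude=[np.number])
print(list(non_numeric.columns))

# Standard error of the mean: sample standard deviation divided by sqrt(n).
sem_price = df["price"].sem()
manual = df["price"].std(ddof=1) / np.sqrt(len(df))
print(sem_price, manual)
```

The manual computation is included only to show what sem() is expected to return with its default settings.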
 
API changes
- Openpyxl now raises a ValueError on construction of the openpyxl writer instead of warning on pandas import (GH7284). 
- For StringMethods.extract, when no match is found, the result, containing only NaN values, now also has dtype=object instead of float (GH7242)
- Period objects no longer raise a TypeError when compared using == with another object that isn’t a Period. Instead, comparing a Period with a non-Period object using == returns False (GH7376)
- Previously, whether the offsets.apply, rollforward and rollback operations reset the time differed between offsets. With support for the normalize keyword in all offsets (see below), which defaults to False (preserve time), the behaviour changed for certain offsets (BusinessMonthBegin, MonthEnd, BusinessMonthEnd, CustomBusinessMonthEnd, BusinessYearBegin, LastWeekOfMonth, FY5253Quarter, Easter):

      In [6]: from pandas.tseries import offsets

      In [7]: d = pd.Timestamp('2014-01-01 09:00')

      # old behaviour < 0.14.1
      In [8]: d + offsets.MonthEnd()
      Out[8]: Timestamp('2014-01-31 00:00:00')

  Starting from 0.14.1 all offsets preserve time by default. The old behaviour can be obtained with normalize=True:

      # new behaviour
      In [1]: d + offsets.MonthEnd()
      Out[1]: Timestamp('2014-01-31 09:00:00')

      In [2]: d + offsets.MonthEnd(normalize=True)
      Out[2]: Timestamp('2014-01-31 00:00:00')

  Note that for the other offsets the default behaviour did not change.
- Add back #N/A N/A as a default NA value in text parsing (regression from 0.12) (GH5521)
- Raise a TypeError on inplace-setting with a .where and a non-np.nan value, as this is inconsistent with a set-item expression like df[mask] = None (GH7656)
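
The Period comparison change listed above can be checked with a short sketch. The variable names are illustrative, and the behaviour shown is what a recent pandas is expected to do.

```python
import pandas as pd

p = pd.Period("2014-01", freq="M")

# Comparing two equal Period objects still works as before.
same_period = p == pd.Period("2014-01", freq="M")

# Comparing with a non-Period object returns False instead of raising TypeError.
versus_string = p == "2014-01"
versus_int = p == 5

print(same_period, versus_string, versus_int)
```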
Enhancements
- Add dropna argument to value_counts and nunique (GH5569).
- Add select_dtypes() method to allow selection of columns based on dtype (GH7316). See the docs.
- All offsets support the normalize keyword to specify whether offsets.apply, rollforward and rollback reset the time (hour, minute, etc.) or not (default False, preserves time) (GH7156):

      import pandas as pd
      import pandas.tseries.offsets as offsets

      day = offsets.Day()
      day.apply(pd.Timestamp("2014-01-01 09:00"))

      day = offsets.Day(normalize=True)
      day.apply(pd.Timestamp("2014-01-01 09:00"))

- PeriodIndex is now represented in the same format as DatetimeIndex (GH7601)
- StringMethods now work on empty Series (GH7242)
- The file parsers read_csv and read_table now ignore line comments provided by the parameter comment, which accepts only a single character for the C reader. In particular, they allow for comments before file data begins (GH2685)
- Raise NotImplementedError on simultaneous use of chunksize and nrows in read_csv() (GH6774).
- Tests for basic reading of public S3 buckets now exist (GH7281). 
- read_html now supports an encoding argument that is passed to the underlying parser library. You can use this to read non-ASCII encoded web pages (GH7323).
- read_excel now supports reading from URLs in the same way that read_csv does (GH6809)
- Support for dateutil timezones, which can now be used in the same way as pytz timezones across pandas (GH4688).

      In [3]: rng = pd.date_range(
         ...:     "3/6/2012 00:00", periods=10, freq="D", tz="dateutil/Europe/London"
         ...: )

      In [4]: rng.tz
      Out[4]: tzfile('/usr/share/zoneinfo/Europe/London')

  See the docs.
- Implemented sem (standard error of the mean) operation for Series, DataFrame, Panel, and Groupby (GH6897)
- Add nlargest and nsmallest to the Series groupby allowlist, which means you can now use these methods on a SeriesGroupBy object (GH7053).
- All offsets’ apply, rollforward and rollback can now handle np.datetime64; previously this resulted in ApplyTypeError (GH7452)
- Period and PeriodIndex can contain NaT in their values (GH7485)
- Support pickling Series, DataFrame and Panel objects with non-unique labels along item axis (index, columns and items respectively) (GH7370).
- Improved inference of datetime/timedelta with mixed null objects. Regression from 0.13.1 in interpretation of an object Index with all null elements (GH7431) 
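
A few of the enhancements above can be sketched together: comment handling in read_csv, the dropna argument, and nlargest on a SeriesGroupBy. The input text, frame, and column names are made up for illustration, and the code assumes a recent pandas rather than 0.14.1 specifically.

```python
import io

import numpy as np
import pandas as pd

# Full-line comments are skipped, including one that precedes the header row.
raw = "# exported by a hypothetical tool\na,b\n1,2\n# a comment between rows\n3,4\n"
frame = pd.read_csv(io.StringIO(raw), comment="#")
print(frame)

# dropna=False counts missing values instead of discarding them.
s = pd.Series([1.0, 1.0, 2.0, np.nan])
counts = s.value_counts(dropna=False)
n_with_nan = s.nunique(dropna=False)
n_without_nan = s.nunique()
print(counts, n_with_nan, n_without_nan)

# nlargest on a SeriesGroupBy: the two largest values within each group.
scores = pd.DataFrame(
    {"key": ["a", "a", "a", "b", "b"], "val": [1, 5, 3, 2, 4]}
)
top2 = scores.groupby("key")["val"].nlargest(2)
print(top2)
```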
Performance
- Improvements in dtype inference for numeric operations, yielding performance gains for dtypes int64, timedelta64 and datetime64 (GH7223)
- Improvements in Series.transform for significant performance gains (GH6496) 
- Improvements in DataFrame.transform with ufuncs and built-in grouper functions for significant performance gains (GH7383) 
- Regression in groupby aggregation of datetime64 dtypes (GH7555) 
- Improvements in MultiIndex.from_product for large iterables (GH7627)
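
For readers unfamiliar with MultiIndex.from_product, mentioned in the last item, a minimal usage sketch follows. The level names are illustrative, and the example says nothing about the size of the performance gain.

```python
import pandas as pd

# Build a MultiIndex from the Cartesian product of two iterables.
mi = pd.MultiIndex.from_product(
    [range(3), ["x", "y"]], names=["number", "letter"]
)
print(mi)
```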
Experimental
- pandas.io.data.Options has a new method, get_all_data, and now consistently returns a MultiIndexed DataFrame (GH5602)
- io.gbq.read_gbq and io.gbq.to_gbq were refactored to remove the dependency on the Google bq.py command line client. This submodule now uses httplib2 and the Google apiclient and oauth2client API client libraries, which should be more stable and, therefore, more reliable than bq.py. See the docs. (GH6937).
Bug fixes
- Bug in DataFrame.where with a symmetric shaped frame and a passed other of a DataFrame (GH7506)
- Bug in Panel indexing with a MultiIndex axis (GH7516) 
- Regression in datetimelike slice indexing with a duplicated index and non-exact end-points (GH7523) 
- Bug in setitem with list-of-lists and single vs mixed types (GH7551)
- Bug in time ops with non-aligned Series (GH7500) 
- Bug in timedelta inference when assigning an incomplete Series (GH7592) 
- Bug in groupby .nth with a Series and integer-like column name (GH7559)
- Bug in Series.get with a boolean accessor (GH7407)
- Bug in value_counts where NaT did not qualify as missing (NaN) (GH7423)
- Bug in to_timedelta that accepted invalid units and misinterpreted ‘m/h’ (GH7611, GH6423)
- Bug where a line plot doesn’t set the correct xlim if secondary_y=True (GH7459)
- Bug where grouped hist and scatter plots use the old figsize default (GH7394)
- Bug where plotting subplots with DataFrame.plot, hist clears the passed ax even if the number of subplots is one (GH7391).
- Bug where plotting subplots with DataFrame.boxplot with the by kw raises ValueError if the number of subplots exceeds 1 (GH7391).
- Bug where subplots display ticklabels and labels according to different rules (GH5897)
- Bug in Panel.apply with a MultiIndex as an axis (GH7469)
- Bug where DatetimeIndex.insert doesn’t preserve name and tz (GH7299)
- Bug where DatetimeIndex.asobject doesn’t preserve name (GH7299)
- Bug in MultiIndex slicing with datetimelike ranges (strings and Timestamps), (GH7429) 
- Bug where Index.min and max don’t handle nan and NaT properly (GH7261)
- Bug where PeriodIndex.min/max results in int (GH7609)
- Bug in resample where fill_method was ignored if you passed how (GH2073)
- Bug where TimeGrouper doesn’t exclude the column specified by key (GH7227)
- Bug where DataFrame and Series bar and barh plots raise TypeError when the bottom and left keywords are specified (GH7226)
- Bug where DataFrame.hist raises TypeError when it contains a non-numeric column (GH7277)
- Bug where Index.delete does not preserve name and freq attributes (GH7302)
- Bug in DataFrame.query()/eval where local string variables with the @ sign were being treated as temporaries attempting to be deleted (GH7300).
- Bug in Float64Index which didn’t allow duplicates (GH7149).
- Bug in DataFrame.replace() where truthy values were being replaced (GH7140).
- Bug in StringMethods.extract() where a single match group Series would use the matcher’s name instead of the group name (GH7313).
- Bug in isnull() when mode.use_inf_as_null == True where isnull wouldn’t test True when it encountered an inf/-inf (GH7315).
- Bug in inferred_freq results in None for eastern hemisphere timezones (GH7310) 
- Bug where Easter returns an incorrect date when the offset is negative (GH7195)
- Bug in broadcasting with .div, integer dtypes and divide-by-zero (GH7325)
- Bug where CustomBusinessDay.apply raises NameError when an np.datetime64 object is passed (GH7196)
- Bug where MultiIndex.append, concat and pivot_table don’t preserve timezone (GH6606)
- Bug in .loc with a list of indexers on a single-multi index level (that is not nested) (GH7349)
- Bug in Series.map when mapping a dict with tuple keys of different lengths (GH7333)
- Bug where StringMethods did not work on empty Series (GH7242)
- Fix delegation of read_sql to read_sql_query when query does not contain ‘select’ (GH7324).
- Bug where a string column name assignment to a DataFrame with a Float64Index raised a TypeError during a call to np.isnan (GH7366).
- Bug where NDFrame.replace() didn’t correctly replace objects with Period values (GH7379).
- Bug in .ix getitem, which should always return a Series (GH7150)
- Bug in MultiIndex slicing with incomplete indexers (GH7399) 
- Bug in MultiIndex slicing with a step in a sliced level (GH7400) 
- Bug where negative indexers in DatetimeIndex were not correctly sliced (GH7408)
- Bug where NaT wasn’t repr’d correctly in a MultiIndex (GH7406, GH7409).
- Bug where bool objects were converted to nan in convert_objects (GH7416).
- Bug in quantile ignoring the axis keyword argument (GH7306)
- Bug where nanops._maybe_null_out doesn’t work with complex numbers (GH7353)
- Bug in several nanops functions when axis==0 for 1-dimensional nan arrays (GH7354)
- Bug where nanops.nanmedian doesn’t work when axis==None (GH7352)
- Bug where nanops._has_infs doesn’t work with many dtypes (GH7357)
- Bug in StataReader.data where reading a 0-observation dta failed (GH7369)
- Bug in StataReader when reading Stata 13 (117) files containing fixed width strings (GH7360)
- Bug in StataWriter where encoding was ignored (GH7286)
- Bug where DatetimeIndex comparison doesn’t handle NaT properly (GH7529)
- Bug where passing input with tzinfo to some offsets’ apply, rollforward or rollback resets tzinfo or raises ValueError (GH7465)
- Bug where DatetimeIndex.to_period, PeriodIndex.asobject and PeriodIndex.to_timestamp don’t preserve name (GH7485)
- Bug where DatetimeIndex.to_period and PeriodIndex.to_timestamp handle NaT incorrectly (GH7228)
- Bug where offsets.apply, rollforward and rollback may return a normal datetime (GH7502)
- Bug where resample raises ValueError when the target contains NaT (GH7227)
- Bug where Timestamp.tz_localize resets nanosecond info (GH7534)
- Bug where DatetimeIndex.asobject raises ValueError when it contains NaT (GH7539)
- Bug where Timestamp.__new__ doesn’t preserve nanosecond properly (GH7610)
- Bug in Index.astype(float) where it would return an object dtype Index (GH7464).
- Bug where DataFrame.reset_index loses tz (GH3950)
- Bug where DatetimeIndex.freqstr raises AttributeError when freq is None (GH7606)
- Bug where GroupBy.size created by TimeGrouper raises AttributeError (GH7453)
- Bug in single column bar plot is misaligned (GH7498). 
- Bug where an area plot with tz-aware time series raises ValueError (GH7471)
- Bug where non-monotonic Index.union may preserve name incorrectly (GH7458)
- Bug where DatetimeIndex.intersection doesn’t preserve timezone (GH4690)
- Bug in rolling_var where a window larger than the array would raise an error (GH7297)
- Bug with last plotted timeseries dictating xlim (GH2960)
- Bug with secondary_y axis not being considered for timeseries xlim (GH3490)
- Bug in Float64Index assignment with a non scalar indexer (GH7586)
- Bug where pandas.core.strings.str_contains does not properly match in a case insensitive fashion when regex=False and case=False (GH7505)
- Bug in expanding_cov, expanding_corr, rolling_cov, and rolling_corr for two arguments with mismatched index (GH7512)
- Bug where to_sql treats a boolean column as a text column (GH7678)
- Bug where grouped hist doesn’t handle the rot kw and sharex kw properly (GH7234)
- Bug in .loc performing fallback integer indexing with object dtype indices (GH7496)
- Bug (regression) in PeriodIndex constructor when passed Series objects (GH7701).
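
The corrected behaviour for a few of the fixes above (GH7505, GH7302, GH7299) can be spot-checked with a sketch like the following. The data and names are invented for illustration, the public str.contains accessor is used in place of the internal str_contains function, and the results shown are those expected from a recent pandas.

```python
import pandas as pd

# GH7505: literal (regex=False), case-insensitive (case=False) matching.
words = pd.Series(["Foo", "bar", "FOOD"])
matches = words.str.contains("foo", regex=False, case=False)
print(matches.tolist())

# GH7302: Index.delete keeps the index name.
idx = pd.Index([10, 20, 30], name="ids")
shorter = idx.delete(0)
print(shorter)

# GH7299: DatetimeIndex.insert keeps both name and timezone.
dti = pd.date_range("2014-01-01", periods=2, freq="D", tz="UTC", name="when")
longer = dti.insert(1, pd.Timestamp("2014-01-05", tz="UTC"))
print(longer)
```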
Contributors
A total of 46 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Andrew Rosenfeld 
- Andy Hayden 
- Benjamin Adams + 
- Benjamin M. Gross + 
- Brian Quistorff + 
- Brian Wignall + 
- DSM 
- Daniel Waeber 
- David Bew + 
- David Stephens 
- Jacob Schaer 
- Jan Schulz 
- John David Reaver 
- John W. O’Brien 
- Joris Van den Bossche 
- Julien Danjou + 
- K.-Michael Aye 
- Kevin Sheppard 
- Kyle Meyer 
- Matt Wittmann 
- Matthew Brett + 
- Michael Mueller + 
- Mortada Mehyar 
- Phillip Cloud 
- Rob Levy + 
- Schaer, Jacob C + 
- Stephan Hoyer 
- Thomas Kluyver 
- Todd Jennings 
- Tom Augspurger 
- TomAugspurger 
- bwignall 
- clham 
- dsm054 + 
- helger + 
- immerrr 
- jaimefrio 
- jreback 
- lexual 
- onesandzeroes 
- rockg 
- sanguineturtle + 
- seth-p + 
- sinhrks 
- unknown 
- yelite +