Version 0.17.1 (November 21, 2015)
Note
We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of the development of pandas as a world-class open-source project.
This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.
Highlights include:
- Support for conditional HTML formatting; see the Conditional HTML formatting section below
- Releasing the GIL on the CSV reader and other operations; see Performance improvements below
- Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
New features
Conditional HTML formatting
Warning
This is a new feature and is under active development. We’ll be adding features and possibly making breaking changes in future releases. Feedback is welcome in GH11610.
We’ve added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on the data. The styling is accomplished with HTML and CSS. Access the styler class with the pandas.DataFrame.style attribute, which returns an instance of Styler with your data attached.
Here’s a quick example:
```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))
html = df.style.background_gradient(cmap="viridis", low=0.5)  # returns a Styler
```
We can render the HTML to get the following table.
|   | a | b | c | d | e |
|---|---|---|---|---|---|
| 0 | -1.085631 | 0.997345 | 0.282978 | -1.506295 | -0.5786 | 
| 1 | 1.651437 | -2.426679 | -0.428913 | 1.265936 | -0.86674 | 
| 2 | -0.678886 | -0.094709 | 1.49139 | -0.638902 | -0.443982 | 
| 3 | -0.434351 | 2.20593 | 2.186786 | 1.004054 | 0.386186 | 
| 4 | 0.737369 | 1.490732 | -0.935834 | 1.175829 | -1.253881 | 
| 5 | -0.637752 | 0.907105 | -1.428681 | -0.140069 | -0.861755 | 
| 6 | -0.255619 | -2.798589 | -1.771533 | -0.699877 | 0.927462 | 
| 7 | -0.173636 | 0.002846 | 0.688223 | -0.879536 | 0.283627 | 
| 8 | -0.805367 | -1.727669 | -0.3909 | 0.573806 | 0.338589 | 
| 9 | -0.01183 | 2.392365 | 0.412912 | 0.978736 | 2.238143 | 
Styler objects are rendered as styled HTML tables when displayed in the Jupyter Notebook.
See the documentation for more details.
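As a minimal sketch of element-wise conditional formatting, the function below maps a single cell value to a CSS declaration string, which Styler attaches to that cell. It assumes pandas 2.1 or later, where Styler.applymap was renamed to Styler.map, and that jinja2 is installed, since accessing df.style requires it. The helper names color_negative_red and style_negatives are hypothetical.

```python
import pandas as pd


def color_negative_red(val: float) -> str:
    """Return a CSS declaration: red text for negative values, black otherwise."""
    color = "red" if val < 0 else "black"
    return f"color: {color}"


def style_negatives(df: pd.DataFrame):
    """Hypothetical helper: apply color_negative_red to every cell.

    Styler.map (pandas >= 2.1; Styler.applymap in older releases) calls the
    function once per cell and uses the returned string as that cell's CSS.
    Accessing df.style requires jinja2 to be installed.
    """
    return df.style.map(color_negative_red)
```

In a notebook, displaying style_negatives(df) renders the styled table directly; calling .to_html() on the returned Styler (available in pandas 1.3 and later) produces the HTML string instead.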
Enhancements
- DatetimeIndex now supports conversion to strings with astype(str) (GH10442)
- Support for compression (gzip/bz2) in pandas.DataFrame.to_csv() (GH7615)
- pd.read_* functions can now also accept pathlib.Path or py._path.local.LocalPath objects for the filepath_or_buffer argument (GH11033)
- The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with tildes (e.g. ~/Documents/) (GH11438)
- DataFrame now uses the fields of a namedtuple as columns if columns are not supplied (GH11181)
- DataFrame.itertuples() now returns namedtuple objects when possible (GH11269, GH11625)
- Added axvlines_kwds to the parallel coordinates plot (GH10709)
- Added an option to .info() (memory_usage="deep") and .memory_usage() (deep=True) for deep introspection of memory consumption. Note that this can be expensive to compute and is therefore an optional parameter (GH11595)

  ```python
  import pandas as pd

  df = pd.DataFrame({"A": ["foo"] * 1000})
  df["B"] = df["A"].astype("category")

  # shows the '+' as we have object dtypes
  df.info()
  # <class 'pandas.core.frame.DataFrame'>
  # RangeIndex: 1000 entries, 0 to 999
  # Data columns (total 2 columns):
  #  #   Column  Non-Null Count  Dtype
  # ---  ------  --------------  -----
  #  0   A       1000 non-null   object
  #  1   B       1000 non-null   category
  # dtypes: category(1), object(1)
  # memory usage: 9.0+ KB

  # we have an accurate memory assessment (but it can be expensive to compute)
  df.info(memory_usage="deep")
  # <class 'pandas.core.frame.DataFrame'>
  # RangeIndex: 1000 entries, 0 to 999
  # Data columns (total 2 columns):
  #  #   Column  Non-Null Count  Dtype
  # ---  ------  --------------  -----
  #  0   A       1000 non-null   object
  #  1   B       1000 non-null   category
  # dtypes: category(1), object(1)
  # memory usage: 59.9 KB
  ```
- Index now has a fillna method (GH10089)

  ```python
  import numpy as np
  import pandas as pd
  pd.Index([1, np.nan, 3]).fillna(2)
  # Index([1.0, 2.0, 3.0], dtype='float64')
  ```
- Series of type category now make the .str.<...> and .dt.<...> accessor methods and properties available if the categories are of that type (GH10661)

  ```python
  import pandas as pd

  s = pd.Series(list("aabb")).astype("category")
  s
  # 0    a
  # 1    a
  # 2    b
  # 3    b
  # Length: 4, dtype: category
  # Categories (2, object): ['a', 'b']

  s.str.contains("a")
  # 0     True
  # 1     True
  # 2    False
  # 3    False
  # Length: 4, dtype: bool

  date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")
  date
  # 0   2015-01-01
  # 1   2015-01-02
  # 2   2015-01-03
  # 3   2015-01-04
  # 4   2015-01-05
  # Length: 5, dtype: category
  # Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05]

  date.dt.day
  # 0    1
  # 1    2
  # 2    3
  # 3    4
  # 4    5
  # Length: 5, dtype: int32
  ```
- pivot_table now has a margins_name argument so you can use something other than the default of ‘All’ (GH3335)
- Implemented export of datetime64[ns, tz] dtypes with a fixed HDF5 store (GH11411)
- Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of legacy Python syntax (set([x, y])) (GH11215)
- Improved the error message in pandas.io.gbq.to_gbq() when a streaming insert fails (GH11285) and when the DataFrame does not match the schema of the destination table (GH11359)
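The sketch below exercises several of these enhancements together, as they behave in a current pandas release: DataFrame construction from namedtuples, namedtuple rows from itertuples(), a gzip-compressed to_csv round trip, and a renamed margins row/column in pivot_table. The Point type, the file name, and the sample sales data are illustrative assumptions, and the CSV is written to a temporary directory.

```python
import os
import tempfile
from collections import namedtuple

import pandas as pd

# DataFrame infers column names from namedtuple fields when columns= is omitted.
Point = namedtuple("Point", ["x", "y"])
df = pd.DataFrame([Point(1, 2), Point(3, 4)])
print(list(df.columns))  # ['x', 'y']

# itertuples() yields namedtuples, so fields are accessible by name.
rows = list(df.itertuples(index=False))
print(rows[0].x, rows[0].y)  # 1 2

# Round-trip through a gzip-compressed CSV file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "points.csv.gz")
    df.to_csv(path, index=False, compression="gzip")
    roundtrip = pd.read_csv(path, compression="gzip")
print(roundtrip.equals(df))  # True

# margins_name replaces the default 'All' label for the margins row and column.
sales = pd.DataFrame(
    {"region": ["N", "N", "S"], "product": ["a", "b", "a"], "units": [1, 2, 3]}
)
table = pd.pivot_table(
    sales,
    values="units",
    index="region",
    columns="product",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(table.loc["Total", "Total"])  # grand total of all units
```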
API changes
- Raise NotImplementedError in Index.shift for unsupported index types (GH8038)
- min and max reductions on datetime64 and timedelta64 dtyped series now result in NaT and not nan (GH11245)
- Indexing with a null key will raise a TypeError instead of a ValueError (GH11356)
- Series.ptp will now ignore missing values by default (GH11163)
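Two of these changes are easy to observe directly. This sketch assumes pandas 2.x: reductions over all-missing datetime64 and timedelta64 data return NaT rather than a float nan, and shift on a plain integer Index raises NotImplementedError.

```python
import pandas as pd

# min/max over an all-missing datetime64 Series return NaT, not float nan.
s = pd.Series(pd.to_datetime([None, None]))
lo, hi = s.min(), s.max()
print(lo, hi)  # NaT NaT

# The same holds for timedelta64 data.
td = pd.Series(pd.to_timedelta([None, None]))
print(td.min())  # NaT

# Index.shift is only implemented for datetime-like indexes.
raised = False
try:
    pd.Index([1, 2, 3]).shift(1)
except NotImplementedError as exc:
    raised = True
    print("NotImplementedError:", exc)
```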
Deprecations
Performance improvements
- Check for monotonicity before sorting on an index (GH11080)
- Series.dropna performance improvement when its dtype can’t contain NaN (GH11159)
- Release the GIL on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (GH11263)
- Release the GIL on some rolling algorithms: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (GH11450)
- Release the GIL when reading and parsing text files in read_csv and read_table (GH11272)
- Improved performance of rolling_median (GH11450)
- Improved performance of to_excel (GH11352)
- Fixed a performance bug in the repr of Categorical categories, which rendered the strings before truncating them for display (GH11305)
- Performance improvement in Categorical.remove_unused_categories (GH11643)
- Improved performance of the Series constructor with no data and a DatetimeIndex (GH11433)
- Improved performance of shift, cumprod, and cumsum with groupby (GH4095)
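Because the C parser behind read_csv can release the GIL while parsing, several CSV documents can be parsed concurrently from a thread pool. This is a minimal sketch assuming small in-memory CSV texts generated on the fly and concurrent.futures.ThreadPoolExecutor; any speedup depends on input size, core count, and I/O, so no timing is claimed. The helpers make_csv and parse are hypothetical.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def make_csv(n_rows: int, offset: int) -> str:
    """Build a CSV document with integer columns 'a' and 'b' (b = 2 * a)."""
    lines = ["a,b"] + [f"{i + offset},{2 * (i + offset)}" for i in range(n_rows)]
    return "\n".join(lines) + "\n"


def parse(text: str) -> pd.DataFrame:
    """Parse one CSV document with the C engine."""
    return pd.read_csv(io.StringIO(text), engine="c")


texts = [make_csv(10_000, offset=k * 10_000) for k in range(4)]

# Each worker thread parses one document; pool.map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    frames = list(pool.map(parse, texts))

combined = pd.concat(frames, ignore_index=True)
print(combined.shape)  # (40000, 2)
```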
Bug fixes
- SparseArray.__iter__() no longer causes a PendingDeprecationWarning in Python 3.5 (GH11622)
- Fixed regression from 0.16.2 in the output formatting of long floats/nan (GH11302)
- Series.sort_index() now correctly handles the inplace option (GH11402)
- Fixed an incorrectly distributed .c file in the build on PyPI that caused an exception when reading a CSV of floats and passing na_values=<a scalar> (GH11374)
- Bug in .to_latex() where output was broken when the index has a name (GH10660)
- Bug in HDFStore.append with strings whose encoded length exceeded the max unencoded length (GH11234)
- Bug in merging datetime64[ns, tz] dtypes (GH11405)
- Bug in HDFStore.select when comparing with a numpy scalar in a where clause (GH11283)
- Bug in using DataFrame.ix with a MultiIndex indexer (GH11372)
- Bug in date_range with ambiguous endpoints (GH11626)
- Prevent adding new attributes to the .str, .dt and .cat accessors. Retrieving such a value was not possible, so setting one now raises an error (GH10673)
- Bug in tz-conversions with an ambiguous time and .dt accessors (GH11295)
- Bug in output formatting when using an index of ambiguous times (GH11619)
- Bug in comparisons of Series vs list-likes (GH11339)
- Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compatible to_replace (GH11326, GH11153)
- Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null (GH11206)
- Bug in list-like indexing with a mixed-integer Index (GH11320)
- Bug in pivot_table with margins=True when indexes are of Categorical dtype (GH10993)
- Bug in DataFrame.plot where hex string colors could not be used (GH10299)
- Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
- Bug in pd.eval where unary ops in a list raised an error (GH11235)
- Bug in describe() dropping column names for hierarchical indexes (GH11517)
- Bug in DataFrame.pct_change() not propagating the axis keyword to the .fillna method (GH11150)
- Bug in .to_csv() when a mix of integer and string column names is passed as the columns parameter (GH11637)
- Bug in indexing with a range (GH11652)
- Bug in inference of numpy scalars and preserving dtype when setting columns (GH11638)
- Bug in to_sql where using unicode column names raised a UnicodeEncodeError (GH11431)
- Fixed regression in setting of xticks in plot (GH11529)
- Bug in holiday.dates where observance rules could not be applied to holidays; also a documentation enhancement (GH11477, GH11533)
- Fixed plotting issues when using plain Axes instances instead of SubplotAxes (GH11520, GH11556)
- Bug in DataFrame.to_latex() producing an extra rule when header=False (GH7124)
- Bug in df.groupby(...).apply(func) when func returns a Series containing a new datetimelike column (GH11324)
- Bug in pandas.json when the file to load is large (GH11344)
- Bugs in to_excel with duplicate columns (GH11007, GH10982, GH10970)
- Fixed a bug that prevented the construction of an empty series of dtype datetime64[ns, tz] (GH11245)
- Bug in read_excel with a MultiIndex containing integers (GH11317)
- Bug in to_excel with openpyxl 2.2+ and merging (GH11408)
- Bug in DataFrame.to_dict() producing a np.datetime64 object instead of a Timestamp when only datetimes are present in the data (GH11327)
- Bug in DataFrame.corr() raising an exception when computing Kendall correlation for DataFrames with boolean and non-boolean columns (GH11560)
- Fixed a link-time error caused by C inline functions on FreeBSD 10+ (with clang) (GH10510)
- Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format (GH7791)
- Bug in DataFrame.join() with how='right' producing a TypeError (GH11519)
- Bug in Series.quantile where an empty list result had an Index with object dtype (GH11588)
- Bug in pd.merge producing an empty Int64Index rather than Index(dtype=object) when the merge result is empty (GH11588)
- Bug in Categorical.remove_unused_categories when there are NaN values (GH11599)
- Bug in DataFrame.to_sparse() losing column names for MultiIndexes (GH11600)
- Bug in DataFrame.round() with a non-unique column index producing a Fatal Python error (GH11611)
- Bug in DataFrame.round() with decimals being a non-unique indexed Series producing extra columns (GH11618)
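As a small regression check for two of the areas fixed above (drop_duplicates on integer values, GH11376, and DataFrame.round with decimals given as a Series), the sketch below asserts the expected behavior on made-up frames. It exercises current pandas behavior rather than reproducing the original bugs.

```python
import pandas as pd

# drop_duplicates on integer columns keeps the first occurrence of each row.
df = pd.DataFrame({"key": [1, 2, 1, 3, 2], "val": [10, 20, 10, 30, 20]})
deduped = df.drop_duplicates()
print(deduped.index.tolist())  # [0, 1, 3]

# DataFrame.round accepts a Series mapping column label -> number of decimals.
floats = pd.DataFrame({"x": [1.234, 5.678], "y": [9.8765, 4.3210]})
rounded = floats.round(pd.Series({"x": 1, "y": 2}))
print(rounded.columns.tolist())  # ['x', 'y']
```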
Contributors
A total of 63 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Aleksandr Drozd + 
- Alex Chase + 
- Anthonios Partheniou 
- BrenBarn + 
- Brian J. McGuirk + 
- Chris 
- Christian Berendt + 
- Christian Perez + 
- Cody Piersall + 
- Data & Code Expert Experimenting with Code on Data 
- DrIrv + 
- Evan Wright 
- Guillaume Gay 
- Hamed Saljooghinejad + 
- Iblis Lin + 
- Jake VanderPlas 
- Jan Schulz 
- Jean-Mathieu Deschenes + 
- Jeff Reback 
- Jimmy Callin + 
- Joris Van den Bossche 
- K.-Michael Aye 
- Ka Wo Chen 
- Loïc Séguin-C + 
- Luo Yicheng + 
- Magnus Jöud + 
- Manuel Leonhardt + 
- Matthew Gilbert 
- Maximilian Roos 
- Michael + 
- Nicholas Stahl + 
- Nicolas Bonnotte + 
- Pastafarianist + 
- Petra Chong + 
- Phil Schaf + 
- Philipp A + 
- Rob deCarvalho + 
- Roman Khomenko + 
- Rémy Léone + 
- Sebastian Bank + 
- Sinhrks 
- Stephan Hoyer 
- Thierry Moisan 
- Tom Augspurger 
- Tux1 + 
- Varun + 
- Wieland Hoffmann + 
- Winterflower 
- Yoav Ram + 
- Younggun Kim 
- Zeke + 
- ajcr 
- azuranski + 
- behzad nouri 
- cel4 
- emilydolson + 
- hironow + 
- lexual 
- llllllllll + 
- rockg 
- silentquasar + 
- sinhrks 
- taeold +