Note
We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of development of pandas as a world-class open-source project.
This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.
Highlights include:
Support for Conditional HTML Formatting, see here
Releasing the GIL on the csv reader & other ops, see here
Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
What’s new in v0.17.1
New features
Conditional HTML formatting
Enhancements
API changes
Deprecations
Performance improvements
Bug fixes
Contributors
Warning
This is a new feature and is under active development. We’ll be adding features and possibly making breaking changes in future releases. Feedback is welcome.
We’ve added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on the data. The styling is accomplished with HTML and CSS. Access the Styler class through the pandas.DataFrame.style attribute, which returns an instance of Styler with your data attached.
Here’s a quick example:
In [1]: np.random.seed(123)

In [2]: df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))

In [3]: html = df.style.background_gradient(cmap="viridis", low=0.5)
We can render the HTML to get the following table.
Styler interacts nicely with the Jupyter Notebook. See the documentation for more.
DatetimeIndex now supports conversion to strings with astype(str) (GH10442)
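As a rough illustration of the string conversion, the sketch below builds a small DatetimeIndex and casts it with astype(str). The variable names and dates are only illustrative, and the exact text format of each element may depend on the pandas version installed.

```python
import pandas as pd

# Three consecutive daily timestamps (dates chosen arbitrarily for illustration).
dates = pd.date_range("2015-01-01", periods=3)

# Cast each timestamp to its string representation.
as_text = dates.astype(str)

print(as_text)
```

The result is an index of plain strings, which can be convenient when building labels or file names from dates.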
Support for compression (gzip/bz2) in pandas.DataFrame.to_csv() (GH7615)
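A minimal sketch of writing a compressed CSV might look like the following. The frame contents, the temporary directory, and the file name are assumptions made for the example. It reads the file back only to show that the round trip works.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output location inside a throwaway directory.
    path = os.path.join(tmp, "frame.csv.gz")

    # Write the frame as a gzip-compressed CSV file.
    df.to_csv(path, index=False, compression="gzip")

    # Read it back with the matching compression setting.
    roundtrip = pd.read_csv(path, compression="gzip")

print(roundtrip)
```

Passing compression="bz2" should work the same way for bz2 output.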
pd.read_* functions can now also accept pathlib.Path or py._path.local.LocalPath objects for the filepath_or_buffer argument (GH11033)
The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with tildes (e.g. ~/Documents/) (GH11438)
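To illustrate the path-object support for the pd.read_* functions, here is a small sketch that passes a pathlib.Path straight to read_csv. The file name and its contents are made up for the example, and a temporary directory is used so nothing is left behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical CSV file created only for this demonstration.
    csv_path = Path(tmp) / "example.csv"
    csv_path.write_text("x,y\n1,2\n3,4\n")

    # The Path object is passed directly, without converting it to str.
    loaded = pd.read_csv(csv_path)

print(loaded)
```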
DataFrame now uses the fields of a namedtuple as columns, if columns are not supplied (GH11181)
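The following sketch shows the idea with a hypothetical Point namedtuple: no columns argument is given, so the field names are expected to become the column labels.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical record type used only for illustration.
Point = namedtuple("Point", ["x", "y"])

# No columns argument is supplied, so the namedtuple fields are used.
df_points = pd.DataFrame([Point(1, 2), Point(3, 4)])

print(df_points.columns.tolist())
```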
DataFrame.itertuples() now returns namedtuple objects, when possible. (GH11269, GH11625)
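As a short sketch of what this makes possible, the rows yielded by itertuples() can be read by attribute name rather than by position. The frame below is invented for the example.

```python
import pandas as pd

df_rows = pd.DataFrame({"a": [1, 2], "b": ["u", "v"]})

# Each row is a namedtuple whose first field is the index label.
rows = list(df_rows.itertuples())
first = rows[0]

# Fields can be accessed by column name.
print(first.a, first.b)
```

Attribute access only works when the column names are valid Python identifiers, which is presumably what "when possible" refers to.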
Added axvlines_kwds to parallel coordinates plot (GH10709)
Option to .info() and .memory_usage() to provide for deep introspection of memory consumption. Note that this can be expensive to compute and therefore is an optional parameter. (GH11595)
In [4]: df = pd.DataFrame({"A": ["foo"] * 1000})

In [5]: df["B"] = df["A"].astype("category")

# shows the '+' as we have object dtypes
In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1000 non-null   object
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 9.0+ KB

# we have an accurate memory assessment (but can be expensive to compute this)
In [7]: df.info(memory_usage="deep")
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1000 non-null   object
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 59.9 KB
Index now has a fillna method (GH10089)
In [8]: pd.Index([1, np.nan, 3]).fillna(2)
Out[8]: Float64Index([1.0, 2.0, 3.0], dtype='float64')
Series of type category now make .str.<...> and .dt.<...> accessor methods / properties available, if the categories are of that type. (GH10661)
In [9]: s = pd.Series(list("aabb")).astype("category")

In [10]: s
Out[10]:
0    a
1    a
2    b
3    b
Length: 4, dtype: category
Categories (2, object): ['a', 'b']

In [11]: s.str.contains("a")
Out[11]:
0     True
1     True
2    False
3    False
Length: 4, dtype: bool

In [12]: date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")

In [13]: date
Out[13]:
0   2015-01-01
1   2015-01-02
2   2015-01-03
3   2015-01-04
4   2015-01-05
Length: 5, dtype: category
Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05]

In [14]: date.dt.day
Out[14]:
0    1
1    2
2    3
3    4
4    5
Length: 5, dtype: int64
pivot_table now has a margins_name argument so you can use something other than the default of ‘All’ (GH3335)
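A small sketch of the new argument follows. The sales data and the "Total" label are invented for the example. The margins row and column are expected to carry that label instead of the default.

```python
import pandas as pd

# Hypothetical sales data used only for illustration.
sales = pd.DataFrame(
    {
        "region": ["N", "N", "S", "S"],
        "product": ["a", "b", "a", "b"],
        "amount": [1, 2, 3, 4],
    }
)

# margins=True adds subtotals; margins_name sets their label.
table = pd.pivot_table(
    sales,
    values="amount",
    index="region",
    columns="product",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)

print(table)
```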
Implement export of datetime64[ns, tz] dtypes with a fixed HDF5 store (GH11411)
Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of legacy Python syntax (set([x, y])) (GH11215)
Improve the error message in pandas.io.gbq.to_gbq() when a streaming insert fails (GH11285) and when the DataFrame does not match the schema of the destination table (GH11359)
Raise NotImplementedError in Index.shift for non-supported index types (GH8038)
min and max reductions on datetime64 and timedelta64 dtyped series now result in NaT and not nan (GH11245).
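The change can be sketched with two all-missing series, one per dtype. The series are constructed only for illustration. The reductions are expected to return the datetime-like missing value NaT rather than a float nan.

```python
import pandas as pd

# A datetime64 series holding only missing values.
stamps = pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]")

# A timedelta64 series holding only missing values.
deltas = pd.Series([pd.NaT, pd.NaT], dtype="timedelta64[ns]")

earliest = stamps.min()
longest = deltas.max()

print(type(earliest), type(longest))
```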
Indexing with a null key will raise a TypeError, instead of a ValueError (GH11356)
Series.ptp will now ignore missing values by default (GH11163)
The pandas.io.ga module which implements google-analytics support is deprecated and will be removed in a future version (GH11308)
Deprecate the engine keyword in .to_csv(), which will be removed in a future version (GH11274)
Checking monotonic-ness before sorting on an index (GH11080)
Series.dropna performance improvement when its dtype can’t contain NaN (GH11159)
Release the GIL on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (GH11263)
Release the GIL on some rolling algos: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (GH11450)
Release the GIL when reading and parsing text files in read_csv, read_table (GH11272)
Improved performance of rolling_median (GH11450)
Improved performance of to_excel (GH11352)
Performance bug in repr of Categorical categories, which was rendering the strings before chopping them for display (GH11305)
Performance improvement in Categorical.remove_unused_categories (GH11643)
Improved performance of Series constructor with no data and DatetimeIndex (GH11433)
Improved performance of shift, cumprod, and cumsum with groupby (GH4095)
SparseArray.__iter__() now does not cause PendingDeprecationWarning in Python 3.5 (GH11622)
Regression from 0.16.2 for output formatting of long floats/nan, restored in (GH11302)
Series.sort_index() now correctly handles the inplace option (GH11402)
Fixed an incorrectly distributed .c file in the build on PyPI, which caused an exception when reading a csv of floats and passing na_values=<a scalar> (GH11374)
Bug in .to_latex() output broken when the index has a name (GH10660)
Bug in HDFStore.append with strings whose encoded length exceeded the max unencoded length (GH11234)
Bug in merging datetime64[ns, tz] dtypes (GH11405)
Bug in HDFStore.select when comparing with a numpy scalar in a where clause (GH11283)
Bug in using DataFrame.ix with a MultiIndex indexer (GH11372)
Bug in date_range with ambiguous endpoints (GH11626)
Prevent adding new attributes to the accessors .str, .dt and .cat. Retrieving such a value was not possible, so error out on setting it. (GH10673)
Bug in tz-conversions with an ambiguous time and .dt accessors (GH11295)
Bug in output formatting when using an index of ambiguous times (GH11619)
Bug in comparisons of Series vs list-likes (GH11339)
Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compat to_replace (GH11326, GH11153)
Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null (GH11206)
Bug in list-like indexing with a mixed-integer Index (GH11320)
Bug in pivot_table with margins=True when indexes are of Categorical dtype (GH10993)
Bug in DataFrame.plot where hex string colors could not be used (GH10299)
Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
Bug in pd.eval where unary ops in a list error (GH11235)
Bug in squeeze() with zero length arrays (GH11230, GH8999)
Bug in describe() dropping column names for hierarchical indexes (GH11517)
Bug in DataFrame.pct_change() not propagating axis keyword on .fillna method (GH11150)
Bug in .to_csv() when a mix of integer and string column names are passed as the columns parameter (GH11637)
Bug in indexing with a range (GH11652)
Bug in inference of numpy scalars and preserving dtype when setting columns (GH11638)
Bug in to_sql using unicode column names giving UnicodeEncodeError (GH11431)
Fix regression in setting of xticks in plot (GH11529).
Bug in holiday.dates where observance rules could not be applied to holiday and doc enhancement (GH11477, GH11533)
Fix plotting issues when having plain Axes instances instead of SubplotAxes (GH11520, GH11556).
Bug in DataFrame.to_latex() producing an extra rule when header=False (GH7124)
Bug in df.groupby(...).apply(func) when a func returns a Series containing a new datetimelike column (GH11324)
Bug in pandas.json when file to load is big (GH11344)
Bugs in to_excel with duplicate columns (GH11007, GH10982, GH10970)
Fixed a bug that prevented the construction of an empty series of dtype datetime64[ns, tz] (GH11245).
Bug in read_excel with MultiIndex containing integers (GH11317)
Bug in to_excel with openpyxl 2.2+ and merging (GH11408)
Bug in DataFrame.to_dict() producing a np.datetime64 object instead of Timestamp when only datetime is present in data (GH11327)
Bug in DataFrame.corr() raising an exception when computing the Kendall correlation for DataFrames with boolean and non-boolean columns (GH11560)
Fixed a link-time error caused by C inline functions on FreeBSD 10+ (with clang) (GH10510)
Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format (GH7791)
Bug in DataFrame.join() with how='right' producing a TypeError (GH11519)
Bug in Series.quantile where an empty list result had an Index with object dtype (GH11588)
Bug in pd.merge results in empty Int64Index rather than Index(dtype=object) when the merge result is empty (GH11588)
Bug in Categorical.remove_unused_categories when having NaN values (GH11599)
Bug in DataFrame.to_sparse() losing column names for MultiIndexes (GH11600)
Bug in DataFrame.round() with non-unique column index producing a Fatal Python error (GH11611)
Bug in DataFrame.round() with decimals being a non-unique indexed Series producing extra columns (GH11618)
A total of 63 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Aleksandr Drozd +
Alex Chase +
Anthonios Partheniou
BrenBarn +
Brian J. McGuirk +
Chris
Christian Berendt +
Christian Perez +
Cody Piersall +
Data & Code Expert Experimenting with Code on Data
DrIrv +
Evan Wright
Guillaume Gay
Hamed Saljooghinejad +
Iblis Lin +
Jake VanderPlas
Jan Schulz
Jean-Mathieu Deschenes +
Jeff Reback
Jimmy Callin +
Joris Van den Bossche
K.-Michael Aye
Ka Wo Chen
Loïc Séguin-C +
Luo Yicheng +
Magnus Jöud +
Manuel Leonhardt +
Matthew Gilbert
Maximilian Roos
Michael +
Nicholas Stahl +
Nicolas Bonnotte +
Pastafarianist +
Petra Chong +
Phil Schaf +
Philipp A +
Rob deCarvalho +
Roman Khomenko +
Rémy Léone +
Sebastian Bank +
Sinhrks
Stephan Hoyer
Thierry Moisan
Tom Augspurger
Tux1 +
Varun +
Wieland Hoffmann +
Winterflower
Yoav Ram +
Younggun Kim
Zeke +
ajcr
azuranski +
behzad nouri
cel4
emilydolson +
hironow +
lexual
llllllllll +
rockg
silentquasar +
sinhrks
taeold +