v0.17.1 (November 21, 2015)
Note
We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of development of pandas as a world-class open-source project.
This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.
Highlights include:
- Support for Conditional HTML Formatting, see the Conditional HTML formatting section below
- Releasing the GIL on the csv reader & other ops, see the Performance improvements section below
- Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
New features
Conditional HTML formatting
Warning
This is a new feature and is under active development. We’ll be adding features and possibly making breaking changes in future releases. Feedback is welcome.
We’ve added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on the data. The styling is accomplished with HTML and CSS. Access the styler class with the pandas.DataFrame.style attribute, an instance of Styler with your data attached.
Here’s a quick example:
In [1]: np.random.seed(123)

In [2]: df = pd.DataFrame(np.random.randn(10, 5), columns=list('abcde'))

In [3]: html = df.style.background_gradient(cmap='viridis', low=.5)
We can render the HTML to get the following table.
 | a | b | c | d | e |
---|---|---|---|---|---|
0 | -1.085631 | 0.997345 | 0.282978 | -1.506295 | -0.5786 |
1 | 1.651437 | -2.426679 | -0.428913 | 1.265936 | -0.86674 |
2 | -0.678886 | -0.094709 | 1.49139 | -0.638902 | -0.443982 |
3 | -0.434351 | 2.20593 | 2.186786 | 1.004054 | 0.386186 |
4 | 0.737369 | 1.490732 | -0.935834 | 1.175829 | -1.253881 |
5 | -0.637752 | 0.907105 | -1.428681 | -0.140069 | -0.861755 |
6 | -0.255619 | -2.798589 | -1.771533 | -0.699877 | 0.927462 |
7 | -0.173636 | 0.002846 | 0.688223 | -0.879536 | 0.283627 |
8 | -0.805367 | -1.727669 | -0.3909 | 0.573806 | 0.338589 |
9 | -0.01183 | 2.392365 | 0.412912 | 0.978736 | 2.238143 |
Styler interacts nicely with the Jupyter Notebook. See the documentation for more.
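As a rough sketch of how data-dependent styling can be written, the snippet below defines a per-column rule and attaches it through the style attribute. The names highlight_negative and styled_html are hypothetical helpers introduced here for illustration. The sketch also assumes a recent pandas (Styler.apply and Styler.to_html) with the optional jinja2 dependency installed, which may differ from the API available in 0.17.1.

```python
import pandas as pd


def highlight_negative(column):
    """Return one CSS declaration per cell: red text for negative values."""
    return ["color: red" if value < 0 else "" for value in column]


def styled_html(frame):
    """Attach the rule column by column and render the result as HTML.

    Assumption: pandas >= 1.3 (Styler.to_html) and jinja2 are available.
    """
    return frame.style.apply(highlight_negative, axis=0).to_html()


# Small illustrative frame (not the data from the example above).
df = pd.DataFrame({"a": [-1.0, 2.0], "b": [3.0, -4.0]})

# The rule itself is plain Python and can be inspected without rendering.
print(highlight_negative(df["a"]))  # ['color: red', '']
```

Calling styled_html(df) should then return an HTML string in which the negative cells carry the red text rule.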
Enhancements
- DatetimeIndex now supports conversion to strings with astype(str) (GH10442)
- Support for compression (gzip/bz2) in pandas.DataFrame.to_csv() (GH7615)
- pd.read_* functions can now also accept pathlib.Path or py._path.local.LocalPath objects for the filepath_or_buffer argument. (GH11033)
- The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with tildes (e.g. ~/Documents/) (GH11438)
- DataFrame now uses the fields of a namedtuple as columns, if columns are not supplied (GH11181)
- DataFrame.itertuples() now returns namedtuple objects, when possible. (GH11269, GH11625)
- Added axvlines_kwds to parallel coordinates plot (GH10709)
- Option to .info() and .memory_usage() to provide for deep introspection of memory consumption. Note that this can be expensive to compute and therefore is an optional parameter. (GH11595)

    In [4]: df = pd.DataFrame({'A': ['foo'] * 1000})

    In [5]: df['B'] = df['A'].astype('category')

    # shows the '+' as we have object dtypes
    In [6]: df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1000 entries, 0 to 999
    Data columns (total 2 columns):
    A    1000 non-null object
    B    1000 non-null category
    dtypes: category(1), object(1)
    memory usage: 9.0+ KB

    # we have an accurate memory assessment (but can be expensive to compute this)
    In [7]: df.info(memory_usage='deep')
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1000 entries, 0 to 999
    Data columns (total 2 columns):
    A    1000 non-null object
    B    1000 non-null category
    dtypes: category(1), object(1)
    memory usage: 75.5 KB
- Index now has a fillna method (GH10089)

    In [8]: pd.Index([1, np.nan, 3]).fillna(2)
    Out[8]: Float64Index([1.0, 2.0, 3.0], dtype='float64')
- Series of type category now make .str.<...> and .dt.<...> accessor methods / properties available, if the categories are of that type. (GH10661)

    In [9]: s = pd.Series(list('aabb')).astype('category')

    In [10]: s
    Out[10]:
    0    a
    1    a
    2    b
    3    b
    Length: 4, dtype: category
    Categories (2, object): [a, b]

    In [11]: s.str.contains("a")
    Out[11]:
    0     True
    1     True
    2    False
    3    False
    Length: 4, dtype: bool

    In [12]: date = pd.Series(pd.date_range('1/1/2015', periods=5)).astype('category')

    In [13]: date
    Out[13]:
    0   2015-01-01
    1   2015-01-02
    2   2015-01-03
    3   2015-01-04
    4   2015-01-05
    Length: 5, dtype: category
    Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05]

    In [14]: date.dt.day
    Out[14]:
    0    1
    1    2
    2    3
    3    4
    4    5
    Length: 5, dtype: int64
- pivot_table now has a margins_name argument so you can use something other than the default of ‘All’ (GH3335)
- Implement export of datetime64[ns, tz] dtypes with a fixed HDF5 store (GH11411)
- Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of Legacy Python syntax (set([x, y])) (GH11215)
- Improve the error message in pandas.io.gbq.to_gbq() when a streaming insert fails (GH11285) and when the DataFrame does not match the schema of the destination table (GH11359)
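A few of the enhancements above can be tried together in a short script. This is only an illustrative sketch: the Point record type and the sales data are made up for the example, and the behavior shown is that of a recent pandas release, which is assumed to match what 0.17.1 introduced.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical record type, used only for illustration.
Point = namedtuple("Point", ["x", "y"])

# With no `columns` argument, the namedtuple fields become the column names.
points = pd.DataFrame([Point(1, 2), Point(3, 4)])
print(list(points.columns))  # ['x', 'y']

# itertuples() yields namedtuples, so values are reachable by field name.
for row in points.itertuples():
    print(row.Index, row.x, row.y)

# Made-up sales data covering every region/product combination.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "b", "a", "b"],
    "units": [1, 2, 3, 4],
})

# margins_name replaces the default 'All' label on the margin row and column.
table = sales.pivot_table(
    values="units",
    index="region",
    columns="product",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(table)
```

The margin row and column are labelled Total here; leaving out margins_name falls back to the default label.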
API changes
- Raise NotImplementedError in Index.shift for non-supported index types (GH8038)
- min and max reductions on datetime64 and timedelta64 dtyped series now result in NaT and not nan (GH11245).
- Indexing with a null key will raise a TypeError, instead of a ValueError (GH11356)
- Series.ptp will now ignore missing values by default (GH11163)
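The NaT change for min and max can be checked with a short sketch like the one below. It assumes a recent pandas release and uses empty and all-missing series as the cases where no valid value exists. The variable names are illustrative only.

```python
import pandas as pd

# Series with no valid datetime or timedelta values to reduce over.
empty_dates = pd.Series([], dtype="datetime64[ns]")
all_missing_dates = pd.Series([pd.NaT, pd.NaT])
empty_deltas = pd.Series([], dtype="timedelta64[ns]")

# Each reduction is expected to produce the datetime-like missing marker
# rather than a float nan; pd.isna() is true in either case, so the printed
# value is what distinguishes them.
results = {
    "empty datetime64 min": empty_dates.min(),
    "all-NaT datetime64 max": all_missing_dates.max(),
    "empty timedelta64 max": empty_deltas.max(),
}
for label, value in results.items():
    print(label, "->", value, "| missing:", pd.isna(value))
```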
Performance improvements
- Checking monotonic-ness before sorting on an index (GH11080)
- Series.dropna performance improvement when its dtype can’t contain NaN (GH11159)
- Release the GIL on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (GH11263)
- Release the GIL on some rolling algos: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (GH11450)
- Release the GIL when reading and parsing text files in read_csv, read_table (GH11272)
- Improved performance of rolling_median (GH11450)
- Improved performance of to_excel (GH11352)
- Performance bug in repr of Categorical categories, which was rendering the strings before chopping them for display (GH11305)
- Performance improvement in Categorical.remove_unused_categories (GH11643).
- Improved performance of Series constructor with no data and DatetimeIndex (GH11433)
- Improved performance of shift, cumprod, and cumsum with groupby (GH4095)
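Because the parser releases the GIL, several read_csv calls can in principle overlap when run from a thread pool. The following is a minimal sketch of that pattern, not a benchmark: make_csv and parse are hypothetical helpers, the data is random, and any actual speedup depends on the machine, the file sizes and the parser engine.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def make_csv(seed, rows=1000):
    """Build an in-memory CSV document of random floats (illustration only)."""
    rng = np.random.RandomState(seed)
    frame = pd.DataFrame(rng.randn(rows, 4), columns=list("abcd"))
    return frame.to_csv(index=False)


def parse(text):
    """Parse one CSV document; the C parser can release the GIL while it works."""
    return pd.read_csv(io.StringIO(text))


csv_texts = [make_csv(seed) for seed in range(4)]

# Parse the documents from worker threads; pool.map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(parse, csv_texts))

# The same work done sequentially, for comparison of the results.
sequential = [parse(text) for text in csv_texts]

print(all(t.equals(s) for t, s in zip(threaded, sequential)))
```

The threaded and sequential results should be identical; only the wall-clock time may differ.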
Bug fixes
- SparseArray.__iter__() now does not cause PendingDeprecationWarning in Python 3.5 (GH11622)
- Regression from 0.16.2 for output formatting of long floats/nan, restored in (GH11302)
- Series.sort_index() now correctly handles the inplace option (GH11402)
- Incorrectly distributed .c file in the build on PyPi when reading a csv of floats and passing na_values=<a scalar> would show an exception (GH11374)
- Bug in .to_latex() output broken when the index has a name (GH10660)
- Bug in HDFStore.append with strings whose encoded length exceeded the max unencoded length (GH11234)
- Bug in merging datetime64[ns, tz] dtypes (GH11405)
- Bug in HDFStore.select when comparing with a numpy scalar in a where clause (GH11283)
- Bug in using DataFrame.ix with a MultiIndex indexer (GH11372)
- Bug in date_range with ambiguous endpoints (GH11626)
- Prevent adding new attributes to the accessors .str, .dt and .cat. Retrieving such a value was not possible, so error out on setting it. (GH10673)
- Bug in tz-conversions with an ambiguous time and .dt accessors (GH11295)
- Bug in output formatting when using an index of ambiguous times (GH11619)
- Bug in comparisons of Series vs list-likes (GH11339)
- Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compat to_replace (GH11326, GH11153)
- Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null (GH11206)
- Bug in list-like indexing with a mixed-integer Index (GH11320)
- Bug in pivot_table with margins=True when indexes are of Categorical dtype (GH10993)
- Bug in DataFrame.plot cannot use hex strings colors (GH10299)
- Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
- Bug in pd.eval where unary ops in a list error (GH11235)
- Bug in squeeze() with zero length arrays (GH11230, GH8999)
- Bug in describe() dropping column names for hierarchical indexes (GH11517)
- Bug in DataFrame.pct_change() not propagating axis keyword on .fillna method (GH11150)
- Bug in .to_csv() when a mix of integer and string column names are passed as the columns parameter (GH11637)
- Bug in indexing with a range (GH11652)
- Bug in inference of numpy scalars and preserving dtype when setting columns (GH11638)
- Bug in to_sql using unicode column names giving UnicodeEncodeError with (GH11431).
- Fix regression in setting of xticks in plot (GH11529).
- Bug in holiday.dates where observance rules could not be applied to holiday and doc enhancement (GH11477, GH11533)
- Fix plotting issues when having plain Axes instances instead of SubplotAxes (GH11520, GH11556).
- Bug in DataFrame.to_latex() produces an extra rule when header=False (GH7124)
- Bug in df.groupby(...).apply(func) when a func returns a Series containing a new datetimelike column (GH11324)
- Bug in pandas.json when file to load is big (GH11344)
- Bugs in to_excel with duplicate columns (GH11007, GH10982, GH10970)
- Fixed a bug that prevented the construction of an empty series of dtype datetime64[ns, tz] (GH11245).
- Bug in read_excel with MultiIndex containing integers (GH11317)
- Bug in to_excel with openpyxl 2.2+ and merging (GH11408)
- Bug in DataFrame.to_dict() produces a np.datetime64 object instead of Timestamp when only datetime is present in data (GH11327)
- Bug in DataFrame.corr() raises exception when computes Kendall correlation for DataFrames with boolean and not boolean columns (GH11560)
- Bug in the link-time error caused by C inline functions on FreeBSD 10+ (with clang) (GH10510)
- Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format (GH7791)
- Bug in DataFrame.join() with how='right' producing a TypeError (GH11519)
- Bug in Series.quantile with empty list results has Index with object dtype (GH11588)
- Bug in pd.merge results in empty Int64Index rather than Index(dtype=object) when the merge result is empty (GH11588)
- Bug in Categorical.remove_unused_categories when having NaN values (GH11599)
- Bug in DataFrame.to_sparse() loses column names for MultiIndexes (GH11600)
- Bug in DataFrame.round() with non-unique column index producing a Fatal Python error (GH11611)
- Bug in DataFrame.round() with decimals being a non-unique indexed Series producing extra columns (GH11618)
Contributors
A total of 63 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Aleksandr Drozd +
- Alex Chase +
- Anthonios Partheniou
- BrenBarn +
- Brian J. McGuirk +
- Chris
- Christian Berendt +
- Christian Perez +
- Cody Piersall +
- Data & Code Expert Experimenting with Code on Data
- DrIrv +
- Evan Wright
- Guillaume Gay
- Hamed Saljooghinejad +
- Iblis Lin +
- Jake VanderPlas
- Jan Schulz
- Jean-Mathieu Deschenes +
- Jeff Reback
- Jimmy Callin +
- Joris Van den Bossche
- K.-Michael Aye
- Ka Wo Chen
- Loïc Séguin-C +
- Luo Yicheng +
- Magnus Jöud +
- Manuel Leonhardt +
- Matthew Gilbert
- Maximilian Roos
- Michael +
- Nicholas Stahl +
- Nicolas Bonnotte +
- Pastafarianist +
- Petra Chong +
- Phil Schaf +
- Philipp A +
- Rob deCarvalho +
- Roman Khomenko +
- Rémy Léone +
- Sebastian Bank +
- Sinhrks
- Stephan Hoyer
- Thierry Moisan
- Tom Augspurger
- Tux1 +
- Varun +
- Wieland Hoffmann +
- Winterflower
- Yoav Ram +
- Younggun Kim
- Zeke +
- ajcr
- azuranski +
- behzad nouri
- cel4
- emilydolson +
- hironow +
- lexual
- llllllllll +
- rockg
- silentquasar +
- sinhrks
- taeold +