What’s New¶
These are new features and improvements of note in each release.
v0.20.2 (June 4, 2017)¶
This is a minor bug-fix release in the 0.20.x series and includes some small regression fixes, bug fixes and performance improvements. We recommend that all users upgrade to this version.
What’s new in v0.20.2
Enhancements¶
- Unblocked access to additional compression types supported in pytables: 'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy', 'blosc:zlib', 'blosc:zstd' (GH14478)
- Series provides a to_latex method (GH16180)
- A new groupby method ngroup(), parallel to the existing cumcount(), has been added to return the group order (GH11642); see here.
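As a minimal sketch (assuming pandas >= 0.20.2 is installed), the new ngroup() method numbers each group in order, while the existing cumcount() numbers each row within its group:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'a']})
g = df.groupby('key')

# ngroup() labels each *group* (by sorted key order by default)
print(list(g.ngroup()))    # [0, 1, 0, 1, 0]

# cumcount() labels each *row* within its group
print(list(g.cumcount()))  # [0, 0, 1, 1, 2]
```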
Performance Improvements¶
- Performance regression fix when indexing with a list-like (GH16285)
- Performance regression fix for MultiIndexes (GH16319, GH16346)
- Improved performance of .clip() with scalar arguments (GH15400)
- Improved performance of groupby with categorical groupers (GH16413)
- Improved performance of MultiIndex.remove_unused_levels() (GH16556)
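For reference, the scalar code path whose performance improved is the plain .clip() call with numeric bounds; a small sketch:

```python
import pandas as pd

s = pd.Series([-2, -1, 0, 1, 2])
# Scalar lower/upper bounds take the optimized path
print(list(s.clip(lower=0)))  # [0, 0, 0, 1, 2]
```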
Bug Fixes¶
- Silenced a warning on some Windows environments about “tput: terminal attributes: No such device or address” when detecting the terminal size. This fix only applies to Python 3 (GH16496)
- Bug in using pathlib.Path or py.path.local objects with io functions (GH16291)
- Bug in Index.symmetric_difference() on two equal MultiIndexes, resulting in a TypeError (GH13490)
- Bug in DataFrame.update() with overwrite=False and NaN values (GH15593)
- Passing an invalid engine to read_csv() now raises an informative ValueError rather than UnboundLocalError (GH16511)
- Bug in unique() on an array of tuples (GH16519)
- Bug in cut() when labels are set, resulting in incorrect label ordering (GH16459)
- Fixed a compatibility issue with IPython 6.0’s tab completion showing deprecation warnings on Categoricals (GH16409)
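The pathlib.Path fix (GH16291) can be illustrated with a small round-trip sketch; the temporary directory and file name here are arbitrary:

```python
import tempfile
from pathlib import Path

import pandas as pd

# With the fix, io functions accept pathlib.Path objects directly
tmpdir = tempfile.mkdtemp()
path = Path(tmpdir) / 'data.csv'

pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).to_csv(path, index=False)
df = pd.read_csv(path)  # previously this could fail with a Path object
print(df.shape)  # (2, 2)
```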
Conversion¶
- Bug in to_numeric() in which empty data inputs were causing a segfault of the interpreter (GH16302)
- Silence numpy warnings when broadcasting DataFrame to Series with comparison ops (GH16378, GH16306)
Indexing¶
- Bug in DataFrame.reset_index(level=) with single level index (GH16263)
- Bug in partial string indexing with a monotonic, but not strictly-monotonic, index incorrectly reversing the slice bounds (GH16515)
- Bug in MultiIndex.remove_unused_levels() that would not return a MultiIndex equal to the original (GH16556)
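A small sketch of remove_unused_levels(): slicing a MultiIndex keeps the original level values, and the method drops the ones no longer referenced while (with GH16556 fixed) still comparing equal to the slice:

```python
import pandas as pd

mi = pd.MultiIndex.from_product([[0, 1], ['a', 'b']])
sliced = mi[2:]          # only level value 1 remains in use
cleaned = sliced.remove_unused_levels()

print(list(sliced.levels[0]))   # [0, 1] -- unused 0 still present
print(list(cleaned.levels[0]))  # [1]
print(cleaned.equals(sliced))   # True
```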
I/O¶
- Bug in read_csv() when comment is passed in a space-delimited text file (GH16472)
- Bug in read_csv() not raising an exception with nonexistent columns in usecols when it had the correct length (GH14671)
- Bug that would force importing of the clipboard routines unnecessarily, potentially causing an import error on startup (GH16288)
- Bug that raised IndexError when HTML-rendering an empty DataFrame (GH15953)
- Bug in read_csv() in which tarfile object inputs were raising an error in Python 2.x for the C engine (GH16530)
- Bug where DataFrame.to_html() ignored the index_names parameter (GH16493)
- Bug where pd.read_hdf() returns numpy strings for index names (GH13492)
- Bug in HDFStore.select_as_multiple() where start/stop arguments were not respected (GH16209)
Plotting¶
Groupby/Resample/Rolling¶
Reshaping¶
- Bug in DataFrame.stack with unsorted levels in MultiIndex columns (GH16323)
- Bug in pd.wide_to_long() where no error was raised when i was not a unique identifier (GH16382)
- Bug in Series.isin(..) with a list of tuples (GH16394)
- Bug in construction of a DataFrame with mixed dtypes including an all-NaT column (GH16395)
- Bug in DataFrame.agg() and Series.agg() with aggregating on non-callable attributes (GH16405)
Numeric¶
- Bug in .interpolate(), where limit_direction was not respected when limit=None (default) was passed (GH16282)
v0.20.1 (May 5, 2017)¶
This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- New .agg() API for Series/DataFrame similar to the groupby-rolling-resample API’s, see here
- Integration with the feather-format, including a new top-level pd.read_feather() and DataFrame.to_feather() method, see here
- The .ix indexer has been deprecated, see here
- Panel has been deprecated, see here
- Addition of an IntervalIndex and Interval scalar type, see here
- Improved user API when grouping by index levels in .groupby(), see here
- Improved support for UInt64 dtypes, see here
- A new orient for JSON serialization, orient='table', that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see here
- Experimental support for exporting styled DataFrames (DataFrame.style) to Excel, see here
- Window binary corr/cov operations now return a MultiIndexed DataFrame rather than a Panel, as Panel is now deprecated, see here
- Support for S3 handling now uses s3fs, see here
- Google BigQuery support now uses the pandas-gbq library, see here
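As a minimal sketch of the new IntervalIndex highlight (assuming pandas >= 0.20 is installed), pd.interval_range is one way to construct it, and the Interval scalars it contains support membership tests:

```python
import pandas as pd

idx = pd.interval_range(start=0, end=4)
print(len(idx))       # 4
print(idx[0])         # (0, 1] -- closed on the right by default
print(2.5 in idx[2])  # True -- membership test on an Interval scalar
```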
Warning
Pandas has changed the internal structure and layout of the codebase.
This can affect imports that are not from the top-level pandas.* namespace; please see the changes here.
Check the API Changes and deprecations before updating.
Note
This is a combined release for 0.20.0 and 0.20.1.
Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas’ utils
routines. (GH16250)
What’s new in v0.20.0
- New features
- agg API for DataFrame/Series
- dtype keyword for data IO
- .to_datetime() has gained an origin parameter
- Groupby Enhancements
- Better support for compressed URLs in read_csv
- Pickle file I/O now supports compression
- UInt64 Support Improved
- GroupBy on Categoricals
- Table Schema Output
- SciPy sparse matrix from/to SparseDataFrame
- Excel output for styled DataFrames
- IntervalIndex
- Other Enhancements
- Backwards incompatible API changes
- Possible incompatibility for HDF5 formats created with pandas < 0.13.0
- Map on Index types now return other Index types
- Accessing datetime fields of Index now return Index
- pd.unique will now be consistent with extension types
- S3 File Handling
- Partial String Indexing Changes
- Concat of different float dtypes will not automatically upcast
- Pandas Google BigQuery support has moved
- Memory Usage for Index is more Accurate
- DataFrame.sort_index changes
- Groupby Describe Formatting
- Window Binary Corr/Cov operations return a MultiIndex DataFrame
- HDFStore where string comparison
- Index.intersection and inner join now preserve the order of the left Index
- Pivot Table always returns a DataFrame
- Other API Changes
- Reorganization of the library: Privacy Changes
- Deprecations
- Removal of prior version deprecations/changes
- Performance Improvements
- Bug Fixes
New features¶
agg
API for DataFrame/Series¶
Series & DataFrame have been enhanced to support the aggregation API. This is a familiar API from groupby, window operations, and resampling. It allows aggregation operations to be expressed concisely using agg() and transform(). The full documentation is here (GH1623).
Here is a sample
In [1]: df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
...: index=pd.date_range('1/1/2000', periods=10))
...:
In [2]: df.iloc[3:7] = np.nan
In [3]: df
Out[3]:
A B C
2000-01-01 1.474071 -0.064034 -1.282782
2000-01-02 0.781836 -1.071357 0.441153
2000-01-03 2.353925 0.583787 0.221471
2000-01-04 NaN NaN NaN
2000-01-05 NaN NaN NaN
2000-01-06 NaN NaN NaN
2000-01-07 NaN NaN NaN
2000-01-08 0.901805 1.171216 0.520260
2000-01-09 -1.197071 -1.066969 -0.303421
2000-01-10 -0.858447 0.306996 -0.028665
One can operate using string function names, callables, lists, or dictionaries of these.
Using a single function is equivalent to .apply.
In [4]: df.agg('sum')
Out[4]:
A 3.456119
B -0.140361
C -0.431984
dtype: float64
Multiple aggregations with a list of functions.
In [5]: df.agg(['sum', 'min'])
Out[5]:
A B C
sum 3.456119 -0.140361 -0.431984
min -1.197071 -1.071357 -1.282782
Using a dict provides the ability to apply specific aggregations per column. You will get a matrix-like output of all of the aggregators, with one row per unique function; entries for functions not applied to a particular column will be NaN:
In [6]: df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
Out[6]:
A B
max NaN 1.171216
min -1.197071 -1.071357
sum 3.456119 NaN
The API also supports a .transform() function for broadcasting results.
In [7]: df.transform(['abs', lambda x: x - x.min()])
Out[7]:
A B C
abs <lambda> abs <lambda> abs <lambda>
2000-01-01 1.474071 2.671143 0.064034 1.007322 1.282782 0.000000
2000-01-02 0.781836 1.978907 1.071357 0.000000 0.441153 1.723935
2000-01-03 2.353925 3.550996 0.583787 1.655143 0.221471 1.504252
2000-01-04 NaN NaN NaN NaN NaN NaN
2000-01-05 NaN NaN NaN NaN NaN NaN
2000-01-06 NaN NaN NaN NaN NaN NaN
2000-01-07 NaN NaN NaN NaN NaN NaN
2000-01-08 0.901805 2.098877 1.171216 2.242573 0.520260 1.803042
2000-01-09 1.197071 0.000000 1.066969 0.004388 0.303421 0.979361
2000-01-10 0.858447 0.338624 0.306996 1.378353 0.028665 1.254117
When presented with mixed dtypes that cannot be aggregated, .agg() will only take the valid aggregations. This is similar to how groupby .agg() works. (GH15015)
In [8]: df = pd.DataFrame({'A': [1, 2, 3],
...: 'B': [1., 2., 3.],
...: 'C': ['foo', 'bar', 'baz'],
...: 'D': pd.date_range('20130101', periods=3)})
...:
In [9]: df.dtypes
Out[9]:
A int64
B float64
C object
D datetime64[ns]
dtype: object
In [10]: df.agg(['min', 'sum'])
Out[10]:
A B C D
min 1 1.0 bar 2013-01-01
sum 6 6.0 foobarbaz NaT
dtype keyword for data IO¶
The 'python' engine for read_csv(), as well as the read_fwf() function for parsing fixed-width text files and read_excel() for parsing Excel files, now accept the dtype keyword argument for specifying the types of specific columns (GH14295). See the io docs for more information.
In [11]: data = "a b\n1 2\n3 4"
In [12]: pd.read_fwf(StringIO(data)).dtypes
Out[12]:
a int64
b int64
dtype: object
In [13]: pd.read_fwf(StringIO(data), dtype={'a':'float64', 'b':'object'}).dtypes