What’s New¶
These are new features and improvements of note in each release.
v0.23.3 (July 7, 2018)¶
This release fixes a build issue with the sdist for Python 3.7 (GH21785). There are no other changes.
v0.23.2¶
This is a minor bug-fix release in the 0.23.x series and includes some small regression fixes and bug fixes. We recommend that all users upgrade to this version.
Note
Pandas 0.23.2 is the first pandas release that's compatible with Python 3.7 (GH20552).
What’s new in v0.23.2
Logical Reductions over Entire DataFrame¶
DataFrame.all() and DataFrame.any() now accept axis=None to reduce over all axes to a scalar (GH19976).
In [1]: df = pd.DataFrame({"A": [1, 2], "B": [True, False]})
In [2]: df.all(axis=None)
Out[2]: False
This also provides compatibility with NumPy 1.15, which now dispatches to DataFrame.all(). With NumPy 1.15 and pandas 0.23.1 or earlier, numpy.all() will no longer reduce over every axis:
>>> # NumPy 1.15, pandas 0.23.1
>>> np.any(pd.DataFrame({"A": [False], "B": [False]}))
A False
B False
dtype: bool
With pandas 0.23.2, that will correctly return False, as it did with NumPy < 1.15.
In [3]: np.any(pd.DataFrame({"A": [False], "B": [False]}))
Out[3]: False
Fixed Regressions¶
- Fixed regression in to_csv() that incorrectly handled file-like objects (GH21471)
- Re-allowed duplicate level names of a MultiIndex. Accessing a level that has a duplicate name by name still raises an error (GH19029).
- Bug in both DataFrame.first_valid_index() and Series.first_valid_index(), which raised for a row index having duplicate values (GH21441)
- Fixed printing of DataFrames with hierarchical columns with long names (GH21180)
- Fixed regression in reindex() and groupby() with a MultiIndex or multiple keys containing categorical datetime-like values (GH21390)
- Fixed regression in unary negative operations with object dtype (GH21380)
- Bug in Timestamp.ceil() and Timestamp.floor() when the timestamp is a multiple of the rounding frequency (GH21262)
- Fixed regression in to_clipboard() that defaulted to copying dataframes with space-delimited instead of tab-delimited output (GH21104)
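The first_valid_index() regression above can be illustrated with a minimal sketch (the frame and labels are made up for demonstration): a row index with duplicate labels no longer causes the call to raise.

```python
import pandas as pd

# Sketch of the GH21441 fix: first_valid_index() on a frame whose
# row index contains duplicate labels no longer raises.
df = pd.DataFrame({"A": [None, 2.0, 3.0]}, index=["x", "x", "y"])
result = df.first_valid_index()
print(result)  # "x": the label of the first row holding a non-NA value
```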
Build Changes¶
- The source and binary distributions no longer include test data files, resulting in smaller download sizes. Tests relying on these data files will be skipped when using pandas.test() (GH19320).
Bug Fixes¶
Conversion
- Bug in constructing Index with an iterator or generator (GH21470)
- Bug in Series.nlargest() for signed and unsigned integer dtypes when the minimum value is present (GH21426)
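A minimal sketch of the Index-construction fix, assuming a pandas version that includes it (the generator contents are made up):

```python
import pandas as pd

# Sketch of the GH21470 fix: an Index can be constructed directly
# from a generator or other iterator.
idx = pd.Index(x * 10 for x in range(3))
print(list(idx))  # [0, 10, 20]
```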
Indexing
- Bug in Index.get_indexer_non_unique() with categorical key (GH21448)
- Bug in comparison operations for MultiIndex where an error was raised on equality/inequality comparisons involving a MultiIndex with nlevels == 1 (GH21149)
- Bug in DataFrame.drop() where behaviour was not consistent for unique and non-unique indexes (GH21494)
- Bug in DataFrame.duplicated() with a large number of columns causing a 'maximum recursion depth exceeded' error (GH21524)
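The duplicated() fix can be sketched as follows (the frame shape is arbitrary, chosen only to make the frame very wide):

```python
import numpy as np
import pandas as pd

# Sketch of the GH21524 fix: duplicated() on a very wide frame no
# longer exceeds Python's recursion limit.
wide = pd.DataFrame(np.zeros((4, 5000)))
dup = wide.duplicated()
print(dup.tolist())  # all rows identical: first kept, rest flagged
```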
I/O
- Bug in read_csv() that caused it to incorrectly raise an error when nrows=0, low_memory=True, and index_col was not None (GH21141)
- Bug in json_normalize() when formatting the record_prefix with integer columns (GH21536)
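The read_csv() fix above can be sketched with a small in-memory CSV (the column names and data are made up):

```python
import io
import pandas as pd

# Sketch of the GH21141 fix: nrows=0 combined with low_memory=True
# and an index_col no longer raises.
csv = io.StringIO("idx,a,b\n0,1,2\n1,3,4\n")
df = pd.read_csv(csv, nrows=0, low_memory=True, index_col="idx")
print(df.shape)  # an empty frame that still carries columns "a" and "b"
```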
Timezones
- Bug in Timestamp and DatetimeIndex where passing a Timestamp localized after a DST transition would return a datetime before the DST transition (GH20854)
- Bug in comparing DataFrames with tz-aware DatetimeIndex columns with a DST transition that raised a KeyError (GH19970)
v0.23.1¶
This is a minor bug-fix release in the 0.23.x series and includes some small regression fixes and bug fixes. We recommend that all users upgrade to this version.
What’s new in v0.23.1
Fixed Regressions¶
Comparing Series with datetime.date
We’ve reverted a 0.23.0 change to comparing a Series holding datetimes and a datetime.date object (GH21152).
In pandas 0.22 and earlier, comparing a Series holding datetimes and datetime.date objects would coerce the datetime.date to a datetime before comparing. This was inconsistent with Python, NumPy, and DatetimeIndex, which never consider a datetime and a datetime.date equal.
In 0.23.0, we unified operations between DatetimeIndex and Series, and in the process changed comparisons between a Series of datetimes and datetime.date without warning.
We’ve temporarily restored the 0.22.0 behavior, so datetimes and dates may again compare equal, but will restore the 0.23.0 behavior in a future release.
To summarize, here’s the behavior in 0.22.0, 0.23.0, and 0.23.1:
# 0.22.0... Silently coerce the datetime.date
>>> Series(pd.date_range('2017', periods=2)) == datetime.date(2017, 1, 1)
0 True
1 False
dtype: bool
# 0.23.0... Do not coerce the datetime.date
>>> Series(pd.date_range('2017', periods=2)) == datetime.date(2017, 1, 1)
0 False
1 False
dtype: bool
# 0.23.1... Coerce the datetime.date with a warning
>>> Series(pd.date_range('2017', periods=2)) == datetime.date(2017, 1, 1)
/bin/python:1: FutureWarning: Comparing Series of datetimes with 'datetime.date'. Currently, the
'datetime.date' is coerced to a datetime. In the future pandas will
not coerce, and the values not compare equal to the 'datetime.date'.
To retain the current behavior, convert the 'datetime.date' to a
datetime with 'pd.Timestamp'.
0 True
1 False
dtype: bool
In addition, ordering comparisons will raise a TypeError in the future.
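Following the warning's own advice, the version-stable spelling is to convert the datetime.date to a datetime with pd.Timestamp before comparing; a minimal sketch:

```python
import datetime
import pandas as pd

# The workaround the FutureWarning suggests: wrap the date in
# pd.Timestamp so the comparison is explicit and behaves the same
# across pandas versions.
s = pd.Series(pd.date_range("2017", periods=2))
result = s == pd.Timestamp(datetime.date(2017, 1, 1))
print(result.tolist())  # [True, False]
```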
Other Fixes
- Reverted the ability of to_sql() to perform multivalue inserts, as this caused regressions in certain cases (GH21103). In the future this will be made configurable.
- Fixed regression in the DatetimeIndex.date and DatetimeIndex.time attributes in case of timezone-aware data: DatetimeIndex.time returned a tz-aware time instead of tz-naive (GH21267) and DatetimeIndex.date returned an incorrect date when the input date had a non-UTC timezone (GH21230)
- Fixed regression in pandas.io.json.json_normalize() when called with None values in nested levels in JSON, and to not drop keys with value as None (GH21158, GH21356)
- Bug in to_csv() causing an encoding error when compression and encoding are specified (GH21241, GH21118)
- Bug preventing pandas from being importable with -OO optimization (GH21071)
- Bug in Categorical.fillna() incorrectly raising a TypeError when the individual categories are iterable and value is an iterable (GH21097, GH19788)
- Fixed regression in constructors coercing NA values like None to strings when passing dtype=str (GH21083)
- Fixed regression in pivot_table() where an ordered Categorical with missing values for the pivot's index would give a mis-aligned result (GH21133)
- Fixed regression in merging on boolean index/columns (GH21119)
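The dtype=str regression above can be sketched in a few lines (the values are made up; the point is that None stays missing rather than becoming the string "None"):

```python
import pandas as pd

# Sketch of the GH21083 fix: NA values like None passed with
# dtype=str remain missing instead of being coerced to "None".
s = pd.Series(["a", None], dtype=str)
print(s.isna().tolist())  # [False, True]
```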
Bug Fixes¶
Groupby/Resample/Rolling
- Bug in DataFrame.agg() where applying multiple aggregation functions to a DataFrame with duplicated column names would cause a stack overflow (GH21063)
- Bug in pandas.core.groupby.GroupBy.ffill() and pandas.core.groupby.GroupBy.bfill() where the fill within a grouping would not always be applied as intended due to the implementations' use of a non-stable sort (GH21207)
- Bug in pandas.core.groupby.GroupBy.rank() where results did not scale to 100% when specifying method='dense' and pct=True
- Bug in pandas.DataFrame.rolling() and pandas.Series.rolling() which incorrectly accepted a 0 window size rather than raising (GH21286)
Data-type specific
- Bug in Series.str.replace() where the method throws TypeError on Python 3.5.2 (GH21078)
- Bug in Timedelta where passing a float with a unit would prematurely round the float precision (GH14156)
- Bug in pandas.testing.assert_index_equal() which raised AssertionError incorrectly when comparing two CategoricalIndex objects with param check_categorical=False (GH19776)
Sparse
- Bug in SparseArray.shape which previously only returned the shape of SparseArray.sp_values (GH21126)
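A minimal sketch of the fixed behavior, using the modern pd.arrays.SparseArray spelling (0.23-era code used a top-level pd.SparseArray; the values here are made up):

```python
import pandas as pd

# Sketch of the GH21126 fix: .shape reflects the full logical array,
# not just the stored non-fill values in .sp_values.
arr = pd.arrays.SparseArray([0, 0, 1, 2])
print(arr.shape)           # (4,): the logical length
print(len(arr.sp_values))  # 2: only the non-zero values are stored
```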
Indexing
- Bug in Series.reset_index() where an appropriate error was not raised with an invalid level name (GH20925)
- Bug in interval_range() when start/periods or end/periods are specified with a float start or end (GH21161)
- Bug in MultiIndex.set_names() where an error was raised for a MultiIndex with nlevels == 1 (GH21149)
- Bug in IntervalIndex constructors where creating an IntervalIndex from categorical data was not fully supported (GH21243, GH21253)
- Bug in MultiIndex.sort_index() which was not guaranteed to sort correctly with level=1; this was also causing data misalignment in particular DataFrame.stack() operations (GH20994, GH20945, GH21052)
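The interval_range() fix can be sketched as follows (the float start and period count are arbitrary):

```python
import pandas as pd

# Sketch of the GH21161 fix: a float start combined with periods now
# produces the expected intervals (default frequency of 1).
idx = pd.interval_range(start=0.5, periods=3)
print(len(idx))     # 3
print(idx[0].left)  # 0.5
```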
Plotting
- New keywords (sharex, sharey) to turn on/off sharing of x/y-axis by subplots generated with pandas.DataFrame().groupby().boxplot() (GH20968)
I/O
- Bug in IO methods specifying compression='zip' which produced uncompressed zip archives (GH17778, GH21144)
- Bug in DataFrame.to_stata() which prevented exporting DataFrames to buffers and most file-like objects (GH21041)
- Bug in read_stata() and StataReader which did not correctly decode utf-8 strings on Python 3 from Stata 14 files (dta version 118) (GH21244)
- Bug in read_json() where reading an empty JSON schema with orient='table' back to a DataFrame caused an error (GH21287)
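The compression='zip' fix above can be checked with a short sketch (the frame and file name are made up; writing goes to a temporary directory):

```python
import os
import tempfile
import zipfile

import pandas as pd

# Sketch of the GH17778/GH21144 fix: compression='zip' now writes a
# genuine zip archive rather than an uncompressed file.
df = pd.DataFrame({"a": [1, 2]})
path = os.path.join(tempfile.mkdtemp(), "example.csv.zip")
df.to_csv(path, compression="zip")
print(zipfile.is_zipfile(path))  # True
```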
Reshaping
- Bug in concat() where an error was raised when concatenating Series with numpy scalar and tuple names (GH21015)
- Bug in concat() warning message providing the wrong guidance for future behavior (GH21101)
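The tuple-name concat() fix can be sketched in a few lines (the names and values are made up):

```python
import pandas as pd

# Sketch of the GH21015 fix: concatenating Series whose names are
# tuples no longer raises; the tuples become the column labels.
s1 = pd.Series([1, 2], name=("a", 1))
s2 = pd.Series([3, 4], name=("a", 2))
out = pd.concat([s1, s2], axis=1)
print(out.shape)  # (2, 2)
```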
v0.23.0 (May 15, 2018)¶
This is a major release from 0.22.0 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Round-trippable JSON format with ‘table’ orient.
- Instantiation from dicts respects order for Python 3.6+.
- Dependent column arguments for assign.
- Merging / sorting on a combination of columns and index levels.
- Extending Pandas with custom types.
- Excluding unobserved categories from groupby.
- Changes to make output shape of DataFrame.apply consistent.
Check the API Changes and deprecations before updating.
Warning
Starting January 1, 2019, pandas feature releases will support Python 3 only. See Plan for dropping Python 2.7 for more.
What’s new in v0.23.0
- New features
  - JSON read/write round-trippable with orient='table'
  - .assign() accepts dependent arguments
  - Merging on a combination of columns and index levels
  - Sorting by a combination of columns and index levels
  - Extending Pandas with Custom Types (Experimental)
  - New observed keyword for excluding unobserved categories in groupby
  - Rolling/Expanding.apply() accepts raw=False to pass a Series to the function
  - DataFrame.interpolate has gained the limit_area kwarg
  - get_dummies now supports dtype argument
  - Timedelta mod method
  - .rank() handles inf values when NaN are present
  - Series.str.cat has gained the join kwarg
  - DataFrame.astype performs column-wise conversion to Categorical
  - Other Enhancements
- Backwards incompatible API changes
  - Dependencies have increased minimum versions
  - Instantiation from dicts preserves dict insertion order for Python 3.6+
  - Deprecate Panel
  - pandas.core.common removals
  - Changes to make output of DataFrame.apply consistent
  - Concatenation will no longer sort
  - Build Changes
  - Index Division By Zero Fills Correctly
  - Extraction of matching patterns from strings
  - Default value for the ordered parameter of CategoricalDtype
  - Better pretty-printing of DataFrames in a terminal
  - Datetimelike API Changes
  - Other API Changes
- Deprecations
- Removal of prior version deprecations/changes
- Performance Improvements
- Documentation Changes
- Bug Fixes
New features¶
JSON read/write round-trippable with orient='table'¶
A DataFrame can now be written to and subsequently read back via JSON while preserving metadata through usage of the orient='table' argument (see GH18912 and GH9146). Previously, none of the available orient values guaranteed the preservation of dtypes and index names, amongst other metadata.
In [1]: df = pd.DataFrame({'foo': [1, 2, 3, 4],
...: 'bar': ['a', 'b', 'c', 'd'],
...: 'baz': pd.date_range('2018-01-01', freq='d', periods=4),
...: 'qux': pd.Categorical(['a', 'b', 'c', 'c'])
...: }, index=pd.Index(range(4), name='idx'))
...:
In [2]: df
Out[2]:
foo bar baz qux
idx
0 1 a 2018-01-01 a
1 2 b 2018-01-02 b
2 3 c 2018-01-03 c
3 4 d 2018-01-04 c
In [3]: df.dtypes