What’s new in 1.2.1 (January 20, 2021)
These are the changes in pandas 1.2.1. See Release notes for a full changelog including other versions of pandas.
Fixed regressions
Fixed regression in to_csv() that created corrupted zip files when there were more rows than chunksize (GH38714)
Fixed regression in to_csv() opening codecs.StreamReaderWriter in binary mode instead of in text mode (GH39247)
Fixed regression in read_csv() and other read functions where the encoding error policy (errors) did not default to "replace" when no encoding was specified (GH38989)
Fixed regression in read_excel() with non-rawbyte file handles (GH38788)
Fixed regression in DataFrame.to_stata() not removing the created file when an error occurred (GH39202)
Fixed regression in DataFrame.__setitem__ raising ValueError when expanding DataFrame and new column is from type "0 - name" (GH39010)
Fixed regression in setting with DataFrame.loc() raising ValueError when DataFrame has unsorted MultiIndex columns and indexer is a scalar (GH38601)
Fixed regression in setting with DataFrame.loc() raising KeyError with MultiIndex and list-like columns indexer enlarging DataFrame (GH39147)
Fixed regression in groupby() with Categorical grouping column not showing unused categories for grouped.indices (GH38642)
Fixed regression in GroupBy.sem() where the presence of non-numeric columns would cause an error instead of being dropped (GH38774)
Fixed regression in DataFrameGroupBy.diff() raising for int8 and int16 columns (GH39050)
Fixed regression in DataFrame.groupby() when aggregating an ExtensionDType that could fail for non-numeric values (GH38980)
Fixed regression in Rolling.skew() and Rolling.kurt() modifying the object inplace (GH38908)
Fixed regression in DataFrame.any() and DataFrame.all() not returning a result for tz-aware datetime64 columns (GH38723)
Fixed regression in DataFrame.apply() with axis=1 using str accessor in apply function (GH38979)
Fixed regression in DataFrame.replace() raising ValueError when DataFrame has dtype bytes (GH38900)
Fixed regression in Series.fillna() that raised RecursionError with datetime64[ns, UTC] dtype (GH38851)
Fixed regression in comparisons between NaT and datetime.date objects incorrectly returning True (GH39151)
Fixed regression in calling NumPy accumulate() ufuncs on DataFrames, e.g. np.maximum.accumulate(df) (GH39259)
Fixed regression in repr of float-like strings of an object dtype having trailing 0’s truncated after the decimal (GH38708)
Fixed regression that raised AttributeError with PyArrow versions [0.16.0, 1.0.0) (GH38801)
Fixed regression in pandas.testing.assert_frame_equal() raising TypeError with check_like=True when Index or columns have mixed dtype (GH39168)
We have reverted a commit that resulted in several plotting-related regressions in pandas 1.2.0 (GH38969, GH38736, GH38865, GH38947 and GH39126). As a result, bugs reported as fixed in pandas 1.2.0 related to inconsistent tick labeling in bar plots are again present (GH26186 and GH11465).
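The to_csv() zip regression listed above (GH38714) can be checked with a small round trip. This is only a sketch, not the test used in the pandas test suite: the file name data.zip, the frame contents, and chunksize=3 are arbitrary choices made for illustration.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# Ten rows with chunksize=3 forces to_csv() to write in several chunks.
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.zip"  # hypothetical file name
    df.to_csv(path, index=False, chunksize=3, compression="zip")

    # A healthy archive contains exactly one member holding all rows.
    with zipfile.ZipFile(path) as archive:
        member_names = archive.namelist()

    # Reading the archive back should reproduce the original frame.
    roundtrip = pd.read_csv(path, compression="zip")

print(member_names)
print(roundtrip.equals(df))
```

On a version containing the fix, the archive is expected to hold a single member and the round trip should reproduce the original frame.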
Calling NumPy ufuncs on non-aligned DataFrames
Before pandas 1.2.0, calling a NumPy ufunc on non-aligned DataFrames (or DataFrame / Series combination) would ignore the indices, only match the inputs by shape, and use the index/columns of the first DataFrame for the result:
>>> df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
>>> df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
>>> df1
   a  b
0  1  3
1  2  4
>>> df2
   a  b
1  1  3
2  2  4
>>> np.add(df1, df2)
   a  b
0  2  6
1  4  8
This contrasts with how other pandas operations work, which first align the inputs:
>>> df1 + df2
     a    b
0  NaN  NaN
1  3.0  7.0
2  NaN  NaN
In pandas 1.2.0, we refactored how NumPy ufuncs are called on DataFrames, which caused the inputs to be aligned first (GH39184), as happens in other pandas operations and for ufuncs called on Series objects.
For pandas 1.2.1, we restored the previous behaviour to avoid a breaking change, but the above example of np.add(df1, df2) with non-aligned inputs will now raise a warning, and a future pandas 2.0 release will start aligning the inputs first (GH39184). Calling a NumPy ufunc on Series objects (e.g. np.add(s1, s2)) already aligns and continues to do so.
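For comparison, the Series case mentioned above can be sketched as follows. The names s1 and s2 come from the text, but their values and indices are assumptions chosen to mirror df1 and df2.

```python
import numpy as np
import pandas as pd

# Two Series whose indices overlap only at label 1 (illustrative values).
s1 = pd.Series([1, 2], index=[0, 1])
s2 = pd.Series([1, 2], index=[1, 2])

# The ufunc aligns on the index first, so labels present in only one
# input produce NaN in the result.
total = np.add(s1, s2)
print(total)
```

The result has the union of both indices: only label 1, which exists in both inputs, gets a numeric value.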
To avoid the warning and keep the current behaviour of ignoring the indices, convert one of the arguments to a NumPy array:
>>> np.add(df1, np.asarray(df2))
   a  b
0  2  6
1  4  8
To obtain the future behaviour and silence the warning, you can align manually before passing the arguments to the ufunc:
>>> df1, df2 = df1.align(df2)
>>> np.add(df1, df2)
     a    b
0  NaN  NaN
1  3.0  7.0
2  NaN  NaN
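One way to confirm that both workarounds are silent is to turn FutureWarning into an error while running them. This is a sketch that assumes the warning is emitted as a FutureWarning; the variable names by_position and by_label are illustrative only.

```python
import warnings

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])

with warnings.catch_warnings():
    # Any FutureWarning raised inside this block becomes an exception.
    warnings.simplefilter("error", FutureWarning)

    # Workaround 1: ignore the labels of df2, keep those of df1.
    by_position = np.add(df1, np.asarray(df2))

    # Workaround 2: align explicitly, then apply the ufunc.
    left, right = df1.align(df2)
    by_label = np.add(left, right)

print(by_position)
print(by_label)
```

If either call still triggered the alignment warning, the block would raise instead of printing the two results.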
Bug fixes
Bug in read_csv() with float_precision="high" caused segfault or wrong parsing of long exponent strings. This resulted in a regression in some cases as the default for float_precision was changed in pandas 1.2.0 (GH38753)
Bug in read_csv() not closing an opened file handle when a csv.Error or UnicodeDecodeError occurred while initializing (GH39024)
Bug in pandas.testing.assert_index_equal() raising TypeError with check_order=False when Index has mixed dtype (GH39168)
Other
The deprecated attributes _AXIS_NAMES and _AXIS_NUMBERS of DataFrame and Series will no longer show up in dir or inspect.getmembers calls (GH38740)
Bumped minimum fastparquet version to 0.4.0 to avoid AttributeError from numba (GH38344)
Bumped minimum pymysql version to 0.8.1 to avoid test failures (GH38344)
Fixed build failure on MacOS 11 in Python 3.9.1 (GH38766)
Added reference to backwards incompatible check_freq arg of testing.assert_frame_equal() and testing.assert_series_equal() in pandas 1.1.0 what's new (GH34050)
Contributors
A total of 20 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Ada Draginda +
Andrew Wieteska
Bryan Cutler
Fangchen Li
Joris Van den Bossche
Matthew Roeschke
Matthew Zeitlin +
MeeseeksMachine
Micael Jarniac
Omar Afifi +
Pandas Development Team
Richard Shadrach
Simon Hawkins
Terji Petersen
Torsten Wörtwein
WANG Aiyong
jbrockmendel
kylekeppler
mzeitlin11
patrick