These are the changes in pandas 1.2.1. See Release notes for a full changelog including other versions of pandas.
Fixed regression in to_csv() that created corrupted zip files when there were more rows than chunksize (GH38714)
Fixed regression in to_csv() opening codecs.StreamReaderWriter in binary mode instead of in text mode (GH39247)
Fixed regression in read_csv() and other read functions where the encoding error policy (errors) did not default to "replace" when no encoding was specified (GH38989)
Fixed regression in read_excel() with non-rawbyte file handles (GH38788)
Fixed regression in DataFrame.to_stata() not removing the created file when an error occurred (GH39202)
Fixed regression in DataFrame.__setitem__ raising ValueError when expanding DataFrame and new column is of type "0 - name" (GH39010)
Fixed regression in setting with DataFrame.loc() raising ValueError when DataFrame has unsorted MultiIndex columns and indexer is a scalar (GH38601)
Fixed regression in setting with DataFrame.loc() raising KeyError with MultiIndex and list-like columns indexer enlarging DataFrame (GH39147)
Fixed regression in groupby() with Categorical grouping column not showing unused categories for grouped.indices (GH38642)
Fixed regression in GroupBy.sem() where the presence of non-numeric columns would cause an error instead of being dropped (GH38774)
Fixed regression in DataFrameGroupBy.diff() raising for int8 and int16 columns (GH39050)
Fixed regression in DataFrame.groupby() when aggregating an ExtensionDtype that could fail for non-numeric values (GH38980)
Fixed regression in Rolling.skew() and Rolling.kurt() modifying the object inplace (GH38908)
Fixed regression in DataFrame.any() and DataFrame.all() not returning a result for tz-aware datetime64 columns (GH38723)
Fixed regression in DataFrame.apply() with axis=1 using str accessor in apply function (GH38979)
Fixed regression in DataFrame.replace() raising ValueError when DataFrame has dtype bytes (GH38900)
Fixed regression in Series.fillna() that raised RecursionError with datetime64[ns, UTC] dtype (GH38851)
Fixed regression in comparisons between NaT and datetime.date objects incorrectly returning True (GH39151)
Fixed regression in calling NumPy accumulate() ufuncs on DataFrames, e.g. np.maximum.accumulate(df) (GH39259)
Fixed regression in repr of float-like strings of an object dtype having trailing 0’s truncated after the decimal (GH38708)
Fixed regression that raised AttributeError with PyArrow versions [0.16.0, 1.0.0) (GH38801)
Fixed regression in pandas.testing.assert_frame_equal() raising TypeError with check_like=True when Index or columns have mixed dtype (GH39168)
We have reverted a commit that resulted in several plotting related regressions in pandas 1.2.0 (GH38969, GH38736, GH38865, GH38947 and GH39126). As a result, bugs reported as fixed in pandas 1.2.0 related to inconsistent tick labeling in bar plots are again present (GH26186 and GH11465)
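A few of the regressions listed above can be checked with short snippets. The following is a minimal sketch rather than anything taken from the pandas test suite: the sample data and variable names are invented for illustration, and it assumes pandas 1.2.1 or later.

```python
import datetime

import numpy as np
import pandas as pd

# GH39259: accumulate() ufuncs work on DataFrames again.
# With the default axis this is a running maximum down each column.
df = pd.DataFrame({"a": [1, 3, 2], "b": [4, 2, 5]})
running_max = np.maximum.accumulate(df)

# GH38908: Rolling.skew() and Rolling.kurt() must leave the input untouched.
s = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0])
before = s.copy()
s.rolling(3).skew()
s.rolling(4).kurt()
unchanged = s.equals(before)

# GH39151: NaT does not compare equal to a datetime.date object.
nat_equals_date = pd.NaT == datetime.date(2021, 1, 20)

print(running_max)
print(unchanged, nat_equals_date)
```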
Before pandas 1.2.0, calling a NumPy ufunc on non-aligned DataFrames (or DataFrame / Series combination) would ignore the indices, only match the inputs by shape, and use the index/columns of the first DataFrame for the result:
>>> df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
>>> df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
>>> df1
   a  b
0  1  3
1  2  4
>>> df2
   a  b
1  1  3
2  2  4
>>> np.add(df1, df2)
   a  b
0  2  6
1  4  8
This contrasts with how other pandas operations work, which first align the inputs:
>>> df1 + df2
     a    b
0  NaN  NaN
1  3.0  7.0
2  NaN  NaN
In pandas 1.2.0, we refactored how NumPy ufuncs are called on DataFrames, and this started to align the inputs first (GH39184), as happens in other pandas operations and as it happens for ufuncs called on Series objects.
For pandas 1.2.1, we restored the previous behaviour to avoid a breaking change, but the above example of np.add(df1, df2) with non-aligned inputs will now raise a warning, and a future pandas 2.0 release will start aligning the inputs first (GH39184). Calling a NumPy ufunc on Series objects (e.g. np.add(s1, s2)) already aligns and continues to do so.
To avoid the warning and keep the current behaviour of ignoring the indices, convert one of the arguments to a NumPy array:
>>> np.add(df1, np.asarray(df2))
   a  b
0  2  6
1  4  8
To obtain the future behaviour and silence the warning, you can align manually before passing the arguments to the ufunc:
>>> df1, df2 = df1.align(df2)
>>> np.add(df1, df2)
     a    b
0  NaN  NaN
1  3.0  7.0
2  NaN  NaN
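For code that has to behave the same way regardless of whether the ufunc aligns its inputs, the two options above can be wrapped in small helpers. This is only a sketch: add_by_position and add_by_label are hypothetical names introduced here, not pandas functions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])


def add_by_position(left, right):
    """Hypothetical helper: drop right's labels so values are matched by shape."""
    return np.add(left, np.asarray(right))


def add_by_label(left, right):
    """Hypothetical helper: align on index and columns first, then add."""
    left_aligned, right_aligned = left.align(right)
    return np.add(left_aligned, right_aligned)


by_position = add_by_position(df1, df2)  # keeps df1's index, no NaN
by_label = add_by_label(df1, df2)        # union of both indices, NaN where unmatched
print(by_position)
print(by_label)
```

Because both helpers make the intent explicit before the ufunc is called, neither relies on the behaviour that is scheduled to change.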
Bug in read_csv() with float_precision="high" caused segfault or wrong parsing of long exponent strings. This resulted in a regression in some cases as the default for float_precision was changed in pandas 1.2.0 (GH38753)
Bug in read_csv() not closing an opened file handle when a csv.Error or UnicodeDecodeError occurred while initializing (GH39024)
Bug in pandas.testing.assert_index_equal() raising TypeError with check_order=False when Index has mixed dtype (GH39168)
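Since the float_precision default changed in pandas 1.2.0, passing the option explicitly is one way to make it visible which float converter read_csv() uses. The sketch below parses the same made-up CSV text with each of the three documented values; the sample data is an assumption for illustration and does not reproduce the long-exponent strings from GH38753.

```python
import io

import pandas as pd

# Invented sample data: one column with two float-like strings.
csv_text = "value\n1.5e10\n0.25\n"

# Parse the same text once per converter supported by the C parser.
parsed = {
    mode: pd.read_csv(io.StringIO(csv_text), float_precision=mode)["value"].tolist()
    for mode in ("high", "legacy", "round_trip")
}

for mode, values in parsed.items():
    print(mode, values)
```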
The deprecated attributes _AXIS_NAMES and _AXIS_NUMBERS of DataFrame and Series will no longer show up in dir or inspect.getmembers calls (GH38740)
Bumped minimum fastparquet version to 0.4.0 to avoid AttributeError from numba (GH38344)
Bumped minimum pymysql version to 0.8.1 to avoid test failures (GH38344)
Fixed build failure on macOS 11 in Python 3.9.1 (GH38766)
Added reference to the backwards incompatible check_freq argument of testing.assert_frame_equal() and testing.assert_series_equal() in the pandas 1.1.0 what's new (GH34050)
A total of 20 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Ada Draginda +
Andrew Wieteska
Bryan Cutler
Fangchen Li
Joris Van den Bossche
Matthew Roeschke
Matthew Zeitlin +
MeeseeksMachine
Micael Jarniac
Omar Afifi +
Pandas Development Team
Richard Shadrach
Simon Hawkins
Terji Petersen
Torsten Wörtwein
WANG Aiyong
jbrockmendel
kylekeppler
mzeitlin11
patrick