These are the changes in pandas 1.3.0. See Release notes for a full changelog including other versions of pandas.
Warning
When reading new Excel 2007+ (.xlsx) files, the default argument engine=None to read_excel() will now result in using the openpyxl engine in all cases when the option io.excel.xlsx.reader is set to "auto". Previously, some cases would use the xlrd engine instead. See What’s new 1.2.0 for background on this change.
When reading from a remote URL that is not handled by fsspec (i.e. HTTP and HTTPS), the dictionary passed to storage_options will be used to create the headers included in the request. This can be used to control the User-Agent header or send other custom headers (GH36688). For example:
In [1]: headers = {"User-Agent": "pandas"}

In [2]: df = pd.read_csv(
   ...:     "https://download.bls.gov/pub/time.series/cu/cu.item",
   ...:     sep="\t",
   ...:     storage_options=headers
   ...: )
   ...:
We added I/O support to read and render shallow versions of XML documents with pandas.read_xml() and DataFrame.to_xml(). Using lxml as parser, both XPath 1.0 and XSLT 1.0 are available. (GH27554)
In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:  <row>
   ...:     <shape>square</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides>4.0</sides>
   ...:  </row>
   ...:  <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides/>
   ...:  </row>
   ...:  <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>180</degrees>
   ...:     <sides>3.0</sides>
   ...:  </row>
   ...: </data>"""
   ...:

In [2]: df = pd.read_xml(xml)

In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>
For more, see Writing XML in the user guide on IO tools.
Rolling and Expanding now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See the window overview (window.overview) for performance and functional benefits (GH15095, GH38995)
Added MultiIndex.dtypes() (GH37062)
Added end and end_day options for origin in DataFrame.resample() (GH37804)
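A minimal sketch of the two new origin values. The sample series and the 17-minute bin width are illustrative assumptions, not taken from the release notes:

```python
import pandas as pd

# Hypothetical sample data: nine observations spaced seven minutes apart.
idx = pd.date_range("2000-10-01 23:30:00", periods=9, freq="7min")
ts = pd.Series(range(9), index=idx)

# origin="end" anchors the bins on the last timestamp of the series.
by_end = ts.resample("17min", origin="end").sum()

# origin="end_day" anchors the bins on the ceiling midnight of the last day.
by_end_day = ts.resample("17min", origin="end_day").sum()

print(by_end)
print(by_end_day)
```

Both calls bin the same data; only the alignment of the bin edges differs, so the grand totals match the original series.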
Improved error message when usecols and names do not match for read_csv() and engine="c" (GH29042)
Improved consistency of error message when passing an invalid win_type argument in Window (GH15969)
pandas.read_sql_query() now accepts a dtype argument to cast the columnar data from the SQL database based on user input (GH10285)
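As a rough illustration of the dtype argument, the sketch below uses an in-memory SQLite database. The table name t and its columns are hypothetical and exist only for this example:

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database (table and column names are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.commit()

# Cast column "a" to float64 while reading; other columns are left as inferred.
df = pd.read_sql_query("SELECT a, b FROM t", conn, dtype={"a": "float64"})
conn.close()

print(df.dtypes)
```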
Improved integer type mapping from pandas to SQLAlchemy when using DataFrame.to_sql() (GH35076)
to_numeric() now supports downcasting of nullable ExtensionDtype objects (GH33013)
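A short sketch of downcasting a nullable integer Series, assuming pandas 1.3.0 or later. The values are arbitrary:

```python
import pandas as pd

# A nullable Int64 Series with a missing value.
ser = pd.Series([1, 2, None], dtype="Int64")

# Downcast to the smallest nullable integer dtype that can hold the values.
small = pd.to_numeric(ser, downcast="integer")

print(small.dtype)
```

The missing value is preserved, and the result stays a nullable extension dtype rather than falling back to float64.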
Added support for dict-like names in MultiIndex.set_names and MultiIndex.rename (GH20421)
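For instance, a dict-like can map an old level name to a new one. The level names below are invented for illustration:

```python
import pandas as pd

mi = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2]], names=["letter", "number"]
)

# Rename only the "number" level; levels not in the mapping keep their names.
renamed = mi.set_names({"number": "n"})

print(list(renamed.names))
```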
pandas.read_excel() can now auto detect .xlsb files (GH35416)
Rolling.sum(), Expanding.sum(), Rolling.mean(), Expanding.mean(), Rolling.median(), Expanding.median(), Rolling.max(), Expanding.max(), Rolling.min(), and Expanding.min() now support Numba execution with the engine keyword (GH38895)
DataFrame.apply() can now accept NumPy unary operators as strings, e.g. df.apply("sqrt"), which was already the case for Series.apply() (GH39116)
DataFrame.apply() can now accept non-callable DataFrame properties as strings, e.g. df.apply("size"), which was already the case for Series.apply() (GH39116)
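A small sketch of passing a NumPy unary operator by name, using a made-up frame of perfect squares so the result is easy to check by eye:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 4.0], "b": [9.0, 16.0]})

# The string "sqrt" is resolved to the NumPy function of the same name.
roots = df.apply("sqrt")

# Equivalent explicit call, shown for comparison.
expected = np.sqrt(df)

print(roots)
```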
Disallowed DataFrame indexer for iloc in Series.__getitem__() and DataFrame.__getitem__() (GH39004)
Series.apply() can now accept list-like or dictionary-like arguments that aren’t lists or dictionaries, e.g. ser.apply(np.array(["sum", "mean"])), which was already the case for DataFrame.apply() (GH39140)
DataFrame.plot.scatter() can now accept a categorical column as the argument to c (GH12380, GH31357)
Styler.set_tooltips() allows on-hover tooltips to be added to styled HTML dataframes (GH35643, GH21266, GH39317, GH39708)
Styler.set_tooltips_class() and Styler.set_table_styles() amended to optionally allow certain css-string input arguments (GH39564)
Styler.apply() now more consistently accepts ndarray function returns, i.e. in all cases where axis is 0, 1 or None (GH39359)
Styler.apply() and Styler.applymap() now raise errors if wrongly formatted CSS is passed on render (GH39660)
Series.loc.__getitem__() and Series.loc.__setitem__() with MultiIndex now raise a helpful error message when the indexer has too many dimensions (GH35349)
pandas.read_stata() and StataReader support reading data from compressed files.
Add support for parsing ISO 8601-like timestamps with negative signs to pandas.Timedelta() (GH37172)
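A brief sketch of a negatively signed ISO 8601-like duration string, assuming pandas 1.3.0 or later. The duration itself is an arbitrary example:

```python
import pandas as pd

# A leading minus sign negates the whole duration.
td = pd.Timedelta("-PT6H3M")

# The same value built from keyword arguments, for comparison.
same = pd.Timedelta(hours=-6, minutes=-3)

print(td, td == same)
```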
Add support for unary operators in FloatingArray (GH38749)
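As an illustration, unary minus and abs can be applied directly to a nullable Float64 array. The values are arbitrary:

```python
import pandas as pd

arr = pd.array([1.5, -2.0, None], dtype="Float64")

# Unary operators act elementwise; missing values stay missing.
neg = -arr
absolute = abs(arr)

print(neg)
print(absolute)
```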
These are bug fixes that might have notable behavior changes.
combine_first() will now preserve dtypes (GH7509)
In [3]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])

In [4]: df1
Out[4]:
   A  B
0  1  1
1  2  2
2  3  3

In [5]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])

In [6]: df2
Out[6]:
   B  C
2  4  1
3  5  2
4  6  3

In [7]: combined = df1.combine_first(df2)
pandas 1.2.x
In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object
pandas 1.3.0
In [8]: combined.dtypes
Out[8]:
A    float64
B      int64
C    float64
dtype: object
When setting an entire column using loc or iloc, pandas will try to insert the values into the existing data rather than create an entirely new array.
In [9]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [10]: values = df.values

In [11]: new = np.array([5, 6, 7], dtype="int64")

In [12]: df.loc[[0, 1, 2], "A"] = new
In both the new and old behavior, the data in values is overwritten, but in the old behavior the dtype of df["A"] changed to int64.
pandas 1.2.x
In [1]: df.dtypes
Out[1]:
A    int64
dtype: object

In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False

In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False
In pandas 1.3.0, df continues to share data with values:
In [13]: df.dtypes
Out[13]:
A    float64
dtype: object

In [14]: np.shares_memory(df["A"], new)
Out[14]: False

In [15]: np.shares_memory(df["A"], values)
Out[15]: True
Setting non-boolean values into a Series with dtype=bool now consistently casts to dtype=object (GH38709)
In [16]: orig = pd.Series([True, False])

In [17]: ser = orig.copy()

In [18]: ser.iloc[1] = np.nan

In [19]: ser2 = orig.copy()

In [20]: ser2.iloc[1] = 2.0
pandas 1.2.x
In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object
pandas 1.3.0
In [21]: ser
Out[21]:
0    True
1     NaN
dtype: object

In [22]: ser2
Out[22]:
0    True
1     2.0
dtype: object
Some minimum supported versions of dependencies were updated. If installed, we now require:
Package            Minimum Version    Required    Changed
numpy              1.16.5             X
pytz               2017.3
python-dateutil    2.7.3
bottleneck         1.2.1
numexpr            2.6.8
pytest (dev)       5.0.1
mypy (dev)         0.800
setuptools         38.6.0
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
Package            Minimum Version
beautifulsoup4     4.6.0
fastparquet        0.3.2
fsspec             0.7.4
gcsfs              0.6.0
lxml               4.3.0
matplotlib         2.2.3
numba              0.46.0
openpyxl           3.0.0
pyarrow            0.15.0
pymysql            0.7.11
pytables           3.5.1
s3fs               0.4.0
scipy              1.2.0
sqlalchemy         1.2.8
tabulate           0.8.7
xarray             0.12.0
xlrd
xlsxwriter         1.0.2
xlwt               1.3.0
pandas-gbq
See Dependencies and Optional dependencies for more.
Partially initialized CategoricalDtype objects (i.e. those with categories=None) will no longer compare as equal to fully initialized dtype objects.
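A minimal sketch of the comparison described above, assuming pandas 1.3.0 or later. The category labels are arbitrary:

```python
from pandas import CategoricalDtype

partial = CategoricalDtype()                   # categories=None
full = CategoricalDtype(categories=["a", "b"])

# A partially initialized dtype no longer equals a fully initialized one.
equal_to_full = partial == full

# It still equals another partially initialized dtype and the string alias.
equal_to_partial = partial == CategoricalDtype()
equal_to_alias = partial == "category"

print(equal_to_full, equal_to_partial, equal_to_alias)
```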
Accessing _constructor_expanddim on a DataFrame and _constructor_sliced on a Series now raise an AttributeError. Previously a NotImplementedError was raised (GH38782)
Deprecated allowing scalars to be passed to the Categorical constructor (GH38433)
Deprecated allowing subclass-specific keyword arguments in the Index constructor, use the specific subclass directly instead (GH14093, GH21311, GH22315, GH26974)
Deprecated astype of datetimelike (timedelta64[ns], datetime64[ns], Datetime64TZDtype, PeriodDtype) to integer dtypes, use values.view(...) instead (GH38544)
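The suggested replacement can be sketched as follows. The dates are arbitrary, and the array is built with an explicit nanosecond unit so that the integers are nanoseconds since the epoch:

```python
import numpy as np
import pandas as pd

raw = np.array(["2021-01-01", "2021-01-02"], dtype="datetime64[ns]")
ser = pd.Series(raw)

# Reinterpret the underlying datetime64[ns] buffer as int64
# instead of calling ser.astype("int64").
as_int = ser.values.view("int64")

print(as_int)
```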
Deprecated MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth(), use MultiIndex.is_monotonic_increasing() instead (GH32259)
Deprecated keyword try_cast in Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask(); cast results manually if desired (GH38836)
Deprecated comparison of Timestamp object with datetime.date objects. Instead of e.g. ts <= mydate use ts <= pd.Timestamp(mydate) or ts.date() <= mydate (GH36131)
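Both suggested alternatives, written out with arbitrary example values:

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2021-03-01 12:00")
mydate = datetime.date(2021, 3, 2)

# Option 1: promote the date to a Timestamp (midnight) and compare Timestamps.
cmp_as_timestamps = ts <= pd.Timestamp(mydate)

# Option 2: drop the time component and compare plain dates.
cmp_as_dates = ts.date() <= mydate

print(cmp_as_timestamps, cmp_as_dates)
```

The two options can disagree when the dates match but the Timestamp has a non-midnight time, so the choice depends on which semantics are intended.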
Deprecated Rolling.win_type returning "freq" (GH38963)
Deprecated Rolling.is_datetimelike (GH38963)
Deprecated DataFrame indexer for Series.__setitem__() and DataFrame.__setitem__() (GH39004)
Deprecated core.window.ewm.ExponentialMovingWindow.vol() (GH39220)
Using .astype to convert between datetime64[ns] dtype and DatetimeTZDtype is deprecated and will raise in a future version, use obj.tz_localize or obj.dt.tz_localize instead (GH38622)
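A short sketch of the recommended replacement for a Series. The timestamps and the choice of UTC are illustrative assumptions:

```python
import pandas as pd

ser = pd.Series(pd.to_datetime(["2021-01-01 00:00", "2021-06-01 12:00"]))

# Attach a timezone to naive values instead of ser.astype("datetime64[ns, UTC]").
aware = ser.dt.tz_localize("UTC")

# Remove the timezone again instead of astype back to a naive dtype.
naive = aware.dt.tz_localize(None)

print(aware.dtype, naive.dtype)
```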
Deprecated casting datetime.date objects to datetime64 when used as fill_value in DataFrame.unstack(), DataFrame.shift(), Series.shift(), and DataFrame.reindex(), pass pd.Timestamp(dateobj) instead (GH39767)
Deprecated allowing partial failure in Series.transform() and DataFrame.transform() when func is list-like or dict-like; will raise if any function fails on a column in a future version (GH40211)
Performance improvement in IntervalIndex.isin() (GH38353)
Performance improvement in Series.mean() for nullable data types (GH34814)
Performance improvement in Series.isin() for nullable data types (GH38340)
Performance improvement in DataFrame.corr() for method=kendall (GH28329)
Performance improvement in core.window.rolling.Rolling.corr() and core.window.rolling.Rolling.cov() (GH39388)
Performance improvement in core.window.rolling.RollingGroupby.corr(), core.window.expanding.ExpandingGroupby.corr() and core.window.expanding.ExpandingGroupby.cov() (GH39591)
Performance improvement in unique() for object data type (GH37615)
Performance improvement in pd.json_normalize() for basic cases (including separators) (GH40035, GH15621)
Performance improvement in core.window.rolling.ExpandingGroupby aggregation methods (GH39664)
Performance improvement in Styler, where render times are reduced by more than 50% (GH39972, GH39952)
Performance improvement in core.window.ewm.ExponentialMovingWindow.mean() with times (GH39784)
Performance improvement in GroupBy.apply() when requiring the python fallback implementation (GH40176)
Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data is passed (GH38614)
Bug in CategoricalIndex.reindex failing when passed an Index whose elements are all in the categories (GH28690)
Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH38552)
Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH38857)
Bug in DataFrame.reindex() was throwing IndexError when new index contained duplicates and old index was CategoricalIndex (GH38906)
Bug in setting categorical values into an object-dtype column in a DataFrame (GH39136)
Bug in DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (resp. Timedelta) data, with dtype=datetime64[ns] (resp. timedelta64[ns]) (GH38032)
Bug in DataFrame.first() and Series.first() returning two months for offset one month when first day is last calendar day (GH29623)
Bug in constructing a DataFrame or Series with mismatched datetime64 data and timedelta64 dtype, or vice-versa, failing to raise TypeError (GH38575, GH38764, GH38792)
Bug in constructing a Series or DataFrame with a datetime object out of bounds for datetime64[ns] dtype or a timedelta object out of bounds for timedelta64[ns] dtype (GH38792, GH38965)
Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH38741)
Bug in Series.where() incorrectly casting datetime64 values to int64 (GH37682)
Bug in Categorical incorrectly typecasting datetime object to Timestamp (GH38878)
Bug in comparisons between Timestamp object and datetime64 objects just outside the implementation bounds for nanosecond datetime64 (GH39221)
Bug in Timestamp.round(), Timestamp.floor(), Timestamp.ceil() for values near the implementation bounds of Timestamp (GH39244)
Bug in Timedelta.round(), Timedelta.floor(), Timedelta.ceil() for values near the implementation bounds of Timedelta (GH38964)
Bug in date_range() incorrectly creating DatetimeIndex containing NaT instead of raising OutOfBoundsDatetime in corner cases (GH24124)
Bug in infer_freq() incorrectly failing to infer ‘H’ frequency of DatetimeIndex if the latter has a timezone and crosses DST boundaries (GH39556)
Bug in constructing Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] (GH38965)
Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH39462)
Bug in constructing Timedelta from an input string with only symbols and no digits failing to raise an error (GH39710)
Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converting to timedelta64[ns] (GH40008)
Bug in different tzinfo objects representing UTC not being treated as equivalent (GH39216)
Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other UTC-representing tzinfos (GH39276)
Bug in DataFrame.quantile(), DataFrame.sort_values() causing incorrect subsequent indexing behavior (GH38351)
DataFrame.select_dtypes() with include=np.number now retains numeric ExtensionDtype columns (GH35340)
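A small sketch with a made-up frame that mixes a nullable integer column, a float column and a string column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "i": pd.array([1, 2], dtype="Int64"),  # nullable extension dtype
        "f": [1.5, 2.5],
        "s": ["x", "y"],
    }
)

# The nullable Int64 column is kept alongside the plain float column.
numeric = df.select_dtypes(include=np.number)

print(list(numeric.columns))
```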
Bug in DataFrame.mode() and Series.mode() not keeping consistent integer Index for empty input (GH33321)
Bug in DataFrame.rank() with np.inf and mixture of np.nan and np.inf (GH32593)
Bug in DataFrame.rank() with axis=0 and columns holding incomparable types raising IndexError (GH38932)
Bug in select_dtypes() behaving differently between Windows and Linux with include="int" (GH36569)
Bug in DataFrame.apply() and DataFrame.agg() when passed argument func="size" would operate on the entire DataFrame instead of rows or columns (GH39934)
Bug in DataFrame.transform() would raise SpecificationError when passed a dictionary and columns were missing; will now raise a KeyError instead (GH40004)
Series.to_dict() with orient='records' now returns Python native types (GH25969)
Bug in Series.view() and Index.view() when converting between datetime-like (datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH39788)
Bug in creating a DataFrame from an empty np.recarray not retaining the original dtypes (GH40121)
Bug in DataFrame failing to raise TypeError when constructing from a frozenset (GH40163)
Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH38653, GH38741)
Bug in IntervalIndex.intersection() returning duplicates when at least one of the Indexes has duplicates which are present in the other (GH38743)
IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference(), and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising TypeError when operating with another IntervalIndex with incompatible dtype (GH39267)
PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference(), PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with incompatible dtype (GH??)
Bug in Index.union() dropping duplicate Index values when Index was not monotonic or sort was set to False (GH36289, GH31326)
Bug in CategoricalIndex.get_indexer() failing to raise InvalidIndexError when non-unique (GH38372)
Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH38380)
Bug in DataFrame.__setitem__() raising ValueError when setting multiple values to duplicate columns (GH15695)
Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returning incorrect elements for non-monotonic DatetimeIndex for string slices (GH33146)
Bug in DataFrame.reindex() and Series.reindex() with timezone aware indexes raising TypeError for method="ffill" and method="bfill" and specified tolerance (GH38566)
Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value requires casting to object dtype (GH39755)
Bug in DataFrame.__setitem__() raising ValueError with empty DataFrame and specified columns for string indexer and non empty DataFrame to set (GH38831)
Bug in DataFrame.loc.__setitem__() raising ValueError when expanding unique column for DataFrame with duplicate columns (GH38521)
Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() with mixed dtypes when setting with a dictionary value (GH38335)
Bug in DataFrame.__setitem__() not raising ValueError when right hand side is a DataFrame with wrong number of columns (GH38604)
Bug in Series.__setitem__() raising ValueError when setting a Series with a scalar indexer (GH38303)
Bug in DataFrame.loc() dropping levels of MultiIndex when DataFrame used as input has only one row (GH10521)
Bug in DataFrame.__getitem__() and Series.__getitem__() always raising KeyError when slicing with existing strings on an Index with milliseconds (GH33589)
Bug in setting timedelta64 or datetime64 values into numeric Series failing to cast to object dtype (GH39086, GH39619)
Bug in setting Interval values into a Series or DataFrame with mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH39120)
Bug in setting datetime64 values into a Series with integer-dtype incorrectly casting the datetime64 values to integers (GH39266)
Bug in setting np.datetime64("NaT") into a Series with Datetime64TZDtype incorrectly treating the timezone-naive value as timezone-aware (GH39769)
Bug in Index.get_loc() not raising KeyError when method is specified for NaN value when NaN is not in Index (GH39382)
Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a timezone-aware index incorrectly treating the timezone-naive value as timezone-aware (GH39769)
Bug in incorrectly raising in Index.insert(), when setting a new column that cannot be held in the existing frame.columns, or in Series.reset_index() or DataFrame.reset_index() instead of casting to a compatible dtype (GH39068)
Bug in RangeIndex.append() where a single object of length 1 was concatenated incorrectly (GH39401)
Bug in setting numpy.timedelta64 values into an object-dtype Series using a boolean indexer (GH39488)
Bug in setting numeric values into a boolean-dtype Series using at or iat failing to cast to object-dtype (GH39582)
Bug in DataFrame.loc.__setitem__() when setting-with-expansion incorrectly raising when the index in the expanding axis contains duplicates (GH40096)
Grouper now correctly propagates the dropna argument, and DataFrameGroupBy.transform() now correctly handles missing values for dropna=True (GH35612)
Bug in isna(), Series.isna(), Index.isna(), DataFrame.isna() (and the corresponding notna functions) not recognizing Decimal("NaN") objects (GH39409)
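A quick sketch of the fixed behavior, assuming pandas 1.3.0 or later. The Decimal values are arbitrary:

```python
from decimal import Decimal

import pandas as pd

# A scalar Decimal NaN is recognized as missing.
scalar_is_na = pd.isna(Decimal("NaN"))

# The same holds elementwise for an object-dtype Series of Decimals.
ser = pd.Series([Decimal("1.5"), Decimal("NaN")], dtype=object)
mask = ser.isna()

print(scalar_is_na, mask.tolist())
```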
Bug in DataFrame.drop() raising TypeError when MultiIndex is non-unique and level is not provided (GH36293)
Bug in MultiIndex.intersection() duplicating NaN in result (GH38623)
Bug in MultiIndex.equals() incorrectly returning True for MultiIndexes containing NaN even when they are differently ordered (GH38439)
Bug in MultiIndex.intersection() always returning empty when intersecting with CategoricalIndex (GH38653)
Bug in Index.__repr__() when display.max_seq_items=1 (GH38415)
Bug in read_csv() not recognizing scientific notation if decimal is set for engine="python" (GH31920)
Bug in read_csv() interpreting an NA value as a comment when the NA value contains the comment string, for engine="python" (GH34002)
Bug in read_csv() raising IndexError with multiple header columns and index_col specified when file has no data rows (GH38292)
Bug in read_csv() not accepting usecols with different length than names for engine="python" (GH16469)
Bug in read_csv() returning object dtype when delimiter="," with usecols and parse_dates specified for engine="python" (GH35873)
Bug in read_csv() raising TypeError when names and parse_dates are specified for engine="c" (GH33699)
Bug in read_clipboard(), DataFrame.to_clipboard() not working in WSL (GH38527)
Allow custom error values for parse_dates argument of read_sql(), read_sql_query() and read_sql_table() (GH35185)
Bug in to_hdf() raising KeyError when trying to apply for subclasses of DataFrame or Series (GH33748)
Bug in put() raising a wrong TypeError when saving a DataFrame with non-string dtype (GH34274)
Bug in json_normalize() resulting in the first element of a generator object not being included in the returned DataFrame (GH35923)
Bug in read_csv() applying thousands separator to date columns when column should be parsed for dates and usecols is specified for engine="python" (GH39365)
Bug in read_excel() forward filling MultiIndex names with multiple header and index columns specified (GH34673)
read_excel() now respects set_option() (GH34252)
Bug in read_csv() not switching true_values and false_values for nullable boolean dtype (GH34655)
Bug in read_json() when orient="split" does not maintain numeric string index (GH28556)
read_sql() returned an empty generator if chunksize was non-zero and the query returned no results. It now returns a generator with a single empty DataFrame (GH34411)
Bug in read_hdf() returning unexpected records when filtering on categorical string columns using where parameter (GH39189)
Bug in read_sas() raising ValueError when datetimes were null (GH39725)
Comparisons of Period objects or Index, Series, or DataFrame with mismatched PeriodDtype now behave like other mismatched-type comparisons, returning False for equals, True for not-equal, and raising TypeError for inequality checks (GH39274)
Bug in scatter_matrix() raising when 2d ax argument passed (GH16253)
Prevent warnings when matplotlib’s constrained_layout is enabled (GH25261)
Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH38254)
Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical series were not tallied (GH38672)
Bug in SeriesGroupBy.value_counts() where error was raised on an empty series (GH39172)
Bug in GroupBy.indices() would contain non-existent indices when null values were present in the groupby keys (GH9304)
Fixed bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() causing loss of precision; they now use Kahan summation (GH38778)
Fixed bug in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean() and SeriesGroupBy.mean() causing loss of precision; they now use Kahan summation (GH38934)
Bug in Resampler.aggregate() and DataFrame.transform() raising TypeError instead of SpecificationError when missing keys had mixed dtypes (GH39025)
Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH38733)
Bug in Series.resample() would raise when the index was a PeriodIndex consisting of NaT (GH39227)
Bug in core.window.rolling.RollingGroupby.corr() and core.window.expanding.ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing other that was longer than each group (GH39591)
Bug in core.window.expanding.ExpandingGroupby.corr() and core.window.expanding.ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing other that was longer than each group (GH39591)
Bug in GroupBy.mean(), GroupBy.median() and DataFrame.pivot_table() not propagating metadata (GH28283)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly when window is an offset and dates are in descending order (GH40002)
Bug in SeriesGroupBy and DataFrameGroupBy on an empty Series or DataFrame would lose index, columns, and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod, and skew or using them through apply, aggregate, or resample (GH26411)
Bug in DataFrameGroupBy.apply() where a MultiIndex would be created instead of an Index if a core.window.rolling.RollingGroupby object was created (GH39732)
Bug in DataFrameGroupBy.sample() where error was raised when weights was specified and the index was an Int64Index (GH39927)
Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() would sometimes raise SpecificationError when passed a dictionary and columns were missing; will now always raise a KeyError instead (GH40004)
Bug in DataFrameGroupBy.sample() where column selection was not applied to sample result (GH39928)
Bug in core.window.ewm.ExponentialMovingWindow when calling __getitem__ would incorrectly raise a ValueError when providing times (GH40164)
Bug in core.window.ewm.ExponentialMovingWindow when calling __getitem__ would not retain com, span, alpha or halflife attributes (GH40164)
Bug in merge() raising error when performing an inner join with partial index and right_index when no overlap between indices (GH33814)
Bug in DataFrame.unstack() with missing levels led to incorrect index names (GH37510)
Bug in join() over MultiIndex returning the wrong result when one of the indexes had only one level (GH36909)
merge_asof() raises ValueError instead of cryptic TypeError in case of non-numerical merge columns (GH29130)
Bug in DataFrame.join() not assigning values correctly when having a MultiIndex where at least one dimension has dtype Categorical with non-alphabetically sorted categories (GH38502)
Series.value_counts() and Series.mode() return consistent keys in original order (GH12679, GH11227 and GH39007)
Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH39481)
Bug in DataFrame.apply() giving incorrect results when used with a string argument and axis=1 where the axis argument was not supported; it now raises a ValueError instead (GH39211)
Bug in DataFrame.sort_values() not reshaping index correctly after sorting on columns, when ignore_index=True (GH39464)
Bug in DataFrame.append() returning incorrect dtypes with combinations of ExtensionDtype dtypes (GH39454)
Bug in DataFrame.append() returning incorrect dtypes with combinations of datetime64 and timedelta64 dtypes (GH39574)
Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH13483)
Allow Index to be passed to the numpy.all() function (GH40180)
Bug in DataFrame.sparse.to_coo() raising KeyError with columns that are a numeric Index without a 0 (GH18414)
Bug in SparseArray.astype() with copy=False producing incorrect results when going from integer dtype to floating dtype (GH34456)
Bug in DataFrame.where() when other is a Series with ExtensionArray dtype (GH38729)
Fixed bug where Series.idxmax(), Series.idxmin() and argmax/min fail when the underlying data is ExtensionArray (GH32749, GH33719, GH36566)
Bug in Index constructor sometimes silently ignoring a specified dtype (GH38879)
Bug in pandas.api.types.infer_dtype() not recognizing Series, Index or array with a period dtype (GH23553)
Bug in pandas.api.types.infer_dtype() raising an error for general ExtensionArray objects. It will now return "unknown-array" instead of raising (GH37367)
Bug in constructing a Series from a list and a PandasDtype (GH39357)
Bug in Styler which caused CSS to duplicate on multiple renders. (GH39395)
inspect.getmembers(Series) no longer raises an AbstractMethodError (GH38782)
Bug in Series.where() with numeric dtype and other = None not casting to nan (GH39761)
Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH39412)
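The equivalence stated above can be checked with a small sketch. The index values, mask and replacement value are arbitrary:

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3, 4])
mask = np.array([True, False, True, False])

# where keeps values where mask is True and substitutes `other` elsewhere.
via_where = idx.where(mask, 0)

# putmask substitutes where its mask is True, so the mask is inverted.
via_putmask = idx.putmask(~mask, 0)

print(via_where.tolist(), via_putmask.tolist())
```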
Bug in pandas.testing.assert_series_equal(), pandas.testing.assert_frame_equal(), pandas.testing.assert_index_equal() and pandas.testing.assert_extension_array_equal() incorrectly raising when an attribute has an unrecognized NA type (GH39461)
Bug in Styler where subset arg in methods raised an error for some valid multiindex slices (GH33562)
Minor alterations to Styler rendered HTML output to support the w3 good code standard (GH39626)
Bug in Styler where rendered HTML was missing a column class identifier for certain header cells (GH39716)
Bug in Styler.background_gradient() where text-color was not determined correctly (GH39888)
Bug in Styler where multiple elements in CSS-selectors were not correctly added to table_styles (GH39942)
Bug in DataFrame.equals(), Series.equals(), Index.equals() with object-dtype containing np.datetime64("NaT") or np.timedelta64("NaT") (GH39650)
Bug in pandas.util.show_versions() where console JSON output was not proper JSON (GH39701)