What’s new in 3.0.0 (Month XX, 2025)#

These are the changes in pandas 3.0.0. See Release notes for a full changelog including other versions of pandas.

Enhancements#

Dedicated string data type by default#

Historically, pandas represented string columns with NumPy object data type. This representation has numerous problems: it is not specific to strings (any Python object can be stored in an object-dtype array, not just strings) and it is often not very efficient (both performance wise and for memory usage).

Starting with pandas 3.0, a dedicated string data type is enabled by default (backed by PyArrow under the hood, if installed, otherwise falling back to being backed by NumPy object-dtype). This means that pandas will start inferring columns containing string data as the new str data type when creating pandas objects, such as in constructors or IO functions.

Old behavior:

>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: object

New behavior:

>>> ser = pd.Series(["a", "b"])
>>> ser
0    a
1    b
dtype: str

The string data type that is used in these scenarios will mostly behave as NumPy object would, including missing value semantics and general operations on these columns.

The main characteristics of the new string data type are:

  • Inferred by default for string data (instead of object dtype)

  • The str dtype can only hold strings (or missing values), in contrast to object dtype; setting a non-string value raises an error.

  • The missing value sentinel is always NaN (np.nan) and follows the same missing value semantics as the other default dtypes.

These intentional changes can have breaking consequences, for example when checking whether .dtype is object dtype or when checking for the exact missing value sentinel. See the Migration guide for the new string data type (pandas 3.0) for more details on the behaviour changes and how to adapt your code to the new default.
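
For illustration, a minimal sketch of the second and third points above (error output abbreviated; the exact message may differ):

>>> ser = pd.Series(["a", "b"])
>>> ser[0] = 1          # non-string values are rejected by the str dtype
TypeError: ...
>>> ser[0] = None       # missing values are allowed and stored as NaN
>>> ser[0]
nan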

Copy-on-Write#

The new “copy-on-write” behaviour in pandas 3.0 changes how pandas operates with respect to copies and views. A summary of the changes:

  1. The result of any indexing operation (subsetting a DataFrame or Series in any way, including accessing a DataFrame column as a Series) or of any method returning a new DataFrame or Series always behaves as a copy from the perspective of the user API.

  2. As a consequence, if you want to modify an object (DataFrame or Series), the only way to do this is to directly modify that object itself.

The main goal of this change is to make the user API more consistent and predictable. There is now a clear rule: any subset or returned series/dataframe always behaves as a copy of the original, and thus never modifies the original (before pandas 3.0, whether a derived object would be a copy or a view depended on the exact operation performed, which was often confusing).

Because every single indexing step now behaves as a copy, “chained assignment” (updating a DataFrame with multiple sequential setitem steps) no longer works. Since it now consistently never works, the SettingWithCopyWarning has been removed.
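
For example, a minimal sketch of chained assignment having no effect, together with the recommended single-step alternative:

>>> df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
>>> df["foo"][df["bar"] > 4] = 100        # chained assignment: only a temporary copy is modified, df is unchanged (a warning may be emitted)
>>> df.loc[df["bar"] > 4, "foo"] = 100    # modify df directly in a single setitem step instead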

The new behavioral semantics are explained in more detail in the user guide about Copy-on-Write.

A secondary goal is to improve performance by avoiding unnecessary copies. As mentioned above, every new DataFrame or Series returned from an indexing operation or method behaves as a copy, but under the hood pandas will use views as much as possible, and only copy when needed to guarantee the “behaves as a copy” behaviour (this is the actual “copy-on-write” mechanism used as an implementation detail).

Some of the behaviour changes described above are breaking changes in pandas 3.0. When upgrading to pandas 3.0, it is recommended to first upgrade to pandas 2.3 to get deprecation warnings for a subset of those changes. The migration guide explains the upgrade process in more detail.

Setting the option mode.copy_on_write no longer has any impact. The option is deprecated and will be removed in pandas 4.0.

pd.col syntax can now be used in DataFrame.assign() and DataFrame.loc()#

You can now use pd.col to create callables for use in dataframe methods which accept them. For example, if you have a dataframe

In [1]: df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})

and you want to create a new column 'c' by summing 'a' and 'b', then instead of

In [2]: df.assign(c = lambda df: df['a'] + df['b'])
Out[2]: 
   a  b  c
0  1  4  5
1  1  5  6
2  2  6  8

you can now write:

In [3]: df.assign(c = pd.col('a') + pd.col('b'))
Out[3]: 
   a  b  c
0  1  4  5
1  1  5  6
2  2  6  8
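
The same expressions can also be used for row selection with DataFrame.loc, since pd.col expressions act as callables evaluated against the dataframe (a sketch; the output shown is indicative):

In [4]: df.loc[pd.col('b') > 4]
Out[4]:
   a  b
1  1  5
2  2  6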

New Deprecation Policy#

pandas 3.0.0 introduces a new 3-stage deprecation policy: using DeprecationWarning initially, then switching to FutureWarning for broader visibility in the last minor version before the next major release, and then removal of the deprecated functionality in the major release. This was done to give downstream packages more time to adjust to pandas deprecations, which should reduce the amount of warnings that a user gets from code that isn’t theirs. See PDEP 17 for more details.

All warnings for upcoming changes in pandas will have the base class pandas.errors.PandasChangeWarning. Users may also use the following subclasses to control warnings.
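
For example, to silence every such warning at once (a minimal sketch using the base class mentioned above):

>>> import warnings
>>> warnings.filterwarnings("ignore", category=pd.errors.PandasChangeWarning)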

Other enhancements#

Notable bug fixes#

These are bug fixes that might have notable behavior changes.

Improved behavior in groupby for observed=False#

A number of bugs have been fixed due to improved handling of unobserved groups. All remarks in this section equally impact SeriesGroupBy. (GH 55738)

In previous versions of pandas, a single grouping with DataFrameGroupBy.apply() or DataFrameGroupBy.agg() would pass the unobserved groups to the provided function, correctly resulting in 0 below.

In [4]: df = pd.DataFrame(
   ...:     {
   ...:         "key1": pd.Categorical(list("aabb"), categories=list("abc")),
   ...:         "key2": [1, 1, 1, 2],
   ...:         "values": [1, 2, 3, 4],
   ...:     }
   ...: )
   ...: 

In [5]: df
Out[5]: 
  key1  key2  values
0    a     1       1
1    a     1       2
2    b     1       3
3    b     2       4

In [6]: gb = df.groupby("key1", observed=False)

In [7]: gb[["values"]].apply(lambda x: x.sum())
Out[7]: 
      values
key1        
a          3
b          7
c          0

However, this was not the case when using multiple groupings, resulting in NaN below.

In [1]: gb = df.groupby(["key1", "key2"], observed=False)
In [2]: gb[["values"]].apply(lambda x: x.sum())
Out[2]:
           values
key1 key2
a    1        3.0
     2        NaN
b    1        3.0
     2        4.0
c    1        NaN
     2        NaN

Now using multiple groupings will also pass the unobserved groups to the provided function.

In [8]: gb = df.groupby(["key1", "key2"], observed=False)

In [9]: gb[["values"]].apply(lambda x: x.sum())
Out[9]: 
           values
key1 key2        
a    1          3
     2          0
b    1          3
     2          4
c    1          0
     2          0

Similarly:

These improvements also fixed certain bugs in groupby:

Backwards incompatible API changes#

Datetime resolution inference#

Converting a sequence of strings, datetime objects, or np.datetime64 objects to a datetime64 dtype now performs inference on the appropriate resolution (AKA unit) for the output dtype. This affects Series, DataFrame, Index, DatetimeIndex, and to_datetime().

Previously, these would always give nanosecond resolution:

In [1]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()
In [2]: pd.to_datetime([dt]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.Index([dt]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.DatetimeIndex([dt]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.Series([dt]).dtype
Out[5]: dtype('<M8[ns]')

This now infers the microsecond unit “us” from the pydatetime object, matching the scalar Timestamp behavior.

In [10]: dt = pd.Timestamp("2024-03-22 11:36").to_pydatetime()

In [11]: pd.to_datetime([dt]).dtype
Out[11]: dtype('<M8[us]')

In [12]: pd.Index([dt]).dtype
Out[12]: dtype('<M8[us]')

In [13]: pd.DatetimeIndex([dt]).dtype
Out[13]: dtype('<M8[us]')

In [14]: pd.Series([dt]).dtype
Out[14]: dtype('<M8[us]')

Similarly, when passing a sequence of np.datetime64 objects, the resolution of the passed objects will be retained (for inputs with lower-than-second resolution, second resolution will be used).

When passing strings, the resolution will depend on the precision of the string, again matching the Timestamp behavior. Previously:

In [2]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
Out[2]: dtype('<M8[ns]')
In [3]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
Out[3]: dtype('<M8[ns]')
In [4]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
Out[4]: dtype('<M8[ns]')
In [5]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
Out[5]: dtype('<M8[ns]')

The inferred resolution now matches that of the input strings for nanosecond-precision strings, otherwise defaulting to microseconds:

In [15]: pd.to_datetime(["2024-03-22 11:43:01"]).dtype
Out[15]: dtype('<M8[us]')

In [16]: pd.to_datetime(["2024-03-22 11:43:01.002"]).dtype
Out[16]: dtype('<M8[us]')

In [17]: pd.to_datetime(["2024-03-22 11:43:01.002003"]).dtype
Out[17]: dtype('<M8[us]')

In [18]: pd.to_datetime(["2024-03-22 11:43:01.002003004"]).dtype
Out[18]: dtype('<M8[ns]')

This is also a change for the Timestamp constructor with a string input: in version 2.x it could give second or millisecond unit depending on the precision of the string, which users generally disliked (GH 52653).

In cases with mixed-resolution inputs, the highest resolution is used:

In [2]: pd.to_datetime([pd.Timestamp("2024-03-22 11:43:01"), "2024-03-22 11:43:01.002"]).dtype
Out[2]: dtype('<M8[ns]')

Warning

Many users will now get “M8[us]” dtype data in cases when they used to get “M8[ns]”. For most use cases they should not notice a difference. One big exception is converting to integers, which will give integers 1000x smaller.

Similarly, the Timedelta constructor and to_timedelta() with a string input now defaults to a microsecond unit, using nanosecond unit only in cases that actually have nanosecond precision.
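
A short sketch of the new Timedelta defaults (outputs indicative):

>>> pd.Timedelta("1 day").unit
'us'
>>> pd.to_timedelta(["1 day 00:00:00.000000001"]).dtype   # nanosecond precision is preserved
dtype('<m8[ns]')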

concat() no longer ignores sort when all objects have a DatetimeIndex#

When all objects passed to concat() have a DatetimeIndex, passing sort=False will now result in the non-concatenation axis not being sorted. Previously, the result would always be sorted along the non-concatenation axis even when sort=False is passed. (GH 57335)

If you do not specify the sort argument, pandas will continue to return a sorted result but this behavior is deprecated and you will receive a warning. In order to make this less noisy for users, pandas checks if not sorting would impact the result and only warns when it would. This check can be expensive, and users can skip the check by explicitly specifying sort=True or sort=False.

This deprecation can also impact pandas’ internal usage of concat(). Here cases where concat() was sorting a DatetimeIndex but not other indexes are considered bugs and have been fixed as noted below. However it is possible some have been missed. In order to be cautious here, pandas has not added sort=False to any internal calls where we believe behavior should not change. If we have missed something, users will not experience a behavior change but they will receive a warning about concat() even though they are not directly calling this function. If this does occur, we ask users to open an issue so that we may address any potential behavior changes.

In [19]: idx1 = pd.date_range("2025-01-02", periods=3, freq="h")

In [20]: df1 = pd.DataFrame({"a": [1, 2, 3]}, index=idx1)

In [21]: df1
Out[21]: 
                     a
2025-01-02 00:00:00  1
2025-01-02 01:00:00  2
2025-01-02 02:00:00  3

In [22]: idx2 = pd.date_range("2025-01-01", periods=3, freq="h")

In [23]: df2 = pd.DataFrame({"b": [1, 2, 3]}, index=idx2)

In [24]: df2
Out[24]: 
                     b
2025-01-01 00:00:00  1
2025-01-01 01:00:00  2
2025-01-01 02:00:00  3

Old behavior

In [3]: pd.concat([df1, df2], axis=1, sort=False)
Out[3]:
                       a    b
2025-01-01 00:00:00  NaN  1.0
2025-01-01 01:00:00  NaN  2.0
2025-01-01 02:00:00  NaN  3.0
2025-01-02 00:00:00  1.0  NaN
2025-01-02 01:00:00  2.0  NaN
2025-01-02 02:00:00  3.0  NaN

New behavior

In [25]: pd.concat([df1, df2], axis=1, sort=False)
Out[25]: 
                       a    b
2025-01-02 00:00:00  1.0  NaN
2025-01-02 01:00:00  2.0  NaN
2025-01-02 02:00:00  3.0  NaN
2025-01-01 00:00:00  NaN  1.0
2025-01-01 01:00:00  NaN  2.0
2025-01-01 02:00:00  NaN  3.0

Cases where pandas’ internal usage of concat() resulted in inconsistent sorting that are now fixed in this release are as follows.

Changed behavior in DataFrame.value_counts() and DataFrameGroupBy.value_counts() when sort=False#

In previous versions of pandas, DataFrame.value_counts() with sort=False would sort the result by row labels (as was documented). This was nonintuitive and inconsistent with Series.value_counts() which would maintain the order of the input. Now DataFrame.value_counts() will maintain the order of the input. (GH 59745)

In [26]: df = pd.DataFrame(
   ....:     {
   ....:         "a": [2, 2, 2, 2, 1, 1, 1, 1],
   ....:         "b": [2, 1, 3, 1, 2, 3, 1, 1],
   ....:     }
   ....: )
   ....: 

In [27]: df
Out[27]: 
   a  b
0  2  2
1  2  1
2  2  3
3  2  1
4  1  2
5  1  3
6  1  1
7  1  1

Old behavior

In [3]: df.value_counts(sort=False)
Out[3]:
a  b
1  1    2
   2    1
   3    1
2  1    2
   2    1
   3    1
Name: count, dtype: int64

New behavior

In [28]: df.value_counts(sort=False)
Out[28]: 
a  b
2  2    1
   1    2
   3    1
1  2    1
   3    1
   1    2
Name: count, dtype: int64

This change also applies to DataFrameGroupBy.value_counts(). Here, there are two options for sorting: one sort passed to DataFrame.groupby() and one passed directly to DataFrameGroupBy.value_counts(). The former will determine whether to sort the groups, the latter whether to sort the counts. All non-grouping columns will maintain the order of the input within groups.

Old behavior

In [5]: df.groupby("a", sort=True).value_counts(sort=False)
Out[5]:
a  b
1  1    2
   2    1
   3    1
2  1    2
   2    1
   3    1
dtype: int64

New behavior

In [29]: df.groupby("a", sort=True).value_counts(sort=False)
Out[29]: 
a  b
1  2    1
   3    1
   1    2
2  2    1
   3    1
   1    2
Name: count, dtype: int64

Changed behavior of pd.offsets.Day to always represent calendar-day#

In previous versions of pandas, offsets.Day represented a fixed span of 24 hours, disregarding Daylight Savings Time transitions. It now consistently behaves as a calendar-day, preserving time-of-day across DST transitions. (GH 61985)

Old behavior

In [5]: ts = pd.Timestamp("2025-03-08 08:00", tz="US/Eastern")
In [6]: ts + pd.offsets.Day(1)
Out[3]: Timestamp('2025-03-09 09:00:00-0400', tz='US/Eastern')

New behavior

In [30]: ts = pd.Timestamp("2025-03-08 08:00", tz="US/Eastern")

In [31]: ts + pd.offsets.Day(1)
Out[31]: Timestamp('2025-03-09 08:00:00-0400', tz='US/Eastern')

This change fixes a long-standing bug in date_range() (GH 51716, GH 35388), but causes several small behavior differences as collateral:

  • pd.offsets.Day(n) no longer compares as equal to pd.offsets.Hour(24*n)

  • offsets.Day no longer supports division

  • Timedelta no longer accepts Day objects as inputs

  • tseries.frequencies.to_offset() on a Timedelta object returns an offsets.Hour object in cases where it used to return a Day object.

  • Adding or subtracting a scalar from a timezone-aware DatetimeIndex with a Day freq no longer preserves that freq attribute.

  • Adding or subtracting a Day with a Timedelta is no longer supported.

  • Adding or subtracting a Day offset to a timezone-aware Timestamp or datetime-like may lead to an ambiguous or non-existent time, which will raise.
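
For example, illustrating the first and third points above (error output abbreviated):

>>> pd.offsets.Day(1) == pd.offsets.Hour(24)
False
>>> pd.Timedelta(pd.offsets.Day(1))   # Day is no longer a valid Timedelta input
Traceback (most recent call last):
  ...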

Changed treatment of NaN values in pyarrow and numpy-nullable floating dtypes#

Previously, when dealing with a nullable dtype (e.g. Float64Dtype or int64[pyarrow]), NaN was treated as interchangeable with NA in some circumstances but not others. This was done to make adoption easier, but caused some confusion (GH 32265). In 3.0, this behaviour is made consistent: by default, NaN is treated as equivalent to NA in all cases.

By default, NaN can be passed to constructors, __setitem__, __contains__ and will be treated the same as NA. The only change users will see is that arithmetic and np.ufunc operations that previously introduced NaN entries produce NA entries instead.

Old behavior:

# NaN in input gets converted to NA
In [1]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())
In [2]: ser
Out[2]:
0     0.0
1    <NA>
dtype: Float64
# NaN produced by arithmetic (0/0) remained NaN
In [3]: ser / 0
Out[3]:
0     NaN
1    <NA>
dtype: Float64
# the NaN value is not considered as missing
In [4]: (ser / 0).isna()
Out[4]:
0    False
1     True
dtype: bool

New behavior:

In [32]: ser = pd.Series([0, np.nan], dtype=pd.Float64Dtype())

In [33]: ser
Out[33]: 
0     0.0
1    <NA>
dtype: Float64

In [34]: ser / 0
Out[34]: 
0    <NA>
1    <NA>
dtype: Float64

In [35]: (ser / 0).isna()
Out[35]: 
0    True
1    True
dtype: bool

In the future, the intention is to consider NaN and NA as distinct values, and an option to control this behaviour is added in 3.0 through pd.options.future.distinguish_nan_and_na. When enabled, NaN is always considered distinct and specifically as a floating-point value. As a consequence, it cannot be used with integer dtypes.

Old behavior:

In [2]: ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
In [3]: ser[1]
Out[3]: <NA>

New behavior:

In [36]: with pd.option_context("future.distinguish_nan_and_na", True):
   ....:     ser = pd.Series([1, np.nan], dtype=pd.Float64Dtype())
   ....:     print(ser[1])
   ....: 
nan

If we had passed pd.Int64Dtype() or "int64[pyarrow]" for the dtype in the latter example, this would raise, as a float NaN cannot be held by an integer dtype.

With "future.distinguish_nan_and_na" enabled, ser.to_numpy() (and frame.values and np.asarray(obj)) will convert to object dtype if NA entries are present, where before they would coerce to NaN. To retain a float numpy dtype, explicitly pass na_value=np.nan to Series.to_numpy().

Note that the option is experimental and subject to change in future releases.

The __module__ attribute now points to public modules#

The __module__ attribute on functions and classes in the public API has been updated to refer to the preferred public module from which to access the object, rather than the module in which the object happens to be defined (GH 55178).

This produces more informative displays in the Python console for classes, e.g., instead of <class 'pandas.core.frame.DataFrame'> you now see <class 'pandas.DataFrame'>, and in interactive tools such as IPython, e.g., instead of <function pandas.io.parsers.readers.read_csv(...)> you now see <function pandas.read_csv(...)>.

This may break code that relies on the previous __module__ values (e.g. doctests inspecting the type() of a DataFrame object).
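
A quick illustration of the new values (outputs indicative):

>>> pd.DataFrame.__module__
'pandas'
>>> pd.read_csv.__module__
'pandas'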

Increased minimum version for Python#

pandas 3.0.0 supports Python 3.11 and higher.

Increased minimum versions for dependencies#

Some minimum supported versions of dependencies were updated. The following required dependencies were updated:

Package    New Minimum Version
numpy      1.26.0
tzdata     2023.3

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package                   New Minimum Version
adbc-driver-postgresql    1.2.0
adbc-driver-sqlite        1.2.0
mypy (dev)                1.9.0
beautifulsoup4            4.12.3
bottleneck                1.4.2
fastparquet               2024.11.0
fsspec                    2024.10.0
hypothesis                6.116.0
gcsfs                     2024.10.0
Jinja2                    3.1.5
lxml                      5.3.0
matplotlib                3.9.3
numba                     0.60.0
numexpr                   2.10.2
qtpy                      2.4.2
openpyxl                  3.1.5
psycopg2                  2.9.10
pyarrow                   13.0.0
pymysql                   1.1.1
pyreadstat                1.2.8
pytables                  3.10.1
python-calamine           0.3.0
pytz                      2024.2
s3fs                      2024.10.0
SciPy                     1.14.1
sqlalchemy                2.0.36
xarray                    2024.10.0
xlsxwriter                3.2.0
zstandard                 0.23.0

See Dependencies and Optional dependencies for more.

pytz now an optional dependency#

pandas now uses zoneinfo from the standard library as the default timezone implementation when passing a timezone string to various methods. (GH 34916)

Old behavior:

In [1]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")
In [2]: ts.tz
<DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>

New behavior:

In [37]: ts = pd.Timestamp(2024, 1, 1).tz_localize("US/Pacific")

In [38]: ts.tz
Out[38]: zoneinfo.ZoneInfo(key='US/Pacific')

pytz timezone objects are still supported when passed directly, but they will no longer be returned by default from string inputs. Moreover, pytz is no longer a required dependency of pandas; it can be installed with the pip extra: pip install pandas[timezone].
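
For example, a pytz timezone object passed explicitly is still honoured (a sketch, assuming pytz is installed):

>>> import pytz
>>> pd.Timestamp(2024, 1, 1).tz_localize(pytz.timezone("US/Pacific")).tz
<DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>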

Additionally, pandas no longer throws pytz exceptions for timezone operations leading to ambiguous or nonexistent times. These cases will now raise a ValueError.

Other API changes#

  • 3rd party py.path objects are no longer explicitly supported in IO methods. Use pathlib.Path objects instead (GH 57091)

  • read_table()’s parse_dates argument defaults to None to improve consistency with read_csv() (GH 57476)

  • All classes inheriting from builtin tuple (including types created with collections.namedtuple()) are now hashed and compared as builtin tuple during indexing operations (GH 57922)

  • Made dtype a required argument in ExtensionArray._from_sequence_of_strings() (GH 56519)

  • Passing a Series input to json_normalize() will now retain the Series Index, previously output had a new RangeIndex (GH 51452)

  • Pickle and HDF (.h5) files created with Python 2 are no longer explicitly supported (GH 57387)

  • Pickled objects from pandas version less than 1.0.0 are no longer supported (GH 57155)

  • Removed Index.sort() which always raised a TypeError. This attribute is not defined and will raise an AttributeError (GH 59283)

  • Unused dtype argument has been removed from the MultiIndex constructor (GH 60962)

  • Updated DataFrame.to_excel() so that the output spreadsheet has no styling. Custom styling can still be done using Styler.to_excel() (GH 54154)

  • When comparing the indexes in testing.assert_series_equal(), check_exact defaults to True if an Index is of integer dtype. (GH 57386)

  • Index set operations (like union or intersection) will now ignore the dtype of an empty RangeIndex or empty Index with object dtype when determining the dtype of the resulting Index (GH 60797)

  • IncompatibleFrequency now subclasses TypeError instead of ValueError. As a result, joins with mismatched frequencies now cast to object like other non-comparable joins, and arithmetic with indexes with mismatched frequencies align (GH 55782)

  • Series “flex” methods like Series.add() no longer allow passing a DataFrame for other; use the DataFrame reversed method instead (GH 46179)

  • date_range() and timedelta_range() no longer default to unit="ns", instead will infer a unit from the start, end, and freq parameters. Explicitly specify a desired unit to override these (GH 59031)

  • CategoricalIndex.append() no longer attempts to cast different-dtype indexes to the caller’s dtype (GH 41626)

  • ExtensionDtype.construct_array_type() is now a regular method instead of a classmethod (GH 58663)

  • Arithmetic operations between a Series, Index, or ExtensionArray with a list now consistently wrap that list with an array equivalent to Series(my_list).array. To do any other kind of type inference or casting, do so explicitly before operating (GH 62552)

  • Comparison operations between Index and Series now consistently return Series regardless of which object is on the left or right (GH 36759)

  • NumPy functions like np.isinf that return a bool dtype when called on a Index object now return a bool-dtype Index instead of np.ndarray (GH 52676)

  • Methods that can operate in-place (replace(), fillna(), ffill(), bfill(), interpolate(), where(), mask(), clip()) now return the modified DataFrame or Series (self) instead of None when inplace=True (GH 63207)
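
For example, the last item above means these methods can now be chained even when operating in place (a sketch; output indicative):

>>> ser = pd.Series([1.0, None, 3.0])
>>> ser.fillna(0.0, inplace=True).astype("int64")
0    1
1    0
2    3
dtype: int64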

Deprecations#

Copy keyword#

The copy keyword argument in the following methods is deprecated and will be removed in a future version. (GH 57347)

Copy-on-Write utilizes a lazy copy mechanism that defers copying the data until necessary. Use .copy() to trigger an eager copy. The copy keyword has no effect starting with pandas 3.0, so it can be safely removed from your code.
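
For instance, assuming DataFrame.rename() is among the affected methods, the keyword can simply be dropped; to force an eager copy, chain .copy():

>>> df2 = df.rename(columns=str.lower, copy=True)   # 'copy' is ignored and deprecated
>>> df2 = df.rename(columns=str.lower).copy()       # explicit eager copy, if one is really needed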

Other Deprecations#

Removal of prior version deprecations/changes#

Enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets#

Renamed the following offset aliases (GH 57986):

offset                    removed alias    new alias
MonthEnd                  M                ME
BusinessMonthEnd          BM               BME
SemiMonthEnd              SM               SME
CustomBusinessMonthEnd    CBM              CBME
QuarterEnd                Q                QE
BQuarterEnd               BQ               BQE
YearEnd                   Y                YE
BYearEnd                  BY               BYE

Other Removals#

Performance improvements#

Bug fixes#

Categorical#

Datetimelike#

Timedelta#

  • Accuracy improvement in Timedelta.to_pytimedelta() to round microseconds consistently for large nanosecond based Timedelta (GH 57841)

  • Bug in Timedelta constructor failing to raise when passed an invalid keyword (GH 53801)

  • Bug in DataFrame.cumsum() which was raising IndexError if dtype is timedelta64[ns] (GH 57956)

  • Bug in multiplication operations with timedelta64 dtype failing to raise TypeError when multiplying by bool objects or dtypes (GH 58054)

  • Bug in multiplication operations with timedelta64 dtype incorrectly raising when multiplying by numpy-nullable dtypes or pyarrow integer dtypes (GH 58054)

Timezones#

  • Bug in DatetimeIndex.union(), DatetimeIndex.intersection(), and DatetimeIndex.symmetric_difference() changing timezone to UTC when merging two DatetimeIndex objects with the same timezone but different units (GH 60080)

  • Bug in Series.dt.tz_localize() with a timezone-aware ArrowDtype incorrectly converting to UTC when tz=None (GH 61780)

  • Fixed bug in date_range() where tz-aware endpoints with calendar offsets (e.g. "MS") failed on DST fall-back; these now respect the ambiguous and nonexistent arguments. (GH 52908)

Numeric#

Conversion#

Strings#

Interval#

Indexing#

Missing#

MultiIndex#

I/O#

Period#

Plotting#

Groupby/resample/rolling#

Reshaping#

Sparse#

ExtensionArray#

  • Bug in arrays.ArrowExtensionArray.__setitem__() which caused wrong behavior when using an integer array with repeated values as a key (GH 58530)

  • Bug in ArrowExtensionArray.factorize() where NA values were dropped when input was dictionary-encoded even when dropna was set to False (GH 60567)

  • Bug in NDArrayBackedExtensionArray.take() which produced arrays whose dtypes didn’t match their underlying data, when called with integer arrays (GH 62448)

  • Bug in api.types.is_datetime64_any_dtype() where a custom ExtensionDtype would return False for array-likes (GH 57055)

  • Bug in comparison between object with ArrowDtype and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-False (for ==) or all-True (for !=) (GH 59505)

  • Bug in constructing pandas data structures when passing a dtype string of a type followed by [pyarrow] while PyArrow is not installed, which raised NameError rather than ImportError (GH 57928)

  • Bug in various DataFrame reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (GH 59234)

  • Fixed flex arithmetic with ExtensionArray operands raising when fill_value was passed. (GH 62467)

Styler#

  • Fixed bug in Styler.to_latex() where styling column headers failed when combined with a hidden index or hidden index levels.

Other#

Contributors#