What’s New¶
These are new features and improvements of note in each release.
v0.21.1 (December 12, 2017)¶
This is a minor bug-fix release in the 0.21.x series and includes some small regression fixes, bug fixes and performance improvements. We recommend that all users upgrade to this version.
Highlights include:
- Temporarily restore matplotlib datetime plotting functionality. This should resolve issues for users who implicitly relied on pandas to plot datetimes with matplotlib. See here.
- Improvements to the Parquet IO functions introduced in 0.21.0. See here.
What’s new in v0.21.1
Restore Matplotlib datetime Converter Registration¶
Pandas implements some matplotlib converters for nicely formatting the axis
labels on plots with datetime or Period values. Prior to pandas 0.21.0,
these were implicitly registered with matplotlib, as a side effect of import
pandas.
In pandas 0.21.0, we required users to explicitly register the
converter. This caused problems for some users who relied on those converters
being present for regular matplotlib.pyplot plotting methods, so we’re
temporarily reverting that change; pandas 0.21.1 again registers the converters on
import, just like before 0.21.0.
We’ve added a new option to control the converters:
pd.options.plotting.matplotlib.register_converters. By default, they are
registered. Toggling this to False removes pandas’ formatters and restores
any converters we overwrote when registering them (GH18301).
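For example (a minimal sketch; the series contents and figure handling below are illustrative rather than taken from the release notes), datetime data can again be plotted with plain matplotlib calls after importing pandas, and the option can be switched off to restore any converters that were overwritten:

import pandas as pd
import matplotlib.pyplot as plt

# pandas 0.21.1 registers its datetime converters at import time again,
# so datetime data can be plotted with plain matplotlib calls.
s = pd.Series(range(12), index=pd.date_range('2017-12-01', periods=12, freq='H'))

fig, ax = plt.subplots()
ax.plot(s.index, s.values)   # tick labels are formatted by pandas' converters

# Opting out removes pandas' formatters and restores any converters
# that pandas overwrote when it registered its own.
pd.options.plotting.matplotlib.register_converters = False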
We’re working with the matplotlib developers to make this easier. We’re trying to balance user convenience (automatically registering the converters) with import performance and best practices (importing pandas shouldn’t have the side effect of overwriting any custom converters you’ve already set). In the future we hope to have most of the datetime formatting functionality in matplotlib, with just the pandas-specific converters in pandas. We’ll then gracefully deprecate the automatic registration of converters in favor of users explicitly registering them when they want them.
New features¶
Improvements to the Parquet IO functionality¶
- DataFrame.to_parquet() will now write non-default indexes when the underlying engine supports it. The indexes will be preserved when reading back in with read_parquet() (GH18581).
- read_parquet() now allows specifying the columns to read from a parquet file (GH18154)
- read_parquet() now allows specifying kwargs which are passed to the respective engine (GH18216)
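As a rough sketch of these improvements (the file name is arbitrary, and pyarrow is just one of the supported engines; any additional keyword arguments are forwarded to the chosen engine in the same way):

import pandas as pd

# A non-default, named index is now written when the engine supports it,
# and is preserved when the file is read back.
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']},
                  index=pd.Index([10, 20, 30], name='my_index'))
df.to_parquet('example.parquet', engine='pyarrow')

# Only a subset of columns can be read back; the index is still preserved.
subset = pd.read_parquet('example.parquet', columns=['a'])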
Other Enhancements¶
- Timestamp.timestamp() is now available in Python 2.7 (GH17329)
- Grouper and TimeGrouper now have a friendly repr output (GH18203)
Deprecations¶
- pandas.tseries.register has been renamed to pandas.plotting.register_matplotlib_converters() (GH18301)
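For code that registers the converters explicitly, a minimal sketch of the renamed call (assuming the import path given in the entry above):

# Formerly pandas.tseries.register; explicit registration is the
# long-term recommended path once automatic registration is deprecated.
from pandas.plotting import register_matplotlib_converters

register_matplotlib_converters()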
Bug Fixes¶
Conversion¶
- Bug in TimedeltaIndex subtraction could incorrectly overflow when NaT is present (GH17791)
- Bug in DatetimeIndex subtracting datetimelike from DatetimeIndex could fail to overflow (GH18020)
- Bug in IntervalIndex.copy() when copying an IntervalIndex with non-default closed (GH18339)
- Bug in DataFrame.to_dict() where tz-aware datetime columns were not converted to the required arrays when used with orient='records', raising TypeError (GH18372)
- Bug in DatetimeIndex and date_range() where mismatching tz-aware start and end timezones would not raise an error if end.tzinfo is None (GH18431)
- Bug in Series.fillna() which raised when passed a long integer on Python 2 (GH18159)
Indexing¶
- Bug in a boolean comparison of a datetime.datetime and a datetime64[ns] dtype Series (GH17965)
- Bug where a MultiIndex with more than a million records was not raising AttributeError when trying to access a missing attribute (GH18165)
- Bug in IntervalIndex constructor when a list of intervals is passed with non-default closed (GH18334)
- Bug in Index.putmask when an invalid mask is passed (GH18368)
- Bug in masked assignment of a timedelta64[ns] dtype Series, incorrectly coerced to float (GH18493)
I/O¶
- Bug in pandas.io.stata.StataReader not converting date/time columns with display formatting (GH17990). Previously, columns with display formatting were left as ordinal numbers rather than converted to datetime objects.
- Bug in read_csv() when reading a compressed UTF-16 encoded file (GH18071)
- Bug in read_csv() for handling null values in index columns when specifying na_filter=False (GH5239)
- Bug in read_csv() when reading numeric category fields with high cardinality (GH18186)
- Bug in DataFrame.to_csv() when the table had MultiIndex columns, and a list of strings was passed in for header (GH5539)
- Bug in parsing integer datetime-like columns with specified format in read_sql (GH17855)
- Bug in DataFrame.to_msgpack() when serializing data of the numpy.bool_ datatype (GH18390)
- Bug in read_json() not decoding when reading line delimited JSON from S3 (GH17200)
- Bug in pandas.io.json.json_normalize() to avoid modification of meta (GH18610)
- Bug in to_latex() where repeated MultiIndex values were not printed even though a higher-level index differed from the previous row (GH14484)
- Bug when reading NaN-only categorical columns in HDFStore (GH18413)
- Bug in DataFrame.to_latex() with longtable=True where a LaTeX multicolumn always spanned over three columns (GH17959)
Plotting¶
- Bug in DataFrame.plot() and Series.plot() with DatetimeIndex where a figure generated by them is not pickleable in Python 3 (GH18439)
Groupby/Resample/Rolling¶
- Bug in DataFrame.resample(...).apply(...) when there is a callable that returns different columns (GH15169)
- Bug in DataFrame.resample(...) when there is a time change (DST) and the resampling frequency is 12h or higher (GH15549)
- Bug in pd.DataFrameGroupBy.count() when counting over a datetimelike column (GH13393)
- Bug in rolling.var where the calculation is inaccurate with a zero-valued array (GH18430)
Reshaping¶
- Error message in pd.merge_asof() for key datatype mismatch now includes datatype of left and right key (GH18068)
- Bug in pd.concat when empty and non-empty DataFrames or Series are concatenated (GH18178, GH18187)
- Bug in DataFrame.filter(...) when unicode is passed as a condition in Python 2 (GH13101)
- Bug when merging empty DataFrames when np.seterr(divide='raise') is set (GH17776)
Numeric¶
- Bug in pd.Series.rolling.skew() and rolling.kurt() with all equal values, which had a floating point precision issue (GH18044)
- Bug in TimedeltaIndex subtraction could incorrectly overflow when NaT is present (GH17791)
- Bug in DatetimeIndex subtracting datetimelike from DatetimeIndex could fail to overflow (GH18020)
Categorical¶
- Bug in DataFrame.astype() where casting to ‘category’ on an empty DataFrame causes a segmentation fault (GH18004)
- Error messages in the testing module have been improved when items have different CategoricalDtype (GH18069)
- CategoricalIndex can now correctly take a pd.api.types.CategoricalDtype as its dtype (GH18116)
- Bug in Categorical.unique() returning a read-only codes array when all categories were NaN (GH18051)
- Bug in DataFrame.groupby(axis=1) with a CategoricalIndex (GH18432)
String¶
- Series.str.split() will now propagate NaN values across all expanded columns instead of None (GH18450)
v0.21.0 (October 27, 2017)¶
This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Integration with Apache Parquet, including a new top-level read_parquet() function and DataFrame.to_parquet() method, see here.
- New user-facing pandas.api.types.CategoricalDtype for specifying categoricals independent of the data, see here.
- The behavior of sum and prod on all-NaN Series/DataFrames is now consistent and no longer depends on whether bottleneck is installed, and sum and prod on empty Series now return NaN instead of 0, see here.
- Compatibility fixes for pypy, see here.
- Additions to the drop, reindex and rename API to make them more consistent, see here.
- Addition of the new methods DataFrame.infer_objects (see here) and GroupBy.pipe (see here).
- Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a KeyError in a future version, see here.
Check the API Changes and deprecations before updating.
What’s new in v0.21.0
- New features
  - Integration with Apache Parquet file format
  - infer_objects type conversion
  - Improved warnings when attempting to create columns
  - drop now also accepts index/columns keywords
  - rename, reindex now also accept axis keyword
  - CategoricalDtype for specifying categoricals
  - GroupBy objects now have a pipe method
  - Categorical.rename_categories accepts a dict-like
  - Other Enhancements
- Backwards incompatible API changes
  - Dependencies have increased minimum versions
  - Sum/Prod of all-NaN or empty Series/DataFrames is now consistently NaN
  - Indexing with a list with missing labels is Deprecated
  - NA naming Changes
  - Iteration of Series/Index will now return Python scalars
  - Indexing with a Boolean Index
  - PeriodIndex resampling
  - Improved error handling during item assignment in pd.eval
  - Dtype Conversions
  - MultiIndex Constructor with a Single Level
  - UTC Localization with Series
  - Consistency of Range Functions
  - No Automatic Matplotlib Converters
  - Other API Changes
- Deprecations
- Removal of prior version deprecations/changes
- Performance Improvements
- Documentation Changes
- Bug Fixes
New features¶
Integration with Apache Parquet file format¶
Integration with Apache Parquet, including a new top-level read_parquet() function and DataFrame.to_parquet() method, see here (GH15838, GH17438).
Apache Parquet provides a cross-language, binary file format for reading and writing data frames efficiently.
Parquet is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas
dtypes, including extension dtypes such as datetime with timezones.
This functionality depends on either the pyarrow or fastparquet library. For more details, see the IO docs on Parquet.
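As an illustrative sketch (the file name is arbitrary; pyarrow is shown, but fastparquet works the same way where it supports the dtype), a round trip preserving a tz-aware datetime column might look like:

import pandas as pd

# Extension dtypes such as tz-aware datetimes survive a Parquet round trip.
df = pd.DataFrame({'ts': pd.date_range('2017-01-01', periods=3, tz='US/Eastern'),
                   'value': [1.0, 2.0, 3.0]})

df.to_parquet('timestamps.parquet', engine='pyarrow')   # or engine='fastparquet'
roundtripped = pd.read_parquet('timestamps.parquet')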
infer_objects type conversion¶
The DataFrame.infer_objects() and Series.infer_objects()
methods have been added to perform dtype inference on object columns, replacing
some of the functionality of the deprecated convert_objects
method. See the documentation here
for more details. (GH11221)
This method only performs soft conversions on object columns, converting Python objects to native types, but not any coercive conversions. For example:
In [1]: df = pd.DataFrame({'A': [1, 2, 3],
...: 'B': np.array([1, 2, 3], dtype='object'),
...: 'C': ['1', '2', '3']})
...:
In [2]: df.dtypes
Out[2]:
A int64
B object
C object
dtype: object
In [3]: df.infer_objects().dtypes
Out[3]:
A     int64
B     int64
C    object
dtype: object