v0.21.0 (October 27, 2017)
This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Integration with Apache Parquet, including a new top-level
read_parquet()
function andDataFrame.to_parquet()
method, see here. - New user-facing
pandas.api.types.CategoricalDtype
for specifying categoricals independent of the data, see here. - The behavior of
sum
andprod
on all-NaN Series/DataFrames is now consistent and no longer depends on whether bottleneck is installed, andsum
andprod
on empty Series now return NaN instead of 0, see here. - Compatibility fixes for pypy, see here.
- Additions to the
drop
,reindex
andrename
API to make them more consistent, see here. - Addition of the new methods
DataFrame.infer_objects
(see here) andGroupBy.pipe
(see here). - Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a KeyError in a future version, see here.
Check the API Changes and deprecations before updating.
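A short sketch of several API additions named above, on toy data: a reusable CategoricalDtype, the columns= keyword of drop and the axis= keyword of rename, and GroupBy.pipe. The column names, category labels and the function passed to pipe are illustrative only, not taken from these notes.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# A categorical dtype defined independently of any data: fixed, ordered categories.
size_dtype = CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
sizes = pd.Series(["medium", "small", "large", "small"]).astype(size_dtype)
# Reusing the same dtype on other data: values outside the categories become NaN.
other = pd.Series(["large", "huge"]).astype(size_dtype)

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
# drop with the columns keyword instead of labels plus axis=1.
dropped = df.drop(columns=["B"])
# rename with a mapper plus the axis keyword.
renamed = df.rename(str.lower, axis="columns")

sales = pd.DataFrame({"store": ["x", "x", "y"], "revenue": [10, 30, 5]})
# GroupBy.pipe passes the GroupBy object itself to the function.
spread = sales.groupby("store").pipe(
    lambda g: g["revenue"].max() - g["revenue"].min()
)
```

Because the categories are ordered, sizes.max() is "large"; spread holds 20 for store x and 0 for store y.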
What’s new in v0.21.0
- New features
  - Integration with Apache Parquet file format
  - infer_objects type conversion
  - Improved warnings when attempting to create columns
  - drop now also accepts index/columns keywords
  - rename, reindex now also accept axis keyword
  - CategoricalDtype for specifying categoricals
  - GroupBy objects now have a pipe method
  - Categorical.rename_categories accepts a dict-like
  - Other enhancements
- Backwards incompatible API changes
  - Dependencies have increased minimum versions
  - Sum/Prod of all-NaN or empty Series/DataFrames is now consistently NaN
  - Indexing with a list with missing labels is deprecated
  - NA naming changes
  - Iteration of Series/Index will now return Python scalars
  - Indexing with a Boolean Index
  - PeriodIndex resampling
  - Improved error handling during item assignment in pd.eval
  - Dtype conversions
  - MultiIndex constructor with a single level
  - UTC Localization with Series
  - Consistency of range functions
  - No automatic Matplotlib converters
  - Other API changes
- Deprecations
- Removal of prior version deprecations/changes
- Performance improvements
- Documentation changes
- Bug fixes
- Contributors
New features
Integration with Apache Parquet file format
Integration with Apache Parquet, including a new top-level read_parquet() function and DataFrame.to_parquet() method, see here (GH15838, GH17438).
Apache Parquet provides a cross-language, binary file format for reading and writing data frames efficiently.
Parquet is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas dtypes, including extension dtypes such as datetime with timezones.
This functionality depends on either the pyarrow or fastparquet library. For more details, see the IO docs on Parquet.
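A minimal round-trip sketch, assuming pyarrow or fastparquet is installed (the block checks for them first and skips the I/O if neither is found). The column names, sample values and temporary file path are illustrative; the engine is left to pandas' default selection rather than set explicitly.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# Parquet I/O needs one of these engines; with the default engine="auto",
# pandas picks whichever is available.
HAVE_ENGINE = any(
    importlib.util.find_spec(name) is not None for name in ("pyarrow", "fastparquet")
)

df = pd.DataFrame({
    "ints": [1, 2, 3],
    "floats": [0.5, 1.5, 2.5],
    "strings": ["a", "b", "c"],
    # A timezone-aware datetime column, one of the dtypes Parquet preserves.
    "ts": pd.date_range("2017-10-27", periods=3, tz="US/Eastern"),
})

if HAVE_ENGINE:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.parquet")
        df.to_parquet(path)             # write with the auto-selected engine
        result = pd.read_parquet(path)  # read it back into a DataFrame
    print(result.dtypes)
```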
infer_objects type conversion
The DataFrame.infer_objects() and Series.infer_objects() methods have been added to perform dtype inference on object columns, replacing some of the functionality of the deprecated convert_objects method. See the documentation here for more details. (GH11221)
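A minimal sketch with Series.infer_objects(): an object Series holding Python ints gets a native integer dtype, while numeric-looking strings are left untouched. Parsing those strings is a separate, explicit step; pd.to_numeric is used for it here as a choice made for this sketch, not something these notes prescribe. The sample data is made up.

```python
import pandas as pd

# Python ints stored in an object-dtype Series: a soft conversion applies.
s = pd.Series([1, 2, 3], dtype=object)
s_inferred = s.infer_objects()

# Numeric-looking strings: infer_objects does not parse them.
t = pd.Series(["1", "2", "3"])
t_inferred = t.infer_objects()

# Parsing strings into numbers is a coercive step that must be requested explicitly.
t_numeric = pd.to_numeric(t)

print(s.dtype, "->", s_inferred.dtype)
print(t.dtype, "->", t_inferred.dtype)
print(t_numeric.dtype)
```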
This method only performs soft conversions on object columns, converting Python objects to native types, but not any coercive conversions. For example:
In [1]: df = pd.DataFrame({'A': [1, 2, 3],
   ...:                    'B': np.array([1, 2, 3], dtype='object'),
   ...:                    'C': ['1', '2', '3']})
   ...:
In [2]: df.dtypes
Out[2]:
A int64
B object
C object
Length: 3, dtype: object
In [3]: df.infer_objects().dtypes