This is a minor bug-fix release in the 0.21.x series and includes some small regression fixes, bug fixes and performance improvements. We recommend that all users upgrade to this version.
Highlights include:
Temporarily restore matplotlib datetime plotting functionality. This should resolve issues for users who implicitly relied on pandas to plot datetimes with matplotlib. See Restore Matplotlib datetime converter registration below.
Improvements to the Parquet IO functions introduced in 0.21.0. See Improvements to the Parquet IO functionality below.
What’s new in v0.21.1
Restore Matplotlib datetime converter registration
New features
Improvements to the Parquet IO functionality
Other enhancements
Deprecations
Performance improvements
Bug fixes
Conversion
Indexing
IO
Plotting
GroupBy/resample/rolling
Reshaping
Numeric
Categorical
String
Contributors
Restore Matplotlib datetime converter registration
pandas implements some matplotlib converters for nicely formatting the axis labels on plots with datetime or Period values. Prior to pandas 0.21.0, these were implicitly registered with matplotlib, as a side effect of import pandas.
In pandas 0.21.0, we required users to explicitly register the converters. This caused problems for some users who relied on those converters being present for regular matplotlib.pyplot plotting methods, so we’re temporarily reverting that change; pandas 0.21.1 again registers the converters on import, just like before 0.21.0.
We’ve added a new option to control the converters: pd.options.plotting.matplotlib.register_converters. By default, they are registered. Toggling this to False removes pandas’ formatters and restores any converters we overwrote when registering them (GH18301).
We’re working with the matplotlib developers to make this easier. We’re trying to balance user convenience (automatically registering the converters) with import performance and best practices (importing pandas shouldn’t have the side effect of overwriting any custom converters you’ve already set). In the future we hope to have most of the datetime formatting functionality in matplotlib, with just the pandas-specific converters in pandas. We’ll then gracefully deprecate the automatic registration of converters in favor of users explicitly registering them when they want them.
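The following minimal sketch toggles the option off and then registers the converters explicitly. It assumes matplotlib is installed and that you are running a recent pandas, where the option's default is "auto" rather than the True used in 0.21.1. Looking up the Period type in matplotlib.units.registry is just a convenient way to observe the effect. It is not part of the pandas API.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so nothing tries to open a window
import matplotlib.units as munits
import pandas as pd
from pandas.plotting import register_matplotlib_converters

# Toggling the option to False removes pandas' converters and restores any
# matplotlib converters that pandas had replaced.
pd.set_option("plotting.matplotlib.register_converters", False)
period_registered_after_off = pd.Period in munits.registry

# Register them again explicitly (the function pandas.tseries.register was renamed to).
register_matplotlib_converters()
period_converter = type(munits.registry[pd.Period]).__name__

print(period_registered_after_off, period_converter)
```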
New features
Improvements to the Parquet IO functionality
DataFrame.to_parquet() will now write non-default indexes when the underlying engine supports it. The indexes will be preserved when reading back in with read_parquet() (GH18581).
read_parquet() now accepts a columns argument to read only the specified columns from a Parquet file (GH18154)
read_parquet() now accepts additional keyword arguments, which are passed through to the respective engine (GH18216)
Other enhancements
Timestamp.timestamp() is now available in Python 2.7 (GH17329).
Grouper and TimeGrouper now have a friendly repr output (GH18203).
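This quick check covers both enhancements and assumes a recent pandas. Only the key field of the repr is checked, because the exact repr text has changed between versions.

```python
import pandas as pd

ts = pd.Timestamp("2017-01-01 00:00:00", tz="UTC")
seconds = ts.timestamp()  # POSIX time in seconds, as a float

# Grouper objects show their settings in the repr.
grouper_text = repr(pd.Grouper(key="a"))

print(seconds)
print(grouper_text)
```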
Deprecations
pandas.tseries.register has been renamed to pandas.plotting.register_matplotlib_converters() (GH18301)
Performance improvements
Improved performance of plotting large Series and DataFrames (GH18236).
Bug fixes
Conversion
Bug in TimedeltaIndex subtraction could incorrectly overflow when NaT is present (GH17791)
Bug where subtracting a datetimelike from a DatetimeIndex could fail to detect an overflow (GH18020)
Bug in IntervalIndex.copy() when copying an IntervalIndex with a non-default closed (GH18339)
Bug in DataFrame.to_dict() where tz-aware datetime columns were not converted to the required arrays when used with orient='records', raising a TypeError (GH18372)
Bug in DatetimeIndex and date_range() where a tz-aware start combined with a naive end (end.tzinfo is None) would not raise an error for the mismatched timezones (GH18431)
Bug in Series.fillna() which raised when passed a long integer on Python 2 (GH18159).
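This small sketch covers two of the conversion fixes and assumes a recent pandas. UTC is used only to avoid depending on timezone data. Both TypeError and ValueError are caught because the exact exception type is an assumption here.

```python
import pandas as pd

df = pd.DataFrame(
    {"when": pd.date_range("2017-01-01", periods=2, tz="UTC"), "value": [1, 2]}
)
# tz-aware values come back as Timestamps instead of raising a TypeError.
records = df.to_dict(orient="records")

# A tz-aware start combined with a naive end is rejected.
try:
    pd.date_range(pd.Timestamp("2017-01-01", tz="UTC"), pd.Timestamp("2017-01-05"))
    mismatch_raised = False
except (TypeError, ValueError):
    mismatch_raised = True

print(records[0])
print(mismatch_raised)
```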
Indexing
Bug in a boolean comparison of a datetime.datetime and a datetime64[ns] dtype Series (GH17965)
Bug where a MultiIndex with more than a million records was not raising AttributeError when trying to access a missing attribute (GH18165)
Bug in IntervalIndex constructor when a list of intervals is passed with a non-default closed (GH18334)
Bug in Index.putmask when an invalid mask is passed (GH18368)
Bug in masked assignment of a timedelta64[ns] dtype Series, where values were incorrectly coerced to float (GH18493)
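The following sketch runs three of the indexing fixes above on toy data and assumes a recent pandas. These are illustrations, not the original reproducers from the linked issues.

```python
import datetime

import pandas as pd

# Comparing a datetime64[ns] Series with a datetime.datetime yields a boolean mask.
s = pd.Series(pd.to_datetime(["2017-01-01", "2017-06-01"]))
after_march = s > datetime.datetime(2017, 3, 1)

# Left-closed intervals produce a left-closed IntervalIndex, and copies keep it.
intervals = [pd.Interval(0, 1, closed="left"), pd.Interval(1, 2, closed="left")]
iidx = pd.IntervalIndex(intervals)
iidx_copy = iidx.copy()

# Masked assignment of NaT into a timedelta Series keeps the timedelta dtype.
td = pd.Series(pd.to_timedelta([1, 2, 3], unit="D"))
td[td > pd.Timedelta(days=1)] = pd.NaT

print(after_march.tolist(), iidx.closed, iidx_copy.closed, td.dtype)
```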
IO
Bug in pandas.io.stata.StataReader not converting date/time columns with display formatting (GH17990). Previously, columns with display formatting were normally left as ordinal numbers instead of being converted to datetime objects.
Bug in read_csv() when reading a compressed UTF-16 encoded file (GH18071)
Bug in read_csv() for handling null values in index columns when specifying na_filter=False (GH5239)
Bug in read_csv() when reading numeric category fields with high cardinality (GH18186)
Bug in DataFrame.to_csv() when the table had MultiIndex columns, and a list of strings was passed in for header (GH5539)
Bug in parsing integer datetime-like columns with specified format in read_sql (GH17855).
Bug in DataFrame.to_msgpack() when serializing data of the numpy.bool_ datatype (GH18390)
Bug in read_json() not decoding when reading line delimited JSON from S3 (GH17200)
Bug in pandas.io.json.json_normalize() where the meta argument was modified in place (GH18610)
Bug in to_latex() where repeated MultiIndex values were not printed even though a higher level index differed from the previous row (GH14484)
Bug when reading NaN-only categorical columns in HDFStore (GH18413)
Bug in DataFrame.to_latex() with longtable=True where a LaTeX multicolumn always spanned three columns (GH17959)
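This sketch shows the read_csv() and json_normalize() fixes on inline data. It assumes a recent pandas, where json_normalize is also available as pd.json_normalize. In 0.21.1 it lives at pandas.io.json.json_normalize.

```python
import io

import pandas as pd

# With na_filter=False, an empty field in the index column stays an empty string.
csv_text = "key,val\n,1\nb,2\n"
frame = pd.read_csv(io.StringIO(csv_text), index_col=0, na_filter=False)

# json_normalize leaves the list passed as meta untouched.
data = [{"id": 1, "info": {"name": "a"}, "items": [{"x": 1}, {"x": 2}]}]
meta = ["id", ["info", "name"]]
flat = pd.json_normalize(data, record_path="items", meta=meta)

print(frame)
print(flat)
```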
Plotting
Bug in DataFrame.plot() and Series.plot() with a DatetimeIndex where the generated figure could not be pickled in Python 3 (GH18439)
GroupBy/resample/rolling
Bug in DataFrame.resample(...).apply(...) when there is a callable that returns different columns (GH15169)
Bug in DataFrame.resample(...) when the data spans a daylight saving time (DST) transition and the resampling frequency is 12h or higher (GH15549)
Bug in pd.DataFrameGroupBy.count() when counting over a datetimelike column (GH13393)
Bug in rolling.var where the calculation was inaccurate for a zero-valued array (GH18430)
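This sketch covers the count() and rolling.var fixes and assumes a recent pandas. The tolerance in the variance check is an assumption, not a documented guarantee.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "a", "b"], "when": pd.to_datetime(["2017-01-01", None, "2017-01-03"])}
)
# count() over a datetimelike column skips NaT.
counts = df.groupby("key").count()

# The last window holds only zeros after non-zero values; its variance should be zero.
s = pd.Series([5.0, 5.0, 5.0, 0.0, 0.0, 0.0])
rolling_var = s.rolling(window=3).var()

print(counts)
print(rolling_var)
```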
Reshaping
The error message in pd.merge_asof() for a key dtype mismatch now includes the dtypes of the left and right keys (GH18068)
Bug in pd.concat when empty and non-empty DataFrames or Series are concatenated (GH18178, GH18187)
Bug in DataFrame.filter(...) when unicode is passed as a condition in Python 2 (GH13101)
Bug when merging empty DataFrames when np.seterr(divide='raise') is set (GH17776)
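This sketch shows the improved merge_asof() error and a concatenation of an empty frame with a non-empty one. It assumes a recent pandas. The int64 versus float64 key is only one way to trigger the dtype mismatch.

```python
import pandas as pd

left = pd.DataFrame({"t": [1, 2, 3], "left_val": [10, 20, 30]})
right = pd.DataFrame({"t": [1.0, 2.0], "right_val": [100, 200]})

# Mismatched key dtypes are rejected; the message names both dtypes.
try:
    pd.merge_asof(left, right, on="t")
    message = ""
except pd.errors.MergeError as err:
    message = str(err)

# Concatenating an empty frame with a non-empty one of the same dtype.
empty = pd.DataFrame({"a": pd.Series([], dtype="float64")})
full = pd.DataFrame({"a": [1.0, 2.0]})
combined = pd.concat([empty, full], ignore_index=True)

print(message)
print(combined)
```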
Numeric
Bug in pd.Series.rolling.skew() and rolling.kurt() where windows of all-equal values gave inaccurate results due to floating-point error (GH18044)
Categorical
Bug in DataFrame.astype() where casting to ‘category’ on an empty DataFrame caused a segmentation fault (GH18004)
Error messages in the testing module have been improved when items have different CategoricalDtype (GH18069)
CategoricalIndex can now correctly take a pd.api.types.CategoricalDtype as its dtype (GH18116)
Bug in Categorical.unique() returning read-only codes array when all categories were NaN (GH18051)
Bug in DataFrame.groupby(axis=1) with a CategoricalIndex (GH18432)
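This sketch passes a CategoricalDtype to CategoricalIndex and casts an empty DataFrame to ‘category’. It assumes a recent pandas, and the category labels are made up for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# CategoricalIndex accepts a CategoricalDtype instance as its dtype.
dtype = CategoricalDtype(categories=["low", "mid", "high"], ordered=True)
idx = pd.CategoricalIndex(["mid", "low"], dtype=dtype)

# Casting an empty DataFrame column to 'category' yields an empty categorical column.
empty = pd.DataFrame({"a": []}).astype("category")

print(idx)
print(empty.dtypes)
```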
String
Series.str.split() will now propagate NaN values across all expanded columns instead of None (GH18450)
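This small sketch shows the Series.str.split() behavior with expand=True and assumes a recent pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a b", np.nan, "c d"])
# expand=True returns one column per piece; the missing row is missing in every column.
parts = s.str.split(" ", expand=True)

print(parts)
```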
Contributors
A total of 46 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Aaron Critchley +
Alex Rychyk
Alexander Buchkovsky +
Alexander Michael Schade +
Chris Mazzullo
Cornelius Riemenschneider +
Dave Hirschfeld +
David Fischer +
David Stansby +
Dror Atariah +
Eric Kisslinger +
Hans +
Ingolf Becker +
Jan Werkmann +
Jeff Reback
Joris Van den Bossche
Jörg Döpfert +
Kevin Kuhl +
Krzysztof Chomski +
Leif Walsh
Licht Takeuchi
Manraj Singh +
Matt Braymer-Hayes +
Michael Waskom +
Mie~~~ +
Peter Hoffmann +
Robert Meyer +
Sam Cohan +
Sietse Brouwer +
Sven +
Tim Swast
Tom Augspurger
Wes Turner
William Ayd +
Yee Mey +
bolkedebruin +
cgohlke
derestle-htwg +
fjdiod +
gabrielclow +
gfyoung
ghasemnaddaf +
jbrockmendel
jschendel
miker985 +
topper-123