Version 0.16.2 (June 12, 2015)
This is a minor bug-fix release from 0.16.1 and includes a large number of
bug fixes along with some new features (the pipe() method), enhancements, and performance improvements.
We recommend that all users upgrade to this version.
Highlights include the new pipe() method, described below.
New features
Pipe
We’ve introduced a new method, DataFrame.pipe(). As the name suggests, pipe
is meant to pass data through a chain of function calls.
The goal is to avoid confusing nested function calls like
# df is a DataFrame
# f, g, and h are functions that take and return DataFrames
f(g(h(df), arg1=1), arg2=2, arg3=3)
The logic flows from inside out, and function names are separated from their keyword arguments. This can be rewritten as
(
    df.pipe(h)
    .pipe(g, arg1=1)
    .pipe(f, arg2=2, arg3=3)
)
Now both the code and the logic flow from top to bottom. Keyword arguments are next to their functions. Overall the code is much more readable.
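To make the comparison concrete, here is a small runnable sketch. The bodies of h, g, and f are hypothetical stand-ins chosen only for illustration; the point is that the nested call and the pipe chain compute the same thing.

```python
import pandas as pd


# Hypothetical stand-ins for h, g, and f: each takes a DataFrame as its
# first positional argument and returns a DataFrame.
def h(df):
    return df.dropna()


def g(df, arg1):
    return df * arg1


def f(df, arg2, arg3):
    return df + arg2 - arg3


df = pd.DataFrame({"x": [1.0, None, 3.0]})

# Nested style: read from the inside out.
nested = f(g(h(df), arg1=1), arg2=2, arg3=3)

# Pipe style: read from top to bottom, keywords next to their function.
piped = (
    df.pipe(h)
    .pipe(g, arg1=1)
    .pipe(f, arg2=2, arg3=3)
)

print(piped.equals(nested))  # True
```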
In the example above, the functions f, g, and h each expected the DataFrame as the first positional argument.
When the function you wish to apply takes its data anywhere other than the first argument, pass a tuple
of (function, keyword) indicating where the DataFrame should flow. For example:
In [1]: import statsmodels.formula.api as sm
In [2]: bb = pd.read_csv("data/baseball.csv", index_col="id")
# sm.ols takes (formula, data)
In [3]: (
...:     bb.query("h > 0")
...:     .assign(ln_h=lambda df: np.log(df.h))
...:     .pipe((sm.ols, "data"), "hr ~ ln_h + year + g + C(lg)")
...:     .fit()
...:     .summary()
...: )
...:
Out[3]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:                     hr   R-squared:                       0.685
Model:                            OLS   Adj. R-squared:                  0.665
Method:                 Least Squares   F-statistic:                     34.28
Date:                Tue, 22 Nov 2022   Prob (F-statistic):           3.48e-15
Time:                        05:35:23   Log-Likelihood:                -205.92
No. Observations:                  68   AIC:                             421.8
Df Residuals:                      63   BIC:                             432.9
Df Model:                           4
Covariance Type:            nonrobust
===============================================================================
                coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept   -8484.7720   4664.146     -1.819      0.074   -1.78e+04     835.780
C(lg)[T.NL]    -2.2736      1.325     -1.716      0.091      -4.922       0.375
ln_h           -1.3542      0.875     -1.547      0.127      -3.103       0.395
year            4.2277      2.324      1.819      0.074      -0.417       8.872
g               0.1841      0.029      6.258      0.000       0.125       0.243
==============================================================================
Omnibus:                       10.875   Durbin-Watson:                   1.999
Prob(Omnibus):                  0.004   Jarque-Bera (JB):               17.298
Skew:                           0.537   Prob(JB):                     0.000175
Kurtosis:                       5.225   Cond. No.                     1.49e+07
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.49e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
"""
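The same (function, keyword) tuple form works with any callable, not just statsmodels. The sketch below assumes a hypothetical function, scale_column, whose data argument comes last, as in sm.ols(formula, data); pipe passes the DataFrame as data=df and forwards the remaining positional arguments unchanged.

```python
import pandas as pd


# Hypothetical function that takes its DataFrame through the ``data``
# keyword instead of the first positional argument.
def scale_column(column, factor, data):
    out = data.copy()  # leave the caller's frame untouched
    out[column] = out[column] * factor
    return out


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# The tuple tells pipe to call scale_column("a", 10, data=df).
result = df.pipe((scale_column, "data"), "a", 10)
print(result["a"].tolist())  # [10, 20, 30]
```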
The pipe method is inspired by Unix pipes, which stream text through
processes. More recently, dplyr and magrittr have introduced the
popular %>% pipe operator for R.
See the documentation for more (GH 10129).
Other enhancements
- Added rsplit to Index/Series StringMethods (GH 10303)
- Removed the hard-coded size limits on the DataFrame HTML representation in the IPython notebook, leaving this to IPython itself (only for IPython v3.0 or greater). This eliminates the duplicate scroll bars that appeared in the notebook with large frames (GH 10231).
  Note that the notebook has a toggle output scrolling feature to limit the display of very large frames (by clicking left of the output). You can also configure how DataFrames are displayed using the pandas options (see the options documentation).
- The axis parameter of DataFrame.quantile now also accepts index and column. (GH 9543)
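A short sketch of two of these enhancements, run against a current pandas (the sample data is made up). rsplit splits from the right, so with n=1 only the last separator is used. Passing axis="index" to quantile is equivalent to axis=0 and yields one quantile per column.

```python
import pandas as pd

s = pd.Series(["a_b_c", "d_e_f"])
# Split on the rightmost "_" only.
print(s.str.rsplit("_", n=1).tolist())  # [['a_b', 'c'], ['d_e', 'f']]

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})
# axis="index" behaves like axis=0: the median of each column.
print(df.quantile(0.5, axis="index").tolist())  # [2.0, 20.0]
```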
API changes
- Holiday now raises NotImplementedError if both offset and observance are used in the constructor, instead of returning an incorrect result (GH 10217).
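A minimal sketch of the new behavior, assuming a current pandas where Holiday and the nearest_workday observance rule live in pandas.tseries.holiday. The holiday name, date, and one-day offset are hypothetical values chosen only to trigger the check.

```python
import pandas as pd
from pandas.tseries.holiday import Holiday, nearest_workday

try:
    # Supplying both offset and observance is rejected up front rather
    # than silently producing a wrong date.
    Holiday(
        "Example Holiday",  # hypothetical holiday name
        month=7,
        day=4,
        offset=pd.DateOffset(days=1),
        observance=nearest_workday,
    )
    raised = False
except NotImplementedError:
    raised = True

print(raised)  # True
```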
Performance improvements
Bug fixes
- Bug in Series.hist raising an error when given a one-row Series (GH 10214)
- Bug where HDFStore.select modifies the passed columns list (GH 7212)
- Bug in Categorical repr with display.width of None in Python 3 (GH 10087)
- Bug where to_json with certain orients and a CategoricalIndex would segfault (GH 10317)
- Bug where some of the nan functions did not have consistent return dtypes (GH 10251)
- Bug in DataFrame.quantile when checking that a valid axis was passed (GH 9543)
- Bug in groupby.apply aggregation for Categorical not preserving categories (GH 10138)
- Bug in to_csv where date_format is ignored if the datetime is fractional (GH 10209)
- Bug in DataFrame.to_json with mixed data types (GH 10289)
- Bug in cache updating when consolidating (GH 10264)
- Bug in mean() where integer dtypes can overflow (GH 10172)
- Bug where Panel.from_dict does not set dtype when specified (GH 10058)
- Bug in Index.union raising AttributeError when passing array-likes (GH 10149)
- Bug in Timestamp's microsecond, quarter, dayofyear, week and daysinmonth properties returning np.int type instead of built-in int (GH 10050)
- Bug in NaT raising AttributeError when accessing the daysinmonth and dayofweek properties (GH 10096)
- Bug in Index repr when using the max_seq_items=None setting (GH 10182)
- Bug in getting timezone data with dateutil on various platforms (GH 9059, GH 8639, GH 9663, GH 10121)
- Bug in displaying datetimes with mixed frequencies; 'ms' datetimes are now displayed to the proper precision (GH 10170)
- Bug in setitem where type promotion is applied to the entire block (GH 10280)
- Bug where Series arithmetic methods may incorrectly retain names (GH 10068)
- Bug in GroupBy.get_group when grouping on multiple keys, one of which is categorical (GH 10132)
- Bug where DatetimeIndex and TimedeltaIndex names are lost after timedelta arithmetic (GH 9926)
- Bug in DataFrame construction from a nested dict with datetime64 (GH 10160)
- Bug in Series construction from a dict with datetime64 keys (GH 9456)
- Bug in Series.plot(label="LABEL") not correctly setting the label (GH 10119)
- Bug in plot not defaulting to the matplotlib axes.grid setting (GH 9792)
- Bug causing strings containing an exponent but no decimal point to be parsed as int instead of float by read_csv with engine='python' (GH 9565)
- Bug in Series.align resetting name when fill_value is specified (GH 10067)
- Bug in read_csv causing the index name not to be set on an empty DataFrame (GH 10184)
- Bug in SparseSeries.abs resetting name (GH 10241)
- Bug in TimedeltaIndex slicing that may reset freq (GH 10292)
- Bug in GroupBy.get_group raising ValueError when the group key contains NaT (GH 6992)
- Bug in SparseSeries constructor ignoring the name of the input data (GH 10258)
- Bug in Categorical.remove_categories causing a ValueError when removing the NaN category if the underlying dtype is floating-point (GH 10156)
- Bug where infer_freq infers a time rule (WOM-5XXX) unsupported by to_offset (GH 9425)
- Bug in DataFrame.to_hdf() where table format would raise a seemingly unrelated error for invalid (non-string) column names. This is now explicitly forbidden. (GH 9057)
- Bug in handling masking of an empty DataFrame (GH 10126)
- Bug where the MySQL interface could not handle numeric table/column names (GH 10255)
- Bug in read_csv with a date_parser that returned a datetime64 array of a time resolution other than [ns] (GH 10245)
- Bug in Panel.apply when the result has ndim=0 (GH 10332)
- Bug in read_hdf where auto_close could not be passed (GH 9327)
- Bug in read_hdf where open stores could not be used (GH 10330)
- Bug in adding empty DataFrames; the result is now a DataFrame that .equals an empty DataFrame (GH 10181)
- Bug in to_hdf and HDFStore, which did not check that complib choices were valid (GH 4582, GH 8874)
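A few of the fixed behaviors above can be spot-checked with a short script. This is a sketch run against a current pandas, not the regression tests that shipped with the release; the sample values are made up.

```python
import io

import pandas as pd

# GH 10172: mean() of an int64 Series should not overflow even when the
# intermediate sum (2**63) is outside the int64 range.
s = pd.Series([2**62, 2**62], dtype="int64")
print(s.mean())

# GH 10050: Timestamp field properties return built-in Python ints.
ts = pd.Timestamp("2015-06-12 10:30:00.000250")
fields = [ts.microsecond, ts.quarter, ts.dayofyear, ts.daysinmonth]
print([type(v).__name__ for v in fields])

# GH 10149: Index.union accepts a plain list.
print(pd.Index([1, 2]).union([2, 3]).tolist())  # [1, 2, 3]

# GH 9565: with engine="python", "1e3" should be read as a float.
parsed = pd.read_csv(io.StringIO("a\n1e3\n"), engine="python")
print(parsed["a"].dtype)
```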
Contributors
A total of 34 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Andrew Rosenfeld 
- Artemy Kolchinsky 
- Bernard Willers + 
- Christer van der Meeren 
- Christian Hudon + 
- Constantine Glen Evans + 
- Daniel Julius Lasiman + 
- Evan Wright 
- Francesco Brundu + 
- Gaëtan de Menten + 
- Jake VanderPlas 
- James Hiebert + 
- Jeff Reback 
- Joris Van den Bossche 
- Justin Lecher + 
- Ka Wo Chen + 
- Kevin Sheppard 
- Mortada Mehyar 
- Morton Fox + 
- Robin Wilson + 
- Sinhrks 
- Stephan Hoyer 
- Thomas Grainger 
- Tom Ajamian 
- Tom Augspurger 
- Yoshiki Vázquez Baeza 
- Younggun Kim 
- austinc + 
- behzad nouri 
- jreback 
- lexual 
- rekcahpassyla + 
- scls19fr 
- sinhrks