v0.18.0 (March 13, 2016)¶
This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Warning
pandas >= 0.18.0 no longer supports Python versions 2.6 and 3.3 (GH7718, GH11273)
Warning
numexpr version 2.4.4 will now show a warning and not be used as a computation back-end for pandas because of some buggy behavior. This does not affect other versions (>= 2.1 and >= 2.4.6). (GH12489)
Highlights include:
- Moving and expanding window functions are now methods on Series and DataFrame, similar to .groupby, see here.
- Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here.
- API breaking change to the .resample method to make it more .groupby like, see here.
- Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here.
- The .to_xarray() function has been added for compatibility with the xarray package, see here.
- The read_sas function has been enhanced to read sas7bdat files, see here.
- Addition of the .str.extractall() method, and API changes to the .str.extract() method and .str.cat() method.
- pd.test() top-level nose test runner is available (GH4327).
Check the API Changes and deprecations before updating.
What’s new in v0.18.0
- New features
- Window functions are now methods
- Changes to rename
- Range Index
- Changes to str.extract
- Addition of str.extractall
- Changes to str.cat
- Datetimelike rounding
- Formatting of Integers in FloatIndex
- Changes to dtype assignment behaviors
- to_xarray
- Latex Representation
- pd.read_sas() changes
- Other enhancements
- Backwards incompatible API changes
- Performance Improvements
- Bug Fixes
- Contributors
New features¶
Window functions are now methods¶
Window functions have been refactored to be methods on Series/DataFrame objects, rather than top-level functions, which are now deprecated. This allows these window-type functions to have a similar API to that of .groupby. See the full documentation here (GH11603, GH12373)
In [1]: np.random.seed(1234)
In [2]: df = pd.DataFrame({'A': range(10), 'B': np.random.randn(10)})
In [3]: df
Out[3]:
A B
0 0 0.471435
1 1 -1.190976
2 2 1.432707
3 3 -0.312652
4 4 -0.720589
5 5 0.887163
6 6 0.859588
7 7 -0.636524
8 8 0.015696
9 9 -2.242685
[10 rows x 2 columns]
Previous Behavior:
In [8]: pd.rolling_mean(df, window=3)
FutureWarning: pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with
DataFrame.rolling(window=3,center=False).mean()
Out[8]:
A B
0 NaN NaN
1 NaN NaN
2 1 0.237722
3 2 -0.023640
4 3 0.133155
5 4 -0.048693
6 5 0.342054
7 6 0.370076
8 7 0.079587
9 8 -0.954504
New Behavior:
In [4]: r = df.rolling(window=3)
These show a descriptive repr
In [5]: r
Out[5]: Rolling [window=3,center=False,axis=0]
with tab-completion of available methods and properties.
In [9]: r.<TAB>
r.A r.agg r.apply r.count r.exclusions r.max r.median r.name r.skew r.sum
r.B r.aggregate r.corr r.cov r.kurt r.mean r.min r.quantile r.std r.var
The methods operate on the Rolling object itself
In [6]: r.mean()
Out[6]:
A B
0 NaN NaN
1 NaN NaN
2 1.0 0.237722
3 2.0 -0.023640
4 3.0 0.133155
5 4.0 -0.048693
6 5.0 0.342054
7 6.0 0.370076
8 7.0 0.079587
9 8.0 -0.954504
[10 rows x 2 columns]
They provide getitem accessors
In [7]: r['A'].mean()
Out[7]:
0 NaN
1 NaN
2 1.0
3 2.0
4 3.0
5 4.0
6 5.0
7 6.0
8 7.0
9 8.0
Name: A, Length: 10, dtype: float64
And multiple aggregations
In [8]: r.agg({'A': ['mean', 'std'],
...: 'B': ['mean', 'std']})
...:
Out[8]:
A B
mean std mean std
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 1.0 1.0 0.237722 1.327364
3 2.0 1.0 -0.023640 1.335505
4 3.0 1.0 0.133155 1.143778
5 4.0 1.0 -0.048693 0.835747
6 5.0 1.0 0.342054 0.920379
7 6.0 1.0 0.370076 0.871850
8 7.0 1.0 0.079587 0.750099
9 8.0 1.0 -0.954504 1.162285
[10 rows x 4 columns]
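As a rough migration sketch, the method-style calls below cover the rolling case shown above plus the expanding windows mentioned in the highlights. The DataFrame contents and variable names are illustrative assumptions, not taken from the release notes.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption for this sketch, not from the release notes).
df = pd.DataFrame({"A": range(10), "B": np.arange(10, dtype=float) ** 2})

# Method-style replacement for the deprecated pd.rolling_mean(df, window=3).
rolling_mean = df.rolling(window=3).mean()

# Expanding windows use the same pattern: build the window object, then aggregate.
expanding_mean = df.expanding(min_periods=1).mean()

# A single column can be selected from the window object before aggregating.
rolling_b_sum = df.rolling(window=3)["B"].sum()

print(rolling_mean.head())
print(expanding_mean.head())
print(rolling_b_sum.head())
```

The first two rows of a window=3 rolling result are NaN because the window is not yet full, while the expanding result with min_periods=1 is defined from the first row.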
Changes to rename¶
Series.rename and NDFrame.rename_axis can now take a scalar or list-like argument for altering the Series or axis name, in addition to their old behaviors of altering labels. (GH9494, GH11965)
In [9]: s = pd.Series(np.random.randn(5))
In [10]: s.rename('newname')
Out[10]:
0 1.150036
1 0.991946
2 0.953324
3 -2.021255
4 -0.334077
Name: newname, Length: 5, dtype: float64
In [11]: df = pd.DataFrame(np.random.randn(5, 2))
In [12]: (df.rename_axis("indexname")
....: .rename_axis("columns_name", axis="columns"))
....:
Out[12]:
columns_name 0 1
indexname
0 0.002118 0.405453
1 0.289092 1.321158
2 -1.546906 -0.202646
3 -0.655969 0.193421
4 0.553439 1.318152
[5 rows x 2 columns]
The new functionality works well in method chains. Previously these methods only accepted functions or dicts mapping a label to a new label. This continues to work as before for function or dict-like values.
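To make the distinction concrete, here is a minimal sketch contrasting the scalar form (which sets the Series name) with the dict and function forms (which relabel the index). The values and names are hypothetical.

```python
import pandas as pd

s = pd.Series([3, 1, 2])

# Scalar argument: sets the Series name, index labels are untouched.
named = s.rename("total")

# Dict argument: maps old index labels to new ones, name is untouched.
relabeled = s.rename({0: "a", 1: "b"})

# Function argument: applied to every index label.
shifted = s.rename(lambda label: label + 10)

# The scalar form slots into a method chain without a separate assignment.
frame = s.sort_values().rename("sorted_values").to_frame()

print(named.name, list(relabeled.index), list(shifted.index))
print(list(frame.columns))
```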
Range Index¶
A RangeIndex has been added to the Int64Index sub-classes to support a memory-saving alternative for common use cases. Its implementation is similar to the Python range object (xrange in Python 2), in that it only stores the start, stop, and step values for the index. It interacts transparently with the user API, converting to Int64Index if needed.
This will now be the default constructed index for NDFrame objects, rather than the previous Int64Index. (GH939, GH12070, GH12071, GH12109, GH12888)
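The memory argument can be checked with a small sketch like the one below. It compares the default index against a dense int64 array of the same length. The exact byte count reported for a RangeIndex depends on the pandas version and platform, so only the relative size is shown. The variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(1000))

# The default index is a RangeIndex, which keeps only start, stop, and step.
default_index = s.index
print(type(default_index).__name__, default_index.nbytes)

# A dense int64 array holding the same 1000 labels needs 8 bytes per label.
dense = np.arange(1000, dtype="int64")
print(dense.nbytes)

# Selecting labels that do not form a regular range forces a materialized
# integer index instead of a RangeIndex.
picked = default_index[[0, 5, 7]]
print(type(picked).__name__, list(picked))
```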
Previous Behavior:
In [3]: s = pd.Series(range(1000))
In [4]: s.index
Out[4]:
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
990, 991, 992, 993, 994, 995, 996, 997, 998, 999], dtype='int64', length=1000)
In [6]: s.index.nbytes
Out[6]: 8000
New Behavior:
In [13]: s = pd.Series(range(1000))
In [14]: s.index
Out[14]: RangeIndex(start=0, stop=1000, step=1)
In [15]: s.index.nbytes