v0.15.0 (October 18, 2014)¶
This is a major release from 0.14.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Warning
pandas >= 0.15.0 will no longer support compatibility with NumPy versions < 1.7.0. If you want to use the latest versions of pandas, please upgrade to NumPy >= 1.7.0 (GH7711)
- Highlights include:
- The
Categorical
type was integrated as a first-class pandas type, see here - New scalar type
Timedelta
, and a new index typeTimedeltaIndex
, see here - New datetimelike properties accessor
.dt
for Series, see Datetimelike Properties - New DataFrame default display for
df.info()
to include memory usage, see Memory Usage read_csv
will now by default ignore blank lines when parsing, see here- API change in using Indexes in set operations, see here
- Enhancements in the handling of timezones, see here
- A lot of improvements to the rolling and expanding moment functions, see here
- Internal refactoring of the
Index
class to no longer sub-classndarray
, see Internal Refactoring - dropping support for
PyTables
less than version 3.0.0, andnumexpr
less than version 2.1 (GH7990) - Split indexing documentation into Indexing and Selecting Data and MultiIndex / Advanced Indexing
- Split out string methods documentation into Working with Text Data
- The
- Check the API Changes and deprecations before updating
- Other Enhancements
- Performance Improvements
- Bug Fixes
Warning
In 0.15.0 Index
has internally been refactored to no longer sub-class ndarray
but instead subclass PandasObject
, similarly to the rest of the pandas objects. This change allows very easy sub-classing and creation of new index types. This should be
a transparent change with only very limited API implications (See the Internal Refactoring)
Warning
The refactoring in Categorical
changed the two argument constructor from
“codes/labels and levels” to “values and levels (now called ‘categories’)”. This can lead to subtle bugs. If you use
Categorical
directly, please audit your code before updating to this pandas
version and change it to use the from_codes()
constructor. See more on Categorical
here
New features¶
Categoricals in Series/DataFrame¶
Categorical
can now be included in Series and DataFrames and gained new
methods to manipulate. Thanks to Jan Schulz for much of this API/implementation. (GH3943, GH5313, GH5314,
GH7444, GH7839, GH7848, GH7864, GH7914, GH7768, GH8006, GH3678,
GH8075, GH8076, GH8143, GH8453, GH8518).
For full docs, see the categorical introduction and the API documentation.
In [1]: df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
...: "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
...:
In [2]: df["grade"] = df["raw_grade"].astype("category")
In [3]: df["grade"]
Out[3]:
0 a
1 b
2 b
3 a
4 a
5 e
Name: grade, Length: 6, dtype: category
Categories (3, object): [a, b, e]
# Rename the categories
In [4]: df["grade"].cat.categories = ["very good", "good", "very bad"]
# Reorder the categories and simultaneously add the missing categories
In [5]: df["grade"] = df["grade"].cat.set_categories(["very bad", "bad",
...: "medium", "good", "very good"])
...:
In [6]: df["grade"]
Out[6]:
0 very good
1 good
2 good
3 very good
4 very good
5 very bad
Name: grade, Length: 6, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]
In [7]: df.sort_values("grade")