v0.8.0 (June 29, 2012)¶
This is a major release from 0.7.3 and includes extensive work on the time series handling and processing infrastructure as well as a great deal of new functionality throughout the library. It includes over 700 commits from more than 20 distinct authors. Most pandas 0.7.3 and earlier users should not experience any issues upgrading, but due to the migration to the NumPy datetime64 dtype, there may be a number of bugs and incompatibilities lurking. Lingering incompatibilities will be fixed ASAP in a 0.8.1 release if necessary. See the full release notes or issue tracker on GitHub for a complete list.
Support for non-unique indexes¶
All objects can now work with non-unique indexes. Data alignment / join operations work according to SQL join semantics (including, if application, index duplication in many-to-many joins)
NumPy datetime64 dtype and 1.6 dependency¶
Time series data are now represented using NumPy’s datetime64 dtype; thus, pandas 0.8.0 now requires at least NumPy 1.6. It has been tested and verified to work with the development version (1.7+) of NumPy as well which includes some significant user-facing API changes. NumPy 1.6 also has a number of bugs having to do with nanosecond resolution data, so I recommend that you steer clear of NumPy 1.6’s datetime64 API functions (though limited as they are) and only interact with this data using the interface that pandas provides.
See the end of the 0.8.0 section for a “porting” guide listing potential issues for users migrating legacy code bases from pandas 0.7 or earlier to 0.8.0.
Bug fixes to the 0.7.x series for legacy NumPy < 1.6 users will be provided as they arise. There will be no more further development in 0.7.x beyond bug fixes.
Time series changes and improvements¶
Note
With this release, legacy scikits.timeseries users should be able to port their code to use pandas.
Note
See documentation for overview of pandas timeseries API.
- New datetime64 representation speeds up join operations and data alignment, reduces memory usage, and improve serialization / deserialization performance significantly over datetime.datetime
- High performance and flexible resample method for converting from high-to-low and low-to-high frequency. Supports interpolation, user-defined aggregation functions, and control over how the intervals and result labeling are defined. A suite of high performance Cython/C-based resampling functions (including Open-High-Low-Close) have also been implemented.
- Revamp of frequency aliases and support for frequency shortcuts like ‘15min’, or ‘1h30min’
- New DatetimeIndex class supports both fixed frequency and irregular time series. Replaces now deprecated DateRange class
- New
PeriodIndex
andPeriod
classes for representing time spans and performing calendar logic, including the 12 fiscal quarterly frequencies <timeseries.quarterly>. This is a partial port of, and a substantial enhancement to, elements of the scikits.timeseries code base. Support for conversion between PeriodIndex and DatetimeIndex - New Timestamp data type subclasses datetime.datetime, providing the same interface while enabling working with nanosecond-resolution data. Also provides easy time zone conversions.
- Enhanced support for time zones. Add
tz_convert and
tz_localize
methods to TimeSeries and DataFrame. All timestamps are stored as UTC; Timestamps from DatetimeIndex objects with time zone set will be localized to local time. Time zone conversions are therefore essentially free. User needs to know very little about pytz library now; only time zone names as as strings are required. Time zone-aware timestamps are equal if and only if their UTC timestamps match. Operations between time zone-aware time series with different time zones will result in a UTC-indexed time series. - Time series string indexing conveniences / shortcuts: slice years, year and month, and index values with strings
- Enhanced time series plotting; adaptation of scikits.timeseries matplotlib-based plotting code
- New
date_range
,bdate_range
, andperiod_range
factory functions - Robust frequency inference function infer_freq and
inferred_freq
property of DatetimeIndex, with option to infer frequency on construction of DatetimeIndex - to_datetime function efficiently parses array of strings to DatetimeIndex. DatetimeIndex will parse array or list of strings to datetime64
- Optimized support for datetime64-dtype data in Series and DataFrame columns
- New NaT (Not-a-Time) type to represent NA in timestamp arrays
- Optimize Series.asof for looking up “as of” values for arrays of timestamps
- Milli, Micro, Nano date offset objects
- Can index time series with datetime.time objects to select all data at
particular time of day (
TimeSeries.at_time
) or between two times (TimeSeries.between_time
) - Add tshift method for leading/lagging using the frequency (if any) of the index, as opposed to a naive lead/lag using shift
Other new features¶
- New cut and
qcut
functions (like R’s cut function) for computing a categorical variable from a continuous variable by binning values either into value-based (cut
) or quantile-based (qcut
) bins - Rename
Factor
toCategorical
and add a number of usability features - Add limit argument to fillna/reindex
- More flexible multiple function application in GroupBy, and can pass list (name, function) tuples to get result in particular order with given names
- Add flexible replace method for efficiently substituting values
- Enhanced read_csv/read_table for reading time series data and converting multiple columns to dates
- Add comments option to parser functions: read_csv, etc.
- Add dayfirst option to parser functions for parsing international DD/MM/YYYY dates
- Allow the user to specify the CSV reader dialect to control quoting etc.
- Handling thousands separators in read_csv to improve integer parsing.
- Enable unstacking of multiple levels in one shot. Alleviate
pivot_table
bugs (empty columns being introduced) - Move to klib-based hash tables for indexing; better performance and less memory usage than Python’s dict
- Add first, last, min, max, and prod optimized GroupBy functions
- New ordered_merge function
- Add flexible comparison instance methods eq, ne, lt, gt, etc. to DataFrame, Series
- Improve scatter_matrix plotting function and add histogram or kernel density estimates to diagonal
- Add ‘kde’ plot option for density plots
- Support for converting DataFrame to R data.frame through rpy2
- Improved support for complex numbers in Series and DataFrame
- Add pct_change method to all data structures
- Add max_colwidth configuration option for DataFrame console output
- Interpolate Series values using index values
- Can select multiple columns from GroupBy
- Add update methods to Series/DataFrame for updating values in place
- Add
any
andall
method to DataFrame
New plotting methods¶
Series.plot
now supports a secondary_y
option:
In [1]: plt.figure()
Out[1]: <Figure size 640x480 with 0 Axes>
In [2]: fx['FR'].plot(style='g')