# Version 0.9.1 (November 14, 2012)

This is a bug fix release from 0.9.0 and includes several new features and enhancements along with a large number of bug fixes. The new features include by-column sort order for DataFrame and Series, improved NA handling for the rank method, masking functions for DataFrame, and intraday time-series filtering for DataFrame.

## New features

`Series.sort`, `DataFrame.sort`, and `DataFrame.sort_index` can now be specified in a per-column manner to support multiple sort orders (GH928)

    In [2]: df = pd.DataFrame(np.random.randint(0, 2, (6, 3)),
       ...:                   columns=['A', 'B', 'C'])

    In [3]: df.sort(['A', 'B'], ascending=[1, 0])
    Out[3]:
       A  B  C
    3  0  1  1
    4  0  1  1
    2  0  0  1
    0  1  0  0
    1  1  0  0
    5  1  0  0
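In later pandas releases `DataFrame.sort` was removed and `sort_values` takes its place. The sketch below shows the same per-column ordering under that assumption, using a small fixed frame (the values and variable names are illustrative, not from this release) so the result is easy to check by eye.

```python
import pandas as pd

# Illustrative data; the release-note example uses random integers instead.
df = pd.DataFrame(
    {"A": [1, 0, 1, 0], "B": [0, 1, 1, 0], "C": [10, 20, 30, 40]}
)

# Sort by A ascending, then by B descending within each value of A.
# `sort_values` is the modern replacement for `DataFrame.sort` (assumption:
# a pandas version where `sort_values` is available).
result = df.sort_values(by=["A", "B"], ascending=[True, False])

print(result)
```

Each entry in `ascending` pairs with the column at the same position in `by`, which is what makes mixed sort orders possible in a single call.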

`DataFrame.rank` now supports additional argument values for the `na_option` parameter so missing values can be assigned either the largest or the smallest rank (GH1508, GH2159)

    In [1]: df = pd.DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])

    In [2]: df.loc[2:4] = np.nan

    In [3]: df.rank()
    Out[3]:
         A    B    C
    0  3.0  2.0  1.0
    1  1.0  3.0  2.0
    2  NaN  NaN  NaN
    3  NaN  NaN  NaN
    4  NaN  NaN  NaN
    5  2.0  1.0  3.0

    In [4]: df.rank(na_option='top')
    Out[4]:
         A    B    C
    0  6.0  5.0  4.0
    1  4.0  6.0  5.0
    2  2.0  2.0  2.0
    3  2.0  2.0  2.0
    4  2.0  2.0  2.0
    5  5.0  4.0  6.0

    In [5]: df.rank(na_option='bottom')
    Out[5]:
         A    B    C
    0  3.0  2.0  1.0
    1  1.0  3.0  2.0
    2  5.0  5.0  5.0
    3  5.0  5.0  5.0
    4  5.0  5.0  5.0
    5  2.0  1.0  3.0

DataFrame has new `where` and `mask` methods to select values according to a given boolean mask (GH2109, GH2151)

DataFrame currently supports slicing via a boolean vector the same length as the DataFrame (inside the `[]`). The returned DataFrame has the same number of columns as the original, but is sliced on its index. The examples below reuse `df` from the `rank` example above.

    In [7]: df
    Out[7]:
              A         B         C
    0  0.469112 -0.282863 -1.509059
    1 -1.135632  1.212112 -0.173215
    2       NaN       NaN       NaN
    3       NaN       NaN       NaN
    4       NaN       NaN       NaN
    5  0.271860 -0.424972  0.567020

    In [8]: df[df['A'] > 0]
    Out[8]:
              A         B         C
    0  0.469112 -0.282863 -1.509059
    5  0.271860 -0.424972  0.567020

If a DataFrame is sliced with a DataFrame-based boolean condition (with the same size as the original DataFrame), then a DataFrame of the same size (index and columns) as the original is returned, with elements that do not meet the boolean condition set to `NaN`. This is accomplished via the new method `DataFrame.where`. In addition, `where` takes an optional `other` argument for replacement.

    In [9]: df[df>0]
    Out[9]:
              A         B        C
    0  0.469112       NaN      NaN
    1       NaN  1.212112      NaN
    2       NaN       NaN      NaN
    3       NaN       NaN      NaN
    4       NaN       NaN      NaN
    5  0.271860       NaN  0.56702

    In [10]: df.where(df>0)
    Out[10]:
              A         B        C
    0  0.469112       NaN      NaN
    1       NaN  1.212112      NaN
    2       NaN       NaN      NaN
    3       NaN       NaN      NaN
    4       NaN       NaN      NaN
    5  0.271860       NaN  0.56702

    In [11]: df.where(df>0,-df)
    Out[11]:
              A         B         C
    0  0.469112  0.282863  1.509059
    1  1.135632  1.212112  0.173215
    2       NaN       NaN       NaN
    3       NaN       NaN       NaN
    4       NaN       NaN       NaN
    5  0.271860  0.424972  0.567020

Furthermore, `where` now aligns the input boolean condition (ndarray or DataFrame), such that partial selection with setting is possible. This is analogous to partial setting via `.ix` (but on the contents rather than the axis labels)

    In [12]: df2 = df.copy()

    In [13]: df2[ df2[1:4] > 0 ] = 3

    In [14]: df2
    Out[14]:
              A         B         C
    0  0.469112 -0.282863 -1.509059
    1 -1.135632  3.000000 -0.173215
    2       NaN       NaN       NaN
    3       NaN       NaN       NaN
    4       NaN       NaN       NaN
    5  0.271860 -0.424972  0.567020
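As a compact illustration of `where` with and without the `other` argument, here is a minimal sketch on a small fixed frame instead of random data. The values and variable names are illustrative assumptions, and it presumes a recent pandas imported as `pd`.

```python
import pandas as pd

# Small fixed frame (illustrative values, not taken from the release notes).
df = pd.DataFrame({"A": [1.0, -2.0], "B": [-3.0, 4.0]})

# Keep entries where the condition holds; all other entries become NaN.
positive_only = df.where(df > 0)

# Pass `other` to replace the failing entries instead of inserting NaN.
# Replacing each non-positive entry with its negation yields absolute values.
absolute = df.where(df > 0, -df)

print(positive_only)
print(absolute)
```

The result always has the same index and columns as the original frame; only the entries that fail the condition change.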

`DataFrame.mask` is the inverse boolean operation of `where`.

    In [15]: df.mask(df<=0)
    Out[15]:
              A         B        C
    0  0.469112       NaN      NaN
    1       NaN  1.212112      NaN
    2       NaN       NaN      NaN
    3       NaN       NaN      NaN
    4       NaN       NaN      NaN
    5  0.271860       NaN  0.56702

Enable referencing of Excel columns by their column names (GH1936)

    In [16]: xl = pd.ExcelFile('data/test.xls')

    In [17]: xl.parse('Sheet1', index_col=0, parse_dates=True,
       ....:          parse_cols='A:D')
       ....:
    Out[17]:
                       A         B         C         D
    2000-01-03  0.980269  3.685731 -0.364217 -1.159738
    2000-01-04  1.047916 -0.041232 -0.161812  0.212549
    2000-01-05  0.498581  0.731168 -0.537677  1.346270
    2000-01-06  1.120202  1.567621  0.003641  0.675253
    2000-01-07 -0.487094  0.571455 -1.611639  0.103469
    2000-01-10  0.836649  0.246462  0.588543  1.062782
    2000-01-11 -0.157161  1.340307  1.195778 -1.097007

Added option to disable pandas-style tick locators and formatters using `series.plot(x_compat=True)` or `pandas.plot_params['x_compat'] = True` (GH2205)

Existing TimeSeries methods `at_time` and `between_time` were added to DataFrame (GH2149)

DataFrame.dot can now accept ndarrays (GH2042)
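The intraday filtering methods `at_time` and `between_time` mentioned above can be sketched as follows. The hourly index, the `price` column, and the variable names are illustrative assumptions, and the code presumes a recent pandas imported as `pd`.

```python
import pandas as pd

# Two days of hourly timestamps (illustrative data).
index = pd.date_range("2012-11-14", periods=48, freq="60min")
df = pd.DataFrame({"price": list(range(48))}, index=index)

# Rows whose time of day is exactly 09:00, regardless of date.
at_nine = df.at_time("09:00")

# Rows whose time of day falls between 09:00 and 11:00.
# Both endpoints are included by default.
morning = df.between_time("09:00", "11:00")

print(at_nine)
print(morning)
```

Both methods filter on the time-of-day component of a `DatetimeIndex`, so each selection picks up matching rows from every date in the frame.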

DataFrame.drop now supports non-unique indexes (GH2101)
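A minimal sketch of dropping a label from a non-unique index, assuming a recent pandas imported as `pd`; the frame and its labels are illustrative.

```python
import pandas as pd

# The label "a" appears twice in the index (illustrative data).
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "a", "c"])

# Dropping a duplicated label removes every row that carries it.
remaining = df.drop("a")

print(remaining)
```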

Panel.shift now supports negative periods (GH2164)

DataFrame now supports the unary `~` operator (GH2110)
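The unary `~` operator is most useful for inverting a boolean frame or a row filter. The following is a small sketch with illustrative values and names, assuming a recent pandas imported as `pd`.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-1.0, 5.0, -6.0]})

# Element-wise boolean frame, then its logical negation.
positive = df > 0
not_positive = ~positive

# The same operator inverts a row filter built from a single column.
rows_where_a_not_positive = df[~(df["A"] > 0)]

print(not_positive)
print(rows_where_a_not_positive)
```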

## API changes

Upsampling data with a PeriodIndex will result in a higher frequency TimeSeries that spans the original time window

    In [1]: prng = pd.period_range('2012Q1', periods=2, freq='Q')

    In [2]: s = pd.Series(np.random.randn(len(prng)), prng)

    In [4]: s.resample('M')
    Out[4]:
    2012-01   -1.471992
    2012-02         NaN
    2012-03         NaN
    2012-04   -0.493593
    2012-05         NaN
    2012-06         NaN
    Freq: M, dtype: float64

Period.end_time now returns the last nanosecond in the time interval (GH2124, GH2125, GH1764)

    In [18]: p = pd.Period('2012')

    In [19]: p.end_time
    Out[19]: Timestamp('2012-12-31 23:59:59.999999999')

File parsers no longer coerce to float or bool for columns that have custom converters specified (GH2184)

    In [20]: import io

    In [21]: data = ('A,B,C\n'
       ....:         '00001,001,5\n'
       ....:         '00002,002,6')
       ....:

    In [22]: pd.read_csv(io.StringIO(data), converters={'A': lambda x: x.strip()})
    Out[22]:
           A  B  C
    0  00001  1  5
    1  00002  2  6
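To make the effect of the converter change concrete, the sketch below parses the same CSV text with and without a converter on column `A`. It assumes a recent pandas imported as `pd`; the variable names are illustrative.

```python
import io

import pandas as pd

data = "A,B,C\n00001,001,5\n00002,002,6"

# Without a converter, column A is inferred as integers and the
# leading zeros are lost.
plain = pd.read_csv(io.StringIO(data))

# With a converter on A, the parser keeps whatever the converter returns
# (here the stripped string); columns without a converter are still inferred.
converted = pd.read_csv(io.StringIO(data), converters={"A": lambda x: x.strip()})

print(plain)
print(converted)
```

Only the column with a converter is exempt from type inference, which is why `B` still loses its leading zeros in both frames.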

See the full release notes or issue tracker on GitHub for a complete list.

## Contributors

A total of 11 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

Brenda Moon +

Chang She

Jeff Reback +

Justin C Johnson +

K.-Michael Aye

Martin Blais

Tobias Brandt +

Wes McKinney

Wouter Overmeire

timmie

y-p