# Version 0.15.2 (December 12, 2014)

This is a minor release from 0.15.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs. We recommend that all users upgrade to this version.

## API changes

- Indexing in `MultiIndex` beyond lex-sort depth is now supported, though a lexically sorted index will perform better. (GH 2646)

  ```python
  In [1]: df = pd.DataFrame({'jim':[0, 0, 1, 1],
     ...:                    'joe':['x', 'x', 'z', 'y'],
     ...:                    'jolie':np.random.rand(4)}).set_index(['jim', 'joe'])
     ...:

  In [2]: df
  Out[2]:
              jolie
  jim joe
  0   x    0.126970
      x    0.966718
  1   z    0.260476
      y    0.897237

  [4 rows x 1 columns]

  In [3]: df.index.lexsort_depth
  Out[3]: 1

  # in prior versions this would raise a KeyError
  # will now show a PerformanceWarning
  In [4]: df.loc[(1, 'z')]
  Out[4]:
              jolie
  jim joe
  1   z    0.260476

  [1 rows x 1 columns]

  # lexically sorting
  In [5]: df2 = df.sort_index()

  In [6]: df2
  Out[6]:
              jolie
  jim joe
  0   x    0.126970
      x    0.966718
  1   y    0.897237
      z    0.260476

  [4 rows x 1 columns]

  In [7]: df2.index.lexsort_depth
  Out[7]: 2

  In [8]: df2.loc[(1,'z')]
  Out[8]:
              jolie
  jim joe
  1   z    0.260476

  [1 rows x 1 columns]
  ```

- Bug in `unique` of Series with `category` dtype, which returned all categories regardless of whether they were “used” or not (see GH 8559 for the discussion). Previous behaviour was to return all categories:

  ```python
  In [3]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

  In [4]: cat
  Out[4]:
  [a, b, a]
  Categories (3, object): [a < b < c]

  In [5]: cat.unique()
  Out[5]: array(['a', 'b', 'c'], dtype=object)
  ```

  Now, only the categories that actually occur in the array are returned:

  ```python
  In [1]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

  In [2]: cat.unique()
  Out[2]:
  ['a', 'b']
  Categories (3, object): ['a', 'b', 'c']
  ```

- `Series.all` and `Series.any` now support the `level` and `skipna` parameters. `Series.all`, `Series.any`, `Index.all`, and `Index.any` no longer support the `out` and `keepdims` parameters, which existed for compatibility with ndarray. Various index types no longer support the `all` and `any` aggregation functions and will now raise `TypeError`. (GH 8302)

- Allow equality comparisons of Series with a categorical dtype and object dtype; previously these would raise `TypeError` (GH 8938)

- Bug in `NDFrame`: conflicting attribute/column names now behave consistently between getting and setting. Previously, when both a column and attribute named `y` existed, `data.y` would return the attribute, while `data.y = z` would update the column (GH 8994)

  ```python
  In [3]: data = pd.DataFrame({'x': [1, 2, 3]})

  In [4]: data.y = 2

  In [5]: data['y'] = [2, 4, 6]

  In [6]: data
  Out[6]:
     x  y
  0  1  2
  1  2  4
  2  3  6

  [3 rows x 2 columns]

  # this assignment was inconsistent
  In [7]: data.y = 5
  ```

  Old behavior:

  ```python
  In [6]: data.y
  Out[6]: 2

  In [7]: data['y'].values
  Out[7]: array([5, 5, 5])
  ```

  New behavior:

  ```python
  In [8]: data.y
  Out[8]: 5

  In [9]: data['y'].values
  Out[9]: array([2, 4, 6])
  ```

- `Timestamp('now')` is now equivalent to `Timestamp.now()` in that it returns the local time rather than UTC. Also, `Timestamp('today')` is now equivalent to `Timestamp.today()` and both have `tz` as a possible argument. (GH 9000)

- Fix negative step support for label-based slices (GH 8753)

  Old behavior:

  ```python
  In [1]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])
  Out[1]:
  a    0
  b    1
  c    2
  dtype: int64

  In [2]: s.loc['c':'a':-1]
  Out[2]:
  c    2
  dtype: int64
  ```

  New behavior:

  ```python
  In [10]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])

  In [11]: s.loc['c':'a':-1]
  Out[11]:
  c    2
  b    1
  a    0
  Length: 3, dtype: int64
  ```
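
As a rough illustration of the equality-comparison change listed above (GH 8938), the sketch below compares a Series of `category` dtype with an object-dtype Series element by element. The variable names and values are made up for illustration, and the printed result assumes a pandas version in which this comparison is supported.

```python
import pandas as pd

# Illustrative data; names and values are not taken from the release notes.
cat_series = pd.Series(['a', 'b', 'a'], dtype='category')
obj_series = pd.Series(['a', 'b', 'c'], dtype=object)

# Element-wise (positional) equality between categorical and object dtype.
result = cat_series == obj_series
print(result.tolist())  # [True, True, False]
```

Only `==` and `!=` are shown here; ordering comparisons between the two dtypes are a separate matter and are not covered by this sketch.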

## Enhancements

`Categorical` enhancements:

- Added ability to export Categorical data to Stata (GH 8633). See here for limitations of categorical variables exported to Stata data files.

- Added flag `order_categoricals` to `StataReader` and `read_stata` to select whether to order imported categorical data (GH 8836). See here for more information on importing categorical variables from Stata data files.

- Added ability to export Categorical data to/from HDF5 (GH 7621). Queries work the same as if it was an object array. However, the `category` dtyped data is stored in a more efficient manner. See here for an example and caveats w.r.t. prior versions of pandas.

- Added support for `searchsorted()` on `Categorical` class (GH 8420).
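
A minimal sketch of the `searchsorted()` support mentioned above, assuming an ordered categorical whose values are already sorted. The data is invented for illustration.

```python
import pandas as pd

# Illustrative, already-sorted ordered categorical (values are made up).
cat = pd.Categorical(
    ['apple', 'bread', 'bread', 'cheese', 'milk'],
    categories=['apple', 'bread', 'cheese', 'milk'],
    ordered=True,
)

# Leftmost and rightmost insertion points that keep the values sorted.
left_pos = cat.searchsorted('bread')
right_pos = cat.searchsorted('bread', side='right')
print(left_pos, right_pos)  # 1 3
```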

Other enhancements:

- Added the ability to specify the SQL type of columns when writing a DataFrame to a database (GH 8778). For example, specifying to use the sqlalchemy `String` type instead of the default `Text` type for string columns:

  ```python
  from sqlalchemy.types import String
  data.to_sql('data_dtype', engine, dtype={'Col_1': String})  # noqa F821
  ```

- `Series.all` and `Series.any` now support the `level` and `skipna` parameters (GH 8302):

  ```python
  >>> s = pd.Series([False, True, False], index=[0, 0, 1])
  >>> s.any(level=0)
  0     True
  1    False
  dtype: bool
  ```

- `Panel` now supports the `all` and `any` aggregation functions. (GH 8302):

  ```python
  >>> p = pd.Panel(np.random.rand(2, 5, 4) > 0.1)
  >>> p.all()
         0      1      2     3
  0   True   True   True  True
  1   True  False   True  True
  2   True   True   True  True
  3  False   True  False  True
  4   True   True   True  True
  ```

- Added support for `utcfromtimestamp()`, `fromtimestamp()`, and `combine()` on `Timestamp` class (GH 5351).

- Added Google Analytics (pandas.io.ga) basic documentation (GH 8835). See here.

- `Timedelta` arithmetic returns `NotImplemented` in unknown cases, allowing extensions by custom classes (GH 8813).

- `Timedelta` now supports arithmetic with `numpy.ndarray` objects of the appropriate dtype (numpy 1.8 or newer only) (GH 8884).

- Added `Timedelta.to_timedelta64()` method to the public API (GH 8884).

- Added `gbq.generate_bq_schema()` function to the gbq module (GH 8325).

- `Series` now works with map objects the same way as generators (GH 8909).

- Added context manager to `HDFStore` for automatic closing (GH 8791).

- `to_datetime` gains an `exact` keyword that, when set to `False`, allows a format to not require an exact match for a provided format string. `exact` defaults to `True` (meaning that exact matching is still the default) (GH 8904)

- Added `axvlines` boolean option to the `parallel_coordinates` plot function; it determines whether vertical lines will be drawn, and the default is `True`.

- Added ability to read table footers to `read_html` (GH 8552)
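
The `exact` keyword of `to_datetime` described above can be sketched as follows. The input strings and the `%d%m%y` format are illustrative choices, not taken from the release notes, and the strict case assumes that unparsed trailing text surfaces as a `ValueError`.

```python
import pandas as pd

# Illustrative strings: the second one has trailing text after the date.
raw = pd.Series(['190511', '190511:00:00:00'])

# With exact=False the format may match only part of each string,
# so both entries parse to the same date.
parsed = pd.to_datetime(raw, format='%d%m%y', exact=False)

# With the default exact=True, the trailing text is expected to be rejected.
try:
    pd.to_datetime(raw, format='%d%m%y')
    strict_failed = False
except ValueError:
    strict_failed = True

print(parsed.tolist(), strict_failed)
```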

- `to_sql` now infers data types of non-NA values for columns that contain NA values and have dtype `object` (GH 8778).
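
A short sketch of the `Timestamp` and `Timedelta` additions listed above (`Timestamp.combine`, `Timedelta.to_timedelta64`, and arithmetic with a `numpy.ndarray`). The dates and offsets are arbitrary illustration values.

```python
import datetime

import numpy as np
import pandas as pd

# Timestamp.combine mirrors datetime.datetime.combine(date, time).
ts = pd.Timestamp.combine(datetime.date(2014, 12, 12), datetime.time(9, 30))

# Timedelta.to_timedelta64 converts to a NumPy timedelta64 scalar.
one_hour = pd.Timedelta(hours=1)
as_numpy = one_hour.to_timedelta64()

# Arithmetic with a timedelta64 ndarray is applied element-wise.
offsets = np.array([0, 30], dtype='timedelta64[m]').astype('timedelta64[ns]')
shifted = one_hour + offsets

print(ts, as_numpy, shifted)
```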

## Performance

## Bug fixes

- Bug in concat of Series with `category` dtype which were coercing to `object`. (GH 8641)

- Bug in Timestamp-Timestamp not returning a Timedelta type and datelike-datelike ops with timezones (GH 8865)

- Made the timezone mismatch exception consistent (either a tz operated with None or an incompatible timezone); it will now raise `TypeError` rather than `ValueError` (a couple of edge cases only) (GH 8865)

- Bug in using a `pd.Grouper(key=...)` with no level/axis or level only (GH 8795, GH 8866)

- Report a `TypeError` when invalid/no parameters are passed in a groupby (GH 8015)

- Bug in packaging pandas with `py2app/cx_Freeze` (GH 8602, GH 8831)

- Bug in `groupby` signatures that didn’t include `*args` or `**kwargs` (GH 8733).

- `io.data.Options` now raises `RemoteDataError` when no expiry dates are available from Yahoo and when it receives no data from Yahoo (GH 8761), (GH 8783).

- Unclear error message in csv parsing when passing dtype and names and the parsed data is a different data type (GH 8833)

- Bug in slicing a MultiIndex with an empty list and at least one boolean indexer (GH 8781)

- `io.data.Options` now raises `RemoteDataError` when no expiry dates are available from Yahoo (GH 8761).

- `Timedelta` kwargs may now be numpy ints and floats (GH 8757).

- Fixed several outstanding bugs for `Timedelta` arithmetic and comparisons (GH 8813, GH 5963, GH 5436).

- `sql_schema` now generates dialect appropriate `CREATE TABLE` statements (GH 8697)

- `slice` string method now takes step into account (GH 8754)

- Bug in `BlockManager` where setting values with different type would break block integrity (GH 8850)

- Bug in `DatetimeIndex` when using `time` object as key (GH 8667)

- Bug in `merge` where `how='left'` and `sort=False` would not preserve left frame order (GH 7331)

- Bug in `MultiIndex.reindex` where reindexing at level would not reorder labels (GH 4088)

- Bug in certain operations with dateutil timezones, manifesting with dateutil 2.3 (GH 8639)
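
To make the `merge` fix above concrete (GH 7331), the following sketch performs a left merge with `sort=False` and checks that the row order of the left frame is kept. The frames and column names are invented for illustration.

```python
import pandas as pd

# Illustrative frames; the left frame's keys are deliberately unsorted.
left = pd.DataFrame({'key': ['c', 'a', 'b'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b', 'c'], 'rval': [10, 20, 30]})

# A left merge without sorting should keep the left frame's row order.
merged = left.merge(right, on='key', how='left', sort=False)
print(merged['key'].tolist())  # ['c', 'a', 'b']
```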

- Regression in DatetimeIndex iteration with a Fixed/Local offset timezone (GH 8890)

- Bug in `to_datetime` when parsing nanoseconds using the `%f` format (GH 8989)

- Fix: The font size was only set on x axis if vertical or the y axis if horizontal. (GH 8765)

- Fixed division by 0 when reading big csv files in python 3 (GH 8621)

- Bug in outputting a MultiIndex with `to_html,index=False` which would add an extra column (GH 8452)

- Imported categorical variables from Stata files retain the ordinal information in the underlying data (GH 8836).

- Defined `.size` attribute across `NDFrame` objects to provide compat with numpy >= 1.9.1; buggy with `np.array_split` (GH 8846)

- Skip testing of histogram plots for matplotlib <= 1.2 (GH 8648).

- Bug where `get_data_google` returned object dtypes (GH 3995)

- Bug in `DataFrame.stack(..., dropna=False)` when the DataFrame’s `columns` is a `MultiIndex` whose `labels` do not reference all its `levels`. (GH 8844)

- Option context is now applied on `__enter__` (GH 8514)

- Bug in resample that causes a ValueError when resampling across multiple days and the last offset is not calculated from the start of the range (GH 8683)
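
The option-context fix above (GH 8514) can be sketched like this, on the assumption that it means option changes take effect when the `with` block is entered rather than when the context object is created. The option name and temporary value are arbitrary choices for illustration.

```python
import pandas as pd

option_name = 'display.max_rows'
original = pd.get_option(option_name)
temporary = 5 if original != 5 else 6  # any value different from the original

# Creating the context object should not change the option yet.
ctx = pd.option_context(option_name, temporary)
value_before_enter = pd.get_option(option_name)

# The temporary value applies only inside the with block.
with ctx:
    value_inside = pd.get_option(option_name)

value_after_exit = pd.get_option(option_name)
print(value_before_enter, value_inside, value_after_exit)
```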

- Bug where `DataFrame.plot(kind='scatter')` fails when checking if an np.array is in the DataFrame (GH 8852)

- Bug in `pd.infer_freq/DataFrame.inferred_freq` that prevented proper sub-daily frequency inference when the index contained DST days (GH 8772).

- Bug where index name was still used when plotting a series with `use_index=False` (GH 8558).

- Bugs when trying to stack multiple columns, when some (or all) of the level names are numbers (GH 8584).

- Bug in `MultiIndex` where `__contains__` returns wrong result if index is not lexically sorted or unique (GH 7724)

- BUG CSV: fix problem with trailing white space in skipped rows, (GH 8679), (GH 8661), (GH 8983)

- Regression where `Timestamp` did not parse the ‘Z’ zone designator for UTC (GH 8771)

- Bug in `StataWriter` that wrote strings with 244 characters irrespective of actual size (GH 8969)

- Fixed ValueError raised by cummin/cummax when datetime64 Series contains NaT. (GH 8965)
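
As an illustration of the cummin/cummax fix above (GH 8965), the sketch below runs both cumulative functions on a datetime64 Series that contains `NaT`. The dates are made up, and the default `skipna=True` behaviour is assumed.

```python
import pandas as pd

# Illustrative datetime64 Series with a missing value (NaT) in position 1.
dates = pd.Series(
    pd.to_datetime(['2014-12-12', None, '2014-12-10', '2014-12-11'])
)

# Neither call should raise; the NaT position stays NaT in the result.
running_min = dates.cummin()
running_max = dates.cummax()
print(running_min.tolist())
print(running_max.tolist())
```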

- Bug where DataReader returned object dtype if there were missing values (GH 8980)

- Bug in plotting if sharex was enabled and index was a timeseries, would show labels on multiple axes (GH 3964).

- Bug where passing a unit to the TimedeltaIndex constructor applied the nanosecond conversion twice. (GH 9011).

- Bug in plotting of a period-like array (GH 9012)
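
Among the fixes listed earlier in this section, the `slice` string method now honours its step argument (GH 8754). A minimal sketch with invented strings:

```python
import pandas as pd

# Illustrative strings; take every second character of each one.
words = pd.Series(['abcdef', 'pandas'])
every_other = words.str.slice(start=0, stop=None, step=2)
print(every_other.tolist())  # ['ace', 'pna']
```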

## Contributors

A total of 49 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

Aaron Staple

Angelos Evripiotis +

Artemy Kolchinsky

Benoit Pointet +

Brian Jacobowski +

Charalampos Papaloizou +

Chris Warth +

David Stephens

Fabio Zanini +

Francesc Via +

Henry Kleynhans +

Jake VanderPlas +

Jan Schulz

Jeff Reback

Jeff Tratner

Joris Van den Bossche

Kevin Sheppard

Matt Suggit +

Matthew Brett

Phillip Cloud

Rupert Thompson +

Scott E Lasley +

Stephan Hoyer

Stephen Simmons +

Sylvain Corlay +

Thomas Grainger +

Tiago Antao +

Tom Augspurger

Trent Hauck

Victor Chaves +

Victor Salgado +

Vikram Bhandoh +

WANG Aiyong

Will Holmgren +

behzad nouri

broessli +

charalampos papaloizou +

immerrr

jnmclarty

jreback

mgilbert +

onesandzeroes

peadarcoyle +

rockg

seth-p

sinhrks

unutbu

wavedatalab +

Åsmund Hjulstad +