Version 0.15.2 (December 12, 2014)
This is a minor release from 0.15.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs. We recommend that all users upgrade to this version.
API changes
- Indexing in MultiIndex beyond lex-sort depth is now supported, though a lexically sorted index will have better performance (GH 2646).
    In [1]: df = pd.DataFrame({'jim':[0, 0, 1, 1],
       ...:                    'joe':['x', 'x', 'z', 'y'],
       ...:                    'jolie':np.random.rand(4)}).set_index(['jim', 'joe'])
       ...:

    In [2]: df
    Out[2]:
                jolie
    jim joe
    0   x    0.126970
        x    0.966718
    1   z    0.260476
        y    0.897237

    [4 rows x 1 columns]

    In [3]: df.index.lexsort_depth
    Out[3]: 1

    # in prior versions this would raise a KeyError
    # will now show a PerformanceWarning
    In [4]: df.loc[(1, 'z')]
    Out[4]:
                jolie
    jim joe
    1   z    0.260476

    [1 rows x 1 columns]

    # lexically sorting
    In [5]: df2 = df.sort_index()

    In [6]: df2
    Out[6]:
                jolie
    jim joe
    0   x    0.126970
        x    0.966718
    1   y    0.897237
        z    0.260476

    [4 rows x 1 columns]

    In [7]: df2.index.lexsort_depth
    Out[7]: 2

    In [8]: df2.loc[(1, 'z')]
    Out[8]:
                jolie
    jim joe
    1   z    0.260476

    [1 rows x 1 columns]
- Bug in unique of Series with category dtype, which returned all categories regardless of whether they were “used” or not (see GH 8559 for the discussion). Previous behaviour was to return all categories:
    In [3]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

    In [4]: cat
    Out[4]:
    [a, b, a]
    Categories (3, object): [a < b < c]

    In [5]: cat.unique()
    Out[5]: array(['a', 'b', 'c'], dtype=object)

  Now, only the categories that actually occur in the array are returned:
    In [1]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

    In [2]: cat.unique()
    Out[2]:
    ['a', 'b']
    Categories (3, object): ['a', 'b', 'c']
- Series.all and Series.any now support the level and skipna parameters. Series.all, Series.any, Index.all, and Index.any no longer support the out and keepdims parameters, which existed for compatibility with ndarray. Various index types no longer support the all and any aggregation functions and will now raise TypeError (GH 8302).
- Allow equality comparisons of Series with a categorical dtype and object dtype; previously these would raise TypeError (GH 8938)
- Bug in NDFrame: conflicting attribute/column names now behave consistently between getting and setting. Previously, when both a column and an attribute named y existed, data.y would return the attribute, while data.y = z would update the column (GH 8994)
    In [3]: data = pd.DataFrame({'x': [1, 2, 3]})

    In [4]: data.y = 2

    In [5]: data['y'] = [2, 4, 6]

    In [6]: data
    Out[6]:
       x  y
    0  1  2
    1  2  4
    2  3  6

    [3 rows x 2 columns]

    # this assignment was inconsistent
    In [7]: data.y = 5

  Old behavior:
    In [6]: data.y
    Out[6]: 2

    In [7]: data['y'].values
    Out[7]: array([5, 5, 5])

  New behavior:
    In [8]: data.y
    Out[8]: 5

    In [9]: data['y'].values
    Out[9]: array([2, 4, 6])
- Timestamp('now') is now equivalent to Timestamp.now() in that it returns the local time rather than UTC. Also, Timestamp('today') is now equivalent to Timestamp.today(), and both accept tz as an argument (GH 9000)
- Fix negative step support for label-based slices (GH 8753)
  Old behavior:
    In [1]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])

    In [2]: s
    Out[2]:
    a    0
    b    1
    c    2
    dtype: int64

    In [3]: s.loc['c':'a':-1]
    Out[3]:
    c    2
    dtype: int64

  New behavior:
    In [10]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])

    In [11]: s.loc['c':'a':-1]
    Out[11]:
    c    2
    b    1
    a    0
    Length: 3, dtype: int64
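Several of the API changes above can be exercised in one short script. This is a minimal sketch that assumes a recent pandas release (the 2.x series) rather than 0.15.2. It uses fixed values instead of np.random.rand so the output is reproducible, and it silences the PerformanceWarning because whether and how that warning is emitted may differ between versions.

```python
import warnings

import numpy as np
import pandas as pd

# 1. MultiIndex lookup past the lexsort depth (fixed values for reproducibility).
df = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": [0.1, 0.2, 0.3, 0.4]}
).set_index(["jim", "joe"])
with warnings.catch_warnings():
    # The unsorted index may trigger "indexing past lexsort depth" warnings.
    warnings.simplefilter("ignore", pd.errors.PerformanceWarning)
    unsorted_hit = df.loc[(1, "z")].to_numpy().ravel()[0]
sorted_hit = df.sort_index().loc[(1, "z")].to_numpy().ravel()[0]
print(unsorted_hit, sorted_hit)  # 0.3 0.3

# 2. unique() on categorical data returns only the values that occur.
cat = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"])
print(list(cat.unique()))  # ['a', 'b']

# 3. Equality between a categorical Series and an object Series.
eq = pd.Series(["a", "b", "a"], dtype="category") == pd.Series(
    ["a", "c", "a"], dtype=object
)
print(eq.tolist())  # [True, False, True]

# 4. Attribute vs. column named "y": getting and setting now agree.
data = pd.DataFrame({"x": [1, 2, 3]})
data.y = 2             # no column "y" yet, so this sets a plain attribute
data["y"] = [2, 4, 6]  # creates a column that is also called "y"
data.y = 5             # updates the attribute, not the column
print(data.y, data["y"].tolist())  # 5 [2, 4, 6]

# 5. Timestamp("now") is naive local time, like Timestamp.now().
a, b = pd.Timestamp("now"), pd.Timestamp.now()
print(a.tz is None and abs(b - a) < pd.Timedelta(seconds=5))  # True

# 6. Label-based slicing with a negative step.
s = pd.Series(np.arange(3), index=["a", "b", "c"])
print(s.loc["c":"a":-1].tolist())  # [2, 1, 0]
```

Item access (data["y"]) is the unambiguous way to reach a column whenever an attribute with the same name might exist.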
Enhancements
Categorical enhancements:
- Added ability to export Categorical data to Stata (GH 8633). See here for limitations of categorical variables exported to Stata data files. 
- Added flag order_categoricals to StataReader and read_stata to select whether to order imported categorical data (GH 8836). See here for more information on importing categorical variables from Stata data files.
- Added ability to export Categorical data to/from HDF5 (GH 7621). Queries work the same as if it were an object array. However, the category dtyped data is stored in a more efficient manner. See here for an example and caveats w.r.t. prior versions of pandas.
- Added support for searchsorted() on Categorical class (GH 8420).
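The Stata and searchsorted additions above can be sketched as a round trip through a temporary .dta file. This assumes a recent pandas release. The column name grade and its categories are invented for illustration, and the HDF5 item is not shown because it needs the optional PyTables dependency.

```python
import os
import tempfile

import pandas as pd

grades = pd.Categorical(
    ["low", "high", "mid", "low"], categories=["low", "mid", "high"], ordered=True
)
df = pd.DataFrame({"grade": grades})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "grades.dta")
    # Categorical columns are written as integer codes plus Stata value labels.
    df.to_stata(path, write_index=False)
    as_ordered = pd.read_stata(path)  # order_categoricals defaults to True
    as_unordered = pd.read_stata(path, order_categoricals=False)

print(as_ordered["grade"].cat.ordered, as_unordered["grade"].cat.ordered)  # True False
print(as_ordered["grade"].tolist())  # ['low', 'high', 'mid', 'low']

# searchsorted() on a sorted Categorical compares by category position (codes):
# the sorted values are low, low, mid, high, so "mid" would be inserted at 2.
print(grades.sort_values().searchsorted("mid"))  # 2
```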
Other enhancements:
- Added the ability to specify the SQL type of columns when writing a DataFrame to a database (GH 8778). For example, to use the sqlalchemy String type instead of the default Text type for string columns:
    from sqlalchemy.types import String

    data.to_sql('data_dtype', engine, dtype={'Col_1': String})  # noqa F821
- Series.all and Series.any now support the level and skipna parameters (GH 8302):
    >>> s = pd.Series([False, True, False], index=[0, 0, 1])
    >>> s.any(level=0)
    0     True
    1    False
    dtype: bool
- Panel now supports the all and any aggregation functions (GH 8302):
    >>> p = pd.Panel(np.random.rand(2, 5, 4) > 0.1)
    >>> p.all()
           0      1      2      3
    0   True   True   True   True
    1   True  False   True   True
    2   True   True   True   True
    3  False   True  False   True
    4   True   True   True   True
- Added support for utcfromtimestamp(), fromtimestamp(), and combine() on Timestamp class (GH 5351).
- Added Google Analytics (pandas.io.ga) basic documentation (GH 8835). See here. 
- Timedelta arithmetic returns NotImplemented in unknown cases, allowing extensions by custom classes (GH 8813).
- Timedelta now supports arithmetic with numpy.ndarray objects of the appropriate dtype (numpy 1.8 or newer only) (GH 8884).
- Added Timedelta.to_timedelta64() method to the public API (GH 8884).
- Added gbq.generate_bq_schema() function to the gbq module (GH 8325).
- Series now works with map objects the same way as generators (GH 8909).
- Added context manager to HDFStore for automatic closing (GH 8791).
- to_datetime gains an exact keyword. When exact=False, the provided format string does not need to match the input exactly. exact defaults to True, so exact matching remains the default (GH 8904)
- Added an axvlines boolean option to the parallel_coordinates plot function that determines whether vertical lines are drawn; the default is True
- Added ability to read table footers to read_html (GH 8552) 
- to_sql now infers data types of non-NA values for columns that contain NA values and have dtype object (GH 8778).
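Several of the enhancements above are exercised together below. This sketch assumes a recent pandas release (2.x). It uses the standard-library sqlite3 connection, for which to_sql takes SQL type names as strings instead of SQLAlchemy type objects. It also writes Series.any(level=0) as s.groupby(level=0).any(), because the level keyword was later removed from reductions. Ticks is a hypothetical class that only illustrates the NotImplemented hand-off. The HDFStore and Panel items are left out because they need PyTables or have since been removed.

```python
import datetime
import sqlite3

import numpy as np
import pandas as pd

# to_sql: explicit SQL type for Col_1, inferred type for an object column with NA.
frame = pd.DataFrame(
    {"Col_1": ["a", "b", "c"], "Col_2": pd.Series([None, 1, 2], dtype=object)}
)
conn = sqlite3.connect(":memory:")
frame.to_sql("data_dtype", conn, index=False, dtype={"Col_1": "VARCHAR(10)"})
ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'data_dtype'"
).fetchone()[0]
conn.close()
print("VARCHAR(10)" in ddl, "INTEGER" in ddl)  # True True

# any() per index level; groupby(level=0) replaces the removed level= keyword.
s = pd.Series([False, True, False], index=[0, 0, 1])
print(s.groupby(level=0).any().tolist())  # [True, False]

# Timestamp.combine() and Timestamp.fromtimestamp().
ts = pd.Timestamp.combine(datetime.date(2014, 12, 12), datetime.time(9, 30))
epoch = pd.Timestamp.fromtimestamp(0, tz="UTC")
print(ts, epoch)  # 2014-12-12 09:30:00 1970-01-01 00:00:00+00:00


class Ticks:
    """Hypothetical type that knows how to add itself to a Timedelta."""

    def __init__(self, n):
        self.n = n

    def __radd__(self, other):
        # Reached because Timedelta.__add__ returns NotImplemented for Ticks.
        if isinstance(other, pd.Timedelta):
            return Ticks(self.n + int(other / pd.Timedelta(seconds=1)))
        return NotImplemented


print((pd.Timedelta(seconds=3) + Ticks(2)).n)  # 5

# Timedelta arithmetic with a timedelta64 ndarray, plus to_timedelta64().
td = pd.Timedelta(hours=1)
shifted = td + np.array([0, 30], dtype="timedelta64[m]")
print(shifted.dtype.kind, td.to_timedelta64() == np.timedelta64(1, "h"))  # m True

# A map object is consumed like a generator.
doubled = pd.Series(map(lambda x: x * 2, [1, 2, 3]))
print(doubled.tolist())  # [2, 4, 6]

# exact=False lets the format match anywhere inside the string.
parsed = pd.to_datetime("on 12/12/2014 ok", format="%d/%m/%Y", exact=False)
print(parsed)  # 2014-12-12 00:00:00
```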
Performance
Bug fixes
- Bug in concat of Series with category dtype, which were being coerced to object (GH 8641)
- Bug in Timestamp-Timestamp not returning a Timedelta type and datelike-datelike ops with timezones (GH 8865) 
- Made timezone mismatch exceptions consistent (either a tz operated with None or an incompatible timezone): these now raise TypeError rather than ValueError (this affects only a couple of edge cases) (GH 8865)
- Bug in using a pd.Grouper(key=...) with no level/axis or level only (GH 8795, GH 8866)
- Report a - TypeErrorwhen invalid/no parameters are passed in a groupby (GH 8015)
- Bug in packaging pandas with py2app/cx_Freeze (GH 8602, GH 8831)
- Bug in groupby signatures that didn’t include *args or **kwargs (GH 8733).
- io.data.Options now raises RemoteDataError when no expiry dates are available from Yahoo and when it receives no data from Yahoo (GH 8761), (GH 8783).
- Unclear error message in csv parsing when passing dtype and names and the parsed data is a different data type (GH 8833) 
- Bug in slicing a MultiIndex with an empty list and at least one boolean indexer (GH 8781) 
- io.data.Options now raises RemoteDataError when no expiry dates are available from Yahoo (GH 8761).
- Timedelta kwargs may now be numpy ints and floats (GH 8757).
- Fixed several outstanding bugs for Timedelta arithmetic and comparisons (GH 8813, GH 5963, GH 5436).
- sql_schema now generates dialect-appropriate CREATE TABLE statements (GH 8697)
- slicestring method now takes step into account (GH 8754)
- Bug in BlockManager where setting values with different type would break block integrity (GH 8850)
- Bug in DatetimeIndex when using time object as key (GH 8667)
- Bug in merge where how='left' and sort=False would not preserve left frame order (GH 7331)
- Bug in MultiIndex.reindex where reindexing at level would not reorder labels (GH 4088)
- Bug in certain operations with dateutil timezones, manifesting with dateutil 2.3 (GH 8639) 
- Regression in DatetimeIndex iteration with a Fixed/Local offset timezone (GH 8890) 
- Bug in to_datetime when parsing nanoseconds using the %f format (GH 8989)
- Fix: the font size was only set on the x axis for vertical plots, or on the y axis for horizontal plots (GH 8765)
- Fixed division by 0 when reading big csv files in python 3 (GH 8621) 
- Bug in outputting a MultiIndex with to_html(index=False), which would add an extra column (GH 8452)
- Imported categorical variables from Stata files retain the ordinal information in the underlying data (GH 8836). 
- Defined the .size attribute across NDFrame objects to provide compatibility with numpy >= 1.9.1; without it, np.array_split was buggy on these objects (GH 8846)
- Skip testing of histogram plots for matplotlib <= 1.2 (GH 8648). 
- Bug where get_data_google returned object dtypes (GH 3995)
- Bug in DataFrame.stack(..., dropna=False) when the DataFrame’s columns is a MultiIndex whose labels do not reference all its levels (GH 8844)
- Bug fix so that an option context is applied on __enter__ (GH 8514)
- Bug in resample that causes a ValueError when resampling across multiple days and the last offset is not calculated from the start of the range (GH 8683) 
- Bug where DataFrame.plot(kind='scatter') failed when checking if an np.array is in the DataFrame (GH 8852)
- Bug in pd.infer_freq/DataFrame.inferred_freq that prevented proper sub-daily frequency inference when the index contained DST days (GH 8772).
- Bug where index name was still used when plotting a series with use_index=False (GH 8558).
- Bugs when trying to stack multiple columns, when some (or all) of the level names are numbers (GH 8584). 
- Bug in MultiIndex where __contains__ returned the wrong result if the index was not lexically sorted or unique (GH 7724)
- Bug in CSV parsing with trailing whitespace in skipped rows (GH 8679), (GH 8661), (GH 8983)
- Regression in Timestamp not parsing the ‘Z’ zone designator for UTC (GH 8771)
- Bug in StataWriter that wrote strings with 244 characters irrespective of actual size (GH 8969)
- Fixed a ValueError raised by cummin/cummax when a datetime64 Series contains NaT (GH 8965)
- Bug where DataReader returned object dtype if there were missing values (GH 8980)
- Bug in plotting where, if sharex was enabled and the index was a timeseries, labels were shown on multiple axes (GH 3964).
- Bug where passing a unit to the TimedeltaIndex constructor applied the nanosecond conversion twice (GH 9011).
- Bug in plotting of a period-like array (GH 9012) 
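Several of the fixes above can be spot-checked against a current pandas installation. This is an illustrative sketch that assumes a recent pandas release. It is not taken from the project's regression tests, and the frames and values are invented for the example.

```python
import pandas as pd

# merge(how="left", sort=False) keeps the order of the left frame (GH 7331).
left = pd.DataFrame({"key": ["c", "a", "b"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "c"], "rval": [10, 20, 30]})
merged = left.merge(right, on="key", how="left", sort=False)
print(merged["key"].tolist(), merged["rval"].tolist())  # ['c', 'a', 'b'] [30, 10, 20]

# str.slice honours its step argument (GH 8754).
sliced = pd.Series(["abcdef"]).str.slice(0, 6, 2)
print(sliced.tolist())  # ['ace']

# cummin on a datetime64 Series containing NaT (GH 8965); NaT stays in place.
dates = pd.Series(pd.to_datetime(["2014-01-03", None, "2014-01-01"]))
running_min = dates.cummin()

# Timestamp minus Timestamp gives a Timedelta; mixing tz-aware and tz-naive
# operands raises TypeError (GH 8865).
delta = pd.Timestamp("2014-12-12") - pd.Timestamp("2014-12-11")
try:
    pd.Timestamp("2014-12-12", tz="UTC") - pd.Timestamp("2014-12-11")
    mismatch_error = None
except TypeError as exc:
    mismatch_error = type(exc).__name__
print(delta, mismatch_error)  # 1 days 00:00:00 TypeError

# The 'Z' designator is parsed as UTC (GH 8771).
print(pd.Timestamp("2014-12-12T10:00:00Z").utcoffset())  # 0:00:00

# Membership test on an unsorted MultiIndex (GH 7724).
mi = pd.MultiIndex.from_tuples([(1, "b"), (0, "a")])
print((0, "a") in mi, (0, "b") in mi)  # True False
```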
Contributors
A total of 49 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
- Aaron Staple 
- Angelos Evripiotis + 
- Artemy Kolchinsky 
- Benoit Pointet + 
- Brian Jacobowski + 
- Charalampos Papaloizou + 
- Chris Warth + 
- David Stephens 
- Fabio Zanini + 
- Francesc Via + 
- Henry Kleynhans + 
- Jake VanderPlas + 
- Jan Schulz 
- Jeff Reback 
- Jeff Tratner 
- Joris Van den Bossche 
- Kevin Sheppard 
- Matt Suggit + 
- Matthew Brett 
- Phillip Cloud 
- Rupert Thompson + 
- Scott E Lasley + 
- Stephan Hoyer 
- Stephen Simmons + 
- Sylvain Corlay + 
- Thomas Grainger + 
- Tiago Antao + 
- Tom Augspurger 
- Trent Hauck 
- Victor Chaves + 
- Victor Salgado + 
- Vikram Bhandoh + 
- WANG Aiyong 
- Will Holmgren + 
- behzad nouri 
- broessli + 
- charalampos papaloizou + 
- immerrr 
- jnmclarty 
- jreback 
- mgilbert + 
- onesandzeroes 
- peadarcoyle + 
- rockg 
- seth-p 
- sinhrks 
- unutbu 
- wavedatalab + 
- Åsmund Hjulstad +