Version 0.15.2 (December 12, 2014)
This is a minor release from 0.15.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs. We recommend that all users upgrade to this version.
API changes
Indexing in MultiIndex beyond lex-sort depth is now supported, though a lexically sorted index will perform better. (GH2646)

    In [1]: df = pd.DataFrame({'jim':[0, 0, 1, 1],
       ...:                    'joe':['x', 'x', 'z', 'y'],
       ...:                    'jolie':np.random.rand(4)}).set_index(['jim', 'joe'])
       ...:

    In [2]: df
    Out[2]:
                jolie
    jim joe
    0   x    0.126970
        x    0.966718
    1   z    0.260476
        y    0.897237

    [4 rows x 1 columns]

    In [3]: df.index.lexsort_depth
    Out[3]: 1

    # in prior versions this would raise a KeyError
    # will now show a PerformanceWarning
    In [4]: df.loc[(1, 'z')]
    Out[4]:
                jolie
    jim joe
    1   z    0.260476

    [1 rows x 1 columns]

    # lexically sorting
    In [5]: df2 = df.sort_index()

    In [6]: df2
    Out[6]:
                jolie
    jim joe
    0   x    0.126970
        x    0.966718
    1   y    0.897237
        z    0.260476

    [4 rows x 1 columns]

    In [7]: df2.index.lexsort_depth
    Out[7]: 2

    In [8]: df2.loc[(1,'z')]
    Out[8]:
                jolie
    jim joe
    1   z    0.260476

    [1 rows x 1 columns]
Bug in unique of Series with category dtype, which returned all categories regardless of whether they were “used” or not (see GH8559 for the discussion). Previous behaviour was to return all categories:

    In [3]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

    In [4]: cat
    Out[4]:
    [a, b, a]
    Categories (3, object): [a < b < c]

    In [5]: cat.unique()
    Out[5]: array(['a', 'b', 'c'], dtype=object)
Now, only the categories that do effectively occur in the array are returned:
    In [9]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

    In [10]: cat.unique()
    Out[10]:
    ['a', 'b']
    Categories (2, object): ['a', 'b']
Series.all and Series.any now support the level and skipna parameters. Series.all, Series.any, Index.all, and Index.any no longer support the out and keepdims parameters, which existed for compatibility with ndarray. Various index types no longer support the all and any aggregation functions and will now raise TypeError. (GH8302)
Allow equality comparisons of Series with a categorical dtype and object dtype; previously these would raise TypeError (GH8938)
Bug in NDFrame: conflicting attribute/column names now behave consistently between getting and setting. Previously, when both a column and attribute named y existed, data.y would return the attribute, while data.y = z would update the column (GH8994)

    In [11]: data = pd.DataFrame({'x': [1, 2, 3]})

    In [12]: data.y = 2

    In [13]: data['y'] = [2, 4, 6]

    In [14]: data
    Out[14]:
       x  y
    0  1  2
    1  2  4
    2  3  6

    [3 rows x 2 columns]

    # this assignment was inconsistent
    In [15]: data.y = 5
Old behavior:
    In [6]: data.y
    Out[6]: 2

    In [7]: data['y'].values
    Out[7]: array([5, 5, 5])
New behavior:
    In [16]: data.y
    Out[16]: 5

    In [17]: data['y'].values
    Out[17]: array([2, 4, 6])
Timestamp('now') is now equivalent to Timestamp.now() in that it returns the local time rather than UTC. Also, Timestamp('today') is now equivalent to Timestamp.today() and both have tz as a possible argument. (GH9000)
Fix negative step support for label-based slices (GH8753)
Old behavior:
    In [1]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])
    Out[1]:
    a    0
    b    1
    c    2
    dtype: int64

    In [2]: s.loc['c':'a':-1]
    Out[2]:
    c    2
    dtype: int64
New behavior:
    In [18]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])

    In [19]: s.loc['c':'a':-1]
    Out[19]:
    c    2
    b    1
    a    0
    Length: 3, dtype: int64
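The equality change for categorical data listed above (GH8938) can be illustrated with a short sketch. The series names and values here are invented for illustration, and the snippet assumes a pandas version that includes this change. Only an element-wise `==` comparison between two equally indexed Series is shown.

```python
import pandas as pd

# A Series with categorical dtype and a plain object-dtype Series
# sharing the same default index (values are illustrative only).
cat_ser = pd.Series(['a', 'b', 'a'], dtype='category')
obj_ser = pd.Series(['a', 'c', 'a'])

# Element-wise, positional equality; per the note above, earlier
# releases raised TypeError for this comparison.
equal_mask = cat_ser == obj_ser
print(equal_mask.tolist())
```

The result is an ordinary boolean Series, so it can be used directly as a mask.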
Enhancements
Categorical enhancements:
Added ability to export Categorical data to Stata (GH8633). See here for limitations of categorical variables exported to Stata data files.
Added flag order_categoricals to StataReader and read_stata to select whether to order imported categorical data (GH8836). See here for more information on importing categorical variables from Stata data files.
Added ability to export Categorical data to/from HDF5 (GH7621). Queries work the same as if it was an object array. However, the category dtyped data is stored in a more efficient manner. See here for an example and caveats w.r.t. prior versions of pandas.
Added support for searchsorted() on Categorical class (GH8420).
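As a minimal sketch of searchsorted() on a Categorical, assuming an ordered categorical whose values are already sorted by category order (the category labels below are made up for illustration):

```python
import pandas as pd

# An ordered categorical whose values are sorted by category order.
sizes = pd.Categorical(['small', 'medium', 'large'],
                       categories=['small', 'medium', 'large'],
                       ordered=True)

# Position at which 'medium' would be inserted to keep the order
# (left side by default, as with numpy.searchsorted).
pos = sizes.searchsorted('medium')
print(int(pos))
```

The lookup follows the declared category order rather than alphabetical order, which is why the categorical is created with explicit categories.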
Other enhancements:
Added the ability to specify the SQL type of columns when writing a DataFrame to a database (GH8778). For example, specifying to use the sqlalchemy String type instead of the default Text type for string columns:

    from sqlalchemy.types import String
    data.to_sql('data_dtype', engine, dtype={'Col_1': String})  # noqa F821
Series.all and Series.any now support the level and skipna parameters (GH8302):

    In [20]: s = pd.Series([False, True, False], index=[0, 0, 1])

    In [21]: s.any(level=0)
    Out[21]:
    0     True
    1    False
    Length: 2, dtype: bool
Panel now supports the all and any aggregation functions. (GH8302):

    >>> p = pd.Panel(np.random.rand(2, 5, 4) > 0.1)
    >>> p.all()
           0      1      2     3
    0   True   True   True  True
    1   True  False   True  True
    2   True   True   True  True
    3  False   True  False  True
    4   True   True   True  True
Added support for utcfromtimestamp(), fromtimestamp(), and combine() on Timestamp class (GH5351).
Added Google Analytics (pandas.io.ga) basic documentation (GH8835). See here.
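The Timestamp class methods added in GH5351 mirror those on datetime.datetime. A small sketch of combine(), with an arbitrary date and time chosen for illustration:

```python
import datetime as dt

import pandas as pd

release_day = dt.date(2014, 12, 12)
start_time = dt.time(9, 30)

# Like datetime.datetime.combine, but the result is a pandas Timestamp.
ts = pd.Timestamp.combine(release_day, start_time)
print(ts)
```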
Timedelta arithmetic returns NotImplemented in unknown cases, allowing extensions by custom classes (GH8813).
Timedelta now supports arithmetic with numpy.ndarray objects of the appropriate dtype (numpy 1.8 or newer only) (GH8884).
Added Timedelta.to_timedelta64() method to the public API (GH8884).
Added gbq.generate_bq_schema() function to the gbq module (GH8325).
Series now works with map objects the same way as generators (GH8909).
Added context manager to HDFStore for automatic closing (GH8791).
to_datetime gains an exact keyword that, when set to False, allows the input to not match the provided format string exactly. exact defaults to True (meaning that exact matching is still the default) (GH8904)
Added axvlines boolean option to the parallel_coordinates plot function; it determines whether vertical lines will be printed (default is True).
Added ability to read table footers to read_html (GH8552)
to_sql now infers data types of non-NA values for columns that contain NA values and have dtype object (GH8778).
Performance
Bug fixes
Bug in concat of Series with category dtype, which was coercing to object. (GH8641)
Bug in Timestamp-Timestamp not returning a Timedelta type and datelike-datelike ops with timezones (GH8865)
Made the timezone mismatch exception consistent (either tz operated with None or incompatible timezone); it now raises TypeError rather than ValueError (a couple of edge cases only) (GH8865)
Bug in using a pd.Grouper(key=...) with no level/axis or level only (GH8795, GH8866)
Report a TypeError when invalid/no parameters are passed in a groupby (GH8015)
Bug in packaging pandas with py2app/cx_Freeze (GH8602, GH8831)
Bug in groupby signatures that didn’t include *args or **kwargs (GH8733).
io.data.Options now raises RemoteDataError when no expiry dates are available from Yahoo and when it receives no data from Yahoo (GH8761), (GH8783).
Unclear error message in csv parsing when passing dtype and names and the parsed data is a different data type (GH8833)
Bug in slicing a MultiIndex with an empty list and at least one boolean indexer (GH8781)
io.data.Options now raises RemoteDataError when no expiry dates are available from Yahoo (GH8761).
Timedelta kwargs may now be numpy ints and floats (GH8757).
Fixed several outstanding bugs for Timedelta arithmetic and comparisons (GH8813, GH5963, GH5436).
sql_schema now generates dialect appropriate CREATE TABLE statements (GH8697)
slice string method now takes step into account (GH8754)
Bug in BlockManager where setting values with different type would break block integrity (GH8850)
Bug in DatetimeIndex when using time object as key (GH8667)
Bug in merge where how='left' and sort=False would not preserve left frame order (GH7331)
Bug in MultiIndex.reindex where reindexing at level would not reorder labels (GH4088)
Bug in certain operations with dateutil timezones, manifesting with dateutil 2.3 (GH8639)
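The merge fix listed above (GH7331) concerns row order in left joins. A minimal sketch of the intended behaviour, using made-up frames and column names and assuming a pandas version that includes the fix:

```python
import pandas as pd

# The left frame's keys are deliberately not in sorted order.
left = pd.DataFrame({'key': ['b', 'a', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'b'], 'rval': [10, 20]})

# A left join without sorting keeps the rows in the left frame's order.
merged = pd.merge(left, right, how='left', on='key', sort=False)
print(merged['key'].tolist())
```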
Regression in DatetimeIndex iteration with a Fixed/Local offset timezone (GH8890)
Bug in to_datetime when parsing nanoseconds using the %f format (GH8989)
Fix: The font size was only set on x axis if vertical or the y axis if horizontal. (GH8765)
Fixed division by 0 when reading big csv files in python 3 (GH8621)
Bug in outputting a MultiIndex with to_html and index=False, which would add an extra column (GH8452)
Imported categorical variables from Stata files retain the ordinal information in the underlying data (GH8836).
Defined .size attribute across NDFrame objects to provide compat with numpy >= 1.9.1; buggy with np.array_split (GH8846)
Skip testing of histogram plots for matplotlib <= 1.2 (GH8648).
Bug where get_data_google returned object dtypes (GH3995)
Bug in DataFrame.stack(..., dropna=False) when the DataFrame’s columns is a MultiIndex whose labels do not reference all its levels. (GH8844)
Bug in that Option context applied on __enter__ (GH8514)
Bug in resample that causes a ValueError when resampling across multiple days and the last offset is not calculated from the start of the range (GH8683)
Bug where DataFrame.plot(kind='scatter') fails when checking if an np.array is in the DataFrame (GH8852)
Bug in pd.infer_freq/DataFrame.inferred_freq that prevented proper sub-daily frequency inference when the index contained DST days (GH8772).
Bug where index name was still used when plotting a series with use_index=False (GH8558).
Bugs when trying to stack multiple columns, when some (or all) of the level names are numbers (GH8584).
Bug in MultiIndex where __contains__ returns wrong result if index is not lexically sorted or unique (GH7724)
BUG CSV: fix problem with trailing white space in skipped rows, (GH8679), (GH8661), (GH8983)
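For the MultiIndex membership fix (GH7724), a short sketch of the expected result on an index that is not lexically sorted. The labels are illustrative and reuse the jim/joe naming from the API changes example.

```python
import pandas as pd

# An index that is unique but not lexically sorted.
mi = pd.MultiIndex.from_tuples([(1, 'z'), (0, 'x'), (1, 'y')],
                               names=['jim', 'joe'])

# Membership tests go through MultiIndex.__contains__.
present = (1, 'y') in mi
absent = (2, 'x') in mi
print(present, absent)
```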
Regression where Timestamp did not parse the ‘Z’ zone designator for UTC (GH8771)
Bug in StataWriter that writes strings with 244 characters irrespective of actual size (GH8969)
Fixed ValueError raised by cummin/cummax when datetime64 Series contains NaT. (GH8965)
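The ‘Z’ designator regression noted above (GH8771) can be checked with a sketch like the following, where the timestamp string is an arbitrary example:

```python
import datetime as dt

import pandas as pd

# ISO 8601 string ending in the 'Z' (UTC) zone designator.
ts = pd.Timestamp('2014-12-12T09:30:00Z')

# The parsed Timestamp should be timezone-aware with a zero UTC offset.
print(ts.tzinfo is not None, ts.utcoffset())
```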
Bug where DataReader returns object dtype if there are missing values (GH8980)
Bug in plotting if sharex was enabled and index was a timeseries, would show labels on multiple axes (GH3964).
Bug where passing a unit to the TimedeltaIndex constructor applied the nanosecond conversion twice (GH9011).
Bug in plotting of a period-like array (GH9012)
Contributors
A total of 49 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Aaron Staple
Angelos Evripiotis +
Artemy Kolchinsky
Benoit Pointet +
Brian Jacobowski +
Charalampos Papaloizou +
Chris Warth +
David Stephens
Fabio Zanini +
Francesc Via +
Henry Kleynhans +
Jake VanderPlas +
Jan Schulz
Jeff Reback
Jeff Tratner
Joris Van den Bossche
Kevin Sheppard
Matt Suggit +
Matthew Brett
Phillip Cloud
Rupert Thompson +
Scott E Lasley +
Stephan Hoyer
Stephen Simmons +
Sylvain Corlay +
Thomas Grainger +
Tiago Antao +
Tom Augspurger
Trent Hauck
Victor Chaves +
Victor Salgado +
Vikram Bhandoh +
WANG Aiyong
Will Holmgren +
behzad nouri
broessli +
charalampos papaloizou +
immerrr
jnmclarty
jreback
mgilbert +
onesandzeroes
peadarcoyle +
rockg
seth-p
sinhrks
unutbu
wavedatalab +
Åsmund Hjulstad +