Release Notes
This is the list of changes to pandas between each release. For full details, see the commit logs at http://github.com/pandas-dev/pandas
What is it
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.
Where to get it
- Source code: http://github.com/pandas-dev/pandas
- Binary installers on PyPI: http://pypi.python.org/pypi/pandas
- Documentation: http://pandas.pydata.org
pandas 0.19.2
Release date: December 24, 2016
This is a minor bug-fix release in the 0.19.x series and includes some small regression fixes, bug fixes and performance improvements.
Highlights include:
- Compatibility with Python 3.6
- Added a Pandas Cheat Sheet (GH13202).
See the v0.19.2 Whatsnew page for an overview of all bugs that have been fixed in 0.19.2.
Thanks
- Ajay Saxena
- Ben Kandel
- Chris
- Chris Ham
- Christopher C. Aycock
- Daniel Himmelstein
- Dave Willmer
- Dr-Irv
- gfyoung
- hesham shabana
- Jeff Carey
- Jeff Reback
- Joe Jevnik
- Joris Van den Bossche
- Julian Santander
- Kerby Shedden
- Keshav Ramaswamy
- Kevin Sheppard
- Luca Scarabello
- Matti Picus
- Matt Roeschke
- Maximilian Roos
- Mykola Golubyev
- Nate Yoder
- Nicholas Ver Halen
- Pawel Kordek
- Pietro Battiston
- Rodolfo Fernandez
- sinhrks
- Tara Adiseshan
- Tom Augspurger
- wandersoncferreira
- Yaroslav Halchenko
pandas 0.19.1
Release date: November 3, 2016
This is a minor bug-fix release from 0.19.0 and includes some small regression fixes, bug fixes and performance improvements.
See the v0.19.1 Whatsnew page for an overview of all bugs that have been fixed in 0.19.1.
Thanks
- Adam Chainz
- Anthonios Partheniou
- Arash Rouhani
- Ben Kandel
- Brandon M. Burroughs
- Chris
- chris-b1
- Chris Warth
- David Krych
- dubourg
- gfyoung
- Iván Vallés Pérez
- Jeff Reback
- Joe Jevnik
- Jon M. Mease
- Joris Van den Bossche
- Josh Owen
- Keshav Ramaswamy
- Larry Ren
- mattrijk
- Michael Felt
- paul-mannino
- Piotr Chromiec
- Robert Bradshaw
- Sinhrks
- Thiago Serafim
- Tom Bird
pandas 0.19.0
Release date: October 2, 2016
This is a major release from 0.18.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- merge_asof() for asof-style time-series joining, see here.
- .rolling() is now time-series aware, see here.
- read_csv() now supports parsing Categorical data, see here.
- A function union_categoricals() has been added for combining categoricals, see here.
- PeriodIndex now has its own period dtype, and changed to be more consistent with other Index classes. See here.
- Sparse data structures gained enhanced support of int and bool dtypes, see here.
- Comparison operations with Series no longer ignore the index, see here for an overview of the API changes.
- Introduction of a pandas development API for utility functions, see here.
- Deprecation of Panel4D and PanelND. We recommend representing these types of n-dimensional data with the xarray package.
- Removal of the previously deprecated modules pandas.io.data, pandas.io.wb and pandas.tools.rplot.
See the v0.19.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.19.0.
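The sketch below illustrates three of the 0.19.0 highlights: merge_asof() with a by= key, offset-based .rolling() windows on a DatetimeIndex, and union_categoricals(). It is a minimal sketch assuming a current pandas release, where union_categoricals is importable from pandas.api.types; the trade and quote values are invented for illustration.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# merge_asof: match each trade with the most recent quote at or before
# its timestamp, within the same ticker. Both frames are sorted on "time".
trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.038",
                            "2016-05-25 13:30:00.048"]),
    "ticker": ["MSFT", "MSFT", "GOOG"],
    "price": [51.95, 51.95, 720.77],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.030",
                            "2016-05-25 13:30:00.041"]),
    "ticker": ["MSFT", "MSFT", "GOOG"],
    "bid": [51.95, 51.97, 720.50],
})
merged = pd.merge_asof(trades, quotes, on="time", by="ticker")
print(merged)

# Time-aware rolling: a "2s" window covers the two seconds ending at each
# timestamp (left-open, right-closed), so irregular spacing is respected.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(["2016-01-01 00:00:00", "2016-01-01 00:00:01",
                          "2016-01-01 00:00:03", "2016-01-01 00:00:04"]),
)
rolled = s.rolling("2s").sum()
print(rolled)

# union_categoricals: combine categoricals whose categories differ; the
# resulting categories appear in order of first appearance.
combined = union_categoricals([pd.Categorical(["b", "c"]),
                               pd.Categorical(["a", "b"])])
print(combined)
```

Note that the observation at 00:00:03 does not pick up the value at 00:00:01, because that point lies exactly on the open left edge of the window.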
Thanks
- adneu
- Adrien Emery
- agraboso
- Alex Alekseyev
- Alex Vig
- Allen Riddell
- Amol
- Amol Agrawal
- Andy R. Terrel
- Anthonios Partheniou
- babakkeyvani
- Ben Kandel
- Bob Baxley
- Brett Rosen
- c123w
- Camilo Cota
- Chris
- chris-b1
- Chris Grinolds
- Christian Hudon
- Christopher C. Aycock
- Chris Warth
- cmazzullo
- conquistador1492
- cr3
- Daniel Siladji
- Douglas McNeil
- Drewrey Lupton
- dsm054
- Eduardo Blancas Reyes
- Elliot Marsden
- Evan Wright
- Felix Marczinowski
- Francis T. O’Donovan
- Gábor Lipták
- Geraint Duck
- gfyoung
- Giacomo Ferroni
- Grant Roch
- Haleemur Ali
- harshul1610
- Hassan Shamim
- iamsimha
- Iulius Curt
- Ivan Nazarov
- jackieleng
- Jeff Reback
- Jeffrey Gerard
- Jenn Olsen
- Jim Crist
- Joe Jevnik
- John Evans
- John Freeman
- John Liekezer
- Johnny Gill
- John W. O’Brien
- John Zwinck
- Jordan Erenrich
- Joris Van den Bossche
- Josh Howes
- Jozef Brandys
- Kamil Sindi
- Ka Wo Chen
- Kerby Shedden
- Kernc
- Kevin Sheppard
- Matthieu Brucher
- Maximilian Roos
- Michael Scherer
- Mike Graham
- Mortada Mehyar
- mpuels
- Muhammad Haseeb Tariq
- Nate George
- Neil Parley
- Nicolas Bonnotte
- OXPHOS
- Pan Deng / Zora
- Paul
- Pauli Virtanen
- Paul Mestemaker
- Pawel Kordek
- Pietro Battiston
- pijucha
- Piotr Jucha
- priyankjain
- Ravi Kumar Nimmi
- Robert Gieseke
- Robert Kern
- Roger Thomas
- Roy Keyes
- Russell Smith
- Sahil Dua
- Sanjiv Lobo
- Sašo Stanovnik
- Shawn Heide
- sinhrks
- Sinhrks
- Stephen Kappel
- Steve Choi
- Stewart Henderson
- Sudarshan Konge
- Thomas A Caswell
- Tom Augspurger
- Tom Bird
- Uwe Hoffmann
- wcwagner
- WillAyd
- Xiang Zhang
- Yadunandan
- Yaroslav Halchenko
- YG-Riku
- Yuichiro Kaneko
- yui-knk
- zhangjinjie
- znmean
- 颜发才(Yan Facai)
pandas 0.18.1
Release date: (May 3, 2016)
This is a minor release from 0.18.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.
Highlights include:
- .groupby(...) has been enhanced to provide convenient syntax when working with .rolling(..), .expanding(..) and .resample(..) per group, see here.
- pd.to_datetime() has gained the ability to assemble dates from a DataFrame, see here.
- Method chaining improvements, see here.
- Custom business hour offset, see here.
- Many bug fixes in the handling of sparse, see here.
- Expanded the Tutorials section with a feature on modern pandas, courtesy of @TomAugspurger (GH13045).
See the v0.18.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.18.1.
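A minimal sketch of two 0.18.1 highlights, assuming a current pandas release: a per-group rolling window via .groupby(...).rolling(...), and assembling datetimes from DataFrame columns with pd.to_datetime(). The year/month/day column names are the ones to_datetime looks for; the numbers are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2], "B": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Rolling sum of "B" computed separately within each group of "A";
# the result is indexed by (group key, original row label).
per_group = df.groupby("A")["B"].rolling(2).sum()
print(per_group)

# Assemble datetimes from separate year/month/day columns.
parts = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})
dates = pd.to_datetime(parts)
print(dates)
```

The first row of each group is NaN because a window of 2 needs two observations, and windows never span group boundaries.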
Thanks
- Andrew Fiore-Gartland
- Bastiaan
- Benoît Vinot
- Brandon Rhodes
- DaCoEx
- Drew Fustin
- Ernesto Freitas
- Filip Ter
- Gregory Livschitz
- Gábor Lipták
- Hassan Kibirige
- Iblis Lin
- Israel Saeta Pérez
- Jason Wolosonovich
- Jeff Reback
- Joe Jevnik
- Joris Van den Bossche
- Joshua Storck
- Ka Wo Chen
- Kerby Shedden
- Kieran O’Mahony
- Leif Walsh
- Mahmoud Lababidi
- Maoyuan Liu
- Mark Roth
- Matt Wittmann
- MaxU
- Maximilian Roos
- Michael Droettboom
- Nick Eubank
- Nicolas Bonnotte
- OXPHOS
- Pauli Virtanen
- Peter Waller
- Pietro Battiston
- Prabhjot Singh
- Robin Wilson
- Roger Thomas
- Sebastian Bank
- Stephen Hoover
- Tim Hopper
- Tom Augspurger
- WANG Aiyong
- Wes Turner
- Winand
- Xbar
- Yan Facai
- adneu
- ajenkins-cargometrics
- behzad nouri
- chinskiy
- gfyoung
- jeps-journal
- jonaslb
- kotrfa
- nileracecrew
- onesandzeroes
- rs2
- sinhrks
- tsdlovell
pandas 0.18.0
Release date: (March 13, 2016)
This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Moving and expanding window functions are now methods on Series and DataFrame, similar to .groupby, see here.
- Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here.
- API breaking change to the .resample method to make it more .groupby-like, see here.
- Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here.
- The .to_xarray() function has been added for compatibility with the xarray package, see here.
- The read_sas function has been enhanced to read sas7bdat files, see here.
- Addition of the .str.extractall() method, and API changes to the .str.extract() method and .str.cat() method.
- pd.test() top-level nose test runner is available (GH4327).
See the v0.18.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.18.0.
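A short sketch of the 0.18.0 API changes listed above, assuming a current pandas release: window functions called as methods (these superseded module-level functions such as pd.rolling_mean, which were deprecated), the default RangeIndex, groupby-like .resample(), and .str.extractall(). The data is illustrative.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Window functions are methods, similar to .groupby.
rolling_mean = s.rolling(window=2).mean()
expanding_sum = s.expanding().sum()

# A default integer index is a memory-saving RangeIndex.
print(type(s.index).__name__)

# .resample() is groupby-like: choose the bins, then call an aggregation.
ts = pd.Series(range(4), index=pd.date_range("2016-01-01", periods=4, freq="D"))
every_two_days = ts.resample("2D").sum()

# .str.extractall() returns every match of the pattern, not just the first;
# the result is indexed by (original row, match number).
matches = pd.Series(["a1a2", "b1", "c"]).str.extractall(r"[ab](\d)")
print(matches)
```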
Thanks
- ARF
- Alex Alekseyev
- Andrew McPherson
- Andrew Rosenfeld
- Anthonios Partheniou
- Anton I. Sipos
- Ben
- Ben North
- Bran Yang
- Chris
- Chris Carroux
- Christopher C. Aycock
- Christopher Scanlin
- Cody
- Da Wang
- Daniel Grady
- Dorozhko Anton
- Dr-Irv
- Erik M. Bray
- Evan Wright
- Francis T. O’Donovan
- Frank Cleary
- Gianluca Rossi
- Graham Jeffries
- Guillaume Horel
- Henry Hammond
- Isaac Schwabacher
- Jean-Mathieu Deschenes
- Jeff Reback
- Joe Jevnik
- John Freeman
- John Fremlin
- Jonas Hoersch
- Joris Van den Bossche
- Joris Vankerschaver
- Justin Lecher
- Justin Lin
- Ka Wo Chen
- Keming Zhang
- Kerby Shedden
- Kyle
- Marco Farrugia
- MasonGallo
- MattRijk
- Matthew Lurie
- Maximilian Roos
- Mayank Asthana
- Mortada Mehyar
- Moussa Taifi
- Navreet Gill
- Nicolas Bonnotte
- Paul Reiners
- Philip Gura
- Pietro Battiston
- RahulHP
- Randy Carnevale
- Rinoc Johnson
- Rishipuri
- Sangmin Park
- Scott E Lasley
- Sereger13
- Shannon Wang
- Skipper Seabold
- Thierry Moisan
- Thomas A Caswell
- Toby Dylan Hocking
- Tom Augspurger
- Travis
- Trent Hauck
- Tux1
- Varun
- Wes McKinney
- Will Thompson
- Yoav Ram
- Yoong Kang Lim
- Yoshiki Vázquez Baeza
- Young Joong Kim
- Younggun Kim
- Yuval Langer
- alex argunov
- behzad nouri
- boombard
- brian-pantano
- chromy
- daniel
- dgram0
- gfyoung
- hack-c
- hcontrast
- jfoo
- kaustuv deolal
- llllllllll
- ranarag
- rockg
- scls19fr
- seales
- sinhrks
- srib
- surveymedia.ca
- tworec
pandas 0.17.1
Release date: (November 21, 2015)
This is a minor release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.
Highlights include:
- Support for Conditional HTML Formatting, see here
- Releasing the GIL on the csv reader & other ops, see here
- Fixed a regression in DataFrame.drop_duplicates from 0.16.2, which caused incorrect results on integer values (GH11376)
See the v0.17.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.17.1.
Thanks
- Aleksandr Drozd
- Alex Chase
- Anthonios Partheniou
- BrenBarn
- Brian J. McGuirk
- Chris
- Christian Berendt
- Christian Perez
- Cody Piersall
- Data & Code Expert Experimenting with Code on Data
- DrIrv
- Evan Wright
- Guillaume Gay
- Hamed Saljooghinejad
- Iblis Lin
- Jake VanderPlas
- Jan Schulz
- Jean-Mathieu Deschenes
- Jeff Reback
- Jimmy Callin
- Joris Van den Bossche
- K.-Michael Aye
- Ka Wo Chen
- Loïc Séguin-C
- Luo Yicheng
- Magnus Jöud
- Manuel Leonhardt
- Matthew Gilbert
- Maximilian Roos
- Michael
- Nicholas Stahl
- Nicolas Bonnotte
- Pastafarianist
- Petra Chong
- Phil Schaf
- Philipp A
- Rob deCarvalho
- Roman Khomenko
- Rémy Léone
- Sebastian Bank
- Thierry Moisan
- Tom Augspurger
- Tux1
- Varun
- Wieland Hoffmann
- Winterflower
- Yoav Ram
- Younggun Kim
- Zeke
- ajcr
- azuranski
- behzad nouri
- cel4
- emilydolson
- hironow
- lexual
- llllllllll
- rockg
- silentquasar
- sinhrks
- taeold
pandas 0.17.0
Release date: (October 9, 2015)
This is a major release from 0.16.2 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Release the Global Interpreter Lock (GIL) on some cython operations, see here
- Plotting methods are now available as attributes of the .plot accessor, see here
- The sorting API has been revamped to remove some long-time inconsistencies, see here
- Support for a datetime64[ns] with timezones as a first-class dtype, see here
- The default for to_datetime will now be to raise when presented with unparseable formats; previously this would return the original input. Also, date parse functions now return consistent results. See here
- The default for dropna in HDFStore has changed to False, to store by default all rows even if they are all NaN, see here
- Datetime accessor (dt) now supports Series.dt.strftime to generate formatted strings for datetime-likes, and Series.dt.total_seconds to generate each duration of the timedelta in seconds. See here
- Period and PeriodIndex can handle multiplied freq like 3D, which corresponds to a 3-day span. See here
- Development installed versions of pandas will now have PEP440 compliant version strings (GH9518)
- Development support for benchmarking with the Air Speed Velocity library (GH8316)
- Support for reading SAS xport files, see here
- Documentation comparing SAS to pandas, see here
- Removal of the automatic TimeSeries broadcasting, deprecated since 0.8.0, see here
- Display format with plain text can optionally align with Unicode East Asian Width, see here
- Compatibility with Python 3.5 (GH11097)
- Compatibility with matplotlib 1.5.0 (GH11111)
See the v0.17.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.17.0.
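A minimal sketch of several 0.17.0 changes, assuming a current pandas release: to_datetime raising on unparseable input by default (errors="coerce" yields NaT instead), the new .dt.strftime and .dt.total_seconds methods, and the tz-aware datetime dtype. An explicit format= is passed to to_datetime only to keep the example deterministic; the dates are illustrative.

```python
import pandas as pd

# to_datetime raises on unparseable input instead of returning it.
try:
    pd.to_datetime("not a date", format="%Y-%m-%d")
    raised = False
except ValueError:
    raised = True

# errors="coerce" turns unparseable entries into NaT instead.
coerced = pd.to_datetime(["2015-01-01", "not a date"],
                         format="%Y-%m-%d", errors="coerce")

# New .dt methods: strftime for datetimes, total_seconds for timedeltas.
dates = pd.Series(pd.date_range("2015-01-01", periods=2, freq="D"))
labels = dates.dt.strftime("%Y/%m/%d")
durations = pd.Series(pd.to_timedelta(["1min", "90s"]))
seconds = durations.dt.total_seconds()

# Timezone-aware datetimes are stored with a first-class tz-aware dtype.
aware = pd.Series(pd.date_range("2015-01-01", periods=2, freq="D", tz="US/Eastern"))
print(aware.dtype)
```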
Thanks
- Alex Rothberg
- Andrea Bedini
- Andrew Rosenfeld
- Andy Li
- Anthonios Partheniou
- Artemy Kolchinsky
- Bernard Willers
- Charlie Clark
- Chris
- Chris Whelan
- Christoph Gohlke
- Christopher Whelan
- Clark Fitzgerald
- Clearfield Christopher
- Dan Ringwalt
- Daniel Ni
- Data & Code Expert Experimenting with Code on Data
- David Cottrell
- David John Gagne
- David Kelly
- ETF
- Eduardo Schettino
- Egor
- Egor Panfilov
- Evan Wright
- Frank Pinter
- Gabriel Araujo
- Garrett-R
- Gianluca Rossi
- Guillaume Gay
- Guillaume Poulin
- Harsh Nisar
- Ian Henriksen
- Ian Hoegen
- Jaidev Deshpande
- Jan Rudolph
- Jan Schulz
- Jason Swails
- Jeff Reback
- Jonas Buyl
- Joris Van den Bossche
- Joris Vankerschaver
- Josh Levy-Kramer
- Julien Danjou
- Ka Wo Chen
- Karrie Kehoe
- Kelsey Jordahl
- Kerby Shedden
- Kevin Sheppard
- Lars Buitinck
- Leif Johnson
- Luis Ortiz
- Mac
- Matt Gambogi
- Matt Savoie
- Matthew Gilbert
- Maximilian Roos
- Michelangelo D’Agostino
- Mortada Mehyar
- Nick Eubank
- Nipun Batra
- Ondřej Čertík
- Phillip Cloud
- Pratap Vardhan
- Rafal Skolasinski
- Richard Lewis
- Rinoc Johnson
- Rob Levy
- Robert Gieseke
- Safia Abdalla
- Samuel Denny
- Saumitra Shahapure
- Sebastian Pölsterl
- Sebastian Rubbert
- Sheppard, Kevin
- Sinhrks
- Siu Kwan Lam
- Skipper Seabold
- Spencer Carrucciu
- Stephan Hoyer
- Stephen Hoover
- Stephen Pascoe
- Terry Santegoeds
- Thomas Grainger
- Tjerk Santegoeds
- Tom Augspurger
- Vincent Davis
- Winterflower
- Yaroslav Halchenko
- Yuan Tang (Terry)
- agijsberts
- ajcr
- behzad nouri
- cel4
- cyrusmaher
- davidovitch
- ganego
- jreback
- juricast
- larvian
- maximilianr
- msund
- rekcahpassyla
- robertzk
- scls19fr
- seth-p
- sinhrks
- springcoil
- terrytangyuan
- tzinckgraf
pandas 0.16.2
Release date: (June 12, 2015)
This is a minor release from 0.16.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.
Highlights include:
See the v0.16.2 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.16.2.
Thanks
- Andrew Rosenfeld
- Artemy Kolchinsky
- Bernard Willers
- Christer van der Meeren
- Christian Hudon
- Constantine Glen Evans
- Daniel Julius Lasiman
- Evan Wright
- Francesco Brundu
- Gaëtan de Menten
- Jake VanderPlas
- James Hiebert
- Jeff Reback
- Joris Van den Bossche
- Justin Lecher
- Ka Wo Chen
- Kevin Sheppard
- Mortada Mehyar
- Morton Fox
- Robin Wilson
- Thomas Grainger
- Tom Ajamian
- Tom Augspurger
- Yoshiki Vázquez Baeza
- Younggun Kim
- austinc
- behzad nouri
- jreback
- lexual
- rekcahpassyla
- scls19fr
- sinhrks
pandas 0.16.1
Release date: (May 11, 2015)
This is a minor release from 0.16.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs.
See the v0.16.1 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.16.1.
Thanks
- Alfonso MHC
- Andy Hayden
- Artemy Kolchinsky
- Chris Gilmer
- Chris Grinolds
- Dan Birken
- David BROCHART
- David Hirschfeld
- David Stephens
- Dr. Leo
- Evan Wright
- Frans van Dunné
- Hatem Nassrat
- Henning Sperr
- Hugo Herter
- Jan Schulz
- Jeff Blackburne
- Jeff Reback
- Jim Crist
- Jonas Abernot
- Joris Van den Bossche
- Kerby Shedden
- Leo Razoumov
- Manuel Riel
- Mortada Mehyar
- Nick Burns
- Nick Eubank
- Olivier Grisel
- Phillip Cloud
- Pietro Battiston
- Roy Hyunjin Han
- Sam Zhang
- Scott Sanderson
- Stephan Hoyer
- Tiago Antao
- Tom Ajamian
- Tom Augspurger
- Tomaz Berisa
- Vikram Shirgur
- Vladimir Filimonov
- William Hogman
- Yasin A
- Younggun Kim
- behzad nouri
- dsm054
- floydsoft
- flying-sheep
- gfr
- jnmclarty
- jreback
- ksanghai
- lucas
- mschmohl
- ptype
- rockg
- scls19fr
- sinhrks
pandas 0.16.0
Release date: (March 22, 2015)
This is a major release from 0.15.2 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- DataFrame.assign method, see here
- Series.to_coo/from_coo methods to interact with scipy.sparse, see here
- Backwards incompatible change to Timedelta to make the .seconds attribute conform with datetime.timedelta, see here
- Changes to the .loc slicing API to conform with the behavior of .ix, see here
- Changes to the default for ordering in the Categorical constructor, see here
- The pandas.tools.rplot, pandas.sandbox.qtpandas and pandas.rpy modules are deprecated. We refer users to external packages like seaborn, pandas-qt and rpy2 for similar or equivalent functionality, see here
See the v0.16.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.16.0.
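A minimal sketch of two 0.16.0 highlights, assuming a current pandas release: DataFrame.assign returning a new frame (here with a callable that receives the frame), and Timedelta.seconds now matching datetime.timedelta.seconds, i.e. the seconds component of the duration rather than its total length. The values are illustrative.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# assign() returns a new DataFrame with the extra column; df is unchanged.
df2 = df.assign(b=lambda d: d["a"] * 10)

# .seconds is the seconds component (0 to 86399), as in datetime.timedelta;
# use total_seconds() for the whole duration.
td = pd.Timedelta("1 days 02:00:30")
print(td.seconds, td.total_seconds())
print(datetime.timedelta(days=1, hours=2, seconds=30).seconds)
```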
Thanks
- Aaron Toth
- Alan Du
- Alessandro Amici
- Artemy Kolchinsky
- Ashwini Chaudhary
- Ben Schiller
- Bill Letson
- Brandon Bradley
- Chau Hoang
- Chris Reynolds
- Chris Whelan
- Christer van der Meeren
- David Cottrell
- David Stephens
- Ehsan Azarnasab
- Garrett-R
- Guillaume Gay
- Jake Torcasso
- Jason Sexauer
- Jeff Reback
- John McNamara
- Joris Van den Bossche
- Joschka zur Jacobsmühlen
- Juarez Bochi
- Junya Hayashi
- K.-Michael Aye
- Kerby Shedden
- Kevin Sheppard
- Kieran O’Mahony
- Kodi Arfer
- Matti Airas
- Min RK
- Mortada Mehyar
- Robert
- Scott E Lasley
- Scott Lasley
- Sergio Pascual
- Skipper Seabold
- Stephan Hoyer
- Thomas Grainger
- Tom Augspurger
- TomAugspurger
- Vladimir Filimonov
- Vyomkesh Tripathi
- Will Holmgren
- Yulong Yang
- behzad nouri
- bertrandhaut
- bjonen
- cel4
- clham
- hsperr
- ischwabacher
- jnmclarty
- josham
- jreback
- omtinez
- roch
- sinhrks
- unutbu
pandas 0.15.2
Release date: (December 12, 2014)
This is a minor release from 0.15.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs.
See the v0.15.2 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.2.
Thanks
- Aaron Staple
- Angelos Evripiotis
- Artemy Kolchinsky
- Benoit Pointet
- Brian Jacobowski
- Charalampos Papaloizou
- Chris Warth
- David Stephens
- Fabio Zanini
- Francesc Via
- Henry Kleynhans
- Jake VanderPlas
- Jan Schulz
- Jeff Reback
- Jeff Tratner
- Joris Van den Bossche
- Kevin Sheppard
- Matt Suggit
- Matthew Brett
- Phillip Cloud
- Rupert Thompson
- Scott E Lasley
- Stephan Hoyer
- Stephen Simmons
- Sylvain Corlay
- Thomas Grainger
- Tiago Antao
- Trent Hauck
- Victor Chaves
- Victor Salgado
- Vikram Bhandoh
- WANG Aiyong
- Will Holmgren
- behzad nouri
- broessli
- charalampos papaloizou
- immerrr
- jnmclarty
- jreback
- mgilbert
- onesandzeroes
- peadarcoyle
- rockg
- seth-p
- sinhrks
- unutbu
- wavedatalab
- Åsmund Hjulstad
pandas 0.15.1
Release date: (November 9, 2014)
This is a minor release from 0.15.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
See the v0.15.1 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.1.
Thanks
- Aaron Staple
- Andrew Rosenfeld
- Anton I. Sipos
- Artemy Kolchinsky
- Bill Letson
- Dave Hughes
- David Stephens
- Guillaume Horel
- Jeff Reback
- Joris Van den Bossche
- Kevin Sheppard
- Nick Stahl
- Sanghee Kim
- Stephan Hoyer
- TomAugspurger
- WANG Aiyong
- behzad nouri
- immerrr
- jnmclarty
- jreback
- pallav-fdsi
- unutbu
pandas 0.15.0
Release date: (October 18, 2014)
This is a major release from 0.14.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- Drop support for numpy < 1.7.0 (GH7711)
- The Categorical type was integrated as a first-class pandas type, see here
- New scalar type Timedelta, and a new index type TimedeltaIndex, see here
- New DataFrame default display for df.info() to include memory usage, see Memory Usage
- New datetimelike properties accessor .dt for Series, see Datetimelike Properties
- Split indexing documentation into Indexing and Selecting Data and MultiIndex / Advanced Indexing
- Split out string methods documentation into Working with Text Data
- read_csv will now by default ignore blank lines when parsing, see here
- API change in using Indexes in set operations, see here
- Internal refactoring of the Index class to no longer sub-class ndarray, see Internal Refactoring
- Dropping support for PyTables less than version 3.0.0, and numexpr less than version 2.1 (GH7990)
See the v0.15.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.0.
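A minimal sketch of four 0.15.0 additions, assuming a current pandas release: the category dtype, Timedelta scalars in a TimedeltaIndex, the .dt accessor, and read_csv skipping blank lines by default (skip_blank_lines=True). The inline CSV text is illustrative.

```python
import io

import pandas as pd

# Categorical as a first-class dtype; inferred categories are sorted.
grades = pd.Series(["b", "a", "b", "c"], dtype="category")

# Timedelta scalars collected in a TimedeltaIndex.
tdi = pd.to_timedelta(["1 days", "2 days"])

# The .dt accessor exposes datetimelike properties on a Series.
dates = pd.Series(pd.date_range("2014-10-18", periods=3, freq="D"))
days = dates.dt.day

# The empty line between the two data rows is skipped, not read as NaNs.
parsed = pd.read_csv(io.StringIO("x,y\n1,2\n\n3,4\n"))
print(parsed)
```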
Thanks
- Aaron Schumacher
- Adam Greenhall
- Andy Hayden
- Anthony O’Brien
- Artemy Kolchinsky
- behzad nouri
- Benedikt Sauer
- benjamin
- Benjamin Thyreau
- Ben Schiller
- bjonen
- BorisVerk
- Chris Reynolds
- Chris Stoafer
- Dav Clark
- dlovell
- DSM
- dsm054
- FragLegs
- German Gomez-Herrero
- Hsiaoming Yang
- Huan Li
- hunterowens
- Hyungtae Kim
- immerrr
- Isaac Slavitt
- ischwabacher
- Jacob Schaer
- Jacob Wasserman
- Jan Schulz
- Jeff Tratner
- Jesse Farnham
- jmorris0x0
- jnmclarty
- Joe Bradish
- Joerg Rittinger
- John W. O’Brien
- Joris Van den Bossche
- jreback
- Kevin Sheppard
- klonuo
- Kyle Meyer
- lexual
- Max Chang
- mcjcode
- Michael Mueller
- Michael W Schatzow
- Mike Kelly
- Mortada Mehyar
- mtrbean
- Nathan Sanders
- Nathan Typanski
- onesandzeroes
- Paul Masurel
- Phillip Cloud
- Pietro Battiston
- RenzoBertocchi
- rockg
- Ross Petchler
- seth-p
- Shahul Hameed
- Shashank Agarwal
- sinhrks
- someben
- stahlous
- stas-sl
- Stephan Hoyer
- thatneat
- tom-alcorn
- TomAugspurger
- Tom Augspurger
- Tony Lorenzo
- unknown
- unutbu
- Wes Turner
- Wilfred Hughes
- Yevgeniy Grechka
- Yoshiki Vázquez Baeza
- zachcp
pandas 0.14.1
Release date: (July 11, 2014)
This is a minor release from 0.14.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- New methods select_dtypes() to select columns based on the dtype and sem() to calculate the standard error of the mean.
- Support for dateutil timezones (see docs).
- Support for ignoring full line comments in the read_csv() text parser.
- New documentation section on Options and Settings.
- Lots of bug fixes.
See the v0.14.1 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.14.1.
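A minimal sketch of the 0.14.1 additions, assuming a current pandas release: select_dtypes() for choosing columns by dtype, sem() as the sample standard deviation divided by the square root of the count, and read_csv(comment="#") skipping lines that consist entirely of a comment. The frame and CSV text are illustrative.

```python
import io
import math

import pandas as pd

df = pd.DataFrame({"n": [1, 2, 3, 4], "f": [1.0, 2.0, 3.0, 4.0], "s": list("wxyz")})

# "number" selects both integer and float columns, keeping column order.
numeric = df.select_dtypes(include=["number"])

# Standard error of the mean: sample std (ddof=1) / sqrt(n).
sem = df["f"].sem()
expected_sem = df["f"].std() / math.sqrt(len(df))

# Full-line comments are ignored, including one before the header row.
csv_text = "# generated file\na,b\n1,2\n# midway note\n3,4\n"
parsed = pd.read_csv(io.StringIO(csv_text), comment="#")
print(parsed)
```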
Thanks
- Andrew Rosenfeld
- Andy Hayden
- Benjamin Adams
- Benjamin M. Gross
- Brian Quistorff
- Brian Wignall
- bwignall
- clham
- Daniel Waeber
- David Bew
- David Stephens
- DSM
- dsm054
- helger
- immerrr
- Jacob Schaer
- jaimefrio
- Jan Schulz
- John David Reaver
- John W. O’Brien
- Joris Van den Bossche
- jreback
- Julien Danjou
- Kevin Sheppard
- K.-Michael Aye
- Kyle Meyer
- lexual
- Matthew Brett
- Matt Wittmann
- Michael Mueller
- Mortada Mehyar
- onesandzeroes
- Phillip Cloud
- Rob Levy
- rockg
- sanguineturtle
- Schaer, Jacob C
- seth-p
- sinhrks
- Stephan Hoyer
- Thomas Kluyver
- Todd Jennings
- TomAugspurger
- unknown
- yelite
pandas 0.14.0
Release date: (May 31, 2014)
This is a major release from 0.13.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- Officially support Python 3.4
- SQL interfaces updated to use sqlalchemy, see here.
- Display interface changes, see here
- MultiIndexing using Slicers, see here.
- Ability to join a singly-indexed DataFrame with a multi-indexed DataFrame, see here
- More consistency in groupby results and more flexible groupby specifications, see here
- Holiday calendars are now supported in CustomBusinessDay, see here
- Several improvements in plotting functions, including: hexbin, area and pie plots, see here.
- Performance doc section on I/O operations, see here
See the v0.14.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.14.0.
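A minimal sketch of two 0.14.0 highlights, assuming a current pandas release: slicing a MultiIndex with pd.IndexSlice, and a CustomBusinessDay that skips holidays from USFederalHolidayCalendar. The frame is illustrative; the date check relies on 2014-07-04 being a Friday and a federal holiday.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Select every "outer" label but only rows where the "inner" level is "y".
idx = pd.MultiIndex.from_product([["A", "B"], ["x", "y"]], names=["outer", "inner"])
df = pd.DataFrame({"val": [1, 2, 3, 4]}, index=idx)
only_y = df.loc[pd.IndexSlice[:, "y"], :]

# One business day after Thursday 2014-07-03 skips the July 4 holiday
# and the weekend, landing on Monday 2014-07-07.
bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
next_day = pd.Timestamp("2014-07-03") + bday_us
print(next_day)
```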
Thanks
- Acanthostega
- Adam Marcus
- agijsberts
- akittredge
- Alex Gaudio
- Alex Rothberg
- AllenDowney
- Andrew Rosenfeld
- Andy Hayden
- ankostis
- anomrake
- Antoine Mazières
- anton-d
- bashtage
- Benedikt Sauer
- benjamin
- Brad Buran
- bwignall
- cgohlke
- chebee7i
- Christopher Whelan
- Clark Fitzgerald
- clham
- Dale Jung
- Dan Allan
- Dan Birken
- danielballan
- Daniel Waeber
- David Jung
- David Stephens
- Douglas McNeil
- DSM
- Garrett Drapala
- Gouthaman Balaraman
- Guillaume Poulin
- hshimizu77
- hugo
- immerrr
- ischwabacher
- Jacob Howard
- Jacob Schaer
- jaimefrio
- Jason Sexauer
- Jeff Reback
- Jeffrey Starr
- Jeff Tratner
- John David Reaver
- John McNamara
- John W. O’Brien
- Jonathan Chambers
- Joris Van den Bossche
- jreback
- jsexauer
- Julia Evans
- Júlio
- Katie Atkinson
- kdiether
- Kelsey Jordahl
- Kevin Sheppard
- K.-Michael Aye
- Matthias Kuhn
- Matt Wittmann
- Max Grender-Jones
- Michael E. Gruen
- michaelws
- mikebailey
- Mike Kelly
- Nipun Batra
- Noah Spies
- ojdo
- onesandzeroes
- Patrick O’Keeffe
- phaebz
- Phillip Cloud
- Pietro Battiston
- PKEuS
- Randy Carnevale
- ribonoous
- Robert Gibboni
- rockg
- sinhrks
- Skipper Seabold
- SplashDance
- Stephan Hoyer
- Tim Cera
- Tobias Brandt
- Todd Jennings
- TomAugspurger
- Tom Augspurger
- unutbu
- westurner
- Yaroslav Halchenko
- y-p
- zach powers
pandas 0.13.1
Release date: (February 3, 2014)
API Changes
- Series.sort will raise a ValueError (rather than a TypeError) on sorting an object that is a view of another (GH5856, GH5853)
- Raise/Warn SettingWithCopyError (according to the option chained_assignment) in more cases when detecting chained assignment (GH5938, GH6025)
- DataFrame.head(0) returns self instead of empty frame (GH5846)
- autocorrelation_plot now accepts **kwargs. (GH5623)
- convert_objects now accepts a convert_timedeltas='coerce' argument to allow forced dtype conversion of timedeltas (GH5458, GH5689)
- Add -NaN and -nan to the default set of NA values (GH5952). See NA Values.
- NDFrame now has an equals method. (GH5283)
- DataFrame.apply will use the reduce argument to determine whether a Series or a DataFrame should be returned when the DataFrame is empty (GH6007).
Experimental Features
Improvements to existing features
- perf improvements in Series datetime/timedelta binary operations (GH5801)
- option_context context manager now available as top-level API (GH5752)
- df.info() view now displays dtype info per column (GH5682)
- df.info() now honors option max_info_rows, disabling null counts for large frames (GH5974)
- perf improvements in DataFrame count/dropna for axis=1
- Series.str.contains now has a regex=False keyword which can be faster for plain (non-regex) string patterns. (GH5879)
- support dtypes property on Series/Panel/Panel4D
- extend Panel.apply to allow arbitrary functions (rather than only ufuncs) (GH1148); allow multiple axes to be used to operate on slabs of a Panel
- The ArrayFormatter for datetime and timedelta64 now intelligently limits precision based on the values in the array (GH3401)
- pd.show_versions() is now available for convenience when reporting issues.
- perf improvements to Series.str.extract (GH5944)
- perf improvements in dtypes/ftypes methods (GH5968)
- perf improvements in indexing with object dtypes (GH5968)
- improved dtype inference for timedelta-like values passed to constructors (GH5458, GH5689)
- escape special characters when writing to latex (GH5374)
- perf improvements in DataFrame.apply (GH6013)
- pd.read_csv and pd.to_datetime learned a new infer_datetime_format keyword which greatly improves parsing perf in many cases. Thanks to @lexual for suggesting and @danbirken for rapidly implementing. (GH5490, GH6021)
- add ability to recognize ‘%p’ format code (am/pm) to date parsers when the specific format is supplied (GH5361)
- Fix performance regression in JSON IO (GH5765)
- performance regression in Index construction from Series (GH6150)
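A minimal sketch of two of the improvements above, assuming a current pandas release: pd.option_context (now top-level) setting an option only inside a with-block, and Series.str.contains(regex=False) matching the pattern literally. With the default regex=True, "." is a regex wildcard and matches any character. The strings are illustrative.

```python
import pandas as pd

# The option is changed inside the block and restored afterwards.
before = pd.get_option("display.max_rows")
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")
after = pd.get_option("display.max_rows")

s = pd.Series(["a.b", "axb", "c"])
literal = s.str.contains(".", regex=False)  # looks for a literal "."
pattern = s.str.contains(".")               # "." matches any character
print(literal.tolist(), pattern.tolist())
```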
Bug Fixes
- Bug in
io.wb.get_countries
not including all countries (GH6008) - Bug in Series replace with timestamp dict (GH5797)
- read_csv/read_table now respects the prefix kwarg (GH5732).
- Bug in selection with missing values via
.ix
from a duplicate indexed DataFrame failing (GH5835) - Fix issue of boolean comparison on empty DataFrames (GH5808)
- Bug in isnull handling
NaT
in an object array (GH5443) - Bug in
to_datetime
when passed anp.nan
or integer datelike and a format string (GH5863) - Bug in groupby dtype conversion with datetimelike (GH5869)
- Regression in handling of empty Series as indexers to Series (GH5877)
- Bug in internal caching, related to (GH5727)
- Testing bug in reading JSON/msgpack from a non-filepath on windows under py3 (GH5874)
- Bug when assigning to .ix[tuple(...)] (GH5896)
- Bug in fully reindexing a Panel (GH5905)
- Bug in idxmin/max with object dtypes (GH5914)
- Bug in
BusinessDay
when adding n days to a date not on offset when n>5 and n%5==0 (GH5890) - Bug in assigning to chained series with a series via ix (GH5928)
- Bug in creating an empty DataFrame, copying, then assigning (GH5932)
- Bug in DataFrame.tail with empty frame (GH5846)
- Bug in propagating metadata on
resample
(GH5862) - Fixed string-representation of
NaT
to be “NaT” (GH5708) - Fixed string-representation for Timestamp to show nanoseconds if present (GH5912)
pd.match
not returning passed sentinelPanel.to_frame()
no longer fails whenmajor_axis
is aMultiIndex
(GH5402).- Bug in
pd.read_msgpack
with inferring aDateTimeIndex
frequency incorrectly (GH5947) - Fixed
to_datetime
for array with both Tz-aware datetimes andNaT
‘s (GH5961) - Bug in rolling skew/kurtosis when passed a Series with bad data (GH5749)
- Bug in scipy
interpolate
methods with a datetime index (GH5975) - Bug in NaT comparison if a mixed datetime/np.datetime64 with NaT were passed (GH5968)
- Fixed bug with
pd.concat
losing dtype information if all inputs are empty (GH5742) - Recent changes in IPython cause warnings to be emitted when using previous versions of pandas in QTConsole, now fixed. If you’re using an older version and need to suppress the warnings, see (GH5922).
- Bug in merging
timedelta
dtypes (GH5695) - Bug in plotting.scatter_matrix function. Wrong alignment among diagonal and off-diagonal plots, see (GH5497).
- Regression in Series with a multi-index via ix (GH6018)
- Bug in Series.xs with a multi-index (GH6018)
- Bug in Series construction of mixed type with datelike and an integer (which should result in object type and not automatic conversion) (GH6028)
- Possible segfault when chained indexing with an object array under numpy 1.7.1 (GH6026, GH6056)
- Bug in setting using fancy indexing a single element with a non-scalar (e.g. a list), (GH6043)
- to_sql did not respect if_exists (GH4110, GH4304)
- Regression in .get(None) indexing from 0.12 (GH5652)
- Subtle iloc indexing bug, surfaced in (GH6059)
- Bug with insert of strings into DatetimeIndex (GH5818)
- Fixed unicode bug in to_html/HTML repr (GH6098)
- Fixed missing arg validation in get_options_data (GH6105)
- Bug in assignment with duplicate columns in a frame where the locations are a slice (e.g. next to each other) (GH6120)
- Bug in propagating _ref_locs during construction of a DataFrame with duplicate index/columns (GH6121)
- Bug in DataFrame.apply when using mixed datelike reductions (GH6125)
- Bug in DataFrame.append when appending a row with different columns (GH6129)
- Bug in DataFrame construction with recarray and non-ns datetime dtype (GH6140)
- Bug in .loc setitem indexing with a dataframe on rhs, multiple item setting, and a datetimelike (GH6152)
- Fixed a bug in query/eval during lexicographic string comparisons (GH6155).
- Fixed a bug in query where the index of a single-element Series was being thrown away (GH6148).
- Bug in HDFStore on appending a dataframe with multi-indexed columns to an existing table (GH6167)
- Consistency with dtypes in setting an empty DataFrame (GH6171)
- Bug in selecting on a multi-index HDFStore even in the presence of an underspecified column spec (GH6169)
- Bug in nanops.var with ddof=1 and 1 element would sometimes return inf rather than nan on some platforms (GH6136)
- Bug in Series and DataFrame bar plots ignoring the use_index keyword (GH6209)
- Bug in groupby with mixed str/int under Python 3 fixed; argsort was failing (GH6212)
pandas 0.13.0
Release date: January 3, 2014
New Features
- plot(kind='kde') now accepts the optional parameters bw_method and ind, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set the bandwidth, and to gkde.evaluate() to specify the indices at which it is evaluated, respectively. See the scipy docs. (GH4298)
- Added isin method to DataFrame (GH4211)
- df.to_clipboard() learned a new excel keyword that lets you paste df data directly into Excel (enabled by default). (GH5070).
- Clipboard functionality now works with PySide (GH4282)
- New extract string method returns regex matches more conveniently (GH4685)
- Auto-detect field widths in read_fwf when unspecified (GH4488)
- to_csv() now outputs datetime objects according to a specified format string via the date_format keyword (GH4313)
- Added LastWeekOfMonth DateOffset (GH4637)
- Added cumcount groupby method (GH4646)
- Added FY5253 and FY5253Quarter DateOffsets (GH4511)
- Added mode() method to Series and DataFrame to get the statistical mode(s) of a column/series. (GH5367)
Experimental Features
- The new eval() function implements expression evaluation using numexpr behind the scenes. This results in large speedups for complicated expressions involving large DataFrames/Series.
- DataFrame has a new eval() that evaluates an expression in the context of the DataFrame; allows inline expression assignment
- A query() method has been added that allows you to select elements of a DataFrame using a natural query syntax nearly identical to Python syntax.
- pd.eval and friends now evaluate operations involving datetime64 objects in Python space because numexpr cannot handle NaT values (GH4897).
- Add msgpack support via pd.read_msgpack() and pd.to_msgpack()/df.to_msgpack() for serialization of arbitrary pandas (and Python) objects in a lightweight portable binary format (GH686, GH5506)
- Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
- Added pandas.io.gbq for reading from (and writing to) Google BigQuery into a DataFrame. (GH4140)
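The expression-evaluation features above can be sketched as follows, again assuming a recent pandas. One behavioral difference to keep in mind: current releases return a new frame from DataFrame.eval with an assignment unless inplace=True is passed, whereas earlier releases applied the assignment in place. If numexpr is not installed, pandas uses its pure-Python engine instead. The frame is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Top-level pd.eval evaluates an expression string over pandas objects
total = pd.eval("df.a + df.b")

# DataFrame.eval resolves bare column names against the frame itself;
# an assignment expression yields a frame with the extra column
with_c = df.eval("c = a * b")

# DataFrame.query keeps the rows for which the expression is True
subset = df.query("a > 1 and b < 40")
```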
Improvements to existing features
- read_html now raises a URLError instead of catching and raising a ValueError (GH4303, GH4305)
- read_excel now supports an integer in its sheetname argument giving the index of the sheet to read in (GH4301).
- get_dummies works with NaN (GH4446)
- Added a test for read_clipboard() and to_clipboard() (GH4282)
- Added bins argument to value_counts (GH3945); sort and ascending are also now available in the Series method as well as the top-level function.
- Text parser now treats anything that reads like inf (“inf”, “Inf”, “-Inf”, “iNf”, etc.) as infinity (GH4220, GH4219), affecting read_table, read_csv, etc.
- Added a more informative error message when plot arguments contain overlapping color and style arguments (GH4402)
- Significant table writing performance improvements in HDFStore
- JSON date serialization now performed in low-level C code.
- JSON support for encoding datetime.time
- Expanded JSON docs, more info about orient options and the use of the numpy param when decoding.
- Add drop_level argument to xs (GH4180)
- Can now resample a DataFrame with ohlc (GH2320)
- Index.copy() and MultiIndex.copy() now accept keyword arguments to change attributes (i.e., names, levels, labels) (GH4039)
- Add rename and set_names methods to Index as well as set_names, set_levels, set_labels to MultiIndex, with improved validation for all (GH4039, GH4794)
- A Series of dtype timedelta64[ns] can now be divided/multiplied by an integer series (GH4521)
- A Series of dtype timedelta64[ns] can now be divided by another timedelta64[ns] object to yield a float64 dtyped Series. This is frequency conversion; astyping is also supported.
- Timedelta64 supports fillna/ffill/bfill with an integer interpreted as seconds, or a timedelta (GH3371)
- Box numeric ops on timedelta Series (GH4984)
- Datetime64 supports ffill/bfill
- Performance improvements with __getitem__ on DataFrames when the key is a column
- Support for using a DatetimeIndex/PeriodIndex directly in a datelike calculation e.g. s-s.index (GH4629)
- Better/cleaned up exceptions in core/common, io/excel and core/format (GH4721, GH3954), as well as cleaned up test cases in tests/test_frame, tests/test_multilevel (GH4732).
- Performance improvement of timeseries plotting with PeriodIndex and added test to vbench (GH4705 and GH4722)
- Add axis and level keywords to where, so that the other argument can now be an alignable pandas object.
- to_datetime with a format of ‘%Y%m%d’ now parses much faster
- It’s now easier to hook new Excel writers into pandas (just subclass ExcelWriter and register your engine). You can specify an engine in to_excel or in ExcelWriter. You can also specify which writers you want to use by default with config options io.excel.xlsx.writer and io.excel.xls.writer. (GH4745, GH4750)
- Panel.to_excel() now accepts keyword arguments that will be passed to its DataFrame’s to_excel() methods. (GH4750)
- Added XlsxWriter as an optional ExcelWriter engine. This is about 5x faster than the default openpyxl xlsx writer and is equivalent in speed to the xlwt xls writer module. (GH4542)
- Allow DataFrame constructor to accept more list-like objects, e.g. list of collections.Sequence and array.array objects (GH3783, GH4297, GH4851), thanks @lgautier
- DataFrame constructor now accepts a numpy masked record array (GH3478), thanks @jnothman
- __getitem__ with tuple key (e.g., [:, 2]) on Series without MultiIndex raises ValueError (GH4759, GH4837)
- read_json now raises a (more informative) ValueError when the dict contains a bad key and orient='split' (GH4730, GH4838)
- read_stata now accepts Stata 13 format (GH4291)
- ExcelWriter and ExcelFile can be used as context managers. (GH3441, GH4933)
- pandas is now tested with two different versions of statsmodels (0.4.3 and 0.5.0) (GH4981).
- Better string representations of MultiIndex (including ability to roundtrip via repr). (GH3347, GH4935)
- Both ExcelFile and read_excel now accept an xlrd.Book for the io (formerly path_or_buf) argument; this requires engine to be set. (GH4961).
- concat now gives a more informative error message when passed objects that cannot be concatenated (GH4608).
- Add halflife option to exponentially weighted moving functions (PR GH4998)
- to_dict now takes records as a possible outtype. Returns an array of column-keyed dictionaries. (GH4936)
- tz_localize can infer a fall daylight savings transition based on the structure of unlocalized data (GH4230)
- DatetimeIndex is now in the API documentation
- Improve support for converting R datasets to pandas objects (more informative index for timeseries and numeric, support for factors, dist, and high-dimensional arrays).
- read_html() now supports the parse_dates, tupleize_cols and thousands parameters (GH4770).
- json_normalize() is a new method to allow you to create a flat table from semi-structured JSON data. See the docs (GH1067)
- DataFrame.from_records() will now accept generators (GH4910)
- DataFrame.interpolate() and Series.interpolate() have been expanded to include interpolation methods from scipy. (GH4434, GH1892)
- Series now supports a to_frame method to convert it to a single-column DataFrame (GH5164)
- DatetimeIndex (and date_range) can now be constructed in a left- or right-open fashion using the closed parameter (GH4579)
- Python csv parser now supports usecols (GH4335)
- Added support for Google Analytics v3 API segment IDs that also supports v2 IDs. (GH5271)
- NDFrame.drop() now accepts names as well as integers for the axis argument. (GH5354)
- Added short docstrings to a few methods that were missing them + fixed the docstrings for Panel flex methods. (GH5336)
- NDFrame.drop(), NDFrame.dropna(), and .drop_duplicates() all accept inplace as a keyword argument; however, this only means that the wrapper is updated inplace, a copy is still made internally. (GH1960, GH5247, GH5628, and related GH2325 [still not closed])
- Fixed bug in tools.plotting.andrews_curves so that lines are drawn grouped by color as expected.
- read_excel() now tries to convert integral floats (like 1.0) to int by default. (GH5394)
- Excel writers now have a default option merge_cells in to_excel() to merge cells in MultiIndex and Hierarchical Rows. Note: using this option it is no longer possible to round trip Excel files with merged MultiIndex and Hierarchical Rows. Set merge_cells to False to restore the previous behaviour. (GH5254)
- The FRED DataReader now accepts multiple series (GH3413)
- StataWriter adjusts variable names to Stata’s limitations (GH5709)
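Several of these improvements can be tried in a short sketch, written against a recent pandas release. In current versions json_normalize is exposed as pd.json_normalize (in 0.13 it lived under pandas.io.json), and to_dict takes the orientation as its first argument rather than through the outtype keyword mentioned above; the data is illustrative only.

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0], name="x")

# Series.to_frame: a single-column DataFrame named after the Series
frame = s.to_frame()

# interpolate fills the missing value linearly by default
filled = s.interpolate()

# to_dict with the "records" orientation: one dict per row
records = pd.DataFrame({"a": [1, 2], "b": ["u", "v"]}).to_dict("records")

# json_normalize flattens nested dicts into dotted column names
flat = pd.json_normalize([{"id": 1, "info": {"city": "Oslo"}},
                          {"id": 2, "info": {"city": "Lima"}}])
```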
API Changes
- DataFrame.reindex() and forward/backward filling now raises ValueError if either index is not monotonic (GH4483, GH4484).
- pandas now is Python 2/3 compatible without the need for 2to3 thanks to @jtratner. As a result, pandas now uses iterators more extensively. This also led to the introduction of substantive parts of Benjamin Peterson’s six library into compat. (GH4384, GH4375, GH4372)
- pandas.util.compat and pandas.util.py3compat have been merged into pandas.compat. pandas.compat now includes many functions allowing 2/3 compatibility. It contains both list and iterator versions of range, filter, map and zip, plus other necessary elements for Python 3 compatibility. lmap, lzip, lrange and lfilter all produce lists instead of iterators, for compatibility with numpy, subscripting and pandas constructors. (GH4384, GH4375, GH4372)
- deprecated iterkv, which will be removed in a future release (was just an alias of iteritems used to get around 2to3’s changes). (GH4384, GH4375, GH4372)
- Series.get with negative indexers now returns the same as [] (GH4390)
- allow ix/loc for Series/DataFrame/Panel to set on any axis even when the single-key is not currently contained in the index for that axis (GH2578, GH5226, GH5632, GH5720, GH5744, GH5756)
- Default export for to_clipboard is now csv with a tab separator (sep='\t') for compat (GH3368)
- at now will enlarge the object inplace (and return the same) (GH2578)
- DataFrame.plot will scatter plot x versus y by passing kind='scatter' (GH2215)
- HDFStore
  - append_to_multiple automatically synchronizes writing rows to multiple tables and adds a dropna kwarg (GH4698)
  - handle a passed Series in table format (GH4330)
  - added an is_open property to indicate if the underlying file handle is open; a closed store will now report ‘CLOSED’ when viewing the store (rather than raising an error) (GH4409)
  - a close of a HDFStore now will close that instance of the HDFStore but will only close the actual file if the ref count (by PyTables) w.r.t. all of the open handles are 0. Essentially you have a local instance of HDFStore referenced by a variable. Once you close it, it will report closed. Other references (to the same file) will continue to operate until they themselves are closed. Performing an action on a closed file will raise ClosedFileError
  - removed the _quiet attribute, replaced by a DuplicateWarning if retrieving duplicate rows from a table (GH4367)
  - removed the warn argument from open. Instead a PossibleDataLossError exception will be raised if you try to use mode='w' with an OPEN file handle (GH4367)
  - allow a passed locations array or mask as a where condition (GH4467)
  - add the keyword dropna=True to append to change whether ALL nan rows are not written to the store (default is True, ALL nan rows are NOT written), also settable via the option io.hdf.dropna_table (GH4625)
  - the format keyword now replaces the table keyword; allowed values are fixed(f)|table(t); the Storer format has been renamed to Fixed
  - a column multi-index will be recreated properly (GH4710); raise on trying to use a multi-index with data_columns on the same axis
  - select_as_coordinates will now return an Int64Index of the resultant selection set
  - support timedelta64[ns] as a serialization type (GH3577)
  - store datetime.date objects as ordinals rather than timetuples to avoid timezone issues (GH2852), thanks @tavistmorph and @numpand
  - numexpr 2.2.2 fixes incompatibility in PyTables 2.4 (GH4908)
  - flush now accepts an fsync parameter, which defaults to False (GH5364)
  - unicode indices not supported on table formats (GH5386)
  - pass thru store creation arguments; can be used to support in-memory stores
- JSON
- Index and MultiIndex changes (GH4039):
  - Setting levels and labels directly on MultiIndex is now deprecated. Instead, you can use the set_levels() and set_labels() methods.
  - levels, labels and names properties no longer return lists, but instead return containers that do not allow setting of items (‘mostly immutable’)
  - levels, labels and names are validated upon setting and are either copied or shallow-copied.
  - inplace setting of levels or labels now correctly invalidates the cached properties. (GH5238).
  - __deepcopy__ now returns a shallow copy (currently: a view) of the data, allowing metadata changes.
  - MultiIndex.astype() now only allows np.object_-like dtypes and now returns a MultiIndex rather than an Index. (GH4039)
  - Added is_ method to Index that allows fast equality comparison of views (similar to np.may_share_memory but no false positives, and changes on levels and labels setting on MultiIndex). (GH4859, GH4909)
  - Aliased __iadd__ to __add__. (GH4996)
- Infer and downcast dtype if downcast='infer' is passed to fillna/ffill/bfill (GH4604)
- __nonzero__ for all NDFrame objects will now raise a ValueError; this reverts back to (GH1073, GH4633) behavior. Add .bool() method to NDFrame objects to facilitate evaluating of single-element boolean Series
- DataFrame.update() no longer raises a DataConflictError; it now will raise a ValueError instead (if necessary) (GH4732)
- Series.isin() and DataFrame.isin() now raise a TypeError when passed a string (GH4763). Pass a list of one element (containing the string) instead.
- Remove undocumented/unused kind keyword argument from read_excel, and ExcelFile. (GH4713, GH4712)
- The method argument of NDFrame.replace() is valid again, so that a list can be passed to to_replace (GH4743).
- provide automatic dtype conversions on _reduce operations (GH3371)
- exclude non-numerics if mixed types with datelike in _reduce operations (GH3371)
- default for tupleize_cols is now False for both to_csv and read_csv. Fair warning in 0.12 (GH3604)
- moved timedeltas support to pandas.tseries.timedeltas.py; add timedeltas string parsing, add top-level to_timedelta function
- NDFrame now is compatible with Python’s top-level abs() function (GH4821).
- raise a TypeError on invalid comparison ops on Series/DataFrame (e.g. integer/datetime) (GH4968)
- Added a new index type, Float64Index. This will be automatically created when passing floating values in index creation. This enables a pure label-based slicing paradigm that makes [], ix, loc for scalar indexing and slicing work exactly the same. Indexing on other index types is preserved (and positional fallback for [], ix), with the exception that floating point slicing on indexes on non-Float64Index will raise a TypeError, e.g. Series(range(5))[3.5:4.5] (GH263, GH5375)
- Make Categorical repr nicer (GH4368)
- Remove deprecated Factor (GH3650)
- Remove deprecated set_printoptions/reset_printoptions (GH3046)
- Remove deprecated _verbose_info (GH3215)
- Begin removing methods that don’t make sense on GroupBy objects (GH4887).
- Remove deprecated read_clipboard/to_clipboard/ExcelFile/ExcelWriter from pandas.io.parsers (GH3717)
- All non-Index NDFrames (Series, DataFrame, Panel, Panel4D, SparsePanel, etc.), now support the entire set of arithmetic operators and arithmetic flex methods (add, sub, mul, etc.). SparsePanel does not support pow or mod with non-scalars. (GH3765)
- Arithmetic func factories are now passed real names (suitable for using with super) (GH5240)
- Provide numpy compatibility with 1.7 for a calling convention like np.prod(pandas_object) as numpy call with additional keyword args (GH4435)
- Provide __dir__ method (and local context) for tab completion / remove ipython completers code (GH4501)
- Support non-unique axes in a Panel via indexing operations (GH4960)
- .truncate will raise a ValueError if invalid before and after dates are given (GH5242)
- Timestamp now supports now/today/utcnow class methods (GH5339)
- default for display.max_seq_len is now 100 rather than None. This activates truncated display (“...”) of long sequences in various places. (GH3391)
- All division with NDFrame-likes is now true division, regardless of the future import. You can use // and floordiv to do integer division.
In [3]: arr = np.array([1, 2, 3, 4])
In [4]: arr2 = np.array([5, 3, 2, 1])
In [5]: arr / arr2  # Python 2 without the future import: numpy integer division
Out[5]: array([0, 0, 1, 4])
In [6]: pd.Series(arr) / pd.Series(arr2)  # no future import required
Out[6]:
0    0.200000
1    0.666667
2    1.500000
3    4.000000
dtype: float64
- raise/warn SettingWithCopyError/Warning exception/warning when setting of a copy through chained assignment is detected, settable via option mode.chained_assignment
- test the list of NA values in the csv parser. add N/A, #NA as independent default na values (GH5521)
- The refactoring involving Series deriving from NDFrame breaks rpy2<=2.3.8. An issue has been opened against rpy2 and a workaround is detailed in GH5698. Thanks @JanSchulz.
- Series.argmin and Series.argmax are now aliased to Series.idxmin and Series.idxmax. These return the index of the min or max element respectively. Prior to 0.13.0 these would return the position of the min / max element (GH6214)
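The division and float-index changes above can be checked with a small sketch, assuming a recent pandas. Float64Index itself was later folded into Index with a float64 dtype, but label-based slicing on a float-valued index still behaves as described; to_timedelta is shown parsing a string. The values are illustrative.

```python
import numpy as np
import pandas as pd

# "/" is true division regardless of __future__ imports; "//" floors
ratio = pd.Series([1, 2, 3, 4]) / pd.Series([5, 3, 2, 1])
floored = pd.Series([1, 2, 3, 4]) // pd.Series([5, 3, 2, 1])

# Top-level to_timedelta parses timedelta strings
delta = pd.to_timedelta("1 days 02:00:00")

# On a float-valued index, slice bounds are labels, not positions,
# so a bound that is not in the index (2.0) is still allowed
fs = pd.Series([10, 20, 30], index=[1.5, 2.5, 3.5])
by_label = fs.loc[2.0:3.5]
```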
Internal Refactoring
In 0.13.0 there is a major refactor primarily to subclass Series from NDFrame, which is the base class currently for DataFrame and Panel, to unify methods and behaviors. Series formerly subclassed directly from ndarray. (GH4080, GH3862, GH816)
See Internal Refactoring
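The effect of this refactor can still be seen in a recent pandas: a Series is not an ndarray, it shares the NDFrame base class with DataFrame, and a NumPy function such as np.where hands back a plain ndarray. This sketch imports NDFrame from pandas.core.generic, an internal module whose location is an implementation detail and may change.

```python
import numpy as np
import pandas as pd
from pandas.core.generic import NDFrame  # internal module, may move

s = pd.Series([3, 1, 2])

# Series no longer subclasses ndarray; it shares NDFrame with DataFrame
is_ndarray = isinstance(s, np.ndarray)
is_ndframe = isinstance(s, NDFrame) and isinstance(s.to_frame(), NDFrame)

# np.where converts its inputs and returns a plain ndarray, not a Series
picked = np.where(s > 1, s, 0)

# Explicit conversion when an ndarray is genuinely required
raw = s.to_numpy()
```

Converting explicitly (to_numpy() in current pandas, .values in 0.13) avoids relying on the old subclass relationship.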
- Refactor of series.py/frame.py/panel.py to move common code to generic.py
  - added _setup_axes to create generic NDFrame structures
  - moved methods
    - from_axes, _wrap_array, axes, ix, loc, iloc, shape, empty, swapaxes, transpose, pop
    - __iter__, keys, __contains__, __len__, __neg__, __invert__
    - convert_objects, as_blocks, as_matrix, values
    - __getstate__, __setstate__ (compat remains in frame/panel)
    - __getattr__, __setattr__
    - _indexed_same, reindex_like, align, where, mask
    - fillna, replace (Series replace is now consistent with DataFrame)
    - filter (also added axis argument to selectively filter on a different axis)
    - reindex, reindex_axis, take
    - truncate (moved to become part of NDFrame)
    - isnull/notnull now available on NDFrame objects
  - These are API changes which make Panel more consistent with DataFrame
    - swapaxes on a Panel with the same axes specified now returns a copy
    - support attribute access for setting
    - filter supports same API as original DataFrame filter
    - fillna refactored to core/generic.py, while > 3ndim is NotImplemented
- Series now inherits from NDFrame rather than directly from ndarray. There are several minor changes that affect the API.
  - numpy functions that do not support the array interface will now return ndarrays rather than series, e.g. np.diff, np.ones_like, np.where
  - Series(0.5) would previously return the scalar 0.5, this is no longer supported
  - TimeSeries is now an alias for Series. The property is_time_series can be used to distinguish (if desired)
- Refactor of Sparse objects to use BlockManager
  - Created a new block type in internals, SparseBlock, which can hold multi-dtypes and is non-consolidatable. SparseSeries and SparseDataFrame now inherit more methods from their hierarchy (Series/DataFrame), and no longer inherit from SparseArray (which instead is the object of the SparseBlock)
  - Sparse suite now supports integration with non-sparse data. Non-float sparse data is supportable (partially implemented)
  - Operations on sparse structures within DataFrames should preserve sparseness, merging type operations will convert to dense (and back to sparse), so might be somewhat inefficient
  - enable setitem on SparseSeries for boolean/integer/slices
  - SparsePanels implementation is unchanged (e.g. not using BlockManager, needs work)
- added ftypes method to Series/DataFrame, similar to dtypes, but indicates if the underlying is sparse/dense (as well as the dtype)
- All NDFrame objects now have a _prop_attributes, which can be used to indicate various values to propagate to a new object from an existing (e.g. name in Series will follow more automatically now)
- Internal type checking is now done via a suite of generated classes, allowing isinstance(value, klass) without having to directly import the klass, courtesy of @jtratner
- Bug in Series update where the parent frame is not updating its cache based on changes (GH4080, GH5216) or types (GH3217), fillna (GH3386)
- Indexing with dtype conversions fixed (GH4463, GH4204)
- Refactor Series.reindex to core/generic.py (GH4604, GH4618), allow method= in reindexing on a Series to work
- Series.copy no longer accepts the order parameter and is now consistent with NDFrame copy
- Refactor rename methods to core/generic.py; fixes Series.rename for (GH4605), and adds rename with the same signature for Panel
- Series (for index) / Panel (for items) now allow attribute access to their elements (GH1903)
- Refactor clip methods to core/generic.py (GH4798)
- Refactor of _get_numeric_data/_get_bool_data to core/generic.py, allowing Series/Panel functionality
- Refactor of Series arithmetic with time-like objects (datetime/timedelta/time etc.) into a separate, cleaned up wrapper class. (GH4613)
- Complex compat for Series with ndarray. (GH4819)
- Removed unnecessary rwproperty from codebase in favor of builtin property. (GH4843)
- Refactor object level numeric methods (mean/sum/min/max...) from object level modules to core/generic.py (GH4435).
- Refactor cum objects to core/generic.py (GH4435), note that these have a more numpy-like function signature.
- read_html() now uses TextParser to parse HTML data from bs4/lxml (GH4770).
- Removed the keep_internal keyword parameter in pandas/core/groupby.py because it wasn’t being used (GH5102).
- Base DateOffsets are no longer all instantiated on importing pandas, instead they are generated and cached on the fly. The internal representation and handling of DateOffsets has also been clarified. (GH5189, related GH5004)
- MultiIndex constructor now validates that passed levels and labels are compatible. (GH5213, GH5214)
- Unify dropna for Series/DataFrame signature (GH5250), tests from GH5234, courtesy of @rockg
- Rewrite assert_almost_equal() in cython for performance (GH4398)
- Added an internal _update_inplace method to facilitate updating NDFrame wrappers on inplace ops (only for the convenience of the caller, doesn’t actually prevent copies). (GH5247)
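Because these methods now live on NDFrame, the same calls work on a Series and on a DataFrame alike. A small sketch against a recent pandas, with illustrative data:

```python
import pandas as pd

s = pd.Series([1.0, None, 5.0], index=["a", "b", "c"])
df = s.to_frame("x")

# isnull is shared by Series and DataFrame
missing_s = s.isnull()
missing_df = df.isnull()

# clip bounds values; the missing value stays missing
clipped = s.clip(lower=2, upper=4)

# rename with a callable relabels the index
renamed = s.rename(str.upper)

# filter on the row axis by label
kept = df.filter(items=["a", "c"], axis=0)

# dropna removes the missing entry
dropped = s.dropna()
```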
Bug Fixes
- HDFStore
  - raising an invalid TypeError rather than ValueError when appending with a different block ordering (GH4096)
  - read_hdf was not respecting a passed mode (GH4504)
  - appending a 0-len table will work correctly (GH4273)
  - to_hdf was raising when passing both arguments append and table (GH4584)
  - reading from a store with duplicate columns across dtypes would raise (GH4767)
  - Fixed a bug where ValueError wasn’t correctly raised when column names weren’t strings (GH4956)
  - A zero length series written in Fixed format not deserializing properly. (GH4708)
  - Fixed decoding perf issue on py3 (GH5441)
  - Validate levels in a multi-index before storing (GH5527)
  - Correctly handle data_columns with a Panel (GH5717)
- Fixed bug in tslib.tz_convert(vals, tz1, tz2): it could raise IndexError exception while trying to access trans[pos + 1] (GH4496)
- The by argument now works correctly with the layout argument (GH4102, GH4014) in *.hist plotting methods
- Fixed bug in PeriodIndex.map where using str would return the str representation of the index (GH4136)
- Fixed test failure test_time_series_plot_color_with_empty_kwargs when using custom matplotlib default colors (GH4345)
- Fix running of stata IO tests. Now uses temporary files to write (GH4353)
- Fixed an issue where DataFrame.sum was slower than DataFrame.mean for integer valued frames (GH4365)
- read_html tests now work with Python 2.6 (GH4351)
- Fixed bug where network testing was throwing NameError because a local variable was undefined (GH4381)
- In to_json, raise if a passed orient would cause loss of data because of a duplicate index (GH4359)
- In to_json, fix date handling so milliseconds are the default timestamp as the docstring says (GH4362).
- as_index is no longer ignored when doing groupby apply (GH4648, GH3417)
- JSON NaT handling fixed, NaTs are now serialized to null (GH4498)
- Fixed JSON handling of escapable characters in JSON object keys (GH4593)
- Fixed passing keep_default_na=False when na_values=None (GH4318)
- Fixed bug with values raising an error on a DataFrame with duplicate columns and mixed dtypes, surfaced in (GH4377)
- Fixed bug with duplicate columns and type conversion in read_json when orient='split' (GH4377)
- Fixed JSON bug where locales with decimal separators other than ‘.’ threw exceptions when encoding / decoding certain values. (GH4918)
- Fix .iat indexing with a PeriodIndex (GH4390)
- Fixed an issue where PeriodIndex joining with self was returning a new instance rather than the same instance (GH4379); also adds a test for this for the other index types
- Fixed a bug with all the dtypes being converted to object when using the CSV cparser with the usecols parameter (GH3192)
- Fix an issue in merging blocks where the resulting DataFrame had partially set _ref_locs (GH4403)
- Fixed an issue where hist subplots were being overwritten when they were called using the top level matplotlib API (GH4408)
- Fixed a bug where calling Series.astype(str) would truncate the string (GH4405, GH4437)
- Fixed a py3 compat issue where bytes were being repr’d as tuples (GH4455)
- Fixed Panel attribute naming conflict if item is named ‘a’ (GH3440)
- Fixed an issue where duplicate indexes were raising when plotting (GH4486)
- Fixed an issue where cumsum and cumprod didn’t work with bool dtypes (GH4170, GH4440)
- Fixed Panel slicing issue in xs that was returning an incorrect dimmed object (GH4016)
- Fix resampling bug where custom reduce function not used if only one group (GH3849, GH4494)
- Fixed Panel assignment with a transposed frame (GH3830)
- Raise on set indexing with a Panel and a Panel as a value which needs alignment (GH3777)
- frozenset objects now raise in the Series constructor (GH4482, GH4480)
- Fixed issue with sorting a duplicate multi-index that has multiple dtypes (GH4516)
- Fixed bug in DataFrame.set_values which was causing name attributes to be lost when expanding the index. (GH3742, GH4039)
- Fixed issue where individual names, levels and labels could be set on MultiIndex without validation (GH3714, GH4039)
- Fixed (GH3334) in pivot_table. Margins did not compute if values is the index.
- Fix bug in having a rhs of np.timedelta64 or np.offsets.DateOffset when operating with datetimes (GH4532)
- Fix arithmetic with series/datetimeindex and np.timedelta64 not working the same (GH4134) and buggy timedelta in numpy 1.6 (GH4135)
- Fix bug in pd.read_clipboard on windows with PY3 (GH4561); not decoding properly
- tslib.get_period_field() and tslib.get_period_field_arr() now raise if code argument out of range (GH4519, GH4520)
- Fix boolean indexing on an empty series loses index names (GH4235), infer_dtype works with empty arrays.
- Fix reindexing with multiple axes; if an axes match was not replacing the current axes, leading to a possible lazy frequency inference issue (GH3317)
- Fixed issue where DataFrame.apply was reraising exceptions incorrectly (causing the original stack trace to be truncated).
- Fix selection with ix/loc and non_unique selectors (GH4619)
- Fix assignment with iloc/loc involving a dtype change in an existing column (GH4312, GH5702); the internal setitem_with_indexer in core/indexing now uses Block.setitem
- Fixed bug where thousands operator was not handled correctly for floating point numbers in csv_import (GH4322)
- Fix an issue with CacheableOffset not properly being used by many DateOffset; this prevented the DateOffset from being cached (GH4609)
- Fix boolean comparison with a DataFrame on the lhs, and a list/tuple on the rhs (GH4576)
- Fix error/dtype conversion with setitem of None on Series/DataFrame (GH4667)
- Fix decoding based on a passed in non-default encoding in pd.read_stata (GH4626)
- Fix DataFrame.from_records with a plain-vanilla ndarray. (GH4727)
- Fix some inconsistencies with Index.rename and MultiIndex.rename, etc. (GH4718, GH4628)
- Bug in using iloc/loc with a cross-sectional and duplicate indices (GH4726)
- Bug with using QUOTE_NONE with to_csv causing Exception. (GH4328)
- Bug with Series indexing not raising an error when the right-hand-side has an incorrect length (GH2702)
- Bug in multi-indexing with a partial string selection as one part of a MultiIndex (GH4758)
- Bug with reindexing on the index with a non-unique index will now raise ValueError (GH4746)
- Bug in setting with loc/ix a single indexer with a multi-index axis and a numpy array, related to (GH3777)
- Bug in concatenation with duplicate columns across dtypes not merging with axis=0 (GH4771, GH4975)
- Bug in iloc with a slice index failing (GH4771)
- Incorrect error message with no colspecs or width in read_fwf. (GH4774)
- Fix bugs in indexing in a Series with a duplicate index (GH4548, GH4550)
- Fixed bug with reading compressed files with read_fwf in Python 3. (GH3963)
- Fixed an issue with a duplicate index and assignment with a dtype change (GH4686)
- Fixed bug with reading compressed files in as bytes rather than str in Python 3. Simplifies bytes-producing file-handling in Python 3 (GH3963, GH4785).
- Fixed an issue related to ticklocs/ticklabels with log scale bar plots across different versions of matplotlib (GH4789)
- Suppressed DeprecationWarning associated with internal calls issued by repr() (GH4391)
- Fixed an issue with a duplicate index and duplicate selector with .loc (GH4825)
- Fixed an issue with DataFrame.sort_index where, when sorting by a single column and passing a list for ascending, the argument for ascending was being interpreted as True (GH4839, GH4846)
- Fixed Panel.tshift not working. Added freq support to Panel.shift (GH4853)
- Fix an issue in TextFileReader w/ Python engine (i.e. PythonParser) with thousands != “,” (GH4596)
- Bug in getitem with a duplicate index when using where (GH4879)
- Fix Type inference code coerces float column into datetime (GH4601)
- Fixed _ensure_numeric not checking for complex numbers (GH4902)
- Fixed a bug in Series.hist where two figures were being created when the by argument was passed (GH4112, GH4113).
- Fixed a bug in convert_objects for > 2 ndims (GH4937)
- Fixed a bug in DataFrame/Panel cache insertion and subsequent indexing (GH4939, GH5424)
- Fixed string methods for FrozenNDArray and FrozenList (GH4929)
- Fixed a bug with setting invalid or out-of-range values in indexing enlargement scenarios (GH4940)
- Tests for fillna on empty Series (GH4346), thanks @immerrr
- Fixed copy() to shallow copy axes/indices as well and thereby keep separate metadata. (GH4202, GH4830)
- Fixed skiprows option in Python parser for read_csv (GH4382)
- Fixed bug preventing cut from working with np.inf levels without explicitly passing labels (GH3415)
- Fixed wrong check for overlapping in DatetimeIndex.union (GH4564)
- Fixed conflict between thousands separator and date parser in csv_parser (GH4678)
- Fix appending when dtypes are not the same (error showing mixing float/np.datetime64) (GH4993)
- Fix repr for DateOffset. No longer show duplicate entries in kwds. Removed unused offset fields. (GH4638)
- Fixed wrong index name during read_csv if using usecols. Applies to c parser only. (GH4201)
- Timestamp objects can now appear in the left hand side of a comparison operation with a Series or DataFrame object (GH4982).
- Fix a bug when indexing with np.nan via iloc/loc (GH5016)
- Fixed a bug where low memory c parser could create different types in different chunks of the same file. Now coerces to numerical type or raises warning. (GH3866)
- Fix a bug where reshaping a Series to its own shape raised TypeError (GH4554) and other reshaping issues.
- Bug in setting with ix/loc and a mixed int/string index (GH4544)
- Make sure series-series boolean comparisons are label based (GH4947)
- Bug in multi-level indexing with a Timestamp partial indexer (GH4294)
- Tests/fix for multi-index construction of an all-nan frame (GH4078)
- Fixed a bug where read_html() wasn’t correctly inferring values of tables with commas (GH5029)
- Fixed a bug where read_html() wasn’t providing a stable ordering of returned tables (GH4770, GH5029).
- Fixed a bug where read_html() was incorrectly parsing when passed index_col=0 (GH5066).
- Fixed a bug where read_html() was incorrectly inferring the type of headers (GH5048).
- Fixed a bug where DatetimeIndex joins with PeriodIndex caused a stack overflow (GH3899).
- Fixed a bug where groupby objects didn’t allow plots (GH5102).
- Fixed a bug where groupby objects weren’t tab-completing column names (GH5102).
- Fixed a bug where groupby.plot() and friends were duplicating figures multiple times (GH5102).
- Provide automatic conversion of object dtypes on fillna, related (GH5103)
- Fixed a bug where default options were being overwritten in the option parser cleaning (GH5121).
- Treat a list/ndarray identically for iloc indexing with list-like (GH5006)
- Fix MultiIndex.get_level_values() with missing values (GH5074)
- Fix bound checking for Timestamp() with datetime64 input (GH4065)
- Fix a bug where TestReadHtml wasn’t calling the correct read_html() function (GH5150).
- Fix a bug with NDFrame.replace() which made replacement appear as though it was (incorrectly) using regular expressions (GH5143).
- Better error message for to_datetime (GH4928)
- Made sure different locales are tested on travis-ci (GH4918). Also adds a couple of utilities for getting locales and setting locales with a context manager.
- Fixed segfault on isnull(MultiIndex) (now raises an error instead) (GH5123, GH5125)
- Allow duplicate indices when performing operations that align (GH5185, GH5639)
- Compound dtypes in a constructor raise NotImplementedError (GH5191)
- Bug in comparing duplicate frames (GH4421) related
- Bug in describe on duplicate frames
- Bug in to_datetime with a format and coerce=True not raising (GH5195)
- Bug in loc setting with multiple indexers and a rhs of a Series that needs broadcasting (GH5206)
- Fixed bug where inplace setting of levels or labels on MultiIndex would not clear cached values property and therefore return wrong values. (GH5215)
- Fixed bug where filtering a grouped DataFrame or Series did not maintain the original ordering (GH4621).
- Fixed Period with a business date freq to always roll-forward if on a non-business date. (GH5203)
- Fixed bug in Excel writers where frames with duplicate column names weren’t written correctly. (GH5235)
- Fixed issue with drop and a non-unique index on Series (GH5248)
- Fixed seg fault in C parser caused by passing more names than columns in the file. (GH5156)
- Fix Series.isin with date/time-like dtypes (GH5021)
- C and Python Parser can now handle the more common multi-index column format which doesn’t have a row for index names (GH4702)
- Bug when trying to use an out-of-bounds date as an object dtype (GH5312)
- Bug when trying to display an embedded PandasObject (GH5324)
- Allow operations on Timestamps to return a datetime if the result is out-of-bounds, related (GH5312)
- Fix return value/type signature of initObjToJSON() to be compatible with numpy’s import_array() (GH5334, GH5326)
- Bug when renaming then set_index on a DataFrame (GH5344)
- Test suite no longer leaves around temporary files when testing graphics. (GH5347) (thanks for catching this @yarikoptic!)
- Fixed html tests on win32. (GH4580)
- Make sure that head/tail are iloc based (GH5370)
- Fixed bug for PeriodIndex string representation if there are 1 or 2 elements. (GH5372)
- The GroupBy methods transform and filter can be used on Series and DataFrames that have repeated (non-unique) indices. (GH4620)
- Fix empty series not printing name in repr (GH4651)
- Make tests create temp files in temp directory by default. (GH5419)
- pd.to_timedelta of a scalar returns a scalar (GH5410)
- pd.to_timedelta accepts NaN and NaT, returning NaT instead of raising (GH5437)
- Performance improvements in isnull on larger size pandas objects
- Fixed various setitem with 1d ndarray that does not have a matching length to the indexer (GH5508)
- Bug in getitem with a multi-index and iloc (GH5528)
- Bug in delitem on a Series (GH5542)
- Bug fix in apply when using custom function and objects are not mutated (GH5545)
- Bug in selecting from a non-unique index with loc (GH5553)
- Bug in groupby returning non-consistent types when user function returns a None (GH5592)
- Work around regression in numpy 1.7.0 which erroneously raises IndexError from ndarray.item (GH5666)
- Bug in repeated indexing of object with resultant non-unique index (GH5678)
- Bug in fillna with Series and a passed series/dict (GH5703)
- Bug in groupby transform with a datetime-like grouper (GH5712)
- Bug in multi-index selection in PY3 when using certain keys (GH5725)
- Row-wise concat of differing dtypes failing in certain cases (GH5754)
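Several of the 0.13.0 fixes above can be exercised in a short script. This is a minimal sketch assuming a recent pandas (2.x) with NumPy; the data and variable names are hypothetical, and the behaviour shown is that of current pandas rather than a reproduction of the 0.13.0 release.

```python
import numpy as np
import pandas as pd

# Illustrative data; variable names are hypothetical.
s = pd.Series(pd.date_range("2013-01-01", periods=3))

# A Timestamp may sit on the left-hand side of a comparison with a Series.
after_jan2 = pd.Timestamp("2013-01-02") < s

# pd.to_timedelta on a scalar returns a scalar, and NaN becomes NaT.
one_day = pd.to_timedelta("1 day")
missing = pd.to_timedelta(np.nan)

# cut accepts an open-ended np.inf edge without explicit labels.
binned = pd.cut([1, 5, 100], bins=[0, 2, 10, np.inf])

# Series.isin works with datetime values.
in_set = s.isin([pd.Timestamp("2013-01-02")])

# drop on a non-unique index removes every row carrying that label.
dup = pd.Series([1, 2, 3], index=["a", "a", "b"])
dropped = dup.drop("a")

# Reindexing an axis that has duplicate labels raises ValueError.
try:
    dup.reindex(["a", "b"])
    reindex_raised = False
except ValueError:
    reindex_raised = True

# head/tail are positional (iloc based), whatever the index labels are.
first_two = pd.Series([10, 20, 30], index=[5, 3, 1]).head(2)
```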
pandas 0.12.0¶
Release date: 2013-07-24
New Features¶
- pd.read_html() can now parse HTML strings, files or urls and returns a list of DataFrames, courtesy of @cpcloud. (GH3477, GH3605, GH3606)
- Support for reading Amazon S3 files. (GH3504)
- Added module for reading and writing JSON strings/files: pandas.io.json includes a to_json DataFrame/Series method and a read_json top-level reader (various issues: GH1226, GH3804, GH3876, GH3867, GH1305)
- Added module for reading and writing Stata files: pandas.io.stata (GH1512) includes a to_stata DataFrame method and a read_stata top-level reader
- Added support for writing multi-index columns in to_csv and reading them in read_csv. The header option in read_csv now accepts a list of the rows from which to read the index. Added the option tupleize_cols to provide compatibility for the pre 0.12 behavior of writing and reading multi-index columns via a list of tuples. The default in 0.12 is to write lists of tuples and not interpret list of tuples as a multi-index column. Note: The default value will change in 0.13 so that multi-index columns are written and read in the new format by default. (GH3571, GH1651, GH3141)
- Add iterator to Series.str (GH3638)
- pd.set_option() now allows N option, value pairs (GH3667).
- Added keyword parameters for different types of scatter_matrix subplots
- A filter method on grouped Series or DataFrames returns a subset of the original (GH3680, GH919)
- Access to historical Google Finance data in pandas.io.data (GH3814)
- DataFrame plotting methods can sample column colors from a Matplotlib colormap via the colormap keyword. (GH3860)
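The groupby filter method and the multi-pair set_option call described above can be sketched as follows, assuming a recent pandas; the frame, its column names and the option values are hypothetical.

```python
import pandas as pd

# Hypothetical data: three groups keyed by "key".
df = pd.DataFrame({"key": ["a", "a", "b", "b", "c"], "val": [1, 2, 10, 20, 3]})

# filter returns the rows of every group for which the predicate is True,
# in their original order; only group "b" has a sum above 5 here.
kept = df.groupby("key").filter(lambda g: g["val"].sum() > 5)

# set_option accepts several (option, value) pairs in a single call.
pd.set_option("display.max_rows", 20, "display.max_columns", 10)
rows = pd.get_option("display.max_rows")
cols = pd.get_option("display.max_columns")

# Restore the defaults so the example leaves no global state behind.
pd.reset_option("display.max_rows")
pd.reset_option("display.max_columns")
```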
Improvements to existing features¶
- Fixed various issues with internal pprinting code; the repr() for various objects including Timestamp and Index now produces valid Python code strings that can be used to recreate the object (GH3038, GH3379, GH3251, GH3460)
- convert_objects now accepts a copy parameter (defaults to True)
- HDFStore
  - will retain index attributes (freq, tz, name) on recreation (GH3499, GH4098)
  - will warn with an AttributeConflictWarning if you are attempting to append an index with a different frequency than the existing, or attempting to append an index with a different name than the existing
  - support datelike columns with a timezone as data_columns (GH2852)
  - table writing performance improvements.
  - support python3 (via PyTables 3.0.0) (GH3750)
- Add modulo operator to Series, DataFrame
- Add date method to DatetimeIndex
- Add dropna argument to pivot_table (GH3820)
- Simplified the API and added a describe method to Categorical
- melt now accepts the optional parameters var_name and value_name to specify custom column names of the returned DataFrame (GH3649), thanks @hoechenberger. If var_name is not specified and dataframe.columns.name is not None, then this will be used as the var_name (GH4144). Also support for MultiIndex columns.
- clipboard functions use pyperclip (no dependencies on Windows, alternative dependencies offered for Linux) (GH3837).
- Plotting functions now raise a TypeError before trying to plot anything if the associated objects have a dtype of object (GH1818, GH3572, GH3911, GH3912), but they will try to convert object arrays to numeric arrays if possible so that you can still plot, for example, an object array with floats. This happens before any drawing takes place, which eliminates any spurious plots from showing up.
- Added FAQ section on repr display options, to help users customize their setup.
- where operations that result in block splitting are much faster (GH3733)
- Series and DataFrame hist methods now take a figsize argument (GH3834)
- DatetimeIndexes no longer try to convert mixed-integer indexes during join operations (GH3877)
- Add unit keyword to Timestamp and to_datetime to enable passing of integers or floats that are in an epoch unit of D, s, ms, us, ns (e.g. unix timestamps or epochs, with fractional seconds allowed) (GH3540), thanks @mtkini (GH3969)
- DataFrame corr method (spearman) is now cythonized.
- Improved network test decorator to catch IOError (and therefore URLError as well). Added with_connectivity_check decorator to allow explicitly checking a website as a proxy for seeing if there is network connectivity. Plus, new optional_args decorator factory for decorators. (GH3910, GH3914)
- read_csv will now throw a more informative error message when a file contains no columns, e.g., all newline characters
- Added layout keyword to DataFrame.hist() for more customizable layout (GH4050)
- Timestamp.min and Timestamp.max now represent valid Timestamp instances instead of the default datetime.min and datetime.max (respectively), thanks @SleepingPills
- read_html now raises when no tables are found and BeautifulSoup==4.2.0 is detected (GH4214)
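Several of these improvements (melt's var_name/value_name, the unit keyword of to_datetime, the Series modulo operator and Timestamp.min/max) remain available in current pandas. The sketch below assumes a recent pandas release; the data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# melt with custom names for the variable and value columns.
long = pd.melt(wide, id_vars=["id"], var_name="field", value_name="reading")

# Without var_name, a non-None columns.name is used instead of "variable".
named = wide.rename_axis(columns="measure")
long_named = pd.melt(named, id_vars=["id"])

# unit= interprets numbers as offsets from the Unix epoch;
# fractional seconds are allowed.
from_seconds = pd.to_datetime(1_000_000_000, unit="s")
fractional = pd.to_datetime([0.5, 1.5], unit="s")

# Modulo operator on a Series.
remainders = pd.Series([5, 7, 9]) % 4

# Timestamp.min and Timestamp.max are genuine Timestamp objects.
bounds = (pd.Timestamp.min, pd.Timestamp.max)
```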
API Changes¶
- HDFStore
  - When removing an object, remove(key) raises KeyError if the key is not a valid store object.
  - raise a TypeError on passing where or columns to select with a Storer; these are invalid parameters at this time (GH4189)
  - can now specify an encoding option to append/put to enable alternate encodings (GH3750)
  - enable support for iterator/chunksize with read_hdf
- The repr() for (Multi)Index now obeys display.max_seq_items rather than numpy threshold print options. (GH3426, GH3466)
- Added mangle_dupe_cols option to read_table/csv, allowing users to control legacy behaviour re dupe cols (A, A.1, A.2 vs. A, A) (GH3468). Note: The default value will change in 0.12 to the “no mangle” behaviour. If your code relies on this behaviour, explicitly specify mangle_dupe_cols=True in your calls.
- Do not allow astypes on datetime64[ns] except to object, and timedelta64[ns] to object/int (GH3425)
- The behavior of datetime64 dtypes has changed with respect to certain so-called reduction operations (GH3726). The following operations now raise a TypeError when performed on a Series and return an empty Series when performed on a DataFrame, similar to performing these operations on, for example, a DataFrame of slice objects:
  - sum, prod, mean, std, var, skew, kurt, corr, and cov
- Do not allow datetimelike/timedeltalike creation except with valid types (e.g. cannot pass datetime64[ms]) (GH3423)
- Add squeeze keyword to groupby to allow reduction from DataFrame -> Series if groups are unique. Regression from 0.10.1, partial revert on (GH2893) with (GH3596)
- Raise on iloc when boolean indexing with a label based indexer mask, e.g. a boolean Series, even with integer labels. Since iloc is purely positional based, the labels on the Series are not alignable (GH3631)
- The raise_on_error option to plotting methods is obviated by GH3572, so it is removed. Plots now always raise when data cannot be plotted or the object being plotted has a dtype of object.
- DataFrame.interpolate() is now deprecated. Please use DataFrame.fillna() and DataFrame.replace() instead (GH3582, GH3675, GH3676).
- the method and axis arguments of DataFrame.replace() are deprecated
- DataFrame.replace’s infer_types parameter is removed and now performs conversion by default. (GH3907)
- Deprecated display.height; display.width is now only a formatting option and does not control triggering of summary, similar to < 0.11.0.
- Add the keyword allow_duplicates to DataFrame.insert to allow a duplicate column to be inserted if True, default is False (same as prior to 0.12) (GH3679)
- io API changes
  - added pandas.io.api for i/o imports
  - moved Excel support to pandas.io.excel
  - added top-level pd.read_sql and to_sql DataFrame methods
  - moved clipboard support to pandas.io.clipboard
  - replace top-level and instance methods save and load with top-level read_pickle and to_pickle instance method; save and load will give a deprecation warning.
- set FutureWarning to require data_source, and to replace year/month with expiry date in pandas.io options. This is in preparation to add options data from Google (GH3822)
- Implement __nonzero__ for NDFrame objects (GH3691, GH3696)
- as_matrix with mixed signed and unsigned dtypes will result in 2 x the lcd of the unsigned as an int, maxing with int64, to avoid precision issues (GH3733)
- na_values in a list provided to read_csv/read_excel will match string and numeric versions, e.g. na_values=['99'] will match 99 whether the column ends up being int, float, or string (GH3611)
- read_html now defaults to flavor=None when reading, and falls back on bs4 + html5lib when lxml fails to parse. A list of parsers to try until success is also valid
- more consistency in the to_datetime return types (given string/array of string inputs) (GH3888)
- The internal pandas class hierarchy has changed (slightly). The previous PandasObject is now called PandasContainer, and a new PandasObject has become the base class for PandasContainer as well as Index, Categorical, GroupBy, SparseList, and SparseArray (+ their base classes). Currently, PandasObject provides string methods (from StringMixin). (GH4090, GH4092)
- New StringMixin that, given a __unicode__ method, gets Python 2 and Python 3 compatible string methods (__str__, __bytes__, and __repr__). Plus string safety throughout. Now employed in many places throughout the pandas library. (GH4090, GH4092)
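Three of the API changes above (allow_duplicates in DataFrame.insert, string na_values matching numeric cells, and iloc rejecting a boolean Series mask) can be checked with the sketch below, assuming a recent pandas. In current pandas the exception raised for a boolean Series passed to iloc depends on the index type (ValueError or NotImplementedError), so both are caught; the data and variable names are hypothetical.

```python
import io

import pandas as pd

# insert refuses a duplicate column name unless allow_duplicates=True.
df = pd.DataFrame({"a": [1, 2]})
try:
    df.insert(1, "a", [3, 4])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
df.insert(1, "a", [3, 4], allow_duplicates=True)

# na_values given as strings also match numeric cells.
csv = "num,text\n99,x\n1,99\n"
parsed = pd.read_csv(io.StringIO(csv), na_values=["99"])

# iloc is purely positional: a boolean Series mask is rejected,
# while the same mask as a plain NumPy array works.
s = pd.Series([1, 2, 3], index=[10, 20, 30])
try:
    s.iloc[s > 1]
    series_mask_rejected = False
except (ValueError, NotImplementedError):
    series_mask_rejected = True
positional = s.iloc[(s > 1).to_numpy()]
```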
Experimental Features¶
- Added experimental CustomBusinessDay class to support DateOffsets with custom holiday calendars and custom weekmasks. (GH2301)
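In current pandas the class is importable as pandas.tseries.offsets.CustomBusinessDay and takes weekmask and holidays arguments. The calendar below, a Monday-to-Thursday week with 4 July 2013 as a holiday, is a hypothetical example, not a built-in calendar.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Hypothetical calendar: four-day working week plus one holiday.
bday = CustomBusinessDay(weekmask="Mon Tue Wed Thu", holidays=["2013-07-04"])

# Wednesday 2013-07-03 plus one custom business day skips the Thursday
# holiday, the non-working Friday and the weekend.
next_day = pd.Timestamp("2013-07-03") + bday

# The offset can also drive date_range.
working_days = pd.date_range("2013-07-01", "2013-07-10", freq=bday)
```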
Bug Fixes¶
- Fixed an esoteric excel reading bug; xlrd >= 0.9.0 is now required for excel support. Should provide python3 support (for reading) which has been lacking. (GH3164)
- Disallow Series constructor called with MultiIndex which caused segfault (GH4187)
- Allow unioning of date ranges sharing a timezone (GH3491)
- Fix to_csv issue when having a large number of rows and NaT in some columns (GH3437)
- .loc was not raising when passed an integer list (GH3449)
- Unordered time series selection was misbehaving when using label slicing (GH3448)
- Fix sorting in a frame with a list of columns which contains datetime64[ns] dtypes (GH3461)
- DataFrames fetched via FRED now handle ‘.’ as a NaN. (GH3469)
- Fix regression in a DataFrame apply with axis=1, objects were not being converted back to base dtypes correctly (GH3480)
- Fix issue when storing uint dtypes in an HDFStore. (GH3493)
- Non-unique index support clarified (GH3468)
- Addressed handling of dupe columns in df.to_csv new and old (GH3454, GH3457)
- Fix assigning a new index to a duplicate index in a DataFrame would fail (GH3468)
- Fix construction of a DataFrame with a duplicate index
- ref_locs support to allow duplicative indices across dtypes, allows iget support to always find the index (even across dtypes) (GH2194)
- applymap on a DataFrame with a non-unique index now works (removed warning) (GH2786), and fix (GH3230)
- Fix to_csv to handle non-unique columns (GH3495)
- Duplicate indexes with getitem will return items in the correct order (GH3455, GH3457) and handle missing elements like unique indices (GH3561)
- Duplicate indexes with an empty DataFrame.from_records will return a correct frame (GH3562)
- Concat producing non-unique columns when duplicates are across dtypes is fixed (GH3602)
- Non-unique indexing with a slice via loc and friends fixed (GH3659)
- Allow insert/delete to non-unique columns (GH3679)
- Extend reindex to correctly deal with non-unique indices (GH3679)
- DataFrame.itertuples() now works with frames with duplicate column names (GH3873)
- Bug in non-unique indexing via iloc (GH4017); added takeable argument to reindex for location-based taking
- Allow non-unique indexing in series via .ix/.loc and __getitem__ (GH4246)
- Fixed non-unique indexing memory allocation issue with .ix/.loc (GH4280)
- Fixed bug in groupby with empty series referencing a variable before assignment. (GH3510)
- Allow index name to be used in groupby for non MultiIndex (GH4014)
- Fixed bug in mixed-frame assignment with aligned series (GH3492)
- Fixed bug where selecting month/quarter/year from a series would not select the time element on the last day (GH3546)
- Fixed a couple of MultiIndex rendering bugs in df.to_html() (GH3547, GH3553)
- Properly convert np.datetime64 objects in a Series (GH3416)
- Raise a TypeError on invalid datetime/timedelta operations, e.g. add datetimes, multiply timedelta x datetime
- Fix .diff on datelike and timedelta operations (GH3100)
- combine_first not returning the same dtype in cases where it can (GH3552)
- Fixed bug with Panel.transpose argument aliases (GH3556)
- Fixed platform bug in PeriodIndex.take (GH3579)
- Fixed bug in incorrect conversion of datetime64[ns] in combine_first (GH3593)
- Fixed bug in reset_index with NaN in a multi-index (GH3586)
- fillna methods now raise a TypeError when the value parameter is a list or tuple.
- Fixed bug where a time-series was being selected in preference to an actual column name in a frame (GH3594)
- Make secondary_y work properly for bar plots (GH3598)
- Fix modulo and integer division on Series, DataFrames to act similarly to float dtypes to return np.nan or np.inf as appropriate (GH3590)
- Fix incorrect dtype on groupby with as_index=False (GH3610)
- Fix read_csv/read_excel to correctly encode identical na_values, e.g. na_values=[-999.0,-999] was failing (GH3611)
- Disable HTML output in qtconsole again. (GH3657)
- Reworked the new repr display logic, which users found confusing. (GH3663)
- Fix indexing issue in ndim >= 3 with iloc (GH3617)
- Correctly parse date columns with embedded (nan/NaT) into datetime64[ns] dtype in read_csv when parse_dates is specified (GH3062)
- Fix not consolidating before to_csv (GH3624)
- Fix alignment issue when setitem in a DataFrame with a piece of a DataFrame (GH3626) or a mixed DataFrame and a Series (GH3668)
- Fix plotting of unordered DatetimeIndex (GH3601)
- sql.write_frame failing when writing a single column to sqlite (GH3628), thanks to @stonebig
- Fix pivoting with nan in the index (GH3558)
- Fix running of bs4 tests when it is not installed (GH3605)
- Fix parsing of html table (GH3606)
- read_html() now only allows a single backend: html5lib (GH3616)
- convert_objects with convert_dates='coerce' was parsing some single-letter strings into today’s date
- DataFrame.from_records did not accept empty recarrays (GH3682)
- DataFrame.to_csv will succeed with the deprecated option nanRep, @tdsmith
- DataFrame.to_html and DataFrame.to_latex now accept a path for their first argument (GH3702)
- Fix file tokenization error with \r delimiter and quoted fields (GH3453)
- Groupby transform with item-by-item not upcasting correctly (GH3740)
- Incorrectly read a HDFStore multi-index Frame with a column specification (GH3748)
- read_html now correctly skips tests (GH3741)
- PandasObjects raise TypeError when trying to hash (GH3882)
- Fix incorrect arguments passed to concat that are not list-like (e.g. concat(df1,df2)) (GH3481)
- Correctly parse when passed the dtype=str (or other variable-len string dtypes) in read_csv (GH3795)
- Fix index name not propagating when using loc/ix (GH3880)
- Fix groupby when applying a custom function resulting in a returned DataFrame was not converting dtypes (GH3911)
- Fixed a bug where DataFrame.replace with a compiled regular expression in the to_replace argument wasn’t working (GH3907)
- Fixed __truediv__ in Python 2.7 with numexpr installed to actually do true division when dividing two integer arrays with at least 10000 cells total (GH3764)
- Indexing with a string with seconds resolution not selecting from a time index (GH3925)
- csv parsers would loop infinitely if iterator=True but no chunksize was specified (GH3967), python parser failing with chunksize=1
- Fix index name not propagating when using shift
- Fixed dropna=False being ignored with multi-index stack (GH3997)
- Fixed flattening of columns when renaming MultiIndex columns DataFrame (GH4004)
- Fix Series.clip for datetime series. NA/NaN threshold values will now throw ValueError (GH3996)
- Fixed insertion issue into DataFrame, after rename (GH4032)
- Fixed testing issue where too many sockets were open thus leading to a connection reset issue (GH3982, GH3985, GH4028, GH4054)
- Fixed failing tests in test_yahoo, test_google where symbols were not retrieved but were being accessed (GH3982, GH3985, GH4028, GH4054)
- Series.hist will now take the figure from the current environment if one is not passed
- Fixed bug where a 1xN DataFrame would barf on a 1xN mask (GH4071)
- Fixed running of tox under python3 where the pickle import was getting rewritten in an incompatible way (GH4062, GH4063)
- Fixed bug where sharex and sharey were not being passed to grouped_hist (GH4089)
- Fix bug where HDFStore will fail to append because of a different block ordering on-disk (GH4096)
- Better error messages on inserting incompatible columns to a frame (GH4107)
- Fixed bug in DataFrame.replace where a nested dict wasn’t being iterated over when regex=False (GH4115)
- Fixed bug in convert_objects(convert_numeric=True) where a mixed numeric and object Series/Frame was not converting properly (GH4119)
- Fixed bugs in multi-index selection with column multi-index and duplicates (GH4145, GH4146)
- Fixed bug in the parsing of microseconds when using the format argument in to_datetime (GH4152)
- Fixed bug in PandasAutoDateLocator where invert_xaxis triggered incorrectly MilliSecondLocator (GH3990)
- Fixed bug in Series.where where broadcasting a single element input vector to the length of the series resulted in multiplying the value inside the input (GH4192)
- Fixed bug in plotting that wasn’t raising on invalid colormap for matplotlib 1.1.1 (GH4215)
- Fixed the legend displaying in DataFrame.plot(kind='kde') (GH4216)
- Fixed bug where Index slices weren’t carrying the name attribute (GH4226)
- Fixed bug in initializing DatetimeIndex with an array of strings in a certain time zone (GH4229)
- Fixed bug where html5lib wasn’t being properly skipped (GH4265)
- Fixed bug where get_data_famafrench wasn’t using the correct file edges (GH4281)
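Three of the 0.12.0 fixes above (float-like integer division and modulo by zero, unhashable pandas objects, and concat rejecting non-list arguments) can be checked as follows. This is a sketch assuming a recent pandas with NumPy; the data and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Integer floor division and modulo by zero follow float semantics:
# x // 0 gives +inf, -inf or NaN depending on the sign of x, and x % 0 gives NaN.
s = pd.Series([1, -1, 0])
floordiv = s // 0
mod = s % 0

# pandas containers are mutable and therefore unhashable.
try:
    hash(pd.Series([1, 2]))
    hashable = True
except TypeError:
    hashable = False

# concat expects a list-like of objects, not separate positional frames.
df1 = pd.DataFrame({"a": [1]})
df2 = pd.DataFrame({"a": [2]})
try:
    pd.concat(df1, df2)
    concat_rejected = False
except TypeError:
    concat_rejected = True
combined = pd.concat([df1, df2], ignore_index=True)
```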
pandas 0.11.0¶
Release date: 2013-04-22
New Features¶
- New documentation section, 10 Minutes to Pandas
- New documentation section, Cookbook
- Allow mixed dtypes (e.g. float32/float64/int32/int16/int8) to coexist in DataFrames and propagate in operations
- Add function to pandas.io.data for retrieving stock index components from Yahoo! finance (GH2795)
- Support slicing with time objects (GH2681)
- Added .iloc attribute, to support strict integer based indexing, analogous to .ix (GH2922)
- Added .loc attribute, to support strict label based indexing, analogous to .ix (GH3053)
- Added .iat attribute, to support fast scalar access via integers (replaces iget_value/iset_value)
- Added .at attribute, to support fast scalar access via labels (replaces get_value/set_value)
- Moved functionality from irow, icol, iget_value/iset_value to .iloc indexer (via _ixs methods in each object)
- Added support for expression evaluation using the numexpr library
- Added convert=boolean to take routines to translate negative indices to positive, defaults to True
- Added to_series() method to indices, to facilitate the creation of indexers (GH3275)
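The four indexers introduced here are still the standard accessors; the .ix indexer they were compared against has since been removed from pandas, so the sketch below, which assumes a recent pandas, uses only the newer ones. It also shows mixed numeric dtypes coexisting in one frame and Index.to_series. The data and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])

by_label = df.loc["b", "x"]     # label based
by_position = df.iloc[1, 0]     # integer-position based
fast_label = df.at["c", "x"]    # fast scalar access by label
fast_position = df.iat[0, 0]    # fast scalar access by position

# Mixed numeric dtypes coexist in one DataFrame and survive operations.
mixed = pd.DataFrame({
    "f32": np.array([1.5, 2.5], dtype="float32"),
    "i16": np.array([1, 2], dtype="int16"),
})
doubled = mixed["f32"] * 2

# Index.to_series returns a Series whose index and values are the labels.
labels = pd.Index(["p", "q"]).to_series()
```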
Improvements to existing features¶
- Improved performance of df.to_csv() by up to 10x in some cases. (GH3059)
- Added blocks attribute to DataFrames, to return a dict of dtypes to homogeneously dtyped DataFrames
- Added keyword convert_numeric to convert_objects() to try to convert object dtypes to numeric types (default is False)
- convert_dates in convert_objects can now be coerce, which will return a datetime64[ns] dtype with non-convertibles set as NaT; will preserve an all-nan object (e.g. strings), default is True (to perform soft-conversion)
- Series print output now includes the dtype by default
- describe_option() now reports the default and current value of options.
- Add format option to pandas.to_datetime with faster conversion of strings that can be parsed with datetime.strptime
- Add axes property to Series for compatibility
- Add xs function to Series for compatibility
- Allow setitem in a frame where only mixed numerics are present (e.g. int and float) (GH3037)
- HDFStore
- Add squeeze method to possibly remove length 1 dimensions from an object.

    In [1]: p = pd.Panel(np.random.randn(3,4,4),items=['ItemA','ItemB','ItemC'],
       ...:              major_axis=pd.date_range('20010102',periods=4),
       ...:              minor_axis=['A','B','C','D'])
       ...:

    In [2]: p
    Out[2]:
    <class 'pandas.core.panel.Panel'>
    Dimensions: 3 (items) x 4 (major_axis) x 4 (minor_axis)
    Items axis: ItemA to ItemC
    Major_axis axis: 2001-01-02 00:00:00 to 2001-01-05 00:00:00
    Minor_axis axis: A to D

    In [3]: p.reindex(items=['ItemA']).squeeze()
    Out[3]:
                       A         B         C         D
    2001-01-02  0.469112 -0.282863 -1.509059 -1.135632
    2001-01-03  1.212112 -0.173215  0.119209 -1.044236
    2001-01-04 -0.861849 -2.104569 -0.494929  1.071804
    2001-01-05  0.721555 -0.706771 -1.039575  0.271860

    In [4]: p.reindex(items=['ItemA'],minor=['B']).squeeze()
    Out[4]:
    2001-01-02   -0.282863
    2001-01-03   -0.173215
    2001-01-04   -2.104569
    2001-01-05   -0.706771
    Freq: D, Name: B, dtype: float64

- Improvement to Yahoo API access in pd.io.data.Options (GH2758)
- Added option display.max_seq_items to control the number of elements printed per sequence when pretty-printing it. (GH2979)
- Added option display.chop_threshold to control display of small numerical values. (GH2739)
- Added option display.max_info_rows to prevent verbose_info from being calculated for frames above 1M rows (configurable). (GH2807, GH2918)
- value_counts() now accepts a “normalize” argument, for normalized histograms. (GH2710).
- DataFrame.from_records now accepts not only dicts but any instance of the collections.Mapping ABC.
- Allow selection semantics via a string with a datelike index to work in both Series and DataFrames (GH3070)

    In [5]: idx = pd.date_range("2001-10-1", periods=5, freq='M')

    In [6]: ts = pd.Series(np.random.rand(len(idx)),index=idx)

    In [7]: ts['2001']
    Out[7]:
    2001-10-31    0.838796
    2001-11-30    0.897333
    2001-12-31    0.732592
    Freq: M, dtype: float64

    In [8]: df = pd.DataFrame(dict(A = ts))

    In [9]: df['2001']
    Out[9]:
                       A
    2001-10-31  0.838796
    2001-11-30  0.897333
    2001-12-31  0.732592

- Added option display.mpl_style providing a sleeker visual style for plots. Based on https://gist.github.com/huyng/816622 (GH3075).
- Improved performance across several core functions by taking memory ordering of arrays into account. Courtesy of @stephenwlin (GH3130)
- Improved performance of groupby transform method (GH2121)
- Handle “ragged” CSV files missing trailing delimiters in rows with missing fields when also providing explicit list of column names (so the parser knows how many columns to expect in the result) (GH2981)
- On a mixed DataFrame, allow setting with indexers with ndarray/DataFrame on rhs (GH3216)
- Treat boolean values as integers (values 1 and 0) for numeric operations. (GH2641)
- Add time method to DatetimeIndex (GH3180)
- Return NA when using Series.str[...] for values that are not long enough (GH3223)
- Display cursor coordinate information in time-series plots (GH1670)
- to_html() now accepts an optional “escape” argument to control reserved HTML character escaping (enabled by default) and escapes &, in addition to < and >. (GH2919)
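Panel has since been removed from pandas, and current pandas expects .loc for row selection by a partial date string on a DataFrame, so the sketch below adapts the two sessions above to DataFrame.squeeze and .loc. It also covers value_counts(normalize=True), short strings with .str[...], the format option of to_datetime and booleans in arithmetic. It assumes a recent pandas; the data and variable names are hypothetical, and freq="MS" (month start) is used in place of the original month-end frequency.

```python
import pandas as pd

# squeeze drops length-1 axes: one column -> Series, 1x1 -> scalar.
frame = pd.DataFrame({"A": [1, 2, 3]})
as_series = frame.squeeze()
as_scalar = pd.DataFrame({"A": [7]}).squeeze()

# value_counts(normalize=True) returns relative frequencies.
freqs = pd.Series(["a", "b", "a", "a"]).value_counts(normalize=True)

# Partial-string selection on a datetime index.
idx = pd.date_range("2001-10-01", periods=5, freq="MS")
ts = pd.Series(range(5), index=idx)
in_2001 = ts.loc["2001"]
df_2001 = pd.DataFrame({"A": ts}).loc["2001"]

# .str[...] yields NaN where a string is too short.
second_char = pd.Series(["abc", "d"]).str[1]

# format= parses strings with a strptime-style pattern.
parsed = pd.to_datetime(["2013/04/22", "2013/07/24"], format="%Y/%m/%d")

# Booleans behave as 1 and 0 in numeric operations.
true_count = pd.Series([True, False, True]).sum()
```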
API Changes¶
- Do not automatically upcast numeric specified dtypes to int64 or float64 (GH622 and GH797)
- DataFrame construction of lists and scalars, with no dtype present, will result in casting to int64 or float64, regardless of platform. This is not an apparent change in the API, but noting it.
- Guarantee that convert_objects() for Series/DataFrame always returns a copy
- groupby operations will respect dtypes for numeric float operations (float32/float64); other types will be operated on, and will try to cast back to the input dtype (e.g. if an int is passed, as long as the output doesn’t have nans, then an int will be returned)
- backfill/pad/take/diff/ohlc will now support float32/int16/int8 operations
- Block types will upcast as needed in where/masking operations (GH2793)
- Series now automatically will try to set the correct dtype based on passed datetimelike objects (datetime/Timestamp)
  - timedelta64 are returned in appropriate cases (e.g. Series - Series, when both are datetime64)
  - mixed datetimes and objects (GH2751) in a constructor will be cast correctly
  - astype on datetimes to object are now handled (as well as NaT conversions to np.nan)
  - all timedelta like objects will be correctly assigned to timedelta64 with mixed NaN and/or NaT allowed
- arguments to DataFrame.clip were inconsistent with numpy and Series clipping (GH2747)
- util.testing.assert_frame_equal now checks the column and index names (GH2964)
- Constructors will now return a more informative ValueError on failures when invalid shapes are passed
- Don’t suppress TypeError in GroupBy.agg (GH3238)
- Methods return None when inplace=True (GH1893)
- HDFStore
  - added the method select_column to select a single column from a table as a Series.
  - deprecated the unique method, can be replicated by select_column(key,column).unique()
  - min_itemsize parameter will now automatically create data_columns for passed keys
- added the method
- Downcast on pivot if possible (GH3283); adds argument downcast to fillna
- Introduced options display.height/width for explicitly specifying terminal height/width in characters. Deprecated display.line_width, now replaced by display.width. These defaults are in effect for scripts as well, so unless disabled, previously very wide output will now be wrapped in “expand_repr” style.
- Various defaults for options (including display.max_rows) have been revised after a brief survey concluded they were wrong for everyone. They are now at w=80, h=60.
- HTML repr output in IPython qtconsole is once again controlled by the option display.notebook_repr_html, and is on by default.
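Two of the API changes above, sketched with a recent pandas release (an assumption; display.height and display.line_width no longer exist there, so only display.width and display.max_rows are used): a method called with inplace=True modifies the object and returns None, and display options can be overridden temporarily.

```python
import pandas as pd

s = pd.Series([3, 1, 2])

# With inplace=True the Series is modified and the method returns None.
result = s.sort_values(inplace=True)
print(result)      # None
print(s.tolist())  # [1, 2, 3]

# Override display options for a block; they revert when the block exits.
with pd.option_context("display.width", 80, "display.max_rows", 60):
    width = pd.get_option("display.width")
    max_rows = pd.get_option("display.max_rows")
print(width, max_rows)  # 80 60
```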
Bug Fixes¶
- Fix segfault on empty DataFrame when calling fillna with pad or backfill (GH2778)
- Single-element ndarrays of datetimelike objects are handled (e.g. np.array(datetime(2001,1,1,0,0))) without a dtype being passed
- 0-dim ndarrays with a passed dtype are handled correctly (e.g. np.array(0.,dtype=’float32’))
- Fix some boolean indexing inconsistencies in Series.__getitem__/__setitem__ (GH2776)
- Fix issues with DataFrame and Series constructor with integers that overflow int64 and some mixed-type lists (GH2845)
HDFStore
- Fix weird PyTables error when using too many selectors in a where; also correctly filter on any number of values in a Term expression (using isin filtering rather than numexpr filtering)
- Internally, change all variables to be private-like (now have leading underscore)
- Fixes for query parsing to correctly interpret boolean and != (GH2849, GH2973)
- Fixes for pathological case on SparseSeries with 0-len array and compression (GH2931)
- Fixes bug with writing rows if part of a block was all-nan (GH3012)
- Exceptions are now ValueError or TypeError as needed
- A table will now raise if min_itemsize contains fields which are not queryables
- Fix bug in applymap where some object-type columns were converted (GH2909); convert_objects had an incorrect default
- TimeDeltas
- Series ops with a Timestamp on the rhs were throwing an exception (GH2898); added tests for Series ops with datetimes, timedeltas, Timestamps, and datelike Series on both lhs and rhs
- Fixed subtle timedelta64 inference issue on py3 & numpy 1.7.0 (GH3094)
- Fixed some formatting issues on timedelta when negative
- Support null checking on timedelta64, representing (and formatting) with NaT
- Support setitem with np.nan value, converts to NaT
- Support min/max ops in a DataFrame (abs is not working, and non-supported ops do not raise an error)
- Support idxmin/idxmax/abs/max/min in a Series (GH2989, GH2982)
- Bug on in-place putmasking on an integer Series that needs to be converted to float (GH2746)
- Bug in argsort of datetime64[ns] Series with NaT (GH2967)
- Bug in value_counts of datetime64[ns] Series (GH3002)
- Fixed printing of NaT in an index
- Bug in idxmin/idxmax of datetime64[ns] Series with NaT (GH2982)
- Bug in icol, take with negative indices was producing incorrect return values (see GH2922, GH2892); also check for out-of-bounds indices (GH3029)
- Bug in DataFrame column insertion: when the column creation fails, the existing frame is left in an irrecoverable state (GH3010)
- Bug in DataFrame update, combine_first where non-specified values could cause dtype changes (GH3016, GH3041)
- Bug in groupby with first/last where dtypes could change (GH3041, GH2763)
- Formatting of an index that has nan was inconsistent or wrong (would fill from other values) (GH2850)
- Unstack of a frame with no nans would always cause dtype upcasting (GH2929)
- Fix scalar datetime.datetime parsing bug in read_csv (GH3071)
- Fixed slow printing of large Dataframes, due to inefficient dtype reporting (GH2807)
- Fixed a segfault when using a function as grouper in groupby (GH3035)
- Fix pretty-printing of infinite data structures (closes GH2978)
- Fixed exception when plotting timeseries bearing a timezone (closes GH2877)
- str.contains ignored na argument (GH2806)
- Substitute warning for segfault when grouping with categorical grouper of mismatched length (GH3011)
- Fix exception in SparseSeries.density (GH2083)
- Fix upsampling bug with closed=’left’ and daily to daily data (GH3020)
- Fixed missing tick bars on scatter_matrix plot (GH3063)
- Fixed bug in Timestamp(d,tz=foo) when d is date() rather than datetime() (GH2993)
- series.plot(kind=’bar’) now respects pylab color scheme (GH3115)
- Fixed bug in reshape if not passed correct input, now raises TypeError (GH2719)
- Fixed a bug where Series ctor did not respect ordering if OrderedDict passed in (GH3282)
- Fix NameError issue on RESO_US (GH2787)
- Allow selection in an unordered timeseries to work similarly to an ordered timeseries (GH2437).
- Fixed .xs when called with axis=1 and a level parameter (GH2903)
- Timestamp now supports the class method fromordinal similar to datetimes (GH3042)
- Fix issue with indexing a series with a boolean key and specifying a 1-len list on the rhs (GH2745) or a list on the rhs (GH3235)
- Fixed bug in groupby apply when the kernel generates a list of arrays of unequal length (GH1738)
- fixed handling of rolling_corr with center=True which could produce corr>1 (GH3155)
- Fixed issues so that ‘index’/‘columns’ can be passed in addition to 0/1 for the axis parameter
- PeriodIndex.tolist now boxes to Period (GH3178)
- PeriodIndex.get_loc KeyError now reports Period instead of ordinal (GH3179)
- df.to_records bug when handling MultiIndex (GH3189)
- Fix Series.__getitem__ segfault when index less than -length (GH3168)
- Fix bug when using Timestamp as a date parser (GH2932)
- Fix bug creating date range from Timestamp with time zone and passing same time zone (GH2926)
- Add comparison operators to Period object (GH2781)
- Fix bug when concatenating two Series into a DataFrame when they have the same name (GH2797)
- Fix automatic color cycling when plotting consecutive timeseries without color arguments (GH2816)
- fixed bug in the pickling of PeriodIndex (GH2891)
- Upcast/split blocks when needed in a mixed DataFrame when setitem with an indexer (GH3216)
- Invoking df.applymap on a dataframe with dupe cols now raises a ValueError (GH2786)
- Apply with invalid returned indices raise correct Exception (GH2808)
- Fixed a bug in plotting log-scale bar plots (GH3247)
- df.plot() grid on/off now obeys the mpl default style, just like series.plot(). (GH3233)
- Fixed a bug in the legend of plotting.andrews_curves() (GH3278)
- Produce a series on apply if we only generate a singular series and have a simple index (GH2893)
- Fix Python ASCII file parsing when integer falls outside of floating point spacing (GH3258)
- fixed pretty printing of sets (GH3294)
- Panel() and Panel.from_dict() now respect ordering when given an OrderedDict (GH3303)
- Fixed DataFrame.where with a datetimelike incorrectly selecting (GH3311)
- Ensure index casts work even in Int64Index
- Fix set_index segfault when passing MultiIndex (GH3308)
- Ensure pickles created in py2 can be read in py3
- Insert ellipsis in MultiIndex summary repr (GH3348)
- Groupby will handle mutation among an input group’s columns (and fall back to non-fast apply) (GH3380)
- Eliminated unicode errors on FreeBSD when using MPL GTK backend (GH3360)
- Period.strftime should return unicode strings always (GH3363)
- Respect passed read_* chunksize in get_chunk function (GH3406)
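A few of the fixes above added small pieces of API. The sketch below exercises them with a recent pandas release (assumed; behavior in 0.11 itself may differ in detail): Period comparisons, Timestamp.fromordinal, and Period boxing from PeriodIndex.tolist.

```python
from datetime import datetime

import pandas as pd

# Period objects support the comparison operators (GH2781).
jan = pd.Period("2013-01", freq="M")
feb = pd.Period("2013-02", freq="M")
print(jan < feb, jan == pd.Period("2013-01", freq="M"))  # True True

# Timestamp.fromordinal mirrors datetime.fromordinal (GH3042).
ordinal = datetime(2013, 3, 1).toordinal()
ts = pd.Timestamp.fromordinal(ordinal)
print(ts)  # 2013-03-01 00:00:00

# PeriodIndex.tolist boxes each element as a Period (GH3178).
periods = pd.period_range("2013-01", periods=3, freq="M").tolist()
print(all(isinstance(p, pd.Period) for p in periods))  # True
```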
pandas 0.10.1¶
Release date: 2013-01-22
API Changes¶
- Restored inplace=True behavior returning self (same object) with deprecation warning until 0.11 (GH1893)
HDFStore
- refactored HDFStore to deal with non-table stores as objects, will allow future enhancements
- removed keyword compression from put (replaced by keyword complib to be consistent across library)
- warn PerformanceWarning if you are attempting to store types that will be pickled by PyTables
Improvements to existing features¶
HDFStore
- enables storing of multi-index dataframes (closes GH1277)
- support data column indexing and selection, via data_columns keyword in append
- support write chunking to reduce memory footprint, via chunksize keyword to append
- support automagic indexing via index keyword to append
- support expectedrows keyword in append to inform PyTables about the expected table size
- support start and stop keywords in select to limit the row selection space
- added get_store context manager to automatically import with pandas
- added column filtering via columns keyword in select
- added methods append_to_multiple/select_as_multiple/select_as_coordinates to do multiple-table append/selection
- added support for datetime64 in columns
- added method unique to select the unique values in an indexable or data column
- added method copy to copy an existing store (and possibly upgrade)
- show the shape of the data on disk for non-table stores when printing the store
- added ability to read PyTables flavor tables (allows compatibility to other HDF5 systems)
- Add logx option to DataFrame/Series.plot (GH2327, GH2565)
- Support reading gzipped data from file-like object
- pivot_table aggfunc can be anything used in GroupBy.aggregate (GH2643)
- Implement DataFrame merges in case where set cardinalities might overflow 64-bit integer (GH2690)
- Raise exception in C file parser if integer dtype specified and have NA values. (GH2631)
- Attempt to parse ISO8601 format dates when parse_dates=True in read_csv for major performance boost in such cases (GH2698)
- Add methods neg and inv to Series
- Implement kind option in ExcelFile to indicate whether it’s an XLS or XLSX file (GH2613)
- Documented a fast-path in pd.read_csv when parsing iso8601 datetime strings yielding as much as a 20x speedup. (GH5993)
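A minimal sketch of the pivot_table change, assuming a recent pandas release and a small hypothetical sales table: aggfunc accepts the same kinds of argument as GroupBy.aggregate, here a function name, a list of names, and a plain callable.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "sales": [10, 20, 30, 50],
})

# A single aggregation given by name.
by_name = df.pivot_table(index="region", values="sales", aggfunc="sum")

# A list of aggregations; the result has one column group per function,
# i.e. columns ("sum", "sales") and ("max", "sales").
by_list = df.pivot_table(index="region", values="sales", aggfunc=["sum", "max"])

# An arbitrary callable applied to each group's values (here: the range).
by_callable = df.pivot_table(index="region", values="sales",
                             aggfunc=lambda x: x.max() - x.min())
```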
Bug Fixes¶
- Fix read_csv/read_table multithreading issues (GH2608)
HDFStore
- correctly handle nan elements in string columns; serialize via the nan_rep keyword to append
- raise correctly on non-implemented column types (unicode/date)
- correctly handle Term passed types (e.g. index<1000, when index is Int64) (closes GH512)
- handle Timestamp correctly in data_columns (closes GH2637)
- contains correctly matches on non-natural names
- correctly store float32 dtypes in tables (if there are no other float types in the same table)
- Fix DataFrame.info bug with UTF8-encoded columns. (GH2576)
- Fix DatetimeIndex handling of FixedOffset tz (GH2604)
- More robust detection of being in IPython session for wide DataFrame console formatting (GH2585)
- Fix platform issues with file:/// in unit test (GH2564)
- Fix bug and possible segfault when grouping by hierarchical level that contains NA values (GH2616)
- Ensure that MultiIndex tuples can be constructed with NAs (GH2616)
- Fix int64 overflow issue when unstacking MultiIndex with many levels (GH2616)
- Exclude non-numeric data from DataFrame.quantile by default (GH2625)
- Fix a Cython C int64 boxing issue causing read_csv to return incorrect results (GH2599)
- Fix groupby summing performance issue on boolean data (GH2692)
- Don’t bork Series containing datetime64 values with to_datetime (GH2699)
- Fix DataFrame.from_records corner case when passed columns, index column, but empty record list (GH2633)
- Fix C parser-tokenizer bug with trailing fields. (GH2668)
- Don’t exclude non-numeric data from GroupBy.max/min (GH2700)
- Don’t lose time zone when calling DatetimeIndex.drop (GH2621)
- Fix setitem on a Series with a boolean key and a non-scalar as value (GH2686)
- Box datetime64 values in Series.apply/map (GH2627, GH2689)
- Upconvert datetime + datetime64 values when concatenating frames (GH2624)
- Raise a more helpful error message in merge operations when one DataFrame has duplicate columns (GH2649)
- Fix partial date parsing issue occurring only when code is run at EOM (GH2618)
- Prevent MemoryError when using counting sort in sortlevel with high-cardinality MultiIndex objects (GH2684)
- Fix Period resampling bug when all values fall into a single bin (GH2070)
- Fix buggy interaction with usecols argument in read_csv when there is an implicit first index column (GH2654)
- Fix bug in Index.summary() where string format methods were being called incorrectly. (GH3869)
pandas 0.10.0¶
Release date: 2012-12-17
New Features¶
- Brand new high-performance delimited file parsing engine written in C and Cython. 50% or better performance in many standard use cases with a fraction as much memory usage. (GH407, GH821)
- Many new file parser (read_csv, read_table) features:
- Support for on-the-fly gzip or bz2 decompression (compression option)
- Ability to get back numpy.recarray instead of DataFrame (as_recarray=True)
- dtype option: explicit column dtypes
- usecols option: specify list of columns to be read from a file. Good for reading very wide files with many irrelevant columns (GH1216, GH926, GH2465)
- Enhanced unicode decoding support via encoding option
- skipinitialspace dialect option
- Can specify strings to be recognized as True (true_values) or False (false_values)
- High-performance delim_whitespace option for whitespace-delimited files; a preferred alternative to the ‘\s+’ regular expression delimiter
- Option to skip “bad” lines (wrong number of fields) that would otherwise have caused an error in the past (error_bad_lines and warn_bad_lines options)
- Substantially improved performance in the parsing of integers with thousands markers and lines with comments
- Ease of parsing European (and other) decimal formats (decimal option) (GH584, GH2466)
- Custom line terminators (e.g. lineterminator=’~’) (GH2457)
- Handling of no trailing commas in CSV files (GH2333)
- Ability to handle fractional seconds in date_converters (GH2209)
- read_csv allow scalar arg to na_values (GH1944)
- Explicit column dtype specification in read_* functions (GH1858)
- Easier CSV dialect specification (GH1743)
- Improve parser performance when handling special characters (GH1204)
- Google Analytics API integration with easy oauth2 workflow (GH2283)
- Add error handling to Series.str.encode/decode (GH2276)
- Add where and mask to Series (GH2337)
- Grouped histogram via by keyword in Series/DataFrame.hist (GH2186)
- Support optional min_periods keyword in corr and cov for both Series and DataFrame (GH2002)
- Add duplicated and drop_duplicates functions to Series (GH1923)
- Add docs for HDFStore table format
- ‘density’ property in SparseSeries (GH2384)
- Add ffill and bfill convenience functions for forward- and backfilling time series data (GH2284)
- New option configuration system and functions set_option, get_option, describe_option, and reset_option. Deprecate set_printoptions and reset_printoptions (GH2393). You can also access options as attributes via pandas.options.X
- Wide DataFrames can be viewed more easily in the console with new expand_frame_repr and line_width configuration options. This is on by default now (GH2436)
- Scikits.timeseries-like moving window functions via rolling_window (GH1270)
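Several of the new parser options can be combined in one call. This is a minimal sketch against a recent pandas release (assumed); the semicolon-delimited sample data is made up, and options whose spelling has since changed are left out. It selects columns with usecols, forces a dtype, maps yes/no to booleans, and parses European-style numbers with thousands and decimal.

```python
import io

import pandas as pd

# Hypothetical sample: ';' separator, '.' thousands marker, ',' decimal mark.
data = (
    "id;flag;amount;comment\n"
    "1;yes;1.234,5;ok\n"
    "2;no;2.000,0;skip me\n"
)

df = pd.read_csv(
    io.StringIO(data),
    sep=";",
    usecols=["id", "flag", "amount"],  # the comment column is never loaded
    dtype={"id": "int64"},             # explicit column dtype
    true_values=["yes"],               # strings treated as True
    false_values=["no"],               # strings treated as False
    thousands=".",
    decimal=",",
)
print(df)
```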
Experimental Features¶
- Add support for Panel4D, a named 4 Dimensional structure
- Add support for ndpanel factory functions, to create custom, domain-specific N-Dimensional containers
API Changes¶
- The default binning/labeling behavior for resample has been changed to closed=’left’, label=’left’ for daily and lower frequencies. This had been a large source of confusion for users. See the “what’s new” page for more on this. (GH2410)
- Methods with an inplace option now return None instead of the calling (modified) object (GH1893)
- The special case of DataFrame - TimeSeries doing column-by-column broadcasting has been deprecated. Users should explicitly do e.g. df.sub(ts, axis=0) instead. This is a legacy hack and can lead to subtle bugs.
- inf/-inf are no longer considered as NA by isnull/notnull. To be clear, this is legacy cruft from early pandas. This behavior can be globally re-enabled using the new option mode.use_inf_as_null (GH2050, GH1919)
- pandas.merge will now default to sort=False. For many use cases sorting the join keys is not necessary, and doing it by default is wasteful
- Specify header=0 explicitly to replace existing column names in file in read_* functions.
- Default column names for header-less parsed files (yielded by read_csv, etc.) are now the integers 0, 1, .... A new argument prefix has been added; to get the v0.9.x behavior specify prefix='X' (GH2034). This API change was made to make the default column names more consistent with the DataFrame constructor’s default column names when none are specified.
- DataFrame selection using a boolean frame now preserves input shape
- If function passed to Series.apply yields a Series, result will be a DataFrame (GH2316)
- Values like YES/NO/yes/no will not be considered as boolean by default any longer in the file parsers. This can be customized using the new true_values and false_values options (GH2360)
- obj.fillna() is no longer valid; method=’pad’ is no longer the default option, to be more explicit about what kind of filling to perform. Add ffill/bfill convenience functions per above (GH2284)
- HDFStore.keys() now returns an absolute path-name for each key
- to_string() now always returns a unicode string. (GH2224)
- File parsers will not handle NA sentinel values arising from passed converter functions
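Several of these API changes still describe current behavior. The sketch below assumes a recent pandas release, where a header-less file must be flagged with header=None: inf is not treated as missing, header-less files get integer column names, row-wise broadcasting is spelled df.sub(ts, axis=0), and merge does not sort the join keys by default.

```python
import io

import numpy as np
import pandas as pd

# inf/-inf are not missing values; only NaN is.
s = pd.Series([1.0, np.inf, -np.inf, np.nan])
print(s.isnull().tolist())  # [False, False, False, True]

# A header-less file gets the integer column names 0, 1, ...
headerless = pd.read_csv(io.StringIO("1,2\n3,4\n"), header=None)
print(headerless.columns.tolist())  # [0, 1]

# Explicit row-wise broadcasting: subtract ts[i] from every value in row i.
frame = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
ts = pd.Series([1, 2])
diff = frame.sub(ts, axis=0)

# With the default sort=False, a left join keeps the left frame's key order.
left = pd.DataFrame({"key": ["b", "a"], "x": [1, 2]})
right = pd.DataFrame({"key": ["a", "b"], "y": [3, 4]})
merged = pd.merge(left, right, on="key", how="left")
print(merged["key"].tolist())  # ['b', 'a']
```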
Improvements to existing features¶
- Add nrows option to DataFrame.from_records for iterators (GH1794)
- Unstack/reshape algorithm rewrite to avoid high memory use in cases where the number of observed key-tuples is much smaller than the total possible number that could occur (GH2278). Also improves performance in most cases.
- Support duplicate columns in DataFrame.from_records (GH2179)
- Add normalize option to Series/DataFrame.asfreq (GH2137)
- SparseSeries and SparseDataFrame construction from empty and scalar values no longer creates dense ndarrays unnecessarily (GH2322)
- HDFStore now supports hierarchical keys (GH2397)
- Support multiple query selection formats for HDFStore tables (GH1996)
- Support del store['df'] syntax to delete nodes from an HDFStore
- Add multi-dtype support for HDFStore tables
- min_itemsize parameter can be specified in HDFStore table creation
- Indexing support in HDFStore tables (GH698)
- Add line_terminator option to DataFrame.to_csv (GH2383)
- added implementation of str(x)/unicode(x)/bytes(x) to major pandas data structures, which should do the right thing on both py2.x and py3.x. (GH2224)
- Reduce groupby.apply overhead substantially by low-level manipulation of internal NumPy arrays in DataFrames (GH535)
- Implement value_vars in melt and add melt to pandas namespace (GH2412)
- Added boolean comparison operators to Panel
- Enable Series.str.strip/lstrip/rstrip methods to take an argument (GH2411)
- The DataFrame ctor now respects column ordering when given an OrderedDict (GH2455)
- Assigning DatetimeIndex to Series changes the class to TimeSeries (GH2139)
- Improve performance of .value_counts method on non-integer data (GH2480)
- get_level_values method for MultiIndex returns Index instead of ndarray (GH2449)
- convert_to_r_dataframe conversion for datetime values (GH2351)
- Allow DataFrame.to_csv to represent inf and nan differently (GH2026)
- Add min_i argument to nancorr to specify minimum required observations (GH2002)
- Add inplace option to sortlevel/sort functions on DataFrame (GH1873)
- Enable DataFrame to accept scalar constructor values like Series (GH1856)
- DataFrame.from_records now takes optional size parameter (GH1794)
- Include iris dataset (GH1709)
- No datetime64 DataFrame column conversion of datetime.datetime with tzinfo (GH1581)
- Micro-optimizations in DataFrame for tracking state of internal consolidation (GH217)
- Format parameter in DataFrame.to_csv (GH1525)
- Partial string slicing for DatetimeIndex for daily and higher frequencies (GH2306)
- Implement col_space parameter in to_html and to_string in DataFrame (GH1000)
- Override Series.tolist and box datetime64 types (GH2447)
- Optimize unstack memory usage by compressing indices (GH2278)
- Fix HTML repr in IPython qtconsole if opening window is small (GH2275)
- Escape more special characters in console output (GH2492)
- df.select now invokes bool on the result of crit(x) (GH2487)
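A minimal sketch of several of these improvements, assuming a recent pandas release: melt with value_vars, str.strip with an argument, get_level_values returning an Index, column order from an OrderedDict, and a scalar passed to the DataFrame constructor.

```python
from collections import OrderedDict

import pandas as pd

# melt: unpivot only the columns listed in value_vars.
wide = pd.DataFrame({"id": [1, 2], "x": [5, 6], "y": [7, 8]})
tall = pd.melt(wide, id_vars=["id"], value_vars=["x"])

# str.strip/lstrip/rstrip accept the characters to remove.
stripped = pd.Series(["--a--", "-b-"]).str.strip("-")

# get_level_values returns an Index rather than a bare ndarray.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])
level0 = mi.get_level_values(0)

# Column order follows the OrderedDict's insertion order.
ordered = pd.DataFrame(OrderedDict([("z", [1]), ("a", [2])]))

# A scalar value is broadcast to the given index and columns.
zeros = pd.DataFrame(0, index=range(2), columns=["p", "q"])
```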
Bug Fixes¶
- Fix major performance regression in DataFrame.iteritems (GH2273)
- Fixes bug when negative period passed to Series/DataFrame.diff (GH2266)
- Escape tabs in console output to avoid alignment issues (GH2038)
- Properly box datetime64 values when retrieving cross-section from mixed-dtype DataFrame (GH2272)
- Fix concatenation bug leading to GH2057, GH2257
- Fix regression in Index console formatting (GH2319)
- Box Period data when assigning PeriodIndex to frame column (GH2243, GH2281)
- Raise exception on calling reset_index on Series with inplace=True (GH2277)
- Enable setting multiple columns in DataFrame with hierarchical columns (GH2295)
- Respect dtype=object in DataFrame constructor (GH2291)
- Fix DatetimeIndex.join bug with tz-aware indexes and how=’outer’ (GH2317)
- pop(...) and del works with DataFrame with duplicate columns (GH2349)
- Treat empty strings as NA in date parsing (rather than let dateutil do something weird) (GH2263)
- Prevent uint64 -> int64 overflows (GH2355)
- Enable joins between MultiIndex and regular Index (GH2024)
- Fix time zone metadata issue when unioning non-overlapping DatetimeIndex objects (GH2367)
- Raise/handle int64 overflows in parsers (GH2247)
- Deleting consecutive rows in HDFStore tables is much faster than before
- Appending on an HDFStore would fail if the table was not first created via put
- Use col_space argument as minimum column width in DataFrame.to_html (GH2328)
- Fix tz-aware DatetimeIndex.to_period (GH2232)
- Fix DataFrame row indexing case with MultiIndex (GH2314)
- Fix to_excel exporting issues with Timestamp objects in index (GH2294)
- Fixes assigning scalars and array to hierarchical column chunk (GH1803)
- Fixed a UnicodeDecodeError with series tidy_repr (GH2225)
- Fixed issues with duplicate keys in an index (GH2347, GH2380)
- Fixed issues related to hash randomization, which is on by default starting with Python 3.3 (GH2331)
- Fixed issue with missing attributes after loading a pickled dataframe (GH2431)
- Fix Timestamp formatting with tzoffset time zone in dateutil 2.1 (GH2443)
- Fix GroupBy.apply issue when using BinGrouper to do ts binning (GH2300)
- Fix issues resulting from datetime.datetime columns being converted to datetime64 when calling DataFrame.apply. (GH2374)
- Raise exception when calling to_panel on non uniquely-indexed frame (GH2441)
- Improved detection of console encoding on IPython zmq frontends (GH2458)
- Preserve time zone when .append-ing two time series (GH2260)
- Box timestamps when calling reset_index on time-zone-aware index rather than creating a tz-less datetime64 column (GH2262)
- Enable searching non-string columns in DataFrame.filter(like=...) (GH2467)
- Fixed issue with losing nanosecond precision upon conversion to DatetimeIndex (GH2252)
- Handle timezones in Datetime.normalize (GH2338)
- Fix test case where dtype specification with endianness causes failures on big endian machines (GH2318)
- Fix plotting bug where upsampling causes data to appear shifted in time (GH2448)
- Fix read_csv failure for UTF-16 with BOM and skiprows (GH2298)
- read_csv with names arg not implicitly setting header=None (GH2459)
- Unrecognized compression mode causes segfault in read_csv (GH2474)
- In read_csv, header=0 and passed names should discard first row (GH2269)
- Correctly route to stdout/stderr in read_table (GH2071)
- Fix exception when Timestamp.to_datetime is called on a Timestamp with tzoffset (GH2471)
- Fixed unintentional conversion of datetime64 to long in groupby.first() (GH2133)
- Union of empty DataFrames now returns empty with concatenated index (GH2307)
- DataFrame.sort_index raises more helpful exception if sorting by column with duplicates (GH2488)
- DataFrame.to_string formatters can be a list, too (GH2520)
- DataFrame.combine_first will always result in the union of the index and columns, even if one DataFrame is length-zero (GH2525)
- Fix several DataFrame.icol/irow with duplicate indices issues (GH2228, GH2259)
- Use Series names for column names when using concat with axis=1 (GH2489)
- Raise Exception if start, end, periods all passed to date_range (GH2538)
- Fix Panel resampling issue (GH2537)
pandas 0.9.1¶
Release date: 2012-11-14
New Features¶
- Can specify multiple sort orders in DataFrame/Series.sort/sort_index (GH928)
- New top and bottom options for handling NAs in rank (GH1508, GH2159)
- Add where and mask functions to DataFrame (GH2109, GH2151)
- Add at_time and between_time functions to DataFrame (GH2149)
- Add flexible pow and rpow methods to DataFrame (GH2190)
API Changes¶
- Upsampling period index “spans” intervals. Example: annual periods upsampled to monthly will span all months in each year
- Period.end_time will yield timestamp at last nanosecond in the interval (GH2124, GH2125, GH1764)
- File parsers no longer coerce to float or bool for columns that have custom converters specified (GH2184)
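The Period.end_time change can be checked with a recent pandas release (assumed): the end of a monthly period is the last nanosecond of the month, and converting to daily frequency with how="end" lands on the month's last day.

```python
import pandas as pd

jan = pd.Period("2012-01", freq="M")

start = jan.start_time  # 2012-01-01 00:00:00
end = jan.end_time      # 2012-01-31 23:59:59.999999999

# The daily period at the end of the monthly span.
last_day = jan.asfreq("D", how="end")
print(start, end, last_day)
```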
Improvements to existing features¶
- Time rule inference for week-of-month (e.g. WOM-2FRI) rules (GH2140)
- Improve performance of datetime + business day offset with large number of offset periods
- Improve HTML display of DataFrame objects with hierarchical columns
- Enable referencing of Excel columns by their column names (GH1936)
- DataFrame.dot can accept ndarrays (GH2042)
- Support negative periods in Panel.shift (GH2164)
- Make .drop(...) work with non-unique indexes (GH2101)
- Improve performance of Series/DataFrame.diff (re: GH2087)
- Support unary ~ (__invert__) in DataFrame (GH2110)
- Turn off pandas-style tick locators and formatters (GH2205)
- DataFrame[DataFrame] uses DataFrame.where to compute masked frame (GH2230)
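A minimal sketch of several 0.9.1 improvements, assuming a recent pandas release: boolean-frame indexing matches DataFrame.where, unary ~ on a boolean DataFrame, DataFrame.dot with an ndarray, and drop on a non-unique index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, -2], "b": [-3, 4]})

# Indexing with a boolean DataFrame gives the same result as DataFrame.where:
# entries where the condition is False become NaN.
selected = df[df > 0]
same_as_where = selected.equals(df.where(df > 0))  # True

# Unary ~ inverts a boolean DataFrame element-wise.
inverted = ~(df > 0)

# DataFrame.dot also accepts a plain ndarray (here it sums each row).
product = df.dot(np.array([[1], [1]]))

# drop on a non-unique index removes every entry with that label.
s = pd.Series([1, 2, 3], index=["x", "x", "y"])
remaining = s.drop("x")
```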
Bug Fixes¶
- Fix some duplicate-column DataFrame constructor issues (GH2079)
- Fix bar plot color cycle issues (GH2082)
- Fix off-center grid for stacked bar plots (GH2157)
- Fix plotting bug if inferred frequency is offset with N > 1 (GH2126)
- Implement comparisons on date offsets with fixed delta (GH2078)
- Handle inf/-inf correctly in read_* parser functions (GH2041)
- Fix matplotlib unicode interaction bug
- Make WLS r-squared match statsmodels 0.5.0 fixed value
- Fix zero-trimming DataFrame formatting bug
- Correctly compute/box datetime64 min/max values from Series.min/max (GH2083)
- Fix unstacking edge case with unrepresented groups (GH2100)
- Fix Series.str failures when using pipe pattern ‘|’ (GH2119)
- Fix pretty-printing of dict entries in Series, DataFrame (GH2144)
- Cast other datetime64 values to nanoseconds in DataFrame ctor (GH2095)
- Alias Timestamp.astimezone to tz_convert, so will yield Timestamp (GH2060)
- Fix timedelta64 formatting from Series (GH2165, GH2146)
- Handle None values gracefully in dict passed to Panel constructor (GH2075)
- Box datetime64 values as Timestamp objects in Series/DataFrame.iget (GH2148)
- Fix Timestamp indexing bug in DatetimeIndex.insert (GH2155)
- Use index name(s) (if any) in DataFrame.to_records (GH2161)
- Don’t lose index names in Panel.to_frame/DataFrame.to_panel (GH2163)
- Work around length-0 boolean indexing NumPy bug (GH2096)
- Fix partial integer indexing bug in DataFrame.xs (GH2107)
- Fix variety of cut/qcut string-bin formatting bugs (GH1978, GH1979)
- Raise Exception when xs view not possible of MultiIndex’d DataFrame (GH2117)
- Fix groupby(...).first() issue with datetime64 (GH2133)
- Better floating point error robustness in some rolling_* functions (GH2114, GH2527)
- Fix ewma NA handling in the middle of Series (GH2128)
- Fix numerical precision issues in diff with integer data (GH2087)
- Fix bug in MultiIndex.__getitem__ with NA values (GH2008)
- Fix DataFrame.from_records dict-arg bug when passing columns (GH2179)
- Fix Series and DataFrame.diff for integer dtypes (GH2087, GH2174)
- Fix bug when taking intersection of DatetimeIndex with empty index (GH2129)
- Pass through timezone information when calling DataFrame.align (GH2127)
- Properly sort when joining on datetime64 values (GH2196)
- Fix indexing bug in which False/True were being coerced to 0/1 (GH2199)
- Many unicode formatting fixes (GH2201)
- Fix improper MultiIndex conversion issue when assigning e.g. DataFrame.index (GH2200)
- Fix conversion of mixed-type DataFrame to ndarray with dup columns (GH2236)
- Fix duplicate columns issue (GH2218, GH2219)
- Fix SparseSeries.__pow__ issue with NA input (GH2220)
- Fix icol with integer sequence failure (GH2228)
- Fixed resampling tz-aware time series issue (GH2245)
- SparseDataFrame.icol was not returning SparseSeries (GH2227, GH2229)
- Enable ExcelWriter to handle PeriodIndex (GH2240)
- Fix issue constructing DataFrame from empty Series with name (GH2234)
- Use console-width detection in interactive sessions only (GH1610)
- Fix parallel_coordinates legend bug with mpl 1.2.0 (GH2237)
- Make tz_localize work in corner case of empty Series (GH2248)
pandas 0.9.0¶
Release date: 2012-10-07
New Features¶
- Add str.encode and str.decode to Series (GH1706)
- Add to_latex method to DataFrame (GH1735)
- Add convenient expanding window equivalents of all rolling_* ops (GH1785)
- Add Options class to pandas.io.data for fetching options data from Yahoo! Finance (GH1748, GH1739)
- Recognize and convert more boolean values in file parsing (Yes, No, TRUE, FALSE, variants thereof) (GH1691, GH1295)
- Add Panel.update method, analogous to DataFrame.update (GH1999, GH1988)
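A short sketch of two of these features with a recent pandas release (assumed). In that release the expanding-window operations are reached through the .expanding() method rather than the original expanding_* functions, so this shows the modern equivalent, not the 0.9.0 spelling.

```python
import pandas as pd

# str.encode / str.decode round-trip a Series of text through bytes.
s = pd.Series(["café", "naïve"])
encoded = s.str.encode("utf-8")
decoded = encoded.str.decode("utf-8")

# Expanding windows grow from the start of the series.
x = pd.Series([1.0, 2.0, 3.0, 4.0])
running_sum = x.expanding().sum()                  # 1, 3, 6, 10
running_mean = x.expanding(min_periods=2).mean()   # NaN, then 1.5, 2.0, 2.5
```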
Improvements to existing features¶
- Proper handling of NA values in merge operations (GH1990)
- Add flags option for re.compile in some Series.str methods (GH1659)
- Parsing of UTC date strings in read_* functions (GH1693)
- Handle generator input to Series (GH1679)
- Add na_action=’ignore’ to Series.map to quietly propagate NAs (GH1661)
- Add args/kwds options to Series.apply (GH1829)
- Add inplace option to Series/DataFrame.reset_index (GH1797)
- Add level parameter to Series.reset_index
- Add quoting option for DataFrame.to_csv (GH1902)
- Indicate long column value truncation in DataFrame output with ... (GH1854)
- DataFrame.dot will not do data alignment, and also work with Series (GH1915)
- Add na option for missing data handling in some vectorized string methods (GH1689)
- If index_label=False in DataFrame.to_csv, do not print fields/commas in the text output. Results in easier importing into R (GH1583)
- Can pass tuple/list of axes to DataFrame.dropna to simplify repeated calls (dropping both columns and rows) (GH924)
- Improve DataFrame.to_html output for hierarchically-indexed rows (do not repeat levels) (GH1929)
- TimeSeries.between_time can now select times across midnight (GH1871)
- Enable skip_footer parameter in ExcelFile.parse (GH1843)
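A minimal sketch of some of these additions, assuming a recent pandas release. The helper scale is hypothetical and exists only for illustration. The example covers Series.map with na_action='ignore', extra positional and keyword arguments passed through Series.apply, and Series.reset_index with a level.

```python
import numpy as np
import pandas as pd

# na_action="ignore" leaves NaN untouched instead of passing it to the function.
s = pd.Series(["a", np.nan, "c"])
upper = s.map(lambda v: v.upper(), na_action="ignore")


# Hypothetical helper: extra arguments arrive via args=... and keyword arguments.
def scale(v, factor, offset=0):
    return v * factor + offset


scaled = pd.Series([1, 2, 3]).apply(scale, args=(10,), offset=1)

# reset_index(level=...) moves only the named level into a column.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["k", "n"])
ser = pd.Series([10, 20], index=mi, name="val")
out = ser.reset_index(level="n")
```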
API Changes¶
- Change default header names in read_* functions to more Pythonic X0, X1, etc. instead of X.1, X.2. (GH2000)
- Deprecated day_of_year API removed from PeriodIndex, use dayofyear (GH1723)
- Don’t modify NumPy suppress printoption at import time
- The internal HDF5 data arrangement for DataFrames has been transposed. Legacy files will still be readable by HDFStore (GH1834, GH1824)
- Legacy cruft removed: pandas.stats.misc.quantileTS
- Use ISO8601 format for Period repr: monthly, daily, and on down (GH1776)
- Empty DataFrame columns are now created as object dtype. This will prevent a class of TypeErrors that was occurring in code where the dtype of a column would depend on the presence of data or not (e.g. a SQL query having results) (GH1783)
- Setting parts of DataFrame/Panel using ix now aligns input Series/DataFrame (GH1630)
- first and last methods in GroupBy no longer drop non-numeric columns (GH1809)
- Resolved inconsistencies in specifying custom NA values in text parser. na_values of type dict no longer override default NAs unless keep_default_na is set to false explicitly (GH1657)
- Enable skipfooter parameter in text parsers as an alias for skip_footer
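The NA-handling rule for dict-valued na_values, and the object dtype of empty columns, sketched with a recent pandas release (assumed) and a made-up CSV snippet. Per-column sentinels are added to the defaults unless keep_default_na=False is passed.

```python
import io

import pandas as pd

csv = "a,b\nNA,x\n-1,y\n"

# Default: "-1" is NA for column a, and the built-in sentinels such as "NA" still apply.
both = pd.read_csv(io.StringIO(csv), na_values={"a": ["-1"]})

# keep_default_na=False: only the explicitly listed values are treated as NA.
only_custom = pd.read_csv(io.StringIO(csv), na_values={"a": ["-1"]},
                          keep_default_na=False)

# Columns of an empty DataFrame are created with object dtype.
empty = pd.DataFrame(columns=["x", "y"])
```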
Bug Fixes¶
- Perform arithmetic column-by-column in mixed-type DataFrame to avoid type upcasting issues. Caused downstream DataFrame.diff bug (GH1896)
- Fix matplotlib auto-color assignment when no custom spectrum passed. Also respect passed color keyword argument (GH1711)
- Fix resampling logical error with closed=’left’ (GH1726)
- Fix critical DatetimeIndex.union bugs (GH1730, GH1719, GH1745, GH1702, GH1753)
- Fix critical DatetimeIndex.intersection bug with unanchored offsets (GH1708)
- Fix MM-YYYY time series indexing case (GH1672)
- Fix case where Categorical group key was not being passed into index in GroupBy result (GH1701)
- Handle Ellipsis in Series.__getitem__/__setitem__ (GH1721)
- Fix some bugs with handling datetime64 scalars of other units in NumPy 1.6 and 1.7 (GH1717)
- Fix performance issue in MultiIndex.format (GH1746)
- Fixed GroupBy bugs interacting with DatetimeIndex asof / map methods (GH1677)
- Handle factors with NAs in pandas.rpy (GH1615)
- Fix statsmodels import in pandas.stats.var (GH1734)
- Fix DataFrame repr/info summary with non-unique columns (GH1700)
- Fix Series.iget_value for non-unique indexes (GH1694)
- Don’t lose tzinfo when passing DatetimeIndex as DataFrame column (GH1682)
- Fix tz conversion with time zones that haven’t had any DST transitions since first date in the array (GH1673)
- Fix field access with UTC->local conversion on unsorted arrays (GH1756)
- Fix isnull handling of array-like (list) inputs (GH1755)
- Fix regression in handling of Series in Series constructor (GH1671)
- Fix comparison of Int64Index with DatetimeIndex (GH1681)
- Fix min_periods handling in new rolling_max/min at array start (GH1695)
- Fix errors with how=’median’ and generic NumPy resampling in some cases caused by SeriesBinGrouper (GH1648, GH1688)
- When grouping by level, exclude unobserved levels (GH1697)
- Don’t lose tzinfo in DatetimeIndex when shifting by different offset (GH1683)
- Hack to support storing data with a zero-length axis in HDFStore (GH1707)
- Fix DatetimeIndex tz-aware range generation issue (GH1674)
- Fix method=’time’ interpolation with intraday data (GH1698)
- Don’t plot all-NA DataFrame columns as zeros (GH1696)
- Fix bug in scatter_plot with by option (GH1716)
- Fix performance problem in infer_freq with lots of non-unique stamps (GH1686)
- Fix handling of PeriodIndex as argument to create MultiIndex (GH1705)
- Fix re: unicode MultiIndex level names in Series/DataFrame repr (GH1736)
- Handle PeriodIndex in to_datetime instance method (GH1703)
- Support StaticTzInfo in DatetimeIndex infrastructure (GH1692)
- Allow MultiIndex setops with length-0 other type indexes (GH1727)
- Fix handling of DatetimeIndex in DataFrame.to_records (GH1720)
- Fix handling of general objects in isnull on which bool(...) fails (GH1749)
- Fix .ix indexing with MultiIndex ambiguity (GH1678)
- Fix .ix setting logic error with non-unique MultiIndex (GH1750)
- Basic indexing now works on MultiIndex with > 1000000 elements, regression from earlier version of pandas (GH1757)
- Handle non-float64 dtypes in fast DataFrame.corr/cov code paths (GH1761)
- Fix DatetimeIndex.isin to function properly (GH1763)
- Fix conversion of array of tz-aware datetime.datetime to DatetimeIndex with right time zone (GH1777)
- Fix DST issues with generating anchored date ranges (GH1778)
- Fix issue calling sort on result of Series.unique (GH1807)
- Fix numerical issue leading to square root of negative number in rolling_std (GH1840)
- Let Series.str.split accept no arguments (like str.split) (GH1859)
- Allow user to have dateutil 2.1 installed on a Python 2 system (GH1851)
- Catch ImportError less aggressively in pandas/__init__.py (GH1845)
- Fix pip source installation bug when installing from GitHub (GH1805)
- Fix error when window size > array size in rolling_apply (GH1850)
- Fix pip source installation issues via SSH from GitHub
- Fix OLS.summary when column is a tuple (GH1837)
- Fix bug in __doc__ patching when -OO passed to interpreter (GH1792, GH1741, GH1774)
- Fix unicode console encoding issue in IPython notebook (GH1782, GH1768)
- Fix unicode formatting issue with Series.name (GH1782)
- Fix bug in DataFrame.duplicated with datetime64 columns (GH1833)
- Fix bug in Panel internals resulting in error when doing fillna after truncate not changing size of panel (GH1823)
- Prevent segfault due to MultiIndex not being supported in HDFStore table format (GH1848)
- Fix UnboundLocalError in Panel.__setitem__ and add better error (GH1826)
- Fix to_csv issues with list of string entries. Isnull works on list of strings now too (GH1791)
- Fix Timestamp comparisons with datetime values outside the nanosecond range (1677-2262)
- Revert to prior behavior of normalize_date with datetime.date objects (return datetime)
- Fix broken interaction between np.nansum and Series.any/all
- Fix bug with multiple column date parsers (GH1866)
- DatetimeIndex.union(Int64Index) was broken
- Make plot x vs y interface consistent with integer indexing (GH1842)
- set_index inplace modified data even if unique check fails (GH1831)
- Only use Q-OCT/NOV/DEC in quarterly frequency inference (GH1789)
- Upcast to dtype=object when unstacking boolean DataFrame (GH1820)
- Fix float64/float32 merging bug (GH1849)
- Fixes to Period.start_time for non-daily frequencies (GH1857)
- Fix failure when converter used on index_col in read_csv (GH1835)
- Implement PeriodIndex.append so that pandas.concat works correctly (GH1815)
- Avoid Cython out-of-bounds access causing segfault sometimes in pad_2d, backfill_2d
- Fix resampling error with intraday times and anchored target time (like AS-DEC) (GH1772)
- Fix .ix indexing bugs with mixed-integer indexes (GH1799)
- Respect passed color keyword argument in Series.plot (GH1890)
- Fix rolling_min/max when the window is larger than the size of the input array. Check other malformed inputs (GH1899, GH1897)
- Fix rolling variance / standard deviation with only a single observation in window (GH1884)
- Fix unicode sheet name failure in to_excel (GH1828)
- Override DatetimeIndex.min/max to return Timestamp objects (GH1895)
- Fix column name formatting issue in length-truncated column (GH1906)
- Fix broken handling of copying Index metadata to new instances created by view(...) calls inside the NumPy infrastructure
- Support datetime.date again in DateOffset.rollback/rollforward
- Raise Exception if set passed to Series constructor (GH1913)
- Add TypeError when appending HDFStore table w/ wrong index type (GH1881)
- Don’t raise exception on empty inputs in EW functions (e.g. ewma) (GH1900)
- Make asof work correctly with PeriodIndex (GH1883)
- Fix extlinks in doc build
- Fill boolean DataFrame with NaN when calling shift (GH1814)
- Fix setuptools bug causing pip not to Cythonize .pyx files sometimes
- Fix negative integer indexing regression in .ix from 0.7.x (GH1888)
- Fix error while retrieving timezone and utc offset from subclasses of datetime.tzinfo without .zone and ._utcoffset attributes (GH1922)
- Fix DataFrame formatting of small, non-zero FP numbers (GH1911)
- Various fixes by upcasting of date -> datetime (GH1395)
- Raise better exception when passing multiple functions with the same name, such as lambdas, to GroupBy.aggregate
- Fix DataFrame.apply with axis=1 on a non-unique index (GH1878)
- Proper handling of Index subclasses in pandas.unique (GH1759)
- Set index names in DataFrame.from_records (GH1744)
- Fix time series indexing error with duplicates, under and over hash table size cutoff (GH1821)
- Handle list keys in addition to tuples in DataFrame.xs when partial-indexing a hierarchically-indexed DataFrame (GH1796)
- Support multiple column selection in DataFrame.__getitem__ with duplicate columns (GH1943)
- Fix time zone localization bug causing improper fields (e.g. hours) in time zones that have not had a UTC transition in a long time (GH1946)
- Fix errors when parsing and working with fixed offset timezones (GH1922, GH1928)
- Fix text parser bug when handling UTC datetime objects generated by dateutil (GH1693)
- Fix plotting bug when ‘B’ is the inferred frequency but index actually contains weekends (GH1668, GH1669)
- Fix plot styling bugs (GH1666, GH1665, GH1658)
- Fix plotting bug with index/columns with unicode (GH1685)
- Fix DataFrame constructor bug when passed Series with datetime64 dtype in a dict (GH1680)
- Fixed regression in generating DatetimeIndex using timezone aware datetime.datetime (GH1676)
- Fix DataFrame bug when printing concatenated DataFrames with duplicated columns (GH1675)
- Fixed bug when plotting time series with multiple intraday frequencies (GH1732)
- Fix bug in DataFrame.duplicated to enable iterables other than list-types as input argument (GH1773)
- Fix resample bug when passed list of lambdas as how argument (GH1808)
- Repr fix for MultiIndex level with all NAs (GH1971)
- Fix PeriodIndex slicing bug when slice start/end are out-of-bounds (GH1977)
- Fix read_table bug when parsing unicode (GH1975)
- Fix BlockManager.iget bug when dealing with non-unique MultiIndex as columns (GH1970)
- Fix reset_index bug if both drop and level are specified (GH1957)
- Work around unsafe NumPy object->int casting with Cython function (GH1987)
- Fix datetime64 formatting bug in DataFrame.to_csv (GH1993)
- Default start date in pandas.io.data to 1/1/2000 as the docs say (GH2011)
pandas 0.8.1
Release date: July 22, 2012
New Features
- Add vectorized, NA-friendly string methods to Series (GH1621, GH620)
- Can pass dict of per-column line styles to DataFrame.plot (GH1559)
- Selective plotting to secondary y-axis on same subplot (GH1640)
- Add new bootstrap_plot plot function
- Add new parallel_coordinates plot function (GH1488)
- Add radviz plot function (GH1566)
- Add multi_sparse option to set_printoptions to modify display of hierarchical indexes (GH1538)
- Add dropna method to Panel (GH171)
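A minimal sketch of the NA-friendly .str methods (GH1621), including the na argument later added to some of them (GH1689). It assumes a current pandas release; the sample strings are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series(["apple", np.nan, "Banana"])

# Element-wise string methods propagate the missing value instead of raising.
upper = s.str.upper()

# na=False fills the result for missing entries, giving a clean boolean mask.
has_an = s.str.contains("an", na=False)
```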
Improvements to existing features
- Use moving min/max algorithms from Bottleneck in rolling_min/rolling_max for > 100x speedup. (GH1504, GH50)
- Add Cython group median method for >15x speedup (GH1358)
- Drastically improve to_datetime performance on ISO8601 datetime strings (with no time zones) (GH1571)
- Improve single-key groupby performance on large data sets, accelerate use of groupby with a Categorical variable
- Add ability to append hierarchical index levels with set_index and to drop single levels with reset_index (GH1569, GH1577)
- Always apply passed functions in resample, even if upsampling (GH1596)
- Avoid unnecessary copies in DataFrame constructor with explicit dtype (GH1572)
- Cleaner DatetimeIndex string representation with 1 or 2 elements (GH1611)
- Improve performance of array-of-Period to PeriodIndex, convert such arrays to PeriodIndex inside Index (GH1215)
- More informative string representation for weekly Period objects (GH1503)
- Accelerate 3-axis multi data selection from homogeneous Panel (GH979)
- Add adjust option to ewma to disable adjustment factor (GH1584)
- Add new matplotlib converters for high frequency time series plotting (GH1599)
- Handling of tz-aware datetime.datetime objects in to_datetime; raise Exception unless utc=True given (GH1581)
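Appending an index level with set_index and dropping a single level with reset_index (GH1569, GH1577) look roughly like this in current pandas; the column names are placeholders:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b"], "sub": [1, 2], "val": [10.0, 20.0]})

# append=True adds "sub" as a new level next to the existing "key" index
# instead of replacing it.
indexed = df.set_index("key").set_index("sub", append=True)

# reset_index(level=...) moves only the named level back into a column.
partial = indexed.reset_index(level="sub")
```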
Bug Fixes
- Fix NA handling in DataFrame.to_panel (GH1582)
- Handle TypeError issues inside PyObject_RichCompareBool calls in khash (GH1318)
- Fix resampling bug to lower case daily frequency (GH1588)
- Fix kendall/spearman DataFrame.corr bug with no overlap (GH1595)
- Fix bug in DataFrame.set_index (GH1592)
- Don’t ignore axes in boxplot if by specified (GH1565)
- Fix Panel .ix indexing with integers bug (GH1603)
- Fix Partial indexing bugs (years, months, ...) with PeriodIndex (GH1601)
- Fix MultiIndex console formatting issue (GH1606)
- Unordered index with duplicates doesn’t yield scalar location for single entry (GH1586)
- Fix resampling of tz-aware time series with “anchored” freq (GH1591)
- Fix DataFrame.rank error on integer data (GH1589)
- Selection of multiple SparseDataFrame columns by list in __getitem__ (GH1585)
- Override Index.tolist for compatibility with MultiIndex (GH1576)
- Fix hierarchical summing bug with MultiIndex of length 1 (GH1568)
- Work around numpy.concatenate use/bug in Series.set_value (GH1561)
- Ensure Series/DataFrame are sorted before resampling (GH1580)
- Fix unhandled IndexError when indexing very large time series (GH1562)
- Fix DatetimeIndex intersection logic error with irregular indexes (GH1551)
- Fix unit test errors on Python 3 (GH1550)
- Fix .ix indexing bugs in duplicate DataFrame index (GH1201)
- Better handle errors with non-existing objects in HDFStore (GH1254)
- Don’t copy int64 array data in DatetimeIndex when copy=False (GH1624)
- Fix resampling of conforming periods quarterly to annual (GH1622)
- Don’t lose index name on resampling (GH1631)
- Support python-dateutil version 2.1 (GH1637)
- Fix broken scatter_matrix axis labeling, esp. with time series (GH1625)
- Fix cases where extra keywords weren’t being passed on to matplotlib from Series.plot (GH1636)
- Fix BusinessMonthBegin logic for dates before 1st bday of month (GH1645)
- Ensure string alias converted (valid in DatetimeIndex.get_loc) in DataFrame.xs / __getitem__ (GH1644)
- Fix use of string alias timestamps with tz-aware time series (GH1647)
- Fix Series.max/min and Series.describe on len-0 series (GH1650)
- Handle None values in dict passed to concat (GH1649)
- Fix Series.interpolate with method=’values’ and DatetimeIndex (GH1646)
- Fix IndexError in left merges on a DataFrame with 0-length (GH1628)
- Fix DataFrame column width display with UTF-8 encoded characters (GH1620)
- Handle case in pandas.io.data.get_data_yahoo where Yahoo! returns duplicate dates for most recent business day
- Avoid downsampling when plotting mixed frequencies on the same subplot (GH1619)
- Fix read_csv bug when reading a single line (GH1553)
- Fix bug in C code causing monthly periods prior to December 1969 to be off (GH1570)
pandas 0.8.0
Release date: June 29, 2012
New Features
- New unified DatetimeIndex class for nanosecond-level timestamp data
- New Timestamp datetime.datetime subclass with easy time zone conversions, and support for nanoseconds
- New PeriodIndex class for timespans, calendar logic, and Period scalar object
- High performance resampling of timestamp and period data. New resample method of all pandas data structures
- New frequency names plus shortcut string aliases like ‘15h’, ‘1h30min’
- Time series string indexing shorthand (GH222)
- Add week, dayofyear array and other timestamp array-valued field accessor functions to DatetimeIndex
- Add GroupBy.prod optimized aggregation function and ‘prod’ fast time series conversion method (GH1018)
- Implement robust frequency inference function and inferred_freq attribute on DatetimeIndex (GH391)
- New tz_convert and tz_localize methods in Series / DataFrame
- Convert DatetimeIndexes to UTC if time zones are different in join/setops (GH864)
- Add limit argument for forward/backward filling to reindex, fillna, etc. (GH825 and others)
- Add support for indexes (dates or otherwise) with duplicates and common sense indexing/selection functionality
- Series/DataFrame.update methods, in-place variant of combine_first (GH961)
- Add match function to API (GH502)
- Add Cython-optimized first, last, min, max, prod functions to GroupBy (GH994, GH1043)
- Dates can be split across multiple columns (GH1227, GH1186)
- Add experimental support for converting pandas DataFrame to R data.frame via rpy2 (GH350, GH1212)
- Can pass list of (name, function) to GroupBy.aggregate to get aggregates in a particular order (GH610)
- Can pass dicts with lists of functions or dicts to GroupBy aggregate to do much more flexible multiple function aggregation (GH642, GH610)
- New ordered_merge functions for merging DataFrames with ordered data. Also supports group-wise merging for panel data (GH813)
- Add keys() method to DataFrame
- Add flexible replace method to Series and DataFrame for replacing values (GH929, GH1241)
- Add ‘kde’ plot kind for Series/DataFrame.plot (GH1059)
- More flexible multiple function aggregation with GroupBy
- Add pct_change function to Series/DataFrame
- Add option to interpolate by Index values in Series.interpolate (GH1206)
- Add max_colwidth option for DataFrame, defaulting to 50
- Conversion of DataFrame through rpy2 to R data.frame (GH1282, )
- Add keys() method on DataFrame (GH1240)
- Add new match function to API (similar to R) (GH502)
- Add dayfirst option to parsers (GH854)
- Add method argument to align method for forward/backward filling (GH216)
- Add Panel.transpose method for rearranging axes (GH695)
- Add new cut function (patterned after R) for discretizing data into equal range-length bins or arbitrary breaks of your choosing (GH415)
- Add new qcut for cutting with quantiles (GH1378)
- Add value_counts top level array method (GH1392)
- Added Andrews curves plot type (GH1325)
- Add lag plot (GH1440)
- Add autocorrelation_plot (GH1425)
- Add support for tox and Travis CI (GH1382)
- Add support for Categorical use in GroupBy (GH292)
- Add any and all methods to DataFrame (GH1416)
- Add secondary_y option to Series.plot
- Add experimental lreshape function for reshaping wide to long
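The difference between cut (explicit or equal-width bins) and qcut (quantile-based bins) from the list above is easiest to see side by side. This sketch assumes current pandas; the input values are arbitrary:

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# cut: bins from explicit edges, right-closed by default: (0, 4] and (4, 8].
by_range = pd.cut(values, bins=[0, 4, 8])

# qcut: edges taken from sample quantiles, here 4 equally populated groups;
# labels=False returns integer bin codes instead of intervals.
by_quartile = pd.qcut(values, q=4, labels=False)
```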
Improvements to existing features
- Switch to klib/khash-based hash tables in Index classes for better performance in many cases and lower memory footprint
- Shipping some functions from scipy.stats to reduce dependency, e.g. Series.describe and DataFrame.describe (GH1092)
- Can create MultiIndex by passing list of lists or list of arrays to Series, DataFrame constructor, etc. (GH831)
- Can pass arrays in addition to column names to DataFrame.set_index (GH402)
- Improve the speed of “square” reindexing of homogeneous DataFrame objects by significant margin (GH836)
- Handle more dtypes when passed MaskedArrays in DataFrame constructor (GH406)
- Improved performance of join operations on integer keys (GH682)
- Can pass multiple columns to GroupBy object, e.g. grouped[[col1, col2]] to only aggregate a subset of the value columns (GH383)
- Add histogram / kde plot options for scatter_matrix diagonals (GH1237)
- Add inplace option to Series/DataFrame.rename and sort_index, DataFrame.drop_duplicates (GH805, GH207)
- More helpful error message when nothing passed to Series.reindex (GH1267)
- Can mix array and scalars as dict-value inputs to DataFrame ctor (GH1329)
- Use DataFrame columns’ name for legend title in plots
- Preserve frequency in DatetimeIndex when possible in boolean indexing operations
- Promote datetime.date values in data alignment operations (GH867)
- Add order method to Index classes (GH1028)
- Avoid hash table creation in large monotonic hash table indexes (GH1160)
- Store time zones in HDFStore (GH1232)
- Enable storage of sparse data structures in HDFStore (GH85)
- Enable Series.asof to work with arrays of timestamp inputs
- Cython implementation of DataFrame.corr speeds up by > 100x (GH1349, GH1354)
- Exclude “nuisance” columns automatically in GroupBy.transform (GH1364)
- Support functions-as-strings in GroupBy.transform (GH1362)
- Use index name as xlabel/ylabel in plots (GH1415)
- Add convert_dtype option to Series.apply to be able to leave data as dtype=object (GH1414)
- Can specify all index level names in concat (GH1419)
- Add dialect keyword to parsers for quoting conventions (GH1363)
- Enable DataFrame[bool_DataFrame] += value (GH1366)
- Add retries argument to get_data_yahoo to try to prevent Yahoo! API 404s (GH826)
- Improve performance of reshaping by using O(N) categorical sorting
- Series names will be used for index of DataFrame if no index passed (GH1494)
- Header argument in DataFrame.to_csv can accept a list of column names to use instead of the object’s columns (GH921)
- Add raise_conflict argument to DataFrame.update (GH1526)
- Support file-like objects in ExcelFile (GH1529)
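Restricting a GroupBy aggregation to a subset of value columns (GH383) can be sketched as follows with the current API and toy data:

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["x", "x", "y"],
    "a": [1, 2, 3],
    "b": [10, 20, 30],
    "c": [100, 200, 300],
})

# Indexing the GroupBy object with a list of columns aggregates only those
# columns; "c" is left out of the result.
subset_sums = df.groupby("key")[["a", "b"]].sum()
```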
API Changes
- Rename pandas._tseries to pandas.lib
- Rename Factor to Categorical and add improvements. Numerous Categorical bug fixes
- Frequency name overhaul, WEEKDAY/EOM and rules with @ deprecated. get_legacy_offset_name backwards compatibility function added
- Raise ValueError in DataFrame.__nonzero__, so “if df” no longer works (GH1073)
- Change BDay (business day) to not normalize dates by default (GH506)
- Remove deprecated DataMatrix name
- Default merge suffixes for overlap now have underscores instead of periods to facilitate tab completion, etc. (GH1239)
- Deprecation of offset, time_rule, and timeRule parameters throughout codebase
- Series.append and DataFrame.append no longer check for duplicate indexes by default, add verify_integrity parameter (GH1394)
- Refactor Factor class, old constructor moved to Factor.from_array
- Modified internals of MultiIndex to use less memory (no longer represented as array of tuples) internally, speed up construction time and many methods which construct intermediate hierarchical indexes (GH1467)
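Two of the API changes above, the underscore merge suffixes (GH1239) and the ambiguous truth value of a DataFrame (GH1073), can be checked with this sketch. It assumes current pandas, where the exact exception message may differ from the 0.8.0 wording:

```python
import pandas as pd

left = pd.DataFrame({"k": [1, 2], "v": [10, 20]})
right = pd.DataFrame({"k": [1, 2], "v": [30, 40]})

# Overlapping non-key columns get the default "_x" / "_y" suffixes.
merged = pd.merge(left, right, on="k")

# bool(df) raises ValueError; state the intended test explicitly instead,
# e.g. with df.empty, df.any() or df.all().
try:
    bool(left)
    raised = False
except ValueError:
    raised = True
```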
Bug Fixes
- Fix OverflowError from storing pre-1970 dates in HDFStore by switching to datetime64 (GH179)
- Fix logical error with February leap year end in YearEnd offset
- Series([False, nan]) was getting cast to float64 (GH1074)
- Fix binary operations between boolean Series and object Series with booleans and NAs (GH1074, GH1079)
- Couldn’t assign whole array to column in mixed-type DataFrame via .ix (GH1142)
- Fix label slicing issues with float index values (GH1167)
- Fix segfault caused by empty groups passed to groupby (GH1048)
- Fix occasionally misbehaved reindexing in the presence of NaN labels (GH522)
- Fix imprecise logic causing weird Series results from .apply (GH1183)
- Unstack multiple levels in one shot, avoiding empty columns in some cases. Fix pivot table bug (GH1181)
- Fix formatting of MultiIndex on Series/DataFrame when index name coincides with label (GH1217)
- Handle Excel 2003 #N/A as NaN from xlrd (GH1213, GH1225)
- Fix timestamp locale-related deserialization issues with HDFStore by moving to datetime64 representation (GH1081, GH809)
- Fix DataFrame.duplicated/drop_duplicates NA value handling (GH557)
- Actually raise exceptions in fast reducer (GH1243)
- Fix various timezone-handling bugs from 0.7.3 (GH969)
- GroupBy on level=0 discarded index name (GH1313)
- Better error message with unmergeable DataFrames (GH1307)
- Series.__repr__ alignment fix with unicode index values (GH1279)
- Better error message if nothing passed to reindex (GH1267)
- More robust NA handling in DataFrame.drop_duplicates (GH557)
- Resolve locale-based and pre-epoch HDF5 timestamp deserialization issues (GH973, GH1081, GH179)
- Implement Series.repeat (GH1229)
- Fix indexing with namedtuple and other tuple subclasses (GH1026)
- Fix float64 slicing bug (GH1167)
- Parsing integers with commas (GH796)
- Fix groupby improper data type when group consists of one value (GH1065)
- Fix negative variance possibility in nanvar resulting from floating point error (GH1090)
- Consistently set name on groupby pieces (GH184)
- Treat dict return values as Series in GroupBy.apply (GH823)
- Respect column selection for DataFrame in GroupBy.transform (GH1365)
- Fix MultiIndex partial indexing bug (GH1352)
- Enable assignment of rows in mixed-type DataFrame via .ix (GH1432)
- Reset index mapping when grouping Series in Cython (GH1423)
- Fix outer/inner DataFrame.join with non-unique indexes (GH1421)
- Fix MultiIndex groupby bugs with empty lower levels (GH1401)
- Calling fillna with a Series will have same behavior as with dict (GH1486)
- SparseSeries reduction bug (GH1375)
- Fix unicode serialization issue in HDFStore (GH1361)
- Pass keywords to pyplot.boxplot in DataFrame.boxplot (GH1493)
- Bug fixes in MonthBegin (GH1483)
- Preserve MultiIndex names in drop (GH1513)
- Fix Panel DataFrame slice-assignment bug (GH1533)
- Don’t use locals() in read_* functions (GH1547)
pandas 0.7.3
Release date: April 12, 2012
New Features
- Support for non-unique indexes: indexing and selection, many-to-one and many-to-many joins (GH1306)
- Added fixed-width file reader, read_fwf (GH952)
- Add group_keys argument to groupby to not add group names to MultiIndex in result of apply (GH938)
- DataFrame can now accept non-integer label slicing (GH946). Previously only DataFrame.ix was able to do so.
- DataFrame.apply now retains name attributes on Series objects (GH983)
- Numeric DataFrame comparisons with non-numeric values now raise a proper TypeError (GH943). Previously they raised “PandasError: DataFrame constructor not properly called!”
- Add kurt methods to Series and DataFrame (GH964)
- Can pass dict of column -> list/set NA values for text parsers (GH754)
- Allow user-specified NA values in text parsers (GH754)
- Parsers check for openpyxl dependency and raise ImportError if not found (GH1007)
- New factory function to create HDFStore objects that can be used in a with statement so users do not have to explicitly call HDFStore.close (GH1005)
- pivot_table is now more flexible with same parameters as groupby (GH941)
- Added stacked bar plots (GH987)
- scatter_matrix method in pandas/tools/plotting.py (GH935)
- DataFrame.boxplot returns plot results for ex-post styling (GH985)
- Short version number accessible as pandas.version.short_version (GH930)
- Additional documentation in panel.to_frame (GH942)
- More informative Series.apply docstring regarding element-wise apply (GH977)
- Notes on rpy2 installation (GH1006)
- Add rotation and font size options to hist method (GH1012)
- Use exogenous / X variable index in result of OLS.y_predict. Add OLS.predict method (GH1027, GH1008)
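A minimal read_fwf sketch for the fixed-width reader added above (GH952), assuming current pandas; the column widths and sample text are invented for illustration:

```python
import io
import pandas as pd

# Two fixed-width fields of 5 characters each; padding spaces are stripped.
text = (
    "id   val  \n"
    "1    10.5 \n"
    "2    20.0 \n"
)

# widths gives the field lengths; the first line is used as the header.
frame = pd.read_fwf(io.StringIO(text), widths=[5, 5])
```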
API Changes
Bug Fixes
- Fix logic error when selecting part of a row in a DataFrame with a MultiIndex index (GH1013)
- Series comparison with Series of differing length causes crash (GH1016).
- Fix bug in indexing when selecting section of hierarchically-indexed row (GH1013)
- DataFrame.plot(logy=True) has no effect (GH1011).
- Broken arithmetic operations between SparsePanel-Panel (GH1015)
- Unicode repr issues in MultiIndex with non-ASCII characters (GH1010)
- DataFrame.lookup() returns inconsistent results if exact match not present (GH1001)
- DataFrame arithmetic operations not treating None as NA (GH992)
- DataFrameGroupBy.apply returns incorrect result (GH991)
- Series.reshape returns incorrect result for multiple dimensions (GH989)
- Series.std and Series.var ignore ddof parameter (GH934)
- DataFrame.append loses index names (GH980)
- DataFrame.plot(kind=’bar’) ignores color argument (GH958)
- Inconsistent Index comparison results (GH948)
- Improper int dtype DataFrame construction from data with NaN (GH846)
- Removes default ‘result’ name in groupby results (GH995)
- DataFrame.from_records no longer mutates input columns (GH975)
- Use Index name when grouping by it (GH1313)
pandas 0.7.2
Release date: March 16, 2012
New Features
API Changes
- Series.sum returns 0 instead of NA when called on an empty series. Analogously for a DataFrame whose rows or columns are length 0 (GH844)
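The empty-sum rule above (GH844) still holds in current pandas, where it is controlled by the min_count argument (a later addition, not available in 0.7.2). A quick check:

```python
import pandas as pd

empty = pd.Series([], dtype="float64")

# Sum of an empty Series is 0 by default (min_count=0).
total = empty.sum()

# The same holds column-wise for a DataFrame with no rows.
col_totals = pd.DataFrame({"a": pd.Series([], dtype="float64")}).sum()

# Requiring at least one valid value restores an NA result.
strict = empty.sum(min_count=1)
```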
Improvements to existing features
- Don’t use groups dict in Grouper.size (GH860)
- Use khash for Series.value_counts, add raw function to algorithms.py (GH861)
- Enable column access via attributes on GroupBy (GH882)
- Enable setting existing columns (only) via attributes on DataFrame, Panel (GH883)
- Intercept __builtin__.sum in groupby (GH885)
- Can pass dict to DataFrame.fillna to use different values per column (GH661)
- Can select multiple hierarchical groups by passing list of values in .ix (GH134)
- Add level keyword to drop for dropping values from a level (GH159)
- Add coerce_float option on DataFrame.from_records (GH893)
- Raise exception if passed date_parser fails in read_csv
- Add axis option to DataFrame.fillna (GH174)
- Fixes to Panel to make it easier to subclass (GH888)
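Per-column fill values in fillna (GH661) and dropping labels from one level with drop (GH159), sketched with current pandas and toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

# A dict maps each column to its own fill value.
filled = df.fillna({"a": 0.0, "b": -1.0})

mi = pd.MultiIndex.from_tuples([("x", 1), ("x", 2), ("y", 1)])
s = pd.Series([1, 2, 3], index=mi)

# level=1 drops every entry whose second-level label is 1.
dropped = s.drop(1, level=1)
```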
Bug Fixes
- Fix overflow-related bugs in groupby (GH850, GH851)
- Fix unhelpful error message in parsers (GH856)
- Better error message for failed boolean slicing of dataframe (GH859)
- Series.count cannot accept a string (level name) in the level argument (GH869)
- Group index platform int check (GH870)
- concat on axis=1 and ignore_index=True raises TypeError (GH871)
- Further unicode handling issues resolved (GH795)
- Fix failure in multiindex-based access in Panel (GH880)
- Fix DataFrame boolean slice assignment failure (GH881)
- Fix combineAdd NotImplementedError for SparseDataFrame (GH887)
- Fix DataFrame.to_html encoding and columns (GH890, GH891, GH909)
- Fix na-filling handling in mixed-type DataFrame (GH910)
- Fix to DataFrame.set_value with non-existent row/col (GH911)
- Fix malformed block in groupby when excluding nuisance columns (GH916)
- Fix inconsistent NA handling in dtype=object arrays (GH925)
- Fix missing center-of-mass computation in ewmcov (GH862)
- Don’t raise exception when opening read-only HDF5 file (GH847)
- Fix possible out-of-bounds memory access in 0-length Series (GH917)
pandas 0.7.1
Release date: February 29, 2012
New Features
- Add to_clipboard function to pandas namespace for writing objects to the system clipboard (GH774)
- Add itertuples method to DataFrame for iterating through the rows of a dataframe as tuples (GH818)
- Add ability to pass fill_value and method to DataFrame and Series align method (GH806, GH807)
- Add fill_value option to reindex, align methods (GH784)
- Enable concat to produce DataFrame from Series (GH787)
- Add between method to Series (GH802)
- Add HTML representation hook to DataFrame for the IPython HTML notebook (GH773)
- Support for reading Excel 2007 XML documents using openpyxl
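A sketch of itertuples, Series.between and align with fill_value (GH818, GH802, GH806, GH784), assuming current pandas, where between is inclusive on both ends by default:

```python
import pandas as pd

df = pd.DataFrame({"label": ["a", "b"], "score": [1, 5]})

# One namedtuple per row; the index value is the first field, "Index".
rows = list(df.itertuples())

# Element-wise range test (inclusive bounds by default).
in_range = df["score"].between(2, 6)

# align reindexes both Series to the union of their labels; fill_value
# replaces the missing entries instead of NaN.
s1 = pd.Series([1, 2], index=["x", "y"])
s2 = pd.Series([3], index=["y"])
a1, a2 = s1.align(s2, fill_value=0)
```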
Improvements to existing features
- Improve performance and memory usage of fillna on DataFrame
- Can concatenate a list of Series along axis=1 to obtain a DataFrame (GH787)
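Concatenating a list of Series along axis=1 (GH787) produces one column per Series; a minimal sketch with current pandas:

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["r1", "r2"], name="first")
s2 = pd.Series([3, 4], index=["r1", "r2"], name="second")

# Each Series becomes a column, labeled by its name and aligned on the index.
frame = pd.concat([s1, s2], axis=1)
```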
Bug Fixes
- Fix memory leak when inserting large number of columns into a single DataFrame (GH790)
- Appending length-0 DataFrame with new columns would not result in those new columns being part of the resulting concatenated DataFrame (GH782)
- Fixed groupby corner case when passing dictionary grouper and as_index is False (GH819)
- Fixed bug whereby bool array sometimes had object dtype (GH820)
- Fix exception thrown on np.diff (GH816)
- Fix to_records where columns are non-strings (GH822)
- Fix Index.intersection where indices have incomparable types (GH811)
- Fix ExcelFile throwing an exception for two-line file (GH837)
- Add clearer error message in csv parser (GH835)
- Fix loss of fractional seconds in HDFStore (GH513)
- Fix DataFrame join where columns have datetimes (GH787)
- Work around numpy performance issue in take (GH817)
- Improve comparison operations for NA-friendliness (GH801)
- Fix indexing operation for floating point values (GH780, GH798)
- Fix groupby case resulting in malformed dataframe (GH814)
- Fix behavior of reindex of Series dropping name (GH812)
- Improve on redundant groupby computation (GH775)
- Catch possible NA assignment to int/bool series with exception (GH839)
pandas 0.7.0
Release date: February 9, 2012
New Features
- New merge function for efficiently performing full gamut of database / relational-algebra operations. Refactored existing join methods to use the new infrastructure, resulting in substantial performance gains (GH220, GH249, GH267)
- New concat function for concatenating DataFrame or Panel objects along an axis. Can form union or intersection of the other axes. Improves performance of DataFrame.append (GH468, GH479, GH273)
- Handle differently-indexed output values in DataFrame.apply (GH498)
- Can pass list of dicts (e.g., a list of shallow JSON objects) to DataFrame constructor (GH526)
- Add reorder_levels method to Series and DataFrame (GH534)
- Add dict-like get function to DataFrame and Panel (GH521)
- DataFrame.iterrows method for efficiently iterating through the rows of a DataFrame
- Added DataFrame.to_panel with code adapted from LongPanel.to_long
- reindex_axis method added to DataFrame
- Add level option to binary arithmetic functions on DataFrame and Series
- Add level option to the reindex and align methods on Series and DataFrame for broadcasting values across a level (GH542, GH552, others)
- Add attribute-based item access to Panel and add IPython completion (PR GH554)
- Add logy option to Series.plot for log-scaling on the Y axis
- Add index, header, and justify options to DataFrame.to_string. Add option to (GH570, GH571)
- Can pass multiple DataFrames to DataFrame.join to join on index (GH115)
- Can pass multiple Panels to Panel.join (GH115)
- Can pass multiple DataFrames to DataFrame.append to concatenate (stack) and multiple Series to Series.append too
- Added justify argument to DataFrame.to_string to allow different alignment of column headers
- Add sort option to GroupBy to allow disabling sorting of the group keys for potential speedups (GH595)
- Can pass MaskedArray to Series constructor (GH563)
- Add Panel item access via attributes and IPython completion (GH554)
- Implement DataFrame.lookup, a fancy-indexing analogue for retrieving values given a sequence of row and column labels (GH338)
- Add verbose option to read_csv and read_table to show number of NA values inserted in non-numeric columns (GH614)
- Can pass a list of dicts or Series to DataFrame.append to concatenate multiple rows (GH464)
- Add level argument to DataFrame.xs for selecting data from other MultiIndex levels. Can take one or more levels with potentially a tuple of keys for flexible retrieval of data (GH371, GH629)
- New crosstab function for easily computing frequency tables (GH170)
- Can pass a list of functions to aggregate with groupby on a DataFrame, yielding an aggregated result with hierarchical columns (GH166)
- Add integer-indexing functions iget in Series and irow / iget in DataFrame (GH628)
- Add new Series.unique function, significantly faster than numpy.unique (GH658)
- Add new cummin and cummax instance methods to Series and DataFrame (GH647)
- Add new value_range function to return min/max of a dataframe (GH288)
- Add drop parameter to reset_index method of DataFrame and added method to Series as well (GH699)
- Add isin method to Index objects, works just like Series.isin (GH657)
- Implement array interface on Panel so that ufuncs work (re: GH740)
- Add sort option to DataFrame.join (GH731)
- Improved handling of NAs (propagation) in binary operations with dtype=object arrays (GH737)
- Add abs method to Pandas objects
- Added algorithms module to start collecting central algos
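A minimal sketch of the new merge and concat functions, written against a recent pandas release (the 0.7.0 signatures may have differed in detail). The frames and column names are made up for illustration.

```python
# Assumes a recent pandas release (1.x/2.x), not 0.7.0
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "rval": [4, 5, 6]})

# Database-style join on a shared column; how="left" keeps every left row,
# so key "c" gets NaN in rval and key "d" is dropped
merged = pd.merge(left, right, on="key", how="left")

# Stack frames along the rows; join="inner" keeps only the shared columns
top = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
bottom = pd.DataFrame({"x": [5], "z": [6]})
stacked = pd.concat([top, bottom], join="inner", ignore_index=True)

print(merged)
print(stacked)
```

The join="inner" argument corresponds to the "intersection of the other axes" mentioned in the concat entry; the default join="outer" would keep the union of columns instead.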
API Changes¶
- Label-indexing with integer indexes now raises KeyError if a label is not found instead of falling back on location-based indexing (GH700)
- Label-based slicing via ix or [] on Series will now only work if exact matches for the labels are found or if the index is monotonic (for range selections)
- Label-based slicing and sequences of labels can be passed to [] on a Series for both getting and setting (GH86)
- [] operator (__getitem__ and __setitem__) will raise KeyError with integer indexes when a key is not contained in the index. The prior behavior would fall back on position-based indexing if a key was not found in the index, which could lead to subtle bugs. This is now consistent with the behavior of .ix on DataFrame and friends (GH328)
- Rename DataFrame.delevel to DataFrame.reset_index and add deprecation warning
- Series.sort (an in-place operation) called on a Series which is a view on a larger array (e.g. a column in a DataFrame) will generate an Exception to prevent accidentally modifying the data source (GH316)
- Refactor to remove deprecated LongPanel class (GH552)
- Deprecated Panel.to_long, renamed to to_frame
- Deprecated colSpace argument in DataFrame.to_string, renamed to col_space
- Rename precision to accuracy in engineering float formatter (GH395)
- The default delimiter for read_csv is comma rather than letting csv.Sniffer infer it
- Rename col_or_columns argument in DataFrame.drop_duplicates (GH734)
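The first API change above can be sketched as follows, assuming a recent pandas release. With an integer index, [] looks up labels only and raises KeyError for a missing label; position-based access goes through .iloc in current pandas, which stands in here for the ix/iget-era methods.

```python
# Assumes a recent pandas release (1.x/2.x), where .iloc is the positional indexer
import pandas as pd

# Integer labels that do not coincide with positions
s = pd.Series([10, 20, 30], index=[2, 4, 6])

by_label = s[2]          # label 2 -> 10
by_position = s.iloc[0]  # position 0 -> 10

try:
    s[0]                 # 0 is not a label, and there is no positional fallback
    raised = False
except KeyError:
    raised = True

print(by_label, by_position, raised)
```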
Improvements to existing features¶
- Better error message in DataFrame constructor when passed column labels don’t match data (GH497)
- Substantially improve performance of multi-GroupBy aggregation when a Python function is passed, reuse ndarray object in Cython (GH496)
- Can store objects indexed by tuples and floats in HDFStore (GH492)
- Don’t print length by default in Series.to_string, add length option (GH489)
- Improve Cython code for multi-groupby to aggregate without having to sort the data (GH93)
- Improve MultiIndex reindexing speed by storing tuples in the MultiIndex, test for backwards unpickling compatibility
- Improve column reindexing performance by using specialized Cython take function
- Further performance tweaking of Series.__getitem__ for standard use cases
- Avoid Index dict creation in some cases (e.g. when getting slices), regression from prior versions
- Friendlier error message in setup.py if NumPy not installed
- Use common set of NA-handling operations (sum, mean, etc.) in Panel class also (GH536)
- Default name assignment when calling reset_index on DataFrame with a regular (non-hierarchical) index (GH476)
- Use Cythonized groupers when possible in Series/DataFrame stat ops with level parameter passed (GH545)
- Ported skiplist data structure to C to speed up rolling_median by about 5-10x in most typical use cases (GH374)
- Some performance enhancements in constructing a Panel from a dict of DataFrame objects
- Made Index._get_duplicates a public method by removing the underscore
- Prettier printing of floats, and column spacing fix (GH395, GH571)
- Add bold_rows option to DataFrame.to_html (GH586)
- Improve the performance of DataFrame.sort_index by up to 5x or more when sorting by multiple columns
- Substantially improve performance of DataFrame and Series constructors when passed a nested dict or dict, respectively (GH540, GH621)
- Modified setup.py so that pip / setuptools will install dependencies (GH507, various pull requests)
- Unstack called on DataFrame with non-MultiIndex will return Series (GH477)
- Improve DataFrame.to_string and console formatting to be more consistent in the number of displayed digits (GH395)
- Use bottleneck if available for performing NaN-friendly statistical operations that it implements (GH91)
- Monkey-patch context to traceback in DataFrame.apply to indicate which row/column the function application failed on (GH614)
- Improved ability of read_table and read_clipboard to parse console-formatted DataFrames (can read the row of index names, etc.)
- Can pass list of group labels (without having to convert to an ndarray yourself) to groupby in some cases (GH659)
- Use kind argument to Series.order for selecting different sort kinds (GH668)
- Add option to Series.to_csv to omit the index (GH684)
- Add delimiter as an alternative to sep in read_csv and other parsing functions
- Substantially improved performance of groupby on DataFrames with many columns by aggregating blocks of columns all at once (GH745)
- Can pass a file handle or StringIO to Series/DataFrame.to_csv (GH765)
- Can pass sequence of integers to DataFrame.irow(icol) and Series.iget (GH654)
- Prototypes for some vectorized string functions
- Add float64 hash table to solve the Series.unique problem with NAs (GH714)
- Memoize objects when reading from file to reduce memory footprint
- Can get and set a column of a DataFrame with hierarchical columns containing “empty” (‘’) lower levels without passing the empty levels (PR GH768)
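Two of the improvements above (omitting the index in Series.to_csv, GH684, and writing to a StringIO, GH765) in a short sketch, assuming a recent pandas release. header=True is passed explicitly because the default for Series.to_csv has changed across versions.

```python
# Assumes a recent pandas release (1.x/2.x)
import io

import pandas as pd

s = pd.Series([1, 2, 3], name="value")

# Write to an in-memory buffer instead of a file path, omitting the index
buf = io.StringIO()
s.to_csv(buf, index=False, header=True)

# splitlines() avoids depending on the platform line terminator
print(buf.getvalue().splitlines())
```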
Bug Fixes¶
- Raise exception in out-of-bounds indexing of Series instead of seg-faulting, regression from earlier releases (GH495)
- Fix error when joining DataFrames of different dtypes within the same typeclass (e.g. float32 and float64) (GH486)
- Fix bug in Series.min/Series.max on objects like datetime.datetime (GH487)
- Preserve index names in Index.union (GH501)
- Fix bug in Index joining causing subclass information (like DateRange type) to be lost in some cases (GH500)
- Accept empty list as input to DataFrame constructor, regression from 0.6.0 (GH491)
- Can output DataFrame and Series with ndarray objects in a dtype=object array (GH490)
- Return empty string from Series.to_string when called on empty Series (GH488)
- Fix exception passing empty list to DataFrame.from_records
- Fix Index.format bug (excluding name field) with datetimes with time info
- Fix scalar value access in Series to always return NumPy scalars, regression from prior versions (GH510)
- Handle rows skipped at beginning of file in read_* functions (GH505)
- Handle improper dtype casting in set_value methods
- Unary ‘-’ / __neg__ operator on DataFrame was returning integer values
- Unbox 0-dim ndarrays from certain operators like all, any in Series
- Fix handling of missing columns (was combine_first-specific) in DataFrame.combine for general case (GH529)
- Fix type inference logic with boolean lists and arrays in DataFrame indexing
- Use centered sum of squares in R-square computation if entity_effects=True in panel regression
- Handle all NA case in Series.{corr, cov}, was raising exception (GH548)
- Aggregating by multiple levels with the level argument to DataFrame and Series stat methods was broken (GH545)
- Fix Cython bug when converter passed to read_csv produced a numeric array (buffer dtype mismatch when passed to Cython type inference function) (GH546)
- Fix exception when setting scalar value using .ix on a DataFrame with a MultiIndex (GH551)
- Fix outer join between two DateRanges with different offsets that returned an invalid DateRange
- Cleanup DataFrame.from_records failure where index argument is an integer
- Fix DataFrame.from_records failure when passed a dictionary
- Fix NA handling in {Series, DataFrame}.rank with non-floating point dtypes
- Fix bug related to integer type-checking in .ix-based indexing
- Handle non-string index name passed to DataFrame.from_records
- DataFrame.insert caused the columns name(s) field to be discarded (GH527)
- Fix DataFrame.to_string to remove extra column white space (GH571)
- Format floats to default to same number of digits (GH395)
- Added decorator to copy docstring from one function to another (GH449)
- Fix error in monotonic many-to-one left joins
- Fix __eq__ comparison between DateOffsets with different relativedelta keywords passed
- Fix exception caused by parser converter returning strings (GH583)
- Fix MultiIndex formatting bug with integer names (GH601)
- Fix bug in handling of non-numeric aggregates in Series.groupby (GH612)
- Fix TypeError with tuple subclasses (e.g. namedtuple) in DataFrame.from_records (GH611)
- Catch misreported console size when running IPython within Emacs
- Fix minor bug in pivot table margins, loss of index names and length-1 ‘All’ tuple in row labels
- Add support for legacy WidePanel objects to be read from HDFStore
- Fix out-of-bounds segfault in pad_object and backfill_object methods when either the source or target array is empty
- Could not create a new column in a DataFrame from a list of tuples
- Fix bugs preventing SparseDataFrame and SparseSeries working with groupby (GH666)
- Use sort kind in Series.sort / argsort (GH668)
- Fix DataFrame operations on non-scalar, non-pandas objects (GH672)
- Don’t convert DataFrame column to integer type when passing integer to __setitem__ (GH669)
- Fix downstream bug in pivot_table caused by integer level names in MultiIndex (GH678)
- Fix SparseSeries.combine_first when passed a dense Series (GH687)
- Fix performance regression in HDFStore loading when DataFrame or Panel stored in table format with datetimes
- Raise Exception in DateRange when offset with n=0 is passed (GH683)
- Fix get/set inconsistency with .ix property and integer location but non-integer index (GH707)
- Use right dropna function for SparseSeries. Return dense Series for NA fill value (GH730)
- Fix Index.format bug causing incorrectly string-formatted Series with datetime indexes (GH726, GH758)
- Fix errors caused by object dtype arrays passed to ols (GH759)
- Fix error where column names were lost when passing list of labels to DataFrame.__getitem__ (GH662)
- Fix error whereby top-level week iterator overwrote week instance
- Fix circular reference causing memory leak in sparse array / series / frame (GH663)
- Fix integer-slicing from integers-as-floats (GH670)
- Fix zero division errors in nanops from object dtype arrays in all NA case (GH676)
- Fix csv encoding when using unicode (GH705, GH717, GH738)
- Fix assumption that each object contains every unique block type in concat (GH708)
- Fix sortedness check of multiindex in to_panel (GH719, GH720)
- Fix None not being treated as NA in PyObjectHashtable
- Fix hashing dtype because of endianness confusion (GH747, GH748)
- Fix SparseSeries.dropna to return dense Series in case of NA fill value (GH730)
- Use map_infer instead of np.vectorize. Handle NA sentinels if converter yields numeric array (GH753)
- Fixes and improvements to DataFrame.rank (GH742)
- Fix catching AttributeError instead of NameError for bottleneck
- Try to cast non-MultiIndex to better dtype when calling reset_index (GH726, GH440)
- Fix ‘#1.QNAN0’ float bug on 2.6/win64
- Allow subclasses of dicts in DataFrame constructor, with tests
- Fix problem whereby set_index destroys column multiindex (GH764)
- Hack around bug in generating DateRange from naive DateOffset (GH770)
- Fix bug in DateRange.intersection causing incorrect results with some overlapping ranges (GH771)
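One of the fixes above (GH548, all-NA input to Series.corr and Series.cov) in a short sketch, assuming a recent pandas release: instead of raising, both statistics come back as NaN when there are no valid pairs.

```python
# Assumes a recent pandas release (1.x/2.x)
import numpy as np
import pandas as pd

all_na = pd.Series([np.nan, np.nan, np.nan])
other = pd.Series([1.0, 2.0, 3.0])

# No pair of non-NA values exists, so there is nothing to correlate
corr = all_na.corr(other)
cov = all_na.cov(other)
print(corr, cov)
```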
Thanks¶
- Craig Austin
- Chris Billington
- Marius Cobzarenco
- Mario Gamboa-Cavazos
- Hans-Martin Gaudecker
- Arthur Gerigk
- Yaroslav Halchenko
- Jeff Hammerbacher
- Matt Harrison
- Andreas Hilboll
- Luc Kesters
- Adam Klein
- Gregg Lind
- Solomon Negusse
- Wouter Overmeire
- Christian Prinoth
- Jeff Reback
- Sam Reckoner
- Craig Reeson
- Jan Schulz
- Skipper Seabold
- Ted Square
- Graham Taylor
- Aman Thakral
- Chris Uga
- Dieter Vandenbussche
- Texas P.
- Pinxing Ye
- ... and everyone I forgot
pandas 0.6.1¶
Release date: 12/13/2011
API Changes¶
- Rename names argument in DataFrame.from_records to columns. Add deprecation warning
- Boolean get/set operations on Series with boolean Series will reindex instead of requiring that the indexes be exactly equal (GH429)
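A sketch of the boolean-indexing change (GH429), assuming a recent pandas release: the mask below holds the same labels as the Series in a different order, and it is aligned by label before the selection is made.

```python
# Assumes a recent pandas release (1.x/2.x)
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Same labels as s, listed in a different order
mask = pd.Series([True, False, True], index=["c", "b", "a"])

# The mask is reindexed to s.index (a -> True, b -> False, c -> True)
selected = s[mask]
print(selected)
```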
New Features¶
- Can pass Series to DataFrame.append with ignore_index=True for appending a single row (GH430)
- Add Spearman and Kendall correlation options to Series.corr and DataFrame.corr (GH428)
- Add new get_value and set_value methods to Series, DataFrame, and Panel for very low-overhead access to scalar elements. df.get_value(row, column) is about 3x faster than df[column][row] by handling fewer cases (GH437, GH438). Add similar methods to sparse data structures for compatibility
- Add Qt table widget to sandbox (GH435)
- DataFrame.align can accept Series arguments, add axis keyword (GH461)
- Implement new SparseList and SparseArray data structures. SparseSeries now derives from SparseArray (GH463)
- max_columns / max_rows options in set_printoptions (GH453)
- Implement Series.rank and DataFrame.rank, fast versions of scipy.stats.rankdata (GH428)
- Implement DataFrame.from_items alternate constructor (GH444)
- DataFrame.convert_objects method for inferring better dtypes for object columns (GH302)
- Add rolling_corr_pairwise function for computing Panel of correlation matrices (GH189)
- Add margins option to pivot_table for computing subgroup aggregates (GH114)
- Add Series.from_csv function (GH482)
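The new correlation options and Series.rank can be sketched as follows, assuming a recent pandas release. The data is made up so that y rises monotonically but not linearly with x; method="kendall", the third option named in the note, is omitted here.

```python
# Assumes a recent pandas release (1.x/2.x)
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 100]})

# Pearson measures linear association; Spearman correlates the ranks
pearson = df.corr(method="pearson").loc["x", "y"]
spearman = df.corr(method="spearman").loc["x", "y"]

# rank() assigns ranks 1..n (ties would get the average rank by default)
ranks = df["y"].rank()

print(pearson, spearman)
print(ranks.tolist())
```

Because y is strictly increasing in x, the Spearman coefficient is 1 while the Pearson coefficient is below 1.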
Improvements to existing features¶
- Improve memory usage of DataFrame.describe (do not copy data unnecessarily) (GH425)
- Use same formatting function for outputting floating point Series to console as in DataFrame (GH420)
- DataFrame.delevel will try to infer better dtype for new columns (GH440)
- Exclude non-numeric types in DataFrame.{corr, cov}
- Override Index.astype to enable dtype casting (GH412)
- Use same float formatting function for Series.__repr__ (GH420)
- Use available console width to output DataFrame columns (GH453)
- Accept ndarrays when setting items in Panel (GH452)
- Infer console width when printing __repr__ of DataFrame to console (PR GH453)
- Optimize scalar value lookups in the general case by 25% or more in Series and DataFrame
- Can pass DataFrame/DataFrame and DataFrame/Series to rolling_corr/rolling_cov (GH462)
- Fix performance regression in cross-sectional count in DataFrame, affecting DataFrame.dropna speed
- Column deletion in DataFrame copies no data (computes views on blocks) (GH158)
- MultiIndex.get_level_values can take the level name
- More helpful error message when DataFrame.plot fails on one of the columns (GH478)
- Improve performance of DataFrame.{index, columns} attribute lookup
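A short sketch of MultiIndex.get_level_values accepting a level name, assuming a recent pandas release; the level names are made up.

```python
# Assumes a recent pandas release (1.x/2.x)
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["letter", "number"]
)

# Look up a level by its name or by its integer position
letters = mi.get_level_values("letter")
numbers = mi.get_level_values(1)
print(list(letters), list(numbers))
```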
Bug Fixes¶
- Fix O(K^2) memory leak caused by inserting many columns without consolidating, had been present since 0.4.0 (GH467)
- DataFrame.count should return Series with zero instead of NA with length-0 axis (GH423)
- Fix Yahoo! Finance API usage in pandas.io.data (GH419, GH427)
- Fix upstream bug causing failure in Series.align with empty Series (GH434)
- Function passed to DataFrame.apply can return a list, as long as it’s the right length. Regression from 0.4 (GH432)
- Don’t “accidentally” upcast scalar values when indexing using .ix (GH431)
- Fix groupby exception raised with as_index=False and single column selected (GH421)
- Implement DateOffset.__ne__, whose absence caused a downstream bug (GH456)
- Fix __doc__-related issue when converting py -> pyo with py2exe
- Bug fix in left join Cython code with duplicate monotonic labels
- Fix bug when unstacking multiple levels described in GH451
- Exclude NA values in dtype=object arrays, regression from 0.5.0 (GH469)
- Use Cython map_infer function in DataFrame.applymap to properly infer output type, handle tuple return values and other things that were breaking (GH465)
- Handle floating point index values in HDFStore (GH454)
- Fixed stale column reference bug (cached Series object) caused by type change / item deletion in DataFrame (GH473)
- Index.get_loc should always raise Exception when there are duplicates
- Handle differently-indexed Series input to DataFrame constructor (GH475)
- Omit nuisance columns in multi-groupby with Python function
- Buglet in handling of single grouping in general apply
- Handle type inference properly when passing list of lists or tuples to DataFrame constructor (GH484)
- Preserve Index / MultiIndex names in GroupBy.apply concatenation step (GH481)
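The DataFrame.count fix (GH423) in a sketch, assuming a recent pandas release: counting over a zero-length axis yields zeros rather than NA.

```python
# Assumes a recent pandas release (1.x/2.x)
import pandas as pd

# Two columns, no rows
empty = pd.DataFrame(columns=["a", "b"])

# Each column has zero non-NA values
counts = empty.count()
print(counts.tolist())
```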
Thanks¶
- Ralph Bean
- Luca Beltrame
- Marius Cobzarenco
- Andreas Hilboll
- Jev Kuznetsov
- Adam Lichtenstein
- Wouter Overmeire
- Fernando Perez
- Nathan Pinger
- Christian Prinoth
- Alex Reyfman
- Joon Ro
- Chang She
- Ted Square
- Chris Uga
- Dieter Vandenbussche
pandas 0.6.0¶
Release date: 11/25/2011
API Changes¶
- Arithmetic methods like sum will attempt to sum dtype=object values by default instead of excluding them (GH382)
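A sketch of this API change (GH382), assuming a recent pandas release: summing an object-dtype Series applies + to the stored Python objects, so strings are concatenated rather than excluded.

```python
# Assumes a recent pandas release (1.x/2.x); dtype=object is set explicitly
import pandas as pd

words = pd.Series(["spam", "and", "eggs"], dtype=object)

# Summing object values adds the Python objects themselves
total = words.sum()
print(total)
```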
New Features¶
- Add melt function to pandas.core.reshape
- Add level parameter to group by level in Series and DataFrame descriptive statistics (GH313)
- Add head and tail methods to Series, analogous to DataFrame (PR GH296)
- Add Series.isin function which checks if each value is contained in a passed sequence (GH289)
- Add float_format option to Series.to_string
- Add skip_footer (GH291) and converters (GH343) options to read_csv and read_table
- Add proper, tested weighted least squares to standard and panel OLS (GH303)
- Add drop_duplicates and duplicated functions for removing duplicate DataFrame rows and checking for duplicate rows, respectively (GH319)
- Implement logical (boolean) operators &, |, ^ on DataFrame (GH347)
- Add Series.mad, mean absolute deviation, matching DataFrame
- Add QuarterEnd DateOffset (GH321)
- Add matrix multiplication function dot to DataFrame (GH65)
- Add orient option to Panel.from_dict to ease creation of mixed-type Panels (GH359, GH301)
- Add DataFrame.from_dict with similar orient option
- Can now pass list of tuples or list of lists to DataFrame.from_records for fast conversion to DataFrame (GH357)
- Can pass multiple levels to groupby, e.g. df.groupby(level=[0, 1]) (GH103)
- Can sort by multiple columns in DataFrame.sort_index (GH92, GH362)
- Add fast get_value and put_value methods to DataFrame and micro-performance tweaks (GH360)
- Add cov instance methods to Series and DataFrame (GH194, GH362)
- Add bar plot option to DataFrame.plot (GH348)
- Add idxmin and idxmax functions to Series and DataFrame for computing index labels achieving maximum and minimum values (GH286)
- Add read_clipboard function for parsing DataFrame from OS clipboard, should work across platforms (GH300)
- Add nunique function to Series for counting unique elements (GH297)
- DataFrame constructor will use Series name if no columns passed (GH373)
- Support regular expressions and longer delimiters in read_table/read_csv, but does not handle quoted strings yet (GH364)
- Add DataFrame.to_html for formatting DataFrame to HTML (GH387)
- MaskedArray can be passed to DataFrame constructor and masked values will be converted to NaN (GH396)
- Add DataFrame.boxplot function (GH368, others)
- Can pass extra args, kwds to DataFrame.apply (GH376)
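Several of the new functions above in one sketch, assuming a recent pandas release (melt is called from the top-level pandas namespace rather than pandas.core.reshape). The frame is made up for illustration.

```python
# Assumes a recent pandas release (1.x/2.x)
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "x": [5, 7], "y": [6, 8]})

# melt: wide -> long, one row per (id, variable) pair
tall = pd.melt(wide, id_vars=["id"])

# isin: elementwise membership test against a passed sequence
in_set = tall["value"].isin([5, 8])

# idxmax returns the index label of the largest value; nunique counts distinct values
top_label = tall["value"].idxmax()
n_distinct = tall["variable"].nunique()

# drop_duplicates removes repeated rows
deduped = tall[["id"]].drop_duplicates()

print(tall)
```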
Improvements to existing features¶
- Raise more helpful exception if date parsing fails in DateRange (GH298)
- Vastly improved performance of GroupBy on axes with a MultiIndex (GH299)
- Print level names in hierarchical index in Series repr (GH305)
- Return DataFrame when performing GroupBy on selected column and as_index=False (GH308)
- Can pass vector to on argument in DataFrame.join (GH312)
- Don’t show Series name if it’s None in the repr, also omit length for short Series (GH317)
- Show legend by default in DataFrame.plot, add legend boolean flag (GH324)
- Significantly improved performance of Series.order, which also makes np.unique called on a Series faster (GH327)
- Faster cythonized count by level in Series and DataFrame (GH341)
- Raise exception if dateutil 2.0 installed on Python 2.x runtime (GH346)
- Significant GroupBy performance enhancement with multiple keys with many “empty” combinations
- New Cython vectorized function map_infer speeds up Series.apply and Series.map significantly when passed elementwise Python function, motivated by GH355
- Cythonized cache_readonly, resulting in substantial micro-performance enhancements throughout the codebase (GH361)
- Special Cython matrix iterator for applying arbitrary reduction operations with 3-5x better performance than np.apply_along_axis (GH309)
- Add raw option to DataFrame.apply for getting better performance when the passed function only requires an ndarray (GH309)
- Improve performance of MultiIndex.from_tuples
- Can pass multiple levels to stack and unstack (GH370)
- Can pass multiple values columns to pivot_table (GH381)
- Can call DataFrame.delevel with standard Index with name set (GH393)
- Use Series name in GroupBy for result index (GH363)
- Refactor Series/DataFrame stat methods to use common set of NaN-friendly functions
- Handle NumPy scalar integers at C level in Cython conversion routines
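A sketch of the raw option to DataFrame.apply (GH309), assuming a recent pandas release. The helper spread is hypothetical; it records the type of its argument to show that each column arrives as a plain ndarray.

```python
# Assumes a recent pandas release (1.x/2.x)
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 2], "b": [10, 4, 7]})

seen_types = []


def spread(arr):
    # Hypothetical reduction: with raw=True, arr is a NumPy ndarray, not a Series
    seen_types.append(type(arr).__name__)
    return arr.max() - arr.min()


ranges = df.apply(spread, raw=True)
print(ranges.tolist())
```

Skipping the per-column Series construction is where the performance gain described in the note comes from; the function must then rely only on ndarray behavior.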
Bug Fixes¶
- Fix bug in DataFrame.to_csv when writing a DataFrame with an index name (GH290)
- DataFrame should clear its Series caches on consolidation, was causing “stale” Series to be returned in some corner cases (GH304)
- DataFrame constructor failed if a column had a list of tuples (GH293)
- Ensure that Series.apply always returns a Series and implement Series.round (GH314)
- Support boolean columns in Cythonized groupby functions (GH315)
- DataFrame.describe should not fail if there are no numeric columns, instead return categorical describe (GH323)
- Fixed bug which could cause columns to be printed in wrong order in DataFrame.to_string if specific list of columns passed (GH325)
- Fix legend plotting failure if DataFrame columns are integers (GH326)
- Shift start date back by one month for Yahoo! Finance API in pandas.io.data (GH329)
- Fix DataFrame.join failure on unconsolidated inputs (GH331)
- DataFrame.min/max will no longer fail on mixed-type DataFrame (GH337)
- Fix read_csv / read_table failure when passing list to index_col that is not in ascending order (GH349)
- Fix failure passing Int64Index to Index.union when both are monotonic
- Fix error when passing SparseSeries to (dense) DataFrame constructor
- Added missing bang at top of setup.py (GH352)
- Change is_monotonic on MultiIndex so it properly compares the tuples
- Fix MultiIndex outer join logic (GH351)
- Set index name attribute with single-key groupby (GH358)
- Bug fix in reflexive binary addition in Series and DataFrame for non-commutative operations (like string concatenation) (GH353)
- setupegg.py will invoke Cython (GH192)
- Fix block consolidation bug after inserting column into MultiIndex (GH366)
- Fix bug in join operations between Index and Int64Index (GH367)
- Handle min_periods=0 case in moving window functions (GH365)
- Fixed corner cases in DataFrame.apply/pivot with empty DataFrame (GH378)
- Fixed repr exception when Series name is a tuple
- Always return DateRange from asfreq (GH390)
- Pass level names to swaplevel (GH379)
- Don’t lose index names in MultiIndex.droplevel (GH394)
- Infer a more appropriate return type in DataFrame.apply when there are no columns or rows, depending on whether the passed function is a reduction (GH389)
- Always return NA/NaN from Series.min/max and DataFrame.min/max when all of a row/column/values are NA (GH384)
- Enable partial setting with .ix / advanced indexing (GH397)
- Handle mixed-type DataFrames correctly in unstack, do not lose type information (GH403)
- Fix integer name formatting bug in Index.format and in Series.__repr__
- Handle label types other than string passed to groupby (GH405)
- Fix bug in .ix-based indexing with partial retrieval when a label is not contained in a level
- Index name was not being pickled (GH408)
- Level name should be passed to result index in GroupBy.apply (GH416)
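Two of the index-name fixes above (GH394 for MultiIndex.droplevel, GH379 for swaplevel) in a sketch, assuming a recent pandas release; the level names are made up.

```python
# Assumes a recent pandas release (1.x/2.x)
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("b", 2)], names=["letter", "number"]
)

# Dropping the outer level keeps the remaining level's name
remaining = mi.droplevel(0)

# Swapping levels carries the names along with the levels
swapped = mi.swaplevel(0, 1)

print(remaining.name, list(swapped.names))
```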
Thanks¶
- Craig Austin
- Marius Cobzarenco
- Joel Cross
- Jeff Hammerbacher
- Adam Klein
- Thomas Kluyver
- Jev Kuznetsov
- Kieran O’Mahony
- Wouter Overmeire
- Nathan Pinger
- Christian Prinoth
- Skipper Seabold
- Chang She
- Ted Square
- Aman Thakral
- Chris Uga
- Dieter Vandenbussche
- carljv
- rsamson
pandas 0.5.0¶
Release date: 10/24/2011
This release of pandas includes a number of API changes (see below) and cleanup of deprecated APIs from pre-0.4.0 releases. There are also bug fixes, new features, numerous significant performance enhancements, and a new IPython completer hook that enables tab completion of DataFrame column accesses and attributes.
In addition to the changes listed here from 0.4.3 to 0.5.0, the minor releases 0.4.1, 0.4.2, and 0.4.3 brought some significant new functionality and performance improvements that are worth taking a look at.
Thanks to all for bug reports, contributed patches and generally providing feedback on the library.
API Changes¶
- read_table, read_csv, and ExcelFile.parse now default to index_col=None. To use one or more of the columns as the resulting DataFrame’s index, they must now be specified explicitly
- Parsing functions like read_csv no longer parse dates by default (GH225)
- Removed weights option in panel regression which was not doing anything principled (GH155)
- Changed buffer argument name in Series.to_string to buf
- Series.to_string and DataFrame.to_string now return strings by default instead of printing to sys.stdout
- Deprecated nanRep argument in various to_string and to_csv functions in favor of na_rep. Will be removed in 0.6 (GH275)
- Renamed delimiter to sep in DataFrame.from_csv for consistency
- Changed order of Series.clip arguments to match those of numpy.clip and added (unimplemented) out argument so numpy.clip can be called on a Series (GH272)
- Series functions renamed (and thus deprecated) in the 0.4 series have been removed:
- asOf, use asof
- toDict, use to_dict
- toString, use to_string
- toCSV, use to_csv
- merge, use map
- applymap, use apply
- combineFirst, use combine_first
- _firstTimeWithValue, use first_valid_index
- _lastTimeWithValue, use last_valid_index
- DataFrame functions renamed / deprecated in 0.4 series have been removed:
- asMatrix method, use as_matrix or values attribute
- combineFirst, use combine_first
- getXS, use xs
- merge, use join
- fromRecords, use from_records
- fromcsv, use from_csv
- toRecords, use to_records
- toDict, use to_dict
- toString, use to_string
- toCSV, use to_csv
- _firstTimeWithValue, use first_valid_index
- _lastTimeWithValue, use last_valid_index
- toDataMatrix is no longer needed
- rows() method, use index attribute
- cols() method, use columns attribute
- dropEmptyRows(), use dropna(how='all')
- dropIncompleteRows(), use dropna()
- tapply(f), use apply(f, axis=1)
- tgroupby(keyfunc, aggfunc), use groupby with axis=1
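The replacements listed above for three of the removed DataFrame methods can be sketched as follows, assuming a recent pandas release; the data is made up.

```python
# Assumes a recent pandas release (1.x/2.x)
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [2.0, 5.0, np.nan]})

# Replacement for dropEmptyRows(): drop rows where every value is NA
non_empty = df.dropna(how="all")

# Replacement for dropIncompleteRows(): drop rows containing any NA
complete = df.dropna()

# Replacement for tapply(f): apply f to each row with axis=1
row_counts = df.apply(lambda row: row.count(), axis=1)

print(len(non_empty), len(complete), row_counts.tolist())
```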
Deprecations Removed¶
- indexField argument in DataFrame.from_records
- missingAtEnd argument in Series.order. Use na_last instead
- Series.fromValue classmethod, use regular Series constructor instead
- The parseCSV, parseText, and parseExcel functions in pandas.io.parsers have been removed
- Index.asOfDate function
- Panel.getMinorXS (use minor_xs) and Panel.getMajorXS (use major_xs)
- Panel.toWide, use Panel.to_wide instead
New Features¶
- Added DataFrame.align method with standard join options
- Added parse_dates option to read_csv and read_table methods to optionally try to parse dates in the index columns
- Add nrows, chunksize, and iterator arguments to read_csv and read_table. The last two return a new TextParser class capable of lazily iterating through chunks of a flat file (GH242)
- Added ability to join on multiple columns in DataFrame.join (GH214)
- Added private _get_duplicates function to Index for identifying duplicate values more easily
- Added column attribute access to DataFrame, e.g. df.A equivalent to df['A'] if 'A' is a column in the DataFrame (GH213)
- Added IPython tab completion hook for DataFrame columns (GH233, GH230)
- Implement Series.describe for Series containing objects (GH241)
- Add inner join option to DataFrame.join when joining on key(s) (GH248)
- Can select set of DataFrame columns by passing a list to __getitem__ (GH253)
- Can use & and | to compute the intersection / union of Index objects, respectively (GH261)
- Added pivot_table convenience function to pandas namespace (GH234)
- Implemented Panel.rename_axis function (GH243)
- DataFrame will show index level names in console output
- Implemented Panel.take
- Add set_eng_float_format function for setting alternate DataFrame floating point string formatting
- Add convenience set_index function for creating a DataFrame index from its existing columns
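A sketch of DataFrame.align, column attribute access, and set_index, assuming a recent pandas release; frames and labels are made up for illustration.

```python
# Assumes a recent pandas release (1.x/2.x)
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"A": [10, 20]}, index=["y", "w"])

# Attribute access: df1.A is the same column as df1["A"]
same = df1.A.equals(df1["A"])

# align conforms both frames to a common index using a join rule;
# join="inner" keeps only the labels present in both
left, right = df1.align(df2, join="inner")

# set_index turns an existing column into the index
keyed = pd.DataFrame({"key": ["k1", "k2"], "val": [1, 2]}).set_index("key")

print(list(left.index), list(right.index))
```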
Improvements to existing features¶
- Major performance improvements in file parsing functions read_csv and read_table
- Added Cython function for converting tuples to ndarray very fast. Speeds up many MultiIndex-related operations
- File parsing functions like read_csv and read_table will explicitly check if a parsed index has duplicates and raise a more helpful exception rather than deferring the check until later
- Refactored merging / joining code into a tidy class and disabled unnecessary computations in the float/object case, thus getting about 10% better performance (GH211)
- Improved speed of DataFrame.xs on mixed-type DataFrame objects by about 5x, regression from 0.3.0 (GH215)
- The new DataFrame.align method speeds up binary operations between differently-indexed DataFrame objects by 10-25%.
- Significantly sped up conversion of nested dict into DataFrame (GH212)
- Can pass hierarchical index level name to groupby instead of the level number if desired (GH223)
- Add support for different delimiters in DataFrame.to_csv (GH244)
- Add more helpful error message when importing pandas post-installation from the source directory (GH250)
- Significantly speed up DataFrame __repr__ and count on large mixed-type DataFrame objects
- Better handling of pyx file dependencies in Cython module build (GH271)
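A sketch of grouping by a hierarchical level name (GH223) and writing CSV with a non-default delimiter (GH244), assuming a recent pandas release; the data is made up.

```python
# Assumes a recent pandas release (1.x/2.x)
import io

import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["grp", "n"]
)
s = pd.Series([1, 2, 3], index=idx)

# Group by a hierarchical level using its name instead of its number
totals = s.groupby(level="grp").sum()

# Write the result with a tab delimiter instead of a comma
buf = io.StringIO()
totals.to_frame("total").to_csv(buf, sep="\t")

print(buf.getvalue().splitlines())
```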
Bug Fixes¶
- read_csv / read_table fixes
- Be less aggressive about converting float->int in cases of floating point representations of integers like 1.0, 2.0, etc.
- “True”/“False” will now get correctly converted to boolean
- Index name attribute will get set when specifying an index column
- Passing column names should force header=None (GH257)
- Don’t modify passed column names when index_col is not None (GH258)
- Can sniff the CSV separator in zip files (this previously failed because seek is not supported)
- Worked around matplotlib “bug” in which series[:, np.newaxis] fails. Should be reported upstream to matplotlib (GH224)
- DataFrame.iteritems was not returning Series with the name attribute set; neither was DataFrame._series
- Can store datetime.date objects in HDFStore (GH231)
- Index and Series names are now stored in HDFStore
- Fixed problem in which data would get upcasted to object dtype in GroupBy.apply operations (GH237)
- Fixed outer join bug with empty DataFrame (GH238)
- Can create empty Panel (GH239)
- Fix join on single key when passing list with 1 entry (GH246)
- Don’t raise Exception on plotting DataFrame with an all-NA column (GH251, GH254)
- Fixed min/max errors when called on integer DataFrames (GH241)
- DataFrame.iteritems and DataFrame._series not assigning name attribute
- Panel.__repr__ raised exception on length-0 major/minor axes
- DataFrame.join on key with empty DataFrame produced incorrect columns
- Implemented MultiIndex.diff (GH260)
- Int64Index.take and MultiIndex.take lost the name field; fixes downstream issue GH262
- Can pass list of tuples to Series (GH270)
- Can pass level name to DataFrame.stack
- Support set operations between MultiIndex and Index
- Fix many corner cases in MultiIndex set operations
- Fix MultiIndex-handling bug with GroupBy.apply when returned groups are not indexed the same
- Fix corner case bugs in DataFrame.apply
- Setting DataFrame index did not cause Series cache to get cleared
- Fixed various int32 -> int64 platform-specific issues
- Don’t be too aggressive converting to integer when parsing file with MultiIndex (GH285)
- Fix bug when slicing Series with negative indices extending before the beginning
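Several of the read_csv fixes above describe behavior that current pandas still exhibits. The sketch below checks a few of them against current pandas (not the code of this release); the inline CSV text is invented for the example.

```python
import io
import pandas as pd

# Passing names means the first row is treated as data, not as a header.
no_header = pd.read_csv(io.StringIO("1,2\n3,4\n"), names=["a", "b"])

# Specifying an index column sets the index name attribute.
keyed = pd.read_csv(io.StringIO("key,val\nx,1\ny,2\n"), index_col="key")

# "True"/"False" strings are parsed to a boolean column.
flags = pd.read_csv(io.StringIO("flag\nTrue\nFalse\n"))
```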
Thanks¶
- Thomas Kluyver
- Daniel Fortunov
- Aman Thakral
- Luca Beltrame
- Wouter Overmeire
pandas 0.4.3¶
Release date: 10/9/2011
This is largely a bugfix release from 0.4.2 but also includes a handful of new and enhanced features. Also, pandas can now be installed and used on Python 3 (thanks Thomas Kluyver!).
New Features¶
- Python 3 support using 2to3 (GH200, Thomas Kluyver)
- Added name attribute to Series, with relevant logic and tests. The name now prints as part of Series.__repr__
- Add name attribute to standard Index so that stacking / unstacking does not discard names and so that indexed DataFrame objects can be reliably round-tripped to flat files, pickle, HDF5, etc.
- Add isnull and notnull as instance methods on Series (GH209, GH203)
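The Series and Index name attributes and the isnull / notnull instance methods can be sketched with current pandas as follows (isnull and notnull remain available as aliases of isna and notna). The data is made up, and the CSV round trip is one illustrative example of names surviving serialization.

```python
import io

import numpy as np
import pandas as pd

# A named Series; the name appears in its repr.
s = pd.Series([1.0, np.nan, 3.0], name="price")
missing = s.isnull()
present = s.notnull()

# A named index survives a round trip through a flat file.
df = pd.DataFrame({"v": [1, 2]}, index=pd.Index(["a", "b"], name="key"))
back = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
```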
Improvements to existing features¶
- Skip xlrd-related unit tests if not installed
- Index.append and MultiIndex.append can accept a list of Index objects to concatenate together
- Altered binary operations on differently-indexed SparseSeries objects to use the integer-based (dense) alignment logic which is faster with a larger number of blocks (GH205)
- Refactored Series.__repr__ to be cleaner and more consistent
API Changes¶
- Series.describe and DataFrame.describe now return the 25% and 75% quartiles instead of the 10% and 90% deciles. The other outputs have not changed
- Series.toString will print a deprecation warning; it has been de-camelCased to to_string
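The quartile-based describe output and the de-camelCased to_string can be checked with current pandas as below; the five-element Series is an arbitrary example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# describe reports the 25%, 50% and 75% percentiles (quartiles).
summary = s.describe()

# to_string (formerly toString) renders the Series as text.
text = s.to_string()
```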
Bug Fixes¶
- Fix broken interaction between Index and Int64Index when calling intersection. Implement Int64Index.intersection
- MultiIndex.sortlevel discarded the level names (GH202)
- Fix bugs in groupby, join, and append due to improper concatenation of MultiIndex objects (GH201)
- Fix regression from 0.4.1, isnull and notnull ceased to work on other kinds of Python scalar objects like datetime.datetime
- Raise more helpful exception when attempting to write empty DataFrame or LongPanel to HDFStore (GH204)
- Use stdlib csv module to properly escape strings with commas in DataFrame.to_csv (GH206, Thomas Kluyver)
- Fix Python ndarray access in Cython code for sparse blocked index integrity check
- Fix bug writing Series to CSV in Python 3 (GH209)
- Miscellaneous Python 3 bugfixes
Thanks¶
- Thomas Kluyver
- rsamson
pandas 0.4.2¶
Release date: 10/3/2011
This is a performance optimization release with several bug fixes. The new Int64Index and new merging / joining Cython code and related Python infrastructure are the main new additions.
New Features¶
- Added fast Int64Index type with specialized join, union, intersection. Will result in significant performance enhancements for int64-based time series (e.g. using NumPy’s datetime64 one day) and also faster operations on DataFrame objects storing record array-like data.
- Refactored Index classes to have a join method and associated data alignment routines throughout the codebase to be able to leverage optimized joining / merging routines.
- Added Series.align method for aligning two series with choice of join method
- Wrote faster Cython data alignment / merging routines resulting in substantial speed increases
- Added is_monotonic property to Index classes with associated Cython code to evaluate the monotonicity of the Index values
- Add method get_level_values to MultiIndex
- Implemented shallow copy of BlockManager object in DataFrame internals
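Series.align with a choice of join method, the monotonicity check, and MultiIndex.get_level_values can be sketched with current pandas as below. In current pandas the monotonicity property is spelled is_monotonic_increasing; the data is made up for illustration.

```python
import pandas as pd

# Illustrative series with partially overlapping labels.
a = pd.Series([1, 2, 3], index=["a", "b", "c"])
b = pd.Series([10, 20], index=["b", "d"])

# Inner join keeps only labels present in both.
inner_a, inner_b = a.align(b, join="inner")
# Outer join keeps the union; unmatched positions become NaN.
outer_a, outer_b = a.align(b, join="outer")

# Monotonicity check on an int64 index.
increasing = pd.Index([1, 2, 3], dtype="int64").is_monotonic_increasing

# Extract one level of a MultiIndex.
mi = pd.MultiIndex.from_tuples([("x", 1), ("y", 2)])
first_level = mi.get_level_values(0)
```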
Improvements to existing features¶
- Improved performance of isnull and notnull, a regression from v0.3.0 (GH187)
- Wrote templating / code generation script to auto-generate Cython code for various functions which need to be available for the 4 major data types used in pandas (float64, bool, object, int64)
- Refactored code related to DataFrame.join so that intermediate aligned copies of the data in each DataFrame argument do not need to be created. Substantial performance increases result (GH176)
- Substantially improved performance of generic Index.intersection and Index.union
- Improved performance of DateRange.union with overlapping ranges and non-cacheable offsets (like Minute). Implemented analogous fast DateRange.intersection for overlapping ranges.
- Implemented BlockManager.take resulting in significantly faster take performance on mixed-type DataFrame objects (GH104)
- Improved performance of Series.sort_index
- Significant groupby performance enhancement: removed unnecessary integrity checks in DataFrame internals that were slowing down slicing operations to retrieve groups
- Added informative Exception when passing dict to DataFrame groupby aggregation with axis != 0
API Changes¶
Bug Fixes¶
- Fixed minor unhandled exception in Cython code implementing fast groupby aggregation operations
- Fixed bug in unstacking code manifesting with more than 3 hierarchical levels
- Throw exception when step specified in label-based slice (GH185)
- Fix isnull to correctly work with np.float32. Fix upstream bug described in GH182
- Finish implementation of as_index=False in groupby for DataFrame aggregation (GH181)
- Raise SkipTest for pre-epoch HDFStore failure. Real fix will be sorted out via datetime64 dtype
Thanks¶
- Uri Laserson
- Scott Sinclair
pandas 0.4.1¶
Release date: 9/25/2011
This is primarily a bug fix release but includes some new features and improvements.
New Features¶
- Added new DataFrame method get_dtype_counts and property dtypes
- Setting of values using the .ix indexing attribute in mixed-type DataFrame objects has been implemented (fixes GH135)
- read_csv can read multiple columns into a MultiIndex. DataFrame’s to_csv method will properly write out a MultiIndex which can be read back (GH151, thanks to Skipper Seabold)
- Wrote fast time series merging / joining methods in Cython. Will be integrated later into DataFrame.join and related functions
- Added ignore_index option to DataFrame.append for combining unindexed records stored in a DataFrame
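The dtypes property, combining unindexed records, and MultiIndex round-tripping through CSV can be sketched with current pandas as follows. DataFrame.append has since been removed from pandas, so the sketch substitutes pd.concat with ignore_index=True; the data is made up.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})
# Per-column dtypes as a Series indexed by column name.
dtypes = df.dtypes

# Stand-in for the old append(..., ignore_index=True): renumber the rows.
stacked = pd.concat([df, df], ignore_index=True)

# Write a MultiIndex to CSV and read both index columns back.
mi_df = pd.DataFrame(
    {"v": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["k1", "k2"]),
)
back = pd.read_csv(io.StringIO(mi_df.to_csv()), index_col=[0, 1])
```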
Improvements to existing features¶
- Some speed enhancements with internal Index type-checking function
- DataFrame.rename has a new copy parameter which can rename a DataFrame in place
- Enable unstacking by level name (GH142)
- Enable sortlevel to work by level name (GH141)
- read_csv can automatically “sniff” other kinds of delimiters using csv.Sniffer (GH146)
- Improved speed of unit test suite by about 40%
- Exception will not be raised calling HDFStore.remove on non-existent node with where clause
- Optimized _ensure_index function resulting in performance savings in type-checking Index objects
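Unstacking and sorting by level name, renaming, and delimiter sniffing can be sketched with current pandas as below. In current pandas the role of sortlevel on Series / DataFrame is played by sort_index(level=...), and sep=None with the Python engine asks read_csv to sniff the delimiter with csv.Sniffer. The data is invented for the example.

```python
import io

import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("b", 2), ("b", 1), ("a", 2), ("a", 1)], names=["outer", "inner"]
)
s = pd.Series([1, 2, 3, 4], index=idx)

# Unstack by level name: values of "inner" become columns.
wide = s.unstack(level="inner")
# Sort by a named level.
by_inner = s.sort_index(level="inner")

# rename returns a relabeled copy.
renamed = pd.DataFrame({"a": [1]}).rename(columns={"a": "A"})

# Let the Python engine sniff the ";" delimiter.
sniffed = pd.read_csv(io.StringIO("x;y\n1;2\n3;4\n"), sep=None, engine="python")
```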
API Changes¶
Bug Fixes¶
- Fixed DataFrame constructor bug causing downstream problems (e.g. .copy() failing) when passing a Series as the values along with a column name and index
- Fixed single-key groupby on DataFrame with as_index=False (GH160)
- Series.shift was failing on integer Series (GH154)
- unstack methods were producing incorrect output in the case of duplicate hierarchical labels. An exception will now be raised (GH147)
- Calling count with level argument caused reduceat failure or segfault in earlier NumPy (GH169)
- Fixed DataFrame.corrwith to automatically exclude non-numeric data (GH144)
- Unicode handling bug fixes in DataFrame.to_string (GH138)
- Excluding OLS degenerate unit test case that was causing platform specific failure (GH149)
- Skip blosc-dependent unit tests for PyTables < 2.2 (GH137)
- Calling copy on DateRange did not copy over attributes to the new object (GH168)
- Fix bug in HDFStore in which Panel data could be appended to a Table with different item order, thus resulting in an incorrect result read back
Thanks¶
- Yaroslav Halchenko
- Jeff Reback
- Skipper Seabold
- Dan Lovell
- Nick Pentreath
pandas 0.4.0¶
Release date: 9/12/2011
New Features¶
- pandas.core.sparse module: “Sparse” (mostly-NA, or some other fill value) versions of Series, DataFrame, and Panel. For low-density data, this will result in significant performance boosts, and smaller memory footprint. Added to_sparse methods to Series, DataFrame, and Panel. See online documentation for more on these
- Fancy indexing operator on Series / DataFrame, e.g. via the .ix operator. Both getting and setting of values are supported; however, setting values currently only works on homogeneously-typed DataFrame objects. Things like:
- series.ix[[d1, d2, d3]]
- frame.ix[5:10, ['C', 'B', 'A']], frame.ix[5:10, 'A':'C']
- frame.ix[date1:date2]
- Significantly enhanced groupby functionality
- Can groupby multiple keys, e.g. df.groupby(['key1', 'key2']). Iteration with multiple groupings produces a flattened tuple
- “Nuisance” columns (non-aggregatable) will automatically be excluded from DataFrame aggregation operations
- Added automatic “dispatching” to Series / DataFrame methods to more easily invoke methods on groups, e.g. s.groupby(crit).std() will work even though std is not implemented on the GroupBy class
- Hierarchical / multi-level indexing
- New MultiIndex class. Integrated MultiIndex into Series and DataFrame fancy indexing, slicing, __getitem__ and __setitem__, reindexing, etc. Added level keyword argument to groupby to enable grouping by a level of a MultiIndex
- New data reshaping functions: stack and unstack on DataFrame and Series
- Integrate with MultiIndex to enable sophisticated reshaping of data
- Index objects (labels for axes) are now capable of holding tuples
- Series.describe, DataFrame.describe: produces an R-like table of summary statistics about each data column
- DataFrame.quantile, Series.quantile for computing sample quantiles of data across requested axis
- Added general DataFrame.dropna method to replace dropIncompleteRows and dropEmptyRows, which are now deprecated
- Series arithmetic methods with optional fill_value for missing data, e.g. a.add(b, fill_value=0). If a location is missing for both it will still be missing in the result though.
- fill_value option has been added to DataFrame.{add, mul, sub, div} methods similar to Series
- Boolean indexing with DataFrame objects: data[data > 0.1] = 0.1 or data[data > other] = 1
- pytz / tzinfo support in DateRange
- tz_localize, tz_normalize, and tz_validate methods added
- Added ExcelFile class to pandas.io.parsers for parsing multiple sheets out of a single Excel 2003 document
- GroupBy aggregations can now optionally broadcast, e.g. produce an object of the same size with the aggregated value propagated
- Added select function in all data structures: reindex axis based on arbitrary criterion (function returning boolean value), e.g. frame.select(lambda x: 'foo' in x, axis=1)
- DataFrame.consolidate method, API function relating to redesigned internals
- DataFrame.insert method for inserting column at a specified location rather than the default __setitem__ behavior (which puts it at the end)
- HDFStore class in pandas.io.pytables has been largely rewritten using patches from Jeff Reback and others. It now supports mixed-type DataFrame and Series data and can store Panel objects. It also has the option to query DataFrame and Panel data. Loading data from legacy HDFStore files is supported explicitly in the code
- Added set_printoptions method to modify appearance of DataFrame tabular output
- Added rolling_quantile function, a moving version of Series.quantile / DataFrame.quantile
- Generic rolling_apply moving window function
- New drop method added to Series, DataFrame, etc. which can drop a set of labels from an axis, producing a new object
- reindex methods now sport a copy option so that data is not forced to be copied when the resulting object is indexed the same
- Added sort_index methods to Series and Panel. Renamed DataFrame.sort to sort_index. Leaving DataFrame.sort for now.
- Added skipna option to statistical instance methods on all the data structures
- pandas.io.data module providing a consistent interface for reading time series data from several different sources
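Several of the 0.4.0 features above (arithmetic with fill_value, multi-key groupby, boolean assignment, dropna, insert, drop, quantile) still exist in current pandas. The following is a minimal sketch against the current API with made-up data; .ix, Panel and select are not used because they have since been removed.

```python
import numpy as np
import pandas as pd

# fill_value: a label missing on one side is treated as 0, but a label
# missing (or NaN) on both sides stays NaN.
a = pd.Series([1.0, np.nan, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["x", "w"])
total = a.add(b, fill_value=0)

# Group by multiple keys; iterating yields (key1, key2) tuples.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b"],
    "key2": ["one", "two", "one", "one"],
    "data": [1.0, 2.0, 3.0, 4.0],
})
sums = df.groupby(["key1", "key2"])["data"].sum()
group_keys = [key for key, _ in df.groupby(["key1", "key2"])]

# Boolean indexing for assignment: cap every value above 0.1.
vals = pd.DataFrame({"p": [0.05, 0.5], "q": [0.2, 0.01]})
vals[vals > 0.1] = 0.1

# dropna removes rows containing any NA value.
complete = pd.DataFrame({"u": [1.0, np.nan], "v": [2.0, 3.0]}).dropna()

# insert places a column at a given position (in place);
# drop returns a new object without the given labels.
df.insert(0, "id", [10, 11, 12, 13])
without_data = df.drop(columns=["data"])
median = df["data"].quantile(0.5)
```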
Improvements to existing features¶
- The 2-dimensional DataFrame and DataMatrix classes have been extensively redesigned internally into a single class DataFrame, preserving where possible their optimal performance characteristics. This should reduce confusion from users about which class to use.
- Note that under the hood there is a new, essentially “lazy evaluation” scheme with respect to adding columns to DataFrame. During some operations, like-typed blocks will be “consolidated”, but not before.
- Accessing DataFrame columns repeatedly is now significantly faster than it was with DataMatrix in 0.3.0, due to an internal Series caching mechanism (the cached Series are all views on the underlying data)
- Column ordering for mixed type data is now completely consistent in DataFrame. In prior releases, there was inconsistent column ordering in DataMatrix
- Improved console / string formatting of DataMatrix with negative numbers
- Improved tabular data parsing functions, read_table and read_csv:
- Added skiprows and na_values arguments to pandas.io.parsers functions for more flexible IO
- parseCSV / read_csv functions and others in pandas.io.parsers now can take a list of custom NA values, and also a list of rows to skip
- Can slice DataFrame and get a view of the data (when homogeneously typed), e.g. frame.xs(idx, copy=False) or frame.ix[idx]
- Many speed optimizations throughout Series and DataFrame
- Eager evaluation of groups when calling groupby functions, so if there is an exception with the grouping function it will be raised immediately rather than sometime later when the groups are needed
- datetools.WeekOfMonth offset can be parameterized with n different than 1 or -1.
- Statistical methods on DataFrame like mean, std, var, skew will now ignore non-numerical data. Previously, an unhelpful error message was generated. A flag numeric_only has been added to DataFrame.sum and DataFrame.count to enable this behavior in those methods if so desired (disabled by default)
- DataFrame.pivot generalized to enable pivoting multiple columns into a DataFrame with hierarchical columns
- DataFrame constructor can accept structured / record arrays
- Panel constructor can accept a dict of DataFrame-like objects. Do not need to use from_dict anymore (from_dict is there to stay, though).
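Constructing a DataFrame from a record array, restricting statistics to numeric columns, and pivoting several value columns into hierarchical columns can be sketched with current pandas as below. Note that current pandas no longer silently skips non-numeric columns in mean, so the sketch passes numeric_only=True explicitly; the data is made up.

```python
import numpy as np
import pandas as pd

# DataFrame from a structured / record array: field names become columns.
rec = np.array([(1, 2.0), (3, 4.0)], dtype=[("a", "i8"), ("b", "f8")])
frame = pd.DataFrame(rec)

# Restrict a statistic to numeric columns.
mixed = pd.DataFrame({"n": [1.0, 3.0], "s": ["x", "y"]})
means = mixed.mean(numeric_only=True)

# Pivot multiple value columns into a DataFrame with hierarchical columns.
long = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "item": ["A", "B", "A", "B"],
    "price": [1, 2, 3, 4],
    "qty": [10, 20, 30, 40],
})
wide = long.pivot(index="date", columns="item")
```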
API Changes¶
- The DataMatrix variable now refers to DataFrame, will be removed within two releases
- WidePanel is now known as Panel. The WidePanel variable in the pandas namespace now refers to the renamed Panel class
- LongPanel and Panel / WidePanel now no longer have a common subclass. LongPanel is now a subclass of DataFrame having a number of additional methods and a hierarchical index instead of the old LongPanelIndex object, which has been removed. Legacy LongPanel pickles may not load properly
- Cython is now required to build pandas from a development branch. This was done to avoid continuing to check in cythonized C files into source control. Builds from released source distributions will not require Cython
- Cython code has been moved up to a top level pandas/src directory. Cython
extension modules have been renamed and promoted from the lib subpackage to
the top level, i.e.
- pandas.lib.tseries -> pandas._tseries
- pandas.lib.sparse -> pandas._sparse
- DataFrame pickling format has changed. Backwards compatibility for legacy pickles is provided, but it’s recommended to consider PyTables-based HDFStore for storing data with a longer expected shelf life
- A copy argument has been added to the DataFrame constructor to avoid unnecessary copying of data. Data is no longer copied by default when passed into the constructor
- Handling of boolean dtype in DataFrame has been improved to support storage of boolean data with NA / NaN values. Previously it was converted to float64, so this should not (in theory) cause API breakage
- To optimize performance, Index objects now only check that their labels are unique when uniqueness matters (i.e. when someone goes to perform a lookup). This is a potentially dangerous tradeoff, but will lead to much better performance in many places (like groupby).
- Boolean indexing using Series must now have the same indices (labels)
- Backwards compatibility support for begin/end/nPeriods keyword arguments in DateRange class has been removed
- More intuitive / shorter filling aliases ffill (for pad) and bfill (for backfill) have been added to the functions that use them: reindex, asfreq, fillna.
- pandas.core.mixins code moved to pandas.core.generic
- buffer keyword arguments (e.g. DataFrame.toString) renamed to buf to avoid using Python built-in name
- DataFrame.rows() removed (use DataFrame.index)
- Added deprecation warning to DataFrame.cols(), to be removed in next release
- DataFrame deprecations and de-camelCasing: merge, asMatrix, toDataMatrix, _firstTimeWithValue, _lastTimeWithValue, toRecords, fromRecords, tgroupby, toString
- pandas.io.parsers method deprecations
- parseCSV is now read_csv and keyword arguments have been de-camelCased
- parseText is now read_table
- parseExcel is replaced by the ExcelFile class and its parse method
- fillMethod arguments (deprecated in prior release) removed, should be replaced with method
- Series.fill, DataFrame.fill, and Panel.fill removed, use fillna instead
- groupby functions now exclude NA / NaN values from the list of groups. This matches R behavior with NAs in factors e.g. with the tapply function
- Removed parseText, parseCSV and parseExcel from pandas namespace
- Series.combineFunc renamed to Series.combine and made a bit more general with a fill_value keyword argument defaulting to NaN
- Removed pandas.core.pytools module. Code has been moved to pandas.core.common
- The groupName attribute tacked onto groups in GroupBy has been renamed to name
- Panel/LongPanel dims attribute renamed to shape to be more conformant
- Slicing a Series returns a view now
- More Series deprecations / renaming: toCSV to to_csv, asOf to asof, merge to map, applymap to apply, toDict to to_dict, combineFirst to combine_first. Will print FutureWarning.
- DataFrame.to_csv no longer writes an “index” column label by default, since the output file can be read back without it. However, there is a new index_label argument, so you can pass index_label='index' to emulate the old behavior
- datetools.Week argument renamed from dayOfWeek to weekday
- timeRule argument in shift has been deprecated in favor of using the offset argument for everything. So you can still pass a time rule string to offset
- Added optional encoding argument to read_csv, read_table, to_csv, from_csv to handle unicode in python 2.x
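Some of the API changes above (NA keys excluded from groups, ffill / bfill, Series.combine with fill_value, combine_first, and to_csv index_label) can be sketched with current pandas as follows. This uses made-up data and the current method spellings.

```python
import numpy as np
import pandas as pd

# NA group keys are excluded from the result.
df = pd.DataFrame({"g": ["a", None, "a"], "v": [1, 2, 3]})
grouped = df.groupby("g")["v"].sum()

# Short filling aliases.
s = pd.Series([1.0, np.nan, np.nan, 4.0])
forward = s.ffill()
backward = s.bfill()

# combine applies a function label by label; fill_value stands in for
# labels missing from one side.
left_s = pd.Series([1, 5], index=["p", "q"])
right_s = pd.Series([3], index=["p"])
combined = left_s.combine(right_s, max, fill_value=0)

# combine_first patches NaN values with values from another Series.
x = pd.Series([1.0, np.nan], index=["p", "q"])
y = pd.Series([9.0, 2.0], index=["p", "q"])
patched = x.combine_first(y)

# Emulate the old behavior of writing an "index" column label.
header = pd.DataFrame({"v": [1]}).to_csv(index_label="index").splitlines()[0]
```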
Bug Fixes¶
- Column ordering in pandas.io.parsers.parseCSV will match CSV in the presence of mixed-type data
- Fixed handling of Excel 2003 dates in pandas.io.parsers
- DateRange caching was happening with high resolution DateOffset objects, e.g. DateOffset(seconds=1). This has been fixed
- Fixed __truediv__ issue in DataFrame
- Fixed DataFrame.toCSV bug preventing IO round trips in some cases
- Fixed bug in Series.plot causing matplotlib to barf in exceptional cases
- Disabled Index objects from being hashable, like ndarrays
- Added __ne__ implementation to Index so that operations like ts[ts != idx] will work
- Added __ne__ implementation to DataFrame
- Fixed bug / unintuitive result when calling fillna on unordered labels
- Fixed bug calling sum on boolean DataFrame
- Bug fix when creating a DataFrame from a dict with scalar values
- Series.{sum, mean, std, ...} now return NA/NaN when the whole Series is NA
- NumPy 1.4 through 1.6 compatibility fixes
- Fixed bug in bias correction in rolling_cov, was affecting rolling_corr too
- R-square value was incorrect in the presence of fixed and time effects in the PanelOLS classes
- HDFStore can handle duplicates in table format, will take
Thanks¶
- Joon Ro
- Michael Pennington
- Chris Uga
- Chris Withers
- Jeff Reback
- Ted Square
- Craig Austin
- William Ferreira
- Daniel Fortunov
- Tony Roberts
- Martin Felder
- John Marino
- Tim McNamara
- Justin Berka
- Dieter Vandenbussche
- Shane Conway
- Skipper Seabold
- Chris Jordan-Squire
pandas 0.3.0¶
Release date: February 20, 2011
New features¶
- corrwith function to compute column- or row-wise correlations between two DataFrame objects
- Can boolean-index DataFrame objects, e.g. df[df > 2] = 2, px[px > last_px] = 0
- Added comparison magic methods (__lt__, __gt__, etc.)
- Flexible explicit arithmetic methods (add, mul, sub, div, etc.)
- Added reindex_like method
- Added reindex_like method to WidePanel
- Convenience functions for accessing SQL-like databases in pandas.io.sql module
- Added (still experimental) HDFStore class for storing pandas data structures using HDF5 / PyTables in pandas.io.pytables module
- Added WeekOfMonth date offset
- pandas.rpy (experimental) module created, providing some interfacing / conversion between rpy2 and pandas
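The corrwith, boolean indexing, comparison, flexible arithmetic and reindex_like features can be sketched with current pandas as below, using small made-up frames.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [1.0, 0.0, 1.0]})
df2 = pd.DataFrame({"a": [2.0, 4.0, 6.0], "b": [0.0, 1.0, 0.0]})

# Column-wise correlation between matching columns of two frames.
corr = df1.corrwith(df2)

# Comparison operators return boolean frames.
mask = df1 > 1

# Flexible explicit arithmetic.
diff = df1.sub(df2)

# Boolean-indexed assignment: cap values at 2.
capped = df2.copy()
capped[capped > 2] = 2

# Conform df1 to the labels of another object.
conformed = df1.reindex_like(df2.iloc[:2])
```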
Improvements to existing features¶
- Unit test coverage: 100% line coverage of core data structures
- Speed enhancement to rolling_{median, max, min}
- Column ordering between DataFrame and DataMatrix is now consistent: before DataFrame would not respect column order
- Improved {Series, DataFrame}.plot methods to be more flexible (can pass matplotlib Axis arguments, plot DataFrame columns in multiple subplots, etc.)
API Changes¶
- Exponentially-weighted moment functions in pandas.stats.moments have a more consistent API and accept a min_periods argument like their regular moving counterparts.
- fillMethod argument in Series, DataFrame changed to method, FutureWarning added.
- fill method in Series, DataFrame/DataMatrix, WidePanel renamed to fillna, FutureWarning added to fill
- Renamed DataFrame.getXS to xs, FutureWarning added
- Removed cap and floor functions from DataFrame, renamed to clip_upper and clip_lower for consistency with NumPy
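The clipping and exponentially weighted moment changes map onto current pandas roughly as below. This is a hedged sketch: clip_upper / clip_lower have since been deprecated in favor of a single clip method, and the pandas.stats.moments functions have been replaced by the .ewm interface, which still accepts min_periods.

```python
import numpy as np
import pandas as pd

# Current equivalent of clip_lower / clip_upper.
clipped = pd.Series([-5, 0, 5]).clip(lower=-1, upper=1)

# Exponentially weighted mean with min_periods: NaN until at least
# two observations are available.
ewm_mean = pd.Series([1.0, 2.0, 3.0]).ewm(span=2, min_periods=2).mean()
```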
Bug Fixes¶
- Fixed bug in IndexableSkiplist Cython code that was breaking rolling_max function
- Numerous numpy.int64-related indexing fixes
- Several NumPy 1.4.0 NaN-handling fixes
- Bug fixes to pandas.io.parsers.parseCSV
- Fixed DateRange caching issue with unusual date offsets
- Fixed bug in DateRange.union
- Fixed corner case in IndexableSkiplist implementation