Release Notes
This is the list of changes to pandas between each release. For full details, see the commit logs at http://github.com/pandas-dev/pandas
What is it
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.
Where to get it
- Source code: http://github.com/pandas-dev/pandas
- Binary installers on PyPI: http://pypi.python.org/pypi/pandas
- Documentation: http://pandas.pydata.org
pandas 0.20.0 / 0.20.1
Release date: May 5, 2017
This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- New .agg() API for Series/DataFrame similar to the groupby-rolling-resample APIs, see here
- Integration with the feather-format, including a new top-level pd.read_feather() and DataFrame.to_feather() method, see here.
- The .ix indexer has been deprecated, see here
- Panel has been deprecated, see here
- Addition of an IntervalIndex and Interval scalar type, see here
- Improved user API when grouping by index levels in .groupby(), see here
- Improved support for UInt64 dtypes, see here
- A new orient for JSON serialization, orient='table', that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see here
- Experimental support for exporting styled DataFrames (DataFrame.style) to Excel, see here
- Window binary corr/cov operations now return a MultiIndexed DataFrame rather than a Panel, as Panel is now deprecated, see here
- Support for S3 handling now uses s3fs, see here
- Google BigQuery support now uses the pandas-gbq library, see here
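As a rough illustration of the .agg() highlight, the sketch below aggregates a small DataFrame in three ways. The frame and its column names are invented for the example, and it assumes a pandas version that provides DataFrame.agg (0.20.0 or later).

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]})

# A single function name returns a Series indexed by column.
totals = df.agg("sum")

# A list of function names returns a DataFrame with one row per function.
summary = df.agg(["sum", "min"])

# A dict applies a different function to each named column.
per_column = df.agg({"A": "sum", "B": "mean"})

print(totals)
print(summary)
print(per_column)
```

Passing a list yields one row per function, while a dict yields one value per named column.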
See the v0.20.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.20.1.
Note
This is a combined release for 0.20.0 and 0.20.1.
Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas’ utils routines. (GH16250)
Thanks
- abaldenko
- Adam J. Stewart
- Adrian
- adrian-stepien
- Ajay Saxena
- Akash Tandon
- Albert Villanova del Moral
- Aleksey Bilogur
- alexandercbooth
- Alexis Mignon
- Amol Kahat
- Andreas Winkler
- Andrew Kittredge
- Anthonios Partheniou
- Arco Bast
- Ashish Singal
- atbd
- bastewart
- Baurzhan Muftakhidinov
- Ben Kandel
- Ben Thayer
- Ben Welsh
- Bill Chambers
- bmagnusson
- Brandon M. Burroughs
- Brian
- Brian McFee
- carlosdanielcsantos
- Carlos Souza
- chaimdemulder
- Chris
- chris-b1
- Chris Ham
- Christopher C. Aycock
- Christoph Gohlke
- Christoph Paulik
- Chris Warth
- Clemens Brunner
- DaanVanHauwermeiren
- Daniel Himmelstein
- Dave Willmer
- David Cook
- David Gwynne
- David Hoffman
- David Krych
- dickreuter
- Diego Fernandez
- Dimitris Spathis
- discort
- Dmitry L
- Dody Suria Wijaya
- Dominik Stanczak
- Dr-Irv
- Dr. Irv
- dr-leo
- D.S. McNeil
- dubourg
- dwkenefick
- Elliott Sales de Andrade
- Ennemoser Christoph
- Francesc Alted
- Fumito Hamamura
- funnycrab
- gfyoung
- Giacomo Ferroni
- goldenbull
- Graham R. Jeffries
- Greg Williams
- Guilherme Beltramini
- Guilherme Samora
- Hao Wu
- Harshit Patni
- hesham.shabana@hotmail.com
- Ilya V. Schurov
- Iván Vallés Pérez
- Jackie Leng
- Jaehoon Hwang
- James Draper
- James Goppert
- James McBride
- James Santucci
- Jan Schulz
- Jeff Carey
- Jeff Reback
- JennaVergeynst
- Jim
- Jim Crist
- Joe Jevnik
- Joel Nothman
- John
- John Tucker
- John W. O’Brien
- John Zwinck
- jojomdt
- Jonathan de Bruin
- Jonathan Whitmore
- Jon Mease
- Jon M. Mease
- Joost Kranendonk
- Joris Van den Bossche
- Joshua Bradt
- Julian Santander
- Julien Marrec
- Jun Kim
- Justin Solinsky
- Kacawi
- Kamal Kamalaldin
- Kerby Shedden
- Kernc
- Keshav Ramaswamy
- Kevin Sheppard
- Kyle Kelley
- Larry Ren
- Leon Yin
- linebp
- Line Pedersen
- Lorenzo Cestaro
- Luca Scarabello
- Lukasz
- Mahmoud Lababidi
- manu
- manuels
- Mark Mandel
- Matthew Brett
- Matthew Roeschke
- mattip
- Matti Picus
- Matt Roeschke
- maxalbert
- Maximilian Roos
- mcocdawc
- Michael Charlton
- Michael Felt
- Michael Lamparski
- Michiel Stock
- Mikolaj Chwalisz
- Min RK
- Miroslav Šedivý
- Mykola Golubyev
- Nate Yoder
- Nathalie Rud
- Nicholas Ver Halen
- Nick Chmura
- Nolan Nichols
- nuffe
- Pankaj Pandey
- paul-mannino
- Pawel Kordek
- pbreach
- Pete Huang
- Peter
- Peter Csizsek
- Petio Petrov
- Phil Ruffwind
- Pietro Battiston
- Piotr Chromiec
- Prasanjit Prakash
- Robert Bradshaw
- Rob Forgione
- Robin
- Rodolfo Fernandez
- Roger Thomas
- Rouz Azari
- Sahil Dua
- sakkemo
- Sam Foo
- Sami Salonen
- Sarah Bird
- Sarma Tangirala
- scls19fr
- Scott Sanderson
- Sebastian Bank
- Sebastian Gsänger
- Sébastien de Menten
- Shawn Heide
- Shyam Saladi
- sinhrks
- Sinhrks
- Stephen Rauch
- stijnvanhoey
- Tara Adiseshan
- themrmax
- the-nose-knows
- Thiago Serafim
- Thoralf Gutierrez
- Thrasibule
- Tobias Gustafsson
- Tom Augspurger
- tomrod
- Tong Shen
- Tong SHEN
- TrigonaMinima
- tzinckgraf
- Uwe
- wandersoncferreira
- watercrossing
- wcwagner
- Wes Turner
- Wiktor Tomczak
- WillAyd
- xgdgsc
- Yaroslav Halchenko
- Yimeng Zhang
- yui-knk
pandas 0.19.2
Release date: December 24, 2016
This is a minor bug-fix release in the 0.19.x series and includes some small regression fixes, bug fixes and performance improvements.
Highlights include:
- Compatibility with Python 3.6
- Added a Pandas Cheat Sheet. (GH13202).
See the v0.19.2 Whatsnew page for an overview of all bugs that have been fixed in 0.19.2.
Thanks
- Ajay Saxena
- Ben Kandel
- Chris
- Chris Ham
- Christopher C. Aycock
- Daniel Himmelstein
- Dave Willmer
- Dr-Irv
- gfyoung
- hesham shabana
- Jeff Carey
- Jeff Reback
- Joe Jevnik
- Joris Van den Bossche
- Julian Santander
- Kerby Shedden
- Keshav Ramaswamy
- Kevin Sheppard
- Luca Scarabello
- Matti Picus
- Matt Roeschke
- Maximilian Roos
- Mykola Golubyev
- Nate Yoder
- Nicholas Ver Halen
- Pawel Kordek
- Pietro Battiston
- Rodolfo Fernandez
- sinhrks
- Tara Adiseshan
- Tom Augspurger
- wandersoncferreira
- Yaroslav Halchenko
pandas 0.19.1
Release date: November 3, 2016
This is a minor bug-fix release from 0.19.0 and includes some small regression fixes, bug fixes and performance improvements.
See the v0.19.1 Whatsnew page for an overview of all bugs that have been fixed in 0.19.1.
Thanks
- Adam Chainz
- Anthonios Partheniou
- Arash Rouhani
- Ben Kandel
- Brandon M. Burroughs
- Chris
- chris-b1
- Chris Warth
- David Krych
- dubourg
- gfyoung
- Iván Vallés Pérez
- Jeff Reback
- Joe Jevnik
- Jon M. Mease
- Joris Van den Bossche
- Josh Owen
- Keshav Ramaswamy
- Larry Ren
- mattrijk
- Michael Felt
- paul-mannino
- Piotr Chromiec
- Robert Bradshaw
- Sinhrks
- Thiago Serafim
- Tom Bird
pandas 0.19.0
Release date: October 2, 2016
This is a major release from 0.18.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- merge_asof() for asof-style time-series joining, see here
- .rolling() is now time-series aware, see here
- read_csv() now supports parsing Categorical data, see here
- A function union_categoricals() has been added for combining categoricals, see here
- PeriodIndex now has its own period dtype, and changed to be more consistent with other Index classes. See here
- Sparse data structures gained enhanced support of int and bool dtypes, see here
- Comparison operations with Series no longer ignore the index, see here for an overview of the API changes.
- Introduction of a pandas development API for utility functions, see here.
- Deprecation of Panel4D and PanelND. We recommend representing these types of n-dimensional data with the xarray package.
- Removal of the previously deprecated modules pandas.io.data, pandas.io.wb, pandas.tools.rplot.
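To make the merge_asof() highlight concrete, here is a minimal sketch of an asof-style join. The trades and quotes frames, their column names, and the timestamps are all hypothetical.

```python
import pandas as pd

# Hypothetical trade and quote tables, both sorted by the join key "time".
trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-10-02 09:30:01", "2016-10-02 09:30:05"]),
    "qty": [100, 200],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2016-10-02 09:30:00",
        "2016-10-02 09:30:03",
        "2016-10-02 09:30:06",
    ]),
    "bid": [10.0, 10.5, 11.0],
})

# Each trade is matched with the most recent quote at or before its timestamp
# (the default "backward" search), rather than requiring an exact key match.
joined = pd.merge_asof(trades, quotes, on="time")
print(joined)
```

The 09:30:01 trade picks up the 09:30:00 quote and the 09:30:05 trade picks up the 09:30:03 quote; the later 09:30:06 quote is never used.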
See the v0.19.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.19.0.
Thanks
- adneu
- Adrien Emery
- agraboso
- Alex Alekseyev
- Alex Vig
- Allen Riddell
- Amol
- Amol Agrawal
- Andy R. Terrel
- Anthonios Partheniou
- babakkeyvani
- Ben Kandel
- Bob Baxley
- Brett Rosen
- c123w
- Camilo Cota
- Chris
- chris-b1
- Chris Grinolds
- Christian Hudon
- Christopher C. Aycock
- Chris Warth
- cmazzullo
- conquistador1492
- cr3
- Daniel Siladji
- Douglas McNeil
- Drewrey Lupton
- dsm054
- Eduardo Blancas Reyes
- Elliot Marsden
- Evan Wright
- Felix Marczinowski
- Francis T. O’Donovan
- Gábor Lipták
- Geraint Duck
- gfyoung
- Giacomo Ferroni
- Grant Roch
- Haleemur Ali
- harshul1610
- Hassan Shamim
- iamsimha
- Iulius Curt
- Ivan Nazarov
- jackieleng
- Jeff Reback
- Jeffrey Gerard
- Jenn Olsen
- Jim Crist
- Joe Jevnik
- John Evans
- John Freeman
- John Liekezer
- Johnny Gill
- John W. O’Brien
- John Zwinck
- Jordan Erenrich
- Joris Van den Bossche
- Josh Howes
- Jozef Brandys
- Kamil Sindi
- Ka Wo Chen
- Kerby Shedden
- Kernc
- Kevin Sheppard
- Matthieu Brucher
- Maximilian Roos
- Michael Scherer
- Mike Graham
- Mortada Mehyar
- mpuels
- Muhammad Haseeb Tariq
- Nate George
- Neil Parley
- Nicolas Bonnotte
- OXPHOS
- Pan Deng / Zora
- Paul
- Pauli Virtanen
- Paul Mestemaker
- Pawel Kordek
- Pietro Battiston
- pijucha
- Piotr Jucha
- priyankjain
- Ravi Kumar Nimmi
- Robert Gieseke
- Robert Kern
- Roger Thomas
- Roy Keyes
- Russell Smith
- Sahil Dua
- Sanjiv Lobo
- Sašo Stanovnik
- Shawn Heide
- sinhrks
- Sinhrks
- Stephen Kappel
- Steve Choi
- Stewart Henderson
- Sudarshan Konge
- Thomas A Caswell
- Tom Augspurger
- Tom Bird
- Uwe Hoffmann
- wcwagner
- WillAyd
- Xiang Zhang
- Yadunandan
- Yaroslav Halchenko
- YG-Riku
- Yuichiro Kaneko
- yui-knk
- zhangjinjie
- znmean
- 颜发才(Yan Facai)
pandas 0.18.1
Release date: (May 3, 2016)
This is a minor release from 0.18.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.
Highlights include:
- .groupby(...) has been enhanced to provide convenient syntax when working with .rolling(..), .expanding(..) and .resample(..) per group, see here
- pd.to_datetime() has gained the ability to assemble dates from a DataFrame, see here
- Method chaining improvements, see here.
- Custom business hour offset, see here.
- Many bug fixes in the handling of sparse, see here
- Expanded the Tutorials section with a feature on modern pandas, courtesy of @TomAugspurger. (GH13045).
See the v0.18.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.18.1.
Thanks
- Andrew Fiore-Gartland
- Bastiaan
- Benoît Vinot
- Brandon Rhodes
- DaCoEx
- Drew Fustin
- Ernesto Freitas
- Filip Ter
- Gregory Livschitz
- Gábor Lipták
- Hassan Kibirige
- Iblis Lin
- Israel Saeta Pérez
- Jason Wolosonovich
- Jeff Reback
- Joe Jevnik
- Joris Van den Bossche
- Joshua Storck
- Ka Wo Chen
- Kerby Shedden
- Kieran O’Mahony
- Leif Walsh
- Mahmoud Lababidi
- Maoyuan Liu
- Mark Roth
- Matt Wittmann
- MaxU
- Maximilian Roos
- Michael Droettboom
- Nick Eubank
- Nicolas Bonnotte
- OXPHOS
- Pauli Virtanen
- Peter Waller
- Pietro Battiston
- Prabhjot Singh
- Robin Wilson
- Roger Thomas
- Sebastian Bank
- Stephen Hoover
- Tim Hopper
- Tom Augspurger
- WANG Aiyong
- Wes Turner
- Winand
- Xbar
- Yan Facai
- adneu
- ajenkins-cargometrics
- behzad nouri
- chinskiy
- gfyoung
- jeps-journal
- jonaslb
- kotrfa
- nileracecrew
- onesandzeroes
- rs2
- sinhrks
- tsdlovell
pandas 0.18.0
Release date: (March 13, 2016)
This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Moving and expanding window functions are now methods on Series and DataFrame, similar to .groupby, see here.
- Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here.
- API breaking change to the .resample method to make it more .groupby like, see here.
- Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here.
- The .to_xarray() function has been added for compatibility with the xarray package, see here.
- The read_sas function has been enhanced to read sas7bdat files, see here.
- Addition of the .str.extractall() method, and API changes to the .str.extract() method and .str.cat() method.
- pd.test() top-level nose test runner is available (GH4327).
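The window-method and RangeIndex highlights can be sketched as follows. The Series values are arbitrary, and the example assumes a pandas version with the method-style window API (0.18.0 or later).

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Window functions are called as methods, in the same style as .groupby.
rolling_mean = s.rolling(window=2).mean()
expanding_sum = s.expanding().sum()

# A Series built without an explicit index gets a RangeIndex, which stores
# only start, stop and step instead of materializing every label.
has_range_index = isinstance(s.index, pd.RangeIndex)

print(rolling_mean)
print(expanding_sum)
print(has_range_index)
```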
See the v0.18.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.18.0.
Thanks
- ARF
- Alex Alekseyev
- Andrew McPherson
- Andrew Rosenfeld
- Anthonios Partheniou
- Anton I. Sipos
- Ben
- Ben North
- Bran Yang
- Chris
- Chris Carroux
- Christopher C. Aycock
- Christopher Scanlin
- Cody
- Da Wang
- Daniel Grady
- Dorozhko Anton
- Dr-Irv
- Erik M. Bray
- Evan Wright
- Francis T. O’Donovan
- Frank Cleary
- Gianluca Rossi
- Graham Jeffries
- Guillaume Horel
- Henry Hammond
- Isaac Schwabacher
- Jean-Mathieu Deschenes
- Jeff Reback
- Joe Jevnik
- John Freeman
- John Fremlin
- Jonas Hoersch
- Joris Van den Bossche
- Joris Vankerschaver
- Justin Lecher
- Justin Lin
- Ka Wo Chen
- Keming Zhang
- Kerby Shedden
- Kyle
- Marco Farrugia
- MasonGallo
- MattRijk
- Matthew Lurie
- Maximilian Roos
- Mayank Asthana
- Mortada Mehyar
- Moussa Taifi
- Navreet Gill
- Nicolas Bonnotte
- Paul Reiners
- Philip Gura
- Pietro Battiston
- RahulHP
- Randy Carnevale
- Rinoc Johnson
- Rishipuri
- Sangmin Park
- Scott E Lasley
- Sereger13
- Shannon Wang
- Skipper Seabold
- Thierry Moisan
- Thomas A Caswell
- Toby Dylan Hocking
- Tom Augspurger
- Travis
- Trent Hauck
- Tux1
- Varun
- Wes McKinney
- Will Thompson
- Yoav Ram
- Yoong Kang Lim
- Yoshiki Vázquez Baeza
- Young Joong Kim
- Younggun Kim
- Yuval Langer
- alex argunov
- behzad nouri
- boombard
- brian-pantano
- chromy
- daniel
- dgram0
- gfyoung
- hack-c
- hcontrast
- jfoo
- kaustuv deolal
- llllllllll
- ranarag
- rockg
- scls19fr
- seales
- sinhrks
- srib
- surveymedia.ca
- tworec
pandas 0.17.1
Release date: (November 21, 2015)
This is a minor release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.
Highlights include:
- Support for Conditional HTML Formatting, see here
- Releasing the GIL on the csv reader & other ops, see here
- Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
See the v0.17.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.17.1.
Thanks
- Aleksandr Drozd
- Alex Chase
- Anthonios Partheniou
- BrenBarn
- Brian J. McGuirk
- Chris
- Christian Berendt
- Christian Perez
- Cody Piersall
- Data & Code Expert Experimenting with Code on Data
- DrIrv
- Evan Wright
- Guillaume Gay
- Hamed Saljooghinejad
- Iblis Lin
- Jake VanderPlas
- Jan Schulz
- Jean-Mathieu Deschenes
- Jeff Reback
- Jimmy Callin
- Joris Van den Bossche
- K.-Michael Aye
- Ka Wo Chen
- Loïc Séguin-C
- Luo Yicheng
- Magnus Jöud
- Manuel Leonhardt
- Matthew Gilbert
- Maximilian Roos
- Michael
- Nicholas Stahl
- Nicolas Bonnotte
- Pastafarianist
- Petra Chong
- Phil Schaf
- Philipp A
- Rob deCarvalho
- Roman Khomenko
- Rémy Léone
- Sebastian Bank
- Thierry Moisan
- Tom Augspurger
- Tux1
- Varun
- Wieland Hoffmann
- Winterflower
- Yoav Ram
- Younggun Kim
- Zeke
- ajcr
- azuranski
- behzad nouri
- cel4
- emilydolson
- hironow
- lexual
- llllllllll
- rockg
- silentquasar
- sinhrks
- taeold
pandas 0.17.0
Release date: (October 9, 2015)
This is a major release from 0.16.2 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Release the Global Interpreter Lock (GIL) on some cython operations, see here
- Plotting methods are now available as attributes of the .plot accessor, see here
- The sorting API has been revamped to remove some long-time inconsistencies, see here
- Support for a datetime64[ns] with timezones as a first-class dtype, see here
- The default for to_datetime will now be to raise when presented with unparseable formats, previously this would return the original input. Also, date parse functions now return consistent results. See here
- The default for dropna in HDFStore has changed to False, to store by default all rows even if they are all NaN, see here
- Datetime accessor (dt) now supports Series.dt.strftime to generate formatted strings for datetime-likes, and Series.dt.total_seconds to generate each duration of the timedelta in seconds. See here
- Period and PeriodIndex can handle multiplied freq like 3D, which corresponds to a 3-day span. See here
- Development installed versions of pandas will now have PEP440 compliant version strings (GH9518)
- Development support for benchmarking with the Air Speed Velocity library (GH8316)
- Support for reading SAS xport files, see here
- Documentation comparing SAS to pandas, see here
- Removal of the automatic TimeSeries broadcasting, deprecated since 0.8.0, see here
- Display format with plain text can optionally align with Unicode East Asian Width, see here
- Compatibility with Python 3.5 (GH11097)
- Compatibility with matplotlib 1.5.0 (GH11111)
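The datetime accessor additions can be illustrated with a short sketch. The timestamps and the format string are chosen arbitrarily for the example.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(["2015-10-09 08:00", "2015-10-09 09:30"]))

# Series.dt.strftime renders each datetime as a formatted string.
labels = stamps.dt.strftime("%Y/%m/%d")

# Subtracting datetimes gives timedeltas; Series.dt.total_seconds converts
# each duration to a float number of seconds.
elapsed = (stamps - stamps.iloc[0]).dt.total_seconds()

print(labels)
print(elapsed)
```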
See the v0.17.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.17.0.
Thanks
- Alex Rothberg
- Andrea Bedini
- Andrew Rosenfeld
- Andy Li
- Anthonios Partheniou
- Artemy Kolchinsky
- Bernard Willers
- Charlie Clark
- Chris
- Chris Whelan
- Christoph Gohlke
- Christopher Whelan
- Clark Fitzgerald
- Clearfield Christopher
- Dan Ringwalt
- Daniel Ni
- Data & Code Expert Experimenting with Code on Data
- David Cottrell
- David John Gagne
- David Kelly
- ETF
- Eduardo Schettino
- Egor
- Egor Panfilov
- Evan Wright
- Frank Pinter
- Gabriel Araujo
- Garrett-R
- Gianluca Rossi
- Guillaume Gay
- Guillaume Poulin
- Harsh Nisar
- Ian Henriksen
- Ian Hoegen
- Jaidev Deshpande
- Jan Rudolph
- Jan Schulz
- Jason Swails
- Jeff Reback
- Jonas Buyl
- Joris Van den Bossche
- Joris Vankerschaver
- Josh Levy-Kramer
- Julien Danjou
- Ka Wo Chen
- Karrie Kehoe
- Kelsey Jordahl
- Kerby Shedden
- Kevin Sheppard
- Lars Buitinck
- Leif Johnson
- Luis Ortiz
- Mac
- Matt Gambogi
- Matt Savoie
- Matthew Gilbert
- Maximilian Roos
- Michelangelo D’Agostino
- Mortada Mehyar
- Nick Eubank
- Nipun Batra
- Ondřej Čertík
- Phillip Cloud
- Pratap Vardhan
- Rafal Skolasinski
- Richard Lewis
- Rinoc Johnson
- Rob Levy
- Robert Gieseke
- Safia Abdalla
- Samuel Denny
- Saumitra Shahapure
- Sebastian Pölsterl
- Sebastian Rubbert
- Sheppard, Kevin
- Sinhrks
- Siu Kwan Lam
- Skipper Seabold
- Spencer Carrucciu
- Stephan Hoyer
- Stephen Hoover
- Stephen Pascoe
- Terry Santegoeds
- Thomas Grainger
- Tjerk Santegoeds
- Tom Augspurger
- Vincent Davis
- Winterflower
- Yaroslav Halchenko
- Yuan Tang (Terry)
- agijsberts
- ajcr
- behzad nouri
- cel4
- cyrusmaher
- davidovitch
- ganego
- jreback
- juricast
- larvian
- maximilianr
- msund
- rekcahpassyla
- robertzk
- scls19fr
- seth-p
- sinhrks
- springcoil
- terrytangyuan
- tzinckgraf
pandas 0.16.2
Release date: (June 12, 2015)
This is a minor release from 0.16.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.
Highlights include:
See the v0.16.2 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.16.2.
Thanks
- Andrew Rosenfeld
- Artemy Kolchinsky
- Bernard Willers
- Christer van der Meeren
- Christian Hudon
- Constantine Glen Evans
- Daniel Julius Lasiman
- Evan Wright
- Francesco Brundu
- Gaëtan de Menten
- Jake VanderPlas
- James Hiebert
- Jeff Reback
- Joris Van den Bossche
- Justin Lecher
- Ka Wo Chen
- Kevin Sheppard
- Mortada Mehyar
- Morton Fox
- Robin Wilson
- Thomas Grainger
- Tom Ajamian
- Tom Augspurger
- Yoshiki Vázquez Baeza
- Younggun Kim
- austinc
- behzad nouri
- jreback
- lexual
- rekcahpassyla
- scls19fr
- sinhrks
pandas 0.16.1
Release date: (May 11, 2015)
This is a minor release from 0.16.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs.
See the v0.16.1 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.16.1.
Thanks
- Alfonso MHC
- Andy Hayden
- Artemy Kolchinsky
- Chris Gilmer
- Chris Grinolds
- Dan Birken
- David BROCHART
- David Hirschfeld
- David Stephens
- Dr. Leo
- Evan Wright
- Frans van Dunné
- Hatem Nassrat
- Henning Sperr
- Hugo Herter
- Jan Schulz
- Jeff Blackburne
- Jeff Reback
- Jim Crist
- Jonas Abernot
- Joris Van den Bossche
- Kerby Shedden
- Leo Razoumov
- Manuel Riel
- Mortada Mehyar
- Nick Burns
- Nick Eubank
- Olivier Grisel
- Phillip Cloud
- Pietro Battiston
- Roy Hyunjin Han
- Sam Zhang
- Scott Sanderson
- Stephan Hoyer
- Tiago Antao
- Tom Ajamian
- Tom Augspurger
- Tomaz Berisa
- Vikram Shirgur
- Vladimir Filimonov
- William Hogman
- Yasin A
- Younggun Kim
- behzad nouri
- dsm054
- floydsoft
- flying-sheep
- gfr
- jnmclarty
- jreback
- ksanghai
- lucas
- mschmohl
- ptype
- rockg
- scls19fr
- sinhrks
pandas 0.16.0
Release date: (March 22, 2015)
This is a major release from 0.15.2 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- DataFrame.assign method, see here
- Series.to_coo/from_coo methods to interact with scipy.sparse, see here
- Backwards incompatible change to Timedelta to conform the .seconds attribute with datetime.timedelta, see here
- Changes to the .loc slicing API to conform with the behavior of .ix, see here
- Changes to the default for ordering in the Categorical constructor, see here
- The pandas.tools.rplot, pandas.sandbox.qtpandas and pandas.rpy modules are deprecated. We refer users to external packages like seaborn, pandas-qt and rpy2 for similar or equivalent functionality, see here
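A minimal sketch of the DataFrame.assign highlight follows. The column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"length": [1.0, 2.0], "width": [3.0, 4.0]})

# assign returns a new frame with the extra column; a callable receives the
# frame being assigned to, which makes the method convenient in chains.
with_area = df.assign(area=lambda d: d["length"] * d["width"])

print(with_area)
```

The original frame is left unchanged, so the call can sit in the middle of a method chain without side effects.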
See the v0.16.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.16.0.
Thanks
- Aaron Toth
- Alan Du
- Alessandro Amici
- Artemy Kolchinsky
- Ashwini Chaudhary
- Ben Schiller
- Bill Letson
- Brandon Bradley
- Chau Hoang
- Chris Reynolds
- Chris Whelan
- Christer van der Meeren
- David Cottrell
- David Stephens
- Ehsan Azarnasab
- Garrett-R
- Guillaume Gay
- Jake Torcasso
- Jason Sexauer
- Jeff Reback
- John McNamara
- Joris Van den Bossche
- Joschka zur Jacobsmühlen
- Juarez Bochi
- Junya Hayashi
- K.-Michael Aye
- Kerby Shedden
- Kevin Sheppard
- Kieran O’Mahony
- Kodi Arfer
- Matti Airas
- Min RK
- Mortada Mehyar
- Robert
- Scott E Lasley
- Scott Lasley
- Sergio Pascual
- Skipper Seabold
- Stephan Hoyer
- Thomas Grainger
- Tom Augspurger
- TomAugspurger
- Vladimir Filimonov
- Vyomkesh Tripathi
- Will Holmgren
- Yulong Yang
- behzad nouri
- bertrandhaut
- bjonen
- cel4
- clham
- hsperr
- ischwabacher
- jnmclarty
- josham
- jreback
- omtinez
- roch
- sinhrks
- unutbu
pandas 0.15.2
Release date: (December 12, 2014)
This is a minor release from 0.15.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs.
See the v0.15.2 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.2.
Thanks
- Aaron Staple
- Angelos Evripiotis
- Artemy Kolchinsky
- Benoit Pointet
- Brian Jacobowski
- Charalampos Papaloizou
- Chris Warth
- David Stephens
- Fabio Zanini
- Francesc Via
- Henry Kleynhans
- Jake VanderPlas
- Jan Schulz
- Jeff Reback
- Jeff Tratner
- Joris Van den Bossche
- Kevin Sheppard
- Matt Suggit
- Matthew Brett
- Phillip Cloud
- Rupert Thompson
- Scott E Lasley
- Stephan Hoyer
- Stephen Simmons
- Sylvain Corlay
- Thomas Grainger
- Tiago Antao
- Trent Hauck
- Victor Chaves
- Victor Salgado
- Vikram Bhandoh
- WANG Aiyong
- Will Holmgren
- behzad nouri
- broessli
- charalampos papaloizou
- immerrr
- jnmclarty
- jreback
- mgilbert
- onesandzeroes
- peadarcoyle
- rockg
- seth-p
- sinhrks
- unutbu
- wavedatalab
- Åsmund Hjulstad
pandas 0.15.1
Release date: (November 9, 2014)
This is a minor release from 0.15.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
See the v0.15.1 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.1.
Thanks
- Aaron Staple
- Andrew Rosenfeld
- Anton I. Sipos
- Artemy Kolchinsky
- Bill Letson
- Dave Hughes
- David Stephens
- Guillaume Horel
- Jeff Reback
- Joris Van den Bossche
- Kevin Sheppard
- Nick Stahl
- Sanghee Kim
- Stephan Hoyer
- TomAugspurger
- WANG Aiyong
- behzad nouri
- immerrr
- jnmclarty
- jreback
- pallav-fdsi
- unutbu
pandas 0.15.0
Release date: (October 18, 2014)
This is a major release from 0.14.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- Drop support for numpy < 1.7.0 (GH7711)
- The Categorical type was integrated as a first-class pandas type, see here
- New scalar type Timedelta, and a new index type TimedeltaIndex, see here
- New DataFrame default display for df.info() to include memory usage, see Memory Usage
- New datetimelike properties accessor .dt for Series, see Datetimelike Properties
- Split indexing documentation into Indexing and Selecting Data and MultiIndex / Advanced Indexing
- Split out string methods documentation into Working with Text Data
- read_csv will now by default ignore blank lines when parsing, see here
- API change in using Indexes in set operations, see here
- Internal refactoring of the Index class to no longer sub-class ndarray, see Internal Refactoring
- dropping support for PyTables less than version 3.0.0, and numexpr less than version 2.1 (GH7990)
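The categorical, Timedelta and .dt highlights can be sketched briefly. All values below are invented for illustration.

```python
import pandas as pd

# A Series with the category dtype stores each distinct label once and
# represents the values as integer codes.
grades = pd.Series(["b", "a", "b", "c"], dtype="category")
categories = list(grades.cat.categories)
codes = grades.cat.codes.tolist()

# Timedelta is a scalar duration type.
delta = pd.Timedelta("1 days 2 hours")
seconds = delta.total_seconds()

# The .dt accessor exposes datetime properties element-wise on a Series.
when = pd.Series(pd.to_datetime(["2014-10-18", "2014-12-12"]))
months = when.dt.month.tolist()

print(categories, codes, seconds, months)
```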
See the v0.15.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.0.
Thanks
- Aaron Schumacher
- Adam Greenhall
- Andy Hayden
- Anthony O’Brien
- Artemy Kolchinsky
- behzad nouri
- Benedikt Sauer
- benjamin
- Benjamin Thyreau
- Ben Schiller
- bjonen
- BorisVerk
- Chris Reynolds
- Chris Stoafer
- Dav Clark
- dlovell
- DSM
- dsm054
- FragLegs
- German Gomez-Herrero
- Hsiaoming Yang
- Huan Li
- hunterowens
- Hyungtae Kim
- immerrr
- Isaac Slavitt
- ischwabacher
- Jacob Schaer
- Jacob Wasserman
- Jan Schulz
- Jeff Tratner
- Jesse Farnham
- jmorris0x0
- jnmclarty
- Joe Bradish
- Joerg Rittinger
- John W. O’Brien
- Joris Van den Bossche
- jreback
- Kevin Sheppard
- klonuo
- Kyle Meyer
- lexual
- Max Chang
- mcjcode
- Michael Mueller
- Michael W Schatzow
- Mike Kelly
- Mortada Mehyar
- mtrbean
- Nathan Sanders
- Nathan Typanski
- onesandzeroes
- Paul Masurel
- Phillip Cloud
- Pietro Battiston
- RenzoBertocchi
- rockg
- Ross Petchler
- seth-p
- Shahul Hameed
- Shashank Agarwal
- sinhrks
- someben
- stahlous
- stas-sl
- Stephan Hoyer
- thatneat
- tom-alcorn
- TomAugspurger
- Tom Augspurger
- Tony Lorenzo
- unknown
- unutbu
- Wes Turner
- Wilfred Hughes
- Yevgeniy Grechka
- Yoshiki Vázquez Baeza
- zachcp
pandas 0.14.1
Release date: (July 11, 2014)
This is a minor release from 0.14.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- New methods select_dtypes() to select columns based on the dtype and sem() to calculate the standard error of the mean.
- Support for dateutil timezones (see docs).
- Support for ignoring full line comments in the read_csv() text parser.
- New documentation section on Options and Settings.
- Lots of bug fixes.
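As a small sketch of the two new methods, the example below selects the numeric columns of a frame and computes a standard error of the mean. The frame and its column names are hypothetical.

```python
import math

import pandas as pd

df = pd.DataFrame({
    "n": [1, 2, 3],
    "x": [1.0, 2.0, 4.0],
    "label": ["a", "b", "c"],
})

# Keep only the columns whose dtype is numeric.
numeric = df.select_dtypes(include=["number"])

# Standard error of the mean: sample standard deviation (ddof=1) divided by
# the square root of the number of observations.
sem_n = df["n"].sem()
by_hand = df["n"].std() / math.sqrt(len(df))

print(numeric.columns.tolist())
print(sem_n, by_hand)
```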
See the v0.14.1 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.14.1.
Thanks
- Andrew Rosenfeld
- Andy Hayden
- Benjamin Adams
- Benjamin M. Gross
- Brian Quistorff
- Brian Wignall
- bwignall
- clham
- Daniel Waeber
- David Bew
- David Stephens
- DSM
- dsm054
- helger
- immerrr
- Jacob Schaer
- jaimefrio
- Jan Schulz
- John David Reaver
- John W. O’Brien
- Joris Van den Bossche
- jreback
- Julien Danjou
- Kevin Sheppard
- K.-Michael Aye
- Kyle Meyer
- lexual
- Matthew Brett
- Matt Wittmann
- Michael Mueller
- Mortada Mehyar
- onesandzeroes
- Phillip Cloud
- Rob Levy
- rockg
- sanguineturtle
- Schaer, Jacob C
- seth-p
- sinhrks
- Stephan Hoyer
- Thomas Kluyver
- Todd Jennings
- TomAugspurger
- unknown
- yelite
pandas 0.14.0
Release date: (May 31, 2014)
This is a major release from 0.13.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- Officially support Python 3.4
- SQL interfaces updated to use sqlalchemy, see here.
- Display interface changes, see here
- MultiIndexing using Slicers, see here.
- Ability to join a singly-indexed DataFrame with a multi-indexed DataFrame, see here
- More consistency in groupby results and more flexible groupby specifications, see here
- Holiday calendars are now supported in CustomBusinessDay, see here
- Several improvements in plotting functions, including: hexbin, area and pie plots, see here.
- Performance doc section on I/O operations, see here
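The "MultiIndexing using Slicers" highlight can be sketched with pd.IndexSlice. The index levels and values here are made up, and the example assumes the index is lexically sorted, which from_product guarantees for these sorted inputs.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2]], names=["letter", "number"]
)
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

idx = pd.IndexSlice

# Select every "letter" but only the rows where "number" equals 2.
subset = df.loc[idx[:, 2], :]

print(subset)
```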
See the v0.14.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.14.0.
Thanks
- Acanthostega
- Adam Marcus
- agijsberts
- akittredge
- Alex Gaudio
- Alex Rothberg
- AllenDowney
- Andrew Rosenfeld
- Andy Hayden
- ankostis
- anomrake
- Antoine Mazières
- anton-d
- bashtage
- Benedikt Sauer
- benjamin
- Brad Buran
- bwignall
- cgohlke
- chebee7i
- Christopher Whelan
- Clark Fitzgerald
- clham
- Dale Jung
- Dan Allan
- Dan Birken
- danielballan
- Daniel Waeber
- David Jung
- David Stephens
- Douglas McNeil
- DSM
- Garrett Drapala
- Gouthaman Balaraman
- Guillaume Poulin
- hshimizu77
- hugo
- immerrr
- ischwabacher
- Jacob Howard
- Jacob Schaer
- jaimefrio
- Jason Sexauer
- Jeff Reback
- Jeffrey Starr
- Jeff Tratner
- John David Reaver
- John McNamara
- John W. O’Brien
- Jonathan Chambers
- Joris Van den Bossche
- jreback
- jsexauer
- Julia Evans
- Júlio
- Katie Atkinson
- kdiether
- Kelsey Jordahl
- Kevin Sheppard
- K.-Michael Aye
- Matthias Kuhn
- Matt Wittmann
- Max Grender-Jones
- Michael E. Gruen
- michaelws
- mikebailey
- Mike Kelly
- Nipun Batra
- Noah Spies
- ojdo
- onesandzeroes
- Patrick O’Keeffe
- phaebz
- Phillip Cloud
- Pietro Battiston
- PKEuS
- Randy Carnevale
- ribonoous
- Robert Gibboni
- rockg
- sinhrks
- Skipper Seabold
- SplashDance
- Stephan Hoyer
- Tim Cera
- Tobias Brandt
- Todd Jennings
- TomAugspurger
- Tom Augspurger
- unutbu
- westurner
- Yaroslav Halchenko
- y-p
- zach powers
pandas 0.13.1
Release date: (February 3, 2014)
API Changes
- Series.sort will raise a ValueError (rather than a TypeError) on sorting an object that is a view of another (GH5856, GH5853)
- Raise/Warn SettingWithCopyError (according to the option chained_assignment) in more cases, when detecting chained assignment, related (GH5938, GH6025)
- DataFrame.head(0) returns self instead of empty frame (GH5846)
- autocorrelation_plot now accepts **kwargs. (GH5623)
- convert_objects now accepts a convert_timedeltas='coerce' argument to allow forced dtype conversion of timedeltas (GH5458, GH5689)
- Add -NaN and -nan to the default set of NA values (GH5952). See NA Values.
- NDFrame now has an equals method. (GH5283)
- DataFrame.apply will use the reduce argument to determine whether a Series or a DataFrame should be returned when the DataFrame is empty (GH6007).
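As a rough illustration of two of the changes above, the sketch below parses `-NaN` and `-nan` as missing values and compares frames with `equals`. It is written against a recent pandas release rather than 0.13.1, and the column names and sample CSV text are made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV text containing the two NA markers added in this release.
csv_text = "a,b\n1,-NaN\n-nan,4\n"
df = pd.read_csv(io.StringIO(csv_text))

# Both markers are read as missing values by the default NA set.
missing = df.isna()

# equals() treats NaNs in the same location as equal.
same = df.equals(df.copy())
different = df.equals(df.fillna(0))
```

Unlike `==`, which compares element-wise and never considers NaN equal to NaN, `equals` answers the single question of whether two objects hold the same data.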
Experimental Features¶
Improvements to existing features¶
- perf improvements in Series datetime/timedelta binary operations (GH5801)
- option_context context manager now available as top-level API (GH5752)
- df.info() view now displays dtype info per column (GH5682)
- df.info() now honors option max_info_rows, disable null counts for large frames (GH5974)
- perf improvements in DataFrame count/dropna for axis=1
- Series.str.contains now has a regex=False keyword which can be faster for plain (non-regex) string patterns. (GH5879)
- support dtypes property on Series/Panel/Panel4D
- extend Panel.apply to allow arbitrary functions (rather than only ufuncs) (GH1148); allow multiple axes to be used to operate on slabs of a Panel
- The ArrayFormatter for datetime and timedelta64 now intelligently limits precision based on the values in the array (GH3401)
- pd.show_versions() is now available for convenience when reporting issues.
- perf improvements to Series.str.extract (GH5944)
- perf improvements in dtypes/ftypes methods (GH5968)
- perf improvements in indexing with object dtypes (GH5968)
- improved dtype inference for timedelta like passed to constructors (GH5458, GH5689)
- escape special characters when writing to latex (GH5374)
- perf improvements in DataFrame.apply (GH6013)
- pd.read_csv and pd.to_datetime learned a new infer_datetime_format keyword which greatly improves parsing perf in many cases. Thanks to @lexual for suggesting and @danbirken for rapidly implementing. (GH5490, GH6021)
- add ability to recognize ‘%p’ format code (am/pm) to date parsers when the specific format is supplied (GH5361)
- Fix performance regression in JSON IO (GH5765)
- performance regression in Index construction from Series (GH6150)
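Two of the improvements listed above are easy to try out: the `regex=False` keyword of `Series.str.contains` and the top-level `option_context` context manager. The following is a minimal sketch assuming a recent pandas release; the sample strings and the chosen option are illustrative only.

```python
import pandas as pd

s = pd.Series(["a.b", "axb", "a+b"])

# regex=False treats "." as a literal character (plain substring search).
literal = s.str.contains(".", regex=False)
# With regex=True the same pattern matches any character.
pattern = s.str.contains(".", regex=True)

# option_context temporarily overrides an option and restores it on exit.
before = pd.get_option("display.max_rows")
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")
after = pd.get_option("display.max_rows")
```

The literal search only matches the first string, while the regex search matches all three, which is why the keyword matters for correctness as well as speed.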
Bug Fixes¶
- Bug in io.wb.get_countries not including all countries (GH6008)
- Bug in Series replace with timestamp dict (GH5797)
- read_csv/read_table now respects the prefix kwarg (GH5732).
- Bug in selection with missing values via .ix from a duplicate indexed DataFrame failing (GH5835)
- Fix issue of boolean comparison on empty DataFrames (GH5808)
- Bug in isnull handling NaT in an object array (GH5443)
- Bug in to_datetime when passed a np.nan or integer datelike and a format string (GH5863)
- Bug in groupby dtype conversion with datetimelike (GH5869)
- Regression in handling of empty Series as indexers to Series (GH5877)
- Bug in internal caching, related to (GH5727)
- Testing bug in reading JSON/msgpack from a non-filepath on windows under py3 (GH5874)
- Bug when assigning to .ix[tuple(...)] (GH5896)
- Bug in fully reindexing a Panel (GH5905)
- Bug in idxmin/max with object dtypes (GH5914)
- Bug in BusinessDay when adding n days to a date not on offset when n>5 and n%5==0 (GH5890)
- Bug in assigning to chained series with a series via ix (GH5928)
- Bug in creating an empty DataFrame, copying, then assigning (GH5932)
- Bug in DataFrame.tail with empty frame (GH5846)
- Bug in propagating metadata on resample (GH5862)
- Fixed string-representation of NaT to be “NaT” (GH5708)
- Fixed string-representation for Timestamp to show nanoseconds if present (GH5912)
- pd.match not returning passed sentinel
- Panel.to_frame() no longer fails when major_axis is a MultiIndex (GH5402).
- Bug in pd.read_msgpack with inferring a DateTimeIndex frequency incorrectly (GH5947)
- Fixed to_datetime for array with both Tz-aware datetimes and NaT’s (GH5961)
- Bug in rolling skew/kurtosis when passed a Series with bad data (GH5749)
- Bug in scipy interpolate methods with a datetime index (GH5975)
- Bug in NaT comparison if a mixed datetime/np.datetime64 with NaT were passed (GH5968)
- Fixed bug with pd.concat losing dtype information if all inputs are empty (GH5742)
- Recent changes in IPython cause warnings to be emitted when using previous versions of pandas in QTConsole, now fixed. If you’re using an older version and need to suppress the warnings, see (GH5922).
- Bug in merging timedelta dtypes (GH5695)
- Bug in plotting.scatter_matrix function. Wrong alignment among diagonal and off-diagonal plots, see (GH5497).
- Regression in Series with a multi-index via ix (GH6018)
- Bug in Series.xs with a multi-index (GH6018)
- Bug in Series construction of mixed type with datelike and an integer (which should result in object type and not automatic conversion) (GH6028)
- Possible segfault when chained indexing with an object array under numpy 1.7.1 (GH6026, GH6056)
- Bug in setting using fancy indexing a single element with a non-scalar (e.g. a list), (GH6043)
- to_sql did not respect if_exists (GH4110, GH4304)
- Regression in .get(None) indexing from 0.12 (GH5652)
- Subtle iloc indexing bug, surfaced in (GH6059)
- Bug with insert of strings into DatetimeIndex (GH5818)
- Fixed unicode bug in to_html/HTML repr (GH6098)
- Fixed missing arg validation in get_options_data (GH6105)
- Bug in assignment with duplicate columns in a frame where the locations are a slice (e.g. next to each other) (GH6120)
- Bug in propagating _ref_locs during construction of a DataFrame with dups index/columns (GH6121)
- Bug in DataFrame.apply when using mixed datelike reductions (GH6125)
- Bug in DataFrame.append when appending a row with different columns (GH6129)
- Bug in DataFrame construction with recarray and non-ns datetime dtype (GH6140)
- Bug in .loc setitem indexing with a dataframe on rhs, multiple item setting, and a datetimelike (GH6152)
- Fixed a bug in query/eval during lexicographic string comparisons (GH6155).
- Fixed a bug in query where the index of a single-element Series was being thrown away (GH6148).
- Bug in HDFStore on appending a dataframe with multi-indexed columns to an existing table (GH6167)
- Consistency with dtypes in setting an empty DataFrame (GH6171)
- Bug in selecting on a multi-index HDFStore even in the presence of under specified column spec (GH6169)
- Bug in nanops.var with ddof=1 and 1 element would sometimes return inf rather than nan on some platforms (GH6136)
- Bug in Series and DataFrame bar plots ignoring the use_index keyword (GH6209)
- Bug in groupby with mixed str/int under python3 fixed; argsort was failing (GH6212)
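Two of the fixes above describe behaviour that can be checked directly: the sample variance of a single element should be `nan` rather than `inf`, and the string form of `NaT` should be "NaT". The snippet below is a small sanity check, assuming a recent pandas release.

```python
import math

import pandas as pd

# With ddof=1 and a single observation the variance is undefined.
single_var = pd.Series([1.0]).var(ddof=1)

# String representation of the missing-timestamp marker.
nat_text = str(pd.NaT)

var_is_nan = math.isnan(single_var)
```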
pandas 0.13.0¶
Release date: January 3, 2014
New Features¶
- plot(kind='kde') now accepts the optional parameters bw_method and ind, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set the bandwidth, and to gkde.evaluate() to specify the indices at which it is evaluated, respectively. See scipy docs. (GH4298)
- Added isin method to DataFrame (GH4211)
- df.to_clipboard() learned a new excel keyword that lets you paste df data directly into excel (enabled by default). (GH5070).
- Clipboard functionality now works with PySide (GH4282)
- New extract string method returns regex matches more conveniently (GH4685)
- Auto-detect field widths in read_fwf when unspecified (GH4488)
- to_csv() now outputs datetime objects according to a specified format string via the date_format keyword (GH4313)
- Added LastWeekOfMonth DateOffset (GH4637)
- Added cumcount groupby method (GH4646)
- Added FY5253, and FY5253Quarter DateOffsets (GH4511)
- Added mode() method to Series and DataFrame to get the statistical mode(s) of a column/series. (GH5367)
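To make three of the new features listed above concrete, here is a short sketch of `DataFrame.isin`, the `cumcount` groupby method and `mode()`. It assumes a recent pandas release; the frame and its column names (`key`, `val`) are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "a"], "val": [1, 2, 2, 2]})

# DataFrame.isin with a dict checks each named column against its own values;
# columns that are not listed come back as all False.
mask = df.isin({"val": [2]})

# cumcount numbers the rows within each group, starting at 0.
position_in_group = df.groupby("key").cumcount()

# mode() returns the most frequent value(s) as a Series.
modes = df["val"].mode()
```

`mode()` returns a Series rather than a scalar because several values can be tied for most frequent.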
Experimental Features¶
- The new eval() function implements expression evaluation using numexpr behind the scenes. This results in large speedups for complicated expressions involving large DataFrames/Series.
- DataFrame has a new eval() that evaluates an expression in the context of the DataFrame; allows inline expression assignment
- A query() method has been added that allows you to select elements of a DataFrame using a natural query syntax nearly identical to Python syntax.
- pd.eval and friends now evaluate operations involving datetime64 objects in Python space because numexpr cannot handle NaT values (GH4897).
- Add msgpack support via pd.read_msgpack() and pd.to_msgpack() / df.to_msgpack() for serialization of arbitrary pandas (and python objects) in a lightweight portable binary format (GH686, GH5506)
- Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
- Added pandas.io.gbq for reading from (and writing to) Google BigQuery into a DataFrame. (GH4140)
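The expression-evaluation features above can be sketched as follows. This is an illustration against a recent pandas release, not the 0.13.0 behaviour verbatim: `engine="python"` is passed so the example runs without numexpr installed, `local_dict` is used to hand the frame to `pd.eval` explicitly, and in current versions `DataFrame.eval` with an assignment returns a new frame instead of modifying the original. The frame and column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1]})

# Top-level eval: the expression refers to objects supplied via local_dict.
total = pd.eval("df.a + df.b", engine="python", local_dict={"df": df})

# DataFrame.eval: column names are resolved in the context of the frame,
# and "c = ..." assigns the result to a new column.
with_c = df.eval("c = a + b", engine="python")

# DataFrame.query: select rows with a Python-like boolean expression.
selected = df.query("a > b", engine="python")
```

With numexpr installed and the engine left at its default, the same expressions are the ones that benefit from the speedups described above on large frames.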
Improvements to existing features¶
- read_html now raises a URLError instead of catching and raising a ValueError (GH4303, GH4305)
- read_excel now supports an integer in its sheetname argument giving the index of the sheet to read in (GH4301).
- get_dummies works with NaN (GH4446)
- Added a test for read_clipboard() and to_clipboard() (GH4282)
- Added bins argument to value_counts (GH3945), also sort and ascending, now available in Series method as well as top-level function.
- Text parser now treats anything that reads like inf (“inf”, “Inf”, “-Inf”, “iNf”, etc.) as infinity. (GH4220, GH4219), affecting read_table, read_csv, etc.
- Added a more informative error message when plot arguments contain overlapping color and style arguments (GH4402)
- Significant table writing performance improvements in HDFStore
- JSON date serialization now performed in low-level C code.
- JSON support for encoding datetime.time
- Expanded JSON docs, more info about orient options and the use of the numpy param when decoding.
- Add drop_level argument to xs (GH4180)
- Can now resample a DataFrame with ohlc (GH2320)
- Index.copy() and MultiIndex.copy() now accept keyword arguments to change attributes (i.e., names, levels, labels) (GH4039)
- Add rename and set_names methods to Index as well as set_names, set_levels, set_labels to MultiIndex. (GH4039) with improved validation for all (GH4039, GH4794)
- A Series of dtype timedelta64[ns] can now be divided/multiplied by an integer series (GH4521)
- A Series of dtype timedelta64[ns] can now be divided by another timedelta64[ns] object to yield a float64 dtyped Series. This is frequency conversion; astyping is also supported.
- Timedelta64 support fillna/ffill/bfill with an integer interpreted as seconds, or a timedelta (GH3371)
- Box numeric ops on timedelta Series (GH4984)
- Datetime64 support ffill/bfill
- Performance improvements with __getitem__ on DataFrames when the key is a column
- Support for using a DatetimeIndex/PeriodsIndex directly in a datelike calculation e.g. s-s.index (GH4629)
- Better/cleaned up exceptions in core/common, io/excel and core/format (GH4721, GH3954), as well as cleaned up test cases in tests/test_frame, tests/test_multilevel (GH4732).
- Performance improvement of timeseries plotting with PeriodIndex and added test to vbench (GH4705 and GH4722)
- Add axis and level keywords to where, so that the other argument can now be an alignable pandas object.
- to_datetime with a format of ‘%Y%m%d’ now parses much faster
- It’s now easier to hook new Excel writers into pandas (just subclass ExcelWriter and register your engine). You can specify an engine in to_excel or in ExcelWriter. You can also specify which writers you want to use by default with config options io.excel.xlsx.writer and io.excel.xls.writer. (GH4745, GH4750)
- Panel.to_excel() now accepts keyword arguments that will be passed to its DataFrame’s to_excel() methods. (GH4750)
- Added XlsxWriter as an optional ExcelWriter engine. This is about 5x faster than the default openpyxl xlsx writer and is equivalent in speed to the xlwt xls writer module. (GH4542)
- allow DataFrame constructor to accept more list-like objects, e.g. list of collections.Sequence and array.Array objects (GH3783, GH4297, GH4851), thanks @lgautier
- DataFrame constructor now accepts a numpy masked record array (GH3478), thanks @jnothman
- __getitem__ with tuple key (e.g., [:, 2]) on Series without MultiIndex raises ValueError (GH4759, GH4837)
- read_json now raises a (more informative) ValueError when the dict contains a bad key and orient='split' (GH4730, GH4838)
- read_stata now accepts Stata 13 format (GH4291)
- ExcelWriter and ExcelFile can be used as contextmanagers. (GH3441, GH4933)
- pandas is now tested with two different versions of statsmodels (0.4.3 and 0.5.0) (GH4981).
- Better string representations of MultiIndex (including ability to roundtrip via repr). (GH3347, GH4935)
- Both ExcelFile and read_excel to accept an xlrd.Book for the io (formerly path_or_buf) argument; this requires engine to be set. (GH4961).
- concat now gives a more informative error message when passed objects that cannot be concatenated (GH4608).
- Add halflife option to exponentially weighted moving functions (PR GH4998)
- to_dict now takes records as a possible outtype. Returns an array of column-keyed dictionaries. (GH4936)
- tz_localize can infer a fall daylight savings transition based on the structure of unlocalized data (GH4230)
- DatetimeIndex is now in the API documentation
- Improve support for converting R datasets to pandas objects (more informative index for timeseries and numeric, support for factors, dist, and high-dimensional arrays).
- read_html() now supports the parse_dates, tupleize_cols and thousands parameters (GH4770).
- json_normalize() is a new method to allow you to create a flat table from semi-structured JSON data. See the docs (GH1067)
- DataFrame.from_records() will now accept generators (GH4910)
- DataFrame.interpolate() and Series.interpolate() have been expanded to include interpolation methods from scipy. (GH4434, GH1892)
- Series now supports a to_frame method to convert it to a single-column DataFrame (GH5164)
- DatetimeIndex (and date_range) can now be constructed in a left- or right-open fashion using the closed parameter (GH4579)
- Python csv parser now supports usecols (GH4335)
- Added support for Google Analytics v3 API segment IDs that also supports v2 IDs. (GH5271)
- NDFrame.drop() now accepts names as well as integers for the axis argument. (GH5354)
- Added short docstrings to a few methods that were missing them + fixed the docstrings for Panel flex methods. (GH5336)
- NDFrame.drop(), NDFrame.dropna(), and .drop_duplicates() all accept inplace as a keyword argument; however, this only means that the wrapper is updated inplace, a copy is still made internally. (GH1960, GH5247, GH5628, and related GH2325 [still not closed])
- Fixed bug in tools.plotting.andrews_curves so that lines are drawn grouped by color as expected.
- read_excel() now tries to convert integral floats (like 1.0) to int by default. (GH5394)
- Excel writers now have a default option merge_cells in to_excel() to merge cells in MultiIndex and Hierarchical Rows. Note: using this option it is no longer possible to round trip Excel files with merged MultiIndex and Hierarchical Rows. Set the merge_cells to False to restore the previous behaviour. (GH5254)
- The FRED DataReader now accepts multiple series (GH3413)
- StataWriter adjusts variable names to Stata’s limitations (GH5709)
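A few of the improvements above are small enough to show in one place: `Series.to_frame`, dividing a `timedelta64[ns]` Series by a timedelta to get floats (the "frequency conversion" mentioned above), and the `date_format` keyword of `to_csv`. The sketch assumes a recent pandas release, and all data and names are illustrative.

```python
import numpy as np
import pandas as pd

# to_frame turns a named Series into a single-column DataFrame.
s = pd.Series([1, 2, 3], name="x")
frame = s.to_frame()

# Dividing timedeltas by a one-day timedelta expresses them in days as floats.
td = pd.Series(pd.to_timedelta(["1 days", "36 hours"]))
in_days = td / np.timedelta64(1, "D")

# date_format controls how datetime values are written by to_csv.
dates = pd.DataFrame({"when": pd.to_datetime(["2013-01-01", "2013-06-15"])})
csv_lines = dates.to_csv(index=False, date_format="%d/%m/%Y").splitlines()
```

Dividing by `np.timedelta64(1, "h")` instead would express the same values in hours.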
API Changes¶
- DataFrame.reindex() and forward/backward filling now raises ValueError if either index is not monotonic (GH4483, GH4484).
- pandas now is Python 2/3 compatible without the need for 2to3 thanks to @jtratner. As a result, pandas now uses iterators more extensively. This also led to the introduction of substantive parts of Benjamin Peterson’s six library into compat. (GH4384, GH4375, GH4372)
- pandas.util.compat and pandas.util.py3compat have been merged into pandas.compat. pandas.compat now includes many functions allowing 2/3 compatibility. It contains both list and iterator versions of range, filter, map and zip, plus other necessary elements for Python 3 compatibility. lmap, lzip, lrange and lfilter all produce lists instead of iterators, for compatibility with numpy, subscripting and pandas constructors. (GH4384, GH4375, GH4372)
- deprecated iterkv, which will be removed in a future release (was just an alias of iteritems used to get around 2to3’s changes). (GH4384, GH4375, GH4372)
- Series.get with negative indexers now returns the same as [] (GH4390)
- allow ix/loc for Series/DataFrame/Panel to set on any axis even when the single-key is not currently contained in the index for that axis (GH2578, GH5226, GH5632, GH5720, GH5744, GH5756)
- Default export for to_clipboard is now csv with a sep of \t for compat (GH3368)
- at now will enlarge the object inplace (and return the same) (GH2578)
- DataFrame.plot will scatter plot x versus y by passing kind='scatter' (GH2215)
- HDFStore
  - append_to_multiple automatically synchronizes writing rows to multiple tables and adds a dropna kwarg (GH4698)
  - handle a passed Series in table format (GH4330)
  - added an is_open property to indicate if the underlying file handle is_open; a closed store will now report ‘CLOSED’ when viewing the store (rather than raising an error) (GH4409)
  - a close of a HDFStore now will close that instance of the HDFStore but will only close the actual file if the ref count (by PyTables) w.r.t. all of the open handles are 0. Essentially you have a local instance of HDFStore referenced by a variable. Once you close it, it will report closed. Other references (to the same file) will continue to operate until they themselves are closed. Performing an action on a closed file will raise ClosedFileError
  - removed the _quiet attribute, replaced by a DuplicateWarning if retrieving duplicate rows from a table (GH4367)
  - removed the warn argument from open. Instead a PossibleDataLossError exception will be raised if you try to use mode='w' with an OPEN file handle (GH4367)
  - allow a passed locations array or mask as a where condition (GH4467)
  - add the keyword dropna=True to append to change whether ALL nan rows are not written to the store (default is True, ALL nan rows are NOT written), also settable via the option io.hdf.dropna_table (GH4625)
  - the format keyword now replaces the table keyword; allowed values are fixed(f)|table(t); the Storer format has been renamed to Fixed
  - a column multi-index will be recreated properly (GH4710); raise on trying to use a multi-index with data_columns on the same axis
  - select_as_coordinates will now return an Int64Index of the resultant selection set
  - support timedelta64[ns] as a serialization type (GH3577)
  - store datetime.date objects as ordinals rather than timetuples to avoid timezone issues (GH2852), thanks @tavistmorph and @numpand
  - numexpr 2.2.2 fixes incompatibility in PyTables 2.4 (GH4908)
  - flush now accepts an fsync parameter, which defaults to False (GH5364)
  - unicode indices not supported on table formats (GH5386)
  - pass thru store creation arguments; can be used to support in-memory stores
JSON
- Index and MultiIndex changes (GH4039):
  - Setting levels and labels directly on MultiIndex is now deprecated. Instead, you can use the set_levels() and set_labels() methods.
  - levels, labels and names properties no longer return lists, but instead return containers that do not allow setting of items (‘mostly immutable’)
  - levels, labels and names are validated upon setting and are either copied or shallow-copied.
  - inplace setting of levels or labels now correctly invalidates the cached properties. (GH5238).
  - __deepcopy__ now returns a shallow copy (currently: a view) of the data - allowing metadata changes.
  - MultiIndex.astype() now only allows np.object_-like dtypes and now returns a MultiIndex rather than an Index. (GH4039)
  - Added is_ method to Index that allows fast equality comparison of views (similar to np.may_share_memory but no false positives, and changes on levels and labels setting on MultiIndex). (GH4859, GH4909)
  - Aliased __iadd__ to __add__. (GH4996)
is_
method toIndex
that allows fast equality comparison of views (similar tonp.may_share_memory
but no false positives, and changes onlevels
andlabels
setting onMultiIndex
). (GH4859, GH4909)
- Setting
- Infer and downcast dtype if downcast='infer' is passed to fillna/ffill/bfill (GH4604)
- __nonzero__ for all NDFrame objects, will now raise a ValueError, this reverts back to (GH1073, GH4633) behavior. Add .bool() method to NDFrame objects to facilitate evaluating of single-element boolean Series
- DataFrame.update() no longer raises a DataConflictError, it now will raise a ValueError instead (if necessary) (GH4732)
- Series.isin() and DataFrame.isin() now raise a TypeError when passed a string (GH4763). Pass a list of one element (containing the string) instead.
- Remove undocumented/unused kind keyword argument from read_excel, and ExcelFile. (GH4713, GH4712)
- The method argument of NDFrame.replace() is valid again, so that a list can be passed to to_replace (GH4743).
- provide automatic dtype conversions on _reduce operations (GH3371)
- exclude non-numerics if mixed types with datelike in _reduce operations (GH3371)
- default for tupleize_cols is now False for both to_csv and read_csv. Fair warning in 0.12 (GH3604)
- moved timedeltas support to pandas.tseries.timedeltas.py; add timedeltas string parsing, add top-level to_timedelta function
- NDFrame now is compatible with Python’s toplevel abs() function (GH4821).
- raise a TypeError on invalid comparison ops on Series/DataFrame (e.g. integer/datetime) (GH4968)
- Added a new index type, Float64Index. This will be automatically created when passing floating values in index creation. This enables a pure label-based slicing paradigm that makes [],ix,loc for scalar indexing and slicing work exactly the same. Indexing on other index types are preserved (and positional fallback for [],ix), with the exception, that floating point slicing on indexes on non Float64Index will raise a TypeError, e.g. Series(range(5))[3.5:4.5] (GH263, GH5375)
- Make Categorical repr nicer (GH4368)
- Remove deprecated Factor (GH3650)
- Remove deprecated set_printoptions/reset_printoptions (GH3046)
- Remove deprecated _verbose_info (GH3215)
- Begin removing methods that don’t make sense on GroupBy objects (GH4887).
- Remove deprecated read_clipboard/to_clipboard/ExcelFile/ExcelWriter from pandas.io.parsers (GH3717)
- All non-Index NDFrames (Series, DataFrame, Panel, Panel4D, SparsePanel, etc.), now support the entire set of arithmetic operators and arithmetic flex methods (add, sub, mul, etc.). SparsePanel does not support pow or mod with non-scalars. (GH3765)
- Arithmetic func factories are now passed real names (suitable for using with super) (GH5240)
- Provide numpy compatibility with 1.7 for a calling convention like np.prod(pandas_object) as numpy call with additional keyword args (GH4435)
- Provide __dir__ method (and local context) for tab completion / remove ipython completers code (GH4501)
- Support non-unique axes in a Panel via indexing operations (GH4960)
- .truncate will raise a ValueError if invalid before and afters dates are given (GH5242)
- Timestamp now supports now/today/utcnow class methods (GH5339)
- default for display.max_seq_len is now 100 rather than None. This activates truncated display (”...”) of long sequences in various places. (GH3391)
- All division with NDFrame-likes is now truedivision, regardless of the future import. You can use // and floordiv to do integer division.
In [3]: arr = np.array([1, 2, 3, 4])
In [4]: arr2 = np.array([5, 3, 2, 1])
In [5]: arr / arr2
Out[5]: array([0, 0, 1, 4])
In [6]: pd.Series(arr) / pd.Series(arr2) # no future import required
Out[6]:
0 0.200000
1 0.666667
2 1.500000
3 4.000000
dtype: float64
- raise/warn SettingWithCopyError/Warning exception/warning when setting of a copy thru chained assignment is detected, settable via option mode.chained_assignment
- test the list of NA values in the csv parser. add N/A, #NA as independent default na values (GH5521)
- The refactoring involving Series deriving from NDFrame breaks rpy2<=2.3.8. An issue has been opened against rpy2 and a workaround is detailed in GH5698. Thanks @JanSchulz.
- Series.argmin and Series.argmax are now aliased to Series.idxmin and Series.idxmax. These return the index of the min or max element respectively. Prior to 0.13.0 these would return the position of the min / max element (GH6214)
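Several of the API changes above concern what raises and what is newly accepted. The sketch below exercises four of them: truth-testing a Series raises `ValueError`, `isin` rejects a bare string with `TypeError`, the builtin `abs()` works on pandas objects, and `to_timedelta` parses timedelta strings. It assumes a recent pandas release, and the sample values are invented.

```python
import pandas as pd

# Truth-testing a multi-element Series is ambiguous and raises ValueError.
try:
    bool(pd.Series([True, False]))
    truth_error = None
except ValueError as exc:
    truth_error = exc

# isin requires a list-like; a bare string raises TypeError.
try:
    pd.Series(["a", "b"]).isin("a")
    isin_error = None
except TypeError as exc:
    isin_error = exc

# Passing a one-element list is the supported spelling.
matches = pd.Series(["a", "b"]).isin(["a"])

# The builtin abs() works on Series/DataFrame.
magnitudes = abs(pd.Series([-1, 2, -3]))

# Top-level to_timedelta parses timedelta strings.
delta = pd.to_timedelta("1 days 06:00:00")
```

Where a single boolean is needed from a Series, reductions such as `.any()` or `.all()` state the intent explicitly.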
Internal Refactoring¶
In 0.13.0 there is a major refactor primarily to subclass Series from NDFrame, which is the base class currently for DataFrame and Panel, to unify methods and behaviors. Series formerly subclassed directly from ndarray. (GH4080, GH3862, GH816)
See Internal Refactoring
- Refactor of series.py/frame.py/panel.py to move common code to generic.py
  - added _setup_axes to create generic NDFrame structures
  - moved methods
    - from_axes, _wrap_array, axes, ix, loc, iloc, shape, empty, swapaxes, transpose, pop
    - __iter__, keys, __contains__, __len__, __neg__, __invert__
    - convert_objects, as_blocks, as_matrix, values
    - __getstate__, __setstate__ (compat remains in frame/panel)
    - __getattr__, __setattr__
    - _indexed_same, reindex_like, align, where, mask
    - fillna, replace (Series replace is now consistent with DataFrame)
    - filter (also added axis argument to selectively filter on a different axis)
    - reindex, reindex_axis, take
    - truncate (moved to become part of NDFrame)
    - isnull/notnull now available on NDFrame objects
- These are API changes which make Panel more consistent with DataFrame
  - swapaxes on a Panel with the same axes specified now return a copy
  - support attribute access for setting
  - filter supports same API as original DataFrame filter
  - fillna refactored to core/generic.py, while > 3ndim is NotImplemented
- Series now inherits from NDFrame rather than directly from ndarray. There are several minor changes that affect the API.
  - numpy functions that do not support the array interface will now return ndarrays rather than series, e.g. np.diff, np.ones_like, np.where
  - Series(0.5) would previously return the scalar 0.5, this is no longer supported
  - TimeSeries is now an alias for Series. The property is_time_series can be used to distinguish (if desired)
- Refactor of Sparse objects to use BlockManager
  - Created a new block type in internals, SparseBlock, which can hold multi-dtypes and is non-consolidatable. SparseSeries and SparseDataFrame now inherit more methods from their hierarchy (Series/DataFrame), and no longer inherit from SparseArray (which instead is the object of the SparseBlock)
  - Sparse suite now supports integration with non-sparse data. Non-float sparse data is supportable (partially implemented)
  - Operations on sparse structures within DataFrames should preserve sparseness, merging type operations will convert to dense (and back to sparse), so might be somewhat inefficient
  - enable setitem on SparseSeries for boolean/integer/slices
  - SparsePanels implementation is unchanged (e.g. not using BlockManager, needs work)
- added ftypes method to Series/DataFrame, similar to dtypes, but indicates if the underlying is sparse/dense (as well as the dtype)
- All NDFrame objects now have a _prop_attributes, which can be used to indicate various values to propagate to a new object from an existing (e.g. name in Series will follow more automatically now)
- Internal type checking is now done via a suite of generated classes, allowing isinstance(value, klass) without having to directly import the klass, courtesy of @jtratner
- Bug in Series update where the parent frame is not updating its cache based on changes (GH4080, GH5216) or types (GH3217), fillna (GH3386)
- Indexing with dtype conversions fixed (GH4463, GH4204)
- Refactor Series.reindex to core/generic.py (GH4604, GH4618), allow method= in reindexing on a Series to work
- Series.copy no longer accepts the order parameter and is now consistent with NDFrame copy
- Refactor rename methods to core/generic.py; fixes Series.rename for (GH4605), and adds rename with the same signature for Panel
- Series (for index) / Panel (for items) now allow attribute access to its elements (GH1903)
- Refactor clip methods to core/generic.py (GH4798)
- Refactor of _get_numeric_data/_get_bool_data to core/generic.py, allowing Series/Panel functionality
- Refactor of Series arithmetic with time-like objects (datetime/timedelta/time etc.) into a separate, cleaned up wrapper class. (GH4613)
- Complex compat for Series with ndarray. (GH4819)
- Removed unnecessary rwproperty from codebase in favor of builtin property. (GH4843)
- Refactor object level numeric methods (mean/sum/min/max...) from object level modules to core/generic.py (GH4435).
- Refactor cum objects to core/generic.py (GH4435), note that these have a more numpy-like function signature.
- read_html() now uses TextParser to parse HTML data from bs4/lxml (GH4770).
- Removed the keep_internal keyword parameter in pandas/core/groupby.py because it wasn’t being used (GH5102).
- Base DateOffsets are no longer all instantiated on importing pandas, instead they are generated and cached on the fly. The internal representation and handling of DateOffsets has also been clarified. (GH5189, related GH5004)
- MultiIndex constructor now validates that passed levels and labels are compatible. (GH5213, GH5214)
- Unify dropna for Series/DataFrame signature (GH5250), tests from GH5234, courtesy of @rockg
- Rewrite assert_almost_equal() in cython for performance (GH4398)
- Added an internal _update_inplace method to facilitate updating NDFrame wrappers on inplace ops (only is for convenience of caller, doesn’t actually prevent copies). (GH5247)
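The most visible consequence of the refactor described above is that a Series is no longer an ndarray subclass. The sketch below, run against a recent pandas and NumPy rather than 0.13.0, shows how that surfaces: `isinstance` checks against `np.ndarray` fail, functions such as `np.diff` hand back a plain array, while ufuncs such as `np.sqrt` still return a Series. The sample data is made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9], index=["a", "b", "c"], name="sq")

# Series derives from NDFrame, not from ndarray.
is_ndarray = isinstance(s, np.ndarray)

# np.diff converts its input to an array, so the index is dropped.
diffed = np.diff(s)

# ufuncs keep the Series wrapper (index and name are preserved).
rooted = np.sqrt(s)

# The label-aware equivalent of np.diff is the Series method.
diffed_series = s.diff()
```

Code that relied on Series being an ndarray can usually be adapted by using the Series method where one exists, or by passing `s.values` explicitly.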
Bug Fixes¶
- HDFStore
  - raising an invalid TypeError rather than ValueError when appending with a different block ordering (GH4096)
  - read_hdf was not respecting a passed mode (GH4504)
  - appending a 0-len table will work correctly (GH4273)
  - to_hdf was raising when passing both arguments append and table (GH4584)
  - reading from a store with duplicate columns across dtypes would raise (GH4767)
  - Fixed a bug where ValueError wasn’t correctly raised when column names weren’t strings (GH4956)
  - A zero length series written in Fixed format not deserializing properly. (GH4708)
  - Fixed decoding perf issue on py3 (GH5441)
  - Validate levels in a multi-index before storing (GH5527)
  - Correctly handle data_columns with a Panel (GH5717)
- Fixed bug in tslib.tz_convert(vals, tz1, tz2): it could raise IndexError exception while trying to access trans[pos + 1] (GH4496)
- The by argument now works correctly with the layout argument (GH4102, GH4014) in *.hist plotting methods
- Fixed bug in PeriodIndex.map where using str would return the str representation of the index (GH4136)
- Fixed test failure test_time_series_plot_color_with_empty_kwargs when using custom matplotlib default colors (GH4345)
- Fix running of stata IO tests. Now uses temporary files to write (GH4353)
- Fixed an issue where DataFrame.sum was slower than DataFrame.mean for integer valued frames (GH4365)
- read_html tests now work with Python 2.6 (GH4351)
- Fixed bug where network testing was throwing NameError because a local variable was undefined (GH4381)
- In to_json, raise if a passed orient would cause loss of data because of a duplicate index (GH4359)
- In to_json, fix date handling so milliseconds are the default timestamp as the docstring says (GH4362).
- as_index is no longer ignored when doing groupby apply (GH4648, GH3417)
- JSON NaT handling fixed, NaTs are now serialized to null (GH4498)
- Fixed JSON handling of escapable characters in JSON object keys (GH4593)
- Fixed passing keep_default_na=False when na_values=None (GH4318)
- Fixed bug with values raising an error on a DataFrame with duplicate columns and mixed dtypes, surfaced in (GH4377)
- Fixed bug with duplicate columns and type conversion in read_json when orient='split' (GH4377)
- Fixed JSON bug where locales with decimal separators other than ‘.’ threw exceptions when encoding / decoding certain values. (GH4918)
- Fix .iat indexing with a PeriodIndex (GH4390)
- Fixed an issue where PeriodIndex joining with self was returning a new instance rather than the same instance (GH4379); also adds a test for this for the other index types
- Fixed a bug with all the dtypes being converted to object when using the CSV cparser with the usecols parameter (GH3192)
- Fix an issue in merging blocks where the resulting DataFrame had partially set _ref_locs (GH4403)
- Fixed an issue where hist subplots were being overwritten when they were called using the top level matplotlib API (GH4408)
- Fixed a bug where calling
Series.astype(str)
would truncate the string (GH4405, GH4437) - Fixed a py3 compat issue where bytes were being repr’d as tuples (GH4455)
- Fixed Panel attribute naming conflict if item is named ‘a’ (GH3440)
- Fixed an issue where duplicate indexes were raising when plotting (GH4486)
- Fixed an issue where cumsum and cumprod didn’t work with bool dtypes (GH4170, GH4440)
- Fixed a Panel slicing issue in xs that was returning an object with incorrect dimensions (GH4016)
- Fix resampling bug where a custom reduce function was not used if there was only one group (GH3849, GH4494)
- Fixed Panel assignment with a transposed frame (GH3830)
- Raise on set indexing with a Panel and a Panel as a value which needs alignment (GH3777)
- frozenset objects now raise in the
Series
constructor (GH4482, GH4480) - Fixed issue with sorting a duplicate multi-index that has multiple dtypes (GH4516)
- Fixed bug in
DataFrame.set_values
which was causing name attributes to be lost when expanding the index. (GH3742, GH4039) - Fixed issue where individual
names
,levels
andlabels
could be set onMultiIndex
without validation (GH3714, GH4039) - Fixed (GH3334) in pivot_table. Margins did not compute if values is the index.
- Fix bug in having a rhs of np.timedelta64 or pd.offsets.DateOffset when operating with datetimes (GH4532)
- Fix arithmetic with series/datetimeindex and
np.timedelta64
not working the same (GH4134) and buggy timedelta in numpy 1.6 (GH4135) - Fix bug in
pd.read_clipboard on Windows with PY3 (GH4561); not decoding properly
- tslib.get_period_field() and tslib.get_period_field_arr() now raise if the code argument is out of range (GH4519, GH4520)
- Fix boolean indexing on an empty series losing index names (GH4235); infer_dtype now works with empty arrays.
- Fix reindexing with multiple axes; if an axes match was not replacing the current axes, leading to a possible lazy frequency inference issue (GH3317)
- Fixed issue where
DataFrame.apply
was reraising exceptions incorrectly (causing the original stack trace to be truncated). - Fix selection with
ix/loc
and non_unique selectors (GH4619)
- Fix assignment with iloc/loc involving a dtype change in an existing column (GH4312, GH5702); the internal setitem_with_indexer in core/indexing now uses Block.setitem
- Fixed bug where thousands operator was not handled correctly for floating point numbers in csv_import (GH4322)
- Fix an issue with CacheableOffset not properly being used by many DateOffset; this prevented the DateOffset from being cached (GH4609)
- Fix boolean comparison with a DataFrame on the lhs, and a list/tuple on the rhs (GH4576)
- Fix error/dtype conversion with setitem of
None
onSeries/DataFrame
(GH4667) - Fix decoding based on a passed in non-default encoding in
pd.read_stata
(GH4626) - Fix
DataFrame.from_records
with a plain-vanillandarray
. (GH4727) - Fix some inconsistencies with
Index.rename
andMultiIndex.rename
, etc. (GH4718, GH4628) - Bug in using
iloc/loc
with a cross-section and duplicate indices (GH4726) - Bug with using
QUOTE_NONE
withto_csv
causingException
. (GH4328) - Bug with Series indexing not raising an error when the right-hand-side has an incorrect length (GH2702)
- Bug in multi-indexing with a partial string selection as one part of a MultiIndex (GH4758)
- Bug with reindexing on the index with a non-unique index will now raise
ValueError
(GH4746) - Bug in setting with
loc/ix
a single indexer with a multi-index axis and a numpy array, related to (GH3777) - Bug in concatenation with duplicate columns across dtypes not merging with axis=0 (GH4771, GH4975)
- Bug in
iloc
with a slice index failing (GH4771) - Incorrect error message with no colspecs or width in
read_fwf
. (GH4774) - Fix bugs in indexing in a Series with a duplicate index (GH4548, GH4550)
- Fixed bug with reading compressed files with
read_fwf
in Python 3. (GH3963) - Fixed an issue with a duplicate index and assignment with a dtype change (GH4686)
- Fixed bug with reading compressed files in as
bytes
rather thanstr
in Python 3. Simplifies bytes-producing file-handling in Python 3 (GH3963, GH4785). - Fixed an issue related to ticklocs/ticklabels with log scale bar plots across different versions of matplotlib (GH4789)
- Suppressed DeprecationWarning associated with internal calls issued by repr() (GH4391)
- Fixed an issue with a duplicate index and duplicate selector with
.loc
(GH4825) - Fixed an issue with
DataFrame.sort_index
where, when sorting by a single column and passing a list forascending
, the argument forascending
was being interpreted asTrue
(GH4839, GH4846) - Fixed
Panel.tshift
not working. Added freq support toPanel.shift
(GH4853) - Fix an issue in TextFileReader w/ Python engine (i.e. PythonParser) with thousands != ”,” (GH4596)
- Bug in getitem with a duplicate index when using where (GH4879)
- Fix type inference code coercing a float column into datetime (GH4601)
- Fixed _ensure_numeric not checking for complex numbers (GH4902)
- Fixed a bug in
Series.hist
where two figures were being created when theby
argument was passed (GH4112, GH4113). - Fixed a bug in
convert_objects
for > 2 ndims (GH4937) - Fixed a bug in DataFrame/Panel cache insertion and subsequent indexing (GH4939, GH5424)
- Fixed string methods for
FrozenNDArray
andFrozenList
(GH4929) - Fixed a bug with setting invalid or out-of-range values in indexing enlargement scenarios (GH4940)
- Tests for fillna on empty Series (GH4346), thanks @immerrr
- Fixed
copy()
to shallow copy axes/indices as well and thereby keep separate metadata. (GH4202, GH4830) - Fixed skiprows option in Python parser for read_csv (GH4382)
- Fixed bug preventing
cut
from working withnp.inf
levels without explicitly passing labels (GH3415) - Fixed wrong check for overlapping in
DatetimeIndex.union
(GH4564) - Fixed conflict between thousands separator and date parser in csv_parser (GH4678)
- Fix appending when dtypes are not the same (error showing mixing float/np.datetime64) (GH4993)
- Fix repr for DateOffset. No longer show duplicate entries in kwds. Removed unused offset fields. (GH4638)
- Fixed wrong index name during read_csv if using usecols. Applies to c parser only. (GH4201)
Timestamp
objects can now appear in the left hand side of a comparison operation with aSeries
orDataFrame
object (GH4982).- Fix a bug when indexing with
np.nan
viailoc/loc
(GH5016) - Fixed a bug where low memory c parser could create different types in different chunks of the same file. Now coerces to numerical type or raises warning. (GH3866)
- Fix a bug where reshaping a
Series
to its own shape raisedTypeError
(GH4554) and other reshaping issues. - Bug in setting with
ix/loc
and a mixed int/string index (GH4544) - Make sure series-series boolean comparisons are label based (GH4947)
- Bug in multi-level indexing with a Timestamp partial indexer (GH4294)
- Tests/fix for multi-index construction of an all-nan frame (GH4078)
- Fixed a bug where
read_html()
wasn’t correctly inferring values of tables with commas (GH5029) - Fixed a bug where
read_html()
wasn’t providing a stable ordering of returned tables (GH4770, GH5029). - Fixed a bug where
read_html()
was incorrectly parsing when passedindex_col=0
(GH5066). - Fixed a bug where
read_html()
was incorrectly inferring the type of headers (GH5048). - Fixed a bug where
DatetimeIndex
joins withPeriodIndex
caused a stack overflow (GH3899). - Fixed a bug where
groupby
objects didn’t allow plots (GH5102). - Fixed a bug where
groupby
objects weren’t tab-completing column names (GH5102). - Fixed a bug where
groupby.plot()
and friends were duplicating figures multiple times (GH5102). - Provide automatic conversion of
object
dtypes on fillna, related (GH5103) - Fixed a bug where default options were being overwritten in the option parser cleaning (GH5121).
- Treat a list/ndarray identically for
iloc
indexing with list-like (GH5006) - Fix
MultiIndex.get_level_values()
with missing values (GH5074) - Fix bound checking for Timestamp() with datetime64 input (GH4065)
- Fix a bug where
TestReadHtml
wasn’t calling the correctread_html()
function (GH5150). - Fix a bug with
NDFrame.replace()
which made replacement appear as though it was (incorrectly) using regular expressions (GH5143).
- Provide a better error message for to_datetime (GH4928)
- Made sure different locales are tested on travis-ci (GH4918). Also adds a couple of utilities for getting locales and setting locales with a context manager.
- Fixed segfault on
isnull(MultiIndex)
(now raises an error instead) (GH5123, GH5125) - Allow duplicate indices when performing operations that align (GH5185, GH5639)
- Compound dtypes in a constructor raise
NotImplementedError
(GH5191) - Bug in comparing duplicate frames (GH4421) related
- Bug in describe on duplicate frames
- Bug in
to_datetime
with a format andcoerce=True
not raising (GH5195) - Bug in
loc
setting with multiple indexers and a rhs of a Series that needs broadcasting (GH5206) - Fixed bug where inplace setting of levels or labels on
MultiIndex
would not clear cachedvalues
property and therefore return wrongvalues
. (GH5215) - Fixed bug where filtering a grouped DataFrame or Series did not maintain the original ordering (GH4621).
- Fixed
Period
with a business date freq to always roll-forward if on a non-business date. (GH5203) - Fixed bug in Excel writers where frames with duplicate column names weren’t written correctly. (GH5235)
- Fixed issue with
drop
and a non-unique index on Series (GH5248) - Fixed seg fault in C parser caused by passing more names than columns in the file. (GH5156)
- Fix
Series.isin
with date/time-like dtypes (GH5021) - C and Python Parser can now handle the more common multi-index column format which doesn’t have a row for index names (GH4702)
- Bug when trying to use an out-of-bounds date as an object dtype (GH5312)
- Bug when trying to display an embedded PandasObject (GH5324)
- Allow operations on Timestamps to return a datetime if the result is out-of-bounds, related to (GH5312)
- Fix return value/type signature of
initObjToJSON()
to be compatible with numpy’simport_array()
(GH5334, GH5326) - Bug when renaming then set_index on a DataFrame (GH5344)
- Test suite no longer leaves around temporary files when testing graphics. (GH5347) (thanks for catching this @yarikoptic!)
- Fixed html tests on win32. (GH4580)
- Make sure that
head/tail
areiloc
based, (GH5370) - Fixed bug for
PeriodIndex
string representation if there are 1 or 2 elements. (GH5372) - The GroupBy methods
transform
andfilter
can be used on Series and DataFrames that have repeated (non-unique) indices. (GH4620) - Fix empty series not printing name in repr (GH4651)
- Make tests create temp files in temp directory by default. (GH5419)
pd.to_timedelta
of a scalar returns a scalar (GH5410)pd.to_timedelta
acceptsNaN
andNaT
, returningNaT
instead of raising (GH5437)- performance improvements in
isnull
on larger size pandas objects - Fixed various setitem with 1d ndarray that does not have a matching length to the indexer (GH5508)
- Bug in getitem with a multi-index and
iloc
(GH5528) - Bug in delitem on a Series (GH5542)
- Bug fix in apply when using custom function and objects are not mutated (GH5545)
- Bug in selecting from a non-unique index with
loc
(GH5553) - Bug in groupby returning non-consistent types when user function returns a
None
, (GH5592) - Work around regression in numpy 1.7.0 which erroneously raises IndexError from
ndarray.item
(GH5666) - Bug in repeated indexing of object with resultant non-unique index (GH5678)
- Bug in fillna with Series and a passed series/dict (GH5703)
- Bug in groupby transform with a datetime-like grouper (GH5712)
- Bug in multi-index selection in PY3 when using certain keys (GH5725)
- Row-wise concat of differing dtypes failing in certain cases (GH5754)
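The pd.to_timedelta entries above (GH5410, GH5437) can be illustrated with a minimal sketch. It assumes a reasonably recent pandas where these calls behave as the notes describe; the variable names are purely illustrative.

```python
import numpy as np
import pandas as pd

# A scalar input yields a scalar Timedelta rather than a length-1 container.
thirty_hours = pd.to_timedelta("1 days 06:00:00")

# Missing inputs are passed through as NaT instead of raising.
from_nan = pd.to_timedelta(np.nan)
from_nat = pd.to_timedelta(pd.NaT)

print(type(thirty_hours).__name__)
print(pd.isna(from_nan), pd.isna(from_nat))
```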
pandas 0.12.0¶
Release date: 2013-07-24
New Features¶
- pd.read_html() can now parse HTML strings, files or urls and returns a list of DataFrames, courtesy of @cpcloud. (GH3477, GH3605, GH3606)
- Support for reading Amazon S3 files. (GH3504)
- Added module for reading and writing JSON strings/files: pandas.io.json includes to_json DataFrame/Series method, and a read_json top-level reader; various issues (GH1226, GH3804, GH3876, GH3867, GH1305)
- Added module for reading and writing Stata files: pandas.io.stata (GH1512) includes to_stata DataFrame method, and a read_stata top-level reader
- Added support for writing in
to_csv
and reading inread_csv
, multi-index columns. Theheader
option inread_csv
now accepts a list of the rows from which to read the index. Added the option,tupleize_cols
to provide compatibility for the pre 0.12 behavior of writing and reading multi-index columns via a list of tuples. The default in 0.12 is to write lists of tuples and not interpret list of tuples as a multi-index column. Note: The default value will change in 0.13 to make the default to write and read multi-index columns in the new format. (GH3571, GH1651, GH3141)
- Add iterator to
Series.str
(GH3638) pd.set_option()
now allows N option, value pairs (GH3667).- Added keyword parameters for different types of scatter_matrix subplots
- A
filter
method on grouped Series or DataFrames returns a subset of the original (GH3680, GH919) - Access to historical Google Finance data in pandas.io.data (GH3814)
- DataFrame plotting methods can sample column colors from a Matplotlib
colormap via the
colormap
keyword. (GH3860)
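As a small illustration of the filter method on grouped objects listed above, the following sketch keeps only the rows that belong to groups with at least two members. The frame and the threshold are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "c", "c", "c"],
        "value": [1, 2, 3, 4, 5, 6],
    }
)

# filter() calls the function once per group and keeps the rows of every
# group for which it returns True; the result is a subset of the original.
kept = df.groupby("key").filter(lambda group: len(group) >= 2)

print(kept)
```

The single-row group "b" is dropped, while the original row order and index labels of the remaining rows are preserved.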
Improvements to existing features¶
- Fixed various issues with internal pprinting code; the repr() for various objects including Timestamp and Index now produces valid Python code strings and can be used to recreate the object (GH3038, GH3379, GH3251, GH3460)
convert_objects
now accepts acopy
parameter (defaults toTrue
)HDFStore
- will retain index attributes (freq, tz, name) on recreation (GH3499, GH4098)
- will warn with an AttributeConflictWarning if you are attempting to append an index with a different frequency than the existing, or attempting to append an index with a different name than the existing
- support datelike columns with a timezone as data_columns (GH2852)
- table writing performance improvements.
- support python3 (via
PyTables 3.0.0
) (GH3750)
- Add modulo operator to Series, DataFrame
- Add
date
method to DatetimeIndex - Add
dropna
argument to pivot_table (GH3820)
- Simplified the API and added a describe method to Categorical
melt
now accepts the optional parametersvar_name
andvalue_name
to specify custom column names of the returned DataFrame (GH3649), thanks @hoechenberger. Ifvar_name
is not specified anddataframe.columns.name
is not None, then this will be used as thevar_name
(GH4144). Also support for MultiIndex columns.- clipboard functions use pyperclip (no dependencies on Windows, alternative dependencies offered for Linux) (GH3837).
- Plotting functions now raise a TypeError before trying to plot anything if the associated objects have a dtype of object (GH1818, GH3572, GH3911, GH3912), but they will try to convert object arrays to numeric arrays if possible so that you can still plot, for example, an object array with floats. This happens before any drawing takes place, which eliminates any spurious plots from showing up.
- Added FAQ section on repr display options, to help users customize their setup.
where
operations that result in block splitting are much faster (GH3733)- Series and DataFrame hist methods now take a
figsize
argument (GH3834) - DatetimeIndexes no longer try to convert mixed-integer indexes during join operations (GH3877)
- Add unit keyword to Timestamp and to_datetime to enable passing of integers or floats that are in an epoch unit of D, s, ms, us, ns (e.g. unix timestamps or epochs, with fractional seconds allowed), thanks @mtkini (GH3969, GH3540)
- DataFrame corr method (spearman) is now cythonized.
- Improved
network
test decorator to catchIOError
(and thereforeURLError
as well). Addedwith_connectivity_check
decorator to allow explicitly checking a website as a proxy for seeing if there is network connectivity. Plus, newoptional_args
decorator factory for decorators. (GH3910, GH3914) read_csv
will now throw a more informative error message when a file contains no columns, e.g., all newline characters- Added
layout
keyword to DataFrame.hist() for more customizable layout (GH4050) - Timestamp.min and Timestamp.max now represent valid Timestamp instances instead of the default datetime.min and datetime.max (respectively), thanks @SleepingPills
read_html
now raises when no tables are found and BeautifulSoup==4.2.0 is detected (GH4214)
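The unit keyword described above can be sketched as follows. The epoch value is an arbitrary example (midnight UTC on 2013-07-01), and the snippet assumes a recent pandas.

```python
import pandas as pd

# Integer seconds since the Unix epoch.
whole = pd.to_datetime(1372636800, unit="s")

# Floats are accepted too, so fractional seconds survive the conversion.
fractional = pd.to_datetime(1372636800.5, unit="s")

# The same instant expressed as a whole number of days since the epoch.
in_days = pd.Timestamp(15887, unit="D")

print(whole, fractional, in_days)
```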
API Changes¶
HDFStore
- When removing an object,
remove(key)
raisesKeyError
if the key is not a valid store object. - raise a
TypeError
on passingwhere
orcolumns
to select with a Storer; these are invalid parameters at this time (GH4189) - can now specify an
encoding
option toappend/put
to enable alternate encodings (GH3750) - enable support for
iterator/chunksize
withread_hdf
- The repr() for (Multi)Index now obeys display.max_seq_items rather than numpy threshold print options. (GH3426, GH3466)
- Added mangle_dupe_cols option to read_table/csv, allowing users to control legacy behaviour regarding duplicate columns (A, A.1, A.2 vs A, A) (GH3468). Note: The default value will change in 0.12 to the “no mangle” behaviour. If your code relies on this behaviour, explicitly specify mangle_dupe_cols=True in your calls.
- Do not allow astypes on
datetime64[ns]
except toobject
, andtimedelta64[ns]
toobject/int
(GH3425) - The behavior of
datetime64
dtypes has changed with respect to certain so-called reduction operations (GH3726). The following operations now raise aTypeError
when performed on aSeries
and return an emptySeries
when performed on aDataFrame
similar to performing these operations on, for example, aDataFrame
ofslice
objects: - sum, prod, mean, std, var, skew, kurt, corr, and cov - Do not allow datetimelike/timedeltalike creation except with valid types
(e.g. cannot pass
datetime64[ms]
) (GH3423) - Add
squeeze
keyword togroupby
to allow reduction from DataFrame -> Series if groups are unique. Regression from 0.10.1, partial revert on (GH2893) with (GH3596) - Raise on
iloc
when boolean indexing with a label based indexer mask e.g. a boolean Series, even with integer labels, will raise. Sinceiloc
is purely positional based, the labels on the Series are not alignable (GH3631) - The
raise_on_error
option to plotting methods is obviated by GH3572, so it is removed. Plots now always raise when data cannot be plotted or the object being plotted has a dtype ofobject
. DataFrame.interpolate()
is now deprecated. Please useDataFrame.fillna()
andDataFrame.replace()
instead (GH3582, GH3675, GH3676).- the
method
andaxis
arguments ofDataFrame.replace()
are deprecated DataFrame.replace
‘sinfer_types
parameter is removed and now performs conversion by default. (GH3907)
- Deprecated display.height; display.width is now only a formatting option and does not control triggering of summary, similar to < 0.11.0.
- Add the keyword
allow_duplicates
toDataFrame.insert
to allow a duplicate column to be inserted ifTrue
, default isFalse
(same as prior to 0.12) (GH3679) - io API changes
- added pandas.io.api for i/o imports
- moved Excel support to pandas.io.excel
- added top-level pd.read_sql and to_sql DataFrame methods
- moved clipboard support to pandas.io.clipboard
- replaced top-level and instance methods save and load with top-level read_pickle and to_pickle instance method; save and load will give a deprecation warning.
- set FutureWarning to require data_source, and to replace year/month with expiry date in pandas.io options. This is in preparation to add options data from Google (GH3822)
- Implement __nonzero__ for NDFrame objects (GH3691, GH3696)
- as_matrix
with mixed signed and unsigned dtypes will result in 2 x the lcd of the unsigned as an int, maxing withint64
, to avoid precision issues (GH3733)na_values
in a list provided toread_csv/read_excel
will match string and numeric versions e.g.na_values=['99']
will match 99 whether the column ends up being int, float, or string (GH3611)read_html
now defaults to flavor=None when reading, and falls back on bs4 + html5lib when lxml fails to parse. A list of parsers to try until success is also valid
- more consistency in the to_datetime return types (given string/array of string inputs) (GH3888)
- The internal
pandas
class hierarchy has changed (slightly). The previousPandasObject
now is calledPandasContainer
and a newPandasObject
has become the baseclass forPandasContainer
as well asIndex
,Categorical
,GroupBy
,SparseList
, andSparseArray
(+ their base classes). Currently,PandasObject
provides string methods (fromStringMixin
). (GH4090, GH4092) - New
StringMixin
that, given a__unicode__
method, gets python 2 and python 3 compatible string methods (__str__
,__bytes__
, and__repr__
). Plus string safety throughout. Now employed in many places throughout the pandas library. (GH4090, GH4092)
Experimental Features¶
- Added experimental
CustomBusinessDay
class to supportDateOffsets
with custom holiday calendars and custom weekmasks. (GH2301)
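A minimal sketch of how CustomBusinessDay might be used, assuming the holidays and weekmask arguments as they exist in current pandas. The holiday date and the Sunday-to-Thursday work week are arbitrary choices for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Business days that skip one custom holiday (Thursday, 2013-07-04).
with_holiday = CustomBusinessDay(holidays=["2013-07-04"])
after_holiday = pd.Timestamp("2013-07-03") + with_holiday

# Business days for a Sunday-to-Thursday work week.
sun_to_thu = CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu")
after_thursday = pd.Timestamp("2013-07-04") + sun_to_thu

print(after_holiday.date(), after_thursday.date())
```

Starting from Wednesday 2013-07-03, the first offset skips the holiday and lands on Friday; the second treats Friday and Saturday as the weekend and lands on Sunday.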
Bug Fixes¶
- Fixed an esoteric excel reading bug, xlrd>= 0.9.0 now required for excel support. Should provide python3 support (for reading) which has been lacking. (GH3164)
- Disallow Series constructor called with MultiIndex which caused segfault (GH4187)
- Allow unioning of date ranges sharing a timezone (GH3491)
- Fix to_csv issue when having a large number of rows and
NaT
in some columns (GH3437) .loc
was not raising when passed an integer list (GH3449)- Unordered time series selection was misbehaving when using label slicing (GH3448)
- Fix sorting in a frame with a list of columns which contains datetime64[ns] dtypes (GH3461)
- DataFrames fetched via FRED now handle ‘.’ as a NaN. (GH3469)
- Fix regression in a DataFrame apply with axis=1, objects were not being converted back to base dtypes correctly (GH3480)
- Fix issue when storing uint dtypes in an HDFStore. (GH3493)
- Non-unique index support clarified (GH3468)
- Addressed handling of dupe columns in df.to_csv new and old (GH3454, GH3457)
- Fix assigning a new index to a duplicate index in a DataFrame would fail (GH3468)
- Fix construction of a DataFrame with a duplicate index
- ref_locs support to allow duplicative indices across dtypes, allows iget support to always find the index (even across dtypes) (GH2194)
- applymap on a DataFrame with a non-unique index now works (removed warning) (GH2786), and fix (GH3230)
- Fix to_csv to handle non-unique columns (GH3495)
- Duplicate indexes with getitem will return items in the correct order (GH3455, GH3457) and handle missing elements like unique indices (GH3561)
- Duplicate indexes with an empty DataFrame.from_records will return a correct frame (GH3562)
- Concat producing non-unique columns when duplicates are across dtypes is fixed (GH3602)
- Non-unique indexing with a slice via
loc
and friends fixed (GH3659) - Allow insert/delete to non-unique columns (GH3679)
- Extend
reindex
to correctly deal with non-unique indices (GH3679) DataFrame.itertuples()
now works with frames with duplicate column names (GH3873)- Bug in non-unique indexing via
iloc
(GH4017); addedtakeable
argument toreindex
for location-based taking - Allow non-unique indexing in series via
.ix/.loc
and__getitem__
(GH4246) - Fixed non-unique indexing memory allocation issue with
.ix/.loc
(GH4280)
- Fixed bug in groupby with empty series referencing a variable before assignment. (GH3510)
- Allow index name to be used in groupby for non MultiIndex (GH4014)
- Fixed bug in mixed-frame assignment with aligned series (GH3492)
- Fixed bug in selecting month/quarter/year from a series would not select the time element on the last day (GH3546)
- Fixed a couple of MultiIndex rendering bugs in df.to_html() (GH3547, GH3553)
- Properly convert np.datetime64 objects in a Series (GH3416)
- Raise a TypeError on invalid datetime/timedelta operations, e.g. adding datetimes or multiplying timedelta x datetime
- Fix
.diff
on datelike and timedelta operations (GH3100) combine_first
not returning the same dtype in cases where it can (GH3552)- Fixed bug with
Panel.transpose
argument aliases (GH3556) - Fixed platform bug in
PeriodIndex.take
(GH3579) - Fixed bug in incorrect conversion of datetime64[ns] in
combine_first
(GH3593) - Fixed bug in reset_index with
NaN
in a multi-index (GH3586) fillna
methods now raise aTypeError
when thevalue
parameter is alist
ortuple
.- Fixed bug where a time-series was being selected in preference to an actual column name in a frame (GH3594)
- Make secondary_y work properly for bar plots (GH3598)
- Fix modulo and integer division on Series, DataFrames to act similarly to float dtypes, returning np.nan or np.inf as appropriate (GH3590)
- Fix incorrect dtype on groupby with
as_index=False
(GH3610) - Fix
read_csv/read_excel
to correctly encode identical na_values, e.g.na_values=[-999.0,-999]
was failing (GH3611) - Disable HTML output in qtconsole again. (GH3657)
- Reworked the new repr display logic, which users found confusing. (GH3663)
- Fix indexing issue in ndim >= 3 with
iloc
(GH3617) - Correctly parse date columns with embedded (nan/NaT) into datetime64[ns] dtype in
read_csv
whenparse_dates
is specified (GH3062) - Fix not consolidating before to_csv (GH3624)
- Fix alignment issue when setitem in a DataFrame with a piece of a DataFrame (GH3626) or a mixed DataFrame and a Series (GH3668)
- Fix plotting of unordered DatetimeIndex (GH3601)
sql.write_frame
failing when writing a single column to sqlite (GH3628), thanks to @stonebig- Fix pivoting with
nan
in the index (GH3558) - Fix running of bs4 tests when it is not installed (GH3605)
- Fix parsing of html table (GH3606)
read_html()
now only allows a single backend:html5lib
(GH3616)convert_objects
withconvert_dates='coerce'
was parsing some single-letter strings into today’s dateDataFrame.from_records
did not accept empty recarrays (GH3682)DataFrame.to_csv
will succeed with the deprecated optionnanRep
, @tdsmithDataFrame.to_html
andDataFrame.to_latex
now accept a path for their first argument (GH3702)
- Fix file tokenization error with \r delimiter and quoted fields (GH3453)
- Groupby transform with item-by-item not upcasting correctly (GH3740)
- Incorrectly read a HDFStore multi-index Frame with a column specification (GH3748)
read_html
now correctly skips tests (GH3741)- PandasObjects raise TypeError when trying to hash (GH3882)
- Fix incorrect arguments passed to concat that are not list-like (e.g. concat(df1,df2)) (GH3481)
- Correctly parse when passed the
dtype=str
(or other variable-len string dtypes) inread_csv
(GH3795) - Fix index name not propagating when using
loc/ix
(GH3880) - Fix groupby when applying a custom function resulting in a returned DataFrame was not converting dtypes (GH3911)
- Fixed a bug where
DataFrame.replace
with a compiled regular expression in theto_replace
argument wasn’t working (GH3907) - Fixed
__truediv__
in Python 2.7 withnumexpr
installed to actually do true division when dividing two integer arrays with at least 10000 cells total (GH3764) - Indexing with a string with seconds resolution not selecting from a time index (GH3925)
- csv parsers would loop infinitely if
iterator=True
but nochunksize
was specified (GH3967), python parser failing withchunksize=1
- Fix index name not propagating when using
shift
- Fixed dropna=False being ignored with multi-index stack (GH3997)
- Fixed flattening of columns when renaming MultiIndex columns DataFrame (GH4004)
- Fix
Series.clip
for datetime series. NA/NaN threshold values will now throw ValueError (GH3996) - Fixed insertion issue into DataFrame, after rename (GH4032)
- Fixed testing issue where too many sockets were open, thus leading to a connection reset issue (GH3982, GH3985, GH4028, GH4054)
- Fixed failing tests in test_yahoo, test_google where symbols were not retrieved but were being accessed (GH3982, GH3985, GH4028, GH4054)
Series.hist
will now take the figure from the current environment if one is not passed- Fixed bug where a 1xN DataFrame would barf on a 1xN mask (GH4071)
- Fixed running of
tox
under python3 where the pickle import was getting rewritten in an incompatible way (GH4062, GH4063) - Fixed bug where sharex and sharey were not being passed to grouped_hist (GH4089)
- Fix bug where
HDFStore
will fail to append because of a different block ordering on-disk (GH4096) - Better error messages on inserting incompatible columns to a frame (GH4107)
- Fixed bug in
DataFrame.replace
where a nested dict wasn’t being iterated over when regex=False (GH4115) - Fixed bug in
convert_objects(convert_numeric=True)
where a mixed numeric and object Series/Frame was not converting properly (GH4119) - Fixed bugs in multi-index selection with column multi-index and duplicates (GH4145, GH4146)
- Fixed bug in the parsing of microseconds when using the
format
argument into_datetime
(GH4152) - Fixed bug in
PandasAutoDateLocator
whereinvert_xaxis
triggered incorrectlyMilliSecondLocator
(GH3990) - Fixed bug in
Series.where
where broadcasting a single element input vector to the length of the series resulted in multiplying the value inside the input (GH4192) - Fixed bug in plotting that wasn’t raising on invalid colormap for matplotlib 1.1.1 (GH4215)
- Fixed the legend displaying in
DataFrame.plot(kind='kde')
(GH4216) - Fixed bug where Index slices weren’t carrying the name attribute (GH4226)
- Fixed bug in initializing
DatetimeIndex
with an array of strings in a certain time zone (GH4229) - Fixed bug where html5lib wasn’t being properly skipped (GH4265)
- Fixed bug where get_data_famafrench wasn’t using the correct file edges (GH4281)
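Among the fixes above, fillna now rejects a list or tuple as the value parameter. The sketch below, which assumes a recent pandas, shows the rejected call next to the scalar and dict forms that remain valid; the data is illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# A list (or tuple) is not a valid fill value and raises TypeError.
try:
    s.fillna([0.0])
    raised = False
except TypeError:
    raised = True

# A scalar fills every missing entry; a dict fills by index label.
filled = s.fillna(0.0)
by_label = s.fillna({1: -1.0})

print(raised, filled.tolist(), by_label.tolist())
```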
pandas 0.11.0¶
Release date: 2013-04-22
New Features¶
- New documentation section,
10 Minutes to Pandas
- New documentation section,
Cookbook
- Allow mixed dtypes (e.g. float32/float64/int32/int16/int8) to coexist in DataFrames and propagate in operations
- Add function to pandas.io.data for retrieving stock index components from Yahoo! finance (GH2795)
- Support slicing with time objects (GH2681)
- Added
.iloc
attribute, to support strict integer based indexing, analogous to.ix
(GH2922) - Added
.loc
attribute, to support strict label based indexing, analogous to.ix
(GH3053) - Added
.iat
attribute, to support fast scalar access via integers (replacesiget_value/iset_value
) - Added
.at
attribute, to support fast scalar access via labels (replacesget_value/set_value
) - Moved functionality from
irow,icol,iget_value/iset_value
to.iloc
indexer (via_ixs
methods in each object) - Added support for expression evaluation using the
numexpr
library - Added
convert=boolean
totake
routines to translate negative indices to positive, defaults to True - Added to_series() method to indices, to facilitate the creation of indexers (GH3275)
Improvements to existing features¶
- Improved performance of df.to_csv() by up to 10x in some cases. (GH3059)
- added blocks attribute to DataFrames, to return a dict of dtypes to homogeneously dtyped DataFrames
- added keyword convert_numeric to convert_objects() to try to convert object dtypes to numeric types (default is False)
- convert_dates in convert_objects can now be coerce which will return a datetime64[ns] dtype with non-convertibles set as NaT; will preserve an all-nan object (e.g. strings), default is True (to perform soft-conversion)
- Series print output now includes the dtype by default
- describe_option() now reports the default and current value of options.
- Add format option to pandas.to_datetime with faster conversion of strings that can be parsed with datetime.strptime
- Add axes property to Series for compatibility
- Add xs function to Series for compatibility
- Allow setitem in a frame where only mixed numerics are present (e.g. int and float), (GH3037)
- HDFStore
- Add squeeze method to possibly remove length 1 dimensions from an object.

In [1]: p = pd.Panel(np.random.randn(3,4,4),items=['ItemA','ItemB','ItemC'],
   ...:              major_axis=pd.date_range('20010102',periods=4),
   ...:              minor_axis=['A','B','C','D'])
   ...:

In [2]: p
Out[2]:
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 4 (major_axis) x 4 (minor_axis)
Items axis: ItemA to ItemC
Major_axis axis: 2001-01-02 00:00:00 to 2001-01-05 00:00:00
Minor_axis axis: A to D

In [3]: p.reindex(items=['ItemA']).squeeze()