Release Notes
This is the list of changes to pandas between each release. For full details, see the commit logs at http://github.com/pandas-dev/pandas
What is it
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.
Where to get it
- Source code: http://github.com/pandas-dev/pandas
- Binary installers on PyPI: http://pypi.python.org/pypi/pandas
- Documentation: http://pandas.pydata.org
pandas 0.21.1
Release date: December 12, 2017
This is a minor bug-fix release in the 0.21.x series and includes some small regression fixes, bug fixes and performance improvements. We recommend that all users upgrade to this version.
Highlights include:
- Temporarily restore matplotlib datetime plotting functionality. This should resolve issues for users who relied implicitly on pandas to plot datetimes with matplotlib. See here.
- Improvements to the Parquet IO functions introduced in 0.21.0. See here.
See the v0.21.1 Whatsnew overview for an extensive list of all the changes for 0.21.1.
Thanks
A total of 46 people contributed to this release. People with a “+” by their names contributed a patch for the first time.
Contributors
- Aaron Critchley +
- Alex Rychyk
- Alexander Buchkovsky +
- Alexander Michael Schade +
- Chris Mazzullo
- Cornelius Riemenschneider +
- Dave Hirschfeld +
- David Fischer +
- David Stansby +
- Dror Atariah +
- Eric Kisslinger +
- Hans +
- Ingolf Becker +
- Jan Werkmann +
- Jeff Reback
- Joris Van den Bossche
- Jörg Döpfert +
- Kevin Kuhl +
- Krzysztof Chomski +
- Leif Walsh
- Licht Takeuchi
- Manraj Singh +
- Matt Braymer-Hayes +
- Michael Waskom +
- Mie~~~ +
- Peter Hoffmann +
- Robert Meyer +
- Sam Cohan +
- Sietse Brouwer +
- Sven +
- Tim Swast
- Tom Augspurger
- Wes Turner
- William Ayd +
- Yee Mey +
- bolkedebruin +
- cgohlke
- derestle-htwg +
- fjdiod +
- gabrielclow +
- gfyoung
- ghasemnaddaf +
- jbrockmendel
- jschendel
- miker985 +
- topper-123
pandas 0.21.0
Release date: October 27, 2017
This is a major release from 0.20.3 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Integration with Apache Parquet, including a new top-level read_parquet() function and DataFrame.to_parquet() method, see here.
- New user-facing pandas.api.types.CategoricalDtype for specifying categoricals independent of the data, see here.
- The behavior of sum and prod on all-NaN Series/DataFrames is now consistent and no longer depends on whether bottleneck is installed, and sum and prod on empty Series now return NaN instead of 0, see here.
- Compatibility fixes for pypy, see here.
- Additions to the drop, reindex and rename API to make them more consistent, see here.
- Addition of the new methods DataFrame.infer_objects (see here) and GroupBy.pipe (see here).
- Indexing with a list of labels, where one or more of the labels is missing, is deprecated and will raise a KeyError in a future version, see here.
See the v0.21.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.21.0
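As an illustration only, the following sketch shows two of the additions named in the 0.21.0 highlights: a CategoricalDtype defined independently of the data, and DataFrame.infer_objects. The variable names and sample values are invented for the example, and the behavior is assumed to match a reasonably recent pandas release.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# A dtype that describes the categories on its own, without any data attached.
size_dtype = CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
sizes = pd.Series(["medium", "small", "large", "small"]).astype(size_dtype)

# Ordered categoricals can be compared against one of their category labels.
at_least_medium = sizes >= "medium"

# Column "a" starts as object dtype because of the leading string;
# after dropping that row, infer_objects() can narrow it to an integer dtype.
raw = pd.DataFrame({"a": ["skip", 1, 2], "b": ["x", "y", "z"]}).iloc[1:]
inferred = raw.infer_objects()

print(at_least_medium.tolist())
print(inferred.dtypes)
```

The same dtype object can be reused across several Series, which is the main reason for specifying it separately from the data.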
Thanks
A total of 206 people contributed to this release. People with a “+” by their names contributed a patch for the first time.
Contributors
- 3553x +
- Aaron Barber
- Adam Gleave +
- Adam Smith +
- AdamShamlian +
- Adrian Liaw +
- Alan Velasco +
- Alan Yee +
- Alex B +
- Alex Lubbock +
- Alex Marchenko +
- Alex Rychyk +
- Amol K +
- Andreas Winkler
- Andrew +
- Andrew 亮
- André Jonasson +
- Becky Sweger
- Berkay +
- Bob Haffner +
- Bran Yang
- Brian Tu +
- Brock Mendel +
- Carol Willing +
- Carter Green +
- Chankey Pathak +
- Chris
- Chris Billington
- Chris Filo Gorgolewski +
- Chris Kerr
- Chris M +
- Chris Mazzullo +
- Christian Prinoth
- Christian Stade-Schuldt
- Christoph Moehl +
- DSM
- Daniel Chen +
- Daniel Grady
- Daniel Himmelstein
- Dave Willmer
- David Cook
- David Gwynne
- David Read +
- Dillon Niederhut +
- Douglas Rudd
- Eric Stein +
- Eric Wieser +
- Erik Fredriksen
- Florian Wilhelm +
- Floris Kint +
- Forbidden Donut
- Gabe F +
- Giftlin +
- Giftlin Rajaiah +
- Giulio Pepe +
- Guilherme Beltramini
- Guillem Borrell +
- Hanmin Qin +
- Hendrik Makait +
- Hugues Valois
- Hussain Tamboli +
- Iva Miholic +
- Jan Novotný +
- Jan Rudolph
- Jean Helie +
- Jean-Baptiste Schiratti +
- Jean-Mathieu Deschenes
- Jeff Knupp +
- Jeff Reback
- Jeff Tratner
- JennaVergeynst
- JimStearns206
- Joel Nothman
- John W. O’Brien
- Jon Crall +
- Jon Mease
- Jonathan J. Helmus +
- Joris Van den Bossche
- JosephWagner
- Juarez Bochi
- Julian Kuhlmann +
- Karel De Brabandere
- Kassandra Keeton +
- Keiron Pizzey +
- Keith Webber
- Kernc
- Kevin Sheppard
- Kirk Hansen +
- Licht Takeuchi +
- Lucas Kushner +
- Mahdi Ben Jelloul +
- Makarov Andrey +
- Malgorzata Turzanska +
- Marc Garcia +
- Margaret Sy +
- MarsGuy +
- Matt Bark +
- Matthew Roeschke
- Matti Picus
- Mehmet Ali “Mali” Akmanalp
- Michael Gasvoda +
- Michael Penkov +
- Milo +
- Morgan Stuart +
- Morgan243 +
- Nathan Ford +
- Nick Eubank
- Nick Garvey +
- Oleg Shteynbuk +
- P-Tillmann +
- Pankaj Pandey
- Patrick Luo
- Patrick O’Melveny
- Paul Reidy +
- Paula +
- Peter Quackenbush
- Peter Yanovich +
- Phillip Cloud
- Pierre Haessig
- Pietro Battiston
- Pradyumna Reddy Chinthala
- Prasanjit Prakash
- RobinFiveWords
- Ryan Hendrickson
- Sam Foo
- Sangwoong Yoon +
- Simon Gibbons +
- SimonBaron
- Steven Cutting +
- Sudeep +
- Sylvia +
- T N +
- Telt
- Thomas A Caswell
- Tim Swast +
- Tom Augspurger
- Tong SHEN
- Tuan +
- Utkarsh Upadhyay +
- Vincent La +
- Vivek +
- WANG Aiyong
- WBare
- Wes McKinney
- XF +
- Yi Liu +
- Yosuke Nakabayashi +
- aaron315 +
- abarber4gh +
- aernlund +
- agustín méndez +
- andymaheshw +
- ante328 +
- aviolov +
- bpraggastis
- cbertinato +
- cclauss +
- chernrick
- chris-b1
- dkamm +
- dwkenefick
- economy
- faic +
- fding253 +
- gfyoung
- guygoldberg +
- hhuuggoo +
- huashuai +
- ian
- iulia +
- jaredsnyder
- jbrockmendel +
- jdeschenes
- jebob +
- jschendel +
- keitakurita
- kernc +
- kiwirob +
- kjford
- linebp
- lloydkirk
- louispotok +
- majiang +
- manikbhandari +
- matthiashuschle +
- mattip
- maxwasserman +
- mjlove12 +
- nmartensen +
- pandas-docs-bot +
- parchd-1 +
- philipphanemann +
- rdk1024 +
- reidy-p +
- ri938
- ruiann +
- rvernica +
- s-weigand +
- scotthavard92 +
- skwbc +
- step4me +
- tobycheese +
- topper-123 +
- tsdlovell
- ysau +
- zzgao +
pandas 0.20.0 / 0.20.1
Release date: May 5, 2017
This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- New .agg() API for Series/DataFrame similar to the groupby-rolling-resample APIs, see here
- Integration with the feather-format, including a new top-level pd.read_feather() and DataFrame.to_feather() method, see here.
- The .ix indexer has been deprecated, see here
- Panel has been deprecated, see here
- Addition of an IntervalIndex and Interval scalar type, see here
- Improved user API when grouping by index levels in .groupby(), see here
- Improved support for UInt64 dtypes, see here
- A new orient for JSON serialization, orient='table', that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see here
- Experimental support for exporting styled DataFrames (DataFrame.style) to Excel, see here
- Window binary corr/cov operations now return a MultiIndexed DataFrame rather than a Panel, as Panel is now deprecated, see here
- Support for S3 handling now uses s3fs, see here
- Google BigQuery support now uses the pandas-gbq library, see here
See the v0.20.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.20.1.
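A minimal sketch of two items from the 0.20.0 highlights, the .agg() API on a DataFrame and the Interval / IntervalIndex types. The frame contents and variable names are made up for the example, and the output shapes are assumed to match a recent pandas release.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10.0, 20.0, 30.0, 40.0]})

# Passing a list of function names to .agg() yields one row per aggregation.
summary = df.agg(["sum", "min", "max"])

# pd.cut() bins values into intervals; the categories of the result
# are stored as an IntervalIndex.
binned = pd.cut(df["x"], bins=[0, 2, 4])
bin_index = binned.cat.categories

# A scalar Interval is closed on the right by default: (0, 2].
interval = pd.Interval(0, 2)

print(summary)
print(bin_index)
print(2 in interval, 0 in interval)
```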
Note
This is a combined release for 0.20.0 and 0.20.1.
Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas’ utils routines. (GH16250)
Thanks
- abaldenko
- Adam J. Stewart
- Adrian
- adrian-stepien
- Ajay Saxena
- Akash Tandon
- Albert Villanova del Moral
- Aleksey Bilogur
- alexandercbooth
- Alexis Mignon
- Amol Kahat
- Andreas Winkler
- Andrew Kittredge
- Anthonios Partheniou
- Arco Bast
- Ashish Singal
- atbd
- bastewart
- Baurzhan Muftakhidinov
- Ben Kandel
- Ben Thayer
- Ben Welsh
- Bill Chambers
- bmagnusson
- Brandon M. Burroughs
- Brian
- Brian McFee
- carlosdanielcsantos
- Carlos Souza
- chaimdemulder
- Chris
- chris-b1
- Chris Ham
- Christopher C. Aycock
- Christoph Gohlke
- Christoph Paulik
- Chris Warth
- Clemens Brunner
- DaanVanHauwermeiren
- Daniel Himmelstein
- Dave Willmer
- David Cook
- David Gwynne
- David Hoffman
- David Krych
- dickreuter
- Diego Fernandez
- Dimitris Spathis
- discort
- Dmitry L
- Dody Suria Wijaya
- Dominik Stanczak
- Dr-Irv
- Dr. Irv
- dr-leo
- D.S. McNeil
- dubourg
- dwkenefick
- Elliott Sales de Andrade
- Ennemoser Christoph
- Francesc Alted
- Fumito Hamamura
- funnycrab
- gfyoung
- Giacomo Ferroni
- goldenbull
- Graham R. Jeffries
- Greg Williams
- Guilherme Beltramini
- Guilherme Samora
- Hao Wu
- Harshit Patni
- hesham.shabana@hotmail.com
- Ilya V. Schurov
- Iván Vallés Pérez
- Jackie Leng
- Jaehoon Hwang
- James Draper
- James Goppert
- James McBride
- James Santucci
- Jan Schulz
- Jeff Carey
- Jeff Reback
- JennaVergeynst
- Jim
- Jim Crist
- Joe Jevnik
- Joel Nothman
- John
- John Tucker
- John W. O’Brien
- John Zwinck
- jojomdt
- Jonathan de Bruin
- Jonathan Whitmore
- Jon Mease
- Jon M. Mease
- Joost Kranendonk
- Joris Van den Bossche
- Joshua Bradt
- Julian Santander
- Julien Marrec
- Jun Kim
- Justin Solinsky
- Kacawi
- Kamal Kamalaldin
- Kerby Shedden
- Kernc
- Keshav Ramaswamy
- Kevin Sheppard
- Kyle Kelley
- Larry Ren
- Leon Yin
- linebp
- Line Pedersen
- Lorenzo Cestaro
- Luca Scarabello
- Lukasz
- Mahmoud Lababidi
- manu
- manuels
- Mark Mandel
- Matthew Brett
- Matthew Roeschke
- mattip
- Matti Picus
- Matt Roeschke
- maxalbert
- Maximilian Roos
- mcocdawc
- Michael Charlton
- Michael Felt
- Michael Lamparski
- Michiel Stock
- Mikolaj Chwalisz
- Min RK
- Miroslav Šedivý
- Mykola Golubyev
- Nate Yoder
- Nathalie Rud
- Nicholas Ver Halen
- Nick Chmura
- Nolan Nichols
- nuffe
- Pankaj Pandey
- paul-mannino
- Pawel Kordek
- pbreach
- Pete Huang
- Peter
- Peter Csizsek
- Petio Petrov
- Phil Ruffwind
- Pietro Battiston
- Piotr Chromiec
- Prasanjit Prakash
- Robert Bradshaw
- Rob Forgione
- Robin
- Rodolfo Fernandez
- Roger Thomas
- Rouz Azari
- Sahil Dua
- sakkemo
- Sam Foo
- Sami Salonen
- Sarah Bird
- Sarma Tangirala
- scls19fr
- Scott Sanderson
- Sebastian Bank
- Sebastian Gsänger
- Sébastien de Menten
- Shawn Heide
- Shyam Saladi
- sinhrks
- Sinhrks
- Stephen Rauch
- stijnvanhoey
- Tara Adiseshan
- themrmax
- the-nose-knows
- Thiago Serafim
- Thoralf Gutierrez
- Thrasibule
- Tobias Gustafsson
- Tom Augspurger
- tomrod
- Tong Shen
- Tong SHEN
- TrigonaMinima
- tzinckgraf
- Uwe
- wandersoncferreira
- watercrossing
- wcwagner
- Wes Turner
- Wiktor Tomczak
- WillAyd
- xgdgsc
- Yaroslav Halchenko
- Yimeng Zhang
- yui-knk
pandas 0.19.2
Release date: December 24, 2016
This is a minor bug-fix release in the 0.19.x series and includes some small regression fixes, bug fixes and performance improvements.
Highlights include:
- Compatibility with Python 3.6
- Added a Pandas Cheat Sheet (GH13202).
See the v0.19.2 Whatsnew page for an overview of all bugs that have been fixed in 0.19.2.
Thanks
- Ajay Saxena
- Ben Kandel
- Chris
- Chris Ham
- Christopher C. Aycock
- Daniel Himmelstein
- Dave Willmer
- Dr-Irv
- gfyoung
- hesham shabana
- Jeff Carey
- Jeff Reback
- Joe Jevnik
- Joris Van den Bossche
- Julian Santander
- Kerby Shedden
- Keshav Ramaswamy
- Kevin Sheppard
- Luca Scarabello
- Matti Picus
- Matt Roeschke
- Maximilian Roos
- Mykola Golubyev
- Nate Yoder
- Nicholas Ver Halen
- Pawel Kordek
- Pietro Battiston
- Rodolfo Fernandez
- sinhrks
- Tara Adiseshan
- Tom Augspurger
- wandersoncferreira
- Yaroslav Halchenko
pandas 0.19.1
Release date: November 3, 2016
This is a minor bug-fix release from 0.19.0 and includes some small regression fixes, bug fixes and performance improvements.
See the v0.19.1 Whatsnew page for an overview of all bugs that have been fixed in 0.19.1.
Thanks
- Adam Chainz
- Anthonios Partheniou
- Arash Rouhani
- Ben Kandel
- Brandon M. Burroughs
- Chris
- chris-b1
- Chris Warth
- David Krych
- dubourg
- gfyoung
- Iván Vallés Pérez
- Jeff Reback
- Joe Jevnik
- Jon M. Mease
- Joris Van den Bossche
- Josh Owen
- Keshav Ramaswamy
- Larry Ren
- mattrijk
- Michael Felt
- paul-mannino
- Piotr Chromiec
- Robert Bradshaw
- Sinhrks
- Thiago Serafim
- Tom Bird
pandas 0.19.0
Release date: October 2, 2016
This is a major release from 0.18.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- merge_asof() for asof-style time-series joining, see here
- .rolling() is now time-series aware, see here
- read_csv() now supports parsing Categorical data, see here
- A function union_categoricals() has been added for combining categoricals, see here
- PeriodIndex now has its own period dtype, and changed to be more consistent with other Index classes. See here
- Sparse data structures gained enhanced support of int and bool dtypes, see here
- Comparison operations with Series no longer ignore the index, see here for an overview of the API changes.
- Introduction of a pandas development API for utility functions, see here.
- Deprecation of Panel4D and PanelND. We recommend representing these types of n-dimensional data with the xarray package.
- Removal of the previously deprecated modules pandas.io.data, pandas.io.wb, pandas.tools.rplot.
See the v0.19.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.19.0.
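The sketch below illustrates three of the 0.19.0 highlights: merge_asof(), time-aware .rolling(), and combining categoricals. The trade and quote data are invented, and the import path pandas.api.types.union_categoricals is the one used by recent pandas releases, which is an assumption relative to 0.19.0 itself.

```python
import pandas as pd
from pandas.api.types import union_categoricals

trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-10-02 09:00:01", "2016-10-02 09:00:05"]),
    "price": [100.0, 101.0],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2016-10-02 09:00:00", "2016-10-02 09:00:04"]),
    "bid": [99.5, 100.5],
})

# For each trade, pick the most recent quote at or before the trade time.
matched = pd.merge_asof(trades, quotes, on="time")

# Time-aware rolling: the window is a 2-second span, not a fixed row count.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime([
        "2016-10-02 09:00:00",
        "2016-10-02 09:00:01",
        "2016-10-02 09:00:03",
    ]),
)
rolled = s.rolling("2s").sum()

# Combine two categoricals whose category sets differ.
combined = union_categoricals([pd.Categorical(["a", "b"]), pd.Categorical(["b", "c"])])

print(matched)
print(rolled)
print(combined)
```

Both inputs to merge_asof() must already be sorted on the key column, which is the case for the sample frames here.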
Thanks
- adneu
- Adrien Emery
- agraboso
- Alex Alekseyev
- Alex Vig
- Allen Riddell
- Amol
- Amol Agrawal
- Andy R. Terrel
- Anthonios Partheniou
- babakkeyvani
- Ben Kandel
- Bob Baxley
- Brett Rosen
- c123w
- Camilo Cota
- Chris
- chris-b1
- Chris Grinolds
- Christian Hudon
- Christopher C. Aycock
- Chris Warth
- cmazzullo
- conquistador1492
- cr3
- Daniel Siladji
- Douglas McNeil
- Drewrey Lupton
- dsm054
- Eduardo Blancas Reyes
- Elliot Marsden
- Evan Wright
- Felix Marczinowski
- Francis T. O’Donovan
- Gábor Lipták
- Geraint Duck
- gfyoung
- Giacomo Ferroni
- Grant Roch
- Haleemur Ali
- harshul1610
- Hassan Shamim
- iamsimha
- Iulius Curt
- Ivan Nazarov
- jackieleng
- Jeff Reback
- Jeffrey Gerard
- Jenn Olsen
- Jim Crist
- Joe Jevnik
- John Evans
- John Freeman
- John Liekezer
- Johnny Gill
- John W. O’Brien
- John Zwinck
- Jordan Erenrich
- Joris Van den Bossche
- Josh Howes
- Jozef Brandys
- Kamil Sindi
- Ka Wo Chen
- Kerby Shedden
- Kernc
- Kevin Sheppard
- Matthieu Brucher
- Maximilian Roos
- Michael Scherer
- Mike Graham
- Mortada Mehyar
- mpuels
- Muhammad Haseeb Tariq
- Nate George
- Neil Parley
- Nicolas Bonnotte
- OXPHOS
- Pan Deng / Zora
- Paul
- Pauli Virtanen
- Paul Mestemaker
- Pawel Kordek
- Pietro Battiston
- pijucha
- Piotr Jucha
- priyankjain
- Ravi Kumar Nimmi
- Robert Gieseke
- Robert Kern
- Roger Thomas
- Roy Keyes
- Russell Smith
- Sahil Dua
- Sanjiv Lobo
- Sašo Stanovnik
- Shawn Heide
- sinhrks
- Sinhrks
- Stephen Kappel
- Steve Choi
- Stewart Henderson
- Sudarshan Konge
- Thomas A Caswell
- Tom Augspurger
- Tom Bird
- Uwe Hoffmann
- wcwagner
- WillAyd
- Xiang Zhang
- Yadunandan
- Yaroslav Halchenko
- YG-Riku
- Yuichiro Kaneko
- yui-knk
- zhangjinjie
- znmean
- 颜发才(Yan Facai)
pandas 0.18.1
Release date: (May 3, 2016)
This is a minor release from 0.18.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.
Highlights include:
- .groupby(...) has been enhanced to provide convenient syntax when working with .rolling(..), .expanding(..) and .resample(..) per group, see here
- pd.to_datetime() has gained the ability to assemble dates from a DataFrame, see here
- Method chaining improvements, see here.
- Custom business hour offset, see here.
- Many bug fixes in the handling of sparse, see here
- Expanded the Tutorials section with a feature on modern pandas, courtesy of @TomAugspurger (GH13045).
See the v0.18.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.18.1.
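As a rough illustration of the first two 0.18.1 highlights, this sketch assembles dates from DataFrame columns and applies a rolling mean per group. Column names and values are invented for the example; the column names year, month and day are the ones pd.to_datetime() recognizes when given a DataFrame.

```python
import pandas as pd

# Assemble datetimes from separate year / month / day columns.
parts = pd.DataFrame({"year": [2016, 2016], "month": [5, 6], "day": [3, 15]})
dates = pd.to_datetime(parts)

# Rolling mean computed separately within each group.
df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0],
})
per_group = df.groupby("key")["value"].rolling(2).mean()

print(dates)
print(per_group)
```

The result of the grouped rolling call is indexed by the group key and the original row label, so the first row of each group is NaN for a window of 2.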
Thanks
- Andrew Fiore-Gartland
- Bastiaan
- Benoît Vinot
- Brandon Rhodes
- DaCoEx
- Drew Fustin
- Ernesto Freitas
- Filip Ter
- Gregory Livschitz
- Gábor Lipták
- Hassan Kibirige
- Iblis Lin
- Israel Saeta Pérez
- Jason Wolosonovich
- Jeff Reback
- Joe Jevnik
- Joris Van den Bossche
- Joshua Storck
- Ka Wo Chen
- Kerby Shedden
- Kieran O’Mahony
- Leif Walsh
- Mahmoud Lababidi
- Maoyuan Liu
- Mark Roth
- Matt Wittmann
- MaxU
- Maximilian Roos
- Michael Droettboom
- Nick Eubank
- Nicolas Bonnotte
- OXPHOS
- Pauli Virtanen
- Peter Waller
- Pietro Battiston
- Prabhjot Singh
- Robin Wilson
- Roger Thomas
- Sebastian Bank
- Stephen Hoover
- Tim Hopper
- Tom Augspurger
- WANG Aiyong
- Wes Turner
- Winand
- Xbar
- Yan Facai
- adneu
- ajenkins-cargometrics
- behzad nouri
- chinskiy
- gfyoung
- jeps-journal
- jonaslb
- kotrfa
- nileracecrew
- onesandzeroes
- rs2
- sinhrks
- tsdlovell
pandas 0.18.0
Release date: (March 13, 2016)
This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Moving and expanding window functions are now methods on Series and DataFrame, similar to .groupby, see here.
- Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here.
- API breaking change to the .resample method to make it more .groupby like, see here.
- Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here.
- The .to_xarray() function has been added for compatibility with the xarray package, see here.
- The read_sas function has been enhanced to read sas7bdat files, see here.
- Addition of the .str.extractall() method, and API changes to the .str.extract() method and .str.cat() method.
- pd.test() top-level nose test runner is available (GH4327).
See the v0.18.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.18.0.
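A short sketch of three 0.18.0 highlights: window functions as methods, the RangeIndex used for default indexes, and .str.extractall(). The data and the regular expression are invented for the example, and the results are assumed to match a recent pandas release.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# A Series built without an explicit index gets a RangeIndex.
default_index = s.index

# Moving and expanding windows are methods on the Series itself.
moving = s.rolling(window=2).mean()
running = s.expanding().sum()

# extractall() returns every match of the pattern, one row per match,
# with one column per capture group.
matches = pd.Series(["a1b2", "c3"]).str.extractall(r"([a-z])(\d)")

print(type(default_index).__name__)
print(moving.tolist())
print(running.tolist())
print(matches)
```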
Thanks
- ARF
- Alex Alekseyev
- Andrew McPherson
- Andrew Rosenfeld
- Anthonios Partheniou
- Anton I. Sipos
- Ben
- Ben North
- Bran Yang
- Chris
- Chris Carroux
- Christopher C. Aycock
- Christopher Scanlin
- Cody
- Da Wang
- Daniel Grady
- Dorozhko Anton
- Dr-Irv
- Erik M. Bray
- Evan Wright
- Francis T. O’Donovan
- Frank Cleary
- Gianluca Rossi
- Graham Jeffries
- Guillaume Horel
- Henry Hammond
- Isaac Schwabacher
- Jean-Mathieu Deschenes
- Jeff Reback
- Joe Jevnik
- John Freeman
- John Fremlin
- Jonas Hoersch
- Joris Van den Bossche
- Joris Vankerschaver
- Justin Lecher
- Justin Lin
- Ka Wo Chen
- Keming Zhang
- Kerby Shedden
- Kyle
- Marco Farrugia
- MasonGallo
- MattRijk
- Matthew Lurie
- Maximilian Roos
- Mayank Asthana
- Mortada Mehyar
- Moussa Taifi
- Navreet Gill
- Nicolas Bonnotte
- Paul Reiners
- Philip Gura
- Pietro Battiston
- RahulHP
- Randy Carnevale
- Rinoc Johnson
- Rishipuri
- Sangmin Park
- Scott E Lasley
- Sereger13
- Shannon Wang
- Skipper Seabold
- Thierry Moisan
- Thomas A Caswell
- Toby Dylan Hocking
- Tom Augspurger
- Travis
- Trent Hauck
- Tux1
- Varun
- Wes McKinney
- Will Thompson
- Yoav Ram
- Yoong Kang Lim
- Yoshiki Vázquez Baeza
- Young Joong Kim
- Younggun Kim
- Yuval Langer
- alex argunov
- behzad nouri
- boombard
- brian-pantano
- chromy
- daniel
- dgram0
- gfyoung
- hack-c
- hcontrast
- jfoo
- kaustuv deolal
- llllllllll
- ranarag
- rockg
- scls19fr
- seales
- sinhrks
- srib
- surveymedia.ca
- tworec
pandas 0.17.1
Release date: (November 21, 2015)
This is a minor release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.
Highlights include:
- Support for Conditional HTML Formatting, see here
- Releasing the GIL on the csv reader & other ops, see here
- Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
See the v0.17.1 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.17.1.
Thanks
- Aleksandr Drozd
- Alex Chase
- Anthonios Partheniou
- BrenBarn
- Brian J. McGuirk
- Chris
- Christian Berendt
- Christian Perez
- Cody Piersall
- Data & Code Expert Experimenting with Code on Data
- DrIrv
- Evan Wright
- Guillaume Gay
- Hamed Saljooghinejad
- Iblis Lin
- Jake VanderPlas
- Jan Schulz
- Jean-Mathieu Deschenes
- Jeff Reback
- Jimmy Callin
- Joris Van den Bossche
- K.-Michael Aye
- Ka Wo Chen
- Loïc Séguin-C
- Luo Yicheng
- Magnus Jöud
- Manuel Leonhardt
- Matthew Gilbert
- Maximilian Roos
- Michael
- Nicholas Stahl
- Nicolas Bonnotte
- Pastafarianist
- Petra Chong
- Phil Schaf
- Philipp A
- Rob deCarvalho
- Roman Khomenko
- Rémy Léone
- Sebastian Bank
- Thierry Moisan
- Tom Augspurger
- Tux1
- Varun
- Wieland Hoffmann
- Winterflower
- Yoav Ram
- Younggun Kim
- Zeke
- ajcr
- azuranski
- behzad nouri
- cel4
- emilydolson
- hironow
- lexual
- llllllllll
- rockg
- silentquasar
- sinhrks
- taeold
pandas 0.17.0
Release date: (October 9, 2015)
This is a major release from 0.16.2 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
- Release the Global Interpreter Lock (GIL) on some cython operations, see here
- Plotting methods are now available as attributes of the .plot accessor, see here
- The sorting API has been revamped to remove some long-time inconsistencies, see here
- Support for a datetime64[ns] with timezones as a first-class dtype, see here
- The default for to_datetime will now be to raise when presented with unparseable formats, previously this would return the original input. Also, date parse functions now return consistent results. See here
- The default for dropna in HDFStore has changed to False, to store by default all rows even if they are all NaN, see here
- Datetime accessor (dt) now supports Series.dt.strftime to generate formatted strings for datetime-likes, and Series.dt.total_seconds to generate each duration of the timedelta in seconds. See here
- Period and PeriodIndex can handle multiplied freq like 3D, which corresponds to a 3-day span. See here
- Development installed versions of pandas will now have PEP440 compliant version strings (GH9518)
- Development support for benchmarking with the Air Speed Velocity library (GH8316)
- Support for reading SAS xport files, see here
- Documentation comparing SAS to pandas, see here
- Removal of the automatic TimeSeries broadcasting, deprecated since 0.8.0, see here
- Display format with plain text can optionally align with Unicode East Asian Width, see here
- Compatibility with Python 3.5 (GH11097)
- Compatibility with matplotlib 1.5.0 (GH11111)
See the v0.17.0 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.17.0.
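The following sketch touches several 0.17.0 highlights: Series.dt.strftime, Series.dt.total_seconds, the revamped sorting API (sort_values), timezone-aware datetimes, and a Period with a multiplied frequency. All values are invented, and the code assumes the behavior of a recent pandas release rather than 0.17.0 specifically.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(["2015-10-09 08:30", "2015-10-10 17:45"]))

# Format datetimes as strings through the .dt accessor.
labels = stamps.dt.strftime("%Y/%m/%d")

# Convert each timedelta to a duration in seconds.
durations = pd.Series(pd.to_timedelta(["1 days", "90 minutes"]))
seconds = durations.dt.total_seconds()

# sort_values() is part of the revamped sorting API.
ordered = pd.DataFrame({"a": [3, 1, 2]}).sort_values("a")

# Timezone-aware datetimes are stored with a tz-aware dtype.
localized = stamps.dt.tz_localize("UTC")

# A Period with a multiplied frequency spans 3 days, so adding 1 moves 3 days ahead.
p = pd.Period("2015-10-09", freq="3D")
next_period = p + 1

print(labels.tolist())
print(seconds.tolist())
print(localized.dtype)
print(next_period.start_time)
```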
Thanks
- Alex Rothberg
- Andrea Bedini
- Andrew Rosenfeld
- Andy Li
- Anthonios Partheniou
- Artemy Kolchinsky
- Bernard Willers
- Charlie Clark
- Chris
- Chris Whelan
- Christoph Gohlke
- Christopher Whelan
- Clark Fitzgerald
- Clearfield Christopher
- Dan Ringwalt
- Daniel Ni
- Data & Code Expert Experimenting with Code on Data
- David Cottrell
- David John Gagne
- David Kelly
- ETF
- Eduardo Schettino
- Egor
- Egor Panfilov
- Evan Wright
- Frank Pinter
- Gabriel Araujo
- Garrett-R
- Gianluca Rossi
- Guillaume Gay
- Guillaume Poulin
- Harsh Nisar
- Ian Henriksen
- Ian Hoegen
- Jaidev Deshpande
- Jan Rudolph
- Jan Schulz
- Jason Swails
- Jeff Reback
- Jonas Buyl
- Joris Van den Bossche
- Joris Vankerschaver
- Josh Levy-Kramer
- Julien Danjou
- Ka Wo Chen
- Karrie Kehoe
- Kelsey Jordahl
- Kerby Shedden
- Kevin Sheppard
- Lars Buitinck
- Leif Johnson
- Luis Ortiz
- Mac
- Matt Gambogi
- Matt Savoie
- Matthew Gilbert
- Maximilian Roos
- Michelangelo D’Agostino
- Mortada Mehyar
- Nick Eubank
- Nipun Batra
- Ondřej Čertík
- Phillip Cloud
- Pratap Vardhan
- Rafal Skolasinski
- Richard Lewis
- Rinoc Johnson
- Rob Levy
- Robert Gieseke
- Safia Abdalla
- Samuel Denny
- Saumitra Shahapure
- Sebastian Pölsterl
- Sebastian Rubbert
- Sheppard, Kevin
- Sinhrks
- Siu Kwan Lam
- Skipper Seabold
- Spencer Carrucciu
- Stephan Hoyer
- Stephen Hoover
- Stephen Pascoe
- Terry Santegoeds
- Thomas Grainger
- Tjerk Santegoeds
- Tom Augspurger
- Vincent Davis
- Winterflower
- Yaroslav Halchenko
- Yuan Tang (Terry)
- agijsberts
- ajcr
- behzad nouri
- cel4
- cyrusmaher
- davidovitch
- ganego
- jreback
- juricast
- larvian
- maximilianr
- msund
- rekcahpassyla
- robertzk
- scls19fr
- seth-p
- sinhrks
- springcoil
- terrytangyuan
- tzinckgraf
pandas 0.16.2
Release date: (June 12, 2015)
This is a minor release from 0.16.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements.
Highlights include:
See the v0.16.2 Whatsnew overview for an extensive list of all enhancements and bugs that have been fixed in 0.16.2.
Thanks
- Andrew Rosenfeld
- Artemy Kolchinsky
- Bernard Willers
- Christer van der Meeren
- Christian Hudon
- Constantine Glen Evans
- Daniel Julius Lasiman
- Evan Wright
- Francesco Brundu
- Gaëtan de Menten
- Jake VanderPlas
- James Hiebert
- Jeff Reback
- Joris Van den Bossche
- Justin Lecher
- Ka Wo Chen
- Kevin Sheppard
- Mortada Mehyar
- Morton Fox
- Robin Wilson
- Thomas Grainger
- Tom Ajamian
- Tom Augspurger
- Yoshiki Vázquez Baeza
- Younggun Kim
- austinc
- behzad nouri
- jreback
- lexual
- rekcahpassyla
- scls19fr
- sinhrks
pandas 0.16.1
Release date: (May 11, 2015)
This is a minor release from 0.16.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs.
See the v0.16.1 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.16.1.
Thanks
- Alfonso MHC
- Andy Hayden
- Artemy Kolchinsky
- Chris Gilmer
- Chris Grinolds
- Dan Birken
- David BROCHART
- David Hirschfeld
- David Stephens
- Dr. Leo
- Evan Wright
- Frans van Dunné
- Hatem Nassrat
- Henning Sperr
- Hugo Herter
- Jan Schulz
- Jeff Blackburne
- Jeff Reback
- Jim Crist
- Jonas Abernot
- Joris Van den Bossche
- Kerby Shedden
- Leo Razoumov
- Manuel Riel
- Mortada Mehyar
- Nick Burns
- Nick Eubank
- Olivier Grisel
- Phillip Cloud
- Pietro Battiston
- Roy Hyunjin Han
- Sam Zhang
- Scott Sanderson
- Stephan Hoyer
- Tiago Antao
- Tom Ajamian
- Tom Augspurger
- Tomaz Berisa
- Vikram Shirgur
- Vladimir Filimonov
- William Hogman
- Yasin A
- Younggun Kim
- behzad nouri
- dsm054
- floydsoft
- flying-sheep
- gfr
- jnmclarty
- jreback
- ksanghai
- lucas
- mschmohl
- ptype
- rockg
- scls19fr
- sinhrks
pandas 0.16.0
Release date: (March 22, 2015)
This is a major release from 0.15.2 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- DataFrame.assign method, see here
- Series.to_coo/from_coo methods to interact with scipy.sparse, see here
- Backwards incompatible change to Timedelta to conform the .seconds attribute with datetime.timedelta, see here
- Changes to the .loc slicing API to conform with the behavior of .ix see here
- Changes to the default for ordering in the Categorical constructor, see here
- The pandas.tools.rplot, pandas.sandbox.qtpandas and pandas.rpy modules are deprecated. We refer users to external packages like seaborn, pandas-qt and rpy2 for similar or equivalent functionality, see here
See the v0.16.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.16.0.
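A minimal sketch of two 0.16.0 highlights: DataFrame.assign, and the Timedelta .seconds attribute following datetime.timedelta semantics. The column names and values are invented for the example.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"width": [2.0, 3.0], "height": [4.0, 5.0]})

# assign() returns a new frame with the extra column; df itself is unchanged.
with_area = df.assign(area=lambda d: d["width"] * d["height"])

# .seconds is the seconds component (0 to 86399), as in datetime.timedelta,
# while total_seconds() gives the whole duration.
td = pd.Timedelta("1 days 00:00:10")
py_td = datetime.timedelta(days=1, seconds=10)

print(with_area)
print(td.days, td.seconds, td.total_seconds())
```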
Thanks
- Aaron Toth
- Alan Du
- Alessandro Amici
- Artemy Kolchinsky
- Ashwini Chaudhary
- Ben Schiller
- Bill Letson
- Brandon Bradley
- Chau Hoang
- Chris Reynolds
- Chris Whelan
- Christer van der Meeren
- David Cottrell
- David Stephens
- Ehsan Azarnasab
- Garrett-R
- Guillaume Gay
- Jake Torcasso
- Jason Sexauer
- Jeff Reback
- John McNamara
- Joris Van den Bossche
- Joschka zur Jacobsmühlen
- Juarez Bochi
- Junya Hayashi
- K.-Michael Aye
- Kerby Shedden
- Kevin Sheppard
- Kieran O’Mahony
- Kodi Arfer
- Matti Airas
- Min RK
- Mortada Mehyar
- Robert
- Scott E Lasley
- Scott Lasley
- Sergio Pascual
- Skipper Seabold
- Stephan Hoyer
- Thomas Grainger
- Tom Augspurger
- TomAugspurger
- Vladimir Filimonov
- Vyomkesh Tripathi
- Will Holmgren
- Yulong Yang
- behzad nouri
- bertrandhaut
- bjonen
- cel4
- clham
- hsperr
- ischwabacher
- jnmclarty
- josham
- jreback
- omtinez
- roch
- sinhrks
- unutbu
pandas 0.15.2
Release date: (December 12, 2014)
This is a minor release from 0.15.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs.
See the v0.15.2 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.2.
Thanks
- Aaron Staple
- Angelos Evripiotis
- Artemy Kolchinsky
- Benoit Pointet
- Brian Jacobowski
- Charalampos Papaloizou
- Chris Warth
- David Stephens
- Fabio Zanini
- Francesc Via
- Henry Kleynhans
- Jake VanderPlas
- Jan Schulz
- Jeff Reback
- Jeff Tratner
- Joris Van den Bossche
- Kevin Sheppard
- Matt Suggit
- Matthew Brett
- Phillip Cloud
- Rupert Thompson
- Scott E Lasley
- Stephan Hoyer
- Stephen Simmons
- Sylvain Corlay
- Thomas Grainger
- Tiago Antao
- Trent Hauck
- Victor Chaves
- Victor Salgado
- Vikram Bhandoh
- WANG Aiyong
- Will Holmgren
- behzad nouri
- broessli
- charalampos papaloizou
- immerrr
- jnmclarty
- jreback
- mgilbert
- onesandzeroes
- peadarcoyle
- rockg
- seth-p
- sinhrks
- unutbu
- wavedatalab
- Åsmund Hjulstad
pandas 0.15.1
Release date: (November 9, 2014)
This is a minor release from 0.15.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
See the v0.15.1 Whatsnew overview for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.1.
Thanks
- Aaron Staple
- Andrew Rosenfeld
- Anton I. Sipos
- Artemy Kolchinsky
- Bill Letson
- Dave Hughes
- David Stephens
- Guillaume Horel
- Jeff Reback
- Joris Van den Bossche
- Kevin Sheppard
- Nick Stahl
- Sanghee Kim
- Stephan Hoyer
- TomAugspurger
- WANG Aiyong
- behzad nouri
- immerrr
- jnmclarty
- jreback
- pallav-fdsi
- unutbu
pandas 0.15.0
Release date: (October 18, 2014)
This is a major release from 0.14.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- Drop support for numpy < 1.7.0 (GH7711)
- The Categorical type was integrated as a first-class pandas type, see here
- New scalar type Timedelta, and a new index type TimedeltaIndex, see here
- New DataFrame default display for df.info() to include memory usage, see Memory Usage
- New datetimelike properties accessor .dt for Series, see Datetimelike Properties
- Split indexing documentation into Indexing and Selecting Data and MultiIndex / Advanced Indexing
- Split out string methods documentation into Working with Text Data
- read_csv will now by default ignore blank lines when parsing, see here
- API change in using Indexes in set operations, see here
- Internal refactoring of the Index class to no longer sub-class ndarray, see Internal Refactoring
- Dropping support for PyTables less than version 3.0.0, and numexpr less than version 2.1 (GH7990)
See the v0.15.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.15.0.
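As a rough illustration of three of the 0.15.0 highlights (the Categorical type, the Timedelta scalar with TimedeltaIndex, and the .dt accessor), here is a minimal sketch. It assumes a reasonably recent pandas; the variable names and sample values are made up for the example, and details may differ from how 0.15.0 itself behaved.

```python
import pandas as pd

# Categorical data: values are stored as integer codes into a set of categories.
grades = pd.Series(["b", "a", "c", "a"], dtype="category")
print(grades.cat.categories.tolist())  # the distinct categories
print(grades.cat.codes.tolist())       # one integer code per row

# Timedelta scalar and TimedeltaIndex.
delta = pd.Timedelta("1 days 2 hours")
steps = pd.to_timedelta(["1 days", "36 hours"])
print(delta.total_seconds())
print(type(steps).__name__)

# The .dt accessor exposes datetimelike properties on a Series.
stamps = pd.Series(pd.to_datetime(["2014-10-18", "2014-11-09"]))
print(stamps.dt.month.tolist())
```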
Thanks¶
- Aaron Schumacher
- Adam Greenhall
- Andy Hayden
- Anthony O’Brien
- Artemy Kolchinsky
- behzad nouri
- Benedikt Sauer
- benjamin
- Benjamin Thyreau
- Ben Schiller
- bjonen
- BorisVerk
- Chris Reynolds
- Chris Stoafer
- Dav Clark
- dlovell
- DSM
- dsm054
- FragLegs
- German Gomez-Herrero
- Hsiaoming Yang
- Huan Li
- hunterowens
- Hyungtae Kim
- immerrr
- Isaac Slavitt
- ischwabacher
- Jacob Schaer
- Jacob Wasserman
- Jan Schulz
- Jeff Tratner
- Jesse Farnham
- jmorris0x0
- jnmclarty
- Joe Bradish
- Joerg Rittinger
- John W. O’Brien
- Joris Van den Bossche
- jreback
- Kevin Sheppard
- klonuo
- Kyle Meyer
- lexual
- Max Chang
- mcjcode
- Michael Mueller
- Michael W Schatzow
- Mike Kelly
- Mortada Mehyar
- mtrbean
- Nathan Sanders
- Nathan Typanski
- onesandzeroes
- Paul Masurel
- Phillip Cloud
- Pietro Battiston
- RenzoBertocchi
- rockg
- Ross Petchler
- seth-p
- Shahul Hameed
- Shashank Agarwal
- sinhrks
- someben
- stahlous
- stas-sl
- Stephan Hoyer
- thatneat
- tom-alcorn
- TomAugspurger
- Tom Augspurger
- Tony Lorenzo
- unknown
- unutbu
- Wes Turner
- Wilfred Hughes
- Yevgeniy Grechka
- Yoshiki Vázquez Baeza
- zachcp
pandas 0.14.1¶
Release date: (July 11, 2014)
This is a minor release from 0.14.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- New methods select_dtypes() to select columns based on the dtype and sem() to calculate the standard error of the mean.
- Support for dateutil timezones (see docs).
- Support for ignoring full line comments in the read_csv() text parser.
- New documentation section on Options and Settings.
- Lots of bug fixes.
See the v0.14.1 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.14.1.
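The following sketch shows how the 0.14.1 highlights might be used together: select_dtypes(), sem(), and full-line comments in read_csv(). The column names and CSV text are invented for illustration, and the code assumes a recent pandas rather than 0.14.1 specifically.

```python
import io
import math

import pandas as pd

df = pd.DataFrame({
    "name": ["x", "y", "z", "w"],
    "count": [1, 2, 3, 4],
    "score": [2.0, 4.0, 4.0, 6.0],
})

# Keep only the numeric columns.
numeric = df.select_dtypes(include=["number"])
print(list(numeric.columns))

# Standard error of the mean: sample standard deviation / sqrt(n).
sem = df["score"].sem()
manual = df["score"].std(ddof=1) / math.sqrt(len(df))
print(sem, manual)

# Lines that consist only of a comment are skipped by the parser.
text = "# exported by a hypothetical tool\na,b\n1,2\n# a note between rows\n3,4\n"
parsed = pd.read_csv(io.StringIO(text), comment="#")
print(parsed)
```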
Thanks¶
- Andrew Rosenfeld
- Andy Hayden
- Benjamin Adams
- Benjamin M. Gross
- Brian Quistorff
- Brian Wignall
- bwignall
- clham
- Daniel Waeber
- David Bew
- David Stephens
- DSM
- dsm054
- helger
- immerrr
- Jacob Schaer
- jaimefrio
- Jan Schulz
- John David Reaver
- John W. O’Brien
- Joris Van den Bossche
- jreback
- Julien Danjou
- Kevin Sheppard
- K.-Michael Aye
- Kyle Meyer
- lexual
- Matthew Brett
- Matt Wittmann
- Michael Mueller
- Mortada Mehyar
- onesandzeroes
- Phillip Cloud
- Rob Levy
- rockg
- sanguineturtle
- Schaer, Jacob C
- seth-p
- sinhrks
- Stephan Hoyer
- Thomas Kluyver
- Todd Jennings
- TomAugspurger
- unknown
- yelite
pandas 0.14.0¶
Release date: (May 31, 2014)
This is a major release from 0.13.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes.
Highlights include:
- Officially support Python 3.4
- SQL interfaces updated to use sqlalchemy, see here.
- Display interface changes, see here
- MultiIndexing using Slicers, see here.
- Ability to join a singly-indexed DataFrame with a multi-indexed DataFrame, see here
- More consistency in groupby results and more flexible groupby specifications, see here
- Holiday calendars are now supported in CustomBusinessDay, see here
- Several improvements in plotting functions, including: hexbin, area and pie plots, see here.
- Performance doc section on I/O operations, see here
See the v0.14.0 Whatsnew overview or the issue tracker on GitHub for an extensive list of all API changes, enhancements and bugs that have been fixed in 0.14.0.
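To give a feel for the "MultiIndexing using Slicers" highlight, here is a small sketch using pd.IndexSlice with .loc. The frame and level names are hypothetical, and the example assumes a recent pandas and a lexsorted index.

```python
import pandas as pd

index = pd.MultiIndex.from_product([["a", "b"], [1, 2, 3]], names=["key", "num"])
df = pd.DataFrame({"value": list(range(6))}, index=index).sort_index()

# Select every "key", but only rows whose "num" level lies in 2..3 (inclusive).
idx = pd.IndexSlice
subset = df.loc[idx[:, 2:3], :]
print(subset)
```

Sorting the index first matters here, since slicing on a MultiIndex generally expects the levels to be lexsorted.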
Thanks¶
- Acanthostega
- Adam Marcus
- agijsberts
- akittredge
- Alex Gaudio
- Alex Rothberg
- AllenDowney
- Andrew Rosenfeld
- Andy Hayden
- ankostis
- anomrake
- Antoine Mazières
- anton-d
- bashtage
- Benedikt Sauer
- benjamin
- Brad Buran
- bwignall
- cgohlke
- chebee7i
- Christopher Whelan
- Clark Fitzgerald
- clham
- Dale Jung
- Dan Allan
- Dan Birken
- danielballan
- Daniel Waeber
- David Jung
- David Stephens
- Douglas McNeil
- DSM
- Garrett Drapala
- Gouthaman Balaraman
- Guillaume Poulin
- hshimizu77
- hugo
- immerrr
- ischwabacher
- Jacob Howard
- Jacob Schaer
- jaimefrio
- Jason Sexauer
- Jeff Reback
- Jeffrey Starr
- Jeff Tratner
- John David Reaver
- John McNamara
- John W. O’Brien
- Jonathan Chambers
- Joris Van den Bossche
- jreback
- jsexauer
- Julia Evans
- Júlio
- Katie Atkinson
- kdiether
- Kelsey Jordahl
- Kevin Sheppard
- K.-Michael Aye
- Matthias Kuhn
- Matt Wittmann
- Max Grender-Jones
- Michael E. Gruen
- michaelws
- mikebailey
- Mike Kelly
- Nipun Batra
- Noah Spies
- ojdo
- onesandzeroes
- Patrick O’Keeffe
- phaebz
- Phillip Cloud
- Pietro Battiston
- PKEuS
- Randy Carnevale
- ribonoous
- Robert Gibboni
- rockg
- sinhrks
- Skipper Seabold
- SplashDance
- Stephan Hoyer
- Tim Cera
- Tobias Brandt
- Todd Jennings
- TomAugspurger
- Tom Augspurger
- unutbu
- westurner
- Yaroslav Halchenko
- y-p
- zach powers
pandas 0.13.1¶
Release date: (February 3, 2014)
API Changes¶
- Series.sort will raise a ValueError (rather than a TypeError) on sorting an object that is a view of another (GH5856, GH5853)
- Raise/Warn SettingWithCopyError (according to the option chained_assignment) in more cases, when detecting chained assignment, related (GH5938, GH6025)
- DataFrame.head(0) returns self instead of empty frame (GH5846)
- autocorrelation_plot now accepts **kwargs. (GH5623)
- convert_objects now accepts a convert_timedeltas='coerce' argument to allow forced dtype conversion of timedeltas (GH5458, GH5689)
- Add -NaN and -nan to the default set of NA values (GH5952). See NA Values.
- NDFrame now has an equals method. (GH5283)
- DataFrame.apply will use the reduce argument to determine whether a Series or a DataFrame should be returned when the DataFrame is empty (GH6007).
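The equals method mentioned above differs from an element-wise comparison mainly in how it treats missing values. A minimal sketch, assuming a recent pandas and made-up data:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", "y"]})
right = left.copy()

# Element-wise comparison: NaN != NaN, so not every cell compares equal.
elementwise = bool((left == right).all().all())

# equals() treats NaNs in the same location as equal.
same = left.equals(right)

print(elementwise, same)
```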
Experimental Features¶
Improvements to existing features¶
- perf improvements in Series datetime/timedelta binary operations (GH5801)
- option_context context manager now available as top-level API (GH5752)
- df.info() view now display dtype info per column (GH5682)
- df.info() now honors option max_info_rows, disable null counts for large frames (GH5974)
- perf improvements in DataFrame count/dropna for axis=1
- Series.str.contains now has a regex=False keyword which can be faster for plain (non-regex) string patterns. (GH5879)
- support dtypes property on Series/Panel/Panel4D
- extend Panel.apply to allow arbitrary functions (rather than only ufuncs) (GH1148) allow multiple axes to be used to operate on slabs of a Panel
- The ArrayFormatter for datetime and timedelta64 now intelligently limit precision based on the values in the array (GH3401)
- pd.show_versions() is now available for convenience when reporting issues.
- perf improvements to Series.str.extract (GH5944)
- perf improvements in dtypes/ftypes methods (GH5968)
- perf improvements in indexing with object dtypes (GH5968)
- improved dtype inference for timedelta like passed to constructors (GH5458, GH5689)
- escape special characters when writing to latex (GH5374)
- perf improvements in DataFrame.apply (GH6013)
- pd.read_csv and pd.to_datetime learned a new infer_datetime_format keyword which greatly improves parsing perf in many cases. Thanks to @lexual for suggesting and @danbirken for rapidly implementing. (GH5490, GH6021)
- add ability to recognize ‘%p’ format code (am/pm) to date parsers when the specific format is supplied (GH5361)
- Fix performance regression in JSON IO (GH5765)
- performance regression in Index construction from Series (GH6150)
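Two of the improvements above are easy to try out: the regex=False keyword of Series.str.contains and the top-level option_context context manager. The sketch below assumes a recent pandas; the sample strings and the chosen option value are arbitrary.

```python
import pandas as pd

s = pd.Series(["a.b", "axb", "a.c"])

# As a regular expression, "." matches any character.
as_regex = s.str.contains("a.b")
# With regex=False the pattern is matched as a plain substring.
literal = s.str.contains("a.b", regex=False)
print(as_regex.tolist(), literal.tolist())

# option_context changes an option only inside the with-block.
before = pd.get_option("display.max_rows")
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")
after = pd.get_option("display.max_rows")
print(before, inside, after)
```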
Bug Fixes¶
- Bug in io.wb.get_countries not including all countries (GH6008)
- Bug in Series replace with timestamp dict (GH5797)
- read_csv/read_table now respects the prefix kwarg (GH5732).
- Bug in selection with missing values via .ix from a duplicate indexed DataFrame failing (GH5835)
- Fix issue of boolean comparison on empty DataFrames (GH5808)
- Bug in isnull handling NaT in an object array (GH5443)
- Bug in to_datetime when passed a np.nan or integer datelike and a format string (GH5863)
- Bug in groupby dtype conversion with datetimelike (GH5869)
- Regression in handling of empty Series as indexers to Series (GH5877)
- Bug in internal caching, related to (GH5727)
- Testing bug in reading JSON/msgpack from a non-filepath on windows under py3 (GH5874)
- Bug when assigning to .ix[tuple(...)] (GH5896)
- Bug in fully reindexing a Panel (GH5905)
- Bug in idxmin/max with object dtypes (GH5914)
- Bug in BusinessDay when adding n days to a date not on offset when n>5 and n%5==0 (GH5890)
- Bug in assigning to chained series with a series via ix (GH5928)
- Bug in creating an empty DataFrame, copying, then assigning (GH5932)
- Bug in DataFrame.tail with empty frame (GH5846)
- Bug in propagating metadata on resample (GH5862)
- Fixed string-representation of NaT to be “NaT” (GH5708)
- Fixed string-representation for Timestamp to show nanoseconds if present (GH5912)
- pd.match not returning passed sentinel
- Panel.to_frame() no longer fails when major_axis is a MultiIndex (GH5402).
- Bug in pd.read_msgpack with inferring a DateTimeIndex frequency incorrectly (GH5947)
- Fixed to_datetime for array with both Tz-aware datetimes and NaT’s (GH5961)
- Bug in rolling skew/kurtosis when passed a Series with bad data (GH5749)
- Bug in scipy interpolate methods with a datetime index (GH5975)
- Bug in NaT comparison if a mixed datetime/np.datetime64 with NaT were passed (GH5968)
- Fixed bug with pd.concat losing dtype information if all inputs are empty (GH5742)
- Recent changes in IPython cause warnings to be emitted when using previous versions of pandas in QTConsole, now fixed. If you’re using an older version and need to suppress the warnings, see (GH5922).
- Bug in merging timedelta dtypes (GH5695)
- Bug in plotting.scatter_matrix function. Wrong alignment among diagonal and off-diagonal plots, see (GH5497).
- Regression in Series with a multi-index via ix (GH6018)
- Bug in Series.xs with a multi-index (GH6018)
- Bug in Series construction of mixed type with datelike and an integer (which should result in object type and not automatic conversion) (GH6028)
- Possible segfault when chained indexing with an object array under numpy 1.7.1 (GH6026, GH6056)
- Bug in setting using fancy indexing a single element with a non-scalar (e.g. a list), (GH6043)
- to_sql did not respect if_exists (GH4110 GH4304)
- Regression in .get(None) indexing from 0.12 (GH5652)
- Subtle iloc indexing bug, surfaced in (GH6059)
- Bug with insert of strings into DatetimeIndex (GH5818)
- Fixed unicode bug in to_html/HTML repr (GH6098)
- Fixed missing arg validation in get_options_data (GH6105)
- Bug in assignment with duplicate columns in a frame where the locations are a slice (e.g. next to each other) (GH6120)
- Bug in propagating _ref_locs during construction of a DataFrame with dups index/columns (GH6121)
- Bug in DataFrame.apply when using mixed datelike reductions (GH6125)
- Bug in DataFrame.append when appending a row with different columns (GH6129)
- Bug in DataFrame construction with recarray and non-ns datetime dtype (GH6140)
- Bug in .loc setitem indexing with a dataframe on rhs, multiple item setting, and a datetimelike (GH6152)
- Fixed a bug in query/eval during lexicographic string comparisons (GH6155).
- Fixed a bug in query where the index of a single-element Series was being thrown away (GH6148).
- Bug in HDFStore on appending a dataframe with multi-indexed columns to an existing table (GH6167)
- Consistency with dtypes in setting an empty DataFrame (GH6171)
- Bug in selecting on a multi-index HDFStore even in the presence of under specified column spec (GH6169)
- Bug in nanops.var with ddof=1 and 1 elements would sometimes return inf rather than nan on some platforms (GH6136)
- Bug in Series and DataFrame bar plots ignoring the use_index keyword (GH6209)
- Bug in groupby with mixed str/int under python3 fixed; argsort was failing (GH6212)
pandas 0.13.0¶
Release date: January 3, 2014
New Features¶
- plot(kind='kde') now accepts the optional parameters bw_method and ind, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set the bandwidth, and to gkde.evaluate() to specify the indices at which it is evaluated, respectively. See scipy docs. (GH4298)
- Added isin method to DataFrame (GH4211)
- df.to_clipboard() learned a new excel keyword that lets you paste df data directly into excel (enabled by default). (GH5070).
- Clipboard functionality now works with PySide (GH4282)
- New extract string method returns regex matches more conveniently (GH4685)
- Auto-detect field widths in read_fwf when unspecified (GH4488)
- to_csv() now outputs datetime objects according to a specified format string via the date_format keyword (GH4313)
- Added LastWeekOfMonth DateOffset (GH4637)
- Added cumcount groupby method (GH4646)
- Added FY5253, and FY5253Quarter DateOffsets (GH4511)
- Added mode() method to Series and DataFrame to get the statistical mode(s) of a column/series. (GH5367)
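Several of the features listed above (DataFrame.isin, the cumcount groupby method, mode(), and the extract string method) can be sketched on a toy frame. The column names and values below are invented, and the code assumes a recent pandas, so return types may differ slightly from 0.13.0.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "y", "x", "x"],
    "code": ["a1", "b2", "a3", "b2"],
})

# DataFrame.isin with a dict checks each column against its own list of values.
mask = df.isin({"team": ["x"], "code": ["b2"]})

# cumcount numbers the rows within each group, starting at 0.
position = df.groupby("team").cumcount()

# mode() returns the most frequent value(s).
modes = df["code"].mode()

# extract returns one column per capture group in the regular expression.
parts = df["code"].str.extract(r"([a-z])(\d)")

print(mask, position, modes, parts, sep="\n\n")
```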
Experimental Features¶
- The new eval() function implements expression evaluation using numexpr behind the scenes. This results in large speedups for complicated expressions involving large DataFrames/Series.
- DataFrame has a new eval() that evaluates an expression in the context of the DataFrame; allows inline expression assignment
- A query() method has been added that allows you to select elements of a DataFrame using a natural query syntax nearly identical to Python syntax.
- pd.eval and friends now evaluate operations involving datetime64 objects in Python space because numexpr cannot handle NaT values (GH4897).
- Add msgpack support via pd.read_msgpack() and pd.to_msgpack() / df.to_msgpack() for serialization of arbitrary pandas (and python objects) in a lightweight portable binary format (GH686, GH5506)
- Added PySide support for the qtpandas DataFrameModel and DataFrameWidget.
- Added pandas.io.gbq for reading from (and writing to) Google BigQuery into a DataFrame. (GH4140)
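The eval() and query() methods described above can be illustrated with a short sketch. It assumes a recent pandas; as far as I know, pandas falls back to a pure-Python engine when numexpr is not installed, so the example should not depend on it. The frame is made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 3]})

# DataFrame.eval evaluates an expression using the columns as variables.
total = df.eval("a + b")

# DataFrame.query selects the rows for which the expression is true.
selected = df.query("a > b")

print(total.tolist())
print(selected)
```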
Improvements to existing features¶
- read_html now raises a URLError instead of catching and raising a ValueError (GH4303, GH4305)
- read_excel now supports an integer in its sheetname argument giving the index of the sheet to read in (GH4301).
- get_dummies works with NaN (GH4446)
- Added a test for read_clipboard() and to_clipboard() (GH4282)
- Added bins argument to value_counts (GH3945), also sort and ascending, now available in Series method as well as top-level function.
- Text parser now treats anything that reads like inf (“inf”, “Inf”, “-Inf”, “iNf”, etc.) as infinity. (GH4220, GH4219), affecting read_table, read_csv, etc.
- Added a more informative error message when plot arguments contain overlapping color and style arguments (GH4402)
- Significant table writing performance improvements in HDFStore
- JSON date serialization now performed in low-level C code.
- JSON support for encoding datetime.time
- Expanded JSON docs, more info about orient options and the use of the numpy param when decoding.
- Add drop_level argument to xs (GH4180)
- Can now resample a DataFrame with ohlc (GH2320)
- Index.copy() and MultiIndex.copy() now accept keyword arguments to change attributes (i.e., names, levels, labels) (GH4039)
- Add rename and set_names methods to Index as well as set_names, set_levels, set_labels to MultiIndex. (GH4039) with improved validation for all (GH4039, GH4794)
- A Series of dtype timedelta64[ns] can now be divided/multiplied by an integer series (GH4521)
- A Series of dtype timedelta64[ns] can now be divided by another timedelta64[ns] object to yield a float64 dtyped Series. This is frequency conversion; astyping is also supported.
- Timedelta64 support fillna/ffill/bfill with an integer interpreted as seconds, or a timedelta (GH3371)
- Box numeric ops on timedelta Series (GH4984)
- Datetime64 support ffill/bfill
- Performance improvements with __getitem__ on DataFrames when the key is a column
- Support for using a DatetimeIndex/PeriodsIndex directly in a datelike calculation e.g. s-s.index (GH4629)
- Better/cleaned up exceptions in core/common, io/excel and core/format (GH4721, GH3954), as well as cleaned up test cases in tests/test_frame, tests/test_multilevel (GH4732).
- Performance improvement of timeseries plotting with PeriodIndex and added test to vbench (GH4705 and GH4722)
- Add axis and level keywords to where, so that the other argument can now be an alignable pandas object.
- to_datetime with a format of ‘%Y%m%d’ now parses much faster
- It’s now easier to hook new Excel writers into pandas (just subclass ExcelWriter and register your engine). You can specify an engine in to_excel or in ExcelWriter. You can also specify which writers you want to use by default with config options io.excel.xlsx.writer and io.excel.xls.writer. (GH4745, GH4750)
- Panel.to_excel() now accepts keyword arguments that will be passed to its DataFrame’s to_excel() methods. (GH4750)
- Added XlsxWriter as an optional ExcelWriter engine. This is about 5x faster than the default openpyxl xlsx writer and is equivalent in speed to the xlwt xls writer module. (GH4542)
- allow DataFrame constructor to accept more list-like objects, e.g. list of collections.Sequence and array.Array objects (GH3783, GH4297, GH4851), thanks @lgautier
- DataFrame constructor now accepts a numpy masked record array (GH3478), thanks @jnothman
- __getitem__ with tuple key (e.g., [:, 2]) on Series without MultiIndex raises ValueError (GH4759, GH4837)
- read_json now raises a (more informative) ValueError when the dict contains a bad key and orient='split' (GH4730, GH4838)
- read_stata now accepts Stata 13 format (GH4291)
- ExcelWriter and ExcelFile can be used as contextmanagers. (GH3441, GH4933)
- pandas is now tested with two different versions of statsmodels (0.4.3 and 0.5.0) (GH4981).
- Better string representations of MultiIndex (including ability to roundtrip via repr). (GH3347, GH4935)
- Both ExcelFile and read_excel to accept an xlrd.Book for the io (formerly path_or_buf) argument; this requires engine to be set. (GH4961).
- concat now gives a more informative error message when passed objects that cannot be concatenated (GH4608).
- Add halflife option to exponentially weighted moving functions (PR GH4998)
- to_dict now takes records as a possible outtype. Returns an array of column-keyed dictionaries. (GH4936)
- tz_localize can infer a fall daylight savings transition based on the structure of unlocalized data (GH4230)
- DatetimeIndex is now in the API documentation
- Improve support for converting R datasets to pandas objects (more informative index for timeseries and numeric, support for factors, dist, and high-dimensional arrays).
- read_html() now supports the parse_dates, tupleize_cols and thousands parameters (GH4770).
- json_normalize() is a new method to allow you to create a flat table from semi-structured JSON data. See the docs (GH1067)
- DataFrame.from_records() will now accept generators (GH4910)
- DataFrame.interpolate() and Series.interpolate() have been expanded to include interpolation methods from scipy. (GH4434, GH1892)
- Series now supports a to_frame method to convert it to a single-column DataFrame (GH5164)
- DatetimeIndex (and date_range) can now be constructed in a left- or right-open fashion using the closed parameter (GH4579)
- Python csv parser now supports usecols (GH4335)
- Added support for Google Analytics v3 API segment IDs that also supports v2 IDs. (GH5271)
- NDFrame.drop() now accepts names as well as integers for the axis argument. (GH5354)
- Added short docstrings to a few methods that were missing them + fixed the docstrings for Panel flex methods. (GH5336)
- NDFrame.drop(), NDFrame.dropna(), and .drop_duplicates() all accept inplace as a keyword argument; however, this only means that the wrapper is updated inplace, a copy is still made internally. (GH1960, GH5247, GH5628, and related GH2325 [still not closed])
- Fixed bug in tools.plotting.andrews_curves so that lines are drawn grouped by color as expected.
- read_excel() now tries to convert integral floats (like 1.0) to int by default. (GH5394)
- Excel writers now have a default option merge_cells in to_excel() to merge cells in MultiIndex and Hierarchical Rows. Note: using this option it is no longer possible to round trip Excel files with merged MultiIndex and Hierarchical Rows. Set the merge_cells to False to restore the previous behaviour. (GH5254)
- The FRED DataReader now accepts multiple series (GH3413)
- StataWriter adjusts variable names to Stata’s limitations (GH5709)
API Changes¶
- DataFrame.reindex() and forward/backward filling now raises ValueError if either index is not monotonic (GH4483, GH4484).
- pandas now is Python 2/3 compatible without the need for 2to3 thanks to @jtratner. As a result, pandas now uses iterators more extensively. This also led to the introduction of substantive parts of Benjamin Peterson’s six library into compat. (GH4384, GH4375, GH4372)
- pandas.util.compat and pandas.util.py3compat have been merged into pandas.compat. pandas.compat now includes many functions allowing 2/3 compatibility. It contains both list and iterator versions of range, filter, map and zip, plus other necessary elements for Python 3 compatibility. lmap, lzip, lrange and lfilter all produce lists instead of iterators, for compatibility with numpy, subscripting and pandas constructors. (GH4384, GH4375, GH4372)
- deprecated iterkv, which will be removed in a future release (was just an alias of iteritems used to get around 2to3’s changes). (GH4384, GH4375, GH4372)
- Series.get with negative indexers now returns the same as [] (GH4390)
- allow ix/loc for Series/DataFrame/Panel to set on any axis even when the single-key is not currently contained in the index for that axis (GH2578, GH5226, GH5632, GH5720, GH5744, GH5756)
- Default export for to_clipboard is now csv with a sep of \t for compat (GH3368)
- at now will enlarge the object inplace (and return the same) (GH2578)
- DataFrame.plot will scatter plot x versus y by passing kind='scatter' (GH2215)
- HDFStore
  - append_to_multiple automatically synchronizes writing rows to multiple tables and adds a dropna kwarg (GH4698)
  - handle a passed Series in table format (GH4330)
  - added an is_open property to indicate if the underlying file handle is_open; a closed store will now report ‘CLOSED’ when viewing the store (rather than raising an error) (GH4409)
  - a close of a HDFStore now will close that instance of the HDFStore but will only close the actual file if the ref count (by PyTables) w.r.t. all of the open handles are 0. Essentially you have a local instance of HDFStore referenced by a variable. Once you close it, it will report closed. Other references (to the same file) will continue to operate until they themselves are closed. Performing an action on a closed file will raise ClosedFileError
  - removed the _quiet attribute, replace by a DuplicateWarning if retrieving duplicate rows from a table (GH4367)
  - removed the warn argument from open. Instead a PossibleDataLossError exception will be raised if you try to use mode='w' with an OPEN file handle (GH4367)
  - allow a passed locations array or mask as a where condition (GH4467)
  - add the keyword dropna=True to append to change whether ALL nan rows are not written to the store (default is True, ALL nan rows are NOT written), also settable via the option io.hdf.dropna_table (GH4625)
  - the format keyword now replaces the table keyword; allowed values are fixed(f)|table(t) the Storer format has been renamed to Fixed
  - a column multi-index will be recreated properly (GH4710); raise on trying to use a multi-index with data_columns on the same axis
  - select_as_coordinates will now return an Int64Index of the resultant selection set
  - support timedelta64[ns] as a serialization type (GH3577)
  - store datetime.date objects as ordinals rather than timetuples to avoid timezone issues (GH2852), thanks @tavistmorph and @numpand
  - numexpr 2.2.2 fixes incompatibility in PyTables 2.4 (GH4908)
  - flush now accepts an fsync parameter, which defaults to False (GH5364)
  - unicode indices not supported on table formats (GH5386)
  - pass thru store creation arguments; can be used to support in-memory stores
- JSON
- Index and MultiIndex changes (GH4039):
  - Setting levels and labels directly on MultiIndex is now deprecated. Instead, you can use the set_levels() and set_labels() methods.
  - levels, labels and names properties no longer return lists, but instead return containers that do not allow setting of items (‘mostly immutable’)
  - levels, labels and names are validated upon setting and are either copied or shallow-copied.
  - inplace setting of levels or labels now correctly invalidates the cached properties. (GH5238).
  - __deepcopy__ now returns a shallow copy (currently: a view) of the data - allowing metadata changes.
  - MultiIndex.astype() now only allows np.object_-like dtypes and now returns a MultiIndex rather than an Index. (GH4039)
  - Added is_ method to Index that allows fast equality comparison of views (similar to np.may_share_memory but no false positives, and changes on levels and labels setting on MultiIndex). (GH4859, GH4909)
  - Aliased __iadd__ to __add__. (GH4996)
- Infer and downcast dtype if downcast='infer' is passed to fillna/ffill/bfill (GH4604)
- __nonzero__ for all NDFrame objects, will now raise a ValueError, this reverts back to (GH1073, GH4633) behavior. Add .bool() method to NDFrame objects to facilitate evaluating of single-element boolean Series
- DataFrame.update() no longer raises a DataConflictError, it now will raise a ValueError instead (if necessary) (GH4732)
- Series.isin() and DataFrame.isin() now raise a TypeError when passed a string (GH4763). Pass a list of one element (containing the string) instead.
- Remove undocumented/unused kind keyword argument from read_excel, and ExcelFile. (GH4713, GH4712)
- The method argument of NDFrame.replace() is valid again, so that a list can be passed to to_replace (GH4743).
- provide automatic dtype conversions on _reduce operations (GH3371)
- exclude non-numerics if mixed types with datelike in _reduce operations (GH3371)
- default for tupleize_cols is now False for both to_csv and read_csv. Fair warning in 0.12 (GH3604)
- moved timedeltas support to pandas.tseries.timedeltas.py; add timedeltas string parsing, add top-level to_timedelta function
- NDFrame now is compatible with Python’s toplevel abs() function (GH4821).
- raise a TypeError on invalid comparison ops on Series/DataFrame (e.g. integer/datetime) (GH4968)
- Added a new index type, Float64Index. This will be automatically created when passing floating values in index creation. This enables a pure label-based slicing paradigm that makes [], ix, loc for scalar indexing and slicing work exactly the same. Indexing on other index types are preserved (and positional fallback for [], ix), with the exception, that floating point slicing on indexes on non Float64Index will raise a TypeError, e.g. Series(range(5))[3.5:4.5] (GH263, GH5375)
- Make Categorical repr nicer (GH4368)
- Remove deprecated Factor (GH3650)
- Remove deprecated set_printoptions/reset_printoptions (GH3046)
- Remove deprecated _verbose_info (GH3215)
- Begin removing methods that don’t make sense on GroupBy objects (GH4887).
- Remove deprecated read_clipboard/to_clipboard/ExcelFile/ExcelWriter from pandas.io.parsers (GH3717)
- All non-Index NDFrames (Series, DataFrame, Panel, Panel4D, SparsePanel, etc.), now support the entire set of arithmetic operators and arithmetic flex methods (add, sub, mul, etc.). SparsePanel does not support pow or mod with non-scalars. (GH3765)
- Arithmetic func factories are now passed real names (suitable for using with super) (GH5240)
- Provide numpy compatibility with 1.7 for a calling convention like np.prod(pandas_object) as numpy call with additional keyword args (GH4435)
- Provide __dir__ method (and local context) for tab completion / remove ipython completers code (GH4501)
- Support non-unique axes in a Panel via indexing operations (GH4960)
- .truncate will raise a ValueError if invalid before and afters dates are given (GH5242)
- Timestamp now supports now/today/utcnow class methods (GH5339)
- default for display.max_seq_len is now 100 rather than None. This activates truncated display (“...”) of long sequences in various places. (GH3391)
- All division with NDFrame-likes is now truedivision, regardless of the future import. You can use // and floordiv to do integer division.
In [3]: arr = np.array([1, 2, 3, 4])
In [4]: arr2 = np.array([5, 3, 2, 1])
In [5]: arr / arr2
Out[5]: array([0, 0, 1, 4])
In [6]: pd.Series(arr) / pd.Series(arr2) # no future import required
Out[6]:
0 0.200000
1 0.666667
2 1.500000
3 4.000000
dtype: float64
- raise/warn SettingWithCopyError/Warning exception/warning when setting of a copy thru chained assignment is detected, settable via option mode.chained_assignment
- test the list of NA values in the csv parser. add N/A, #NA as independent default na values (GH5521)
- The refactoring involving Series deriving from NDFrame breaks rpy2<=2.3.8. an Issue has been opened against rpy2 and a workaround is detailed in GH5698. Thanks @JanSchulz.
- Series.argmin and Series.argmax are now aliased to Series.idxmin and Series.idxmax. These return the index of the min or max element respectively. Prior to 0.13.0 these would return the position of the min / max element (GH6214)
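Three of the API changes above still shape everyday pandas code: truth-testing a Series raises ValueError, isin rejects a bare string, and / is always true division. The following sketch reuses the numbers from the division example and assumes a recent pandas; the flag variables are only there to record what happened.

```python
import pandas as pd

# Truth-testing a multi-element Series is ambiguous and raises ValueError.
flags = pd.Series([True, False])
try:
    bool(flags)
    ambiguous = False
except ValueError:
    ambiguous = True

# isin refuses a bare string; pass a one-element list instead.
letters = pd.Series(["a", "b"])
try:
    letters.isin("a")
    rejected = False
except TypeError:
    rejected = True
matches = letters.isin(["a"])

# "/" is true division; "//" performs integer (floor) division.
top = pd.Series([1, 2, 3, 4])
bottom = pd.Series([5, 3, 2, 1])
ratio = top / bottom
floor = top // bottom

print(ambiguous, rejected, matches.tolist())
print(ratio.tolist(), floor.tolist())
```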
Internal Refactoring¶
In 0.13.0 there is a major refactor primarily to subclass Series from
NDFrame, which is the base class currently for DataFrame and Panel,
to unify methods and behaviors. Series formerly subclassed directly from
ndarray. (GH4080, GH3862, GH816)
See Internal Refactoring
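One visible consequence of this refactor is that a Series is no longer an ndarray instance, although NumPy ufuncs still accept it. A minimal sketch, assuming a recent pandas and NumPy, with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0])

# A Series is not an ndarray subclass.
is_ndarray = isinstance(s, np.ndarray)

# NumPy ufuncs still work and hand back a Series.
root = np.sqrt(s)

# An explicit conversion gives the underlying values as an ndarray.
arr = np.asarray(s)

print(is_ndarray, type(root).__name__, type(arr).__name__)
```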
- Refactor of series.py/frame.py/panel.py to move common code to generic.py
  - added _setup_axes to created generic NDFrame structures
  - moved methods
    - from_axes, _wrap_array, axes, ix, loc, iloc, shape, empty, swapaxes, transpose, pop
    - __iter__, keys, __contains__, __len__, __neg__, __invert__
    - convert_objects, as_blocks, as_matrix, values
    - __getstate__, __setstate__ (compat remains in frame/panel)
    - __getattr__, __setattr__
    - _indexed_same, reindex_like, align, where, mask
    - fillna, replace (Series replace is now consistent with DataFrame)
    - filter (also added axis argument to selectively filter on a different axis)
    - reindex, reindex_axis, take
    - truncate (moved to become part of NDFrame)
    - isnull/notnull now available on NDFrame objects
- These are API changes which make Panel more consistent with DataFrame
  - swapaxes on a Panel with the same axes specified now return a copy
  - support attribute access for setting
  - filter supports same API as original DataFrame filter
  - fillna refactored to core/generic.py, while > 3ndim is NotImplemented
- Series now inherits from NDFrame rather than directly from ndarray. There are several minor changes that affect the API.
  - numpy functions that do not support the array interface will now return ndarrays rather than series, e.g. np.diff, np.ones_like, np.where
  - Series(0.5) would previously return the scalar 0.5, this is no longer supported
  - TimeSeries is now an alias for Series. the property is_time_series can be used to distinguish (if desired)
- Refactor of Sparse objects to use BlockManager
  - Created a new block type in internals, SparseBlock, which can hold multi-dtypes and is non-consolidatable.
  - SparseSeries and SparseDataFrame now inherit more methods from their hierarchy (Series/DataFrame), and no longer inherit from SparseArray (which instead is the object of the SparseBlock)
  - Sparse suite now supports integration with non-sparse data. Non-float sparse data is supportable (partially implemented)
  - Operations on sparse structures within DataFrames should preserve sparseness, merging type operations will convert to dense (and back to sparse), so might be somewhat inefficient
  - enable setitem on SparseSeries for boolean/integer/slices
  - SparsePanels implementation is unchanged (e.g. not using BlockManager, needs work)
- added ftypes method to Series/DataFrame, similar to dtypes, but indicates if the underlying is sparse/dense (as well as the dtype)
- All NDFrame objects now have a _prop_attributes, which can be used to indicate various values to propagate to a new object from an existing (e.g. name in Series will follow more automatically now)
- Internal type checking is now done via a suite of generated classes, allowing isinstance(value, klass) without having to directly import the klass, courtesy of @jtratner
- Bug in Series update where the parent frame is not updating its cache based on changes (GH4080, GH5216) or types (GH3217), fillna (GH3386)
- Indexing with dtype conversions fixed (GH4463, GH4204)
- Refactor Series.reindex to core/generic.py (GH4604, GH4618), allow method= in reindexing on a Series to work
- Series.copy no longer accepts the order parameter and is now consistent with NDFrame copy
- Refactor rename methods to core/generic.py; fixes Series.rename for (GH4605), and adds rename with the same signature for Panel
- Series (for index) / Panel (for items) now allow attribute access to its elements (GH1903)
- Refactor clip methods to core/generic.py (GH4798)
- Refactor of _get_numeric_data/_get_bool_data to core/generic.py, allowing Series/Panel functionality
- Refactor of Series arithmetic with time-like objects (datetime/timedelta/time etc.) into a separate, cleaned up wrapper class. (GH4613)
- Complex compat for Series with ndarray. (GH4819)
- Removed unnecessary rwproperty from codebase in favor of builtin property. (GH4843)
- Refactor object level numeric methods (mean/sum/min/max...) from object level modules to core/generic.py (GH4435).
- Refactor cum objects to core/generic.py (GH4435), note that these have a more numpy-like function signature.
- read_html() now uses TextParser to parse HTML data from bs4/lxml (GH4770).
- Removed the keep_internal keyword parameter in pandas/core/groupby.py because it wasn’t being used (GH5102).
- Base DateOffsets are no longer all instantiated on importing pandas, instead they are generated and cached on the fly. The internal representation and handling of DateOffsets has also been clarified. (GH5189, related GH5004)
- MultiIndex constructor now validates that passed levels and labels are compatible. (GH5213, GH5214)
- Unify dropna for Series/DataFrame signature (GH5250), tests from GH5234, courtesy of @rockg
- Rewrite assert_almost_equal() in cython for performance (GH4398)
- Added an internal _update_inplace method to facilitate updating NDFrame wrappers on inplace ops (only is for convenience of caller, doesn’t actually prevent copies). (GH5247)
Bug Fixes¶
- HDFStore
  - raising an invalid TypeError rather than ValueError when appending with a different block ordering (GH4096)
  - read_hdf was not respecting a passed mode (GH4504)
  - appending a 0-len table will work correctly (GH4273)
  - to_hdf was raising when passing both arguments append and table (GH4584)
  - reading from a store with duplicate columns across dtypes would raise (GH4767)
  - Fixed a bug where ValueError wasn’t correctly raised when column names weren’t strings (GH4956)
  - A zero length series written in Fixed format not deserializing properly. (GH4708)
  - Fixed decoding perf issue on py3 (GH5441)
  - Validate levels in a multi-index before storing (GH5527)
  - Correctly handle data_columns with a Panel (GH5717)
- Fixed bug in tslib.tz_convert(vals, tz1, tz2): it could raise IndexError exception while trying to access trans[pos + 1] (GH4496)
- The by argument now works correctly with the layout argument (GH4102, GH4014) in *.hist plotting methods
- Fixed bug in PeriodIndex.map where using str would return the str representation of the index (GH4136)
- Fixed test failure test_time_series_plot_color_with_empty_kwargs when using custom matplotlib default colors (GH4345)
- Fix running of stata IO tests. Now uses temporary files to write (GH4353)
- Fixed an issue where DataFrame.sum was slower than DataFrame.mean for integer valued frames (GH4365)
- read_html tests now work with Python 2.6 (GH4351)
- Fixed bug where network testing was throwing NameError because a local variable was undefined (GH4381)
- In to_json, raise if a passed orient would cause loss of data because of a duplicate index (GH4359)
- In to_json, fix date handling so milliseconds are the default timestamp as the docstring says (GH4362).
- as_index is no longer ignored when doing groupby apply (GH4648, GH3417)
- JSON NaT handling fixed, NaTs are now serialized to null (GH4498)
- Fixed JSON handling of escapable characters in JSON object keys (GH4593)
- Fixed passing keep_default_na=False when na_values=None (GH4318)
- Fixed bug with values raising an error on a DataFrame with duplicate columns and mixed dtypes, surfaced in (GH4377)
- Fixed bug with duplicate columns and type conversion in read_json when orient='split' (GH4377)
- Fixed JSON bug where locales with decimal separators other than ‘.’ threw exceptions when encoding / decoding certain values. (GH4918)
- Fix .iat indexing with a PeriodIndex (GH4390)
- Fixed an issue where PeriodIndex joining with self was returning a new instance rather than the same instance (GH4379); also adds a test for this for the other index types
- Fixed a bug with all the dtypes being converted to object when using the CSV cparser with the usecols parameter (GH3192)
- Fix an issue in merging blocks where the resulting DataFrame had partially set _ref_locs (GH4403)
- Fixed an issue where hist subplots were being overwritten when they were called using the top level matplotlib API (GH4408)
- Fixed a bug where calling Series.astype(str) would truncate the string (GH4405, GH4437)
- Fixed a py3 compat issue where bytes were being repr’d as tuples (GH4455)
- Fixed Panel attribute naming conflict if item is named ‘a’ (GH3440)
- Fixed an issue where duplicate indexes were raising when plotting (GH4486)
- Fixed an issue where cumsum and cumprod didn’t work with bool dtypes (GH4170, GH4440)
- Fixed Panel slicing issue in xs that was returning an incorrect dimmed object (GH4016)
- Fix resampling bug where custom reduce function not used if only one group (GH3849, GH4494)
- Fixed Panel assignment with a transposed frame (GH3830)
- Raise on set indexing with a Panel and a Panel as a value which needs alignment (GH3777)
- frozenset objects now raise in the Series constructor (GH4482, GH4480)
- Fixed issue with sorting a duplicate multi-index that has multiple dtypes (GH4516)
- Fixed bug in DataFrame.set_values which was causing name attributes to be lost when expanding the index. (GH3742, GH4039)
- Fixed issue where individual names, levels and labels could be set on MultiIndex without validation (GH3714, GH4039)
- Fixed (GH3334) in pivot_table. Margins did not compute if values is the index.
- Fix bug in having a rhs of np.timedelta64 or pd.offsets.DateOffset when operating with datetimes (GH4532)
- Fix arithmetic with series/datetimeindex and np.timedelta64 not working the same (GH4134) and buggy timedelta in numpy 1.6 (GH4135)
- Fix bug in pd.read_clipboard on windows with PY3 (GH4561); not decoding properly
- tslib.get_period_field() and tslib.get_period_field_arr() now raise if code argument out of range (GH4519, GH4520)
- Fix boolean indexing on an empty series loses index names (GH4235), infer_dtype works with empty arrays.
- Fix reindexing with multiple axes; if an axes match was not replacing the current axes, leading to a possible lazy frequency inference issue (GH3317)
- Fixed issue where DataFrame.apply was reraising exceptions incorrectly (causing the original stack trace to be truncated).
- Fix selection with ix/loc and non_unique selectors (GH4619)
- Fix assignment with iloc/loc involving a dtype change in an existing column (GH4312, GH5702) have internal setitem_with_indexer in core/indexing to use Block.setitem
- Fixed bug where thousands operator was not handled correctly for floating point numbers in csv_import (GH4322)
- Fix an issue with CacheableOffset not properly being used by many DateOffset; this prevented the DateOffset from being cached (GH4609)
- Fix boolean comparison with a DataFrame on the lhs, and a list/tuple on the rhs (GH4576)
- Fix error/dtype conversion with setitem of None on Series/DataFrame (GH4667)
- Fix decoding based on a passed in non-default encoding in pd.read_stata (GH4626)
- Fix DataFrame.from_records with a plain-vanilla ndarray. (GH4727)
- Fix some inconsistencies with Index.rename and MultiIndex.rename, etc. (GH4718, GH4628)
- Bug in using iloc/loc with a cross-sectional and duplicate indices (GH4726)
- Bug with using QUOTE_NONE with to_csv causing Exception. (GH4328)
- Bug with Series indexing not raising an error when the right-hand-side has an incorrect length (GH2702)
- Bug in multi-indexing with a partial string selection as one part of a MultiIndex (GH4758)
- Bug with reindexing on the index with a non-unique index will now raise ValueError (GH4746)
- Bug in setting with loc/ix a single indexer with a multi-index axis and a numpy array, related to (GH3777)
- Bug in concatenation with duplicate columns across dtypes not merging with axis=0 (GH4771, GH4975)
- Bug in iloc with a slice index failing (GH4771)
- Incorrect error message with no colspecs or width in read_fwf. (GH4774)
- Fix bugs in indexing in a Series with a duplicate index (GH4548, GH4550)
- Fixed bug with reading compressed files with read_fwf in Python 3. (GH3963)
- Fixed an issue with a duplicate index and assignment with a dtype change (GH4686)
- Fixed bug with reading compressed files in as bytes rather than str in Python 3. Simplifies bytes-producing file-handling in Python 3 (GH3963, GH4785).
- Fixed an issue related to ticklocs/ticklabels with log scale bar plots across different versions of matplotlib (GH4789)
- Suppressed DeprecationWarning associated with internal calls issued by repr() (GH4391)
- Fixed an issue with a duplicate index and duplicate selector with .loc (GH4825)
- Fixed an issue with DataFrame.sort_index where, when sorting by a single column and passing a list for ascending, the argument for ascending was being interpreted as True (GH4839, GH4846)
- Fixed Panel.tshift not working. Added freq support to Panel.shift (GH4853)
- Fix an issue in TextFileReader w/ Python engine (i.e. PythonParser) with thousands != "," (GH4596)
- Bug in getitem with a duplicate index when using where (GH4879)
- Fix Type inference code coerces float column into datetime (GH4601)
- Fixed _ensure_numeric does not check for complex numbers (GH4902)
- Fixed a bug in Series.hist where two figures were being created when the by argument was passed (GH4112, GH4113).
- Fixed a bug in convert_objects for > 2 ndims (GH4937)
- Fixed a bug in DataFrame/Panel cache insertion and subsequent indexing (GH4939, GH5424)
- Fixed string methods for FrozenNDArray and FrozenList (GH4929)
- Fixed a bug with setting invalid or out-of-range values in indexing enlargement scenarios (GH4940)
- Tests for fillna on empty Series (GH4346), thanks @immerrr
- Fixed copy() to shallow copy axes/indices as well and thereby keep separate metadata. (GH4202, GH4830)
- Fixed skiprows option in Python parser for read_csv (GH4382)
- Fixed bug preventing cut from working with np.inf levels without explicitly passing labels (GH3415)
- Fixed wrong check for overlapping in DatetimeIndex.union (GH4564)
- Fixed conflict between thousands separator and date parser in csv_parser (GH4678)
- Fix appending when dtypes are not the same (error showing mixing float/np.datetime64) (GH4993)
- Fix repr for DateOffset. No longer show duplicate entries in kwds. Removed unused offset fields. (GH4638)
- Fixed wrong index name during read_csv if using usecols. Applies to c parser only. (GH4201)
- Timestamp objects can now appear in the left hand side of a comparison operation with a Series or DataFrame object (GH4982).
- Fix a bug when indexing with np.nan via iloc/loc (GH5016)
- Fixed a bug where low memory c parser could create different types in different chunks of the same file. Now coerces to numerical type or raises warning. (GH3866)
- Fix a bug where reshaping a Series to its own shape raised TypeError (GH4554) and other reshaping issues.
- Bug in setting with ix/loc and a mixed int/string index (GH4544)
- Make sure series-series boolean comparisons are label based (GH4947)
- Bug in multi-level indexing with a Timestamp partial indexer (GH4294)
- Tests/fix for multi-index construction of an all-nan frame (GH4078)
- Fixed a bug where read_html() wasn’t correctly inferring values of tables with commas (GH5029)
- Fixed a bug where read_html() wasn’t providing a stable ordering of returned tables (GH4770, GH5029).
- Fixed a bug where read_html() was incorrectly parsing when passed index_col=0 (GH5066).
- Fixed a bug where read_html() was incorrectly inferring the type of headers (GH5048).
- Fixed a bug where DatetimeIndex joins with PeriodIndex caused a stack overflow (GH3899).
- Fixed a bug where groupby objects didn’t allow plots (GH5102).
- Fixed a bug where groupby objects weren’t tab-completing column names (GH5102).
- Fixed a bug where groupby.plot() and friends were duplicating figures multiple times (GH5102).
- Provide automatic conversion of object dtypes on fillna, related (GH5103)
- Fixed a bug where default options were being overwritten in the option parser cleaning (GH5121).
- Treat a list/ndarray identically for iloc indexing with list-like (GH5006)
- Fix MultiIndex.get_level_values() with missing values (GH5074)
- Fix bound checking for Timestamp() with datetime64 input (GH4065)
- Fix a bug where TestReadHtml wasn’t calling the correct read_html() function (GH5150).
- Fix a bug with NDFrame.replace() which made replacement appear as though it was (incorrectly) using regular expressions (GH5143).
- Fix better error message for to_datetime (GH4928)
- Made sure different locales are tested on travis-ci (GH4918). Also adds a couple of utilities for getting locales and setting locales with a context manager.
- Fixed segfault on isnull(MultiIndex) (now raises an error instead) (GH5123, GH5125)
- Allow duplicate indices when performing operations that align (GH5185, GH5639)
- Compound dtypes in a constructor raise NotImplementedError (GH5191)
- Bug in comparing duplicate frames (GH4421) related
- Bug in describe on duplicate frames
- Bug in to_datetime with a format and coerce=True not raising (GH5195)
- Bug in loc setting with multiple indexers and a rhs of a Series that needs broadcasting (GH5206)
- Fixed bug where inplace setting of levels or labels on MultiIndex would not clear cached values property and therefore return wrong values. (GH5215)
- Fixed bug where filtering a grouped DataFrame or Series did not maintain the original ordering (GH4621).
- Fixed Period with a business date freq to always roll-forward if on a non-business date. (GH5203)
- Fixed bug in Excel writers where frames with duplicate column names weren’t written correctly. (GH5235)
- Fixed issue with drop and a non-unique index on Series (GH5248)
- Fixed seg fault in C parser caused by passing more names than columns in the file. (GH5156)
- Fix Series.isin with date/time-like dtypes (GH5021)
- C and Python Parser can now handle the more common multi-index column format which doesn’t have a row for index names (GH4702)
- Bug when trying to use an out-of-bounds date as an object dtype (GH5312)
- Bug when trying to display an embedded PandasObject (GH5324)
- Allows operating of Timestamps to return a datetime if the result is out-of-bounds related (GH5312)
- Fix return value/type signature of initObjToJSON() to be compatible with numpy’s import_array() (GH5334, GH5326)
- Bug when renaming then set_index on a DataFrame (GH5344)
- Test suite no longer leaves around temporary files when testing graphics. (GH5347) (thanks for catching this @yarikoptic!)
- Fixed html tests on win32. (GH4580)
- Make sure that head/tail are iloc based, (GH5370)
- Fixed bug for PeriodIndex string representation if there are 1 or 2 elements. (GH5372)
- The GroupBy methods transform and filter can be used on Series and DataFrames that have repeated (non-unique) indices. (GH4620)
- Fix empty series not printing name in repr (GH4651)
- Make tests create temp files in temp directory by default. (GH5419)
- pd.to_timedelta of a scalar returns a scalar (GH5410)
- pd.to_timedelta accepts NaN and NaT, returning NaT instead of raising (GH5437)
- performance improvements in isnull on larger size pandas objects
- Fixed various setitem with 1d ndarray that does not have a matching length to the indexer (GH5508)
- Bug in getitem with a multi-index and iloc (GH5528)
- Bug in delitem on a Series (GH5542)
- Bug fix in apply when using custom function and objects are not mutated (GH5545)
- Bug in selecting from a non-unique index with loc (GH5553)
- Bug in groupby returning non-consistent types when user function returns a None, (GH5592)
- Work around regression in numpy 1.7.0 which erroneously raises IndexError from ndarray.item (GH5666)
- Bug in repeated indexing of object with resultant non-unique index (GH5678)
- Bug in fillna with Series and a passed series/dict (GH5703)
- Bug in groupby transform with a datetime-like grouper (GH5712)
- Bug in multi-index selection in PY3 when using certain keys (GH5725)
- Row-wise concat of differing dtypes failing in certain cases (GH5754)
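The two pd.to_timedelta fixes listed above (GH5410, GH5437) can be illustrated with a short sketch. It assumes a reasonably recent pandas; the variable names are illustrative only and do not come from the release notes.

```python
import numpy as np
import pandas as pd

# A scalar input yields a scalar Timedelta rather than a container.
one_day = pd.to_timedelta(1, unit="D")
is_scalar = isinstance(one_day, pd.Timedelta)

# Missing inputs come back as NaT instead of raising.
from_nan = pd.to_timedelta(np.nan)
from_nat = pd.to_timedelta(pd.NaT)

print(one_day, is_scalar, from_nan, from_nat)
```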
pandas 0.12.0¶
Release date: 2013-07-24
New Features¶
- pd.read_html() can now parse HTML strings, files or urls and returns a list of DataFrames courtesy of @cpcloud. (GH3477, GH3605, GH3606)
- Support for reading Amazon S3 files. (GH3504)
- Added module for reading and writing JSON strings/files: pandas.io.json includes to_json DataFrame/Series method, and a read_json top-level reader various issues (GH1226, GH3804, GH3876, GH3867, GH1305)
- Added module for reading and writing Stata files: pandas.io.stata (GH1512) includes to_stata DataFrame method, and a read_stata top-level reader
- Added support for writing in to_csv and reading in read_csv, multi-index columns. The header option in read_csv now accepts a list of the rows from which to read the index. Added the option, tupleize_cols to provide compatibility for the pre 0.12 behavior of writing and reading multi-index columns via a list of tuples. The default in 0.12 is to write lists of tuples and not interpret list of tuples as a multi-index column. Note: The default value will change in 0.13 to make the default to write and read multi-index columns in the new format. (GH3571, GH1651, GH3141)
- Add iterator to Series.str (GH3638)
- pd.set_option() now allows N option, value pairs (GH3667).
- Added keyword parameters for different types of scatter_matrix subplots
- A filter method on grouped Series or DataFrames returns a subset of the original (GH3680, GH919)
- Access to historical Google Finance data in pandas.io.data (GH3814)
- DataFrame plotting methods can sample column colors from a Matplotlib colormap via the colormap keyword. (GH3860)
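As a rough illustration of two of the features listed above, the sketch below uses the groupby filter method and a to_json / read_json round trip. The frame contents and variable names are made up for the example, and the calls are written against a current pandas, where read_json is given a file-like object rather than a raw string.

```python
import io

import pandas as pd

# Hypothetical data: three groups of sizes 2, 1 and 3.
df = pd.DataFrame(
    {"key": ["a", "a", "b", "c", "c", "c"], "value": [1, 2, 3, 4, 5, 6]}
)

# filter keeps only the rows of groups for which the function returns True,
# so the result is a subset of the original frame.
kept = df.groupby("key").filter(lambda group: len(group) >= 2)

# Serialize to a JSON string and read it back with the top-level reader.
text = df.to_json(orient="records")
restored = pd.read_json(io.StringIO(text), orient="records")

print(kept)
print(restored)
```

The filtered frame keeps the original index labels of the surviving rows, which makes it easy to relate the subset back to the source data.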
Improvements to existing features¶
- Fixed various issues with internal pprinting code, the repr() for various objects including TimeStamp and Index now produces valid python code strings and can be used to recreate the object, (GH3038, GH3379, GH3251, GH3460)
- convert_objects now accepts a copy parameter (defaults to True)
- HDFStore
  - will retain index attributes (freq, tz, name) on recreation (GH3499, GH4098)
  - will warn with a AttributeConflictWarning if you are attempting to append an index with a different frequency than the existing, or attempting to append an index with a different name than the existing
  - support datelike columns with a timezone as data_columns (GH2852)
  - table writing performance improvements.
  - support python3 (via PyTables 3.0.0) (GH3750)
- Add modulo operator to Series, DataFrame
- Add date method to DatetimeIndex
- Add dropna argument to pivot_table (GH3820)
- Simplified the API and added a describe method to Categorical
- melt now accepts the optional parameters var_name and value_name to specify custom column names of the returned DataFrame (GH3649), thanks @hoechenberger. If var_name is not specified and dataframe.columns.name is not None, then this will be used as the var_name (GH4144). Also support for MultiIndex columns.
- clipboard functions use pyperclip (no dependencies on Windows, alternative dependencies offered for Linux) (GH3837).
- Plotting functions now raise a TypeError before trying to plot anything if the associated objects have a dtype of object (GH1818, GH3572, GH3911, GH3912), but they will try to convert object arrays to numeric arrays if possible so that you can still plot, for example, an object array with floats. This happens before any drawing takes place which eliminates any spurious plots from showing up.
- Added Faq section on repr display options, to help users customize their setup.
- where operations that result in block splitting are much faster (GH3733)
- Series and DataFrame hist methods now take a figsize argument (GH3834)
- DatetimeIndexes no longer try to convert mixed-integer indexes during join operations (GH3877)
- Add unit keyword to Timestamp and to_datetime to enable passing of integers or floats that are in an epoch unit of D, s, ms, us, ns, thanks @mtkini (GH3969) (e.g. unix timestamps or epochs, with fractional seconds allowed) (GH3540)
- DataFrame corr method (spearman) is now cythonized.
- Improved network test decorator to catch IOError (and therefore URLError as well). Added with_connectivity_check decorator to allow explicitly checking a website as a proxy for seeing if there is network connectivity. Plus, new optional_args decorator factory for decorators. (GH3910, GH3914)
- read_csv will now throw a more informative error message when a file contains no columns, e.g., all newline characters
- Added layout keyword to DataFrame.hist() for more customizable layout (GH4050)
- Timestamp.min and Timestamp.max now represent valid Timestamp instances instead of the default datetime.min and datetime.max (respectively), thanks @SleepingPills
- read_html now raises when no tables are found and BeautifulSoup==4.2.0 is detected (GH4214)
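Two of the improvements above lend themselves to a small example: the var_name / value_name parameters of melt and the unit keyword of to_datetime. This is only a sketch with invented data, run against a current pandas.

```python
import pandas as pd

# Hypothetical wide-format frame with one identifier column.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# var_name and value_name replace the default "variable" / "value" labels.
long = pd.melt(wide, id_vars=["id"], var_name="measure", value_name="reading")

# An integer epoch value is interpreted in the given unit (seconds here).
stamp = pd.to_datetime(1500000000, unit="s")

print(long)
print(stamp)
```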
API Changes¶
- HDFStore
  - When removing an object, remove(key) raises KeyError if the key is not a valid store object.
  - raise a TypeError on passing where or columns to select with a Storer; these are invalid parameters at this time (GH4189)
  - can now specify an encoding option to append/put to enable alternate encodings (GH3750)
  - enable support for iterator/chunksize with read_hdf
- The repr() for (Multi)Index now obeys display.max_seq_items rather than numpy threshold print options. (GH3426, GH3466)
- Added mangle_dupe_cols option to read_table/csv, allowing users to control legacy behaviour re dupe cols (A, A.1, A.2 vs A, A ) (GH3468) Note: The default value will change in 0.12 to the “no mangle” behaviour. If your code relies on this behaviour, explicitly specify mangle_dupe_cols=True in your calls.
- Do not allow astypes on datetime64[ns] except to object, and timedelta64[ns] to object/int (GH3425)
- The behavior of datetime64 dtypes has changed with respect to certain so-called reduction operations (GH3726). The following operations now raise a TypeError when performed on a Series and return an empty Series when performed on a DataFrame similar to performing these operations on, for example, a DataFrame of slice objects:
  - sum, prod, mean, std, var, skew, kurt, corr, and cov
- Do not allow datetimelike/timedeltalike creation except with valid types (e.g. cannot pass datetime64[ms]) (GH3423)
- Add squeeze keyword to groupby to allow reduction from DataFrame -> Series if groups are unique. Regression from 0.10.1, partial revert on (GH2893) with (GH3596)
- Raise on iloc when boolean indexing with a label based indexer mask e.g. a boolean Series, even with integer labels, will raise. Since iloc is purely positional based, the labels on the Series are not alignable (GH3631)
- The raise_on_error option to plotting methods is obviated by GH3572, so it is removed. Plots now always raise when data cannot be plotted or the object being plotted has a dtype of object.
- DataFrame.interpolate() is now deprecated. Please use DataFrame.fillna() and DataFrame.replace() instead (GH3582, GH3675, GH3676).
- the method and axis arguments of DataFrame.replace() are deprecated
- DataFrame.replace’s infer_types parameter is removed and now performs conversion by default. (GH3907)
- Deprecated display.height, display.width is now only a formatting option does not control triggering of summary, similar to < 0.11.0.
- Add the keyword allow_duplicates to DataFrame.insert to allow a duplicate column to be inserted if True, default is False (same as prior to 0.12) (GH3679)
- io API changes
  - added pandas.io.api for i/o imports
  - moved Excel support to pandas.io.excel
  - added top-level pd.read_sql and to_sql DataFrame methods
  - moved clipboard support to pandas.io.clipboard
  - replace top-level and instance methods save and load with top-level read_pickle and to_pickle instance method, save and load will give deprecation warning.
- set FutureWarning to require data_source, and to replace year/month with expiry date in pandas.io options. This is in preparation to add options data from Google (GH3822)
- Implement __nonzero__ for NDFrame objects (GH3691, GH3696)
- as_matrix with mixed signed and unsigned dtypes will result in 2 x the lcd of the unsigned as an int, maxing with int64, to avoid precision issues (GH3733)
- na_values in a list provided to read_csv/read_excel will match string and numeric versions e.g. na_values=['99'] will match 99 whether the column ends up being int, float, or string (GH3611)
- read_html now defaults to None when reading, and falls back on bs4 + html5lib when lxml fails to parse. A list of parsers to try until success is also valid
- more consistency in the to_datetime return types (given string/array of string inputs) (GH3888)
- The internal pandas class hierarchy has changed (slightly). The previous PandasObject now is called PandasContainer and a new PandasObject has become the baseclass for PandasContainer as well as Index, Categorical, GroupBy, SparseList, and SparseArray (+ their base classes). Currently, PandasObject provides string methods (from StringMixin). (GH4090, GH4092)
- New StringMixin that, given a __unicode__ method, gets python 2 and python 3 compatible string methods (__str__, __bytes__, and __repr__). Plus string safety throughout. Now employed in many places throughout the pandas library. (GH4090, GH4092)
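The io API change that replaces save and load with to_pickle and read_pickle can be sketched as follows. The file name and frame contents are placeholders chosen for the example, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame to persist.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl")  # placeholder file name
    df.to_pickle(path)                     # instance method replacing save
    restored = pd.read_pickle(path)        # top-level function replacing load

print(restored.equals(df))
```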
Experimental Features¶
- Added experimental CustomBusinessDay class to support DateOffsets with custom holiday calendars and custom weekmasks. (GH2301)
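A minimal sketch of how a custom weekmask and holiday list might be combined with CustomBusinessDay is shown below. The calendar itself (a Sunday to Thursday work week with a single holiday) is an assumption made up for the example, not something prescribed by the release notes.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Assumed calendar: Sunday-Thursday work week with one holiday on 2013-05-01.
offset = CustomBusinessDay(
    weekmask="Sun Mon Tue Wed Thu",
    holidays=["2013-05-01"],
)

start = pd.Timestamp("2013-04-30")  # a Tuesday
first = start + offset              # skips the holiday on Wednesday
second = first + offset             # skips Friday and Saturday

print(first, second)
```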
Bug Fixes¶
- Fixed an esoteric excel reading bug, xlrd>= 0.9.0 now required for excel support. Should provide python3 support (for reading) which has been lacking. (GH3164)
- Disallow Series constructor called with MultiIndex which caused segfault (GH4187)
- Allow unioning of date ranges sharing a timezone (GH3491)
- Fix to_csv issue when having a large number of rows and NaT in some columns (GH3437)
- .loc was not raising when passed an integer list (GH3449)
- Unordered time series selection was misbehaving when using label slicing (GH3448)
- Fix sorting in a frame with a list of columns which contains datetime64[ns] dtypes (GH3461)
- DataFrames fetched via FRED now handle ‘.’ as a NaN. (GH3469)
- Fix regression in a DataFrame apply with axis=1, objects were not being converted back to base dtypes correctly (GH3480)
- Fix issue when storing uint dtypes in an HDFStore. (GH3493)
- Non-unique index support clarified (GH3468)
- Addressed handling of dupe columns in df.to_csv new and old (GH3454, GH3457)
- Fix assigning a new index to a duplicate index in a DataFrame would fail (GH3468)
- Fix construction of a DataFrame with a duplicate index
- ref_locs support to allow duplicative indices across dtypes, allows iget support to always find the index (even across dtypes) (GH2194)
- applymap on a DataFrame with a non-unique index now works (removed warning) (GH2786), and fix (GH3230)
- Fix to_csv to handle non-unique columns (GH3495)
- Duplicate indexes with getitem will return items in the correct order (GH3455, GH3457) and handle missing elements like unique indices (GH3561)
- Duplicate indexes with an empty DataFrame.from_records will return a correct frame (GH3562)
- Concat to produce a non-unique columns when duplicates are across dtypes is fixed (GH3602)
- Non-unique indexing with a slice via loc and friends fixed (GH3659)
- Allow insert/delete to non-unique columns (GH3679)
- Extend reindex to correctly deal with non-unique indices (GH3679)
- DataFrame.itertuples() now works with frames with duplicate column names (GH3873)
- Bug in non-unique indexing via iloc (GH4017); added takeable argument to reindex for location-based taking
- Allow non-unique indexing in series via .ix/.loc and __getitem__ (GH4246)
- Fixed non-unique indexing memory allocation issue with .ix/.loc (GH4280)
- Fixed bug in groupby with empty series referencing a variable before assignment. (GH3510)
- Allow index name to be used in groupby for non MultiIndex (GH4014)
- Fixed bug in mixed-frame assignment with aligned series (GH3492)
- Fixed bug in selecting month/quarter/year from a series would not select the time element on the last day (GH3546)
- Fixed a couple of MultiIndex rendering bugs in df.to_html() (GH3547, GH3553)
- Properly convert np.datetime64 objects in a Series (GH3416)
- Raise a TypeError on invalid datetime/timedelta operations e.g. add datetimes, multiply timedelta x datetime
- Fix .diff on datelike and timedelta operations (GH3100)
- combine_first not returning the same dtype in cases where it can (GH3552)
- Fixed bug with Panel.transpose argument aliases (GH3556)
- Fixed platform bug in PeriodIndex.take (GH3579)
- Fixed bug in incorrect conversion of datetime64[ns] in combine_first (GH3593)
- Fixed bug in reset_index with NaN in a multi-index (GH3586)
- fillna methods now raise a TypeError when the value parameter is a list or tuple.
- Fixed bug where a time-series was being selected in preference to an actual column name in a frame (GH3594)
- Make secondary_y work properly for bar plots (GH3598)
- Fix modulo and integer division on Series, DataFrames to act similarly to float dtypes to return np.nan or np.inf as appropriate (GH3590)
- Fix incorrect dtype on groupby with as_index=False (GH3610)
- Fix read_csv/read_excel to correctly encode identical na_values, e.g. na_values=[-999.0,-999] was failing (GH3611)
- Disable HTML output in qtconsole again. (GH3657)
- Reworked the new repr display logic, which users found confusing. (GH3663)
- Fix indexing issue in ndim >= 3 with iloc (GH3617)
- Correctly parse date columns with embedded (nan/NaT) into datetime64[ns] dtype in read_csv when parse_dates is specified (GH3062)
- Fix not consolidating before to_csv (GH3624)
- Fix alignment issue when setitem in a DataFrame with a piece of a DataFrame (GH3626) or a mixed DataFrame and a Series (GH3668)
- Fix plotting of unordered DatetimeIndex (GH3601)
- sql.write_frame failing when writing a single column to sqlite (GH3628), thanks to @stonebig
- Fix pivoting with nan in the index (GH3558)
- Fix running of bs4 tests when it is not installed (GH3605)
- Fix parsing of html table (GH3606)
- read_html() now only allows a single backend: html5lib (GH3616)
- convert_objects with convert_dates='coerce' was parsing some single-letter strings into today’s date
- DataFrame.from_records did not accept empty recarrays (GH3682)
- DataFrame.to_csv will succeed with the deprecated option nanRep, @tdsmith
- DataFrame.to_html and DataFrame.to_latex now accept a path for their first argument (GH3702)
- Fix file tokenization error with \r delimiter and quoted fields (GH3453)
- Groupby transform with item-by-item not upcasting correctly (GH3740)
- Incorrectly read a HDFStore multi-index Frame with a column specification (GH3748)
read_htmlnow correctly skips tests (GH3741)- PandasObjects raise TypeError when trying to hash (GH3882)
- Fix incorrect arguments passed to concat that are not list-like (e.g. concat(df1,df2)) (GH3481)
- Correctly parse when passed the
dtype=str(or other variable-len string dtypes) inread_csv(GH3795) - Fix index name not propagating when using
loc/ix(GH3880) - Fix groupby when applying a custom function resulting in a returned DataFrame was not converting dtypes (GH3911)
- Fixed a bug where DataFrame.replace with a compiled regular expression in the to_replace argument wasn’t working (GH3907)
- Fixed __truediv__ in Python 2.7 with numexpr installed to actually do true division when dividing two integer arrays with at least 10000 cells total (GH3764)
- Indexing with a string with seconds resolution not selecting from a time index (GH3925)
- csv parsers would loop infinitely if iterator=True but no chunksize was specified (GH3967), python parser failing with chunksize=1
- Fix index name not propagating when using shift
- Fixed dropna=False being ignored with multi-index stack (GH3997)
- Fixed flattening of columns when renaming MultiIndex columns DataFrame (GH4004)
- Fix Series.clip for datetime series. NA/NaN threshold values will now throw ValueError (GH3996)
- Fixed insertion issue into DataFrame, after rename (GH4032)
- Fixed testing issue where too many sockets were open, leading to a connection reset issue (GH3982, GH3985, GH4028, GH4054)
- Fixed failing tests in test_yahoo, test_google where symbols were not retrieved but were being accessed (GH3982, GH3985, GH4028, GH4054)
- Series.hist will now take the figure from the current environment if one is not passed
- Fixed bug where a 1xN DataFrame would barf on a 1xN mask (GH4071)
- Fixed running of tox under python3 where the pickle import was getting rewritten in an incompatible way (GH4062, GH4063)
- Fixed bug where sharex and sharey were not being passed to grouped_hist (GH4089)
- Fix bug where HDFStore will fail to append because of a different block ordering on-disk (GH4096)
- Better error messages on inserting incompatible columns to a frame (GH4107)
- Fixed bug in DataFrame.replace where a nested dict wasn’t being iterated over when regex=False (GH4115)
- Fixed bug in convert_objects(convert_numeric=True) where a mixed numeric and object Series/Frame was not converting properly (GH4119)
- Fixed bugs in multi-index selection with column multi-index and duplicates (GH4145, GH4146)
- Fixed bug in the parsing of microseconds when using the format argument in to_datetime (GH4152)
- Fixed bug in PandasAutoDateLocator where invert_xaxis incorrectly triggered MilliSecondLocator (GH3990)
- Fixed bug in Series.where where broadcasting a single element input vector to the length of the series resulted in multiplying the value inside the input (GH4192)
- Fixed bug in plotting that wasn’t raising on invalid colormap for matplotlib 1.1.1 (GH4215)
- Fixed the legend displaying in DataFrame.plot(kind='kde') (GH4216)
- Fixed bug where Index slices weren’t carrying the name attribute (GH4226)
- Fixed bug in initializing DatetimeIndex with an array of strings in a certain time zone (GH4229)
- Fixed bug where html5lib wasn’t being properly skipped (GH4265)
- Fixed bug where get_data_famafrench wasn’t using the correct file edges (GH4281)
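
A few of the fixes above can be checked with short snippets. The following is a minimal sketch, assuming a current pandas installation rather than 0.11.1 itself; the sample data and variable names are illustrative and are not taken from the original issue reports.

```python
import io

import pandas as pd

# GH3611: the same sentinel passed as both a float and an int in na_values.
csv_text = "a,b\n1,-999\n-999.0,4\n"
frame = pd.read_csv(io.StringIO(csv_text), na_values=[-999.0, -999])
missing_count = int(frame.isna().sum().sum())

# GH4152: a format string containing %f should keep the microseconds.
stamp = pd.to_datetime(
    "2013-07-04 12:30:15.123456", format="%Y-%m-%d %H:%M:%S.%f"
)

# GH4226: a slice of a named Index should keep the name attribute.
index = pd.Index([1, 2, 3], name="foo")
sliced_name = index[:2].name

print(missing_count, stamp.microsecond, sliced_name)
```

Both spellings of the sentinel are expected to be read as missing values, the parsed timestamp should retain its microsecond component, and the sliced index should still be named "foo".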
pandas 0.11.0¶
Release date: 2013-04-22
New Features¶
- New documentation section, 10 Minutes to Pandas
- New documentation section, Cookbook
- Allow mixed dtypes (e.g. float32/float64/int32/int16/int8) to coexist in DataFrames and propagate in operations
- Add function to pandas.io.data for retrieving stock index components from Yahoo! finance (GH2795)
- Support slicing with time objects (GH2681)
- Added .iloc attribute, to support strict integer based indexing, analogous to .ix (GH2922)
- Added .loc attribute, to support strict label based indexing, analogous to .ix (GH3053)
- Added .iat attribute, to support fast scalar access via integers (replaces iget_value/iset_value)
- Added .at attribute, to support fast scalar access via labels (replaces get_value/set_value)
- Moved functionality from irow, icol, iget_value/iset_value to .iloc indexer (via _ixs methods in each object)
- Added support for expression evaluation using the numexpr library
- Added convert=boolean to take routines to translate negative indices to positive, defaults to True
- Added to_series() method to indices, to facilitate the creation of indexers (GH3275)
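
The new indexers and the mixed-dtype support listed above can be illustrated as follows. This is a minimal sketch, assuming a current pandas and NumPy; the frames, labels, and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.5, 2.5, 3.5]},
    index=["a", "b", "c"],
)

# Strict integer-position and strict label-based row selection.
row_by_position = df.iloc[1]
row_by_label = df.loc["b"]

# Fast scalar access by integer position and by label.
scalar_by_position = df.iat[2, 0]
scalar_by_label = df.at["c", "y"]

# to_series() turns the index into a Series, which helps when building indexers.
labels = df.index.to_series()

# Columns with different numeric dtypes coexist and propagate through operations.
mixed = pd.DataFrame(
    {
        "f32": np.array([1.0, 2.0], dtype="float32"),
        "i8": np.array([1, 2], dtype="int8"),
    }
)
summed = mixed + mixed

print(scalar_by_position, scalar_by_label)
print(summed.dtypes)
```

Here .iloc and .iat address data purely by position, while .loc and .at address it purely by label, so neither pair falls back to the other interpretation the way .ix could.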
Improvements to existing features¶
- Improved performance of df.to_csv() by up to 10x in some cases. (GH3059)
- added blocks attribute to DataFrames, to return a dict of dtypes to homogeneously dtyped DataFrames
- added keyword convert_numeric to convert_objects() to try to convert object dtypes to numeric types (default is False)
- convert_dates in convert_objects can now be coerce which will return a datetime64[ns] dtype with non-convertibles set as NaT; will preserve an all-nan object (e.g. strings), default is True (to perform soft-conversion)
- Series print output now includes the dtype by default
- describe_option() now reports the default and current value of options.
- Add format option to pandas.to_datetime with faster conversion of strings that can be parsed with datetime.strptime
- Add axes property to Series for compatibility
- Add xs function to Series for compatibility
- Allow setitem in a frame where only mixed numerics are present (e.g. int and float), (GH3037)
- HDFStore
- Add squeeze method to possibly remove length 1 dimensions from an object.

In [1]: p = pd.Panel(np.random.randn(3,4,4),items=['ItemA','ItemB','ItemC'],
   ...:              major_axis=pd.date_range('20010102',periods=4),
   ...:              minor_axis=['A','B','C','D'])
   ...:

In [2]: p
Out[2]:
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 4 (major_axis) x 4 (minor_axis)
Items axis: ItemA to ItemC
Major_axis axis: 2001-01-02 00:00:00 to 2001-01-05 00:00:00
Minor_axis axis: A to D

In [3]: p.reindex(items=['ItemA']).squeeze()