User Guide
The User Guide covers all of pandas by topic area. Each of the subsections introduces a topic (such as “working with missing data”), and discusses how pandas approaches the problem, with many examples throughout.
Users brand-new to pandas should start with 10 minutes to pandas.
For a high-level summary of the pandas fundamentals, see Intro to data structures and Essential basic functionality.
Further information on any specific method can be obtained in the API reference.
How to read these guides
In these guides you will see input code inside code blocks such as:
import pandas as pd
pd.DataFrame({'A': [1, 2, 3]})
or:
In [1]: import pandas as pd
In [2]: pd.DataFrame({'A': [1, 2, 3]})
Out[2]:
   A
0  1
1  2
2  3
The first block is standard Python input, while in the second the In [1]: prompt indicates that the input is inside a notebook. In Jupyter Notebooks the value of the last line is displayed and plots are shown inline.
For example:
In [3]: a = 1
In [4]: a
Out[4]: 1
is equivalent to:
a = 1
print(a)
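The same holds for the DataFrame shown earlier. As a minimal sketch (the variable name df is illustrative and does not come from the guides), a plain script has to print the frame explicitly to show the table that a notebook displays as Out[2]:

```python
import pandas as pd

# Build the same one-column frame used in the blocks above.
df = pd.DataFrame({'A': [1, 2, 3]})

# A notebook would echo a bare `df` on the last line of a cell;
# a script shows nothing unless the value is printed explicitly.
print(df)
```

When copying an In [n]: block from these guides into a script, wrap the final expression in print() to see its result.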
Guides
- 10 minutes to pandas
- Intro to data structures
- Essential basic functionality
- IO tools (text, CSV, HDF5, …)
  - CSV & text files
  - JSON
  - HTML
  - LaTeX
  - XML
  - Excel files
  - OpenDocument Spreadsheets
  - Binary Excel (.xlsb) files
  - Calamine (Excel and ODS files)
  - Clipboard
  - Pickling
  - msgpack
  - HDF5 (PyTables)
  - Feather
  - Parquet
  - ORC
  - SQL queries
  - Google BigQuery
  - Stata format
  - SAS formats
  - SPSS formats
  - Other file formats
  - Performance considerations
- PyArrow Functionality
- Indexing and selecting data
  - Different choices for indexing
  - Basics
  - Attribute access
  - Slicing ranges
  - Selection by label
  - Selection by position
  - Selection by callable
  - Combining positional and label-based indexing
  - Selecting random samples
  - Setting with enlargement
  - Fast scalar value getting and setting
  - Boolean indexing
  - Indexing with isin
  - The where() Method and Masking
  - Setting with enlargement conditionally using numpy()
  - The query() Method
  - Duplicate data
  - Dictionary-like get() method
  - Looking up values by index/column labels
  - Index objects
  - Set / reset index
  - Returning a view versus a copy
- MultiIndex / advanced indexing
- Copy-on-Write (CoW)
- Merge, join, concatenate and compare
- Reshaping and pivot tables
- Working with text data
- Working with missing data
- Duplicate Labels
- Categorical data
- Nullable integer data type
- Nullable Boolean data type
- Chart visualization
- Table Visualization
  - Styler Object and Customising the Display
  - Formatting the Display
  - Styler Object and HTML
  - Methods to Add Styles
  - Table Styles
  - Setting Classes and Linking to External CSS
  - Styler Functions
  - Tooltips and Captions
  - Finer Control with Slicing
  - Optimization
  - Builtin Styles
  - Sharing styles
  - Limitations
  - Other Fun and Useful Stuff
  - Export to Excel
  - Export to LaTeX
  - More About CSS and HTML
  - Extensibility
- Group by: split-apply-combine
- Windowing operations
- Time series / date functionality
  - Overview
  - Timestamps vs. time spans
  - Converting to timestamps
  - Generating ranges of timestamps
  - Timestamp limitations
  - Indexing
  - Time/date components
  - DateOffset objects
  - Time Series-related instance methods
  - Resampling
  - Time span representation
  - Converting between representations
  - Representing out-of-bounds spans
  - Time zone handling
- Time deltas
- Options and settings
- Enhancing performance
- Scaling to large datasets
- Sparse data structures
- Frequently Asked Questions (FAQ)
- Cookbook