I want to start using pandas
In : import pandas as pd
To load the pandas package and start working with it, import the
package. The community-agreed alias for pandas is pd, so loading
pandas as pd is assumed standard practice for all of the pandas
documentation.
I want to store passenger data of the Titanic. For a number of passengers, I know the name (characters), age (integers) and sex (male/female) data.
In : df = pd.DataFrame(
...:     {
...:         "Name": [
...:             "Braund, Mr. Owen Harris",
...:             "Allen, Mr. William Henry",
...:             "Bonnell, Miss. Elizabeth",
...:         ],
...:         "Age": [22, 35, 58],
...:         "Sex": ["male", "male", "female"],
...:     }
...: )
In : df
                       Name  Age     Sex
0   Braund, Mr. Owen Harris   22    male
1  Allen, Mr. William Henry   35    male
2  Bonnell, Miss. Elizabeth   58  female
To manually store data in a table, create a DataFrame. When using a Python dictionary of lists, the dictionary keys will be used as column headers and
the values in each list as columns of the DataFrame.
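As a minimal sketch of how the dictionary maps onto the table: each key becomes a column label and each list becomes that column's values, so all lists need the same length. The variable name `data` and the unequal-length check are illustrative additions, not part of the tutorial.

```python
import pandas as pd

# Same passengers as above, held in a plain dictionary first.
data = {
    "Name": [
        "Braund, Mr. Owen Harris",
        "Allen, Mr. William Henry",
        "Bonnell, Miss. Elizabeth",
    ],
    "Age": [22, 35, 58],
    "Sex": ["male", "male", "female"],
}
df = pd.DataFrame(data)

# Dictionary keys become the column labels, in insertion order.
column_labels = list(df.columns)
# Each list becomes one column, so the row count equals the list length.
n_rows, n_cols = df.shape
print(column_labels)
print(n_rows, n_cols)

# Lists of unequal length cannot form a rectangular table.
try:
    pd.DataFrame({"Age": [22, 35, 58], "Sex": ["male", "male"]})
except ValueError as err:
    print("ValueError:", err)
```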
A DataFrame is a 2-dimensional data structure that can store data of
different types (including characters, integers, floating point values,
categorical data and more) in columns. It is similar to a spreadsheet, a
SQL table or the data.frame in R.
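To illustrate that each column can hold its own data type, here is a small sketch. The table `mixed`, the placeholder names and the `Fare` values are hypothetical and only serve the illustration; they are not taken from the tutorial's data.

```python
import pandas as pd

# Hypothetical table: names and Fare values are made up for illustration.
mixed = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C"],  # text
        "Age": [22, 35, 58],                                    # integers
        "Fare": [7.25, 8.05, 26.55],                            # floating point
        "Sex": pd.Categorical(["male", "male", "female"]),      # categorical
    }
)

# Each column carries its own data type.
print(mixed.dtypes)

age_is_int = pd.api.types.is_integer_dtype(mixed["Age"])
fare_is_float = pd.api.types.is_float_dtype(mixed["Fare"])
sex_is_categorical = isinstance(mixed["Sex"].dtype, pd.CategoricalDtype)
```

The exact name printed for the text column's data type may differ between pandas versions, which is why the sketch only checks the numeric and categorical columns.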
The table has 3 columns, each of them with a column label. The column
labels are respectively Name, Age and Sex.
The column Name consists of textual data with each value a
string, the column Age holds numbers, and the column Sex is
textual data.
In spreadsheet software, the table representation of our data would look
very similar.
I’m just interested in working with the data in the column Age
In : df["Age"]
0    22
1    35
2    58
Name: Age, dtype: int64
When selecting a single column of a pandas DataFrame, the result is
a pandas Series. To select the column, use the column label
between square brackets [].
If you are familiar with Python
dictionaries, selecting a
single column is very similar to selecting dictionary values based on
the key.
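The parallel with dictionaries can be sketched as follows. The smaller table and the nonexistent label "Height" are assumptions chosen only to keep the example short.

```python
import pandas as pd

# A plain dictionary of lists and a DataFrame built from it.
data = {"Age": [22, 35, 58], "Sex": ["male", "male", "female"]}
df = pd.DataFrame(data)

ages_from_dict = data["Age"]  # a plain Python list
ages_from_df = df["Age"]      # a pandas Series

print(type(ages_from_dict).__name__)
print(type(ages_from_df).__name__)

# Both hold the same values; the Series adds row labels and a name.
same_values = ages_from_df.tolist() == ages_from_dict

# A label that does not exist raises KeyError, just like a missing dict key.
try:
    df["Height"]
except KeyError:
    print("no column labelled 'Height'")
```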
You can create a Series from scratch as well:
In : ages = pd.Series([22, 35, 58], name="Age")
In : ages
0    22
1    35
2    58
Name: Age, dtype: int64
A pandas Series has no column labels, as it is just a single column
of a DataFrame. A Series does have row labels.
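A short sketch of what "row labels" means in practice. The explicit labels "a", "b" and "c" are made up for the illustration; the tutorial itself only uses the default labels.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

# Without an explicit index, the row labels default to 0, 1, 2, ...
row_labels = list(ages.index)
print(row_labels)
print(ages.name)

# Row labels can also be set explicitly (labels here are illustrative).
labelled = pd.Series([22, 35, 58], index=["a", "b", "c"], name="Age")
age_b = labelled["b"]
print(age_b)

# A Series has an index but no `columns` attribute.
has_columns = hasattr(ages, "columns")
print(has_columns)
```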
I want to know the maximum Age of the passengers
We can do this on the DataFrame by selecting the Age column and
applying max():
In : df["Age"].max()
Or on the Series:
In : ages.max()
As illustrated by the max() method, you can perform operations on a
DataFrame or Series. pandas provides a lot of functionality,
each piece of it a method you can apply to a DataFrame or Series.
As methods are functions, do not forget to use parentheses ().
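For instance, a few other methods can be applied in the same way as max(). This is only a small, illustrative selection; the variable names are chosen for the example.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

oldest = ages.max()
youngest = ages.min()
middle = ages.median()
total = ages.sum()
print(oldest, youngest, middle, total)

# Without parentheses you get the method object itself, not a result.
not_called = ages.max
print(callable(not_called))
```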
I’m interested in some basic statistics of the numerical data of my data table
In : df.describe()
The describe() method provides a quick overview of the numerical data in
a DataFrame. As the Name and Sex columns are textual data,
these are by default not taken into account by the describe() method.
Many pandas operations return a DataFrame or a Series. The
describe() method is an example of a pandas operation returning a
pandas Series or a pandas DataFrame.
Check more options on describe in the user guide section about aggregations with describe.
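As a sketch of the return types: calling describe() on the DataFrame gives a DataFrame, while calling it on a single column gives a Series. The variable names `summary_df` and `summary_series` are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

summary_df = df.describe()              # DataFrame in, DataFrame out
summary_series = df["Age"].describe()   # Series in, Series out

# Only the numerical column appears in the DataFrame summary.
summary_columns = list(summary_df.columns)
print(summary_columns)

# Individual statistics are selected by their row label.
mean_age = summary_df.loc["mean", "Age"]
max_age = summary_series["max"]
print(mean_age, max_age)
```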
This is just a starting point. Similar to spreadsheet
software, pandas represents data as a table with columns and rows. Beyond
the representation, pandas also supports the data manipulations and
calculations you would do in spreadsheet software. Continue
reading the next tutorials to get started!
Import the package, aka import pandas as pd
A table of data is stored as a pandas DataFrame
Each column in a DataFrame is a Series
You can do things by applying a method to a DataFrame or Series
A more extended explanation of DataFrame and Series is provided in the introduction to data structures.