Cookbook¶
This is a repository for short and sweet examples and links for useful pandas recipes. We encourage users to add to this documentation.
Adding interesting links and/or inline examples to this section is a great First Pull Request.
Simplified, condensed, new-user friendly, in-line examples have been inserted where possible to augment the Stack-Overflow and GitHub links. Many of the links contain expanded information, above what the in-line examples offer.
Pandas (pd) and Numpy (np) are the only two abbreviated imported modules. The rest are kept explicitly imported for newer users.
These examples are written for python 3.4. Minor tweaks might be necessary for earlier python versions.
Idioms¶
These are some neat pandas idioms
if-then/if-then-else on one column, and assignment to another one or more columns:
In [1]: df = pd.DataFrame(
   ...:      {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
   ...: 
Out[1]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
if-then...¶
An if-then on one column
In [2]: df.loc[df.AAA >= 5,'BBB'] = -1; df
Out[2]: 
   AAA  BBB  CCC
0    4   10  100
1    5   -1   50
2    6   -1  -30
3    7   -1  -50
An if-then with assignment to 2 columns:
In [3]: df.loc[df.AAA >= 5,['BBB','CCC']] = 555; df
Out[3]: 
   AAA  BBB  CCC
0    4   10  100
1    5  555  555
2    6  555  555
3    7  555  555
Add another line with different logic, to do the -else
In [4]: df.loc[df.AAA < 5,['BBB','CCC']] = 2000; df
Out[4]: 
   AAA   BBB   CCC
0    4  2000  2000
1    5   555   555
2    6   555   555
3    7   555   555
Or use pandas where after you’ve set up a mask
In [5]: df_mask = pd.DataFrame({'AAA' : [True] * 4, 'BBB' : [False] * 4,'CCC' : [True,False] * 2})
In [6]: df.where(df_mask,-1000)
Out[6]: 
   AAA   BBB   CCC
0    4 -1000  2000
1    5 -1000 -1000
2    6 -1000   555
3    7 -1000 -1000
if-then-else using numpy’s where()
In [7]: df = pd.DataFrame(
   ...:      {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df
   ...: 
Out[7]: 
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
In [8]: df['logic'] = np.where(df['AAA'] > 5,'high','low'); df