Pivot Table#
# general syntax
DataFrame.pivot_table(self, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False)
Parameters:
index
Row-level grouping
columns
Column-level grouping
values
Numeric values that will be summarized
aggfunc
Summary statistic calculation
Let’s start with a DataFrame
.
# import statements
import numpy as np
import pandas as pd
# create df
df = pd.DataFrame({"P": ["f1", "f1", "f1", "f1", "f1",
"b1", "b1", "b1", "b1"],
"Q": ["one", "one", "one", "two", "two",
"one", "one", "two", "two"],
"R": ["small", "large", "large", "small",
"small", "large", "small", "small",
"large"],
"S": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"T": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
# show df
df
Example #1#
In our first example…
Values in the
S
column will be summarized using Numpy’ssum
functionValues in the
P
&Q
columns will be the row indicesValues in the
R
column will be the column indexValues in the
T
column will be dropped
# pivot table
pd.pivot_table(df, values='S', index=['P', 'Q'],
columns=['R'], aggfunc=np.sum)
# assign to new df
dfPivotTable = pd.pivot_table(df, values='S', index=['P', 'Q'],
columns=['R'], aggfunc=np.sum)
# show new df
dfPivotTable
Example #2#
.pivot_table()
has a few additional parameters that can be helpful, depending on what type of summary statistic you’re using.
fill_value
Value to replace missing values with
dropna
Ignores columns with
NaN
values
# fill value example
pd.pivot_table(df, values='S', index=['P', 'Q'],
columns=['R'], aggfunc=np.sum, fill_value=0)
# dropna example
pd.pivot_table(df, values='S', index=['P', 'Q'],
columns=['R'], aggfunc=np.sum, dropna=True)