Pivot Table#
# general syntax
DataFrame.pivot_table(self, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False)
Parameters:
indexRow-level grouping
columnsColumn-level grouping
valuesNumeric values that will be summarized
aggfuncSummary statistic calculation
Let’s start with a DataFrame.
# import statements
import numpy as np
import pandas as pd
# create df
df = pd.DataFrame({"P": ["f1", "f1", "f1", "f1", "f1",
"b1", "b1", "b1", "b1"],
"Q": ["one", "one", "one", "two", "two",
"one", "one", "two", "two"],
"R": ["small", "large", "large", "small",
"small", "large", "small", "small",
"large"],
"S": [1, 2, 2, 3, 3, 4, 5, 6, 7],
"T": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
# show df
df
Example #1#
In our first example…
Values in the
Scolumn will be summarized using Numpy’ssumfunctionValues in the
P&Qcolumns will be the row indicesValues in the
Rcolumn will be the column indexValues in the
Tcolumn will be dropped
# pivot table
pd.pivot_table(df, values='S', index=['P', 'Q'],
columns=['R'], aggfunc=np.sum)
# assign to new df
dfPivotTable = pd.pivot_table(df, values='S', index=['P', 'Q'],
columns=['R'], aggfunc=np.sum)
# show new df
dfPivotTable
Example #2#
.pivot_table() has a few additional parameters that can be helpful, depending on what type of summary statistic you’re using.
fill_valueValue to replace missing values with
dropnaIgnores columns with
NaNvalues
# fill value example
pd.pivot_table(df, values='S', index=['P', 'Q'],
columns=['R'], aggfunc=np.sum, fill_value=0)
# dropna example
pd.pivot_table(df, values='S', index=['P', 'Q'],
columns=['R'], aggfunc=np.sum, dropna=True)