Pie Charts


Pie charts don’t work as smoothly with DataFrames as other plot types do. A pie chart displays one-dimensional data (a single set of values, each with a label, like a pandas Series), while a DataFrame is a two-dimensional structure of rows and columns.
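To see the mismatch concretely, the sketch below compares the ndim attribute of a small made-up DataFrame (named demo here for illustration) with that of one of its columns. As far as we know, pandas refuses to run DataFrame.plot.pie() unless you pass either y= (one column) or subplots=True, and raises a ValueError instead. The sketch catches that error rather than drawing anything.

```python
import pandas as pd

# Small made-up DataFrame: 3 labelled rows, 2 columns
demo = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [3.0, 2.0, 1.0]},
    index=["a", "b", "c"],
)

print(demo.ndim)       # 2: rows and columns
print(demo["x"].ndim)  # 1: a single column is a labelled Series

# Without y= or subplots=True, pandas cannot tell which column to draw
try:
    demo.plot.pie()
    error_raised = False
except ValueError as err:
    error_raised = True
    print("ValueError:", err)
```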

One option is to generate a separate pie chart (a subplot) for each column by passing subplots=True.

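import numpy as np  # numerical arrays and random numbers
import pandas as pd  # DataFrames and their plotting methods

df = pd.DataFrame(3 * np.random.rand(4, 2), index=["a", "b", "c", "d"], columns=["x", "y"])  # 4 rows x 2 columns of random values in [0, 3)
df.plot.pie(subplots=True, figsize=(8, 4))  # one pie per column, returned as an array of Axes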
array([<Axes: ylabel='x'>, <Axes: ylabel='y'>], dtype=object)
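With subplots=True each column becomes its own pie, so a wedge represents that row’s share of its column’s total (by default, matplotlib sizes wedges as value divided by the sum, as far as we know for recent versions). A minimal sketch of the same shares computed directly, assuming fixed made-up values in place of the random ones so the result is reproducible:

```python
import pandas as pd

# Fixed made-up values instead of random ones
demo = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 2.0, 4.0, 2.0]},
    index=["a", "b", "c", "d"],
)

# Divide each column by its own total (alignment is on column names),
# so every column sums to 1 and each entry is one wedge's fraction of its pie
shares = demo / demo.sum()
print(shares)
```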

The other option is to select or aggregate a subset of the data so that you end up with a one-dimensional structure (a Series).
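The simplest subset is a single column. A minimal sketch on a small made-up frame (demo): selecting one column yields a Series that plot.pie() accepts directly, and df.plot.pie(y=...) is the DataFrame-side way to name that column. The Agg backend line is only there so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import pandas as pd

demo = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "y": [4.0, 3.0, 2.0, 1.0]},
    index=["a", "b", "c", "d"],
)

s = demo["x"]        # one column -> one-dimensional Series
ax1 = s.plot.pie()   # one pie; wedge labels come from the index a-d

# DataFrame-side equivalent: name the column to plot with y=
ax2 = demo.plot.pie(y="x")
```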

Let’s go back to our air quality data.

df = pd.read_csv('https://raw.githubusercontent.com/kwaldenphd/more-with-matplotlib/main/data/air_quality_no2.csv', index_col=0, parse_dates=True) # load data
df.head() # inspect df
                     station_antwerp  station_paris  station_london
datetime
2019-05-07 02:00:00              NaN            NaN            23.0
2019-05-07 03:00:00             50.5           25.0            19.0
2019-05-07 04:00:00             45.0           27.7            19.0
2019-05-07 05:00:00              NaN           50.4            16.0
2019-05-07 06:00:00              NaN           61.9             NaN

Let’s say we want to know what proportion of observations come from each station. We can reshape the data with melt, which stacks the three station columns into a single variable column, and then aggregate that column with value_counts to get a one-dimensional structure.
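To see what melt produces before running it on the full dataset, here is a minimal sketch on a tiny made-up frame (the station names and values are invented for illustration). Each cell becomes one row, the original column name goes into variable, and the datetime index is not carried over.

```python
import pandas as pd

# Tiny made-up frame: two "stations", two timestamps, one missing reading
wide = pd.DataFrame(
    {"station_a": [1.0, 2.0], "station_b": [3.0, None]},
    index=pd.to_datetime(["2019-05-07 02:00", "2019-05-07 03:00"]),
)

# melt stacks the columns in order: one row per cell, column name in
# "variable", reading in "value" (missing readings stay as NaN rows)
long = wide.melt()
print(long)
```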

df2 = df.melt()  # long format: one row per (station, value) pair; the datetime index is dropped
df3 = df2['variable'].value_counts()  # rows per station (rows with NaN values are counted too)
df3 # inspect output
station_antwerp    1035
station_paris      1035
station_london     1035
Name: variable, dtype: int64
df3.plot.pie() # generate pie chart
<Axes: ylabel='variable'>
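All three stations report the same total because melt keeps a row for every cell, including missing readings, so value_counts on variable counts rows rather than actual measurements. A minimal sketch of the difference, using values copied from the first four rows of the df.head() output above (sample is a hypothetical name): dropping NaN values first, or calling count() on the wide frame, gives per-station counts of real observations, and normalize=True turns them into the proportions a pie chart shows.

```python
import numpy as np
import pandas as pd

# First four rows of the air quality head() output
sample = pd.DataFrame(
    {
        "station_antwerp": [np.nan, 50.5, 45.0, np.nan],
        "station_paris": [np.nan, 25.0, 27.7, 50.4],
        "station_london": [23.0, 19.0, 19.0, 16.0],
    }
)

long_df = sample.melt()

# Counts every row, including NaN readings: 4 per station
all_rows = long_df["variable"].value_counts()

# Counts only actual measurements
observed = long_df.dropna(subset=["value"])["variable"].value_counts()

# Same non-missing counts straight from the wide frame
per_column = sample.count()

# Share of all measurements coming from each station
share = long_df.dropna(subset=["value"])["variable"].value_counts(normalize=True)
print(observed)
print(share)
```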

Additional Resources

For more on pie plots: