Case Study: South Bend City Budget Data#
Each year, South Bend’s Common Council passes an annual budget. Let’s curate a dataset that combines ammended budget info for all available years.
Getting Data#
API Calls#
import pandas as pd, json, requests, matplotlib.pyplot as plt # import statements
# budget API urls
urls = {
2018: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2018_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
2019: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2019_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
2020: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2020_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
2021: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2021_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
2022: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2022_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
2023: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2023_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson"
}
# function to get data
def budget(url):
r = requests.get(url) # get data
d = r.json() # store json object
data = [] # empty list for data
for f in d['features']: # iterate over list of dictionaries
data.append(f['properties']) # isolate value and append to list
df = pd.DataFrame(data) # create df
return df
data = [] # empty list for dataframes
for i in list(urls.values()): # iterate over urls
df = budget(i) # function call
data.append(df) # append to list of dfs
combined = pd.concat(data) # concatenate dataframes
combined # show output
Data Wrangling#
Some data wrangling that might be helpful
# relabel columns
combined.columns = ['year', 'fund', 'department', 'division', 'subdivision', 'accountCategory',
'accountSubcategory', 'description', 'type', 'account', 'amendedBudget',
'objectID', 'adoptedBudget', 'objectID2', 'objectID3']
combined['amendedBudget'] = combined['amendedBudget'].fillna(combined.pop('adoptedBudget')) # combined amended/adopted columns
combined = combined[combined['amendedBudget'].notna()] # remove rows with no budget number
budget = combined.iloc[:, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]].copy() # subset columns
budget # show output
We’ll need to convert the amendedBudget
to a numeric data type to be able to perform arithmetic operations.
budget['amendedBudget'] = pd.to_numeric(budget['amendedBudget'], errors='coerce') # convert data type
budget.info() # check output
Aggregating & Calculating#
We can use some of these workflows from a previous section to prime our data for analysis.
A few different examples:
# show total budget over time
budget.groupby('year')['amendedBudget'].sum().reset_index()
# show total by division over time
budget.groupby(['year', 'division'])['amendedBudget'].sum().reset_index()
Exploratory Visualization#
Now that we have a sense of what’s in our data, we can explore some different analysis and visualization workflows.
# total budget by year as a bar chart
plt.style.use('fivethirtyeight') # set stylesheet
budget.groupby('year')['amendedBudget'].sum().plot.bar(xlabel='Year', ylabel='Total Amount', title='City of South Bend Budget', colormap='Accent')
# total budget by year as a line plot
plt.style.use('fast') # set stylesheet
budget.groupby('year')['amendedBudget'].sum().plot(xlabel='Year', ylabel='Total Amount', title='City of South Bend Budget', colormap='Accent')
We could isolate the budget for a specific year and compare funding by department.
plt.style.use('seaborn-v0_8') # set stylesheet
f23 = budget[budget['year'] == '2023'] # subset data
f23.groupby('department')['amendedBudget'].sum().plot.bar(xlabel='Department', ylabel='Total Budget', title='City of South Bend 2023 Budget By Department', colormap='Accent', rot=70) # plot budget amount by department
We could also compare a the budget for a specific department over time.
plt.style.use('ggplot') # set stylesheet
innovation = budget.loc[budget['department'].str.contains('Innovation')] # subset data
innovation.groupby('year')['amendedBudget'].sum().plot.bar(xlabel='Year', ylabel='Total Budget', title='City of South Bend Deparmtent of Innovation & Technology Budget, Over Time', colormap='Accent', rot=70) # plot budget amount by department