Case Study: South Bend City Budget Data#

Each year, South Bend’s Common Council passes an annual budget. Let’s curate a dataset that combines ammended budget info for all available years.

Getting Data#

API Calls#

import pandas as pd, json, requests, matplotlib.pyplot as plt # import statements

# budget API urls
urls = {
    2018: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2018_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
    2019: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2019_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
    2020: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2020_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
    2021: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2021_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
    2022: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2022_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson",
    2023: "https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/2023_Amended_Budget/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson"
    }

# function to get data
def budget(url):
  r = requests.get(url) # get data
  d = r.json() # store json object
  data = [] # empty list for data

  for f in d['features']: # iterate over list of dictionaries
    data.append(f['properties']) # isolate value and append to list

  df = pd.DataFrame(data) # create df
  return df

data = [] # empty list for dataframes

for i in list(urls.values()): # iterate over urls
  df = budget(i) # function call
  data.append(df) # append to list of dfs

combined = pd.concat(data) # concatenate dataframes
combined # show output

Data Wrangling#

Some data wrangling that might be helpful

# relabel columns
combined.columns = ['year', 'fund', 'department', 'division', 'subdivision', 'accountCategory',
                    'accountSubcategory', 'description', 'type', 'account', 'amendedBudget',
                    'objectID', 'adoptedBudget', 'objectID2', 'objectID3']

combined['amendedBudget'] = combined['amendedBudget'].fillna(combined.pop('adoptedBudget')) # combined amended/adopted columns
combined = combined[combined['amendedBudget'].notna()] # remove rows with no budget number
budget = combined.iloc[:, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]].copy() # subset columns
budget # show output

We’ll need to convert the amendedBudget to a numeric data type to be able to perform arithmetic operations.

budget['amendedBudget'] = pd.to_numeric(budget['amendedBudget'], errors='coerce') # convert data type
budget.info() # check output

Aggregating & Calculating#

We can use some of these workflows from a previous section to prime our data for analysis.

A few different examples:

# show total budget over time
budget.groupby('year')['amendedBudget'].sum().reset_index()
# show total by division over time
budget.groupby(['year', 'division'])['amendedBudget'].sum().reset_index()

Exploratory Visualization#

Now that we have a sense of what’s in our data, we can explore some different analysis and visualization workflows.

# total budget by year as a bar chart
plt.style.use('fivethirtyeight') # set stylesheet
budget.groupby('year')['amendedBudget'].sum().plot.bar(xlabel='Year', ylabel='Total Amount', title='City of South Bend Budget', colormap='Accent')
# total budget by year as a line plot
plt.style.use('fast') # set stylesheet
budget.groupby('year')['amendedBudget'].sum().plot(xlabel='Year', ylabel='Total Amount', title='City of South Bend Budget', colormap='Accent')

We could isolate the budget for a specific year and compare funding by department.

plt.style.use('seaborn-v0_8') # set stylesheet
f23 = budget[budget['year'] == '2023'] # subset data
f23.groupby('department')['amendedBudget'].sum().plot.bar(xlabel='Department', ylabel='Total Budget', title='City of South Bend 2023 Budget By Department', colormap='Accent', rot=70) # plot budget amount by department

We could also compare a the budget for a specific department over time.

plt.style.use('ggplot') # set stylesheet
innovation = budget.loc[budget['department'].str.contains('Innovation')] # subset data
innovation.groupby('year')['amendedBudget'].sum().plot.bar(xlabel='Year', ylabel='Total Budget', title='City of South Bend Deparmtent of Innovation & Technology Budget, Over Time', colormap='Accent', rot=70) # plot budget amount by department