City of South Bend Open Data Portal

Head to the City of South Bend’s Open Data Portal. Identify a specific dataset you want to load in Python.

I’m working with the raw data from the City’s 2022 Digital Literacy Survey.

We could download this data as a CSV file, but let’s instead explore the API options.

Spend some time exploring the options on this page. As you customize the query, notice how the Query URL changes. We can think about the URL as the data’s “address.” Modifying what data we want requires modifying the address.
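One way to see the "address" idea is to build the Query URL from its parts. The sketch below is only an illustration: it uses the endpoint and parameter values from the request in this lab, and the comments on what each parameter does are my reading of the query page rather than official documentation.

```python
from urllib.parse import urlencode

# Base address of the query endpoint (same one used in this lab's request).
BASE_URL = (
    "https://opendata.arcgis.com/datasets/"
    "c97085b608604f5c8c07487c24dcaff4_0/FeatureServer/0/query"
)

# Query parameters as they appear in the lab's Query URL.
params = {
    "where": "1=1",    # filter expression; 1=1 is true for every row
    "outFields": "*",  # which fields to return; * asks for all of them
    "outSR": "4326",   # spatial reference setting from the query page
    "f": "json",       # response format
}

# urlencode percent-encodes the values ("=" becomes %3D) and joins them with "&".
# safe="*" leaves the asterisk unescaped so the result matches the portal's URL.
query_url = BASE_URL + "?" + urlencode(params, safe="*")
print(query_url)
```

Changing a value in `params` changes the address, which is the same thing the query page does when you adjust its options. `requests.get()` also accepts a `params` dictionary and will do this encoding for you.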

Once you have finalized your query (the parts of this data you want to work with in Python), we’ll use the requests module to load the data. Its json() method parses the JSON response into Python objects.

import requests # import statement
r = requests.get("https://opendata.arcgis.com/datasets/c97085b608604f5c8c07487c24dcaff4_0/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json") # request data from url
r.raise_for_status() # stop with an error if the request failed
data = r.json() # parse response as JSON object
print(data) # show output

Exploring the API Return

We can use Python dictionary syntax to make sense of what’s in this output.

# show top-level keys
print(data.keys())

At first glance, everything up through the fields key looks like technical information about the API return. The fields key doesn’t hold the survey responses, but it does hold important metadata about the survey data.

Let’s start by creating a DataFrame with this metadata (data about data).

import pandas as pd # import statement (needed before we can use pd)

metadata = pd.DataFrame(data['fields']) # create dataframe
metadata.to_csv("metadata.csv", index=False) # write metadata to csv file
print(metadata) # show output
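The metadata can also work as a lookup table for decoding column names. The sketch below assumes each entry in fields has name and alias keys; the sample entries (including the question text) are hypothetical stand-ins, so check the real keys in your own output first.

```python
# Hypothetical stand-in for data['fields']; the real list comes from the API return.
sample_fields = [
    {"name": "OBJECTID", "type": "esriFieldTypeOID", "alias": "OBJECTID"},
    {"name": "Q1", "type": "esriFieldTypeString", "alias": "Do you have internet at home?"},
    {"name": "Q2", "type": "esriFieldTypeString"},  # no alias, to show the fallback
]

# Map each field name to its alias, falling back to the name if no alias exists.
aliases = {
    field["name"]: field.get("alias", field["name"])
    for field in sample_fields
}

for name, alias in aliases.items():  # iterate over dictionary items
    print(name, "->", alias)         # show field name and its readable label
```

A dictionary like this is handy later on, when short field names need to be turned back into the questions they stand for.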

The features key includes the survey response data, but it’s structured as a list of dictionaries. We’ll need to use a combination of list and dictionary Python syntax (and some trial and error) here.

print(type(data['features'])) # shows us data type for the value associated with the features key
response = data['features'][0] # selects the first response
print(type(response)) # show response data type
print(response.keys()) # shows keys for single response
print(type(response['attributes'])) # show data type for attributes value
print(response['attributes'].keys()) # gets value for attributes key, shows those keys

That’s starting to look like survey results. We could use another print statement to confirm.

for key, value in response['attributes'].items(): # iterate over dictionary items
  print(key, value) # show key and value
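The trial and error above can be wrapped in a small helper that reports what kind of object we are looking at. This is only a sketch: summarize is a hypothetical helper name, and the sample dictionary is a made-up miniature of the API return, not real survey data.

```python
def summarize(value):
    """Return a short description of a value's type and size."""
    if isinstance(value, dict):
        return f"dict with keys {sorted(value)}"
    if isinstance(value, list):
        return f"list of {len(value)} items"
    return type(value).__name__

# Hypothetical miniature of the API return (same nesting, invented values).
sample = {
    "fields": [{"name": "OBJECTID"}, {"name": "Q1"}],
    "features": [
        {"attributes": {"OBJECTID": 1, "Q1": "Yes"}},
        {"attributes": {"OBJECTID": 2, "Q1": "No"}},
    ],
}

for key, value in sample.items():       # walk the top-level keys
    print(key, "->", summarize(value))  # show what each key holds

first = sample["features"][0]           # first item in the features list
print("first feature ->", summarize(first))
print("attributes ->", summarize(first["attributes"]))
```

Reading the output top to bottom traces the same path we followed by hand: a dictionary, then a list, then a dictionary with an attributes key, then the response itself.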

Creating a DataFrame

Now that we understand how the API return data is structured, we can create a DataFrame with the survey data. We’ll first need a list with the survey response dictionaries.

import pandas as pd # import statement

surveyData = [] # empty list for responses

for d in data['features']: # iterate over list of dictionaries
  surveyData.append(d['attributes']) # isolate value and append to list

df = pd.DataFrame(surveyData) # create df
print(df) # show output
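The loop above can also be written as a list comprehension, which builds the same list in one line. The sketch below runs on a hypothetical miniature of the API return so it works without a network connection; with real data, sample would be replaced by data.

```python
import pandas as pd

# Hypothetical miniature of the API return (invented values).
sample = {
    "features": [
        {"attributes": {"OBJECTID": 1, "Q1": "Yes"}},
        {"attributes": {"OBJECTID": 2, "Q1": "No"}},
    ]
}

# One dictionary per survey response, pulled out of each feature.
records = [feature["attributes"] for feature in sample["features"]]

sample_df = pd.DataFrame(records)  # each dictionary becomes a row
print(sample_df.shape)             # (number of rows, number of columns)
print(sample_df.columns.tolist())  # column names come from the dictionary keys
```

Both versions produce the same DataFrame. The loop is easier to read step by step, and the comprehension is more compact once the structure is familiar.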

Now, there’s more we might want to do with this DataFrame (rename columns, remove unneeded columns, perform other kinds of calculations, etc). But this gets us started!
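As a small illustration of those next steps, the sketch below drops one column and renames another. The column names, the values, and the new label are all hypothetical; the real names depend on what the API returns.

```python
import pandas as pd

# Hypothetical survey rows; the real DataFrame comes from the API return.
survey = pd.DataFrame(
    [
        {"OBJECTID": 1, "Q1": "Yes", "GlobalID": "a1"},
        {"OBJECTID": 2, "Q1": "No", "GlobalID": "b2"},
    ]
)

# Hypothetical mapping from a short field name to a readable label.
new_names = {"Q1": "has_home_internet"}

cleaned = (
    survey.drop(columns=["GlobalID"])  # remove a column we don't need
          .rename(columns=new_names)   # give the remaining column a readable name
)

print(cleaned.columns.tolist())                     # show the new column names
print(cleaned["has_home_internet"].value_counts())  # count each answer
```

Both drop() and rename() return new DataFrames here, so the original survey DataFrame is left unchanged.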

To recap what we just did:

  • Customize query URL

  • Load response in Python

  • Parse the data structure to isolate the data we want

  • Store that data as a Pandas DataFrame

Application

Q3A: Select another open civic dataset to use for an API call.

You’re welcome to work with the same dataset you used for Q1 or Q2 (as long as there is an API option).

But if you want to flex your API muscles, try a different open data portal to learn a new interface and data structure! There are lots to choose from, but Chicago and NYC are good places to start.

Remember that open data portals are catalogs of datasets. You will need to explore the websites to identify and then download a specific dataset (one that has an API option).

Q3B: What kinds of information are available in the documentation? How does that material help us start to make sense of this data?

Q3C: Write a program that stores the API return in Python as a Pandas DataFrame, using the workflows covered in this lab.

Your answer to this question should include the program plus comments that document your process and explain your code.

Q3D: What challenges did you encounter? How did you address or solve them?