City of South Bend Open Data Portal
Head to the City of South Bend’s Open Data Portal. Identify a specific dataset you want to load in Python.
I’m working with the raw data from the City’s 2022 Digital Literacy Survey.
We could download this data as a CSV file, but let’s instead explore the API options.
Spend some time exploring the options on this page. As you customize the query, notice how the Query URL changes. We can think about the URL as the data’s “address.” Modifying what data we want requires modifying the address.
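To make the "address" idea concrete, here is a minimal sketch that assembles the same query URL used later in this lab from its individual parameters, then splits it apart again. The comments describing what each parameter does reflect my reading of common ArcGIS-style query conventions, so treat them as assumptions and check them against the portal's API page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the query URL used in this lab
BASE_URL = (
    "https://opendata.arcgis.com/datasets/"
    "c97085b608604f5c8c07487c24dcaff4_0/FeatureServer/0/query"
)

# Each key/value pair becomes one "name=value" piece of the address
params = {
    "where": "1=1",    # row filter (assumed: 1=1 is always true, so all rows match)
    "outFields": "*",  # columns to return (assumed: * asks for all of them)
    "outSR": 4326,     # spatial reference for any geometry (assumption)
    "f": "json",       # response format
}

# urlencode percent-encodes special characters ("=" becomes "%3D");
# safe="*" leaves the asterisk unencoded
query_url = BASE_URL + "?" + urlencode(params, safe="*")
print(query_url)

# Going the other way: split an existing query URL back into its parameters
parsed = parse_qs(urlsplit(query_url).query)
print(parsed)
```

Changing a value in `params` and rebuilding the URL is the programmatic version of adjusting the options on the portal page.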
Once you have finalized your query (what parts of this data you want to work with in Python), we’ll use the requests and json modules to load the data.
import requests, json # import statements
r = requests.get("https://opendata.arcgis.com/datasets/c97085b608604f5c8c07487c24dcaff4_0/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json") # request data from url
data = r.json() # parse response as JSON object
print(data) # show output
Exploring the API Return
We can use Python dictionary syntax to make sense of what’s in this output.
# show top-level keys
print(data.keys())
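Key names alone don't tell us much, so it can help to also look at the data type and size of each value. The sketch below does that on a small stand-in dictionary so it runs offline. Only the fields and features key names come from this lab; everything else in the sample, and the `summarize` helper itself, is hypothetical.

```python
# Hypothetical stand-in for the parsed API return, so this runs offline.
# Only the "fields" and "features" key names come from the lab.
data = {
    "exampleSetting": "placeholder",
    "fields": [{"name": "question_1"}, {"name": "question_2"}],
    "features": [
        {"attributes": {"question_1": "Yes", "question_2": 3}},
        {"attributes": {"question_1": "No", "question_2": 5}},
    ],
}


def summarize(d):
    """Return (key, type name, length or None) for each top-level key."""
    rows = []
    for key, value in d.items():
        # Only lists and dictionaries get a length; other values get None
        size = len(value) if isinstance(value, (list, dict)) else None
        rows.append((key, type(value).__name__, size))
    return rows


summary = summarize(data)
for key, type_name, size in summary:
    print(key, type_name, size)
```

Long lists are usually where the actual records live, so the largest value is a good place to look next.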
At first glance, everything up through the fields key looks like technical information about the API return. fields doesn’t have the survey responses, but it does have important metadata about the survey data.
Let’s start by creating a DataFrame with this metadata (data about data).
import pandas as pd # import statement (needed before we can use pd)
metadata = pd.DataFrame(data['fields']) # create dataframe
metadata.to_csv("metadata.csv", index=False) # write metadata to csv file
print(metadata) # show output
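One way to put this metadata to work is as a lookup from short column names to more readable labels. The sketch below assumes each entry in the fields list has a name key and, usually, an alias key; those key names and the sample values are assumptions for illustration, so confirm them first with `print(data['fields'][0])`.

```python
# Hypothetical stand-in for data['fields']; the key names ("name", "alias")
# and the values are assumptions, not taken from the survey itself.
fields = [
    {"name": "q1", "alias": "Example question one"},
    {"name": "q2", "alias": "Example question two"},
    {"name": "q3"},  # an entry with no alias
]

# Map each column name to a readable label, falling back to the name itself
labels = {f["name"]: f.get("alias", f["name"]) for f in fields}

for name, label in labels.items():
    print(name, "->", label)
```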
The features key includes the survey response data, but it’s structured as a list of dictionaries. We’ll need to use a combination of list and dictionary Python syntax (and some trial and error) here.
print(type(data['features'])) # shows us data type for the value associated with the features key
response = data['features'][0] # selects the first response
print(type(response)) # show response data type
print(response.keys()) # shows keys for single response
print(type(response['attributes'])) # show data type for attributes value
print(response['attributes'].keys()) # gets value for attributes key, shows those keys
That’s starting to look like survey results. We could use another print statement to confirm.
for key, value in response['attributes'].items(): # iterate over dictionary items
    print(key, value) # show key and value
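Another option for inspecting nested structures is the json module's `dumps` function, which can print a dictionary with indentation so the nesting is visible at a glance. This is a minimal sketch on a hypothetical stand-in for a single response; the attribute names and values are made up.

```python
import json

# Hypothetical stand-in for data['features'][0]
response = {"attributes": {"question_1": "Yes", "question_2": 3}}

# indent=2 puts each key on its own line, indented by nesting level
pretty = json.dumps(response, indent=2)
print(pretty)
```

With the real data, `json.dumps(data['features'][0], indent=2)` would show one full survey response in the same layout.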
Creating a DataFrame
Now that we understand how the API return data is structured, we can create a DataFrame with the survey data. We’ll first need a list with the survey response dictionaries.
import pandas as pd # import statement
surveyData = [] # empty list for responses
for d in data['features']: # iterate over list of dictionaries
    surveyData.append(d['attributes']) # isolate value and append to list
df = pd.DataFrame(surveyData) # create df
print(df) # show output
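The loop above can also be written as a list comprehension, which builds the same list in one line. Here is a sketch on a hypothetical stand-in for the API return, so it runs without a network connection; the attribute names and values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the parsed API return
data = {
    "features": [
        {"attributes": {"question_1": "Yes", "question_2": 3}},
        {"attributes": {"question_1": "No", "question_2": 5}},
    ]
}

# Pull the attributes dictionary out of every feature
survey_data = [feature["attributes"] for feature in data["features"]]

# Each dictionary becomes a row; each key becomes a column
df = pd.DataFrame(survey_data)
print(df.shape)
```

Both versions produce the same DataFrame, so which one to use is mostly a matter of readability.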
Now, there’s more we might want to do with this DataFrame (rename columns, remove unneeded columns, perform other kinds of calculations, etc.). But this gets us started!
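As a rough sketch of what those next steps could look like, the example below renames columns, drops one, and runs a simple calculation. The column names and values here (including `OBJECTID`) are hypothetical and will differ from the actual survey data.

```python
import pandas as pd

# Hypothetical DataFrame standing in for the survey data
df = pd.DataFrame(
    [
        {"OBJECTID": 1, "q1": "Yes", "q2": 3},
        {"OBJECTID": 2, "q1": "No", "q2": 5},
    ]
)

cleaned = (
    df.rename(columns={"q1": "has_internet", "q2": "device_count"})  # readable names
      .drop(columns=["OBJECTID"])  # remove a column we don't need
)

# A simple calculation on one column
average_devices = cleaned["device_count"].mean()
print(cleaned.columns.tolist())
print(average_devices)
```

`rename` and `drop` return new DataFrames here, so the original `df` stays unchanged.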
To recap what we just did:
Customize query URL
Load response in Python
Parse the data structure to isolate the data we want
Store that data as a Pandas DataFrame
Application
Q3A: Select another open civic dataset to use for an API call.
You’re welcome to work with the same dataset you used for Q1 or Q2 (as long as there is an API option).
But if you want to flex your API muscles, try a different open data portal to learn a new interface and data structure! Lots to choose from, but Chicago and NYC are good places to start.
Remember that open data portals are catalogs of datasets, so you will need to explore the websites to identify and then download a specific dataset (one that has an API option).
Q3B: What kinds of information are available in the documentation? How does that material help us start to make sense of this data?
Q3C: Write a program that stores the API return in Python as a Pandas DataFrame, using the workflows covered in this lab.
Your answer to this question should include the program plus comments that document your process and explain your code.
Q3D: What challenges did you encounter? How did you address or solve them?