U.S. Census Bureau

As we’ve seen previously, the U.S. Census Bureau collects a wealth of information about people, households, communities, etc. They also have a number of APIs and a robust user community.

Unlike the APIs we encountered in the previous section of the lab, the Census Bureau’s API requires a key. Head to the Census Bureau’s “About” page for developers and click the “Request a Key” icon. Once you’ve completed the “Request a Key” form, you should receive an API key via email. This will look like a long string of letters and numbers.

  • It’s fine to use your name or “University of Notre Dame” for “Organization Name”

Selecting a Dataset

The first step is figuring out what Census Bureau dataset and API you want to work with. One option is to use data.census.gov’s data explorer interface. Keyword search results can point you to datasets of interest (which you would then have to track down in the API documentation).

Looking at the APIs for specific Census Bureau data collection instruments is another good place to start.

You could also jump into the API Discovery Tool, which is available in a couple of different formats. This is a machine-readable inventory of Census Bureau datasets. It can also be slightly overwhelming.
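Because the inventory is machine-readable, you can filter it in Python instead of scrolling through it. The sketch below is a minimal example that assumes the JSON inventory is a dictionary with a "dataset" list whose entries carry "title", "c_vintage", and "c_dataset" fields; check these names against the file you actually download. The find_datasets helper and the sample entries are hypothetical and exist only for illustration.

```python
def find_datasets(inventory, keyword, vintage=None):
    """Return (vintage, API path, title) tuples for entries whose title contains keyword."""
    matches = []
    for entry in inventory.get("dataset", []):
        title = entry.get("title", "")
        year = entry.get("c_vintage")  # assumed field name for the dataset year
        if keyword.lower() not in title.lower():
            continue  # title does not mention the keyword
        if vintage is not None and year != vintage:
            continue  # wrong year
        path = "/".join(entry.get("c_dataset", []))  # assumed field: URL path segments
        matches.append((year, path, title))
    return matches


# Hypothetical, hand-written sample shaped like the assumed inventory format.
sample_inventory = {
    "dataset": [
        {"title": "ACS 1-Year PUMS", "c_vintage": 2022, "c_dataset": ["acs", "acs1", "pums"]},
        {"title": "ACS 5-Year Detailed Tables", "c_vintage": 2021, "c_dataset": ["acs", "acs5"]},
    ]
}

print(find_datasets(sample_inventory, "pums", vintage=2022))
```

To try this on the real inventory, load the Discovery Tool's JSON file into a dictionary (for example with requests and .json()) and pass that dictionary in place of sample_inventory.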

For this example, I’m going to use microdata from the American Community Survey, specifically the “2022 American Community Survey 1-Year Estimates Public Use Microdata Sample (2005-2019, 2021, 2022).”

Constructing Your Query

Before you jump into writing the API call in Python, use the available documentation to make sense of how the URL is structured, what variables are included in the dataset, etc.

Note that for the Census Bureau API, your key is part of the URL. Storing the key in a string variable and inserting it into the URL with concatenation or an f-string keeps the query easy to edit.

Let’s break down one of the example URLs included in the documentation for this survey.

https://api.census.gov/data/2022/acs/acs1/pums?get=SEX,PWGTP,MAR&SCHL=24&key=YOUR_KEY_GOES_HERE

We can start to break down the pieces of information included here:

  • Root: https://api.census.gov/data

  • Year: 2022 (the survey year)

  • Survey: acs (American Community Survey)

  • Coverage: acs1 (1-year estimates)

  • Subset: pums (Public Use Microdata Sample)

  • Query: ?get= (prefix for selecting specific variables)

  • Variables: SEX,PWGTP,MAR&SCHL=24 (codes for specific variables, from the documentation)

    • NOTE: In this example, the URL selects the SEX, PWGTP, and MAR variables and returns only records where the SCHL variable value is 24.

    • In a scenario where we are not filtering for specific variable values, we could just include the variable names separated by commas, with no spaces (e.g. SEX,PWGTP,MAR,SCHL)

  • Key: &key= (prefix for entering your API key)
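The pieces above can be reassembled in Python, which makes it easier to swap in a different year or variable list later. This is a minimal sketch; the build_pums_url function and its parameter names are made up for illustration and are not part of any Census Bureau library.

```python
ROOT = "https://api.census.gov/data"


def build_pums_url(year, variables, filters, key):
    """Assemble an ACS 1-year PUMS query URL from its parts."""
    dataset = f"{ROOT}/{year}/acs/acs1/pums"
    query = "get=" + ",".join(variables)  # variable codes, comma-separated, no spaces
    for name, value in filters.items():
        query += f"&{name}={value}"  # each filter keeps only matching records
    return f"{dataset}?{query}&key={key}"


url = build_pums_url(
    year=2022,
    variables=["SEX", "PWGTP", "MAR"],
    filters={"SCHL": 24},
    key="YOUR_KEY_GOES_HERE",
)
print(url)
```

The printed URL should match the documentation example shown earlier, which is a quick way to confirm each piece landed in the right place.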

Writing the API Call

Remember our general API workflow:

  • Customize query URL

  • Load response in Python

  • Parse the data structure to isolate the data we want

  • Store that data as a Pandas DataFrame

import requests  # HTTP library used to make the API call
import pandas as pd  # used to store the results as a DataFrame

key = "YOUR KEY GOES HERE"  # replace this placeholder with your API key, as a string
url = f"https://api.census.gov/data/2022/acs/acs1/pums?get=SEX,PWGTP,MAR&SCHL=24&key={key}"  # the f-string inserts the key into the query URL
r = requests.get(url)  # request the data
r.raise_for_status()  # raise an error if the server reports a failed request
data = r.json()  # parse the JSON response into a list of lists
df = pd.DataFrame(data[1:], columns=data[0])  # first sublist becomes the column headers; the remaining sublists are the rows
print(df)  # show output
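The response is a list of lists, and values (including numbers) typically arrive as strings, so numeric columns need converting before you do any arithmetic. The sketch below uses a small hand-written sample (hypothetical values, not real Census output, with an assumed column order) shaped like the response to the query above. It sums the PWGTP person weights by SEX code using only the standard library.

```python
from collections import defaultdict

# Hypothetical sample: the first sublist is the header, the rest are records.
data = [
    ["SEX", "PWGTP", "MAR", "SCHL"],
    ["1", "120", "1", "24"],
    ["2", "95", "5", "24"],
    ["2", "40", "1", "24"],
]

header, rows = data[0], data[1:]
records = [dict(zip(header, row)) for row in rows]  # one dict per person record

weighted_totals = defaultdict(int)
for record in records:
    # PWGTP is text in the response, so convert it before adding
    weighted_totals[record["SEX"]] += int(record["PWGTP"])

print(dict(weighted_totals))
```

With the DataFrame built in the program above, the equivalent conversion is df["PWGTP"] = pd.to_numeric(df["PWGTP"]). The variable documentation explains what each SEX, MAR, and SCHL code means.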

Additional Resources

This is just one example query; there are lots of ways to customize Census Bureau API requests. Check out census-docs and the Census Bureau’s API documentation for more info.

Some datasets include Groups, which let you return multiple related variables.

DataMade also has a Python wrapper for the Census Bureau APIs, which is designed to streamline programming workflows.

The Census Bureau APIs can be overwhelming, but they’re a powerful tool for accessing data. Wading through the documentation is worth your time.

Application

Q4A: Write your own Census Bureau API call.

  • You could use the same dataset as the example and modify the year, variables, geographic scope, etc.

  • You could also explore other datasets.

Q4B: What kinds of information are available in the documentation? How does that material help us start to make sense of this data?

Q4C: Write a program that stores the API return in Python as a Pandas DataFrame, using the workflows covered in this lab.

  • Your answer should include the program plus comments that document your process and explain your code.

Q4D: What challenges did you encounter? How did you address or solve them?