Structured Data in Python

In programming languages and computing more broadly, I/O stands for Input and Output.

We’ve seen I/O at work elsewhere in the course: we provided inputs to the computer (using a CPU simulator or working at the terminal), a task or operation was performed, and the process ended with an output.

Similarly, programming languages can take a variety of inputs (user-provided values, data, files, etc.) and return outputs in a variety of formats (data stored in memory, output that shows up in the console, newly created or modified files, etc.). We’ve seen I/O in action in previous units, with Python’s print() and input() functions.

In this chapter, we’re going to look at aspects of I/O that have to do with reading and writing files, with a focus on two key data structures:

We’ll do so by interacting with Python’s built-in csv and json modules, along with the pandas library.
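As a rough preview of what working with these modules looks like, the sketch below writes a small set of records to a CSV file and a JSON file and then reads them back. The records, field names, and file names (people.csv, people.json) are made up for illustration and are not the chapter's data files; the files are created in a temporary folder so nothing is left behind.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical sample records, invented for this illustration only.
rows = [
    {"name": "Ada", "year": "1815"},
    {"name": "Grace", "year": "1906"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "people.csv"    # hypothetical file name
    json_path = Path(tmp) / "people.json"  # hypothetical file name

    # Write the records as CSV; newline="" lets the csv module manage line endings.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "year"])
        writer.writeheader()
        writer.writerows(rows)

    # Read the CSV back; each row becomes a dictionary of strings.
    with open(csv_path, newline="", encoding="utf-8") as f:
        csv_rows = list(csv.DictReader(f))

    # Write the same records as JSON, then load them back into Python objects.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)

    with open(json_path, encoding="utf-8") as f:
        json_rows = json.load(f)

print(csv_rows)
print(json_rows)
```

Note that the csv module returns every value as a string, while JSON preserves types such as numbers and booleans; the sample values here are stored as strings so that both round trips return identical data.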

Goals

The first part of this chapter provides an overview of fundamental programming concepts in file I/O and working with structured data, with a focus on Python syntax. Topics covered include:

Lab overview

The second part of this chapter covers the core components of pandas, including Series and DataFrame objects: how to create and interact with them manually in the Python programming environment, how to load a structured data file (CSV or JSON) as a DataFrame, and how to sort, select, and filter the resulting DataFrame. The lab also covers common data parsing and wrangling challenges such as duplicate entries and missing data.
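The operations listed above might look something like the following minimal sketch. It assumes pandas is installed, and it reads a small, made-up CSV string in place of a real data file; the column names (name, score) and the threshold of 80 are hypothetical and chosen only to show each step.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a data file on disk.
csv_text = """name,score
Ada,90
Grace,
Ada,90
Linus,75
"""

# A Series is a one-dimensional labeled array, created here by hand.
s = pd.Series([3, 1, 2], index=["a", "b", "c"])

# A DataFrame is a two-dimensional table; read_csv accepts a path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Handle duplicate entries and missing data.
deduped = df.drop_duplicates()
complete = deduped.dropna(subset=["score"])

# Filter rows with a boolean condition, then sort by a column.
high = complete[complete["score"] > 80]
ranked = complete.sort_values("score", ascending=False)

print(s.sort_values())
print(ranked)
```

Each step returns a new object rather than changing df in place, which makes it easier to inspect the intermediate results while learning.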

By the end of this chapter, students will be able to:

Information and exercises in the pandas sections were developed in consultation with the following resources:

Data Files

You’ll need four data files for this chapter.

You can also access them via Google Drive (ND users only).

You’ll need to download these files and put them in the same folder as your Jupyter Notebook (or upload them to Google Colab).

Application

Your answers to this chapter’s application questions should be added to the notebook template.

Submit the Colab link on Canvas as your assignment submission.

Acknowledgments

Peer review and editing of the CSV portion of this chapter were provided by Spring 2021 graduate teaching assistant Aidan Draper.

Peer review and editing of the JSON portion of this chapter were provided by Spring 2021 graduate teaching assistant Subhadyuti Sahoo.

When building the CSV, JSON & file methods sections, the author consulted the following materials: