Data Wrangling in Pandas

This chapter provides an introduction to reshaping and processing data in Python using Pandas. It reviews foundational syntax for interacting with a DataFrame and introduces time series data along with data aggregation and calculation workflows.

It then moves to data reshaping in Pandas, including operations like .groupby, .pivot, .melt, .pivot_table, .explode, and .transpose. It introduces merging and joining operations in Pandas and includes some discussion of multi-level indexing and regular expressions.
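As a rough preview of three of the reshaping operations named above, the sketch below runs .melt, .groupby, and .pivot on a tiny table. The column names and values are hypothetical placeholders invented for illustration; they are not drawn from the chapter's datasets.

```python
import pandas as pd

# Hypothetical toy data: one row per date, one column per station.
wide = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-02"],
        "station_a": [10.0, 12.0],
        "station_b": [20.0, 18.0],
    }
)

# .melt: wide -> long, giving one row per (date, station) pair.
long = wide.melt(id_vars="date", var_name="station", value_name="value")

# .groupby: aggregate the long table to one mean value per station.
station_means = long.groupby("station")["value"].mean()

# .pivot: long -> wide again, with dates as the index.
wide_again = long.pivot(index="date", columns="station", values="value")

print(long)
print(station_means)
print(wide_again)
```

Long format (one observation per row) tends to be the more convenient shape for grouping and aggregation, while wide format is often easier to read as a table.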

Acknowledgements#

The author consulted the following resources when building this chapter:

pandas package “Getting started” documentation.
W3Resource Pandas guides
Wes McKinney’s Python for Data Analysis: Data Wrangling with pandas, NumPy, and IPython (O’Reilly, 2017)
- Chapter 5 “Getting Started with pandas” (125-168)
- Chapter 7 “Data Cleaning and Preparation” (195-224)
- Chapter 8 “Data Wrangling: Join, Combine, and Reshape” (225-256)
- Chapter 10 “Data Aggregation and Group Operations” (293-322)

All figures shown in this chapter are from the pandas “Getting Started” tutorials.

Data#

We’ll work with a few different datasets in this chapter.

American Community Survey 1-Year Estimates Public Use Microdata Sample
- Code for this API call is included
Air quality data (code is included to load this data from URLs)
- air_quality_no2 provides NO₂ values for three measurement stations.
- air_quality_pm25 provides PM₂.₅ values (particulate matter smaller than 2.5 micrometers in diameter) for the same three measurement stations.
- air_quality_stations provides latitude and longitude coordinates for five different measurement stations.
- air_quality_parameters provides the full description and name of the parameter for five different element types.
Educational outcome attribute data from the American Community Survey
- Link to download
- Note: I’m working with the 2022 5-year estimates, which I’ve saved as data.csv.
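Because the air quality tables above cover different numbers of stations (three with measurements, five with coordinates), they are a natural fit for the merge operations covered in this chapter. The sketch below shows one way such a join might look. The frames are small stand-ins built by hand: the station codes, the column names (location, value, latitude, longitude), and all values are assumptions made for illustration, not the contents of the real files.

```python
import pandas as pd

# Hypothetical stand-in for a measurement table covering three stations.
measurements = pd.DataFrame(
    {
        "location": ["S1", "S2", "S3"],
        "value": [21.0, 35.5, 18.2],
    }
)

# Hypothetical stand-in for a station table covering five stations.
stations = pd.DataFrame(
    {
        "location": ["S1", "S2", "S3", "S4", "S5"],
        "latitude": [51.0, 48.5, 51.5, 50.0, 52.0],
        "longitude": [4.0, 2.5, -0.1, 3.0, 5.0],
    }
)

# Left join: keep every measurement row and attach its coordinates.
merged = measurements.merge(stations, on="location", how="left")

# Outer join with indicator=True adds a _merge column that flags
# stations present in only one of the two tables.
outer = measurements.merge(stations, on="location", how="outer", indicator=True)

print(merged)
print(outer)
```

A left join is used here so the result has exactly one row per measurement; the outer join is included only to make visible the stations that have coordinates but no measurements.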

Application#

Click here for a Jupyter Notebook template for this chapter’s application problems.