.concat()

While .merge() connects DataFrames based on one or more key fields, pd.concat() concatenates, or “stacks,” objects together along an axis.

# general syntax
combined = pd.concat([df1, df2], axis=0)

Parameters:

  • [df1, df2]: the DataFrames to combine, passed as a list

  • axis=: the axis to concatenate along (0 for rows, 1 for columns)

The default for pd.concat() is axis=0, so unless we specify otherwise, the rows of the input tables are stacked on top of each other.

Remember that in a pandas DataFrame, axis 0 runs vertically (down the rows) and axis 1 runs horizontally (across the columns).
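To see the two axes side by side, here is a minimal sketch using two tiny, made-up DataFrames (top and bottom are hypothetical names, not part of the air quality data). Stacking along axis 0 adds rows; concatenating along axis 1 places the tables next to each other, lining rows up by their index labels and filling any gaps with NaN.

```python
import pandas as pd

# Two small, made-up tables with the same columns (hypothetical values)
top = pd.DataFrame({"location": ["A", "B"], "value": [1.0, 2.0]})
bottom = pd.DataFrame({"location": ["C"], "value": [3.0]})

# axis=0 (the default): stack rows vertically
rows = pd.concat([top, bottom], axis=0)
print(rows.shape)  # (3, 2): 2 + 1 rows, same 2 columns

# axis=1: place the tables side by side, aligning on the row index
cols = pd.concat([top, bottom], axis=1)
print(cols.shape)  # (2, 4): rows aligned on index labels 0 and 1, 2 + 2 columns
```

Because axis=1 matches rows by index label rather than by position, the second row of cols has missing values in the columns that came from bottom.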

Let’s work with a few air quality datasets, which we load as no2, pm25, stations, and param.

  • air_quality_no2 provides NO2 values for three measurement stations.

  • air_quality_pm25 provides PM25 values (particulate matter less than 2.5 micrometers) for the same three measurement stations.

  • air_quality_stations provides latitude and longitude coordinates for five different measurement stations.

  • air_quality_parameters provides the full description and name for five different parameters (element types).

import pandas as pd # import statement

# load data
no2 = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_no2_long.csv", parse_dates=True)
pm25 = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_pm25_long.csv", parse_dates=True)
stations = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_stations.csv")
param = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_parameters.csv")

Example #1

Let’s say we want to combine the NO2 and PM25 measurements in a single table. Since the two original tables have the same structure (the same columns), we can concatenate them along axis 0, stacking the rows of one table underneath the other.

combined = pd.concat([pm25, no2], axis=0) # concatenate
combined # show output
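Why does a similar structure matter? By default (join="outer"), pd.concat() lines columns up by name, so tables with identical columns stack cleanly, while a column present in only one table is kept and filled with NaN for rows from the other table. This is a minimal sketch using small hypothetical stand-ins (pm25_demo, no2_demo, and an added unit column) rather than the downloaded files.

```python
import pandas as pd

# Hypothetical mini-versions of the two measurement tables
pm25_demo = pd.DataFrame({"location": ["X"], "parameter": ["pm25"], "value": [12.0]})
no2_demo = pd.DataFrame({"location": ["X"], "parameter": ["no2"], "value": [20.0]})

# Same columns: rows stack cleanly under the shared headers
same = pd.concat([pm25_demo, no2_demo], axis=0)
print(same.columns.tolist())  # ['location', 'parameter', 'value']

# Different columns: concat matches columns by name and fills gaps with NaN
extra = no2_demo.assign(unit="µg/m³")  # hypothetical extra column
mixed = pd.concat([pm25_demo, extra], axis=0)
print(mixed.columns.tolist())  # ['location', 'parameter', 'value', 'unit']
print(mixed["unit"].isna().sum())  # 1 (the pm25_demo row has no unit)
```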

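We can use .shape and some quick arithmetic to verify that the operation worked correctly. The pm25 table has 1110 rows and the no2 table has 2068 rows, so the combined table should have:

  • 1110 + 2068 = 3178 rows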

print(pm25.shape) # pm25 df shape (print() so all three shapes display, not just the last one)
print(no2.shape) # no2 df shape
print(combined.shape) # combined df shape
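The same check can be written as an assertion, so it fails loudly if the row counts don't add up. This sketch uses two small hypothetical tables (a and b) instead of the downloaded data; with the real tables, the equivalent line would be assert combined.shape[0] == pm25.shape[0] + no2.shape[0]. It also shows that concatenation keeps each input's original index labels, which is why ignore_index=True can be useful when a fresh 0 to n-1 index is wanted.

```python
import pandas as pd

# Hypothetical stand-ins: 3 rows and 2 rows with identical columns
a = pd.DataFrame({"parameter": ["pm25"] * 3, "value": [10.0, 11.0, 12.0]})
b = pd.DataFrame({"parameter": ["no2"] * 2, "value": [20.0, 21.0]})

combined_demo = pd.concat([a, b], axis=0)

# Row count of the result should equal the sum of the inputs' row counts
assert combined_demo.shape[0] == a.shape[0] + b.shape[0]  # 3 + 2 == 5
# Column count is unchanged because both inputs share the same columns
assert combined_demo.shape[1] == a.shape[1] == b.shape[1]

# The original index labels are kept, so some labels repeat
print(combined_demo.index.tolist())  # [0, 1, 2, 0, 1]
# ignore_index=True builds a fresh 0..n-1 index instead
print(pd.concat([a, b], ignore_index=True).index.tolist())  # [0, 1, 2, 3, 4]
```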

Example #2

In the first example, the combined data shows observations for both NO2 and PM25, and the parameter column tells us which pollutant each row measures.

If the tables didn’t have something like the parameter column, we could add an extra level to the row index to identify the data source.

combined = pd.concat([pm25, no2], keys=["PM25", "NO2"]) # concatenate and add a key
combined # show output

Passing a keys argument to pd.concat() creates a hierarchical index, or MultiIndex, whose outer level holds the key for each row’s source table.
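Here is a minimal sketch of what the keys argument produces, using hypothetical mini tables (pm25_demo and no2_demo) that lack a parameter column. Each row’s index becomes a (key, original label) pair, and the outer level can be used to select one source again or to recover the source label for every row.

```python
import pandas as pd

# Hypothetical tables with no column recording which pollutant they hold
pm25_demo = pd.DataFrame({"location": ["X", "Y"], "value": [12.0, 15.0]})
no2_demo = pd.DataFrame({"location": ["X"], "value": [20.0]})

# keys= adds an outer index level naming the source of each row
labeled = pd.concat([pm25_demo, no2_demo], keys=["PM25", "NO2"])
print(labeled.index.nlevels)  # 2: (key, original row label)
print(labeled.index.tolist())  # [('PM25', 0), ('PM25', 1), ('NO2', 0)]

# Selecting on the outer level pulls one source back out
print(labeled.loc["NO2"])

# The outer level values give the source label for every row
sources = labeled.index.get_level_values(0).tolist()
print(sources)  # ['PM25', 'PM25', 'NO2']
```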

Additional Resources

Consult the pandas documentation on object concatenation for more on this function.