.concat()#
While .merge() connects DataFrames based on one or more key fields, .concat() concatenates or “stacks” objects together along an axis.
# general syntax
combined = pd.concat([df1, df2], axis=0)
Parameters:
[]: DataFrame variable names, as a list
axis=: axis to concatenate based on
The default for .concat() is axis=0, so the rows of the input tables are stacked into a single table.
Remember that in a pandas DataFrame, axis 0 is vertical (rows) and axis 1 is horizontal (columns).
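To make the axis distinction concrete, here is a minimal sketch using two small made-up tables. The names left and right and their values are hypothetical and are not part of the air quality data.

```python
import pandas as pd

# Two small, made-up tables (hypothetical data, not the air quality files)
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# axis=0 stacks rows: 2 rows + 2 rows -> 4 rows, same 2 columns
stacked = pd.concat([left, right], axis=0)

# axis=1 places the tables side by side: same 2 rows, 2 + 2 columns
side_by_side = pd.concat([left, right], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

With axis=1, rows are aligned on their index labels, so this side-by-side result only lines up cleanly because both toy tables share the labels 0 and 1.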
Let’s work with a few air quality datasets.
air_quality_no2 provides NO2 values for three measurement stations.
air_quality_pm25 provides PM25 values (particulate matter less than 2.5 micrometers) for the same three measurement stations.
air_quality_stations provides latitude and longitude coordinates for five different measurement stations.
air_quality_parameters provides the full description and name for five different element types.
import pandas as pd # import statement
# load data
no2 = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_no2_long.csv", parse_dates=True)
pm25 = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_pm25_long.csv", parse_dates=True)
stations = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_stations.csv")
param = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_parameters.csv")
Example #1#
Let’s say we want to combine the NO2 and PM25 measurements in a single table. Since the two original tables have a similar structure, we can concatenate them along one of the axes.
combined = pd.concat([pm25, no2], axis=0) # concatenate
combined # show output
We can use .shape and some quick arithmetic to verify the operation worked correctly.
1110 + 2068 = 3178
pm25.shape # pm25 df shape
no2.shape # no2 df shape
combined.shape # combined df shape
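The same check can be written as a comparison rather than done by hand. The sketch below uses small hypothetical stand-ins (pm25_demo and no2_demo, with made-up values) so it runs without downloading the data; the same comparison should carry over to pm25, no2, and combined.

```python
import pandas as pd

# Hypothetical stand-ins for the pm25 and no2 tables (made-up values)
pm25_demo = pd.DataFrame({"parameter": ["pm25"] * 3, "value": [18.0, 16.0, 17.5]})
no2_demo = pd.DataFrame({"parameter": ["no2"] * 2, "value": [20.0, 21.8]})

combined_demo = pd.concat([pm25_demo, no2_demo], axis=0)

# Row counts add up; the column count is unchanged
rows_match = combined_demo.shape[0] == pm25_demo.shape[0] + no2_demo.shape[0]
cols_match = combined_demo.shape[1] == pm25_demo.shape[1]
print(rows_match, cols_match)  # True True

# The original row labels are kept, so labels 0 and 1 appear twice
print(combined_demo.index.tolist())  # [0, 1, 2, 0, 1]
```

Because the original row labels are preserved, the combined index contains repeats. If that is not wanted, passing ignore_index=True to pd.concat() renumbers the rows from 0.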
Example #2#
In the first example, the combined data shows observations for both NO2 and PM25.
In a situation where we don’t have something like the parameter column, adding an additional row index can help identify the data source.
combined = pd.concat([pm25, no2], keys=["PM25", "NO2"]) # concatenate and add a key
combined # show output
By passing a keys argument to the .concat() function, we create a hierarchical index, also known as a MultiIndex.
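As a rough illustration of what the hierarchical index makes possible, the sketch below builds a keyed table from two small hypothetical DataFrames (pm25_demo and no2_demo, with made-up values) and then pulls out the rows that came from one source.

```python
import pandas as pd

# Hypothetical stand-ins for the pm25 and no2 tables (made-up values)
pm25_demo = pd.DataFrame({"value": [18.0, 16.0, 17.5]})
no2_demo = pd.DataFrame({"value": [20.0, 21.8]})

# keys= adds an outer index level that labels each row's source table
keyed = pd.concat([pm25_demo, no2_demo], keys=["PM25", "NO2"])

print(isinstance(keyed.index, pd.MultiIndex))  # True
print(keyed.index.nlevels)                     # 2

# Select only the rows that came from the PM25 table via the outer level
pm25_rows = keyed.loc["PM25"]
print(pm25_rows.shape)  # (3, 1)
```

The outer level holds the key ("PM25" or "NO2") and the inner level holds each table's original row labels, so the source of every row stays recoverable even without a parameter column.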
Additional Resources#
Consult the pandas documentation on object concatenation for more on this function.