.concat()
While .merge() connects DataFrames based on one or more key fields, .concat() concatenates or “stacks” objects together along an axis.
# general syntax
combined = pd.concat([df1, df2], axis=0)
Parameters:
[]: the DataFrame variable names, as a list
axis=: the axis to concatenate along
The default for .concat() is axis 0, so the resulting table stacks the rows of the input tables. Remember that in a pandas DataFrame, axis 0 is vertical (rows) and axis 1 is horizontal (columns).
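To make the axis distinction concrete, here is a minimal sketch on small made-up tables. The names top, bottom, and extra and their values are hypothetical and are not part of the air quality data used below.

```python
import pandas as pd

# Hypothetical toy tables (not the air quality datasets)
top = pd.DataFrame({"station": ["A", "B"], "value": [10.0, 12.5]})
bottom = pd.DataFrame({"station": ["C"], "value": [9.1]})
extra = pd.DataFrame({"unit": ["ug/m3", "ug/m3"]})

# axis=0 (the default): rows of bottom are stacked below the rows of top
stacked = pd.concat([top, bottom], axis=0)
print(stacked.shape)  # (3, 2)

# axis=1: columns are placed side by side, aligned on the row index
side_by_side = pd.concat([top, extra], axis=1)
print(side_by_side.shape)  # (2, 3)
```

Note that stacking along axis 0 keeps each input's original row labels, so the index of stacked here is 0, 1, 0 rather than 0, 1, 2.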
Let’s work with a few air quality datasets.
air_quality_no2 provides NO2 values for three measurement stations.
air_quality_pm25 provides PM25 values (particulate matter less than 2.5 micrometers) for the same three measurement stations.
air_quality_stations provides latitude and longitude coordinates for five different measurement stations.
air_quality_parameters provides the full description and name of each parameter for five different element types.
import pandas as pd # import statement
# load data
no2 = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_no2_long.csv", parse_dates=True)
pm25 = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_pm25_long.csv", parse_dates=True)
stations = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_stations.csv")
param = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch5/air_quality_parameters.csv")
Example #1
Let’s say we want to combine the NO2 and PM25 measurements in a single table. Since the two original tables have a similar structure, we can concatenate them along one of the axes.
combined = pd.concat([pm25, no2], axis=0) # concatenate
combined # show output
We can use .shape and some quick arithmetic to verify that the operation worked correctly. The row counts of the two input tables should add up to the row count of the combined table: 1110 + 2068 = 3178.
pm25.shape # pm25 df shape
no2.shape # no2 df shape
combined.shape # combined df shape
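The same check can be written as an assertion instead of being done by eye. The sketch below uses small hypothetical stand-ins (pm25_toy, no2_toy) rather than the downloaded tables, and it assumes both inputs share the same columns.

```python
import pandas as pd

# Hypothetical stand-ins for the pm25 and no2 tables
pm25_toy = pd.DataFrame({"parameter": ["pm25"] * 3, "value": [18.0, 6.5, 18.5]})
no2_toy = pd.DataFrame({"parameter": ["no2"] * 2, "value": [20.0, 21.8]})

combined_toy = pd.concat([pm25_toy, no2_toy], axis=0)

# Row counts add up; the column count is unchanged because
# both inputs have identical column names
expected_rows = pm25_toy.shape[0] + no2_toy.shape[0]
assert combined_toy.shape == (expected_rows, pm25_toy.shape[1])
print(combined_toy.shape)  # (5, 2)
```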
Example #2
In the first example, the combined data shows observations for both NO2 and PM25.
In a situation where we don’t have something like the parameter column, adding an additional row index can help identify the data source.
combined = pd.concat([pm25, no2], keys=["PM25", "NO2"]) # concatenate and add a key
combined # show output
By giving a keys argument to the .concat() function, we create a hierarchical index, or MultiIndex.
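One practical use of the keys labels is pulling a single source back out of the combined table. The following is a minimal sketch on hypothetical toy tables (pm25_toy, no2_toy) rather than the air quality data.

```python
import pandas as pd

# Hypothetical toy tables with no column identifying the source
pm25_toy = pd.DataFrame({"value": [18.0, 6.5]})
no2_toy = pd.DataFrame({"value": [20.0, 21.8, 26.5]})

# keys adds an outer index level that labels each input table
labeled = pd.concat([pm25_toy, no2_toy], keys=["PM25", "NO2"])
print(labeled.index.nlevels)  # 2

# Select every row that came from the NO2 table via the outer level
no2_rows = labeled.loc["NO2"]
print(no2_rows.shape)  # (3, 1)
```

The outer level holds the key ("PM25" or "NO2") and the inner level holds each table's original row labels.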
Additional Resources
Consult the pandas documentation on object concatenation for more on this function.