Merging & Joining
The SQL queries and joins lab covered how we can use joins in a relational database system to create new data structures. pandas has somewhat similar functionality that allows you to merge and combine data from multiple tables.
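To make the SQL parallel concrete, here is a minimal sketch using pandas.merge. The customers and orders tables, their column names, and their values are hypothetical, invented only for illustration; the SQL in the comments is an approximate equivalent, not an exact translation.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ada", "Grace", "Linus"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 3],
    "amount": [25.0, 40.0, 15.5],
})

# Roughly: SELECT * FROM customers INNER JOIN orders USING (customer_id)
# Keeps only customers that have at least one matching order.
inner = pd.merge(customers, orders, on="customer_id", how="inner")

# Roughly: SELECT * FROM customers LEFT JOIN orders USING (customer_id)
# Keeps every customer; order columns are NaN where no order matches.
left = pd.merge(customers, orders, on="customer_id", how="left")

print(inner)
print(left)
```

The how argument plays the role of the join type in SQL; besides "inner" and "left" it also accepts "right" and "outer", among others.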
As with the reshaping functions, pandas includes functionality geared toward combining or connecting different datasets.
A digest of these operations, courtesy of the pandas documentation:
pandas.concat: Merge multiple Series or DataFrame objects along a shared index or column
DataFrame.join: Merge multiple DataFrame objects along the columns
DataFrame.combine_first: Update missing values with non-missing values in the same location
pandas.merge: Combine two Series or DataFrame objects with SQL-style joining
pandas.merge_ordered: Combine two Series or DataFrame objects along an ordered axis
pandas.merge_asof: Combine two Series or DataFrame objects by near instead of exact matching keys
Series.compare and DataFrame.compare: Show differences in values between two Series or DataFrame objects
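As a quick orientation, the sketch below exercises several of the operations in the digest above on tiny frames. All of the data (cities, sales figures, population numbers, trades, and quotes) is hypothetical and exists only to show the shape of each call; it is not drawn from any dataset used in this material.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
q1 = pd.DataFrame({"city": ["Oslo", "Lima"], "sales": [100, 80]})
q2 = pd.DataFrame({"city": ["Oslo", "Lima"], "sales": [120, 90]})

# pandas.concat: stack the rows of both frames; ignore_index renumbers them 0..n-1.
stacked = pd.concat([q1, q2], ignore_index=True)

# DataFrame.join: add columns from another frame by aligning on the index.
population = pd.DataFrame({"population_k": [700, 10000]}, index=["Oslo", "Lima"])
joined = q1.set_index("city").join(population)

# DataFrame.combine_first: fill missing values in one frame from another frame.
primary = pd.DataFrame({"sales": [100.0, None]}, index=["Oslo", "Lima"])
backup = pd.DataFrame({"sales": [95.0, 80.0]}, index=["Oslo", "Lima"])
filled = primary.combine_first(backup)

# pandas.merge_asof: match each left row to the nearest earlier-or-equal key
# on the right (the default "backward" direction). Both frames must be sorted by the key.
trades = pd.DataFrame({"time": [2, 5, 9], "qty": [10, 20, 30]})
quotes = pd.DataFrame({"time": [1, 4, 8], "price": [1.0, 1.5, 2.0]})
nearest = pd.merge_asof(trades, quotes, on="time")

# DataFrame.compare: keep only the cells that differ between two identically labeled frames.
diff = q1.compare(q2)

for label, frame in [("concat", stacked), ("join", joined),
                     ("combine_first", filled), ("merge_asof", nearest),
                     ("compare", diff)]:
    print(f"--- {label} ---")
    print(frame)
```

Note that concat, join, and combine_first align on index or column labels, while merge and merge_asof match on key values, which is what makes them the closer analogue to SQL joins.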
We’ll cover a couple of these workflows in greater depth.