Merging & Joining
The SQL queries and joins lab covered how joins in a relational database system combine tables into new data structures. pandas offers similar functionality for merging and combining data from multiple tables.
As with the reshaping functions, pandas includes a set of functions geared toward combining or connecting different datasets.
A digest of these operations, courtesy of the pandas documentation:
pandas.concat: Merge multiple Series or DataFrame objects along a shared index or column
DataFrame.join: Merge multiple DataFrame objects along the columns
DataFrame.combine_first: Update missing values with non-missing values in the same location
pandas.merge: Combine two Series or DataFrame objects with SQL-style joining
pandas.merge_ordered: Combine two Series or DataFrame objects along an ordered axis
pandas.merge_asof: Combine two Series or DataFrame objects by near instead of exact matching keys
Series.compare and DataFrame.compare: Show differences in values between two Series or DataFrame objects
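To make the digest more concrete, here is a minimal sketch of four of the operations listed above. The tables (students, grades, and so on) and their column names are hypothetical toy data invented for illustration; they are not from the lab's datasets.

```python
import pandas as pd

# Hypothetical toy tables, invented for illustration only.
students = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
grades = pd.DataFrame({"student_id": [2, 3, 4], "grade": [88, 92, 75]})

# pandas.merge: SQL-style join on a shared key column.
# how="inner" keeps only keys present in both tables;
# how="left" keeps every row of the left table.
inner = pd.merge(students, grades, on="student_id", how="inner")
left = pd.merge(students, grades, on="student_id", how="left")

# pandas.concat: stack objects along an axis (rows by default).
more_students = pd.DataFrame({"student_id": [4], "name": ["Di"]})
stacked = pd.concat([students, more_students], ignore_index=True)

# DataFrame.join: combine columns by aligning on the index
# (a left join by default).
joined = students.set_index("student_id").join(grades.set_index("student_id"))

# DataFrame.combine_first: fill missing values in one object
# with values from another object at the same location.
primary = pd.DataFrame({"score": [1.0, None, 3.0]})
backup = pd.DataFrame({"score": [10.0, 20.0, 30.0]})
filled = primary.combine_first(backup)

for label, frame in [("inner", inner), ("left", left), ("stacked", stacked),
                     ("joined", joined), ("filled", filled)]:
    print(f"--- {label} ---")
    print(frame)
```

Note that the inner merge drops students 1 and 4 because each appears in only one table, while the left merge keeps student 1 with a missing grade. This mirrors the behavior of INNER JOIN and LEFT JOIN in SQL.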
We’ll cover a couple of these workflows in greater depth.