Visualizing Data in GeoPandas#
!pip install geopandas # install
import geopandas as gpd, pandas as pd # import
Point Data#
First, let’s make a scatterplot with the latitude and longitude data in the parks dataset.
parks = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch12/parks.csv") # load data
parks.plot.scatter(y='Latitude', x='Longitude') # plot data
We can start to see a map taking shape.
# create geodataframe
gdf = gpd.GeoDataFrame(
parks, geometry=gpd.points_from_xy(parks.Longitude, parks.Latitude)
)
We can also plot the geodataframe.
gdf.plot() # plot gdf
Then we can add a basemap layer to create a static 2D map.
geopandas includes a number of built-in basemap layers, but we’ll use the zip code boundary file for St. Joseph County.
gpd.datasets.available # available basemap layers
['naturalearth_lowres', 'naturalearth_cities', 'nybb']
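For illustration, one of these bundled layers can be loaded by path and plotted directly. A minimal sketch (the bundled datasets are deprecated in more recent GeoPandas releases, so this relies on the older version used here):
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres')) # load a bundled world boundary layer
world.plot() # quick plot of the bundled layer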
basemap = gpd.read_file('zip.geojson') # load zip code data
basemap.plot() # plot zip code boundaries
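Before overlaying the two layers, it’s worth confirming they share a coordinate reference system. A minimal sketch, assuming the parks coordinates are WGS84 latitude/longitude and that zip.geojson declares its CRS:
gdf = gdf.set_crs("EPSG:4326", allow_override=True) # assumption: parks coordinates are WGS84 lat/lon
if basemap.crs is not None:
    gdf = gdf.to_crs(basemap.crs) # reproject the points to match the basemap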
Now we can connect these datasets, plotting the parks data on the zip code basemap.
ax = basemap.plot(figsize=(15, 5), linewidth=0.25, edgecolor="white", color="lightgrey") # basemap plot
ax.set_title("City of South Bend Parks") # set plot title
ax.axis('off') # turn off default axes
gdf.plot(markersize=10, column='Park_Name', cmap='viridis', alpha=0.5, ax=ax, legend=True) # plot data
Not ready for primetime, but a map nonetheless!
Polygon Data#
gdf = gpd.read_file("zip.geojson") # load file
gdf = gdf[gdf['City_Town'] == 'South Bend'] # filter cities
df = pd.read_csv("data.csv") # load attribute data
df.columns = df.columns.str.split("!!", n=2, expand=True) # split column headers into a multi-level index on the "!!" separator
df = df.T # transpose
header = df.iloc[0] # isolate first row to be new header
df = df[1:] # subset dataframe (everything past the first row)
df.columns = header # reassign headers
df = df.reset_index() # reset index
df.columns.values[0] = 'area' # rename columns
df.columns.values[1] = 'coverage'
df.columns.values[2] = 'type'
df = pd.melt(df, id_vars=['area', 'coverage', 'type']) # melt variable column
df.columns.values[3] = 'variable'
df = df[['area', 'coverage', 'variable', 'value']] # subset columns
df = df[df['value'].notnull()] # remove rows with NaN in value
df = df.reset_index(drop=True) # reset index
df['area'] = df['area'].str.replace("ZCTA5 ", "") # clean up area column to be able to join on zip code
df['variable'] = df['coverage'] + df['variable'] # concatenate variable columns
df['value'] = pd.to_numeric(df['value'], errors='coerce') # convert data type
df = df.pivot_table(index='area', columns='variable', values='value', aggfunc='sum').reset_index() # reshape data
merged = gdf.merge(df, left_on="ZIP", right_on="area") # merged attribute and geospatial data
# merged.info() # show output
A lot of data wrangling, but a geodataframe we can start to plot (even if there’s more we might want to do with the column names).
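One way to get oriented is to preview the wrangled column names before choosing one to plot. A quick sketch (the 'ZIP' and 'geometry' names come from the boundary file and 'area' from the attribute table; the remaining names depend on the Census table used):
attribute_cols = [c for c in merged.columns if c not in ('ZIP', 'area', 'geometry')] # assumed identifier/geometry columns
print(len(attribute_cols), attribute_cols[:3]) # number of attribute columns plus a short preview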
ax = merged.plot(column=merged.iloc[:, 99], cmap='viridis', legend=True) # choropleth map using the attribute column at position 99
ax.set_title("Number of South Bend Residents With Some College, No Degree") # plot title
ax.axis('off') # turn off default axes
We’d want to verify some of this data, but that’s a map!
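One quick verification step is to check whether any zip codes in the boundary file were dropped by the merge (pandas uses an inner join by default). A minimal sketch:
unmatched = gdf[~gdf['ZIP'].isin(df['area'])] # boundary zip codes with no match in the attribute data
print(unmatched['ZIP'].tolist()) # anything listed here was dropped from the merged geodataframe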
Additional Resources#
We’re only scratching the surface of the data tasks and workflows GeoPandas can facilitate. A good place to start is the GeoPandas User Guide.