Getting Started With GeoPandas
Setup & Environment
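First, we’ll need to install GeoPandas. Installing and configuring GeoPandas is usually easiest in a new, dedicated Python environment, because its geospatial dependencies can conflict with packages you already have installed.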
A few resources that can get folks started:
Anaconda
Tanish Gupta, “Fastest Way to Install Geopandas in Jupyter Notebooks” Analytics Vidhya (6 December 2020)
Anaconda, “conda-forge packages, geopandas” Anaconda documentation
GeoPandas, “Installation” GeoPandas documentation
Google Colab
Abdishakur Hassan, Jupyter notebook on using geopandas in Google Colab, from “Geographic data science tutorials with Python” GitHub repository
Additional GeoPandas resources:
Jonathan Soma, “Mapping with geopandas” from 2017 “Foundations of Computing” course, Columbia Graduate School of Journalism
CoderzColumn, “Plotting Static Maps with geopandas” CoderzColumn (11 March 2020)
GeoPandas, “Plotting with Geoplot and GeoPandas” GeoPandas documentation
# if working in Google Colab
!pip install geopandas
# import statements
import pandas as pd, geopandas as gpd, json, requests
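When possible, load geospatial data (especially polygon data) directly through GeoPandas, for example with gpd.read_file(). The geometry then arrives already attached to each record, which simplifies later steps like mapping and merging.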
What distinguishes a GeoDataFrame from a standard DataFrame? The all-important geometry column, which holds a shape (a point, line, or polygon) for each row.
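A minimal sketch of that difference, using two made-up points (the names, acreages, and coordinates below are hypothetical, not taken from the South Bend data). A GeoDataFrame is still a pandas DataFrame, but it carries an active geometry column of shapely objects plus a coordinate reference system (CRS).

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical attribute data: a plain pandas DataFrame with no geometry
df = pd.DataFrame({"name": ["Park A", "Park B"], "acres": [12.5, 3.0]})

# Same attributes plus a geometry column of shapely Points (x = longitude, y = latitude)
gdf = gpd.GeoDataFrame(
    df,
    geometry=[Point(-86.25, 41.68), Point(-86.24, 41.66)],
    crs="EPSG:4326",
)

print(isinstance(gdf, pd.DataFrame))  # a GeoDataFrame is a DataFrame subclass
print(gdf.geometry.name)              # name of the active geometry column
print(gdf.geom_type.tolist())         # geometry type of each row
```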
For more on data structures in GeoPandas, see GeoPandas, “Data Structures” GeoPandas documentation.
Dataset #1
The first dataset we’ll use in this chapter is the City of South Bend’s parks data (park locations and features).
An API call to bring that data into Python:
import pandas as pd, json, requests # import statements
r = requests.get('https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/Parks_Locations_and_Features/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson') # load page
r.raise_for_status() # stop early if the request failed
d = r.json() # parse the GeoJSON response into a Python dict
data = [] # empty list for attribute data
for i in d['features']: # iterate over list of features
    data.append(i['properties']) # keep each feature's attributes, append to list
df = pd.DataFrame(data) # create dataframe
df.info() # show output
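Because the query URL requests f=geojson, each item in d['features'] is a GeoJSON feature, and GeoJSON features normally carry a geometry alongside their properties; the loop above keeps only the properties. A hedged alternative sketch, assuming the response is a standard GeoJSON FeatureCollection: GeoDataFrame.from_features keeps properties and geometry together in one step. The tiny feature collection below, including the Park_Name and Acres columns, is hypothetical and stands in for d so the example runs offline.

```python
import geopandas as gpd

# Hypothetical stand-in for d = r.json(): a minimal GeoJSON FeatureCollection
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"Park_Name": "Example Park", "Acres": 4.2},
            "geometry": {"type": "Point", "coordinates": [-86.25, 41.68]},
        },
        {
            "type": "Feature",
            "properties": {"Park_Name": "Sample Green", "Acres": 1.1},
            "geometry": {"type": "Point", "coordinates": [-86.23, 41.67]},
        },
    ],
}

# Properties become columns; each feature's geometry becomes the geometry column.
# GeoJSON coordinates are [longitude, latitude] in WGS 84, hence EPSG:4326.
parks = gpd.GeoDataFrame.from_features(sample_geojson["features"], crs="EPSG:4326")

print(parks.columns.tolist())
print(parks.geometry.x.tolist(), parks.geometry.y.tolist())
```

Against the live service, passing the same query URL to gpd.read_file() should produce a similar GeoDataFrame, since read_file accepts URLs; that path is not exercised in this sketch.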
We’ll need latitude and longitude, and those values are currently buried in the Location_1 column. So we’ll start there by splitting that column on the \n (newline) character.
df[['Address', 'City', 'LatLon']] = df['Location_1'].str.split(r'\n', expand=True) # split column
df.head() # show output
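To see what that split produces, here is a small sketch on made-up Location_1 values, assuming (as the step above does) that each entry has three newline-separated parts: a street address, a city line, and a "(lat, lon)" pair. The sample strings are hypothetical.

```python
import pandas as pd

# Hypothetical values shaped like the Location_1 column: address, city line, (lat, lon)
df = pd.DataFrame({
    "Location_1": [
        "123 Example St\nSouth Bend, IN\n(41.68, -86.25)",
        "456 Sample Ave\nSouth Bend, IN\n(41.66, -86.24)",
    ]
})

# A one-character pattern such as "\n" is matched literally;
# expand=True returns one column per piece instead of a list per row
parts = df["Location_1"].str.split("\n", expand=True)
df[["Address", "City", "LatLon"]] = parts

print(parts.shape)
print(df["LatLon"].tolist())
```

The original r'\n' is a two-character pattern, which pandas treats as a regular expression that also matches a newline, so both spellings split the same way.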
We’re closer!
The next step is removing the () characters and splitting the latitude and longitude values into separate columns.
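df['LatLon'] = df['LatLon'].str.replace(r'[()]', '', regex=True) # remove parentheses (regex=True is needed: the default is False in pandas 2.0+)
df[['Latitude', 'Longitude']] = df['LatLon'].str.split(',', expand=True) # split column
df['Latitude'] = pd.to_numeric(df['Latitude'].str.strip()) # convert text to numbers
df['Longitude'] = pd.to_numeric(df['Longitude'].str.strip()) # strip the space left after the comma, then convert
df.head() # show output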
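As an alternative to the replace-then-split steps above, Series.str.extract can pull both numbers out of the "(lat, lon)" string in a single regular-expression pass. This is a swapped-in technique, shown on hypothetical sample values; named capture groups become the column names, and pd.to_numeric turns the captured text into floats for building point geometries later.

```python
import pandas as pd

# Hypothetical LatLon strings in the "(lat, lon)" shape produced by the earlier split
latlon = pd.Series(["(41.68, -86.25)", "(41.66, -86.24)"])

# Named groups: the number before the comma is latitude, the one after is longitude
pattern = r"\(\s*(?P<Latitude>-?\d+(?:\.\d+)?)\s*,\s*(?P<Longitude>-?\d+(?:\.\d+)?)\s*\)"
coords = latlon.str.extract(pattern)

# Captured groups are strings; convert each column to floats
coords = coords.apply(pd.to_numeric)

print(coords.dtypes.tolist())
print(coords.to_dict("list"))
```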
Now we can convert this to a GeoDataFrame. Note that points_from_xy expects x (longitude) first and y (latitude) second.
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude), crs="EPSG:4326") # convert to gdf (x = longitude, y = latitude)
gdf.to_file("parks.json", driver='GeoJSON') # save as GeoJSON
gdf.info() # inspect output
Now we have the all-important geometry column for our first dataset.
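Because points_from_xy takes x before y, swapped coordinates are an easy mistake that still produces valid-looking points. A quick sanity check, sketched on hypothetical coordinates near South Bend: geometry.x should equal the longitude column, and every point should fall inside a rough bounding box. The box used here (latitude 41 to 42, longitude -87 to -86) is an assumption chosen for illustration, not official city limits.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical parsed coordinates, already numeric
df = pd.DataFrame({"Latitude": [41.68, 41.66], "Longitude": [-86.25, -86.24]})

# x = longitude, y = latitude
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]),
    crs="EPSG:4326",
)

# The x coordinate should hold longitude and the y coordinate latitude
assert (gdf.geometry.x == gdf["Longitude"]).all()
assert (gdf.geometry.y == gdf["Latitude"]).all()

# Assumed, rough bounding box around South Bend, Indiana (illustrative only)
minx, miny, maxx, maxy = gdf.total_bounds
print(-87 < minx and maxx < -86 and 41 < miny and maxy < 42)
```

With latitude and longitude swapped, the x values would land near 41 instead of -86, and the bounding-box check would print False.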
Polygon Data
For our second dataset, let’s work with the St. Joseph County zip code boundary file.
Note: I’ve renamed this file zip.geojson.
gdf = gpd.read_file("zip.geojson") # load file
gdf.head() # show geo dataframe head
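Once the boundary file is loaded, it is worth checking the geometry type, the coordinate reference system, and the key column before joining anything to it. A minimal sketch with two hypothetical square polygons standing in for zip.geojson (the ZIP values and shapes are made up). Areas computed in EPSG:4326 would be in square degrees, so the sketch projects first; EPSG:32616 (UTM zone 16N, which covers northern Indiana) is an assumed choice of projected CRS.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for gpd.read_file("zip.geojson"): two square polygons keyed by ZIP
zips = gpd.GeoDataFrame(
    {"ZIP": ["46601", "46615"]},
    geometry=[box(-86.26, 41.66, -86.24, 41.68), box(-86.30, 41.66, -86.26, 41.70)],
    crs="EPSG:4326",
)

print(zips.geom_type.unique().tolist())  # geometry types present
print(zips.crs)                          # coordinate reference system
print(zips["ZIP"].dtype)                 # key dtype matters for the merge later

# Project to a metric CRS (assumed: UTM zone 16N) before measuring area
areas_km2 = zips.to_crs(epsg=32616).area / 1e6
print(areas_km2.round(2).tolist())
```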
Let’s connect those zip code polygons with educational outcome attribute data from the American Community Survey (ACS).
Note: I’m working with the 2022 ACS 5-year estimates, which I’ve renamed data.csv.
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch12/data.csv") # load attribute data
df.info() # show output
Now we can use GeoPandas to merge these datasets, matching the ZIP column of the boundary file to the area column of the ACS data.
merged = gdf.merge(df, left_on="ZIP", right_on="area") # merged attribute and geospatial data
merged # show merged geodataframe
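Two details are worth checking around this merge, sketched below with hypothetical data. First, calling merge on the GeoDataFrame (rather than on the plain DataFrame) keeps the result a GeoDataFrame with its geometry and CRS. Second, the key columns must share a dtype: if ZIP is stored as text and area as integers, recent pandas versions raise a ValueError instead of matching, so the sketch casts area to zero-padded strings first. Only the ZIP and area column names come from the merge above; the values and the pct_bachelors column are made up.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Hypothetical polygons keyed by ZIP stored as text
zips = gpd.GeoDataFrame(
    {"ZIP": ["46601", "46615"]},
    geometry=[box(-86.26, 41.66, -86.24, 41.68), box(-86.30, 41.66, -86.26, 41.70)],
    crs="EPSG:4326",
)

# Hypothetical ACS-style attributes keyed by area, read as integers
acs = pd.DataFrame({"area": [46601, 46615], "pct_bachelors": [18.0, 35.5]})

# Align key dtypes: integers -> 5-character, zero-padded strings
acs["area"] = acs["area"].astype(str).str.zfill(5)

# Merge from the GeoDataFrame side so the result keeps geometry and CRS;
# validate="one_to_one" raises if either key column has duplicates
merged = zips.merge(acs, left_on="ZIP", right_on="area", how="left", validate="one_to_one")

print(type(merged).__name__)
print(merged[["ZIP", "pct_bachelors"]].to_dict("list"))
```

The how="left" choice keeps every polygon even when no attribute row matches (those rows get NaN), which makes gaps visible on a map; the merge above uses the default inner join, which silently drops unmatched ZIP codes.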