Getting Started With GeoPandas

Setup & Environment

First, we’ll need to install GeoPandas. Because GeoPandas depends on several other geospatial libraries, installing it in a new Python environment is usually the easiest way to avoid dependency conflicts.

# if working in Google Colab
!pip install geopandas
# import statements
import pandas as pd, geopandas as gpd, json, requests

When possible, loading geospatial data (especially polygon data) through GeoPandas will simplify other workflows.

What distinguishes a GeoDataFrame from a standard DataFrame? The all-important geometry column, which stores the shape (point, line, or polygon) associated with each row.
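
To make that distinction concrete, here is a minimal sketch that builds a GeoDataFrame from a plain DataFrame. The names `sample_df` and `sample_gdf`, the park names, and the coordinates are hypothetical values made up for illustration; they are not part of the datasets used in this chapter.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample records (illustrative values only)
sample_df = pd.DataFrame({"name": ["Park A", "Park B"]})

# One shapely geometry per row; Point takes (x, y), i.e. (longitude, latitude)
points = [Point(-86.25, 41.68), Point(-86.21, 41.70)]

# Attaching geometries (and a coordinate reference system) yields a GeoDataFrame
sample_gdf = gpd.GeoDataFrame(sample_df, geometry=points, crs="EPSG:4326")

print(type(sample_gdf).__name__)           # class of the new object
print(sample_gdf.geometry.name)            # name of the active geometry column
print(type(sample_gdf.geometry).__name__)  # the geometry column is a GeoSeries
print(sample_gdf.crs)                      # coordinate reference system
```

A GeoDataFrame is still a pandas DataFrame underneath, so familiar methods such as `head()` and `info()` keep working.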

Dataset #1

The first dataset we’ll use in this chapter is data about City of South Bend parks.

An API call to bring that data into Python:

import pandas as pd, json, requests # import statements
r = requests.get('https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/Parks_Locations_and_Features/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson') # load page
d = r.json() # store as json object

data = [] # empty list for data

for i in d['features']: # iterate over list
  data.append(i['properties']) # isolate value, append to list

df = pd.DataFrame(data) # create dataframe
df.info() # show output
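
To see what the loop above is doing without making a network request, here is a small sketch that runs the same flattening step on a hand-written GeoJSON-style dictionary. The name `sample_response` and the field names `Park_Name` and `Acres` are hypothetical; the real service may use different fields.

```python
import pandas as pd

# Hypothetical, hand-written stand-in for the parsed API response (not real park data)
sample_response = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"Park_Name": "Example Park", "Acres": 12.5}, "geometry": None},
        {"type": "Feature", "properties": {"Park_Name": "Sample Green", "Acres": 3.0}, "geometry": None},
    ],
}

# Keep only the attribute dictionary from each feature
rows = [feature["properties"] for feature in sample_response["features"]]

# A list of dictionaries becomes one row per feature, one column per key
sample_parks = pd.DataFrame(rows)
print(sample_parks)
```

Each feature in a GeoJSON FeatureCollection carries its attributes under the `properties` key, which is why the loop isolates that value before building the DataFrame.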

We’ll need latitude and longitude, and those values are currently buried in the Location_1 column. So we’ll start there by splitting out that column on the \n character.

df[['Address', 'City', 'LatLon']] = df['Location_1'].str.split(r'\n', expand=True) # split column
df.head() # show output

We’re closer!

The next step is removing the () characters and then breaking out the latitude and longitude values into their own columns.

df['LatLon'] = df['LatLon'].str.replace('[()]', '', regex=True) # remove parentheses (regex=True so [()] is read as a character class)
df[['Latitude', 'Longitude']] = df['LatLon'].str.split(',', expand=True) # split column
df.head() # show output
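
The parsing steps can be tried end to end on a couple of made-up rows. This is a sketch that assumes the Location_1 values follow an "address, newline, city, newline, (latitude, longitude)" layout; the `sample` DataFrame and its addresses and coordinates are hypothetical. It also adds an explicit numeric conversion as a final step, which is an addition to the workflow shown here.

```python
import pandas as pd

# Hypothetical rows mimicking the assumed "address\ncity\n(lat, lon)" layout
sample = pd.DataFrame({
    "Location_1": [
        "100 Example St\nSouth Bend, IN\n(41.6764, -86.2520)",
        "200 Sample Ave\nSouth Bend, IN\n(41.7000, -86.2100)",
    ]
})

# Step 1: split on the newline character into three columns
sample[["Address", "City", "LatLon"]] = sample["Location_1"].str.split("\n", expand=True)

# Step 2: strip the parentheses (regex=True treats '[()]' as a character class)
sample["LatLon"] = sample["LatLon"].str.replace("[()]", "", regex=True)

# Step 3: split on the comma, trim whitespace, and convert the text to numbers
parts = sample["LatLon"].str.split(",", expand=True)
sample["Latitude"] = pd.to_numeric(parts[0].str.strip())
sample["Longitude"] = pd.to_numeric(parts[1].str.strip())

print(sample[["Address", "City", "Latitude", "Longitude"]])
```

Converting to numbers early makes it easier to spot malformed values before any geometry is built.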

Now we can convert this to a GeoDataFrame.

gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude), crs="EPSG:4326") # convert to gdf (x = longitude, y = latitude)
gdf.to_file("parks.json", driver='GeoJSON')
gdf.info() # inspect output

Now we have the all-important geometry column for our first dataset.
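
Because `points_from_xy` takes x first and y second, it is easy to pass latitude and longitude in the wrong order. The sketch below shows one possible sanity check on hypothetical coordinates near South Bend; the names `pts` and `check` are made up for illustration.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical coordinates near South Bend, Indiana (illustrative only)
pts = pd.DataFrame({"Latitude": [41.68, 41.70], "Longitude": [-86.25, -86.21]})

# x is longitude (east-west), y is latitude (north-south)
check = gpd.GeoDataFrame(
    pts,
    geometry=gpd.points_from_xy(pts["Longitude"], pts["Latitude"]),
    crs="EPSG:4326",
)

# total_bounds returns (minx, miny, maxx, maxy)
minx, miny, maxx, maxy = check.total_bounds

# South Bend sits in the western and northern hemispheres,
# so every x should be negative and every y positive
print(maxx < 0 and miny > 0)
```

If the check prints False for data that should fall in this region, swapped coordinates are a likely cause.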

Polygon Data

For our second dataset, let’s work with the St. Joseph County zip code boundary file.

gdf = gpd.read_file("zip.geojson") # load file
gdf.head() # show geo dataframe head

Let’s connect those polygons with educational outcome attribute data from the American Community Survey.

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch12/data.csv") # load attribute data
df.info() # show output

Now we can use GeoPandas to merge these datasets. Calling merge on the GeoDataFrame (rather than on the DataFrame) keeps the result a GeoDataFrame.

merged = gdf.merge(df, left_on="ZIP", right_on="area") # merged attribute and geospatial data
merged # show merged geodataframe
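
Two things commonly go wrong in a merge like this: the key columns have different data types, or some rows have no match and are silently dropped by the default inner join. The sketch below illustrates one way to check for both with plain pandas. The `zips` and `acs` tables, their values, and the `pct_bachelors` column are hypothetical stand-ins, and whether the real ZIP and area columns differ in type is an assumption worth verifying with `info()`.

```python
import pandas as pd

# Hypothetical stand-ins: boundary attributes with string ZIP codes,
# and survey attributes with integer codes in an "area" column
zips = pd.DataFrame({"ZIP": ["46601", "46614", "46617"]})
acs = pd.DataFrame({"area": [46601, 46614, 46628], "pct_bachelors": [20.0, 30.0, 25.0]})

# Align the key dtypes first so string keys are compared with string keys
acs["area"] = acs["area"].astype(str)

# how="left" keeps every boundary row; indicator=True adds a _merge column
checked = zips.merge(acs, left_on="ZIP", right_on="area", how="left", indicator=True)
print(checked[["ZIP", "pct_bachelors", "_merge"]])

# Rows flagged "left_only" found no matching attribute record
unmatched = checked.loc[checked["_merge"] == "left_only", "ZIP"].tolist()
print(unmatched)
```

The same `how` and `indicator` arguments can be passed when merging a GeoDataFrame, since its merge method follows the pandas interface.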