Getting Started With GeoPandas

Setup & Environment

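First, we’ll need to install GeoPandas. GeoPandas depends on several geospatial libraries with compiled components (such as Shapely, PyProj, and GDAL). For that reason, installing and configuring it usually goes most smoothly in a new, dedicated Python environment.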

A few resources that can get folks started:

Anaconda
- Tanish Gupta, “Fastest Way to Install Geopandas in Jupyter Notebooks” Analytics Vidhya (6 December 2020)
- Anaconda, “conda-forge packages, geopandas” Anaconda documentation
- GeoPandas, “Installation” GeoPandas documentation
Google Colab
- Abdishakur Hassan, Jupyter notebook on using geopandas in Google Colab, from “Geographic data science tutorials with Python” GitHub repository
  - Google Colab
  - GitHub

Additional GeoPandas resources:

- Jonathan Soma, “Mapping with geopandas” from 2017 “Foundations of Computing” course, Columbia Graduate School of Journalism
- CoderzColumn, “Plotting Static Maps with geopandas” CoderzColumn (11 March 2020)
- GeoPandas, “Plotting with Geoplot and GeoPandas” GeoPandas documentation

# if working in Google Colab
!pip install geopandas

# import statements
import pandas as pd, geopandas as gpd, json, requests
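A quick way to confirm the installation worked is to import GeoPandas and check its version. The minimal sketch below does that. `gpd.show_versions()` goes further and prints the versions of Python and of the libraries GeoPandas depends on, which is handy when debugging an environment.

```python
import geopandas as gpd

# Print the installed GeoPandas version string
print(gpd.__version__)

# Print Python, GeoPandas, and dependency versions (Shapely, PyProj, etc.)
gpd.show_versions()
```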

When possible, loading geospatial data (especially polygon data) through GeoPandas will simplify other workflows.

What distinguishes a GeoDataFrame from a standard DataFrame? The all-important geometry column, which stores a shape (such as a point, line, or polygon) for each row.
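To see that difference concretely, the sketch below first builds a plain DataFrame in which coordinates are just two numeric columns. It then builds a GeoDataFrame from that DataFrame, adding a geometry column of Shapely `Point` objects. The names and coordinates are made up for illustration.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Plain DataFrame: coordinates are only numbers in two columns (hypothetical values)
df = pd.DataFrame({"name": ["a", "b"], "lon": [-86.25, -86.24], "lat": [41.68, 41.67]})

# GeoDataFrame: same attributes plus a geometry column of Shapely Point objects
gdf = gpd.GeoDataFrame(
    df,
    geometry=[Point(xy) for xy in zip(df["lon"], df["lat"])],
    crs="EPSG:4326",
)

print(type(df).__name__, type(gdf).__name__)  # DataFrame GeoDataFrame
print(isinstance(gdf, pd.DataFrame))          # a GeoDataFrame is a DataFrame subclass
print(gdf.geometry.name)                      # name of the active geometry column
print(gdf.geometry.geom_type.tolist())        # shape type of each row
```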

For more on data structures in GeoPandas, see the “Data Structures” page of the GeoPandas user guide.

Dataset #1

The first dataset we’ll use in this chapter describes parks in the City of South Bend.

An API call to the ArcGIS REST service that hosts this data brings it into Python:

import pandas as pd, requests # import statements
r = requests.get('https://services1.arcgis.com/0n2NelSAfR7gTkr1/arcgis/rest/services/Parks_Locations_and_Features/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson') # load page
r.raise_for_status() # stop with an error if the request failed
d = r.json() # store as json object

data = [] # empty list for data

for i in d['features']: # iterate over list
  data.append(i['properties']) # isolate value, append to list

df = pd.DataFrame(data) # create dataframe
df.info() # show output
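Because the query URL asks for `f=geojson`, each item in `d['features']` should be a GeoJSON Feature. Those features may also carry a `geometry` member alongside `properties`, though that is an assumption worth checking on the live response. If they do, GeoPandas can build a GeoDataFrame from them directly with `GeoDataFrame.from_features`. The sketch below uses a small, made-up stand-in for `d` so it runs offline. The `Park_Name` property and the coordinates are hypothetical.

```python
import geopandas as gpd

# Hypothetical stand-in for d = r.json(); real features carry many more properties
d = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"Park_Name": "Example Park"},
            "geometry": {"type": "Point", "coordinates": [-86.25, 41.68]},
        },
        {
            "type": "Feature",
            "properties": {"Park_Name": "Sample Park"},
            "geometry": {"type": "Point", "coordinates": [-86.24, 41.67]},
        },
    ],
}

# Properties become ordinary columns; each GeoJSON geometry becomes a Shapely object.
# GeoJSON coordinates are ordered [longitude, latitude] in WGS84 (EPSG:4326).
parks = gpd.GeoDataFrame.from_features(d["features"], crs="EPSG:4326")
print(parks)
```

If the real features turn out to have no usable geometry, the string-splitting approach that follows is the way to recover coordinates.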

We’ll need latitude and longitude values, and those are currently buried in the Location_1 column. So we’ll start by splitting that column on the newline (\n) character.

df[['Address', 'City', 'LatLon']] = df['Location_1'].str.split(r'\n', expand=True) # split column
df.head() # show output
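To see what `str.split(..., expand=True)` does, here is a sketch on two made-up Location_1 strings. The layout of address, city, then "(lat, lon)" on separate lines is an assumption inferred from the column names used above, not a copy of the real data.

```python
import pandas as pd

# Hypothetical Location_1 values in the assumed "address / city / (lat, lon)" layout
sample = pd.DataFrame({
    "Location_1": [
        "100 Example St\nSouth Bend, IN\n(41.6800, -86.2500)",
        "200 Sample Ave\nSouth Bend, IN\n(41.6700, -86.2400)",
    ]
})

# Split each string on the newline; expand=True returns one column per piece
parts = sample["Location_1"].str.split("\n", expand=True)
parts.columns = ["Address", "City", "LatLon"]
print(parts)
```

If a row has fewer lines than expected, its missing pieces come back as None. Running `df['LatLon'].isna().sum()` on the real data is a quick check.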

We’re closer!

The next step is removing the () characters and then breaking out the latitude and longitude values.

df['LatLon'] = df['LatLon'].str.replace('[()]', '', regex=True) # remove parentheses (regex=True so [()] is a character class)
df[['Latitude', 'Longitude']] = df['LatLon'].str.split(',', expand=True).astype(float) # split column, convert text to numbers
df.head() # show output
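One pandas detail matters here. Since pandas 2.0, `Series.str.replace` treats the pattern as a literal string by default (`regex=False`), so '[()]' only strips parentheses when `regex=True` is passed. The sketch below shows the difference on hypothetical values. It also converts the split pieces from text to floats.

```python
import pandas as pd

# Hypothetical "(lat, lon)" strings like the ones in the LatLon column
latlon = pd.Series(["(41.6800, -86.2500)", "(41.6700, -86.2400)"])

# regex=False (the pandas 2.x default): '[()]' is matched literally, so nothing is removed
unchanged = latlon.str.replace("[()]", "", regex=False)

# regex=True: '[()]' is a character class matching '(' or ')'
cleaned = latlon.str.replace("[()]", "", regex=True)

# Split on the comma, strip stray spaces, and convert the text to floats
coords = cleaned.str.split(",", expand=True).apply(lambda col: col.str.strip()).astype(float)
coords.columns = ["Latitude", "Longitude"]
print(coords)
```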

Now we can convert this to a GeoDataFrame.

gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude), crs="EPSG:4326") # convert to gdf; x = longitude, y = latitude
gdf.to_file("parks.json", driver='GeoJSON') # save as a GeoJSON file
gdf.info() # inspect output

Now we have the all-important geometry column for our first dataset.
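A common pitfall with `gpd.points_from_xy` is argument order. It takes x first and y second, and in EPSG:4326 x is longitude and y is latitude. The sketch below uses one hypothetical coordinate pair near South Bend to show what each order produces.

```python
import geopandas as gpd

# Hypothetical coordinate pair near South Bend, Indiana
lat, lon = 41.68, -86.25

# Correct order: x = longitude, y = latitude
pts = gpd.points_from_xy([lon], [lat], crs="EPSG:4326")
print(pts[0])

# Swapped order: latitude becomes x and longitude becomes y,
# which puts the point at y = -86.25, far into Antarctica
swapped = gpd.points_from_xy([lat], [lon], crs="EPSG:4326")
print(swapped[0])
```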

Polygon Data

For our second dataset, let’s work with the St. Joseph County zip code boundary file.

Link to download
- Note: I’ve renamed this file zip.geojson.

gdf = gpd.read_file("zip.geojson") # load file
gdf.head() # show geo dataframe head

Let’s connect those zip code polygons with educational outcome attribute data from the American Community Survey (ACS).

Download file from GitHub
- Note: I’m working with the 2022 ACS 5-year estimates, which I’ve renamed data.csv.

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/kwaldenphd/elements-of-computing/main/book/data/ch12/data.csv") # load attribute data
df.info() # show output

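Now we can use GeoPandas to merge these datasets. Because we call merge on the GeoDataFrame (gdf), the result keeps the geometry column and is itself a GeoDataFrame.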

merged = gdf.merge(df, left_on="ZIP", right_on="area") # merged attribute and geospatial data
merged # show merged geodataframe
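Merges like this fail when the key columns have different dtypes. For example, ZIP might be read as text from the GeoJSON file while area is read as integers from the CSV, and pandas then raises a ValueError. Whether that happens with these particular files is an assumption worth checking with gdf.dtypes and df.dtypes. The sketch below uses made-up stand-ins for both tables: the ZIP codes, the square polygons, and the `value` column are all illustrative. It aligns the keys as strings before merging.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box

# Hypothetical zip polygons with ZIP stored as text
zips = gpd.GeoDataFrame(
    {"ZIP": ["46601", "46617"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Hypothetical ACS table with area stored as integers, as read_csv would infer
acs = pd.DataFrame({"area": [46601, 46617], "value": [0.5, 0.7]})

# Align key dtypes first; merging str keys with int keys raises a ValueError
acs["area"] = acs["area"].astype(str)

# Calling merge on the GeoDataFrame keeps the result a GeoDataFrame
merged = zips.merge(acs, left_on="ZIP", right_on="area")
print(type(merged).__name__)
print(merged)
```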
