More Advanced Modeling Workflows
A deep dive into statistical analysis and modeling techniques is outside the scope of this course. But for folks who have encountered those methods in other classes, here are some Python resources to get started.
geopandas
We’ve previously encountered geopandas, which includes robust support for analyzing geospatial data. geopandas incorporates shapely for its geometry operation workflows.
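As a rough illustration of how geopandas hands geometry operations to shapely, here is a minimal sketch. The square, the three points, and the column names are made up for this example and do not come from any course dataset.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical data: a unit square and three made-up points.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
points = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(2, 2), Point(0.25, 0.75)],
)

# GeoSeries methods such as within() and buffer() apply the
# corresponding shapely operation to every geometry in the column.
points["in_square"] = points.geometry.within(square)
points["buffer_area"] = points.geometry.buffer(0.1).area

print(points[["name", "in_square", "buffer_area"]])
```

No coordinate reference system is set here, so distances and areas are in the same arbitrary units as the coordinates. Real geospatial data would normally carry a CRS.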
spaCy and nltk
We’ve already seen some of the building blocks for computational text analysis and natural language processing. There is a whole field of statistical analysis that focuses on working with text data.
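One of the simplest building blocks is a word frequency count. The sketch below uses nltk's regex-based tokenizer and FreqDist, which should not require downloading any extra nltk data; the sample sentence is invented for illustration.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text, not taken from any course corpus.
text = "The cat sat on the mat. The dog sat too."

# Split into word and punctuation tokens, keep alphabetic tokens only,
# and lowercase them so "The" and "the" are counted together.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# FreqDist behaves like a counter keyed by token.
freq = FreqDist(tokens)
print(freq.most_common(2))
```

spaCy offers comparable tokenization through its own pipeline objects; this example sticks to nltk only to keep the sketch short.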
Documentation resources:
Tutorials that can get folks started:
statsmodels
statsmodels is a Python module that supports a variety of regression and linear models, time series analysis, survival and duration analysis, and multivariate statistics.
scikit-learn
scikit-learn is a machine learning Python library that supports a variety of classification algorithms (nearest neighbors, random forest, etc.), regression models (nearest neighbors, random forest, etc.), and clustering algorithms (k-Means, spectral, mean-shift, etc.). scikit-learn builds on matplotlib (alongside NumPy and SciPy) and has robust support for plotting and visualization.
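A typical classification workflow can be sketched as follows, using a nearest neighbors classifier on the iris dataset that ships with scikit-learn. The choice of dataset, the 25% test split, and `n_neighbors=5` are assumptions made for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the bundled iris data as a feature matrix X and label vector y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a k-nearest-neighbors classifier and score it on the held-out rows.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share this `fit` / `predict` interface, so swapping in a different classifier such as `RandomForestClassifier` usually means changing only the line that constructs `clf`.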
Additional Machine Learning Resources
For a more holistic introduction to machine learning concepts and workflows in Python: Prof. Walden’s “Getting Started With Machine Learning in Python” lab