More Advanced Modeling Workflows#

A deep dive into statistical analysis and modeling techniques is outside the scope of this course. But for folks who have encountered those methods in other classes, here are some Python resources to get started.

geopandas#

We’ve encountered geopandas previously, which includes robust support for analyzing geospatial data.

geopandas builds on shapely for its geometry operations, so shapely workflows carry over to geopandas objects.
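To make the geopandas and shapely connection concrete, here is a minimal sketch. The point names, coordinates, and the unit square are made-up toy data (not from the course materials), and no coordinate reference system is set, so distances are in arbitrary planar units.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical toy data: three named points in an arbitrary planar coordinate system
points = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(3, 4), Point(1.5, 0.5)],
)

# A plain shapely polygon: the unit square
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

# shapely-style predicates and measurements are applied to every row at once
points["in_square"] = points.geometry.within(square)
points["dist_to_origin"] = points.geometry.distance(Point(0, 0))

print(points)
```

The same pattern (build geometries, then call a predicate or measurement on the geometry column) extends to real data loaded from a file, where a coordinate reference system would normally be set first.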

spaCy and nltk#

We’ve already seen some of the building blocks for computational text analysis and natural language processing. There is a whole field of statistical analysis that focuses on working with text data.
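As a reminder of what those building blocks look like, here is a small word-frequency sketch using nltk. The sample sentence is made up for illustration. It uses `wordpunct_tokenize`, a regex-based tokenizer, on the assumption that we want to avoid downloading extra nltk data. A real project would likely use a more careful tokenizer from nltk or spaCy.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text
text = "The cat sat on the mat. The dog sat on the log."

# Split into word and punctuation tokens, keep alphabetic tokens, and lowercase them
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Count how often each token appears
freq = FreqDist(tokens)

print(freq.most_common(3))
```

Counts like these are the raw input for many statistical approaches to text, such as comparing vocabulary across documents.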

Documentation resources:

Tutorials that can get folks started:

statsmodels#

  • statsmodels is a Python module that supports a variety of regression and linear models, time series analysis, survival and duration analysis, and multivariate statistics.

scikit-learn#

  • scikit-learn is a Python machine learning library that supports a variety of classification algorithms (nearest neighbors, random forest, etc.), regression models (nearest neighbors, random forest, etc.), and clustering algorithms (k-means, spectral, mean-shift, etc.). scikit-learn is built on NumPy, SciPy, and matplotlib and has robust support for plotting and visualization.
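A typical scikit-learn classification workflow can be sketched as follows. This example assumes the small iris dataset bundled with scikit-learn and a random forest classifier. The split size and number of trees are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small example dataset that ships with scikit-learn
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to evaluate the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a random forest classifier on the training rows
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict labels for the held-out rows and measure accuracy
predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share this `fit` / `predict` interface, so swapping in a nearest neighbors classifier or a regression model changes only the import and the line that creates the estimator.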

Additional Machine Learning Resources#

For a more holistic introduction to machine learning concepts and workflows in Python: Prof. Walden’s “Getting Started With Machine Learning in Python” lab