Pandas

Some of you may be wondering why we are talking about pandas in a computing course…
Panda: “a large black-and-white mammal (Ailuropoda melanoleuca) of chiefly central China that feeds primarily on bamboo shoots and is now usually classified with the bears (family Ursidae)” (Merriam-Webster)
Wait, that’s not right…
pandas: “a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language” (pandas documentation)
That makes more sense.
Background
At its core, “pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series” (Wikipedia). The name pandas is derived from “panel data,” an econometrics term used to describe particular types of datasets. The name is also a play on “Python data analysis.”
Using Python’s built-in csv and json modules to work with structured data (especially large amounts of data) can be cumbersome and limiting. Since development began in 2008, pandas has become a mainstay in Python data analysis workflows.
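To make the “cumbersome” point concrete, here is a minimal sketch that computes the average salary per department twice: once with the built-in csv module and once with pandas. The sample data, column names, and variable names are invented for illustration and are not part of the course materials.

```python
import csv
import io

import pandas as pd

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = """name,department,salary
Ada,Engineering,120
Grace,Engineering,110
Alan,Research,100
"""

# Built-in csv module: parse rows, convert types, and accumulate by hand.
totals = {}
counts = {}
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    dept = row["department"]
    totals[dept] = totals.get(dept, 0) + int(row["salary"])  # values arrive as strings
    counts[dept] = counts.get(dept, 0) + 1
csv_means = {dept: totals[dept] / counts[dept] for dept in totals}

# pandas: read_csv infers the numeric column, and groupby handles the aggregation.
df = pd.read_csv(io.StringIO(CSV_TEXT))
pandas_means = df.groupby("department")["salary"].mean().to_dict()

print(csv_means)
print(pandas_means)
```

Both approaches arrive at the same averages. The pandas version simply leaves the type conversion and bookkeeping to the library, which matters more as datasets grow.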
Additional Resources
For more on the history and origins of pandas, check out Wes McKinney’s 2011 paper, “pandas: a Foundational Python Library for Data Analysis and Statistics.”