Structured Data

Specifically, we’re thinking about a couple of key data structures:

  • Tabular data, or data stored as a table consisting of rows and columns

  • Data stored as name-value pairs, specifically in the JavaScript Object Notation (.json) format

[Figures: Linear Array; Associative Array]

We can connect these two types of data structures back to the concepts of linear and associative arrays covered previously in the course. A quick refresher:

  • “In computer science, an array is a data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key…The simplest type of data structure is a linear array, also called one-dimensional array” (Wikipedia, “Array (data structure)”)

  • Associative arrays are “an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection” (Wikipedia, “Associative array”)
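To make the refresher concrete, here is a minimal sketch using Python's built-in list (as a linear array) and dict (as an associative array). The variable names and sample values are made up for illustration.

```python
# A linear (one-dimensional) array: each element is identified by its position.
colors = ["red", "green", "blue"]
second_color = colors[1]  # positions start at 0, so index 1 is "green"

# An associative array: each value is identified by a unique key.
pet = {"name": "Rex", "species": "dog"}
pet["name"] = "Rex II"  # assigning to an existing key replaces its value
pet_name = pet["name"]

print(second_color)
print(pet_name)
print(len(pet))  # still two keys, because each key appears at most once
```

Note that assigning to a key that already exists does not add a second entry, which matches the "each possible key appears at most once" part of the definition.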

Tabular (or Delimited) Data

“A delimited file is a sequential file with column delimiters. Each delimited file is a stream of records, which consists of fields that are ordered by column. Each record contains fields for one row. Within each row, individual fields are separated by column delimiters. All fields must be delimited character strings, non-delimited character strings, or external numeric values. Delimited character strings can contain column delimiters and can also contain character string delimiters when two successive character string delimiters are used to represent one character.” (IBM, “Delimited File Format”, 5 February 2022)
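The last sentence of that definition is easier to see in an example. The sketch below uses Python's standard csv module on a small, made-up piece of comma-delimited text in which one field contains a comma and another contains quotation marks. It assumes the common convention of a comma as the column delimiter and a double quote as the character string delimiter, which may not match every delimited file.

```python
import csv
import io

# Hypothetical sample data. The first field of the second record contains a
# comma, and the second field contains quotation marks, so both fields are
# wrapped in double quotes and the inner quotes are doubled ("").
sample_text = 'name,remark\n"Smith, Jane","She said ""hello"""\n'

# io.StringIO lets csv.reader treat the string like an open file.
rows = list(csv.reader(io.StringIO(sample_text)))

for row in rows:
    print(row)
```

Each record comes back as a list of fields, with the embedded comma kept inside its field and each doubled quotation mark read as a single character.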

A few characteristics that distinguish .csv files (or other plain-text delimited data formats) from proprietary spreadsheet file types:

  • Columns in a .csv file don’t have a value type. Everything is a string.

  • Values in a .csv file don’t have font or color formatting

  • .csv files contain only a single worksheet

  • .csv files don’t store formatting information like cell width/height

  • .csv files don’t recognize merged cells or other kinds of special formatting (frozen or hidden rows/columns, embedded images, etc.)
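The first point in this list has a practical consequence when reading a .csv in Python: numeric-looking values arrive as strings and need to be converted before doing arithmetic. A minimal sketch with made-up sample data (the column names and values are assumptions for illustration):

```python
import csv
import io

# Hypothetical sample data with one header row and one data row.
sample_text = "title,year,rating\nExample Film,1999,8.5\n"

reader = csv.reader(io.StringIO(sample_text))
header = next(reader)      # ['title', 'year', 'rating']
first_row = next(reader)   # every field is read as a string

year_as_read = first_row[1]
print(type(year_as_read))  # <class 'str'>

# Convert explicitly when a number is needed.
year = int(first_row[1])
rating = float(first_row[2])
print(year + 1, rating)
```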

JavaScript Object Notation

Start by watching a 3-minute video overview of JSON files & Python.

JavaScript Object Notation (JSON) is a popular way to format data as a single (purportedly human-readable) string. JavaScript programs use JSON data structures, but we frequently encounter JSON data outside of a JavaScript environment.

Websites that make machine-readable data available via an application programming interface (API; more on this soon) often provide that data in a JSON format. JSON structures can vary WIDELY depending on the specific data provider, but the easiest way to think of JSON data is as a plain-text data format made up of something like key-value pairs, like those we’ve encountered previously in working with dictionaries (a type of associative array).

Example JSON string: stringOfJsonData = '{"name": "Zophie", "isCat": true, "miceCaught": 0, "felineIQ": null}'. From looking at the example string, we can see field names or keys (name, isCat, miceCaught, felineIQ) and values for those fields.
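As a minimal sketch of how a string like this becomes a Python dictionary, the standard json module's loads() function can parse it. The variable name jsonDataAsPythonValue is only an illustrative choice, and the string below has every string value enclosed in double quotation marks, which JSON requires.

```python
import json

stringOfJsonData = '{"name": "Zophie", "isCat": true, "miceCaught": 0, "felineIQ": null}'

# json.loads() parses a JSON-formatted string into Python values.
jsonDataAsPythonValue = json.loads(stringOfJsonData)

print(type(jsonDataAsPythonValue))       # a JSON object becomes a Python dict
print(jsonDataAsPythonValue["name"])     # look up a value by its key
print(jsonDataAsPythonValue["isCat"])    # JSON true becomes Python True
print(jsonDataAsPythonValue["felineIQ"]) # JSON null becomes Python None
```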

To use more precise terminology, JSON data has the following attributes:

  • uses name/value pairs

  • separates data using commas

  • holds objects using curly braces {}

  • holds arrays using square brackets []

In our example stringOfJsonData, we have an object contained in curly braces. An object can include multiple name/value pairs. Multiple objects together can form an array.
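To illustrate objects grouped into an array, the sketch below parses a JSON array of two objects. The second object and its values are invented for the example.

```python
import json

# Square brackets hold an array; each pair of curly braces holds one object.
cats_json = '[{"name": "Zophie", "miceCaught": 0}, {"name": "Pooka", "miceCaught": 3}]'

cats = json.loads(cats_json)  # the JSON array becomes a Python list of dicts

print(len(cats))
names = [cat["name"] for cat in cats]
print(names)
```

In Python terms, the JSON array maps to a list (a linear array) and each JSON object maps to a dictionary (an associative array).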

How is data stored in a JSON format different than a .csv?

  • A .csv file uses characters as delimiters and has more of a tabular (table-like) structure.

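  • .json data also uses characters (commas, colons, curly braces, square brackets) as part of its syntax, but they mark name/value pairs, objects, and arrays rather than separating columns within a row.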

  • The data stored in a JSON format has values that are attached to names (or keys).

    • NOTE: We can mimic this structure somewhat by loading a .csv as a dictionary data structure

  • JSON can also have a hierarchical or nested structure, in that objects and arrays can be stored or nested inside other objects and arrays.
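The last two points can be sketched with Python's standard csv and json modules. csv.DictReader is one way to load delimited rows as dictionaries keyed by the header row, and the nested JSON string shows an object and an array stored inside another object. All sample data and field names here are hypothetical.

```python
import csv
import io
import json

# Loading delimited data as dictionaries: the header row supplies the keys.
csv_text = "name,miceCaught\nZophie,0\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
print(csv_rows[0]["name"])        # values are still strings, including "0"

# Nested JSON: the "owner" value is itself an object that contains an array.
nested_json = '{"name": "Zophie", "owner": {"name": "Al", "pets": ["Zophie", "Pooka"]}}'
record = json.loads(nested_json)

owner_name = record["owner"]["name"]     # step into the nested object
second_pet = record["owner"]["pets"][1]  # then into the nested array
print(owner_name, second_pet)
```

A flat .csv row has no direct equivalent of the nested "owner" object, which is one reason JSON data often needs to be flattened before it fits a table.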