Using OpenRefine#

As described in Library Carpentry’s “Introduction to OpenRefine”:

"OpenRefine displays data in a tabular format. Each row will usually represent a ‘record’ in the data, while each column represents a type of information. This is very similar to how you might view data in a spreadsheet or database. As with a spreadsheet, the individual bits of data live in ‘cells’ at the intersection of a row and a column.


OpenRefine only displays a limited number of rows of data at one time. You can adjust the number choosing between 5, 10 (the default), 25 and 50 at the top left of the table of data. You can navigate through the records by using the previous/next/first/last navigation options at the top right of the table of data."

Faceting and Filtering#

As described in Library Carpentry’s “Introduction to OpenRefine”:

"Facets are one of the most useful features of OpenRefine and can help in both getting an overview of the data and to improve the consistency of the data.


A ‘Facet’ groups all the values that appear in a column, and then allows you to filter the data by these values and edit values across many records at the same time.


The simplest type of Facet is called a ‘Text facet’. This simply groups all the text values in a column and lists each value with the number of records it appears in. The facet information always appears in the left hand panel in the OpenRefine interface.


To create a Text Facet for a column, click on the drop down menu at the top of the publisher column and choose `Facet -> Text Facet`. The facet will then appear in the left hand panel.


The facet consists of a list of values used in the data. You can filter the data displayed by clicking on one of these headings.


You can include multiple values from the facet in a filter at one time by using the `Include` option which appears when you put your mouse over a value in the Facet.


You can also invert the filter to show all records which do not match your selected values. This option appears at the top of the Facet panel when you select a value from the facet to apply as a filter."

Select the drop-down arrow for one of the columns that contains a pattern error. Select Facet > Text Facet.

The facet will now appear on the left-hand side of the page. Click a line in the facet to select rows with that value.

  • Use the Include option to select multiple values.

  • Use the Edit option to address a pattern error.

Do this for other pattern errors. Consult the following resources as needed to understand this data:

Exporting from OpenRefine#

Click on the Export button in the top right-hand corner and select the option to export your OpenRefine project as a CSV file.

NOTE: OpenRefine will only export the rows of data selected in your current review. Be sure to remove all filters or facets before exporting.

Make sure this file has a unique name and is saved in a location where you can find it again. Open this new CSV file in a spreadsheet program. Check to see the pattern errors have been addressed.

Application#

Q2A: Starting with the pattern errors you identified for Q1, how would you address those issues using OpenRefine?

Q2B: Provide an outline for your data processing workflow. A set of steps or tasks is a good place to start. You’re welcome to include screenshots as that would be helpful.

Q2C: Work through cleaning the players and teams tables. Once you’re done, export the .csv files and add them to the Google Sheets template for this chapter.

Q2D: Compare your experience working in OpenRefine to other experiences you have had in a text editor or spreadsheet program. In what ways do you understand, perceive, or relate to the data differently through working in OpenRefine? Reflect on and describe your experience cleaning this data in OpenRefine.

Additional Resources#

We are barely scratching the surface of what is possible with data wrangling in OpenRefine. The progam can also standardize capitalization, remove leading and trailing spaces, and address other commonly-found data errors. Library Carpentry’s OpenRefine tutorial goes in to greater detail about many of these other functions.