Instructor Notes

Datasets


Throughout this workshop, we use four primary data sets:

The Check-In Dataset is based on a 2018 state election. It tracks check-in times and durations at ballot scanners across various locations, as well as the precinct that each device belongs to. All identifiable information has been anonymized through pseudonymization.
The direct download link for the data file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/checkin_data.csv
The direct download link for the sampled data file (for ggplot2) is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/checkin_sample_plotting.csv

The Messy Dataset is based on a real-life election example and tracks the amount of time individuals took to check in at a voting location. For check-ins that took longer than average, an explanation is given.
The direct download link for the data file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/messy_data.csv

The GoT Dataset is a fictional data set based on the Game of Thrones universe. It consists of graphing polygons and voting data representing the percentage of voters who voted for Jon Snow or Daenerys Targaryen.
The direct download link for the CSV file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/voting_GoT.csv
The direct download link for the GeoJSON file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/polygons_GoT.json

The Check-In Snippet is a JSON representation of a fictional data set based on the anonymized Check-In Dataset. It records which precinct, polling location, and scanner were used, as well as the number of arrivals and the times of the first and last arrival.
The direct download link for the data file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/checkin_snippet.json
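
Instructors who want every file on disk before the workshop could script the downloads. The following is a minimal sketch, not part of the lesson itself: the URLs are the ones listed above, while the helper name `download_workshop_data` and the choice of a local `data` folder as the destination are assumptions made for illustration.

```r
# Base URL and file names are copied from the dataset descriptions above.
base_url <- "https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/"
data_files <- c(
  "checkin_data.csv",
  "checkin_sample_plotting.csv",
  "messy_data.csv",
  "voting_GoT.csv",
  "polygons_GoT.json",
  "checkin_snippet.json"
)

# Build the full download URLs and the local destination paths.
urls <- paste0(base_url, data_files)
destinations <- file.path("data", data_files)

# Hypothetical helper (not part of the lesson): fetch any file that is
# not already present locally. Requires an internet connection.
download_workshop_data <- function(urls, destinations) {
  # Make sure every destination folder exists before downloading.
  for (d in unique(dirname(destinations))) {
    dir.create(d, showWarnings = FALSE, recursive = TRUE)
  }
  for (i in seq_along(urls)) {
    if (!file.exists(destinations[i])) {
      download.file(urls[i], destfile = destinations[i], mode = "wb")
    }
  }
  invisible(destinations)
}

# Uncomment to run the download:
# download_workshop_data(urls, destinations)
```

The download call is left commented out so that sourcing the script without a network connection does not fail.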

Lesson Plans


The lesson contains significantly more material than can be taught in a day. Instructors should pick an appropriate subset of episodes to use in a standard one-day course.

Suggested path for a half-day course:

  • Before we Start
  • Introduction to R
  • Starting with Data
  • Data Wrangling with dplyr

Suggested path for a full-day course:

  • Before we Start
  • Introduction to R
  • Starting with Data
  • Data Wrangling with tidyr (OPTIONAL)
  • Data Visualization with ggplot2

For a two-day workshop, it may be possible to cover all of the episodes. Feedback from the community on successful lesson plans is always appreciated!

Technical Tips and Tricks


  • Show learners how to use the ‘zoom’ button to blow up graphs without constantly resizing RStudio windows.

  • Sometimes a package will not install. You can try a different CRAN mirror by following the path: “Tools > Global Options > Packages > CRAN Mirror”. Alternatively, you can download the package from CRAN and install it from a ZIP file by following the path: “Tools > Install Packages > set to ‘from Zip/TAR’”.

  • It’s often easier to ensure learners have all the needed packages installed at once, rather than dealing with these issues over and over. See the “Setup Instructions” section on the homepage of the course website for package installation instructions.

  • The Spanish Mac keyboard does not have a | key. This character can be typed using:

    `alt` + `1`
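
For the package-installation tips above, the same checks can be done from the console instead of the menus. This is a minimal sketch under stated assumptions: the package vector only contains the packages named in the episode titles (the authoritative list is in the “Setup Instructions”), and `missing_packages` is a hypothetical helper, not something the lesson provides.

```r
# Packages named in the episode titles; the authoritative list is in the
# "Setup Instructions" on the course website (this vector is an assumption).
needed <- c("dplyr", "tidyr", "ggplot2")

# Hypothetical helper: return the packages from `needed` that are not installed.
missing_packages <- function(needed, installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

to_install <- missing_packages(needed)
print(to_install)

# Install only what is missing, naming a CRAN mirror explicitly.
# Guarded by interactive() so a non-interactive script run does not
# trigger installs; in the RStudio console the guard is TRUE.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```

Passing `repos` to `install.packages()` is the console equivalent of switching the mirror in “Global Options”; any other CRAN mirror URL could be substituted.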

Other Resources


If you encounter a problem during a workshop, feel free to contact the maintainers by email or open an issue.

For more in-depth coverage of the workshop topics, you may want to read “R for Data Science” by Hadley Wickham and Garrett Grolemund.

Before we Start


Instructor Note

  • The main goal here is to help learners be comfortable with the RStudio interface.
  • Go very slowly in the “Getting set up” section. Make sure everyone is following along (remind learners to use the stickies). Plan with the helpers at this point to go around the room, and be available to help. It’s important to make sure that learners are in the correct working directory, and that they create a data (all lowercase) sub-folder.
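
The working-directory check and the `data` sub-folder can also be verified from the console. This is a minimal sketch, assuming learners are already inside their project folder; the lesson itself may create the folder through the RStudio interface instead.

```r
# Show the current working directory; learners should see their project folder.
getwd()

# Create the lowercase "data" sub-folder if it does not exist yet.
if (!dir.exists("data")) {
  dir.create("data")
}

# Confirm the folder is there.
dir.exists("data")
```

Having learners run `getwd()` and read the path aloud to a helper is a quick way to catch anyone working in the wrong directory.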


Introduction to R


Instructor Note

  • The main goal is to introduce users to the various objects in R, from atomic types to creating your own objects.
  • While this episode is foundational, be careful not to get caught in the weeds as the variety of types and operations can be overwhelming for new users, especially before they understand how this fits into their own “workflow.”
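
A short demonstration that stays out of the weeds might cover only the common atomic types and one coercion surprise. This is a minimal sketch; the object names and values are made up for illustration and do not come from the lesson data.

```r
# Atomic vectors of the most common types.
n_arrivals <- c(12, 7, 30)          # numeric (double)
precinct   <- c("A", "B", "C")      # character
is_open    <- c(TRUE, FALSE, TRUE)  # logical
n_scanners <- c(2L, 1L, 4L)         # integer (note the L suffix)

class(n_arrivals)
class(precinct)
class(is_open)
class(n_scanners)

# Mixing types in one vector silently coerces everything to the most
# general type present, here character.
mixed <- c(1, "two", TRUE)
class(mixed)
mixed
```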


Starting with Data


Instructor Note

The main goals for this lesson are:

  • Ensure learners are comfortable with working with data frames and tibbles, and can use the bracket notation to select slices/columns.
  • Make sure learners can import data into R and convert data into tibbles for analysis purposes.
  • Expose learners to factors. Their behavior is not necessarily intuitive, so it is important that they are guided through it the first time they are exposed to it. The content of the lesson should be enough for learners to avoid common mistakes with them.
  • Expose learners to dates, and ensure they can convert strings to date format if need be.
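
The goals above can be illustrated on a tiny example before moving to the real data. This is a minimal sketch using a base R data frame so that it runs without extra packages (the lesson also works with tibbles); the column names and values are made up and are not the columns of the workshop data sets.

```r
# A tiny made-up data frame; column names and values are illustrative only.
checkins <- data.frame(
  precinct = c("P1", "P2", "P1", "P3"),
  wait_min = c("5", "12", "8", "20"),
  date     = c("2018-11-06", "2018-11-06", "2018-11-06", "2018-11-06"),
  stringsAsFactors = FALSE
)

# Bracket notation: [rows, columns].
checkins[1:2, ]                # first two rows, all columns
checkins[, "precinct"]         # one column, returned as a vector
checkins[checkins$precinct == "P1", c("precinct", "wait_min")]

# Factor pitfall: as.numeric() on a factor returns the internal level
# codes, not the values the labels look like.
wait_factor <- factor(checkins$wait_min)
as.numeric(wait_factor)                  # level codes
as.numeric(as.character(wait_factor))    # the actual wait times

# Convert date strings to Date objects.
checkins$date <- as.Date(checkins$date, format = "%Y-%m-%d")
class(checkins$date)
```

Showing the two `as.numeric()` calls side by side is usually enough to make the factor pitfall memorable.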


Data Wrangling with dplyr


Instructor Note

  • The main goal of this lesson is to introduce the dplyr package – a powerful tool for data manipulation in R.
  • We will cover the basics such as selecting columns, filtering rows, chaining commands with pipes, creating new columns with mutate, and summarizing data by grouping.
  • When covering pipes, some learners may find it helpful to read the pipe as the word “then”. When explaining the workflow, phrase it as “we take the data, then we filter our rows, then we select our columns.”
  • While the dplyr functions simplify data wrangling, it’s important to ensure learners grasp each concept step by step to build a solid foundation for their future data analysis workflow.
  • If you would like additional information and visual representations of the functions, the following site showcases some good tutorials and visuals: https://tidydatatutor.com/
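
The “then” reading of the pipe can be written directly into the comments of a live-coding example. This is a minimal sketch, assuming the `%>%` pipe is the one being taught (the native `|>` pipe reads the same way); the data and column names are made up and are not the lesson's real columns.

```r
library(dplyr)

# Made-up data; column names are hypothetical, not the lesson's real columns.
checkins <- tibble(
  precinct = c("P1", "P1", "P2", "P2", "P3"),
  scanner  = c("S1", "S2", "S1", "S1", "S1"),
  wait_min = c(5, 12, 8, 20, 3)
)

# Read each %>% as "then":
# take the data, THEN filter rows, THEN select columns.
checkins %>%
  filter(wait_min > 4) %>%
  select(precinct, wait_min)

# Take the data, THEN add a column, THEN group, THEN summarize.
wait_summary <- checkins %>%
  mutate(wait_sec = wait_min * 60) %>%
  group_by(precinct) %>%
  summarize(mean_wait_sec = mean(wait_sec), n_checkins = n())

wait_summary
```

Building the second chain one step at a time, running it after each added line, lets learners see what every verb contributes.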


Data Wrangling with tidyr


Instructor Note

  • The primary goal is to help learners understand how to clean their data and change table formats for different uses.
  • As with Data Wrangling with dplyr, this lesson works better when you use graphics to demonstrate the difference between long and wide table formats. We have created this Google Slides deck for this purpose!
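
Alongside the graphics, a two-row table makes the long and wide formats concrete. This is a minimal sketch, assuming the episode uses `pivot_longer()` and `pivot_wider()` (older tidyr material uses `gather()` and `spread()` instead); the table and its column names are made up for illustration.

```r
library(tidyr)

# Made-up wide table: one row per precinct, one column per hour.
wide <- data.frame(
  precinct = c("P1", "P2"),
  hour_08  = c(10, 4),
  hour_09  = c(15, 9)
)

# Wide to long: one row per precinct-hour combination.
long <- pivot_longer(
  wide,
  cols = c(hour_08, hour_09),
  names_to = "hour",
  values_to = "arrivals"
)
long

# Long back to wide: the hour labels become column names again.
wide_again <- pivot_wider(long, names_from = hour, values_from = arrivals)
wide_again
```

Printing `wide`, `long`, and `wide_again` in sequence shows learners that no information is lost in the round trip.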


Data Visualisation with ggplot2


Getting Started with R Markdown (optional)


Instructor Note

  • This is an optional lesson intended to introduce learners to R Markdown.
  • While it is listed after the core lessons, some instructors may prefer to teach it earlier in the workshop, depending on the needs of the audience.