Summary and Setup
This is a new lesson built with The Carpentries Workbench.
Setup Instructions
First, it’s important to understand that R and RStudio are two different programs that must be downloaded and installed separately. R is the underlying statistical computing environment, but working with R on its own can be cumbersome. To simplify the experience, we use RStudio, a graphical integrated development environment (IDE) that makes working with R easier and more interactive. However, RStudio relies on R to run, so you must install R before you install RStudio. Once both are installed, you do not need to open R separately: RStudio starts R automatically and runs it in the background.
After installing both programs, you will need to install the tidyverse and here packages from within RStudio. The tidyverse package provides a powerful collection of data science tools within R (see the tidyverse website for more details), and the here package simplifies file access.
Follow the instructions below to install or update R and RStudio for your operating system, and then follow the instructions at the end to install tidyverse and here.
After installing R and RStudio:
- If you are running Linux (Ubuntu or a related distribution), you may need to install the following system dependencies before installing the tidyverse package: libcurl4-openssl-dev libssl-dev libxml2-dev (e.g. sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev).
- To install the tidyverse package, type install.packages("tidyverse") in the console, then press Enter.
- To install the here package, type install.packages("here") in the console, then press Enter.
- To confirm that both packages are installed, select the Packages tab on the right and check that tidyverse and here are listed under User Library.
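The console steps above can also be run as one short script. This is a minimal sketch. It installs only the packages that are missing and then checks that R can load each one, so you don't need to look at the Packages tab. It assumes internet access. It also names the CRAN cloud mirror (https://cloud.r-project.org) explicitly, which is an assumption: RStudio usually configures a mirror for you already.

```r
# Packages used throughout the workshop
pkgs <- c("tidyverse", "here")

# Install only the packages that are not already present (requires internet access).
# Note: this skips packages that are installed but outdated; update.packages() handles those.
missing <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing) > 0) {
  # Assumed mirror: the CRAN cloud mirror; RStudio normally sets one for you
  install.packages(missing, repos = "https://cloud.r-project.org")
}

# Confirm that R can load each package; prints a named TRUE/FALSE vector
loaded_ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(loaded_ok)
```

If every value prints as TRUE, both packages are ready to use.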
Datasets
Throughout this workshop, we use four primary data sets:
We recommend that you download a single zip file with all of the files and then unzip it. Move the unzipped folder to somewhere on your system that you can find (e.g. Desktop or Documents).
The Check-In Dataset is based on a 2018 state election. It tracks check-in times and durations at ballot scanners across various locations, along with the precinct that each device belongs to. All identifiable information has been pseudo-anonymized.
The direct download link for the data file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/checkin_data.csv
The direct download link for the sampled data file (for ggplot2) is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/checkin_sample_plotting.csv
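To fetch an individual file instead of the zip, you can download it from R itself. The sketch below uses the check-in data link listed above and builds the local path with here(). The "data" folder at the top of your RStudio project is a hypothetical layout chosen for illustration, so adjust it to wherever you keep the workshop files. The code does not assume any column names. It only prints a compact overview of whatever columns the file contains.

```r
library(tidyverse)  # attaches readr (read_csv) and dplyr (glimpse)
library(here)

# Direct download link for the check-in data, as listed above
checkin_url <- "https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/checkin_data.csv"

# Hypothetical layout: a "data" folder at the root of the RStudio project
dir.create(here("data"), showWarnings = FALSE)
checkin_path <- here("data", "checkin_data.csv")

# Download once; later runs reuse the local copy
if (!file.exists(checkin_path)) {
  download.file(checkin_url, destfile = checkin_path, mode = "wb")
}

# Read the CSV into a tibble and show each column's name, type, and first values
checkin <- read_csv(checkin_path)
glimpse(checkin)
```

The same pattern works for the other CSV links on this page: swap in the URL and the file name.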
The Messy Dataset is based on a real-life election example and tracks the amount of time individuals took to check in at a voting location. For check-ins that took longer than average, an explanation is given.
The direct download link for the data file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/messy_data.csv
The GoT Dataset is a fictional data set based on the Game of Thrones universe. It consists of polygons for drawing a map, along with voting data giving the percentage of voters who voted for Jon Snow or Daenerys Targaryen.
The direct download link for the CSV file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/voting_GoT.csv
The direct download link for the GeoJSON file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/polygons_GoT.json
The Check-In Snippet is a JSON representation of a fictional data set based on the Anonymized Dataset. It records which precinct, polling location, and scanner were used, as well as the number of arrivals and the times of the first and last arrival.
The direct download link for the data file is: https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/checkin_snippet.json
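The Check-In Snippet is JSON rather than CSV, so it needs a JSON reader. The sketch below uses the jsonlite package. jsonlite should already be installed as a dependency of the tidyverse, but library(tidyverse) does not attach it, so it is loaded explicitly. The code makes no assumptions about the snippet's field names. It simply reads the file from the link above and prints its top-level structure.

```r
# jsonlite is installed with the tidyverse but must be attached on its own
library(jsonlite)

# Direct download link for the Check-In Snippet, as listed above
snippet_url <- "https://raw.githubusercontent.com/EngineeringForDemocracy/r-election-workers/main/episodes/data/checkin_snippet.json"

# fromJSON() accepts a URL, a file path, or a JSON string; by default it
# simplifies arrays of records into data frames where it can
snippet <- fromJSON(snippet_url)

# Inspect the first two levels of the parsed structure
str(snippet, max.level = 2)
```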