5 Session 2: R Fundamentals
Master data transformation and visualisation
- Duration: 4 hours
- Goal: Gain confidence with R syntax, data transformation, and visualisation using the tidyverse.
In Session 1, you ran pre-written code. Today you learn to write your own – to ask questions of data and get visual answers. By the end, you’ll build an indicator chart from scratch.
5.1 Learning Outcomes
- Understand R’s key data structure (tibble/data frame)
- Use the pipe operator (%>% or |>) to chain operations
- Apply core dplyr verbs: filter, select, mutate, summarise, group_by
- Create multi-layered ggplot2 visualisations
- Debug common R errors
5.2 Session Structure
5.2.1 Part 1: Homework Review & Setup (30 min)
Homework Review (25 min):
- Show-and-tell: Volunteers demonstrate their homework modifications
- Address common questions from homework prep
- Quick poll: What was most confusing? (Adjust session focus accordingly)
Setting Up Your Practice Script (5 min):
Today you’ll write R code from scratch. You need somewhere to work:
- In Positron, open the indicators project if it isn’t already open
- File > New File > R Script (or Ctrl+Shift+N)
- Save it straight away as scripts/R/session-02-practice.R
- Add a header comment at the top:
# Session 2 Practice - R Fundamentals
# Your name, today's date
You’ll use this file for all the “Try It” exercises today. Run individual lines with Ctrl+Enter or highlight a block and run it with Ctrl+Shift+Enter.
5.2.2 Part 2: R Basics - Data Structures (45 min)
Your indicator data arrives as a table of rows and columns – the tibble is how R represents that table.
The Tibble: R’s Spreadsheet
library(tidyverse)
# Load a real WECA dataset
area_data <- read_csv(here::here("data", "examples", "area_employment.csv"))
# View it
area_data
# The $ operator accesses a single column (base R syntax)
area_data$employment_rate
Key Concepts:
- Tibbles are like Excel tables with named columns
- <- means “assign to” (store a value)
- $ accesses a column
- R is case-sensitive: Population ≠ population
- Comments start with #
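The key concepts above can be tried on a small table without needing the course data. This is a minimal sketch: example_data and all of its values are invented for illustration and are not taken from any WECA dataset.

```r
library(tibble)

# Hypothetical data: invented areas and values, for illustration only
example_data <- tribble(
  ~area,    ~population, ~employment_rate,
  "Area A",      120000,             0.78,
  "Area B",       95000,             0.74,
  "Area C",      210000,             0.81
)

# Printing a tibble shows its dimensions and the type of each column
example_data

# <- stores a value; $ pulls out one column as a plain vector
rates <- example_data$employment_rate
mean(rates)

# Names are case-sensitive: a tibble returns NULL (with a warning)
# when asked for a column whose name does not match exactly
wrong_case <- suppressWarnings(example_data$Employment_Rate)
is.null(wrong_case)
```

Swapping example_data for area_data should behave the same way, provided the column names match those in your file.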
BREAK (15 min)
5.2.3 Part 3: The Tidyverse Way - Verbs and Pipes (60 min)
Building an indicator means filtering to your area, calculating rates, and summarising trends – these five verbs do exactly that.
The Pipe Operator:
# Old way (nested, hard to read)
summarise(filter(area_data, population > 100000),
          mean_pop = mean(population))
# Tidyverse way (sequential, readable)
area_data %>%
  filter(population > 100000) %>%
  summarise(mean_pop = mean(population))
Try It (5 min): Rewrite this nested expression using pipes:
select(filter(area_data, employment_rate > 0.5), area, employment_rate)
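To see that nesting and piping are two spellings of the same calculation, the population example from above can be run three ways on a small made-up table. This is a sketch under assumptions: example_data and its figures are invented, and the native pipe |> needs R 4.1 or later.

```r
library(dplyr)
library(tibble)

# Hypothetical data, invented for illustration
example_data <- tibble(
  area = c("Area A", "Area B", "Area C"),
  population = c(120000, 95000, 210000)
)

# Nested call: read from the inside out
nested <- summarise(filter(example_data, population > 100000),
                    mean_pop = mean(population))

# magrittr pipe: read top to bottom
piped <- example_data %>%
  filter(population > 100000) %>%
  summarise(mean_pop = mean(population))

# Native pipe (R 4.1 or later): same steps, different operator
native <- example_data |>
  filter(population > 100000) |>
  summarise(mean_pop = mean(population))

# All three should hold the same one-row result
all.equal(nested, piped)
all.equal(piped, native)
```

Each pipe passes the result on its left as the first argument of the function on its right, which is why the three versions agree.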
The Five Core Verbs:
- filter() - Keep rows that match a condition
- select() - Keep specific columns
- mutate() - Create new columns
- summarise() - Calculate summary statistics
- group_by() - Do calculations by category
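The five verbs are usually chained into a single pipeline. The sketch below shows one possible ordering on invented data: the column names employed and working_age and every figure are assumptions for illustration, not the columns of the course dataset.

```r
library(dplyr)
library(tibble)

# Hypothetical data: invented figures, for illustration only
example_data <- tribble(
  ~area,    ~year, ~employed, ~working_age,
  "Area A",  2022,     58000,        76000,
  "Area A",  2023,     60000,        77000,
  "Area B",  2022,     41000,        56000,
  "Area B",  2023,     42500,        57000
)

area_summary <- example_data %>%
  filter(year >= 2022) %>%                              # keep matching rows
  mutate(employment_rate = employed / working_age) %>%  # add a new column
  select(area, year, employment_rate) %>%               # keep specific columns
  group_by(area) %>%                                    # one group per area
  summarise(
    mean_rate = mean(employment_rate),                  # one row per group
    max_rate  = max(employment_rate)
  )

area_summary
```

Running the pipeline up to each %>% in turn is a useful way to see what every verb contributes.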
Try It (25 min): Using your example dataset, work through each verb:
- filter() – keep only rows for a specific area
- select() – keep just the area, year, and value columns
- mutate() – add a new column calculating a rate or percentage
- summarise() – calculate the mean and max of a numeric column
- group_by() %>% summarise() – calculate summary statistics by area
Pair up for these exercises. Person A writes the filter() and select() steps; Person B writes mutate() and summarise(). Swap roles for the ggplot2 exercise in Part 4.
BREAK (15 min)
5.2.4 Part 4: ggplot2 - Grammar of Graphics (60 min)
Every indicator chapter needs at least one chart. This is how you build them.
# Assumes bus_data is already loaded as a tibble with year and ridership columns
# Layer 1: Data + aesthetic mapping
ggplot(bus_data, aes(x = year, y = ridership))
# Layer 2: Add geometry
ggplot(bus_data, aes(x = year, y = ridership)) +
  geom_line()
# Layer 3: Add styling
ggplot(bus_data, aes(x = year, y = ridership)) +
  geom_line(colour = get_weca_color("forest_green"), linewidth = 1) +
  geom_point(size = 2) +
  labs(title = "Bus Ridership Over Time",
       x = "Year",
       y = "Ridership (millions)") +
  theme_weca()
Key ggplot2 Geometries:
- geom_line() - Line charts (trends over time)
- geom_point() - Scatter plots (relationships)
- geom_col() - Bar charts (comparisons)
- geom_smooth() - Trend lines
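The layering idea can be checked on a self-contained example. This is a sketch under assumptions: example_bus and its figures are invented, and theme_minimal() stands in for theme_weca() so that the code runs without the project's helper functions.

```r
library(ggplot2)
library(tibble)

# Hypothetical ridership figures, invented for illustration
example_bus <- tibble(
  year = 2019:2023,
  ridership = c(52, 21, 33, 41, 45)
)

# Data + aesthetic mapping only: draws the axes but no marks
base_plot <- ggplot(example_bus, aes(x = year, y = ridership))

# Each geom added with + becomes another layer on the same mapping
line_plot <- base_plot +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(title = "Bus Ridership Over Time",
       x = "Year",
       y = "Ridership (millions)") +
  theme_minimal()  # stand-in for theme_weca()

# Swapping the geometry changes the chart type; data and mapping stay the same
bar_plot <- base_plot + geom_col()

# Count the layers in each plot object
length(base_plot$layers)
length(line_plot$layers)

line_plot
```

Storing the base plot in a variable is a design choice rather than a requirement: it makes it easy to try several geometries against the same data.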
Try It (10 min): Create a bar chart of employment rate by area using geom_col(). Add WECA colours and labels with labs().
5.2.5 Part 5: Debugging R Errors (30 min)
Common Error Messages and Fixes:
- could not find function "mutate" → Load library: library(tidyverse)
- object 'data' not found → Run the chunk that loads the data first
- unexpected symbol → Check for missing commas, quotes, or parentheses
- 'x' and 'y' lengths differ → Data columns have different numbers of rows
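It can help to trigger these errors on purpose and read the messages before meeting them in real work. The sketch below captures each message with tryCatch() so the script keeps running; get_error() is a hypothetical helper written for this example, and the function and object names are deliberately made up so that they do not exist.

```r
# Hypothetical helper: return an error's message instead of stopping
get_error <- function(expr) {
  tryCatch({
    expr            # evaluating the argument here triggers the error
    NA_character_   # returned only if no error occurred
  }, error = function(e) conditionMessage(e))
}

# Calling a function that has not been loaded (made-up name)
msg_function <- get_error(not_a_loaded_function(1))

# Using an object that has not been created yet (made-up name)
msg_object <- get_error(not_yet_loaded_data + 1)

# A missing comma between arguments is caught when the code is parsed
msg_syntax <- get_error(parse(text = "select(area_data, area year)"))

msg_function
msg_object
msg_syntax
```

The exact wording depends on your R version and language settings, but each message names the function, object, or position that R could not make sense of.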
Debugging Strategy:
- Read the error message (bottom-up in stack trace)
- Check the line number mentioned
- Look for typos in variable/column names
- Run code chunk-by-chunk to isolate the problem
- Use View(data) to inspect your data frame, open it in the data explorer, or run data |> glimpse()
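The last three steps of the strategy can be sketched on a small table. The data here is invented for illustration, and the intermediate names step_1 and step_2 are arbitrary.

```r
library(dplyr)
library(tibble)

# Hypothetical data, invented for illustration
example_data <- tibble(
  area = c("Area A", "Area B"),
  employment_rate = c(0.78, 0.74)
)

# glimpse() prints one line per column: name, type, and first values
example_data |> glimpse()

# Check a column name before using it, to catch typos and case mismatches
has_column <- "employment_rate" %in% names(example_data)
has_typo   <- "Employment_Rate" %in% names(example_data)

# Run a pipeline one step at a time, inspecting each intermediate result
step_1 <- example_data |> filter(employment_rate > 0.75)
step_1
step_2 <- step_1 |> select(area)
step_2
```

Once each step returns what you expect, the steps can be joined back into a single pipeline.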
5.2.6 Part 6: Wrap-up & Homework (15 min)
Reflection (5 min):
Before we finish, take 2 minutes to write down:
- One thing you understand now that you didn’t at the start
- One thing that’s still unclear or you’d like more practice with
Which dplyr verb felt most natural? Which was most confusing? Share with the group if you’re comfortable.
Homework (1-2 hours):
- Complete: The “R Basics” chapter in R for Data Science
- Practise: Using your assigned example dataset:
- Calculate 3 summary statistics (min, max, mean)
- Create 2 different chart types
- Add WECA theme and appropriate labels
- Create: A new indicator section that includes data loading, transformation, visualisation, and a findings paragraph
- Prepare: Bring a real WECA dataset (CSV) for Session 3