Beginner

The Tidyverse

Understand the tidyverse ecosystem, its core packages, tidy data principles, tibbles vs data frames, and the pipe operator.

What is the Tidyverse?

The tidyverse is an opinionated collection of R packages designed for data science. All packages share a common design philosophy, grammar, and data structures. Install and load everything at once:

R
# Install (only once)
install.packages("tidyverse")

# Load (every session)
library(tidyverse)
# Attaches the core packages: ggplot2, dplyr, tidyr, readr, purrr, tibble,
# stringr, forcats (and lubridate as of tidyverse 2.0.0)

Core Packages

| Package | Purpose                | Key Functions                                |
|---------|------------------------|----------------------------------------------|
| ggplot2 | Data visualization     | ggplot(), aes(), geom_*()                    |
| dplyr   | Data manipulation      | filter(), select(), mutate(), summarise()    |
| tidyr   | Data tidying           | pivot_longer(), pivot_wider(), separate()    |
| readr   | Data import            | read_csv(), read_tsv(), write_csv()          |
| purrr   | Functional programming | map(), map_dbl(), walk()                     |
| tibble  | Modern data frames     | tibble(), as_tibble(), tribble()             |
| stringr | String manipulation    | str_detect(), str_replace(), str_extract()   |
| forcats | Factor handling        | fct_reorder(), fct_lump(), fct_recode()      |
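
To give a feel for how these packages combine, here is a minimal sketch that uses tibble, stringr, dplyr, and purrr together. The `scores` data and the variable names `top` and `group_means` are invented for illustration and are not part of any package.

```r
library(tidyverse)

# Hypothetical sample data, invented for this illustration
scores <- tibble(
  name  = c("alice", "bob", "charlie"),
  score = c(92.5, 87.3, 95.1)
)

top <- scores |>
  mutate(name = str_to_title(name)) |>  # stringr: capitalise each name
  filter(score > 90) |>                 # dplyr: keep rows with score above 90
  arrange(desc(score))                  # dplyr: highest score first

top

# purrr: apply mean() to each list element and return a numeric vector
group_means <- map_dbl(list(a = 1:3, b = 4:6), mean)
group_means
```

Because every function takes a data frame or vector as its first argument and returns the same kind of object, the steps chain together without intermediate variables.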

Tidy Data Principles

Data is "tidy" when it follows three rules:

  1. Each variable has its own column
  2. Each observation has its own row
  3. Each value has its own cell
R
# NOT tidy (wide format)
#   country  2020  2021  2022
#   USA      100   110   120
#   UK        80    85    90

# TIDY (long format)
#   country  year  value
#   USA      2020  100
#   USA      2021  110
#   USA      2022  120
#   UK       2020   80
#   UK       2021   85
#   UK       2022   90
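
The reshaping shown in the comments above can be sketched with `tidyr::pivot_longer()`. The object names `wide` and `long` are assumptions chosen for this example; the values are the ones from the illustration.

```r
library(tibble)
library(tidyr)

# The wide table from the illustration; year columns need backticks
# because names that start with a digit are not syntactic in R
wide <- tibble(
  country = c("USA", "UK"),
  `2020`  = c(100, 80),
  `2021`  = c(110, 85),
  `2022`  = c(120, 90)
)

# Gather every column except country into year/value pairs
long <- wide |>
  pivot_longer(
    cols = -country,
    names_to = "year",
    values_to = "value",
    names_transform = list(year = as.integer)  # column names arrive as strings
  )

long
```

Each row of `long` now holds one observation (a country in a given year), and `year` is an ordinary variable that can be filtered, grouped, or plotted.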

Tibbles vs Data Frames

Tibbles are the tidyverse's enhanced data frames. Key differences:

R
# Create a tibble
tb <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  score = c(92.5, 87.3, 95.1)
)

# Create row-by-row with tribble
tb2 <- tribble(
  ~name,     ~age, ~score,
  "Alice",    25,  92.5,
  "Bob",      30,  87.3,
  "Charlie",  35,  95.1
)

# Convert data frame to tibble
as_tibble(mtcars)

# Tibble advantages:
# - Never converts strings to factors (data.frame() did so by default before R 4.0.0)
# - Prints only the first 10 rows and the columns that fit on screen
# - Shows column types in output
# - Stricter subsetting: no partial name matching, and [ always returns a tibble
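
The stricter subsetting is easiest to see side by side. The sketch below is illustrative only: `df` and `tb_people` are hypothetical names, and the data repeats the example values used earlier.

```r
library(tibble)

df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
tb_people <- as_tibble(df)

# Partial matching: a data frame silently expands `na` to `name`
df_partial <- df$na

# A tibble refuses to guess: it returns NULL and emits a warning
tb_partial <- suppressWarnings(tb_people$na)

# Single-column subsetting: a data frame drops to a bare vector,
# while a tibble stays a tibble
df_col <- df[, "age"]
tb_col <- tb_people[, "age"]

is.data.frame(df_col)  # FALSE
is_tibble(tb_col)      # TRUE
```

Use `tb_people$age` or `tb_people[["age"]]` when a plain vector is what you want.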

The Pipe Operator

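The pipe is central to tidyverse code. It passes the result of the expression on its left as the first argument to the function call on its right. The examples here use the native pipe |>, which requires R 4.1.0 or later; older tidyverse code uses the magrittr pipe %>%, which behaves the same way in simple chains like this one: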

R
library(dplyr)

# Without pipe (nested, hard to read)
arrange(filter(select(mtcars, mpg, cyl, hp), cyl == 6), desc(mpg))

# With pipe (reads top-to-bottom, left-to-right)
mtcars |>
  select(mpg, cyl, hp) |>
  filter(cyl == 6) |>
  arrange(desc(mpg))

# Read as: "Take mtcars, THEN select columns,
#           THEN filter rows, THEN arrange."
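
Pipe shortcut: In RStudio, press Ctrl+Shift+M (Windows/Linux) or Cmd+Shift+M (macOS) to insert a pipe. By default RStudio inserts the magrittr pipe %>%; to insert |> instead, enable the native pipe option under Tools > Global Options > Code.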
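
As a quick check that piping changes only how the code reads, the following sketch compares the nested call, the native pipe, and the magrittr pipe on `mtcars`. The variable names are chosen for illustration, and the code assumes R 4.1.0 or later for `|>`.

```r
library(dplyr)

# Nested form: evaluated inside-out
nested <- arrange(filter(select(mtcars, mpg, cyl, hp), cyl == 6), desc(mpg))

# Native pipe: R rewrites x |> f(y) as f(x, y) when parsing
piped_native <- mtcars |>
  select(mpg, cyl, hp) |>
  filter(cyl == 6) |>
  arrange(desc(mpg))

# magrittr pipe (re-exported by dplyr): same result for this chain
piped_magrittr <- mtcars %>%
  select(mpg, cyl, hp) %>%
  filter(cyl == 6) %>%
  arrange(desc(mpg))

identical(nested, piped_native)
identical(nested, piped_magrittr)
```

Both comparisons return TRUE, so the choice between the three forms is about readability, not results.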