Beginner

The Tidyverse

Understand the tidyverse ecosystem, its core packages, tidy data principles, tibbles vs data frames, and the pipe operator.

What is the Tidyverse?

The tidyverse is an opinionated collection of R packages designed for data science. All packages share a common design philosophy, grammar, and data structures. Install the whole collection in one step, then attach its core packages with a single call:

# Install (only once)
install.packages("tidyverse")

# Load (every session)
library(tidyverse)
# Attaches the core packages: ggplot2, dplyr, tidyr, readr, purrr, tibble,
# stringr, forcats (and lubridate, as of tidyverse 2.0.0)

Core Packages

Package	Purpose	Key Functions
ggplot2	Data visualization	ggplot(), aes(), geom_*()
dplyr	Data manipulation	filter(), select(), mutate(), summarise()
tidyr	Data tidying	pivot_longer(), pivot_wider(), separate()
readr	Data import	read_csv(), read_tsv(), write_csv()
purrr	Functional programming	map(), map_dbl(), walk()
tibble	Modern data frames	tibble(), as_tibble(), tribble()
stringr	String manipulation	str_detect(), str_replace(), str_extract()
forcats	Factor handling	fct_reorder(), fct_lump(), fct_recode()
lubridate	Dates and times (core since tidyverse 2.0.0)	ymd(), year(), floor_date()
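
As a rough illustration of how a few of the functions in the table behave, the sketch below runs them on a small made-up tibble. The `fruit_sales` data and its values are hypothetical, invented for this example, and the `\(x)` lambda syntax assumes R 4.1 or later.

```r
library(tidyverse)

# Hypothetical toy data, invented for illustration only
fruit_sales <- tibble(
  fruit = c("apple", "banana", "cherry"),
  units = c(30, 10, 20)
)

# stringr: which fruit names contain the substring "an"?
has_an <- str_detect(fruit_sales$fruit, "an")

# purrr: apply a function to each element and return a double vector
doubled <- map_dbl(fruit_sales$units, \(x) x * 2)

# forcats: order factor levels by another variable (ascending by default)
by_units <- fct_reorder(fruit_sales$fruit, fruit_sales$units)
levels(by_units)
```

Reordering levels with `fct_reorder()` is mostly useful before plotting, because ggplot2 draws categories in level order.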

Tidy Data Principles

Data is "tidy" when it follows three rules:

1. Each variable has its own column
2. Each observation has its own row
3. Each value has its own cell

# NOT tidy (wide format): the years are values of a variable,
# but here they are stored as column names
#   country  2020  2021  2022
#   USA      100   110   120
#   UK        80    85    90

# TIDY (long format): one row per country-year observation
#   country  year  value
#   USA      2020  100
#   USA      2021  110
#   USA      2022  120
#   UK       2020   80
#   UK       2021   85
#   UK       2022   90
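
The reshaping shown in the comments above can be sketched with `tidyr::pivot_longer()`. This is a minimal example that rebuilds the same six values by hand; the object names `wide` and `long` are chosen for illustration, and `names_transform` assumes tidyr 1.1.0 or later.

```r
library(tibble)
library(tidyr)

# The wide table from above; non-syntactic column names need backticks
wide <- tribble(
  ~country, ~`2020`, ~`2021`, ~`2022`,
  "USA",        100,     110,     120,
  "UK",          80,      85,      90
)

# Move the year columns into a single `year` column, one row per observation
long <- wide |>
  pivot_longer(
    cols = -country,                           # every column except country
    names_to = "year",                         # old column names go here
    values_to = "value",                       # old cell values go here
    names_transform = list(year = as.integer)  # store years as integers, not text
  )

long
```

`pivot_wider(long, names_from = year, values_from = value)` reverses the operation if a wide layout is needed for presentation.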

Tibbles vs Data Frames

Tibbles are the tidyverse's enhanced data frames. The code below shows three ways to create one, followed by the key differences from base data frames:

# Create a tibble
tb <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  score = c(92.5, 87.3, 95.1)
)

# Create row-by-row with tribble
tb2 <- tribble(
  ~name,     ~age, ~score,
  "Alice",    25,  92.5,
  "Bob",      30,  87.3,
  "Charlie",  35,  95.1
)

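# Convert a data frame to a tibble
# (row names are dropped unless you move them into a column with `rownames`)
as_tibble(mtcars, rownames = "model")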

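# Tibble advantages:
# - Never converts strings to factors (data.frame() did so by default before R 4.0)
# - Prints only the first 10 rows and the columns that fit on screen
# - Shows column types in output
# - Stricter subsetting: no partial name matching with $, and [ always returns a tibble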
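
The stricter subsetting is easiest to see side by side. The sketch below compares a base data frame with the equivalent tibble; the two-row data set and the object names are made up for illustration.

```r
library(tibble)

df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
tb <- as_tibble(df)

# Partial matching with $: the data frame silently matches "na" to "name"
df_partial <- df$na

# The tibble refuses: it returns NULL and warns about the unknown column
tb_partial <- suppressWarnings(tb$na)

# Selecting a single column with [
df_col <- df[, "age"]   # dropped to a plain numeric vector
tb_col <- tb[, "age"]   # still a one-column tibble

class(df_col)
class(tb_col)
```

To pull a single column out of a tibble as a vector, use `tb$age`, `tb[["age"]]`, or `dplyr::pull(tb, age)`.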

The Pipe Operator

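The pipe is central to tidyverse code. It passes the result of the expression on its left as the first argument to the function call on its right. The examples here use R's native pipe, |>, which requires R 4.1 or later; the magrittr pipe, %>%, which the tidyverse also provides, behaves the same way in simple chains like these: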

library(dplyr)

# Without pipe (nested, hard to read)
arrange(filter(select(mtcars, mpg, cyl, hp), cyl == 6), desc(mpg))

# With pipe (reads top-to-bottom, left-to-right)
mtcars |>
  select(mpg, cyl, hp) |>
  filter(cyl == 6) |>
  arrange(desc(mpg))

# Read as: "Take mtcars, THEN select columns,
#           THEN filter rows, THEN arrange."
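
As a quick check that the pipe only changes how the code reads, the sketch below builds the same result three ways (nested calls, the native pipe, and the magrittr pipe that dplyr re-exports) and compares them. The object names are chosen for illustration.

```r
library(dplyr)  # also makes the magrittr pipe %>% available

# Nested calls, evaluated inside-out
nested <- arrange(filter(select(mtcars, mpg, cyl, hp), cyl == 6), desc(mpg))

# Native pipe (R 4.1 or later)
native <- mtcars |>
  select(mpg, cyl, hp) |>
  filter(cyl == 6) |>
  arrange(desc(mpg))

# magrittr pipe
with_magrittr <- mtcars %>%
  select(mpg, cyl, hp) %>%
  filter(cyl == 6) %>%
  arrange(desc(mpg))

identical(nested, native)         # TRUE
identical(native, with_magrittr)  # TRUE
```

The two pipes differ in more advanced uses, such as placeholder syntax (`_` for the native pipe in R 4.2 or later, `.` for magrittr), but for first-argument chains like this one they are interchangeable.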

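Tip: In RStudio, press Ctrl+Shift+M (Windows/Linux) or Cmd+Shift+M (macOS) to insert a pipe. By default the shortcut inserts %>%; to insert |> instead, enable "Use native pipe operator, |>" under Tools > Global Options > Code.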
