Advanced

Best Practices

Write clean, maintainable R code with proper style, project organization, debugging, testing, and package management.

R Style Guide (Tidyverse Style)

The tidyverse style guide is the most widely followed style guide in the R community. Key rules:

# Naming: use snake_case for variables and functions
day_one <- 1          # Good
DayOne <- 1           # Bad (CamelCase)
day.one <- 1          # Bad (dots)

# Spacing: spaces around operators
x <- 1 + 2            # Good
x<-1+2               # Bad

# Use <- for assignment, not =
x <- 10               # Good
x = 10                # Avoid

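# Line length: max 80 characters

# Use the pipe for readability, one step per line.
# filter(), select(), and arrange() come from dplyr; df is assumed to be
# a data frame with age, name, and salary columns.
library(dplyr)

df |>
  filter(age > 30) |>
  select(name, salary) |>
  arrange(desc(salary))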
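
The pipeline above assumes dplyr is attached and that a data frame named df already exists. As a self-contained sketch of the same style rules (snake_case names, spaces around operators, one pipeline step per line), the following uses only base R. The employees data and the sort_by_salary() helper are made up for illustration, and the native pipe |> requires R 4.1 or later.

```r
# Hypothetical example data
employees <- data.frame(
  name = c("Ana", "Ben", "Cho", "Dev"),
  age = c(28, 35, 42, 31),
  salary = c(52000, 61000, 78000, 58000)
)

# Hypothetical helper: order rows from highest to lowest salary
sort_by_salary <- function(data) {
  data[order(data$salary, decreasing = TRUE), ]
}

# Keep employees over 30, keep two columns, then sort
top_earners <- employees |>
  subset(age > 30, select = c(name, salary)) |>
  sort_by_salary()

print(top_earners)
```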

Project Organization

Project Structure

my-project/
  my-project.Rproj     # RStudio project file
  R/                   # R scripts
    01-load-data.R
    02-clean-data.R
    03-analysis.R
    functions.R         # Helper functions
  data/
    raw/               # Original data (never modify)
    processed/         # Cleaned data
  output/
    figures/           # Plots
    tables/            # Result tables
  docs/
    report.Rmd         # R Markdown report
  renv.lock            # Package dependencies
  README.md
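
One way to create this folder skeleton from R is sketched below. It builds the layout under a temporary directory so it is safe to run as-is; project_root is an assumption, so point it at your real project folder instead.

```r
# Assumption: a throwaway location; replace with your own project path
project_root <- file.path(tempdir(), "my-project")

# Folders from the layout above, relative to the project root
folders <- c(
  "R",
  file.path("data", "raw"),
  file.path("data", "processed"),
  file.path("output", "figures"),
  file.path("output", "tables"),
  "docs"
)

# recursive = TRUE also creates missing parent folders such as data/
for (folder in folders) {
  dir.create(
    file.path(project_root, folder),
    recursive = TRUE,
    showWarnings = FALSE
  )
}

print(list.dirs(project_root, full.names = FALSE))
```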

RStudio Projects (.Rproj)

Always use RStudio Projects. They provide:

Automatic working directory set to the project root
Separate R history and workspace per project
Easy switching between projects
Integration with version control (Git)
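
Because the working directory is the project root, scripts can refer to files with relative paths instead of setwd() calls or absolute paths. A minimal sketch, assuming the folder layout shown earlier; survey.csv and survey_clean.csv are hypothetical file names.

```r
# Paths are relative to the project root, so they work on any machine
raw_file <- file.path("data", "raw", "survey.csv")
clean_file <- file.path("data", "processed", "survey_clean.csv")

if (file.exists(raw_file)) {
  survey <- read.csv(raw_file)
  # Assumes data/processed/ already exists
  write.csv(survey, clean_file, row.names = FALSE)
} else {
  message("Expected input not found: ", raw_file)
}
```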

Package Management with renv

# Initialize renv in a project
renv::init()

# Take a snapshot of current packages
renv::snapshot()

# Restore packages from lockfile (on another machine)
renv::restore()

# Update packages, then snapshot again to record the new versions
renv::update()
renv::snapshot()

Debugging

# Insert browser() to pause execution
my_function <- function(x) {
  result <- x * 2
  browser()  # Execution stops here; inspect variables
  result + 10
}

# View the call stack after an error
traceback()

# Debug a specific function
debug(my_function)    # Step through on next call
undebug(my_function)  # Stop debugging

# Print debugging
message("Processing row: ", i)  # Goes to stderr
cat("Value:", x, "\n")        # Goes to stdout
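
The print-debugging lines above refer to i and x without defining them. A small self-contained sketch of the same idea follows; process_rows() is a hypothetical function used only to show where the calls would sit.

```r
# Hypothetical function that sums its input and reports progress
process_rows <- function(values) {
  total <- 0
  for (i in seq_along(values)) {
    # message() writes to stderr, so it does not mix with real output
    message("Processing row: ", i)
    total <- total + values[i]
  }
  total
}

result <- process_rows(c(2, 4, 6))

# cat() writes to stdout
cat("Value:", result, "\n")
```

Progress messages can be silenced without editing the function by wrapping the call in suppressMessages().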

Testing with testthat

library(testthat)

# Basic tests
test_that("addition works", {
  expect_equal(1 + 1, 2)
  expect_true(2 > 1)
  expect_type(42, "double")
})

test_that("errors are caught", {
  expect_error(log("text"))
  expect_warning(log(-1))
})
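
In practice, tests target your own functions rather than built-ins. The sketch below tests a hypothetical helper, rescale_01(), which is not part of any package and is defined here only for illustration.

```r
library(testthat)

# Hypothetical helper: rescale a numeric vector onto the range [0, 1]
rescale_01 <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric")
  }
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("rescale_01 maps values onto [0, 1]", {
  expect_equal(rescale_01(c(0, 5, 10)), c(0, 0.5, 1))
  # Missing values stay missing instead of breaking the result
  expect_equal(rescale_01(c(1, NA, 3)), c(0, NA, 1))
})

test_that("rescale_01 rejects non-numeric input", {
  # The second argument is a regular expression matched against the message
  expect_error(rescale_01("a"), "must be numeric")
})
```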

Common R vs Python Differences

Concept	R	Python
Indexing	1-based	0-based
Assignment	`<-`	`=`
Boolean	`TRUE` / `FALSE`	`True` / `False`
NULL check	`is.null(x)`	`x is None`
Missing value	`NA`	`None` / `NaN`
String concat	`paste()`	`+` or f-strings
Package install	`install.packages()`	`pip install`
Data frame	`data.frame()`	`pd.DataFrame()`
Pipe	`|>` or `%>%`	Method chaining
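
A few of the R-side entries from the table, as a runnable sketch. The variable names are made up for illustration, and the Python equivalents appear only in comments.

```r
scores <- c(90, NA, 75)

# Indexing is 1-based: the first element is scores[1] (scores[0] in Python)
first_score <- scores[1]

# NA marks a missing value; is.na() finds it, na.rm = TRUE skips it
missing_flags <- is.na(scores)
mean_score <- mean(scores, na.rm = TRUE)

# NULL means "no object at all", which is different from NA
optional_setting <- NULL
setting_is_unset <- is.null(optional_setting)

# paste() joins strings with a space by default
label <- paste("Mean score:", mean_score)
print(label)
```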

Frequently Asked Questions

Should I use <- or = for assignment?

Use <- for assignment. It is the community standard and avoids confusion with function argument assignment, which uses =. For example, mean(x = 1:10) passes an argument and does not create a variable named x, while x <- mean(1:10) assigns the result to x.
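
A short sketch of that difference. The demo_assignment() wrapper is hypothetical and exists only so the check runs in a clean environment where no x has been defined yet.

```r
demo_assignment <- function() {
  # Here = names the argument; no variable called x is created
  avg <- mean(x = 1:10)
  x_exists_after_call <- exists("x", envir = environment(), inherits = FALSE)

  # Here <- creates a variable called x holding the result
  x <- mean(1:10)

  list(avg = avg, x_exists_after_call = x_exists_after_call, x = x)
}

demo <- demo_assignment()
print(demo)
```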

Should I choose R or Python?

Choose R for statistical analysis, data visualization, academic research, and biostatistics. Choose Python for production ML systems, web development, automation, and general-purpose programming. Many professionals use both.

How can I make my R code faster?

1) Use vectorized operations instead of loops.
2) Pre-allocate vectors instead of growing them.
3) Use data.table for large datasets.
4) Time code with system.time() and profile it with the profvis package.
5) Consider Rcpp for C++ integration in bottlenecks.
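
The first two tips and system.time() can be compared in a small sketch. The three function names are made up for illustration, and the timings depend on the machine, so no expected numbers are shown.

```r
values <- seq_len(1e4)

# Slow pattern: the vector is copied every time it grows
grow_squares <- function(values) {
  out <- c()
  for (v in values) {
    out <- c(out, v^2)
  }
  out
}

# Better: allocate the full result once, then fill it in
prealloc_squares <- function(values) {
  out <- numeric(length(values))
  for (i in seq_along(values)) {
    out[i] <- values[i]^2
  }
  out
}

# Best here: ^ is already vectorized, so no loop is needed
vectorized_squares <- function(values) {
  values^2
}

# system.time() reports user, system, and elapsed seconds
print(system.time(grown <- grow_squares(values)))
print(system.time(preallocated <- prealloc_squares(values)))
print(system.time(vectorized <- vectorized_squares(values)))
```

All three functions return the same values; only the running time differs.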

What is the tidyverse, and should I learn it?

The tidyverse is a collection of R packages (ggplot2, dplyr, tidyr, etc.) designed for data science. Yes, absolutely learn it. It provides a consistent, readable syntax that is widely used for data analysis in R. Our "R for Data Science" course covers it in depth.
