Best Practices
Write clean, maintainable R code with proper style, project organization, debugging, testing, and package management.
R Style Guide (Tidyverse Style)
The Tidyverse Style Guide is the community standard. Key rules:
    # Naming: use snake_case for variables and functions
    day_one <- 1   # Good
    DayOne <- 1    # Bad (CamelCase)
    day.one <- 1   # Bad (dots)

    # Spacing: put spaces around operators
    x <- 1 + 2     # Good
    x<-1+2         # Bad

    # Assignment: use <-, not =
    x <- 10        # Good
    x = 10         # Avoid

    # Line length: at most 80 characters

    # Pipes: chain steps for readability, one step per line
    # (filter(), select(), and arrange() come from dplyr; df is a data frame)
    library(dplyr)
    df |>
      filter(age > 30) |>
      select(name, salary) |>
      arrange(desc(salary))
Project Organization
    my-project/
        my-project.Rproj      # RStudio project file
        R/                    # R scripts
            01-load-data.R
            02-clean-data.R
            03-analysis.R
            functions.R       # Helper functions
        data/
            raw/              # Original data (never modify)
            processed/        # Cleaned data
        output/
            figures/          # Plots
            tables/           # Result tables
        docs/
            report.Rmd        # R Markdown report
        renv.lock             # Package dependencies
        README.md
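A layout like this can be scaffolded from R itself. The following is a minimal base-R sketch, not a prescribed workflow: the helper name `create_project_skeleton()` is hypothetical, and the folder names simply mirror the tree shown here. It writes into a temporary directory so that running it does not touch a real project.

```r
# Hypothetical helper: create the folder skeleton under `root`.
create_project_skeleton <- function(root) {
  dirs <- c(
    "R",
    file.path("data", "raw"),
    file.path("data", "processed"),
    file.path("output", "figures"),
    file.path("output", "tables"),
    "docs"
  )
  for (d in dirs) {
    # recursive = TRUE also creates parent folders such as data/ and output/
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }
  # Return the created paths invisibly so the caller can inspect them
  invisible(file.path(root, dirs))
}

# Build the skeleton in a throwaway location
project_root <- file.path(tempdir(), "my-project")
created <- create_project_skeleton(project_root)
print(basename(created))
```

Building paths with `file.path()` rather than pasting strings keeps the scripts portable across operating systems.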
RStudio Projects (.Rproj)
Always use RStudio Projects. They provide:
- Automatic working directory set to the project root
- Separate R history and workspace per project
- Easy switching between projects
- Integration with version control (Git)
Package Management with renv
    # Initialize renv in a project
    renv::init()

    # Take a snapshot of current packages
    renv::snapshot()

    # Restore packages from lockfile (on another machine)
    renv::restore()

    # Update packages
    renv::update()
Debugging
    # Insert browser() to pause execution
    my_function <- function(x) {
      result <- x * 2
      browser()  # Execution stops here; inspect variables
      result + 10
    }

    # View the call stack after an error
    traceback()

    # Debug a specific function
    debug(my_function)    # Step through on next call
    undebug(my_function)  # Stop debugging

    # Print debugging
    message("Processing row: ", i)  # Goes to stderr
    cat("Value:", x, "\n")          # Goes to stdout
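The stderr/stdout distinction in the print-debugging lines matters in practice, because it lets you separate diagnostics from results. The sketch below illustrates this in base R; the function `process_rows()` and its input are hypothetical and exist only for the demonstration.

```r
# Hypothetical function that logs progress and prints a result
process_rows <- function(values) {
  total <- 0
  for (i in seq_along(values)) {
    message("Processing row: ", i)  # diagnostic, written to stderr
    total <- total + values[i]
  }
  cat("Total:", total, "\n")        # result, written to stdout
  invisible(total)
}

# capture.output() collects stdout by default, so only the cat() line
# ends up in `captured`; the message() lines still go to stderr
captured <- capture.output(result <- process_rows(c(1, 2, 3)))

# suppressMessages() silences message() output without affecting cat()
quiet <- capture.output(suppressMessages(process_rows(c(1, 2, 3))))
```

Because `message()` output can be switched off with `suppressMessages()`, it is usually the better choice for temporary debugging output than `cat()` or `print()`.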
Testing with testthat
    library(testthat)

    # Basic tests
    test_that("addition works", {
      expect_equal(1 + 1, 2)
      expect_true(2 > 1)
      expect_type(42, "double")
    })

    test_that("errors are caught", {
      expect_error(log("text"))
      expect_warning(log(-1))
    })
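In practice, tests usually target your own functions rather than base R. As a sketch, assume a hypothetical helper `rescale01()` (not part of the material above) that maps a numeric vector onto the range 0 to 1. A test file for it might look like this:

```r
library(testthat)

# Hypothetical helper under test: rescale a numeric vector to [0, 1]
rescale01 <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric")
  }
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("rescale01 maps the minimum to 0 and the maximum to 1", {
  out <- rescale01(c(2, 4, 6))
  expect_equal(out, c(0, 0.5, 1))
  expect_type(out, "double")
})

test_that("rescale01 rejects non-numeric input", {
  # The second argument is a regular expression matched against the error message
  expect_error(rescale01("text"), "must be numeric")
})
```

Each `test_that()` block covers one behavior, so a failure points directly at what broke.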
Common R vs Python Differences
| Concept | R | Python |
|---|---|---|
| Indexing | 1-based | 0-based |
| Assignment | <- | = |
| Boolean | TRUE / FALSE | True / False |
| NULL check | is.null(x) | x is None |
| Missing value | NA | None / NaN |
| String concat | paste() | + or f-strings |
| Package install | install.packages() | pip install |
| Data frame | data.frame() | pd.DataFrame() |
| Pipe | \|> or %>% | Method chaining |
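Several of these rows tend to trip up people arriving from Python, so a short base-R sketch of the R column may help. The variable names are illustrative only.

```r
x <- c(10, 20, 30)

# Indexing is 1-based: the first element is x[1], not x[0]
first <- x[1]

# NULL means "no object"; NA means "a value exists but is missing"
null_check <- is.null(NULL)
na_check <- is.na(c(1, NA, 3))

# NA propagates through calculations unless it is removed explicitly
mean_with_na <- mean(c(1, NA, 3))
mean_without_na <- mean(c(1, NA, 3), na.rm = TRUE)

# paste() joins with a space by default; paste0() joins with no separator
greeting <- paste("Hello", "world")
label <- paste0("row_", 1:3)
```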
Frequently Asked Questions
Should I use <- or = for assignment?
Use <- for assignment. It is the community standard and avoids confusion with argument passing in function calls, which uses =. For example, mean(x = 1:10) passes 1:10 as the argument x and creates no variable, while x <- mean(1:10) stores the result in x.
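The difference between the two operators inside a function call can be checked directly. The sketch below wraps the check in a hypothetical function, `assignment_demo()`, so that it runs in a clean environment and does not depend on what is already defined in your session.

```r
# Hypothetical demo function; runs in its own environment
assignment_demo <- function() {
  env <- environment()

  # `=` inside a call only names the argument; no variable `x` is created
  avg_named <- mean(x = 1:10)
  x_after_equals <- exists("x", envir = env, inherits = FALSE)

  # `<-` inside a call creates `x` in the calling environment as a side effect
  avg_arrow <- mean(x <- 1:10)
  x_after_arrow <- exists("x", envir = env, inherits = FALSE)

  list(
    avg_named = avg_named,
    avg_arrow = avg_arrow,
    x_after_equals = x_after_equals,
    x_after_arrow = x_after_arrow
  )
}

demo <- assignment_demo()
str(demo)
```

Both calls compute the same mean; only the second leaves a variable behind, which is why mixing the operators up can cause subtle bugs.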
Should I learn R or Python?
Choose R for statistical analysis, data visualization, academic research, and biostatistics. Choose Python for production ML systems, web development, automation, and general-purpose programming. Many professionals use both.
How can I make my R code faster?
1) Use vectorized operations instead of loops.
2) Pre-allocate vectors instead of growing them.
3) Use data.table for large datasets.
4) Profile with system.time() or the profvis package.
5) Use Rcpp to move bottlenecks into C++.
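The first, second, and fourth tips can be compared side by side in base R. This is a minimal sketch; the three function names are hypothetical, and the timings depend on your machine, so no expected numbers are shown.

```r
n <- 10000

# Slow pattern: grow the vector one element at a time
grow_squares <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) {
    out <- c(out, i^2)  # copies the whole vector on every iteration
  }
  out
}

# Better: allocate the full vector once, then fill it in
prealloc_squares <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

# Best: let the vectorized operator do the loop internally
vectorized_squares <- function(n) {
  seq_len(n)^2
}

# All three approaches produce the same values
stopifnot(isTRUE(all.equal(grow_squares(n), prealloc_squares(n))))
stopifnot(isTRUE(all.equal(prealloc_squares(n), vectorized_squares(n))))

# system.time() reports user, system, and elapsed seconds
timing_grow <- system.time(grow_squares(n))
timing_prealloc <- system.time(prealloc_squares(n))
timing_vectorized <- system.time(vectorized_squares(n))
print(rbind(timing_grow, timing_prealloc, timing_vectorized)[, "elapsed"])
```

Checking that the fast version returns the same result as the slow one before timing it is a useful habit when optimizing.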
What is the tidyverse, and should I learn it?
The tidyverse is a collection of R packages (ggplot2, dplyr, tidyr, and others) designed for data science. Yes, it is worth learning: it provides a consistent, readable syntax that is widely used for data analysis in R. Our "R for Data Science" course covers it in depth.