Beginner

Introduction to R

Discover what R is, where it came from, why it's the language of choice for statisticians and data scientists, and write your very first R code.

What is R?

R is a programming language and environment designed specifically for statistical computing and graphics. It is one of the most widely used languages in academia, pharmaceutical research, finance, and any field that relies on rigorous data analysis and visualization.

Unlike general-purpose languages, R was built from the ground up for data. It provides an extensive collection of statistical and graphical techniques, including linear and nonlinear modeling, time-series analysis, classification, clustering, and much more.
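As a small illustration of the built-in modeling tools mentioned above, the sketch below fits a linear model using only what ships with a standard R installation. The choice of the bundled mtcars dataset and of the variables mpg and wt is an assumption made for this example, not something the lesson prescribes.

```r
# mtcars is a small example dataset bundled with R (datasets package)
# Fit a linear model: fuel economy (mpg) as a function of weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
coef(fit)

# Proportion of variance in mpg explained by wt
summary(fit)$r.squared
```

No extra packages are needed here: lm(), coef(), and summary() are all part of base R.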

💡
Good to know: R is free and open source, released under the GNU General Public License. It runs on Windows, macOS, and Linux.

History of R

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in 1993. The name "R" is partly a reference to the first initial shared by its two creators and partly a play on the name of S, the language developed at Bell Labs in the 1970s.

Key milestones:

  • 1993 — Initial development by Ihaka and Gentleman
  • 1995 — R released as open-source software
  • 1997 — R Core Team formed to guide development
  • 2000 — R 1.0.0 officially released
  • 2004 — First useR! conference
  • 2015 — R Consortium founded (with support from Microsoft, RStudio, and others)
  • 2021 — Native pipe operator |> introduced in R 4.1.0
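To make the last milestone concrete, here is a minimal sketch of the native pipe. It assumes R 4.1.0 or later, and the vector name values is made up for the example. The pipe passes the result on its left as the first argument of the function call on its right.

```r
# Requires R 4.1.0 or later (the native pipe does not exist in older versions)
values <- c(4, 9, 16)

# Nested calls are read from the inside out
nested <- round(mean(sqrt(values)), 2)

# The same computation with the native pipe reads left to right
piped <- values |> sqrt() |> mean() |> round(2)

# Both forms give the same result
identical(nested, piped)
```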

Why Learn R?

R has unique strengths that make it indispensable in certain domains:

  • Statistics: R was designed by statisticians for statisticians, and it offers one of the most comprehensive collections of statistical methods available in any programming language.
  • Data visualization: With ggplot2, R produces publication-quality graphics and gives fine-grained control over every element of a plot.
  • Academia & research: R is the standard in many academic disciplines, especially biostatistics, epidemiology, and social sciences.
  • Bioconductor: A massive repository of packages for genomics and computational biology, built on R.
  • Reproducible research: R Markdown and Quarto allow you to combine code, results, and narrative in a single document.
  • Community: Over 20,000 packages on CRAN, covering a very wide range of statistical methods and application areas.

R vs Python: A Quick Comparison

| Feature | R | Python |
|---|---|---|
| Primary strength | Statistical analysis & visualization | General-purpose programming |
| Data visualization | ggplot2 (grammar of graphics) | matplotlib, seaborn |
| Data wrangling | dplyr, tidyr (tidyverse) | pandas |
| Machine learning | tidymodels, caret, mlr3 | scikit-learn, TensorFlow, PyTorch |
| Indexing | 1-based | 0-based |
| Community focus | Statistics, academia, pharma | Web dev, ML engineering, DevOps |
| IDE | RStudio (dominant) | VS Code, PyCharm, Jupyter |

Key takeaway: R and Python are complementary. Many data professionals use both. R excels in statistical analysis and visualization; Python excels in production engineering and deep learning.
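The indexing row of the comparison is the difference most likely to trip up anyone coming from Python. The sketch below shows 1-based indexing on a small character vector. The vector name languages and its contents are made up for the example.

```r
# A character vector with three elements
languages <- c("R", "Python", "Julia")

# Indexing starts at 1, so this is the first element
languages[1]

# The last element sits at position length(languages)
languages[length(languages)]

# Index 0 does not raise an error; it returns an empty vector
languages[0]
```

In Python, index 0 would return the first element. In R it selects nothing, which is worth keeping in mind when translating code between the two languages.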

Where R is Used

R is used across many industries and domains:

  • Academia: Research papers, dissertations, and coursework in statistics, biology, psychology, and economics.
  • Pharmaceutical & healthcare: Clinical trial analysis, drug discovery, and regulatory submissions (the FDA does not require any particular statistical software, and R-based analyses have been included in submissions).
  • Finance: Risk modeling, quantitative analysis, time-series forecasting, and portfolio optimization.
  • Biotechnology: Genomics, proteomics, and bioinformatics through Bioconductor packages.
  • Government: Census analysis, public health reporting, and policy research.

Your First R Code: "Hello, World!"

Let's write the simplest possible R program:

R
# Your first R program
print("Hello, World!")

# At the console, R auto-prints the value of an expression, so print() is optional
"Hello from R!"

# Basic arithmetic
2 + 3       # 5
10 / 3     # 3.333333
2 ^ 10     # 1024

# Assign a variable
x <- 42
print(x)   # 42

The R Console and Scripts

You can interact with R in two main ways:

  • R Console (interactive): Type commands one at a time and see results immediately. Great for exploration and quick calculations.
  • R Scripts (.R files): Write multiple lines of code in a file and run them all at once. Essential for reproducible work.
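The difference between the two modes can be sketched from inside R itself. The example below writes a two-line script to a temporary file and runs it in one go with source(). The file contents and the variable names radius and area are assumptions made for illustration.

```r
# Create a small script in a temporary file (the location is arbitrary)
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "radius <- 2",
  "area <- pi * radius^2"
), script_path)

# source() runs every line of the script in order
source(script_path)

# Objects created by the script are now available in the session
area
```

From a terminal, a script saved under a name of your choice can also be run with the Rscript command that ships with R.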

R Console Session
> 1 + 1
[1] 2
> name <- "R Programming"
> nchar(name)
[1] 13
> toupper(name)
[1] "R PROGRAMMING"

📚
About the [1]: The [1] at the start of a line of output is the index of the first element shown on that line. Even a single value is a vector of length one in R, so it is labeled [1]. R is 1-indexed, meaning counting starts from 1, not 0.
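The bracketed index is easier to understand once the output spans several lines. In the sketch below, the console width of 60 and the vector x are arbitrary choices made so that the output wraps. The exact line breaks depend on the width setting.

```r
# Narrow the console temporarily so the output wraps onto several lines
old <- options(width = 60)

# A vector of 30 integers, from 101 to 130
x <- 101:130
print(x)

# Restore the previous width setting
options(old)
```

Each printed line starts with a bracketed number giving the position of the first element on that line, which makes it easy to locate a value in long output.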