Introduction

This tutorial demonstrates how to perform self-contained gene set testing using three related methods from the limma package:

  • ROAST: Rotation gene set test for a single gene set
  • MROAST: Multiple ROAST tests with multiple testing correction
  • FRY: Fast approximation to MROAST with infinite rotations

These methods test whether genes in a pathway are differentially expressed, comparing against a null hypothesis of no differential expression. This is different from competitive tests like CAMERA, which test whether genes in a set are more differentially expressed than genes outside the set.

We’ll use the ALL (Acute Lymphoblastic Leukemia) dataset to compare B-cell vs T-cell ALL samples and identify enriched KEGG pathways.


1. Load Required Packages

We need several Bioconductor packages for this analysis:

  • limma: For differential expression and gene set testing
  • ALL: The dataset we’ll analyze
  • hgu95av2.db: Annotation for the microarray platform
  • org.Hs.eg.db: Human gene annotations
  • clusterProfiler: For downloading KEGG pathway information
library(limma)
library(ALL)
library(hgu95av2.db)
library(org.Hs.eg.db)
library(clusterProfiler)

2. Load and Prepare the ALL Dataset

The ALL dataset contains gene expression measurements from acute lymphoblastic leukemia patients. We’ll compare B-cell and T-cell samples to identify differentially expressed pathways.

# Load the ALL dataset
data(ALL)

# Select B-cell and T-cell samples
bcell_samples <- which(ALL$BT %in% c("B", "B1", "B2", "B3", "B4"))
tcell_samples <- which(ALL$BT == "T")

# Create a balanced subset
samples_to_keep <- c(bcell_samples[1:10], tcell_samples[1:5])
ALL_subset <- ALL[, samples_to_keep]

# Extract expression data
exprs_data <- exprs(ALL_subset)

# Create design matrix
cell_type <- factor(ifelse(ALL_subset$BT == "T", "T", "B"))
design <- model.matrix(~0 + cell_type)
colnames(design) <- c("B_cell", "T_cell")

print("Sample composition:")
## [1] "Sample composition:"
print(table(cell_type))
## cell_type
##  B  T 
## 10  5

The design matrix uses a group-means parameterization (no intercept), which makes it easier to specify contrasts between groups.


3. Differential Expression Analysis

We use limma to identify differentially expressed genes between T-cell and B-cell samples. This analysis provides the statistical foundation for gene set testing.

# Fit linear model
fit <- lmFit(exprs_data, design)

# Define contrast: T-cell vs B-cell
contrast_matrix <- makeContrasts(
  TvsB = T_cell - B_cell,
  levels = design
)

# Fit contrasts and apply empirical Bayes
fit2 <- contrasts.fit(fit, contrast_matrix)
fit2 <- eBayes(fit2)

# View top genes
top_genes <- topTable(fit2, coef = "TvsB", number = 10)
print("Top 10 differentially expressed genes:")
## [1] "Top 10 differentially expressed genes:"
print(top_genes[, c("logFC", "AveExpr", "t", "P.Value", "adj.P.Val")])
##                logFC   AveExpr          t      P.Value    adj.P.Val
## 38319_at    4.483282  6.385744  13.829293 3.094569e-10 3.906894e-06
## 38147_at    2.726426  4.519145  12.825759 9.269009e-10 5.851062e-06
## 2031_s_at  -3.292335  8.801557 -11.095824 7.347931e-09 3.092254e-05
## 37039_at   -2.678274 11.234888  -9.753845 4.397852e-08 1.388072e-04
## 41723_s_at -3.354824  8.195279  -8.804350 1.746216e-07 4.409197e-04
## 41409_at   -1.954163  7.165671  -8.589302 2.420616e-07 4.421542e-04
## 2059_s_at   2.228525  7.176821   8.581013 2.451548e-07 4.421542e-04
## 38095_i_at -3.272734 10.384079  -8.465467 2.928754e-07 4.621940e-04
## 266_s_at   -3.522492  7.100962  -8.141603 4.863535e-07 5.729679e-04
## 32506_at   -1.204799  6.644397  -8.103231 5.169197e-07 5.729679e-04

The eBayes function computes moderated t-statistics by borrowing information across genes, which improves statistical power especially for small sample sizes.


4. Prepare KEGG Pathway Gene Sets

Gene set testing requires mapping our microarray probes to gene identifiers and then to pathway annotations. We use KEGG pathways as our gene sets.

4.1 Map Probes to Entrez Gene IDs

# Get probe to Entrez Gene ID mapping
probe_to_entrez <- as.list(hgu95av2ENTREZID)
probe_to_entrez <- probe_to_entrez[!is.na(probe_to_entrez)]

# Map our probes to Entrez IDs
probe_ids <- rownames(exprs_data)
entrez_mapping <- unlist(probe_to_entrez[probe_ids])
entrez_mapping <- entrez_mapping[!is.na(entrez_mapping)]

print(paste("Number of probes mapped to Entrez IDs:", length(entrez_mapping)))
## [1] "Number of probes mapped to Entrez IDs: 11683"

4.2 Get KEGG Pathway Annotations

We use the org.Hs.egPATH object which maps Entrez Gene IDs to KEGG pathway IDs.

# Get gene-to-pathway mapping
pathway_map <- org.Hs.egPATH
mapped_genes <- mappedkeys(pathway_map)
gene_to_pathway <- as.list(pathway_map[mapped_genes])

# Invert to create pathway-to-genes mapping
pathway_to_genes <- list()
for (gene_id in names(gene_to_pathway)) {
  pathways <- gene_to_pathway[[gene_id]]
  for (pathway_id in pathways) {
    if (is.null(pathway_to_genes[[pathway_id]])) {
      pathway_to_genes[[pathway_id]] <- character(0)
    }
    pathway_to_genes[[pathway_id]] <- c(pathway_to_genes[[pathway_id]], gene_id)
  }
}

print(paste("Total KEGG pathways:", length(pathway_to_genes)))
## [1] "Total KEGG pathways: 229"

4.3 Create Gene Set Indices

ROAST, MROAST, and FRY require gene sets as lists of row indices in the expression matrix.

create_gene_set_indices <- function(pathway_list, gene_mapping, min_genes = 10) {
  pathway_indices <- list()
  
  for (pathway_id in names(pathway_list)) {
    pathway_genes <- pathway_list[[pathway_id]]
    probes_in_pathway <- names(gene_mapping)[gene_mapping %in% pathway_genes]
    indices <- which(rownames(exprs_data) %in% probes_in_pathway)
    
    # Only keep pathways with sufficient genes
    if (length(indices) >= min_genes) {
      pathway_indices[[pathway_id]] <- indices
    }
  }
  
  return(pathway_indices)
}

gene_set_indices <- create_gene_set_indices(pathway_to_genes, entrez_mapping)

print(paste("Pathways with >=10 genes:", length(gene_set_indices)))
## [1] "Pathways with >=10 genes: 210"

We filter out pathways with fewer than 10 genes because very small gene sets have limited statistical power and biological interpretability.


5. Download KEGG Pathway Descriptions

To make results interpretable, we need pathway names and descriptions.

# Download KEGG data using clusterProfiler
kegg_info <- download_KEGG("hsa")

# Create pathway ID to name mapping
pathway_names <- setNames(
  kegg_info$KEGGPATHID2NAME$to, 
  kegg_info$KEGGPATHID2NAME$from
)

print(paste("Pathway names available:", length(pathway_names)))
## [1] "Pathway names available: 368"
print("Example pathways:")
## [1] "Example pathways:"
print(head(pathway_names, 3))
##                          hsa01100                          hsa01200 
##              "Metabolic pathways"               "Carbon metabolism" 
##                          hsa01210 
## "2-Oxocarboxylic acid metabolism"
# Helper function to add descriptions to results
add_pathway_descriptions <- function(results, pathway_ids_col = NULL) {
  if (is.null(pathway_ids_col)) {
    pathway_ids <- rownames(results)
  } else {
    pathway_ids <- results[[pathway_ids_col]]
  }
  
  # Pad pathway IDs to match KEGG format (e.g., "810" -> "hsa00810")
  pathway_ids_padded <- sprintf("%05s", pathway_ids)
  pathway_ids_formatted <- paste0("hsa", pathway_ids_padded)
  
  results$Description <- pathway_names[pathway_ids_formatted]
  results$Description[is.na(results$Description)] <- NA
  
  return(results)
}

6. ROAST: Single Gene Set Test

ROAST (Rotation gene Set Test) tests a single gene set using rotation-based p-values. It’s ideal when you have a specific pathway hypothesis to test.

ROAST estimates p-values by randomly rotating orthogonalized residuals, which preserves the correlation structure while breaking the association with the outcome.

# Test the first pathway as an example
test_pathway_id <- names(gene_set_indices)[1]
test_pathway_indices <- gene_set_indices[[test_pathway_id]]

# Get pathway description
pathway_id_formatted <- paste0("hsa", sprintf("%05s", test_pathway_id))
pathway_description <- pathway_names[pathway_id_formatted]

cat("Testing pathway:", test_pathway_id, "\n")
## Testing pathway: 04610
cat("Description:", pathway_description, "\n")
## Description: Complement and coagulation cascades
cat("Number of genes:", length(test_pathway_indices), "\n\n")
## Number of genes: 73
# Run ROAST with high number of rotations
roast_result <- roast(
  y = exprs_data,
  index = test_pathway_indices,
  design = design,
  contrast = contrast_matrix[, "TvsB"],
  nrot = 9999
)

print(roast_result)
##          Active.Prop     P.Value
## Down      0.04109589 0.996699835
## Up        0.34246575 0.003350168
## UpOrDown  0.34246575 0.006700000
## Mixed     0.38356164 0.002300000

Interpreting ROAST output:

  • Down: Tests if genes tend to be down-regulated (negative t-statistics)
  • Up: Tests if genes tend to be up-regulated (positive t-statistics)
  • UpOrDown: Two-sided test for any directional change
  • Mixed: Tests for differential expression in either direction
  • Active.Prop: Proportion of genes contributing materially to significance

The number of rotations (nrot=9999) should be high for publication-quality results. The minimum possible p-value is 1/(nrot+1).


7. MROAST: Multiple Gene Set Tests

MROAST extends ROAST to test multiple gene sets simultaneously with appropriate multiple testing correction.

MROAST is suitable when testing a moderate number of pathways (up to ~100). For larger numbers, FRY is more efficient.

# Test a subset of pathways (first 50)
n_pathways_test <- min(50, length(gene_set_indices))
pathways_to_test <- gene_set_indices[1:n_pathways_test]

cat("Testing", n_pathways_test, "KEGG pathways with MROAST...\n")
## Testing 50 KEGG pathways with MROAST...
cat("This may take a few minutes...\n\n")
## This may take a few minutes...
# Run MROAST
mroast_results <- mroast(
  y = exprs_data,
  index = pathways_to_test,
  design = design,
  contrast = contrast_matrix[, "TvsB"],
  nrot = 9999,
  adjust.method = "BH",
  sort = "directional"
)

# Add pathway descriptions
mroast_results <- add_pathway_descriptions(mroast_results)

# Display top results
print("Top 15 pathways from MROAST:")
## [1] "Top 15 pathways from MROAST:"
print(mroast_results[1:15, c("NGenes", "PropDown", "PropUp", 
                              "Direction", "PValue", "FDR", "Description")])
##       NGenes   PropDown    PropUp Direction PValue        FDR
## 04146     61 0.04918033 0.3278689        Up 0.0003 0.01250000
## 00650     31 0.00000000 0.4193548        Up 0.0007 0.01416667
## 00410     23 0.00000000 0.4782609        Up 0.0009 0.01416667
## 00592     11 0.00000000 0.4545455        Up 0.0013 0.01458333
## 00640     29 0.00000000 0.4482759        Up 0.0017 0.01458333
## 05131    100 0.27000000 0.0500000      Down 0.0018 0.01458333
## 00071     43 0.02325581 0.4418605        Up 0.0024 0.01678571
## 00250     33 0.03030303 0.3636364        Up 0.0028 0.01718750
## 00340     22 0.04545455 0.4545455        Up 0.0034 0.01861111
## 00280     45 0.00000000 0.4444444        Up 0.0056 0.02410714
## 01100    966 0.09420290 0.2826087        Up 0.0060 0.02410714
## 04610     73 0.04109589 0.3424658        Up 0.0063 0.02410714
## 00380     45 0.00000000 0.3777778        Up 0.0064 0.02410714
## 00620     39 0.07692308 0.3076923        Up 0.0068 0.02410714
## 00740     12 0.08333333 0.4166667        Up 0.0096 0.03183333
##                                       Description
## 04146                                  Peroxisome
## 00650                        Butanoate metabolism
## 00410                     beta-Alanine metabolism
## 00592             alpha-Linolenic acid metabolism
## 00640                       Propanoate metabolism
## 05131                                 Shigellosis
## 00071                      Fatty acid degradation
## 00250 Alanine, aspartate and glutamate metabolism
## 00340                        Histidine metabolism
## 00280  Valine, leucine and isoleucine degradation
## 01100                          Metabolic pathways
## 04610         Complement and coagulation cascades
## 00380                       Tryptophan metabolism
## 00620                         Pyruvate metabolism
## 00740                       Riboflavin metabolism

Interpreting MROAST columns:

  • NGenes: Number of genes in the pathway
  • PropDown: Proportion of genes with z-score < -√2
  • PropUp: Proportion of genes with z-score > √2
  • Direction: Overall direction of enrichment (“Up” or “Down”)
  • PValue: Two-sided directional p-value
  • FDR: False discovery rate (Benjamini-Hochberg correction)
  • PValue.Mixed: Non-directional p-value
  • FDR.Mixed: FDR for non-directional test

8. FRY: Fast Gene Set Testing

FRY is a fast approximation that simulates what MROAST would give with an infinite number of rotations. It’s ideal for testing large numbers of pathways.

FRY can test thousands of pathways in seconds, making it suitable for genome-wide pathway screening.

cat("Testing all", length(gene_set_indices), "KEGG pathways with FRY...\n\n")
## Testing all 210 KEGG pathways with FRY...
# Run FRY on all pathways
fry_results <- fry(
  y = exprs_data,
  index = gene_set_indices,
  design = design,
  contrast = contrast_matrix[, "TvsB"],
  sort = "directional"
)

# Add pathway descriptions
fry_results <- add_pathway_descriptions(fry_results)

# Display top results
print("Top 15 pathways from FRY:")
## [1] "Top 15 pathways from FRY:"
print(fry_results[1:15, c("NGenes", "Direction", "PValue", "FDR", "Description")])
##       NGenes Direction       PValue          FDR
## 00100     17        Up 4.680105e-06 0.0009828221
## 05310     35      Down 1.388330e-04 0.0096607672
## 04146     61        Up 1.713447e-04 0.0096607672
## 05143     52        Up 2.116427e-04 0.0096607672
## 00600     28        Up 2.300183e-04 0.0096607672
## 00563     16        Up 3.616102e-04 0.0122243671
## 00500     30        Up 4.074789e-04 0.0122243671
## 00650     31        Up 5.058900e-04 0.0128557869
## 00410     23        Up 5.509623e-04 0.0128557869
## 00053     12        Up 6.930409e-04 0.0145538593
## 00360     18        Up 9.434818e-04 0.0167869769
## 00561     34        Up 9.652182e-04 0.0167869769
## 00592     11        Up 1.085564e-03 0.0167869769
## 00510     38        Up 1.186483e-03 0.0167869769
## 04070    103        Up 1.201969e-03 0.0167869769
##                                                  Description
## 00100                                   Steroid biosynthesis
## 05310                                                 Asthma
## 04146                                             Peroxisome
## 05143                                African trypanosomiasis
## 00600                                Sphingolipid metabolism
## 00563 Glycosylphosphatidylinositol (GPI)-anchor biosynthesis
## 00500                          Starch and sucrose metabolism
## 00650                                   Butanoate metabolism
## 00410                                beta-Alanine metabolism
## 00053                      Ascorbate and aldarate metabolism
## 00360                               Phenylalanine metabolism
## 00561                                Glycerolipid metabolism
## 00592                        alpha-Linolenic acid metabolism
## 00510                                  N-Glycan biosynthesis
## 04070                  Phosphatidylinositol signaling system

Key advantages of FRY:

  • Very fast - can test entire pathway databases
  • High-resolution p-values not limited by number of rotations
  • Directional p-values approximate MROAST with infinite rotations
  • When df.prior is large and set.statistic="mean", FRY gives exactly the same directional p-values as infinite MROAST

9. Compare MROAST and FRY Results

Let’s compare the results from MROAST and FRY to see how well FRY approximates MROAST.

# Find common pathways tested by both methods
common_pathways <- intersect(rownames(mroast_results), rownames(fry_results))

comparison <- data.frame(
  PathwayID = common_pathways,
  MROAST_PValue = mroast_results[common_pathways, "PValue"],
  FRY_PValue = fry_results[common_pathways, "PValue"],
  MROAST_FDR = mroast_results[common_pathways, "FDR"],
  FRY_FDR = fry_results[common_pathways, "FDR"],
  Description = fry_results[common_pathways, "Description"]
)

# Sort by FRY p-value
comparison_sorted <- comparison[order(comparison$FRY_PValue), ]

print("Comparison of top 10 pathways:")
## [1] "Comparison of top 10 pathways:"
print(head(comparison_sorted, 10))
##    PathwayID MROAST_PValue   FRY_PValue MROAST_FDR     FRY_FDR
## 1      04146        0.0003 0.0001713447 0.01250000 0.009660767
## 2      00650        0.0007 0.0005058900 0.01416667 0.012855787
## 3      00410        0.0009 0.0005509623 0.01416667 0.012855787
## 4      00592        0.0013 0.0010855638 0.01458333 0.016786977
## 6      05131        0.0018 0.0016524321 0.01458333 0.019278374
## 5      00640        0.0017 0.0017941372 0.01458333 0.019657027
## 7      00071        0.0024 0.0026166463 0.01678571 0.025517554
## 9      00340        0.0034 0.0034728424 0.01861111 0.025517554
## 8      00250        0.0028 0.0040090375 0.01718750 0.027157996
## 14     00620        0.0068 0.0070212020 0.02410714 0.042870023
##                                    Description
## 1                                   Peroxisome
## 2                         Butanoate metabolism
## 3                      beta-Alanine metabolism
## 4              alpha-Linolenic acid metabolism
## 6                                  Shigellosis
## 5                        Propanoate metabolism
## 7                       Fatty acid degradation
## 9                         Histidine metabolism
## 8  Alanine, aspartate and glutamate metabolism
## 14                         Pyruvate metabolism
# Calculate correlation
cor_pval <- cor(comparison$MROAST_PValue, comparison$FRY_PValue, 
                method = "spearman")
cat("\nSpearman correlation between MROAST and FRY p-values:", 
    round(cor_pval, 3), "\n")
## 
## Spearman correlation between MROAST and FRY p-values: 0.991

A high correlation indicates that FRY provides a good approximation to MROAST results while being much faster.


10. Visualize Results

Visual comparisons help understand the agreement between methods and the biological findings.

par(mfrow = c(2, 2), mar = c(8, 4, 4, 2))

# 1. Top pathways from MROAST
top_mroast <- head(mroast_results, 15)
barplot(
  -log10(top_mroast$FDR),
  names.arg = rownames(top_mroast),
  las = 2,
  main = "MROAST: Top 15 Pathways",
  ylab = "-log10(FDR)",
  col = ifelse(top_mroast$Direction == "Up", "steelblue", "coral"),
  cex.names = 0.6
)
abline(h = -log10(0.05), lty = 2, col = "red")
legend("topright", legend = c("Up", "Down", "FDR = 0.05"), 
       fill = c("steelblue", "coral", NA), 
       border = c("black", "black", NA),
       lty = c(NA, NA, 2), col = c(NA, NA, "red"))

# 2. Top pathways from FRY
top_fry <- head(fry_results, 15)
barplot(
  -log10(top_fry$FDR),
  names.arg = rownames(top_fry),
  las = 2,
  main = "FRY: Top 15 Pathways",
  ylab = "-log10(FDR)",
  col = ifelse(top_fry$Direction == "Up", "steelblue", "coral"),
  cex.names = 0.6
)
abline(h = -log10(0.05), lty = 2, col = "red")

# 3. P-value comparison scatter plot
plot(
  -log10(comparison$MROAST_PValue),
  -log10(comparison$FRY_PValue),
  pch = 16,
  col = rgb(0, 0, 0, 0.3),
  xlab = "-log10(MROAST P-value)",
  ylab = "-log10(FRY P-value)",
  main = "MROAST vs FRY P-values"
)
abline(0, 1, col = "red", lty = 2)
text(x = max(-log10(comparison$MROAST_PValue)) * 0.7,
     y = max(-log10(comparison$FRY_PValue)) * 0.1,
     labels = paste("Correlation =", round(cor_pval, 3)),
     col = "blue")

# 4. Distribution of pathway sizes
hist(
  fry_results$NGenes,
  breaks = 30,
  main = "Distribution of Pathway Sizes",
  xlab = "Number of Genes",
  col = "lightblue",
  border = "white"
)

par(mfrow = c(1, 1))

11. Summary of Significant Pathways

Let’s summarize how many pathways are significantly enriched at FDR < 0.05.

# MROAST significant pathways
mroast_sig <- mroast_results[mroast_results$FDR < 0.05, ]
cat("MROAST - Significant pathways (FDR < 0.05):", nrow(mroast_sig), "\n")
## MROAST - Significant pathways (FDR < 0.05): 16
cat("  Up-regulated:", sum(mroast_sig$Direction == "Up"), "\n")
##   Up-regulated: 15
cat("  Down-regulated:", sum(mroast_sig$Direction == "Down"), "\n\n")
##   Down-regulated: 1
# FRY significant pathways
fry_sig <- fry_results[fry_results$FDR < 0.05, ]
cat("FRY - Significant pathways (FDR < 0.05):", nrow(fry_sig), "\n")
## FRY - Significant pathways (FDR < 0.05): 39
cat("  Up-regulated:", sum(fry_sig$Direction == "Up"), "\n")
##   Up-regulated: 34
cat("  Down-regulated:", sum(fry_sig$Direction == "Down"), "\n\n")
##   Down-regulated: 5
# Overlap between methods
sig_overlap <- intersect(rownames(mroast_sig), rownames(fry_sig))
cat("Pathways significant in both methods:", length(sig_overlap), "\n\n")
## Pathways significant in both methods: 14
# Show top significant pathways from FRY
cat("Top 10 significant pathways (FRY):\n")
## Top 10 significant pathways (FRY):
print(head(fry_sig[, c("NGenes", "Direction", "PValue", "FDR", "Description")], 10))
##       NGenes Direction       PValue          FDR
## 00100     17        Up 4.680105e-06 0.0009828221
## 05310     35      Down 1.388330e-04 0.0096607672
## 04146     61        Up 1.713447e-04 0.0096607672
## 05143     52        Up 2.116427e-04 0.0096607672
## 00600     28        Up 2.300183e-04 0.0096607672
## 00563     16        Up 3.616102e-04 0.0122243671
## 00500     30        Up 4.074789e-04 0.0122243671
## 00650     31        Up 5.058900e-04 0.0128557869
## 00410     23        Up 5.509623e-04 0.0128557869
## 00053     12        Up 6.930409e-04 0.0145538593
##                                                  Description
## 00100                                   Steroid biosynthesis
## 05310                                                 Asthma
## 04146                                             Peroxisome
## 05143                                African trypanosomiasis
## 00600                                Sphingolipid metabolism
## 00563 Glycosylphosphatidylinositol (GPI)-anchor biosynthesis
## 00500                          Starch and sucrose metabolism
## 00650                                   Butanoate metabolism
## 00410                                beta-Alanine metabolism
## 00053                      Ascorbate and aldarate metabolism

12. Advanced Example: Using Gene Weights

Gene weights allow you to specify directional expectations for genes in a pathway based on prior knowledge or experimental evidence.

What are Gene Weights?

Gene weights are directional values (positive or negative) that indicate: - +1: Gene expected to be up-regulated - -1: Gene expected to be down-regulated - Other values: Weighted contributions

When you specify gene weights, a significant “Up” p-value means the observed changes are positively correlated with the weights (genes are changing in the expected direction). A significant “Down” p-value means changes are negatively correlated with the weights.

# Create weighted gene set based on differential expression
# (In practice, weights would come from prior knowledge)
top_de_genes <- topTable(fit2, coef = "TvsB", number = 100, sort.by = "none")
gene_weights <- sign(top_de_genes$logFC)

# Get indices of these genes
top_gene_probes <- rownames(top_de_genes)
weighted_indices <- which(rownames(exprs_data) %in% top_gene_probes)

# Create data.frame with gene indices and weights
weighted_set <- data.frame(
  Gene = weighted_indices,
  Weight = gene_weights
)

cat("Testing weighted gene set with ROAST:\n\n")
## Testing weighted gene set with ROAST:
roast_weighted <- roast(
  y = exprs_data,
  index = weighted_set,
  design = design,
  contrast = contrast_matrix[, "TvsB"],
  nrot = 9999
)

print(roast_weighted)
##      NGenes PropDown PropUp Direction PValue   FDR PValue.Mixed FDR.Mixed
## set1    100        0   0.38        Up  1e-04 1e-04       0.0215    0.0215

Interpretation with gene weights:

  • Significant “Up”: Genes are changing in the direction specified by weights (positive correlation)
  • Significant “Down”: Genes are changing opposite to weights (negative correlation)
  • This is useful for testing whether your data matches a known expression signature

13. Key Differences Between Methods

Summary Table

Feature ROAST MROAST FRY
Number of sets Single Multiple Multiple
Speed Slow Moderate Fast
P-value method Rotation-based Rotation-based Analytical approximation
Rotations needed Many (9999+) Many (9999+) None
Best use case Focused hypothesis Moderate # pathways Large # pathways
PropDown/PropUp No Yes No
Active.Prop Yes No No

When to Use Each Method

Use ROAST when:

  • Testing a single specific pathway hypothesis
  • You want detailed output including Active.Prop
  • You have time for rotation-based testing

Use MROAST when:

  • Testing a moderate number of pathways (~10-100)
  • You want PropDown and PropUp statistics
  • You need rotation-based exact p-values

Use FRY when:

  • Testing a large number of pathways (hundreds to thousands)
  • Speed is important
  • You need high-resolution p-values
  • Screening entire pathway databases

Self-Contained vs Competitive Tests

All three methods (ROAST, MROAST, FRY) are self-contained tests: - Test whether genes in a set are differentially expressed - Compare against a null hypothesis of NO differential expression - Answer: “Are these genes DE?”

In contrast, CAMERA is a competitive test: - Tests whether genes in a set are MORE differentially expressed than genes outside the set - Compare against other genes in the genome - Answer: “Are these genes MORE DE than others?”


14. Conclusions

In this tutorial, we demonstrated:

  1. How to prepare gene sets from KEGG pathways
  2. Single pathway testing with ROAST
  3. Multiple pathway testing with MROAST
  4. Fast pathway screening with FRY
  5. Comparison between methods showing high agreement
  6. Use of gene weights for directional testing

Key Takeaways

  • FRY provides an excellent fast approximation to MROAST
  • Use high nrot values (≥9999) for publication-quality ROAST/MROAST results
  • Gene weights enable testing of directional hypotheses
  • All three methods are self-contained tests (vs. CAMERA’s competitive test)
  • Choose the method based on number of pathways and computational time available

Further Reading

  • Wu et al. (2010). ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 26:2176-2182
  • Goeman & Bühlmann (2007). Analyzing gene expression data in terms of gene sets. Bioinformatics 23:980-987
  • Limma User’s Guide

Session Information

sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.0.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] clusterProfiler_4.16.0 hgu95av2.db_3.13.0     org.Hs.eg.db_3.21.0   
##  [4] AnnotationDbi_1.70.0   IRanges_2.42.0         S4Vectors_0.46.0      
##  [7] ALL_1.50.0             Biobase_2.68.0         BiocGenerics_0.55.1   
## [10] generics_0.1.4         limma_3.64.3          
## 
## loaded via a namespace (and not attached):
##  [1] DBI_1.2.3               gson_0.1.0              rlang_1.1.6            
##  [4] magrittr_2.0.3          DOSE_4.2.0              compiler_4.5.1         
##  [7] RSQLite_2.4.3           png_0.1-8               vctrs_0.6.5            
## [10] reshape2_1.4.4          stringr_1.5.2           pkgconfig_2.0.3        
## [13] crayon_1.5.3            fastmap_1.2.0           XVector_0.48.0         
## [16] rmarkdown_2.29          enrichplot_1.28.4       UCSC.utils_1.4.0       
## [19] purrr_1.1.0             bit_4.6.0               xfun_0.53              
## [22] cachem_1.1.0            aplot_0.2.8             GenomeInfoDb_1.44.2    
## [25] jsonlite_2.0.0          blob_1.2.4              BiocParallel_1.42.1    
## [28] parallel_4.5.1          R6_2.6.1                bslib_0.9.0            
## [31] stringi_1.8.7           RColorBrewer_1.1-3      jquerylib_0.1.4        
## [34] GOSemSim_2.34.0         Rcpp_1.1.0              knitr_1.50             
## [37] ggtangle_0.0.7          R.utils_2.13.0          Matrix_1.7-3           
## [40] splines_4.5.1           igraph_2.1.4            tidyselect_1.2.1       
## [43] qvalue_2.40.0           rstudioapi_0.17.1       dichromat_2.0-0.1      
## [46] yaml_2.3.10             codetools_0.2-20        lattice_0.22-7         
## [49] tibble_3.3.0            plyr_1.8.9              treeio_1.32.0          
## [52] KEGGREST_1.48.1         evaluate_1.0.5          gridGraphics_0.5-1     
## [55] Biostrings_2.76.0       pillar_1.11.0           ggtree_3.16.3          
## [58] ggfun_0.2.0             ggplot2_3.5.2           scales_1.4.0           
## [61] tidytree_0.4.6          glue_1.8.0              lazyeval_0.2.2         
## [64] tools_4.5.1             data.table_1.17.8       fgsea_1.34.2           
## [67] fs_1.6.6                fastmatch_1.1-6         cowplot_1.2.0          
## [70] grid_4.5.1              tidyr_1.3.1             ape_5.8-1              
## [73] colorspace_2.1-1        nlme_3.1-168            GenomeInfoDbData_1.2.14
## [76] patchwork_1.3.2         cli_3.6.5               rappdirs_0.3.3         
## [79] dplyr_1.1.4             gtable_0.3.6            R.methodsS3_1.8.2      
## [82] yulab.utils_0.2.1       sass_0.4.10             digest_0.6.37          
## [85] ggrepel_0.9.6           ggplotify_0.1.2         farver_2.1.2           
## [88] memoise_2.0.1           htmltools_0.5.8.1       R.oo_1.27.1            
## [91] lifecycle_1.0.4         httr_1.4.7              GO.db_3.21.0           
## [94] statmod_1.5.0           bit64_4.6.0-1