Introduction

This tutorial demonstrates how to perform self-contained gene set testing using three related methods from the limma package:

ROAST: Rotation gene set test for a single gene set
MROAST: Multiple ROAST tests with multiple testing correction
FRY: Fast approximation to MROAST with infinite rotations

These methods test whether genes in a pathway are differentially expressed, comparing against a null hypothesis of no differential expression. This is different from competitive tests like CAMERA, which test whether genes in a set are more differentially expressed than genes outside the set.

We’ll use the ALL (Acute Lymphoblastic Leukemia) dataset to compare B-cell vs T-cell ALL samples and identify enriched KEGG pathways.

1. Load Required Packages

We need several Bioconductor packages for this analysis:

limma: For differential expression and gene set testing
ALL: The dataset we’ll analyze
hgu95av2.db: Annotation for the microarray platform
org.Hs.eg.db: Human gene annotations
clusterProfiler: For downloading KEGG pathway information

library(limma)
library(ALL)
library(hgu95av2.db)
library(org.Hs.eg.db)
library(clusterProfiler)

2. Load and Prepare the ALL Dataset

The ALL dataset contains gene expression measurements from acute lymphoblastic leukemia patients. We’ll compare B-cell and T-cell samples to identify differentially expressed pathways.

# Load the ALL dataset
data(ALL)

# Select B-cell and T-cell samples
bcell_samples <- which(ALL$BT %in% c("B", "B1", "B2", "B3", "B4"))
tcell_samples <- which(ALL$BT == "T")

# Create a balanced subset
samples_to_keep <- c(bcell_samples[1:10], tcell_samples[1:5])
ALL_subset <- ALL[, samples_to_keep]

# Extract expression data
exprs_data <- exprs(ALL_subset)

# Create design matrix
cell_type <- factor(ifelse(ALL_subset$BT == "T", "T", "B"))
design <- model.matrix(~0 + cell_type)
colnames(design) <- c("B_cell", "T_cell")

print("Sample composition:")

## [1] "Sample composition:"

print(table(cell_type))

## cell_type
##  B  T 
## 10  5

The design matrix uses a group-means parameterization (no intercept), which makes it easier to specify contrasts between groups.

3. Differential Expression Analysis

We use limma to identify differentially expressed genes between T-cell and B-cell samples. This analysis provides the statistical foundation for gene set testing.

# Fit linear model
fit <- lmFit(exprs_data, design)

# Define contrast: T-cell vs B-cell
contrast_matrix <- makeContrasts(
  TvsB = T_cell - B_cell,
  levels = design
)

# Fit contrasts and apply empirical Bayes
fit2 <- contrasts.fit(fit, contrast_matrix)
fit2 <- eBayes(fit2)

# View top genes
top_genes <- topTable(fit2, coef = "TvsB", number = 10)
print("Top 10 differentially expressed genes:")

## [1] "Top 10 differentially expressed genes:"

print(top_genes[, c("logFC", "AveExpr", "t", "P.Value", "adj.P.Val")])

##                logFC   AveExpr          t      P.Value    adj.P.Val
## 38319_at    4.483282  6.385744  13.829293 3.094569e-10 3.906894e-06
## 38147_at    2.726426  4.519145  12.825759 9.269009e-10 5.851062e-06
## 2031_s_at  -3.292335  8.801557 -11.095824 7.347931e-09 3.092254e-05
## 37039_at   -2.678274 11.234888  -9.753845 4.397852e-08 1.388072e-04
## 41723_s_at -3.354824  8.195279  -8.804350 1.746216e-07 4.409197e-04
## 41409_at   -1.954163  7.165671  -8.589302 2.420616e-07 4.421542e-04
## 2059_s_at   2.228525  7.176821   8.581013 2.451548e-07 4.421542e-04
## 38095_i_at -3.272734 10.384079  -8.465467 2.928754e-07 4.621940e-04
## 266_s_at   -3.522492  7.100962  -8.141603 4.863535e-07 5.729679e-04
## 32506_at   -1.204799  6.644397  -8.103231 5.169197e-07 5.729679e-04

The eBayes function computes moderated t-statistics by borrowing information across genes, which improves statistical power especially for small sample sizes.

4. Prepare KEGG Pathway Gene Sets

Gene set testing requires mapping our microarray probes to gene identifiers and then to pathway annotations. We use KEGG pathways as our gene sets.

4.1 Map Probes to Entrez Gene IDs

# Get probe to Entrez Gene ID mapping
probe_to_entrez <- as.list(hgu95av2ENTREZID)
probe_to_entrez <- probe_to_entrez[!is.na(probe_to_entrez)]

# Map our probes to Entrez IDs
probe_ids <- rownames(exprs_data)
entrez_mapping <- unlist(probe_to_entrez[probe_ids])
entrez_mapping <- entrez_mapping[!is.na(entrez_mapping)]

print(paste("Number of probes mapped to Entrez IDs:", length(entrez_mapping)))

## [1] "Number of probes mapped to Entrez IDs: 11683"

4.2 Get KEGG Pathway Annotations

We use the org.Hs.egPATH object which maps Entrez Gene IDs to KEGG pathway IDs.

# Get gene-to-pathway mapping
pathway_map <- org.Hs.egPATH
mapped_genes <- mappedkeys(pathway_map)
gene_to_pathway <- as.list(pathway_map[mapped_genes])

# Invert to create pathway-to-genes mapping
pathway_to_genes <- list()
for (gene_id in names(gene_to_pathway)) {
  pathways <- gene_to_pathway[[gene_id]]
  for (pathway_id in pathways) {
    if (is.null(pathway_to_genes[[pathway_id]])) {
      pathway_to_genes[[pathway_id]] <- character(0)
    }
    pathway_to_genes[[pathway_id]] <- c(pathway_to_genes[[pathway_id]], gene_id)
  }
}

print(paste("Total KEGG pathways:", length(pathway_to_genes)))

## [1] "Total KEGG pathways: 229"

4.3 Create Gene Set Indices

ROAST, MROAST, and FRY require gene sets as lists of row indices in the expression matrix.

create_gene_set_indices <- function(pathway_list, gene_mapping, min_genes = 10) {
  pathway_indices <- list()
  
  for (pathway_id in names(pathway_list)) {
    pathway_genes <- pathway_list[[pathway_id]]
    probes_in_pathway <- names(gene_mapping)[gene_mapping %in% pathway_genes]
    indices <- which(rownames(exprs_data) %in% probes_in_pathway)
    
    # Only keep pathways with sufficient genes
    if (length(indices) >= min_genes) {
      pathway_indices[[pathway_id]] <- indices
    }
  }
  
  return(pathway_indices)
}

gene_set_indices <- create_gene_set_indices(pathway_to_genes, entrez_mapping)

print(paste("Pathways with >=10 genes:", length(gene_set_indices)))

## [1] "Pathways with >=10 genes: 210"

We filter out pathways with fewer than 10 genes because very small gene sets have limited statistical power and biological interpretability.

5. Download KEGG Pathway Descriptions

To make results interpretable, we need pathway names and descriptions.

# Download KEGG data using clusterProfiler
kegg_info <- download_KEGG("hsa")

# Create pathway ID to name mapping
pathway_names <- setNames(
  kegg_info$KEGGPATHID2NAME$to, 
  kegg_info$KEGGPATHID2NAME$from
)

print(paste("Pathway names available:", length(pathway_names)))

## [1] "Pathway names available: 368"

print("Example pathways:")

## [1] "Example pathways:"

print(head(pathway_names, 3))

##                          hsa01100                          hsa01200 
##              "Metabolic pathways"               "Carbon metabolism" 
##                          hsa01210 
## "2-Oxocarboxylic acid metabolism"

# Helper function to add descriptions to results
add_pathway_descriptions <- function(results, pathway_ids_col = NULL) {
  if (is.null(pathway_ids_col)) {
    pathway_ids <- rownames(results)
  } else {
    pathway_ids <- results[[pathway_ids_col]]
  }
  
  # Pad pathway IDs to match KEGG format (e.g., "810" -> "hsa00810")
  pathway_ids_padded <- sprintf("%05s", pathway_ids)
  pathway_ids_formatted <- paste0("hsa", pathway_ids_padded)
  
  results$Description <- pathway_names[pathway_ids_formatted]
  results$Description[is.na(results$Description)] <- NA
  
  return(results)
}

6. ROAST: Single Gene Set Test

ROAST (Rotation gene Set Test) tests a single gene set using rotation-based p-values. It’s ideal when you have a specific pathway hypothesis to test.

ROAST estimates p-values by randomly rotating orthogonalized residuals, which preserves the correlation structure while breaking the association with the outcome.

# Test the first pathway as an example
test_pathway_id <- names(gene_set_indices)[1]
test_pathway_indices <- gene_set_indices[[test_pathway_id]]

# Get pathway description
pathway_id_formatted <- paste0("hsa", sprintf("%05s", test_pathway_id))
pathway_description <- pathway_names[pathway_id_formatted]

cat("Testing pathway:", test_pathway_id, "\n")

## Testing pathway: 04610

cat("Description:", pathway_description, "\n")

## Description: Complement and coagulation cascades

cat("Number of genes:", length(test_pathway_indices), "\n\n")

## Number of genes: 73

# Run ROAST with high number of rotations
roast_result <- roast(
  y = exprs_data,
  index = test_pathway_indices,
  design = design,
  contrast = contrast_matrix[, "TvsB"],
  nrot = 9999
)

print(roast_result)

##          Active.Prop     P.Value
## Down      0.04109589 0.996699835
## Up        0.34246575 0.003350168
## UpOrDown  0.34246575 0.006700000
## Mixed     0.38356164 0.002300000

Interpreting ROAST output:

Down: Tests if genes tend to be down-regulated (negative t-statistics)
Up: Tests if genes tend to be up-regulated (positive t-statistics)
UpOrDown: Two-sided test for any directional change
Mixed: Tests for differential expression in either direction
Active.Prop: Proportion of genes contributing materially to significance

The number of rotations (nrot=9999) should be high for publication-quality results. The minimum possible p-value is 1/(nrot+1).

7. MROAST: Multiple Gene Set Tests

MROAST extends ROAST to test multiple gene sets simultaneously with appropriate multiple testing correction.

MROAST is suitable when testing a moderate number of pathways (up to ~100). For larger numbers, FRY is more efficient.

# Test a subset of pathways (first 50)
n_pathways_test <- min(50, length(gene_set_indices))
pathways_to_test <- gene_set_indices[1:n_pathways_test]

cat("Testing", n_pathways_test, "KEGG pathways with MROAST...\n")

## Testing 50 KEGG pathways with MROAST...

cat("This may take a few minutes...\n\n")

## This may take a few minutes...

# Run MROAST
mroast_results <- mroast(
  y = exprs_data,
  index = pathways_to_test,
  design = design,
  contrast = contrast_matrix[, "TvsB"],
  nrot = 9999,
  adjust.method = "BH",
  sort = "directional"
)

# Add pathway descriptions
mroast_results <- add_pathway_descriptions(mroast_results)

# Display top results
print("Top 15 pathways from MROAST:")

## [1] "Top 15 pathways from MROAST:"

print(mroast_results[1:15, c("NGenes", "PropDown", "PropUp", 
                              "Direction", "PValue", "FDR", "Description")])

##       NGenes   PropDown    PropUp Direction PValue        FDR
## 04146     61 0.04918033 0.3278689        Up 0.0003 0.01250000
## 00650     31 0.00000000 0.4193548        Up 0.0007 0.01416667
## 00410     23 0.00000000 0.4782609        Up 0.0009 0.01416667
## 00592     11 0.00000000 0.4545455        Up 0.0013 0.01458333
## 00640     29 0.00000000 0.4482759        Up 0.0017 0.01458333
## 05131    100 0.27000000 0.0500000      Down 0.0018 0.01458333
## 00071     43 0.02325581 0.4418605        Up 0.0024 0.01678571
## 00250     33 0.03030303 0.3636364        Up 0.0028 0.01718750
## 00340     22 0.04545455 0.4545455        Up 0.0034 0.01861111
## 00280     45 0.00000000 0.4444444        Up 0.0056 0.02410714
## 01100    966 0.09420290 0.2826087        Up 0.0060 0.02410714
## 04610     73 0.04109589 0.3424658        Up 0.0063 0.02410714
## 00380     45 0.00000000 0.3777778        Up 0.0064 0.02410714
## 00620     39 0.07692308 0.3076923        Up 0.0068 0.02410714
## 00740     12 0.08333333 0.4166667        Up 0.0096 0.03183333
##                                       Description
## 04146                                  Peroxisome
## 00650                        Butanoate metabolism
## 00410                     beta-Alanine metabolism
## 00592             alpha-Linolenic acid metabolism
## 00640                       Propanoate metabolism
## 05131                                 Shigellosis
## 00071                      Fatty acid degradation
## 00250 Alanine, aspartate and glutamate metabolism
## 00340                        Histidine metabolism
## 00280  Valine, leucine and isoleucine degradation
## 01100                          Metabolic pathways
## 04610         Complement and coagulation cascades
## 00380                       Tryptophan metabolism
## 00620                         Pyruvate metabolism
## 00740                       Riboflavin metabolism

Interpreting MROAST columns:

NGenes: Number of genes in the pathway
PropDown: Proportion of genes with z-score < -√2
PropUp: Proportion of genes with z-score > √2
Direction: Overall direction of enrichment (“Up” or “Down”)
PValue: Two-sided directional p-value
FDR: False discovery rate (Benjamini-Hochberg correction)
PValue.Mixed: Non-directional p-value
FDR.Mixed: FDR for non-directional test

8. FRY: Fast Gene Set Testing

FRY is a fast approximation that simulates what MROAST would give with an infinite number of rotations. It’s ideal for testing large numbers of pathways.

FRY can test thousands of pathways in seconds, making it suitable for genome-wide pathway screening.

cat("Testing all", length(gene_set_indices), "KEGG pathways with FRY...\n\n")

## Testing all 210 KEGG pathways with FRY...

# Run FRY on all pathways
fry_results <- fry(
  y = exprs_data,
  index = gene_set_indices,
  design = design,
  contrast = contrast_matrix[, "TvsB"],
  sort = "directional"
)

# Add pathway descriptions
fry_results <- add_pathway_descriptions(fry_results)

# Display top results
print("Top 15 pathways from FRY:")

## [1] "Top 15 pathways from FRY:"

print(fry_results[1:15, c("NGenes", "Direction", "PValue", "FDR", "Description")])

##       NGenes Direction       PValue          FDR
## 00100     17        Up 4.680105e-06 0.0009828221
## 05310     35      Down 1.388330e-04 0.0096607672
## 04146     61        Up 1.713447e-04 0.0096607672
## 05143     52        Up 2.116427e-04 0.0096607672
## 00600     28        Up 2.300183e-04 0.0096607672
## 00563     16        Up 3.616102e-04 0.0122243671
## 00500     30        Up 4.074789e-04 0.0122243671
## 00650     31        Up 5.058900e-04 0.0128557869
## 00410     23        Up 5.509623e-04 0.0128557869
## 00053     12        Up 6.930409e-04 0.0145538593
## 00360     18        Up 9.434818e-04 0.0167869769
## 00561     34        Up 9.652182e-04 0.0167869769
## 00592     11        Up 1.085564e-03 0.0167869769
## 00510     38        Up 1.186483e-03 0.0167869769
## 04070    103        Up 1.201969e-03 0.0167869769
##                                                  Description
## 00100                                   Steroid biosynthesis
## 05310                                                 Asthma
## 04146                                             Peroxisome
## 05143                                African trypanosomiasis
## 00600                                Sphingolipid metabolism
## 00563 Glycosylphosphatidylinositol (GPI)-anchor biosynthesis
## 00500                          Starch and sucrose metabolism
## 00650                                   Butanoate metabolism
## 00410                                beta-Alanine metabolism
## 00053                      Ascorbate and aldarate metabolism
## 00360                               Phenylalanine metabolism
## 00561                                Glycerolipid metabolism
## 00592                        alpha-Linolenic acid metabolism
## 00510                                  N-Glycan biosynthesis
## 04070                  Phosphatidylinositol signaling system

Key advantages of FRY:

Very fast - can test entire pathway databases
High-resolution p-values not limited by number of rotations
Directional p-values approximate MROAST with infinite rotations
When df.prior is large and set.statistic="mean", FRY gives exactly the same directional p-values as infinite MROAST

9. Compare MROAST and FRY Results

Let’s compare the results from MROAST and FRY to see how well FRY approximates MROAST.

# Find common pathways tested by both methods
common_pathways <- intersect(rownames(mroast_results), rownames(fry_results))

comparison <- data.frame(
  PathwayID = common_pathways,
  MROAST_PValue = mroast_results[common_pathways, "PValue"],
  FRY_PValue = fry_results[common_pathways, "PValue"],
  MROAST_FDR = mroast_results[common_pathways, "FDR"],
  FRY_FDR = fry_results[common_pathways, "FDR"],
  Description = fry_results[common_pathways, "Description"]
)

# Sort by FRY p-value
comparison_sorted <- comparison[order(comparison$FRY_PValue), ]

print("Comparison of top 10 pathways:")

## [1] "Comparison of top 10 pathways:"

print(head(comparison_sorted, 10))

##    PathwayID MROAST_PValue   FRY_PValue MROAST_FDR     FRY_FDR
## 1      04146        0.0003 0.0001713447 0.01250000 0.009660767
## 2      00650        0.0007 0.0005058900 0.01416667 0.012855787
## 3      00410        0.0009 0.0005509623 0.01416667 0.012855787
## 4      00592        0.0013 0.0010855638 0.01458333 0.016786977
## 6      05131        0.0018 0.0016524321 0.01458333 0.019278374
## 5      00640        0.0017 0.0017941372 0.01458333 0.019657027
## 7      00071        0.0024 0.0026166463 0.01678571 0.025517554
## 9      00340        0.0034 0.0034728424 0.01861111 0.025517554
## 8      00250        0.0028 0.0040090375 0.01718750 0.027157996
## 14     00620        0.0068 0.0070212020 0.02410714 0.042870023
##                                    Description
## 1                                   Peroxisome
## 2                         Butanoate metabolism
## 3                      beta-Alanine metabolism
## 4              alpha-Linolenic acid metabolism
## 6                                  Shigellosis
## 5                        Propanoate metabolism
## 7                       Fatty acid degradation
## 9                         Histidine metabolism
## 8  Alanine, aspartate and glutamate metabolism
## 14                         Pyruvate metabolism

# Calculate correlation
cor_pval <- cor(comparison$MROAST_PValue, comparison$FRY_PValue, 
                method = "spearman")
cat("\nSpearman correlation between MROAST and FRY p-values:", 
    round(cor_pval, 3), "\n")

## 
## Spearman correlation between MROAST and FRY p-values: 0.991

A high correlation indicates that FRY provides a good approximation to MROAST results while being much faster.

10. Visualize Results

Visual comparisons help understand the agreement between methods and the biological findings.

par(mfrow = c(2, 2), mar = c(8, 4, 4, 2))

# 1. Top pathways from MROAST
top_mroast <- head(mroast_results, 15)
barplot(
  -log10(top_mroast$FDR),
  names.arg = rownames(top_mroast),
  las = 2,
  main = "MROAST: Top 15 Pathways",
  ylab = "-log10(FDR)",
  col = ifelse(top_mroast$Direction == "Up", "steelblue", "coral"),
  cex.names = 0.6
)
abline(h = -log10(0.05), lty = 2, col = "red")
legend("topright", legend = c("Up", "Down", "FDR = 0.05"), 
       fill = c("steelblue", "coral", NA), 
       border = c("black", "black", NA),
       lty = c(NA, NA, 2), col = c(NA, NA, "red"))

# 2. Top pathways from FRY
top_fry <- head(fry_results, 15)
barplot(
  -log10(top_fry$FDR),
  names.arg = rownames(top_fry),
  las = 2,
  main = "FRY: Top 15 Pathways",
  ylab = "-log10(FDR)",
  col = ifelse(top_fry$Direction == "Up", "steelblue", "coral"),
  cex.names = 0.6
)
abline(h = -log10(0.05), lty = 2, col = "red")

# 3. P-value comparison scatter plot
plot(
  -log10(comparison$MROAST_PValue),
  -log10(comparison$FRY_PValue),
  pch = 16,
  col = rgb(0, 0, 0, 0.3),
  xlab = "-log10(MROAST P-value)",
  ylab = "-log10(FRY P-value)",
  main = "MROAST vs FRY P-values"
)
abline(0, 1, col = "red", lty = 2)
text(x = max(-log10(comparison$MROAST_PValue)) * 0.7,
     y = max(-log10(comparison$FRY_PValue)) * 0.1,
     labels = paste("Correlation =", round(cor_pval, 3)),
     col = "blue")

# 4. Distribution of pathway sizes
hist(
  fry_results$NGenes,
  breaks = 30,
  main = "Distribution of Pathway Sizes",
  xlab = "Number of Genes",
  col = "lightblue",
  border = "white"
)

par(mfrow = c(1, 1))

11. Summary of Significant Pathways

Let’s summarize how many pathways are significantly enriched at FDR < 0.05.

# MROAST significant pathways
mroast_sig <- mroast_results[mroast_results$FDR < 0.05, ]
cat("MROAST - Significant pathways (FDR < 0.05):", nrow(mroast_sig), "\n")

## MROAST - Significant pathways (FDR < 0.05): 16

cat("  Up-regulated:", sum(mroast_sig$Direction == "Up"), "\n")

##   Up-regulated: 15

cat("  Down-regulated:", sum(mroast_sig$Direction == "Down"), "\n\n")

##   Down-regulated: 1

# FRY significant pathways
fry_sig <- fry_results[fry_results$FDR < 0.05, ]
cat("FRY - Significant pathways (FDR < 0.05):", nrow(fry_sig), "\n")

## FRY - Significant pathways (FDR < 0.05): 39

cat("  Up-regulated:", sum(fry_sig$Direction == "Up"), "\n")

##   Up-regulated: 34

cat("  Down-regulated:", sum(fry_sig$Direction == "Down"), "\n\n")

##   Down-regulated: 5

# Overlap between methods
sig_overlap <- intersect(rownames(mroast_sig), rownames(fry_sig))
cat("Pathways significant in both methods:", length(sig_overlap), "\n\n")

## Pathways significant in both methods: 14

# Show top significant pathways from FRY
cat("Top 10 significant pathways (FRY):\n")

## Top 10 significant pathways (FRY):

print(head(fry_sig[, c("NGenes", "Direction", "PValue", "FDR", "Description")], 10))

##       NGenes Direction       PValue          FDR
## 00100     17        Up 4.680105e-06 0.0009828221
## 05310     35      Down 1.388330e-04 0.0096607672
## 04146     61        Up 1.713447e-04 0.0096607672
## 05143     52        Up 2.116427e-04 0.0096607672
## 00600     28        Up 2.300183e-04 0.0096607672
## 00563     16        Up 3.616102e-04 0.0122243671
## 00500     30        Up 4.074789e-04 0.0122243671
## 00650     31        Up 5.058900e-04 0.0128557869
## 00410     23        Up 5.509623e-04 0.0128557869
## 00053     12        Up 6.930409e-04 0.0145538593
##                                                  Description
## 00100                                   Steroid biosynthesis
## 05310                                                 Asthma
## 04146                                             Peroxisome
## 05143                                African trypanosomiasis
## 00600                                Sphingolipid metabolism
## 00563 Glycosylphosphatidylinositol (GPI)-anchor biosynthesis
## 00500                          Starch and sucrose metabolism
## 00650                                   Butanoate metabolism
## 00410                                beta-Alanine metabolism
## 00053                      Ascorbate and aldarate metabolism

12. Advanced Example: Using Gene Weights

Gene weights allow you to specify directional expectations for genes in a pathway based on prior knowledge or experimental evidence.

What are Gene Weights?

Gene weights are directional values (positive or negative) that indicate: - +1: Gene expected to be up-regulated - -1: Gene expected to be down-regulated - Other values: Weighted contributions

When you specify gene weights, a significant “Up” p-value means the observed changes are positively correlated with the weights (genes are changing in the expected direction). A significant “Down” p-value means changes are negatively correlated with the weights.

# Create weighted gene set based on differential expression
# (In practice, weights would come from prior knowledge)
top_de_genes <- topTable(fit2, coef = "TvsB", number = 100, sort.by = "none")
gene_weights <- sign(top_de_genes$logFC)

# Get indices of these genes
top_gene_probes <- rownames(top_de_genes)
weighted_indices <- which(rownames(exprs_data) %in% top_gene_probes)

# Create data.frame with gene indices and weights
weighted_set <- data.frame(
  Gene = weighted_indices,
  Weight = gene_weights
)

cat("Testing weighted gene set with ROAST:\n\n")

## Testing weighted gene set with ROAST:

roast_weighted <- roast(
  y = exprs_data,
  index = weighted_set,
  design = design,
  contrast = contrast_matrix[, "TvsB"],
  nrot = 9999
)

print(roast_weighted)

##      NGenes PropDown PropUp Direction PValue   FDR PValue.Mixed FDR.Mixed
## set1    100        0   0.38        Up  1e-04 1e-04       0.0215    0.0215

Interpretation with gene weights:

Significant “Up”: Genes are changing in the direction specified by weights (positive correlation)
Significant “Down”: Genes are changing opposite to weights (negative correlation)
This is useful for testing whether your data matches a known expression signature

13. Key Differences Between Methods

Summary Table

Feature	ROAST	MROAST	FRY
Number of sets	Single	Multiple	Multiple
Speed	Slow	Moderate	Fast
P-value method	Rotation-based	Rotation-based	Analytical approximation
Rotations needed	Many (9999+)	Many (9999+)	None
Best use case	Focused hypothesis	Moderate # pathways	Large # pathways
PropDown/PropUp	No	Yes	No
Active.Prop	Yes	No	No

When to Use Each Method

Use ROAST when:

Testing a single specific pathway hypothesis
You want detailed output including Active.Prop
You have time for rotation-based testing

Use MROAST when:

Testing a moderate number of pathways (~10-100)
You want PropDown and PropUp statistics
You need rotation-based exact p-values

Use FRY when:

Testing a large number of pathways (hundreds to thousands)
Speed is important
You need high-resolution p-values
Screening entire pathway databases

Self-Contained vs Competitive Tests

All three methods (ROAST, MROAST, FRY) are self-contained tests: - Test whether genes in a set are differentially expressed - Compare against a null hypothesis of NO differential expression - Answer: “Are these genes DE?”

In contrast, CAMERA is a competitive test: - Tests whether genes in a set are MORE differentially expressed than genes outside the set - Compare against other genes in the genome - Answer: “Are these genes MORE DE than others?”

14. Conclusions

In this tutorial, we demonstrated:

How to prepare gene sets from KEGG pathways
Single pathway testing with ROAST
Multiple pathway testing with MROAST
Fast pathway screening with FRY
Comparison between methods showing high agreement
Use of gene weights for directional testing

Key Takeaways

FRY provides an excellent fast approximation to MROAST
Use high nrot values (≥9999) for publication-quality ROAST/MROAST results
Gene weights enable testing of directional hypotheses
All three methods are self-contained tests (vs. CAMERA’s competitive test)
Choose the method based on number of pathways and computational time available

Session Information

sessionInfo()

## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.0.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] clusterProfiler_4.16.0 hgu95av2.db_3.13.0     org.Hs.eg.db_3.21.0   
##  [4] AnnotationDbi_1.70.0   IRanges_2.42.0         S4Vectors_0.46.0      
##  [7] ALL_1.50.0             Biobase_2.68.0         BiocGenerics_0.55.1   
## [10] generics_0.1.4         limma_3.64.3          
## 
## loaded via a namespace (and not attached):
##  [1] DBI_1.2.3               gson_0.1.0              rlang_1.1.6            
##  [4] magrittr_2.0.3          DOSE_4.2.0              compiler_4.5.1         
##  [7] RSQLite_2.4.3           png_0.1-8               vctrs_0.6.5            
## [10] reshape2_1.4.4          stringr_1.5.2           pkgconfig_2.0.3        
## [13] crayon_1.5.3            fastmap_1.2.0           XVector_0.48.0         
## [16] rmarkdown_2.29          enrichplot_1.28.4       UCSC.utils_1.4.0       
## [19] purrr_1.1.0             bit_4.6.0               xfun_0.53              
## [22] cachem_1.1.0            aplot_0.2.8             GenomeInfoDb_1.44.2    
## [25] jsonlite_2.0.0          blob_1.2.4              BiocParallel_1.42.1    
## [28] parallel_4.5.1          R6_2.6.1                bslib_0.9.0            
## [31] stringi_1.8.7           RColorBrewer_1.1-3      jquerylib_0.1.4        
## [34] GOSemSim_2.34.0         Rcpp_1.1.0              knitr_1.50             
## [37] ggtangle_0.0.7          R.utils_2.13.0          Matrix_1.7-3           
## [40] splines_4.5.1           igraph_2.1.4            tidyselect_1.2.1       
## [43] qvalue_2.40.0           rstudioapi_0.17.1       dichromat_2.0-0.1      
## [46] yaml_2.3.10             codetools_0.2-20        lattice_0.22-7         
## [49] tibble_3.3.0            plyr_1.8.9              treeio_1.32.0          
## [52] KEGGREST_1.48.1         evaluate_1.0.5          gridGraphics_0.5-1     
## [55] Biostrings_2.76.0       pillar_1.11.0           ggtree_3.16.3          
## [58] ggfun_0.2.0             ggplot2_3.5.2           scales_1.4.0           
## [61] tidytree_0.4.6          glue_1.8.0              lazyeval_0.2.2         
## [64] tools_4.5.1             data.table_1.17.8       fgsea_1.34.2           
## [67] fs_1.6.6                fastmatch_1.1-6         cowplot_1.2.0          
## [70] grid_4.5.1              tidyr_1.3.1             ape_5.8-1              
## [73] colorspace_2.1-1        nlme_3.1-168            GenomeInfoDbData_1.2.14
## [76] patchwork_1.3.2         cli_3.6.5               rappdirs_0.3.3         
## [79] dplyr_1.1.4             gtable_0.3.6            R.methodsS3_1.8.2      
## [82] yulab.utils_0.2.1       sass_0.4.10             digest_0.6.37          
## [85] ggrepel_0.9.6           ggplotify_0.1.2         farver_2.1.2           
## [88] memoise_2.0.1           htmltools_0.5.8.1       R.oo_1.27.1            
## [91] lifecycle_1.0.4         httr_1.4.7              GO.db_3.21.0           
## [94] statmod_1.5.0           bit64_4.6.0-1

Gene Set Testing with ROAST, MROAST, and FRY

Mikhail Dozmorov

2025-11-03