class: center, middle, inverse, title-slide

.title[
# Visualizing Gene Expression with Heatmaps in R
]
.author[
### Mikhail Dozmorov
]
.institute[
### Virginia Commonwealth University
]
.date[
### 2025-11-17
]

---

<!-- HTML style block -->
<style>
.large { font-size: 130%; }
.small { font-size: 70%; }
.tiny { font-size: 40%; }
</style>

## Introduction

* **Heatmaps** are graphical representations of data where individual values are represented as colors.
* Commonly used in genomics to visualize:
    * Expression of genes across samples
    * Sample clustering patterns
    * Co-expression among genes

---

## The `airway` dataset

``` r
library(airway)
data(airway)
airway
```

* RNA-seq data of **human airway smooth muscle cells** treated with **dexamethasone** (a corticosteroid).
* 8 samples: 4 controls and 4 treated.
* Expression values are raw read counts, stored in a *SummarizedExperiment* object.

---

## DESeq2 analysis

We'll identify **differentially expressed genes** between treated and untreated samples.

``` r
library(DESeq2)

# Model the treatment effect (dex) while adjusting for cell line
dds <- DESeqDataSet(airway, design = ~ cell + dex)
dds <- DESeq(dds)

# Extract results and sort genes by adjusted p-value
res <- results(dds)
res <- res[order(res$padj), ]
head(res)
```

---

## Select top 20 differential genes

``` r
top_genes <- rownames(res)[1:20]
mtx <- assay(vst(dds))[top_genes, ] # variance-stabilized data
head(mtx)
```

---

## Row-centering

* Each gene may have different baseline expression.
* To focus on **relative expression changes**, we **center rows** by subtracting each gene's mean across samples: `\(z_{ij} = x_{ij} - \bar{x}_{i}\)`

``` r
mtx_centered <- t(scale(t(mtx), center = TRUE, scale = FALSE))
```

---

## Heatmap with `pheatmap`

``` r
library(pheatmap)

# Column annotation: treatment status of each sample
annotation_col <- data.frame(
  Treatment = colData(dds)$dex,
  row.names = colnames(mtx_centered)
)

# scale = "row" centers each gene and divides by its standard deviation,
# so the uncentered matrix `mtx` is passed here
pheatmap(
  mtx,
  annotation_col = annotation_col,
  scale = "row",
  show_rownames = TRUE,
  show_colnames = TRUE,
  clustering_distance_rows = "euclidean",
  clustering_distance_cols = "euclidean",
  clustering_method = "complete",
  main = "Top 20 Differentially Expressed Genes (pheatmap)"
)
```

---

## Explanation

* **Row centering**: focuses on variation across samples rather than absolute expression. With `scale = "row"`, `pheatmap` also scales each gene to unit variance.
* **Hierarchical clustering**:
    * Groups **genes** with similar expression patterns.
    * Groups **samples** based on their expression profiles.
* **Annotations**: help visualize sample groups (e.g., treatment).

---

## Correlation-based Clustering

We can cluster genes and samples using **(1 - Pearson correlation) / 2** as the distance measure instead of the default Euclidean distance. This distance ranges from 0 (perfectly correlated) to 1 (perfectly anti-correlated).

``` r
# Compute correlation-based distances between genes (rows) and samples (columns)
dist_rows <- as.dist((1 - cor(t(mtx), method = "pearson")) / 2)
dist_cols <- as.dist((1 - cor(mtx, method = "pearson")) / 2)

# Plot heatmap with correlation distances
pheatmap(
  mtx,
  annotation_col = annotation_col,
  scale = "row",
  show_rownames = TRUE,
  show_colnames = TRUE,
  clustering_distance_rows = dist_rows,
  clustering_distance_cols = dist_cols,
  clustering_method = "ward.D2",
  main = "Top 20 Differentially Expressed Genes (Correlation-based Clustering)"
)
```

---

## Interactive Heatmaps in R

- Interactive heatmaps allow **zooming**, **hover tooltips**, and **dynamic clustering**.
- Especially useful for **exploring large gene expression matrices**.
``` r
library(heatmaply)

heatmaply(
  mtx_centered,
  k_row = 3, k_col = 2, # cluster highlights
  scale = "none",
  colors = colorRampPalette(c("blue", "white", "red"))(256),
  main = "Interactive Heatmap (heatmaply)",
  xlab = "Samples",
  ylab = "Genes",
  showticklabels = c(TRUE, TRUE)
)
```

---

## Heatmap with `ComplexHeatmap`

``` r
library(circlize)

col_fun <- colorRamp2(c(-2, 0, 2), c("blue", "white", "red"))

ComplexHeatmap::Heatmap(
  mtx_centered,
  name = "Expression",
  top_annotation = ComplexHeatmap::HeatmapAnnotation(Treatment = annotation_col$Treatment),
  col = col_fun,
  show_row_names = TRUE,
  show_column_names = TRUE,
  cluster_rows = TRUE,
  cluster_columns = TRUE,
  row_title = "Genes",
  column_title = "Samples",
  heatmap_legend_param = list(title = "Centered\nExpression")
)
```

---

## Comparison

| Feature       | `pheatmap`  | `ComplexHeatmap`                |
| ------------- | ----------- | ------------------------------- |
| Ease of use   | Very simple | More complex but powerful       |
| Customization | Moderate    | Extensive                       |
| Annotations   | Basic       | Rich (multi-layered)            |
| Integration   | Standalone  | Works well with tidyverse, grid |

---

## Summary

* **Heatmaps** reveal global expression patterns and clustering relationships.
* **Row centering** emphasizes relative variation.
* **Hierarchical clustering** groups similar genes and samples.
* Tools:
    * `pheatmap`: simple and effective
    * `ComplexHeatmap`: advanced and customizable
    * `heatmaply`: interactive heatmaps
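
---

## Appendix: centering and correlation distance on a toy matrix

The row-centering and correlation-distance steps can be checked by hand on a small example. This is a minimal base-R sketch, assuming a hypothetical 4 x 4 matrix `toy` with made-up gene and sample names; it is not part of the `airway` analysis. It illustrates that row centering does not change gene-gene Pearson correlations, so the gene distances are the same with or without centering.

``` r
# Hypothetical toy matrix: 4 genes (rows) x 4 samples (columns)
toy <- matrix(
  c(10, 11, 14, 15,   # geneA: high baseline, higher in S3/S4
     2,  3,  6,  7,   # geneB: low baseline, same pattern as geneA
     9,  8,  5,  4,   # geneC: lower in S3/S4
     5,  5,  5,  6),  # geneD: nearly flat
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("S", 1:4))
)

# Row centering: subtract each gene's mean across samples
toy_centered <- t(scale(t(toy), center = TRUE, scale = FALSE))
rowMeans(toy_centered)  # zero for every gene, up to floating-point error

# Correlation-based distance between genes: (1 - r) / 2, between 0 and 1
d_genes <- as.dist((1 - cor(t(toy), method = "pearson")) / 2)

# The same distance computed from the centered matrix
d_genes_centered <- as.dist((1 - cor(t(toy_centered), method = "pearson")) / 2)
all.equal(as.vector(d_genes), as.vector(d_genes_centered))

# Hierarchical clustering on the precomputed distance, cut into two groups
hc <- hclust(d_genes, method = "complete")
cutree(hc, k = 2)
```

Here `geneA` and `geneB` differ only in baseline, so their correlation distance is 0 and they cluster together, while Euclidean distance on the uncentered values would separate them.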