Module 6: Interpreting ExWAS Results
Conducting Exposome-Wide Association Studies
Setting Up: Loading Results
We assume you have ExWAS results from Module 5:
# Load pre-computed results (or use your own from Module 5)
# results_bmi_train: ExWAS results for BMI in training data
# results_bmi_test: ExWAS results for BMI in testing data
# results_glu_train: ExWAS results for glucose in training data
# results_glu_test: ExWAS results for glucose in testing data
load ('../data/nh.Rdata' ) # for ExposureDescription
The Volcano Plot
A volcano plot displays both statistical significance and effect size:
x-axis : effect estimate (association size)
y-axis : \(-\log_{10}(\text{FDR})\)
Points above the significance line are FDR-significant
Points far from zero have large effects
Building a Volcano Plot
results_bmi_train <- results_bmi_train %>%
left_join (ExposureDescription, by = c ("exposure" = "var" ))
ggplot (results_bmi_train, aes (x = estimate, y = - log10 (fdr))) +
geom_point (aes (color = fdr < 0.05 ), alpha = 0.7 ) +
geom_hline (yintercept = - log10 (0.05 ), linetype = "dashed" , color = "red" ) +
scale_color_manual (values = c ("grey60" , "steelblue" ),
labels = c ("Not significant" , "FDR < 0.05" )) +
labs (x = "Effect Estimate (per 1 SD change in log-exposure)" ,
y = expression (- log[10 ](FDR)),
title = "Volcano Plot: ExWAS for BMI" ,
color = "" ) +
theme_minimal ()
Adding Labels with ggrepel
# Label the top hits
top_hits <- results_bmi_train %>%
filter (fdr < 0.05 ) %>%
arrange (fdr) %>%
head (10 )
ggplot (results_bmi_train, aes (x = estimate, y = - log10 (fdr))) +
geom_point (aes (color = fdr < 0.05 ), alpha = 0.6 ) +
geom_hline (yintercept = - log10 (0.05 ), linetype = "dashed" , color = "red" ) +
geom_text_repel (data = top_hits, aes (label = var_desc),
size = 2.5 , max.overlaps = 15 ) +
scale_color_manual (values = c ("grey60" , "steelblue" )) +
labs (x = "Effect Estimate" , y = expression (- log[10 ](FDR)),
title = "Volcano Plot: ExWAS for BMI" ) +
theme_minimal () + theme (legend.position = "none" )
Volcano Plot for Glucose
results_glu_train <- results_glu_train %>%
left_join (ExposureDescription, by = c ("exposure" = "var" ))
top_glu <- results_glu_train %>%
filter (fdr < 0.05 ) %>% arrange (fdr) %>% head (10 )
ggplot (results_glu_train, aes (x = estimate, y = - log10 (fdr))) +
geom_point (aes (color = fdr < 0.05 ), alpha = 0.6 ) +
geom_hline (yintercept = - log10 (0.05 ), linetype = "dashed" , color = "red" ) +
geom_text_repel (data = top_glu, aes (label = var_desc),
size = 2.5 , max.overlaps = 15 ) +
scale_color_manual (values = c ("grey60" , "coral" )) +
labs (x = "Effect Estimate" , y = expression (- log[10 ](FDR)),
title = "Volcano Plot: ExWAS for Fasting Glucose" ) +
theme_minimal () + theme (legend.position = "none" )
Interpreting Effect Sizes
The estimate from our ExWAS model represents:
The change in the standardized phenotype per 1 SD change in the log-transformed exposure, adjusted for confounders.
Example: An estimate of 0.10 for lead on BMI means: - A 1 SD increase in log(lead) is associated with a 0.10 SD increase in BMI - Both variables are on the same scale, making comparisons across exposures valid
Delta R-Squared: Unique Variance Explained
\(\Delta R^2 = R^2_{\text{full model}} - R^2_{\text{base model (covariates only)}}\)
# Compute delta R2 for each exposure
r2_base <- summary (lm (BMXBMI ~ RIDAGEYR + RIAGENDR +
factor (RIDRETH1) + INDFMPIR,
data = NHData.train))$ r.squared
results_bmi_train <- results_bmi_train %>%
mutate (delta_r2 = r.squared - r2_base)
Visualizing Delta R-Squared
top_r2 <- results_bmi_train %>%
filter (fdr < 0.05 ) %>%
arrange (- abs (delta_r2)) %>%
head (15 )
ggplot (top_r2, aes (x = reorder (var_desc, delta_r2), y = delta_r2)) +
geom_col (fill = "steelblue" ) +
coord_flip () +
labs (x = "" , y = expression (Delta~ R^ 2 ),
title = "Unique Variance Explained by Each Exposure (BMI)" ) +
theme_minimal ()
Replication Table
replication_bmi <- results_bmi_train %>%
filter (fdr < 0.05 ) %>%
select (exposure, var_desc, est_train = estimate, fdr_train = fdr) %>%
inner_join (
results_bmi_test %>%
select (exposure, est_test = estimate, p_test = p.value),
by = "exposure"
) %>%
mutate (
concordant = sign (est_train) == sign (est_test),
replicated = p_test < 0.05 & concordant
)
replication_bmi %>%
select (var_desc, est_train, est_test, fdr_train, p_test, replicated) %>%
arrange (fdr_train) %>%
kable (digits = 4 ) %>% kable_styling (font_size = 8 )
Replication Summary
replication_bmi %>%
summarize (
discoveries = n (),
replicated = sum (replicated),
pct_replicated = mean (replicated) * 100
) %>%
kable (digits = 1 ) %>% kable_styling ()
Exposure Correlation Heatmap
Exposures are often correlated. Visualizing this helps interpret co-significant findings:
# Select top FDR-significant exposures
sig_exposures <- results_bmi_train %>%
filter (fdr < 0.05 ) %>%
pull (exposure)
# Compute correlation matrix
cor_matrix <- NHData.train %>%
select (any_of (sig_exposures)) %>%
mutate (across (everything (), ~ log (. + 0.001 ))) %>%
cor (use = "pairwise.complete.obs" )
Plotting the Heatmap
cor_long <- cor_matrix %>%
as.data.frame () %>%
rownames_to_column ("var1" ) %>%
pivot_longer (- var1, names_to = "var2" , values_to = "correlation" )
ggplot (cor_long, aes (x = var1, y = var2, fill = correlation)) +
geom_tile () +
scale_fill_gradient2 (low = "blue" , mid = "white" , high = "red" ,
midpoint = 0 , limits = c (- 1 , 1 )) +
labs (title = "Correlation Among Significant Exposures" ) +
theme_minimal () +
theme (axis.text.x = element_text (angle = 45 , hjust = 1 , size = 6 ),
axis.text.y = element_text (size = 6 ),
axis.title = element_blank ())
Sensitivity Across Adjustment Models
Does the association hold under different covariate sets?
library (nhanespewas)
con <- connect_pewas_data ()
# Run lead-BMI association under all 7 adjustment models
sensitivity_results <- map_dfr (1 : 7 , function (i) {
tryCatch ({
result <- pe_flex_adjust ("BMXBMI" , "LBXBPB" ,
adjustment_models[[i]], con)
result %>%
map_dfr (~ tidy (.), .id = "cycle" ) %>%
filter (grepl ("LBXBPB" , term)) %>%
summarize (adj_model = i, estimate = mean (estimate),
p.value = min (p.value))
}, error = function (e) {
tibble (adj_model = i, estimate = NA_real_ , p.value = NA_real_ )
})
})
Visualizing Sensitivity
ggplot (sensitivity_results, aes (x = factor (adj_model), y = estimate)) +
geom_point (size = 3 ) +
geom_hline (yintercept = 0 , linetype = "dashed" ) +
labs (x = "Adjustment Model" , y = "Effect Estimate" ,
title = "Lead-BMI Association Across Adjustment Models" ) +
theme_minimal ()
If the estimate is stable across models, the finding is more robust.
Reverse Causation
Problem : In cross-sectional data like NHANES, we cannot determine temporal order.
Does the exposure cause the phenotype?
Or does the phenotype (or treatment for it) affect the exposure?
Example: Medication use may lower exposure biomarker levels in people who already have the disease.
Solution : Longitudinal data, Mendelian randomization (Module 7)
Selection Bias
Problem : Who participates in NHANES?
Non-institutionalized population only
Response rates vary by demographics
Healthier individuals may be more likely to participate
Survey weights partially address this, but residual bias is possible.
Measurement Error
Problem : Biomarkers are imperfect proxies of true exposure.
Single time-point measurement may not reflect chronic exposure
Laboratory assay variability
Some exposures have short half-lives (hours to days)
Measurement error typically biases associations toward the null
The PE Atlas
The Phenotype-Exposure Atlas provides pre-computed ExWAS results:
Website: pe.exposomeatlas.com
Contains associations across hundreds of phenotypes and exposures
Results from multiple NHANES cycles
Searchable by phenotype or exposure
# The nhanespewas package can also access the atlas
# See package documentation for atlas query functions
STROBE-E Reporting
When reporting ExWAS results, follow STROBE-E guidelines:
Study design
Cross-sectional, survey-weighted
Exposures
List all tested, with measurement method
Outcome
Definition and measurement
Covariates
All adjustment variables and rationale
Multiple testing
Method used (FDR, Bonferroni)
Replication
Independent dataset used
Effect sizes
With confidence intervals
Sensitivity analyses
Across adjustment models
Limitations
Reverse causation, confounding, measurement error
Cleaning Up
disconnect_pewas_data (con)
Summary
Volcano plots visualize significance and effect size simultaneously
Delta R-squared quantifies unique variance explained by each exposure
Replication in independent data validates discoveries
Correlation heatmaps reveal co-occurring exposures
Sensitivity analysis tests robustness across adjustment models
Be cautious about reverse causation , selection bias , and measurement error
Use STROBE-E guidelines for transparent reporting
What’s Next?
Module 7: Advanced Topics
Meta-analysis across NHANES cycles
Interaction testing
Microbiome-exposome studies
Future directions: causal inference, ML, multi-omics
Supported By
This course is supported by the National Institutes of Health (NIH):
National Institute of Environmental Health Sciences (NIEHS): R01ES032470, U24ES036819
National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK): R01DK137993