TMAO, hippuric acid, p-cresol sulfate, indoles, secondary bile acids
Adducts
HSA-Cys34 adducts from reactive electrophiles (oxidative stress, pollutants)
The key insight: LC-HRMS captures both the internal chemical environment (endogenous) and external exposures (exogenous) simultaneously.
The Blood Exposome
Rappaport et al. (2014) defined the blood exposome as the totality of chemicals circulating in blood from both endogenous and exogenous sources:
Chemicals enter the blood from external sources (air, water, diet, drugs, occupation)
Chemicals also arise from endogenous processes (inflammation, oxidative stress, lipid peroxidation, gut microbiome)
The blood integrates all sources into a single measurable compartment
LC-HRMS can profile this integrated signal in a single analytical run
This “top-down” approach (measure what’s in the blood) complements the “bottom-up” approach (measure every external source) used in traditional exposure assessment.
Reference: Rappaport SM, Barupal DK, Wishart D, Vineis P, Scalbert A. The blood exposome and its role in discovering causes of disease. Environ Health Perspect 2014; 122(8):769-774.
Untargeted vs. Targeted: A Comparison
Dimension | Targeted (e.g., NHANES) | Untargeted LC-HRMS
Coverage | Hundreds of pre-specified chemicals | Thousands of features (known + unknown)
Selection | Must know what to measure a priori | Agnostic, discovery-based
Quantification | Absolute (ng/mL) with reference standards | Semi-quantitative (relative intensity)
Sensitivity | Very high for targeted analytes (ppb-ppt) | Lower for trace xenobiotics
Discovery | Limited to known chemicals | Can find novel/unexpected exposures
Annotation | Known identity | ~80-95% of features are unannotated
Sample volume | Large volumes for full panel | Small volumes (< 100 µL)
Cost | Expensive per analyte | Cost-effective per feature
Targeted and untargeted approaches are complementary, not competing.
The Annotation Challenge
The single biggest bottleneck in untargeted exposomics:
Only ~5% of detected features are confidently annotated.
The Schymanski confidence levels provide a standardized framework:
Level | Confidence | Evidence Required
1 | Confirmed | Reference standard match (RT + MS + MS/MS)
2 | Probable | Library MS/MS spectral match
3 | Tentative | Molecular formula, partial structural evidence
4 | Formula | Unequivocal molecular formula only
5 | Mass | Exact mass (m/z) only
Most features in an untargeted run are Level 4-5 — the “dark matter” of the exposome.
Reference: Schymanski EL, et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ Sci Technol 2014; 48(4):2097-2098.
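The five levels above can be read as a decision cascade: a feature gets the strongest level its evidence supports. The sketch below illustrates that reading in Python. The `Evidence` class, its field names, and the feature IDs are hypothetical names introduced for illustration; they are not part of any annotation tool, and real pipelines apply finer criteria (for example, sub-levels within Level 2).

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Evidence available for one feature (field names are illustrative)."""
    unequivocal_formula: bool = False       # one molecular formula is supported
    partial_structure: bool = False         # some structural evidence, candidates remain
    library_msms_match: bool = False        # MS/MS matches a spectral library entry
    reference_standard_match: bool = False  # RT + MS + MS/MS match a reference standard


def schymanski_level(ev: Evidence) -> int:
    """Return the strongest (lowest-numbered) level the evidence supports."""
    if ev.reference_standard_match:
        return 1
    if ev.library_msms_match:
        return 2
    if ev.partial_structure:
        return 3
    if ev.unequivocal_formula:
        return 4
    return 5  # only the exact mass (m/z) is known


# Made-up features, for illustration only.
features = {
    "F001": Evidence(reference_standard_match=True),
    "F002": Evidence(library_msms_match=True),
    "F003": Evidence(unequivocal_formula=True),
    "F004": Evidence(),
}
levels = {name: schymanski_level(ev) for name, ev in features.items()}

# Level 4-5 features are the "dark matter" described above.
dark_matter = [name for name, lvl in levels.items() if lvl >= 4]
print(levels)
print(dark_matter)
```

Checking from the strongest evidence downward means a feature with both a library match and a formula is reported once, at Level 2, rather than at two levels.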
Why Dark Matter Matters
The ~80-95% of features that remain unannotated are a mix of:
Truly novel chemicals not in any database
Known chemicals missing from spectral libraries
Transformation products (metabolites of metabolites)
Adducts and in-source fragments (analytical artifacts)
Informatic noise (false peaks from feature detection algorithms)
This means that an LC-HRMS-based ExWAS may find strong associations with features we cannot yet identify.
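Some of this dark matter is redundancy rather than new chemistry: one molecule can appear as several features when it forms different adducts. A minimal sketch of how such pairs might be flagged is shown below, assuming positive-mode data, co-elution within a small retention-time window, and a ppm mass tolerance. The function name, the tolerances, and the feature table are illustrative assumptions; the mass differences are approximate values relative to [M+H]+.

```python
# Approximate mass differences (Da) between common positive-mode adducts
# and the protonated ion [M+H]+.
ADDUCT_DELTAS = {
    "[M+NH4]+": 17.026547,
    "[M+Na]+": 21.981942,
    "[M+K]+": 37.955882,
}


def flag_adduct_pairs(features, ppm_tol=5.0, rt_tol=0.05):
    """Flag co-eluting feature pairs whose m/z gap matches a known adduct delta.

    features: list of (feature_id, mz, rt_minutes) tuples.
    Returns a list of (assumed_MH_id, other_id, adduct_label) tuples.
    Tolerances are illustrative defaults, not recommended settings.
    """
    pairs = []
    for base_id, base_mz, base_rt in features:
        for other_id, other_mz, other_rt in features:
            if other_id == base_id or abs(other_rt - base_rt) > rt_tol:
                continue
            for adduct, delta in ADDUCT_DELTAS.items():
                expected = base_mz + delta
                ppm_error = abs(other_mz - expected) / expected * 1e6
                if ppm_error <= ppm_tol:
                    pairs.append((base_id, other_id, adduct))
    return pairs


# Made-up feature table; F1 and F2 roughly resemble the [M+H]+ and [M+Na]+
# ions of caffeine, F3 is an unrelated feature at a different retention time.
feature_table = [
    ("F1", 195.08765, 3.20),
    ("F2", 217.06960, 3.21),
    ("F3", 301.14100, 7.80),
]
pairs = flag_adduct_pairs(feature_table)
print(pairs)
```

A flagged pair is only a candidate: two unrelated co-eluting compounds can differ by the same mass, so dedicated annotation tools typically also check peak-shape or intensity correlation before collapsing features.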
Long-term drift (months between batches in a large cohort)
These systematic variations can confound biological signals.
Mitigation strategies:
Pooled QC samples run every 10-20 samples to monitor drift
Reference standardization (Go, Walker et al. 2015)
Signal normalization: median fold change, LOESS regression, ComBat
Randomized run order to decouple batch from biological variables
Reference: Go YM, Walker DI, et al. Reference standardization for mass spectrometry and high-resolution metabolomics applications to exposome research. Toxicol Sci 2015; 148(2):531-543.
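To make the pooled-QC idea concrete, here is a minimal sketch of QC-based drift correction for a single feature. It is not the reference standardization method of Go, Walker et al., and it deliberately swaps LOESS for plain linear interpolation between neighbouring QC injections so that it runs with the standard library alone. The function name and the toy intensities are assumptions made for illustration.

```python
from statistics import median


def qc_drift_correct(intensities, qc_positions):
    """Correct one feature's intensities for within-batch drift.

    intensities: values in injection order, pooled-QC injections included.
    qc_positions: sorted indices of the pooled-QC injections.
    The drift at each position is estimated by linear interpolation between
    the bracketing QC injections (a simplification of LOESS-based correction).
    """
    qc_values = [intensities[i] for i in qc_positions]
    target = median(qc_values)  # level every injection is rescaled toward
    corrected = []
    for pos, value in enumerate(intensities):
        if pos <= qc_positions[0]:
            expected = qc_values[0]
        elif pos >= qc_positions[-1]:
            expected = qc_values[-1]
        else:
            # Find the two QC injections that bracket this position.
            for k in range(len(qc_positions) - 1):
                left, right = qc_positions[k], qc_positions[k + 1]
                if left <= pos <= right:
                    frac = (pos - left) / (right - left)
                    expected = qc_values[k] + frac * (qc_values[k + 1] - qc_values[k])
                    break
        corrected.append(value * target / expected)
    return corrected


# Toy run: the pooled QC (positions 0, 2, 4) loses 10 units of signal per
# injection pair, so the two study samples are measured on a drifting scale.
run = [100.0, 190.0, 90.0, 42.5, 80.0]
corrected = qc_drift_correct(run, qc_positions=[0, 2, 4])
print(corrected)
```

In the toy run the QC sits at every other injection only to keep the example short; with a QC every 10-20 samples the same interpolation applies over longer gaps. Because the pooled QC is the same material each time, any trend in its intensity is treated as instrumental rather than biological.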
Scalable Workflows for Population Studies
Moving LC-HRMS from small studies to population-scale cohorts:
Hu, Walker et al. (2021) demonstrated a scalable single-step extraction workflow:
Combined LC-HRMS and GC-HRMS in a single extraction
Validated across hundreds of samples
Demonstrated reproducibility for population-scale deployment
Detected both endogenous metabolites and exogenous chemicals
This is the kind of infrastructure needed to create a “next-generation NHANES” with untargeted exposomics.
Reference: Hu X, Walker DI, et al. A scalable workflow to characterize the human exposome. Nat Commun 2021; 12:5575.
What LC-HRMS Has Already Found
Selected findings from untargeted exposome studies:
Novel exposure-disease associations not detectable with targeted panels (e.g., previously unmeasured dietary metabolites associated with cardiovascular risk)
Exposure-metabolite networks revealing how exogenous chemicals perturb endogenous pathways (Jeong et al., Sci Rep 2021)
Occupational exposures detected in firefighters vs. office workers through differential metabolomic profiles
Environmental chemical mixtures that co-occur and may have joint effects on health
The discovery potential is the key advantage — finding associations with chemicals we didn’t know to measure.
The Future: Untargeted ExWAS at Scale
Imagine combining the PE Atlas approach (Module 8) with untargeted LC-HRMS:
Current (NHANES targeted) | Future (LC-HRMS untargeted)
619 exposures | 10,000-20,000 features
Known chemicals only | Known + unknown chemicals
Pre-specified assays | Discovery-driven
~120,000 associations | ~3-6 million associations
Targeted replication | Targeted confirmation of unknowns
The statistical and computational challenges scale accordingly — but the ExWAS framework (Modules 3-7) provides the foundation.
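A quick calculation shows how the multiple-testing burden grows with the number of features. The sketch below assumes roughly 300 phenotypes, a figure not stated on this slide and chosen only because it reproduces the ~3-6 million range above. It also uses a Bonferroni threshold as the simplest illustration; the course modules may use a different correction.

```python
def exwas_scale(n_features, n_phenotypes, alpha=0.05):
    """Return (number of tests, Bonferroni per-test threshold)."""
    n_tests = n_features * n_phenotypes
    return n_tests, alpha / n_tests


# ASSUMPTION: ~300 phenotypes; not given in the text, picked so that the
# products match the "~3-6 million associations" quoted above.
N_PHENOTYPES = 300

# Current targeted scale, using the association count quoted above directly.
current_threshold = 0.05 / 120_000
print(f"current: 120,000 tests, threshold {current_threshold:.2e}")

results = {}
for n_features in (10_000, 20_000):
    n_tests, threshold = exwas_scale(n_features, N_PHENOTYPES)
    results[n_features] = (n_tests, threshold)
    print(f"{n_features:,} features: {n_tests:,} tests, threshold {threshold:.2e}")
```

Under these assumptions the per-test threshold tightens by a factor of about 25 to 50 relative to the targeted analysis, which is why replication and targeted confirmation matter more as the feature count grows.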
Key References
Foundational:
Wild CP. Complementing the genome with an “exposome.” Cancer Epidemiol Biomarkers Prev 2005; 14(8):1847-1850.
Rappaport SM, Smith MT. Environment and disease risks. Science 2010; 330:460-461.
Vermeulen R, Schymanski EL, Barabasi AL, Miller GW. The exposome and health: where chemistry meets biology. Science 2020; 367:392-396.
Methodology:
Rappaport SM, et al. The blood exposome and its role in discovering causes of disease. Environ Health Perspect 2014; 122(8):769-774.
Go YM, Walker DI, et al. Reference standardization for mass spectrometry and high-resolution metabolomics applications to exposome research. Toxicol Sci 2015; 148(2):531-543.
Hu X, Walker DI, et al. A scalable workflow to characterize the human exposome. Nat Commun 2021; 12:5575.
Annotation and standards:
Schymanski EL, et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ Sci Technol 2014; 48(4):2097-2098.
Jones DP. Sequencing the exposome: a call to action. Toxicol Rep 2016; 3:29-45.
ExWAS and data science:
Patel CJ, Bhattacharya J, Butte AJ. An Environment-Wide Association Study (EWAS) on Type 2 Diabetes Mellitus. PLoS ONE 2010; 5(5):e10746.
Chung MK, et al. The exposome and exposome-wide association studies. Exposome 2024.
Patel CJ, et al. Decoding the exposome: data science methodologies and implications in ExWAS. Exposome 2024; 4(1):osae001.
Summary
LC-HRMS enables untargeted measurement of thousands of chemical features in a single sample
It captures endogenous metabolites, exogenous chemicals, drugs, dietary compounds, and microbiome products simultaneously
The annotation bottleneck (~80-95% unannotated) is the major challenge
The ExWAS framework from this course extends directly to untargeted data — same statistics, larger scale
Batch effects and semi-quantitative data require careful pre-processing
Untargeted and targeted approaches are complementary — targeted validates untargeted discoveries
The future of exposome epidemiology lies in combining LC-HRMS measurement with the ExWAS analytical pipeline at population scale
What’s Next?
The tools are in place:
nhanespewas provides the targeted ExWAS infrastructure (Modules 4-9)
LC-HRMS provides the next-generation measurement platform (this module)
Statistical methods from Modules 3 and 7 scale to untargeted data
The exposome is no longer limited to what we know to measure — LC-HRMS opens the door to discovering the unknown unknowns of environmental health.
Supported By
This course is supported by the National Institutes of Health (NIH):
National Institute of Environmental Health Sciences (NIEHS): R01ES032470, U24ES036819
National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK): R01DK137993