Research Program
Statistical methods for precision oncology
We work with UNC Lineberger clinicians on methods and software for biomarker discovery, adaptive trial designs, and translational research. Clinical collaborations shape the questions we address and how tools are delivered.
Research Program Overview
Current research focuses on three interconnected areas with demonstrated clinical applications:
Clinical-Grade Tumor Subtyping
PurIST, developed in collaboration with the Yeh laboratory, enables single-sample pancreatic cancer classification without reference cohorts, addressing tumor purity heterogeneity through deconvolution-informed modeling. It is CLIA-certified and has been used to stratify 300+ patients across 12 active trials.
Deep Learning for Non-Ignorable Missingness
The NIMIWAE and dlGLM frameworks bridge variational autoencoders with generalized linear models for principled uncertainty quantification when data are missing not at random. This enables valid inference in clinical registries and genomic studies with informative dropout.
Adaptive Platforms with Late-Arriving Biomarkers
Bayesian response-adaptive designs integrate ctDNA, imaging, and tissue markers that arrive weeks post-enrollment, enabling real-time enrichment without waiting for clinical endpoints. They are deployed in SPORE and ARPA-H ADAPT trials.
Translational Work
Selected projects
AI & Machine Learning
Deep Learning for Non-Ignorable Missingness
→ Methods for handling data that's systematically missing (e.g., sicker patients drop out)
NIMIWAE and dlGLM combine variational autoencoders with generalized linear model inference to handle informative missingness in clinical registries and genomic studies, providing uncertainty estimates when data are missing not at random.
Clinical Translation
Single-Sample Tumor Subtyping
→ Classifying individual pancreatic tumors into biological subtypes from a single RNA sample
PurIST, developed in collaboration with the Yeh laboratory, enables platform-independent, reference-free pancreatic cancer subtyping using single bulk RNA-seq samples and deconvolution-informed modeling. It supports clinical decision-making in biomarker-stratified trials.
Adaptive Trials
Bayesian Platform Design with Late-Arriving Biomarkers
→ Trial designs that accommodate blood/tissue biomarker results arriving weeks after enrollment
An adaptive randomization framework integrates serial ctDNA, tissue, and imaging data for metastatic breast cancer platforms. The methodology accommodates staggered biomarker availability and enables mid-trial enrichment based on early response signals.
High-Dimensional Methods
Penalized Mixed Models for Correlated Biomarkers
→ Variable selection methods for correlated genomic data with repeated measures or clustered samples
The glmmPen framework enables variable selection in high-dimensional longitudinal and clustered data, addressing overfitting in genomic studies with complex correlation structures.
Focus Areas
Research priorities
Our work spans methodological development, software implementation, and collaborative translational research.
Precision medicine
Biomarker-guided treatment methods
→ Statistical tools for classifying patients into subtypes to inform treatment decisions
Subtyping, stromal modeling, and patient stratification methods for clinical decision support.
Genomics & epigenomics
Transcriptomic and epigenomic software
→ R packages for analyzing RNA-seq and chromatin data in cancer studies
Open-source RNA-seq and chromatin analysis tools for cancer genomics research.
AI & deep learning
Deep learning methods for missing data and clinical support
→ AI methods for incomplete datasets and clinical trial matching tools
Deep learning, LLM, and probabilistic models for incomplete data and clinical decision support.
Trial innovation
Adaptive design & real-time biomarker integration
→ Trial designs that adjust treatment assignment based on incoming biomarker data
Bayesian platforms integrating ctDNA, imaging, and clinical data in cooperative trials.
Research Portfolio Map (2011-2025)
Cross-Cutting Methodological Innovations
Rigor & Reproducibility
Quantification-aware modeling, heterogeneity frameworks, and documented software.
Clinical Translation
Collaborations with UNC oncologists and cooperative groups on adaptive, biomarker-rich trials.
Open Software
10+ CRAN/Bioconductor packages with tutorials, vignettes, and active maintenance.
Collaborative Network
UNC Lineberger
Collaborations with Jen Jen Yeh (tumor-stroma organoid models, stromal reprogramming), Lisa Carey (TBCRC adaptive trials, endocrine resistance), Chuck Perou (breast subtype integration), and Ben Vincent (immunotherapy biomarkers, neoantigen prediction).
National Consortia
Statistical leadership in the Translational Breast Cancer Research Consortium (TBCRC) Statistical Working Group, the V Foundation Scientific Advisory Board, and the PDAC Stromal Reprogramming Consortium.
Methodology Partners
Joseph Ibrahim, Michael Kosorok, Mike Love, Katie Hoadley, and collaborators extend our statistical methods.
Interested in PhD research? We're recruiting for Fall 2026. Learn about our training program →