Research Program

Statistical methods for precision oncology

We work with UNC Lineberger clinicians on methods and software for biomarker discovery, adaptive trial designs, and translational research. Clinical collaborations shape the questions we address and how tools are delivered.

Research Program Overview

Current research focuses on three interconnected areas with demonstrated clinical applications:

Clinical-Grade Tumor Subtyping

PurIST, developed in collaboration with the Yeh laboratory, enables single-sample pancreatic cancer classification without reference cohorts, addressing tumor purity heterogeneity through deconvolution-informed modeling. Its implementation is CLIA-certified and is used to stratify more than 300 patients across 12 active trials.

Deep Learning for Non-Ignorable Missingness

The NIMIWAE and dlGLM frameworks bridge variational autoencoders with generalized linear models to provide principled uncertainty quantification when data are missing not at random. This enables valid inference in clinical registries and genomic studies with informative dropout.

Adaptive Platforms with Late-Arriving Biomarkers

Bayesian response-adaptive designs integrate ctDNA, imaging, and tissue markers that arrive weeks after enrollment, enabling real-time enrichment without waiting for clinical endpoints. These designs are deployed in SPORE and ARPA-H ADAPT trials.

Translational Work

Selected projects

AI & Machine Learning

Deep Learning for Non-Ignorable Missingness

→ Methods for handling data that's systematically missing (e.g., sicker patients drop out)

NIMIWAE and dlGLM combine variational autoencoders with generalized linear model inference to handle informative missingness in clinical registries and genomic studies, providing uncertainty estimates when data are missing not at random.

JCGS 2024; Stat Biopharm Res 2024; ARPA-H deployment
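
To illustrate why non-ignorable missingness matters, here is a minimal simulation. It is not NIMIWAE or dlGLM, and the dropout model and all numbers are made-up assumptions. A hypothetical severity score is more likely to be missing when it is high, so the complete-case mean understates the true mean.

```python
import math
import random


def simulate_mnar(n=20000, seed=0):
    """Simulate a severity score whose chance of being missing depends on its own value."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        severity = rng.gauss(0.0, 1.0)  # hypothetical standardized severity score
        # Assumed dropout model: higher severity means a higher chance of being missing.
        p_missing = 1.0 / (1.0 + math.exp(-2.0 * severity))
        full.append(severity)
        if rng.random() >= p_missing:
            observed.append(severity)  # value is recorded only if the patient stays in
    return full, observed


def mean(values):
    return sum(values) / len(values)


full, observed = simulate_mnar()
true_mean = mean(full)
complete_case_mean = mean(observed)
print(f"true mean:          {true_mean:.3f}")
print(f"complete-case mean: {complete_case_mean:.3f}")
print(f"fraction observed:  {len(observed) / len(full):.3f}")
```

Because the missingness depends on the unobserved value itself, averaging only the observed patients is biased toward healthier scores; correcting for this requires a model of the missingness mechanism.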

Clinical Translation

Single-Sample Tumor Subtyping

→ Classifying individual pancreatic tumors into biological subtypes from a single RNA sample

PurIST, developed in collaboration with the Yeh laboratory, enables platform-independent, reference-free pancreatic cancer subtyping using single bulk RNA-seq samples and deconvolution-informed modeling. It supports clinical decision-making in biomarker-stratified trials.

Clinical Cancer Research (2020); J Molecular Diagnostics (2024); CLIA-certified implementation
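
One general way a classifier can operate on a single sample without a reference cohort is to compare the relative expression of gene pairs within that sample, since the ordering is unchanged by any monotone rescaling of the measurements. The sketch below illustrates that idea only. The gene names, pairs, weights, and intercept are hypothetical placeholders, not the published PurIST model.

```python
import math

# Hypothetical gene pairs (gene_a, gene_b) with made-up weights.
# A pair contributes its weight when gene_a is expressed above gene_b in the sample.
PAIRS = [
    (("GENE_A1", "GENE_B1"), 1.5),
    (("GENE_A2", "GENE_B2"), 1.0),
    (("GENE_A3", "GENE_B3"), 2.0),
]
INTERCEPT = -2.0  # made-up value for illustration


def pair_indicators(expression):
    """Return 1 for each pair where the first gene exceeds the second, else 0."""
    return [1 if expression[a] > expression[b] else 0 for (a, b), _ in PAIRS]


def subtype_probability(expression):
    """Logistic score built only from within-sample gene-pair comparisons."""
    indicators = pair_indicators(expression)
    score = INTERCEPT + sum(w * x for (_, w), x in zip(PAIRS, indicators))
    return 1.0 / (1.0 + math.exp(-score))


# One hypothetical sample; no other samples are needed to classify it.
sample = {
    "GENE_A1": 120.0, "GENE_B1": 30.0,
    "GENE_A2": 5.0, "GENE_B2": 40.0,
    "GENE_A3": 300.0, "GENE_B3": 10.0,
}
prob = subtype_probability(sample)

# A monotone transform (here log2) leaves every pairwise ordering unchanged.
log_sample = {gene: math.log2(value + 1.0) for gene, value in sample.items()}
prob_log = subtype_probability(log_sample)

print(pair_indicators(sample), round(prob, 3), round(prob_log, 3))
```

The raw and log-transformed versions of the sample give the same probability, which is the property that makes pair-based rules attractive across platforms.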

Adaptive Trials

Bayesian Platform Design with Late-Arriving Biomarkers

→ Trial designs that accommodate blood/tissue biomarker results arriving weeks after enrollment

An adaptive randomization framework integrates serial ctDNA, tissue, and imaging data for metastatic breast cancer platforms. The methodology accommodates staggered biomarker availability and enables mid-trial enrichment based on early response signals.

ARPA-H ADAPT platform; TBCRC cooperative trials; Bayesian response-adaptive design
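
The following is a minimal sketch of Bayesian response-adaptive randomization under simplifying assumptions: binary responses, independent Beta(1, 1) priors per arm, and allocation proportional to a Monte Carlo estimate of the posterior probability that each arm has the highest response rate. Results that have not yet arrived are marked `None` and are left out of the posterior until they do. The arm names and interim data are hypothetical, and this is not the ARPA-H ADAPT or TBCRC design.

```python
import random


def allocation_probabilities(outcomes, n_draws=5000, seed=0):
    """Estimate P(arm is best) per arm.

    outcomes maps arm name -> list of results:
    1 (response), 0 (no response), or None (result still pending).
    """
    rng = random.Random(seed)
    arms = sorted(outcomes)

    # Beta(1, 1) prior, updated only with results that have actually arrived.
    posteriors = {}
    for arm in arms:
        observed = [y for y in outcomes[arm] if y is not None]
        successes = sum(observed)
        posteriors[arm] = (1 + successes, 1 + len(observed) - successes)

    # Count how often each arm has the largest sampled response rate.
    wins = {arm: 0 for arm in arms}
    for _ in range(n_draws):
        draws = {arm: rng.betavariate(a, b) for arm, (a, b) in posteriors.items()}
        wins[max(draws, key=draws.get)] += 1
    return {arm: wins[arm] / n_draws for arm in arms}


# Hypothetical interim data; None marks patients whose result is not yet available.
interim = {
    "arm_A": [1, 0, 0, None, 0, 1, None],
    "arm_B": [1, 1, 0, 1, None, 1, 1],
}
probs = allocation_probabilities(interim)
print(probs)
```

Re-running the function as pending results arrive shifts allocation toward arms with stronger early signals. Practical designs typically add safeguards this sketch omits, such as minimum allocation probabilities and a burn-in period with equal randomization.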

High-Dimensional Methods

Penalized Mixed Models for Correlated Biomarkers

→ Variable selection methods for correlated genomic data with repeated measures or clustered samples

The glmmPen framework enables variable selection in high-dimensional longitudinal and clustered data, addressing overfitting in genomic studies with complex correlation structures.

The R Journal (2024); CRAN package; Multi-omic integration

Focus Areas

Research priorities

Our work spans methodological development, software implementation, and collaborative translational research.

Precision medicine

Biomarker-guided treatment methods

→ Statistical tools for classifying patients into subtypes to inform treatment decisions

Subtyping, stromal modeling, and patient stratification methods for clinical decision support.

PurIST subtype classification for GI tumors; Stroma-aware GLMMs for breast and pancreatic cancer; Between-study reproducibility assessment

Genomics & epigenomics

Transcriptomic and epigenomic software

→ R packages for analyzing RNA-seq and chromatin data in cancer studies

Open-source RNA-seq and chromatin analysis tools for cancer genomics research.

CompDTUReg for isoform-level RNA testing; epigraHMM + mixNBHMM for multi-condition enrichment; Allele-specific & isoform inference pipelines

AI & deep learning

Deep learning methods for missing data and clinical support

→ AI methods for incomplete datasets and clinical trial matching tools

Deep learning, LLM, and probabilistic models for incomplete data and clinical decision support.

NIMIWAE + dlGLM for non-ignorable missingness; Semi-supervised factorization for cancer subtyping; LLM tools for trial matching and ctDNA monitoring

Trial innovation

Adaptive design & real-time biomarker integration

→ Trial designs that adjust treatment assignment based on incoming biomarker data

Bayesian platforms integrating ctDNA, imaging, and clinical data in cooperative trials.

ARPA-H ADAPT analytics; TBCRC + SPORE biomarker-informed randomization; Master protocols with serial ctDNA data

Research Portfolio Map (2011-2025)

Precision Medicine (43 papers)

Tool Development (16 papers)

AI/Deep Learning (5+ methods and papers)

Adaptive Trials (4 papers)

Cross-Cutting Methodological Innovations

Rigor & Reproducibility

Quantification-aware modeling, heterogeneity frameworks, and documented software.

Clinical Translation

Collaborations with UNC oncologists and cooperative groups on adaptive, biomarker-rich trials.

Open Software

10+ CRAN/Bioconductor packages with tutorials, vignettes, and active maintenance.

Collaborative Network

UNC Lineberger

Collaborations with Jen Jen Yeh (tumor-stroma organoid models, stromal reprogramming), Lisa Carey (TBCRC adaptive trials, endocrine resistance), Chuck Perou (breast subtype integration), and Ben Vincent (immunotherapy biomarkers, neoantigen prediction).

National Consortia

Statistical leadership in the Translational Breast Cancer Research Consortium (TBCRC) Statistical Working Group, the V Foundation Scientific Advisory Board, and the PDAC Stromal Reprogramming Consortium.

Methodology Partners

Joseph Ibrahim, Michael Kosorok, Mike Love, Katie Hoadley, and collaborators extend our statistical methods.

Interested in PhD research? We're recruiting for Fall 2026. Learn about our training program →