Graduation Date
Summer 8-15-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Programs
Biostatistics
First Advisor
Ran Dai
Second Advisor
Cheng Zheng
Third Advisor
Ying Zhang
Fourth Advisor
Hongying (Daisy) Dai
Abstract
The increasing availability of high-dimensional (HD) observational data offers great opportunities for scientific discovery through feature selection. However, HD biomedical data also present significant statistical challenges: signals are often rare and weak, and the form of the relationship between predictors and outcomes is frequently unknown. We develop a knockoff-based framework that identifies features truly associated with biomedical outcomes in HD data while controlling the false discovery rate (FDR). The framework accommodates machine learning methods such as penalized regression and random forests. In particular, we demonstrate that our methods address the following challenges: (1) handling measurement errors and missingness in HD metabolomics data; (2) selecting reproducible features by integrating information from multiple electronic health record (EHR) sources in the presence of heterogeneity and privacy concerns; and (3) conducting mediator selection in HD causal models involving nonlinearities and interactions. We validate the FDR control and improved power of our methods through extensive simulation studies. Finally, we apply the proposed framework to the Women’s Health Initiative data, the National COVID Cohort Collaborative (N3C) data, and Alzheimer’s research datasets. These applications provide new insights into FDR control in HD feature selection and lead to novel scientific findings in nutritional epidemiology, infectious disease research, and neuroscience.
Recommended Citation
Wang, Runqiu, "Variable Selection with False Discovery Rate Control in High-Dimensional Data" (2025). Theses & Dissertations. 965.
https://digitalcommons.unmc.edu/etd/965
Included in
Biostatistics Commons, Clinical Trials Commons, Computational Neuroscience Commons, Immunology of Infectious Disease Commons, Nutritional Epidemiology Commons, Statistical Methodology Commons
Comments
2025 Copyright, the authors