Graduation Date
Fall 2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Programs
Biostatistics
First Advisor
Lynette M Smith
Second Advisor
Ran Dai
Abstract
The early detection of complex diseases is essential for improving patient outcomes, particularly for conditions that remain asymptomatic until advanced stages. Biomarkers serve as key indicators of disease presence and progression, but selecting an optimal subset remains a challenge because of the high dimensionality of modern biological datasets. While advances in omics technologies have identified numerous candidate biomarkers, using them effectively requires robust selection methods that ensure interpretability, cost-effectiveness, and predictive reliability in both cross-sectional and longitudinal settings. To address these challenges, we propose two novel approaches: Stability Selection Ensemble Learning (STABEL) for cross-sectional data and Longitudinal Stability Selection Ensemble Learning (LSTABEL) for longitudinal data. These methods integrate stability selection with ensemble learning to improve biomarker selection and predictive accuracy while mitigating overfitting. Stability selection strengthens traditional variable selection by retaining only the variables that are selected consistently across repeated subsamples of the data, which yields a stable subset of significant variables. Ensemble learning improves generalization by combining the predictions of multiple models, which mitigates the limitations of any individual model. Simulation studies demonstrate the superiority of STABEL and LSTABEL over traditional methods in selecting truly relevant biomarkers and enhancing prediction performance. These methodologies are particularly valuable in applications where early detection is crucial. We illustrate their effectiveness in two case studies: ovarian cancer, where selecting a concise biomarker panel can enhance early diagnosis and treatment strategies, and Alzheimer’s disease, where robust biomarker discovery can improve the prediction of disease progression and cognitive decline. We also developed an R package, stabel, for the STABEL method.
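The two ideas the abstract combines can be illustrated with a minimal sketch. It is not the STABEL or LSTABEL procedure and does not use the stabel R package; it is written in Python with NumPy only, and everything in it is an assumption made for illustration: the function names stability_selection and bagged_predict are hypothetical, the base selector is a simple marginal-correlation ranking (swapped in for whatever selector the dissertation uses), the outcome is continuous rather than a disease status, and the data are synthetic.

```python
import numpy as np


def stability_selection(X, y, n_subsamples=100, top_k=5, threshold=0.6, seed=0):
    """Selection frequency per feature, plus the indices that meet the threshold.

    Hypothetical helper: the base selector is marginal-correlation ranking,
    chosen only to keep the sketch dependency-free.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # Draw half of the observations without replacement.
        idx = rng.choice(n, size=n // 2, replace=False)
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        # Absolute correlation of each feature with the outcome.
        denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
        score = np.abs(Xc.T @ yc) / denom
        # Count the top_k highest-scoring features as "selected" in this run.
        counts[np.argsort(score)[-top_k:]] += 1
    freq = counts / n_subsamples
    return freq, np.flatnonzero(freq >= threshold)


def bagged_predict(X_train, y_train, X_new, features, n_models=25, seed=0):
    """Average the predictions of least-squares fits on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    A_new = np.column_stack([np.ones(len(X_new)), X_new[:, features]])
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap resample with replacement
        A = np.column_stack([np.ones(n), X_train[idx][:, features]])
        coef, *_ = np.linalg.lstsq(A, y_train[idx], rcond=None)
        preds.append(A_new @ coef)
    return np.mean(preds, axis=0)


# Synthetic data (assumption): 50 candidate features, only the first 3 matter.
rng = np.random.default_rng(42)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 2.5 * X[:, 2] + rng.standard_normal(n)

freq, stable = stability_selection(X, y)
y_hat = bagged_predict(X, y, X, stable)
print("stable features:", stable.tolist())
```

The threshold of 0.6 and the choice of five features per subsample are arbitrary illustration values; in practice both would be tuned, and the in-sample predictions shown here would be replaced by evaluation on held-out data.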
Recommended Citation
Das, Apu Chandra, "Variable Selection and Prediction Using Machine Learning Models in High-Dimensional Data" (2025). Theses & Dissertations. 997.
https://digitalcommons.unmc.edu/etd/997
Comments
2025 Copyright, the authors