Graduation Date
Spring 5-9-2026
Document Type
Thesis
Degree Name
Master of Science (MS)
Programs
Biostatistics
First Advisor
Christopher Wichman
Second Advisor
Lynette Smith
Third Advisor
Apar Ganti
Abstract
This study evaluates the robustness and reproducibility of a previously published analysis of lung adenocarcinoma gene expression data (Schabath et al.), which examined KRAS-associated biology in relation to co-occurring STK11 and TP53 mutations. The original study used Kaplan-Meier (KM) survival analysis and gene expression signatures to assess associations between driver mutations, tumor suppressor mutations, and patient outcomes. The aim of this work is to replicate these analyses, evaluate their consistency across independent datasets (Wilkerson and Selamat), and extend them using Cox proportional hazards (PH) models. Two hypotheses are evaluated: (1) that a greater number of mutated genes increases the hazard of death, and (2) that higher RAS de novo expression increases the hazard of death.
This work reproduces the KM comparisons and gene signature analyses from the original study. The robustness of the analytical approach in Schabath et al. is evaluated across datasets differing in sample size, gene coverage, and outcome completeness. Cox PH models are then used to examine the effects of gene expression, mutation status, and clinical covariates (age, stage, gender, and smoking status) on survival. Sensitivity analyses assess the impact of alternative handling of missing smoking status data.
Results show partial replication of the original findings. Clinical covariates such as stage and age are generally consistent predictors of survival across datasets. In contrast, gene and signature effects vary substantially. The number of mutated genes is not associated with survival in either dataset. The RAS de novo signature is significantly associated with survival in the Schabath dataset but not in the Wilkerson cohort. Overall, the results are sensitive to dataset composition and missing data, which reduces their statistical stability.
In conclusion, while some directional consistency is observed, the results indicate limited robustness and generalizability of gene signature-based associations across independent datasets. The Cox modeling extension provides additional analytical depth but reinforces the overall conclusion that results are strongly data-dependent.
Rights
The author holds the copyright to this work, and any reuse or permissions must be obtained from the author directly.
Recommended Citation
Berry, Camryn, "Replication and Extension of KRAS-Associated Gene Expression Signatures in Lung Adenocarcinoma: A Cox Proportional Hazards Analysis of Mutation Status, Gene Expression, Multi-Gene Mutation Count, and RAS de Novo Signature" (2026). Theses & Dissertations. 1080.
https://digitalcommons.unmc.edu/etd/1080
Included in
Biostatistics Commons, Cancer Biology Commons, Neoplasms Commons, Oncology Commons, Respiratory Tract Diseases Commons, Survival Analysis Commons