Graduation Date
Fall 12-18-2015
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Programs
Biomedical Informatics
First Advisor
Dr. Babu Guda
Abstract
The high degree of heterogeneity observed in breast cancers makes it very difficult to classify patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. In this study, we explore the use of gene mutation profiles to classify, characterize, and predict subgroups of breast cancer. We analyzed whole-exome sequencing data from 358 ethnically similar breast cancer patients in The Cancer Genome Atlas (TCGA) project. Each identified somatic, non-synonymous single nucleotide variant was assigned a quantitative score (C-score) that represents the extent of its negative impact on the function of the gene. Using these scores with a non-negative matrix factorization method, we clustered the patients into three subgroups. By comparing the clinical stage of patients among the three subgroups, we identified an early-stage-enriched and a late-stage-enriched subgroup. Comparison of the C-scores (mutation scores) of these subgroups identified 358 genes that carry significantly higher rates of mutation in the late-stage-enriched subgroup. Functional characterization of these genes revealed important functional gene families that carry a heavy mutational load in the late-stage-enriched subgroup. Finally, using the identified subgroups, we developed a supervised classification model to predict the likely stage of patients given their mutation profiles, and hence to provide clinical insights that could help devise an effective treatment plan. This study demonstrates that gene mutation profiles can be used effectively with machine-learning methods to identify clinically distinguishable subgroups of cancer patients. Functional analysis also identified the genes and gene families that carry a heavier mutational load in the late-stage-enriched subgroup than in the early-stage-enriched subgroup.
The classification model developed in this study could provide a reasonable prediction of the stage of cancer patients based solely on their mutation profiles. This study represents the first use of only somatic mutation profile data to identify and predict breast cancer subgroups, and this generic methodology could also be applied to other cancer datasets.
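The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which non-negative matrix factorization variant or software was used, so the multiplicative-update rule below (minimizing squared reconstruction error) is an assumption, as are all function names. The small patient-by-gene score matrix is invented toy data for illustration only, not TCGA data or real C-scores.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error between v and the product w*h."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (patients x genes) into w (patients x k)
    and h (k x genes) using multiplicative updates. Assumed variant; the
    source does not specify the algorithm."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num = matmul(wt, v)                 # k x genes
        den = matmul(matmul(wt, w), h)      # k x genes
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        ht = transpose(h)
        num = matmul(v, ht)                 # patients x k
        den = matmul(w, matmul(h, ht))      # patients x k
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def assign_subgroups(w):
    """Label each patient by the factor with the largest weight in its row of w."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

# Toy mutation-score matrix: 6 hypothetical patients x 6 hypothetical genes.
scores = [
    [5, 4, 0, 0, 0, 0],
    [4, 5, 0, 0, 0, 0],
    [0, 0, 5, 4, 0, 0],
    [0, 0, 4, 5, 0, 0],
    [0, 0, 0, 0, 5, 4],
    [0, 0, 0, 0, 4, 5],
]

w, h = nmf(scores, k=3)
labels = assign_subgroups(w)
print(labels)
```

Each patient is assigned to the factor that contributes most to its row, which is one common way to read clusters off an NMF result. Comparing clinical stage across the resulting subgroups, as the abstract describes, would be a separate step applied to these labels.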
Recommended Citation
Vural, Suleyman, "Classification of Breast Cancer Patients Using Somatic Mutation Profiles and Machine Learning Approaches" (2015). Theses & Dissertations. 50.
https://digitalcommons.unmc.edu/etd/50