Theses & Dissertations

Classification of Breast Cancer Patients Using Somatic Mutation Profiles and Machine Learning Approaches

Suleyman Vural, University of Nebraska Medical Center

Graduation Date

Fall 12-18-2015

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Programs

Biomedical Informatics

First Advisor

Dr. Babu Guda

Abstract

The high degree of heterogeneity observed in breast cancers makes it very difficult to classify patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. In this study, we explore the use of gene mutation profiles to classify, characterize, and predict subgroups of breast cancer. We analyzed whole-exome sequencing data from 358 ethnically similar breast cancer patients in The Cancer Genome Atlas (TCGA) project. Each identified somatic, non-synonymous single nucleotide variant was assigned a quantitative score (C-score) representing the extent of its negative impact on the function of the gene. Using these scores with a non-negative matrix factorization method, we clustered the patients into three subgroups. By comparing the clinical stages of patients across the three subgroups, we identified an early-stage-enriched and a late-stage-enriched subgroup. Comparison of the C-scores (mutation scores) between these two subgroups identified 358 genes that carry significantly higher rates of mutation in the late-stage-enriched subgroup. Functional characterization of these genes revealed important gene families that carry a heavy mutational load in the late-stage-enriched subgroup. Finally, using the identified subgroups, we developed a supervised classification model that predicts a patient's likely stage from their mutation profile, thereby providing clinical insights that could help devise an effective treatment plan. This study demonstrates that gene mutation profiles can be used effectively with machine-learning methods to identify clinically distinguishable subgroups of cancer patients. Functional analysis also identified the genes and gene families that carry a heavier mutational load in late-stage-enriched patients than in the early-stage-enriched subgroup.
The classification model developed in this study could provide a reasonable prediction of a patient's cancer stage based solely on their mutation profile. This study represents the first use of somatic mutation profile data alone to identify and predict breast cancer subgroups, and this generic methodology could also be applied to other cancer datasets.
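The clustering step summarized above can be illustrated with a minimal sketch. This is not the dissertation's implementation; it assumes a non-negative matrix of C-scores laid out as genes (rows) by patients (columns), factorizes it with the standard Lee-Seung multiplicative updates, and assigns each patient to the factor with the largest coefficient in H, a common convention for NMF-based clustering. The function names `nmf` and `cluster_patients`, the iteration count, the seed, and the toy score matrix are all hypothetical.

```python
import random


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix V into W (m x k) and H (k x n)
    using Lee-Seung multiplicative updates for the squared Frobenius error.
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); the comprehension reads the old H
        WtV = [[sum(W[i][a] * V[i][j] for i in range(m)) for j in range(n)]
               for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(m)) for b in range(k)]
               for a in range(k)]
        H = [[H[a][j] * WtV[a][j]
              / (sum(WtW[a][b] * H[b][j] for b in range(k)) + eps)
              for j in range(n)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H
        VHt = [[sum(V[i][j] * H[a][j] for j in range(n)) for a in range(k)]
               for i in range(m)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(n)) for b in range(k)]
               for a in range(k)]
        W = [[W[i][a] * VHt[i][a]
              / (sum(W[i][b] * HHt[b][a] for b in range(k)) + eps)
              for a in range(k)] for i in range(m)]
    return W, H


def cluster_patients(scores, k, **kwargs):
    """Assign each patient (column of scores) to the factor whose
    coefficient in H is largest. Hypothetical helper."""
    _, H = nmf(scores, k, **kwargs)
    return [max(range(k), key=lambda a: H[a][j]) for j in range(len(scores[0]))]


# Hypothetical toy C-score matrix (genes x patients): genes 0-2 are damaged
# only in patients 0-2, and genes 3-5 only in patients 3-5.
toy_scores = [
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
]

print(cluster_patients(toy_scores, k=2))
```

On this toy matrix, patients 0 to 2 should share one cluster label and patients 3 to 5 the other. Real analyses typically repeat the factorization from several random starts and choose the number of clusters with a stability criterion; this sketch omits both, and the dissertation's actual settings are not stated here.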

Recommended Citation

Vural, Suleyman, "Classification of Breast Cancer Patients Using Somatic Mutation Profiles and Machine Learning Approaches" (2015). Theses & Dissertations. 50.
https://digitalcommons.unmc.edu/etd/50

Included in

Bioinformatics Commons, Systems Biology Commons

DigitalCommons@UNMC
