Graduation Date

Fall 12-14-2018

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Programs

Genetics, Cell Biology & Anatomy

First Advisor

Chittibabu Guda

MeSH Headings

RNA-Seq; Single-Cell Analysis; Sequence Analysis, RNA; Software Design

Abstract

Single-cell sequencing enables the rapid acquisition of genomic and transcriptomic data from individual cells to better understand genetic diseases, such as cancer or autoimmune disorders, which are often affected by changes in rare cells. No existing software is aimed at identifying single-nucleotide variations or micro (1-50 bp) insertions and deletions in single-cell RNA sequencing (scRNA-seq) data, yet generating high-quality data is vital to the study of these diseases, among others. Our goal is to create such a tool, Red Panda, and to use in-house sequencing to validate its effectiveness. Red Panda employs the unique information found in scRNA-seq data to identify variants more accurately than is possible with software designed for bulk sequencing. We deliberately separate variants into three classes: homozygous-looking, heterozygous, and bimodally distributed heterozygous, the last of which can only be identified in scRNA-seq. To validate the results from this method, variants were called on three datasets: scRNA-seq and exome sequencing jointly performed on human articular chondrocytes, scRNA-seq from mouse embryonic fibroblasts (MEFs), and simulated data derived from the MEF alignments. The chondrocyte exome sequencing was used to validate the chondrocyte scRNA-seq results. For Red Panda, on average, 913 variants were shared with the exome, with a positive predictive value (PPV) of 45.0%. The other tools (FreeBayes, GATK HaplotypeCaller, GATK UnifiedGenotyper, and Platypus) ranged from 65 to 705 variants and from 5.8% to 31.7% PPV. Sanger sequencing was performed on a subset of the variants identified in the MEFs, and simulated data were generated to assess the sensitivity of each tool. On the simulated data, Red Panda had the highest sensitivity at 72.44%; the other tools ranged from 18.22% to 39.09%. We show that our method provides a novel and improved mechanism for identifying variants in scRNA-seq data compared with existing software.
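
The two metrics reported in the abstract, PPV and sensitivity, follow their standard definitions: PPV is the fraction of called variants that are confirmed by the reference set, and sensitivity is the fraction of reference variants that were called. The following is a minimal sketch of that arithmetic, not the dissertation's evaluation code. It assumes variants are matched as exact (chromosome, position, ref, alt) tuples, which the abstract does not specify, and the function name `score_calls` and the example variants are hypothetical.

```python
from typing import Set, Tuple

# Assumption: a variant is identified by (chromosome, position, ref, alt).
Variant = Tuple[str, int, str, str]


def score_calls(called: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Return (PPV, sensitivity) for a call set against a reference set."""
    if not called or not truth:
        raise ValueError("both the call set and the reference set must be non-empty")

    true_positives = len(called & truth)   # called and confirmed
    false_positives = len(called - truth)  # called but not confirmed
    false_negatives = len(truth - called)  # confirmed but missed

    ppv = true_positives / (true_positives + false_positives)
    sensitivity = true_positives / (true_positives + false_negatives)
    return ppv, sensitivity


# Hypothetical toy data, not taken from the dissertation.
example_called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr3", 7, "T", "C"),
}
example_truth = {
    ("chr1", 100, "A", "G"),
    ("chr2", 40, "G", "A"),
    ("chr2", 90, "A", "T"),
    ("chr4", 12, "C", "G"),
    ("chr5", 33, "G", "T"),
}

example_ppv, example_sensitivity = score_calls(example_called, example_truth)
print(f"PPV = {example_ppv:.2f}, sensitivity = {example_sensitivity:.2f}")
```

In this toy case two of the four called variants are confirmed and two of the five reference variants are recovered, so the two metrics can differ substantially for the same call set.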
