Highly Informative Short Tandem Repeat Markers for Enhanced DNA Mixture Deconvolution

Date

2018-08-01

Authors

Novroski, Nicole M. M.

ORCID

0000-0001-9071-9278 (Novroski, Nicole M. M.)

Abstract

DNA typing in forensic genetics relies on amplification of short tandem repeat (STR) markers using the polymerase chain reaction (PCR); allele sizes are subsequently determined for each locus using capillary electrophoresis (CE) and fluorescent detection. The resulting profiles are compared to reference sample profiles or used to query existing profiles, such as those stored in the FBI Combined DNA Index System (CODIS), to develop investigative leads to help solve crimes. The success of commercial STR kits in facilitating analysis of challenging samples has led to a demand to analyze increasingly complex DNA mixtures. Low-quantity/low-quality DNA samples have become commonplace in casework, but interpretation of the resultant DNA profiles remains challenging. Massively parallel sequencing (MPS) for typing forensically relevant STR loci has dramatically enhanced the ability to identify allele diversity due to sequence variation within STR repeat and flanking regions. Sequence variation within the STR loci currently used for forensic genetic analysis is quite large. However, recent studies have demonstrated that some of the current core CODIS loci are devoid of repeat and/or flanking region sequence variation, which limits the additional information that MPS can provide for these STRs. Thus, novel STRs with increased sequence variation should be sought to facilitate mixture deconvolution. The primary goal of this research was to identify and characterize STR genetic variation, which in turn would allow for the development of a novel panel of highly polymorphic STR markers (referred to as the STR DECoDE panel; STR DNA EnhanCed DEconvolution panel) that is capable of deconvolving simple to complex DNA mixture samples better than current systems.
A list of candidate STRs was generated by mining the 1000 Genomes Project using three criteria: 1) a repeat size of at least 4 nucleotides; 2) a minimum of 80% locus heterozygosity; and 3) generally an allele length spread of 10 nominal alleles or less. A preliminary panel of 248 candidate markers was designed, and a bioinformatics pipeline for MPS was created and implemented to assess the analytical performance and biological properties of each STR. The STR DECoDE panel comprises the 73 of the 248 STRs that displayed the highest heterozygosity. This panel was compared to the current core CODIS loci for its ability to resolve in silico two-person mixtures from 443 population samples representing three US populations. Additionally, each of the 73 loci was extensively characterized for its underlying genetic variation, and population genetic analyses were performed. The results of this dissertation research indicate that the STR DECoDE panel improves upon current mixture deconvolution efforts by employing markers that allow for better allele resolution of component contributors in a mixed DNA sample. The DECoDE panel loci offer substantially more diversity than the current core CODIS STR loci used for forensic identity typing. In turn, use of this panel could facilitate the complex downstream statistical modeling (probabilistic genotyping) and subjective interpretation that are currently used for analysis of DNA mixture samples in forensic laboratories. Finally, integration of DECoDE STR loci into current multiplexes will allow the field of forensic genetic investigation to increase the number of resolved genotypes in mixed samples being compared to reference and suspect profiles, and to expand the DNA database by increasing the number of samples uploaded. The benefit to society from this revolutionary application will be an increase in the number of investigative leads and the overall resolution of more crimes.
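The three candidate-selection criteria described above can be illustrated with a minimal Python sketch. It is not the dissertation's pipeline: the `CandidateSTR` record, the function names, and the toy allele calls are hypothetical. It also makes two assumptions: heterozygosity is computed as expected heterozygosity (one minus the sum of squared allele frequencies), and "allele length spread" is read as the difference between the longest and shortest nominal allele, in repeat units. The abstract does not specify either, and it describes the spread criterion as applying only "generally", so a real screen would likely treat it as a soft filter.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CandidateSTR:
    """Hypothetical record for one candidate locus (illustration only)."""
    name: str
    motif: str            # repeat unit, e.g. "AGAT"
    alleles: list[float]  # nominal (length-based) allele calls, one per chromosome


def expected_heterozygosity(alleles: list[float]) -> float:
    """Assumed definition: 1 - sum(p_i^2) over observed allele frequencies."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def allele_spread(alleles: list[float]) -> float:
    """Assumed definition: longest minus shortest nominal allele (repeat units)."""
    return max(alleles) - min(alleles)


def passes_criteria(locus: CandidateSTR,
                    min_motif_len: int = 4,
                    min_het: float = 0.80,
                    max_spread: float = 10) -> bool:
    """Apply the three screening criteria from the abstract as hard filters."""
    return (len(locus.motif) >= min_motif_len
            and expected_heterozygosity(locus.alleles) >= min_het
            and allele_spread(locus.alleles) <= max_spread)


# Toy data, invented for illustration; not drawn from the 1000 Genomes Project.
candidates = [
    CandidateSTR("STR_A", "AGAT", [8, 9, 10, 11, 12, 13, 9, 10, 11, 12]),
    CandidateSTR("STR_B", "CA",   [8, 9, 10, 11, 12, 13, 9, 10, 11, 12]),
    CandidateSTR("STR_C", "GATA", [10, 10, 10, 11, 11, 10, 10, 11, 10, 10]),
]

passing = [locus for locus in candidates if passes_criteria(locus)]
for locus in passing:
    print(locus.name, round(expected_heterozygosity(locus.alleles), 3))
```

In this toy set, only `STR_A` passes: `STR_B` has a dinucleotide motif, and `STR_C` is dominated by a single allele, so its heterozygosity falls well below the 80% threshold.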
