Ancestry Informative Markers Tailored to Hispanic Populations

Date

2020-05

Authors

Setser, Casandra H.

Abstract

Hispanic populations are highly heterogeneous despite being grouped together as a conglomerate population; this makes an accurate panel of ancestry informative markers (AIMs) especially important for human identification. In Chapter 2, the Genomic Origins and Admixture in Latinos (GOAL) dataset containing 494,886 SNPs was used for SNP ascertainment. Using a country-attributable variant of Wright's FST, 234 SNPs were selected for biogeographic ancestry (BGA) determination by tailoring each SNP to the genetic differentiation of specific populations. Accuracy of BGA prediction was tested using multinomial logistic regression (MLR), and as few as 55 SNPs maintained 90% accuracy for all populations studied. The panel of 234 SNPs was compressed by 65.8% to 80 SNPs by decreasing the influence of Honduras and Dominican Republic SNPs with high country-attributable mean FST values in favor of additional SNPs for Colombia, Cuba, and Puerto Rico; this balanced small panel size with classification accuracy. In Chapter 3, the Setser80 Hispanic AIMs panel was tested against the panel of 128 SNPs developed by the Seldin group and the panel of 55 SNPs developed by the Kidd group using STRUCTURE, PCA, a naive Bayesian classifier, and MLR. In STRUCTURE, the Setser80 was able to distinguish Honduras, the Dominican Republic, and Colombia at K=4, whereas the Seldin and Kidd panels were optimized at K=3 and distinguished only Honduras and the Dominican Republic; similar results were obtained by PCA. The GOAL dataset was combined with the Admixed American super-population from the 1000 Genomes Project to test the panel on an expanded dataset of seven populations. Overall, the Setser80 outperformed the Seldin and Kidd panels, with 91.5% accuracy by naive Bayesian classifier and 93.2% by MLR. As an indication of its portability, the Setser80 had accuracies of >98% for Peru and >80% for Mexicans living in Los Angeles, two populations that were not involved in SNP ascertainment.
Given its accuracy and lack of overlap, the Setser80 may supplement existing panels for more granular Hispanic BGA determination. In Chapter 4, applications of allele frequencies to forensic genetics, genealogy, and clinical genetics are discussed, as well as future directions and ethical considerations.
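
The abstract does not define the country-attributable variant of Wright's FST, so the following is only a minimal sketch of the general idea of ranking SNPs by how strongly one country differs from the others. It assumes biallelic SNPs, treats the focal country and the unweighted mean of the remaining countries as two equally weighted groups, and computes the textbook FST = (HT - HS) / HT. The function names, the SNP labels, and the allele frequencies are all hypothetical and are not taken from the GOAL dataset or from the thesis.

```python
def fst_two_groups(p_focal, p_rest):
    """Textbook Wright's FST for one biallelic SNP between two equally weighted groups."""
    p_bar = (p_focal + p_rest) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity of the pooled group
    if h_t == 0:
        return 0.0                         # SNP is monomorphic in both groups
    h_s = (2 * p_focal * (1 - p_focal) + 2 * p_rest * (1 - p_rest)) / 2
    return (h_t - h_s) / h_t


def country_fst(freqs, focal):
    """FST of the focal country against the unweighted mean of all other countries.

    This one-vs-rest contrast is an assumption made for illustration only.
    """
    others = [p for country, p in freqs.items() if country != focal]
    p_rest = sum(others) / len(others)
    return fst_two_groups(freqs[focal], p_rest)


def top_snps(table, focal, n):
    """Return the n SNP labels with the highest one-vs-rest FST for the focal country."""
    scores = {snp: country_fst(freqs, focal) for snp, freqs in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical allele frequencies, invented for this example.
toy_table = {
    "snp_a": {"Honduras": 0.90, "Cuba": 0.20, "Colombia": 0.30},
    "snp_b": {"Honduras": 0.50, "Cuba": 0.50, "Colombia": 0.45},
    "snp_c": {"Honduras": 0.40, "Cuba": 0.90, "Colombia": 0.35},
}

if __name__ == "__main__":
    for country in ("Honduras", "Cuba"):
        print(country, top_snps(toy_table, country, 1))
```

Selecting the top-ranked SNPs separately for each country, rather than by a single global FST, is what lets a panel give more weight to populations that are otherwise hard to separate, which is the balancing step the abstract describes for Colombia, Cuba, and Puerto Rico.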
