Genetic Differentiation of Hispanic Populations Using Ancestry Informative Markers
Abstract
Hypothesis: As of August 2015, there were at least 10,500 unidentified human remains in the US, 2,041 of them of presumed Hispanic origin (NamUs 2015). Conventional DNA analysis identifies an individual by comparison with reference profiles. For remains with no reference profile, panels of ancestry informative single nucleotide polymorphisms (SNPs) exist (Kidd 2014, Seldin 2009), but they focus on global differentiation and are not useful for determining ancestry within admixed populations (e.g. Hispanics). We hypothesize that a small panel of SNPs with high genetic differentiation, ascertained from appropriate populations, can distinguish ancestry within Hispanic populations.

Materials: This bioinformatics study uses the Genomic Origins and Ancestry in Latinos (GOAL) data set to develop an informative Hispanic SNP panel. The data set comprises 250 individuals with ancestry from Colombia, Cuba, the Dominican Republic, Haiti, Honduras, or Puerto Rico, genotyped on the Affymetrix 6.0 chip.

Methods: Starting with 897,336 SNPs, we pruned to 531,878 SNPs using a linkage disequilibrium threshold of 0.7. We then calculated pairwise FST for each SNP in each population pair using PLINK (Haiti excluded). SNPs that met the 0.15 FST threshold for the four comparisons were included in a 1217 SNP panel. We used STRUCTURE to visualize population separation. To determine whether a smaller SNP set could retain this information, we built a condensed panel of 56 SNPs from the SNPs with the top ten mean FST values for each population, plus five extra chosen to help distinguish Cuba from the Dominican Republic. Additionally, we combined 1000 Genomes and GOAL data to test whether the countries differentiate ancestrally or geographically.

Results: STRUCTURE analysis showed that Honduras was easily distinguished from the other countries with both the 1217 and 56 SNP panels. Other countries were also separated based on contributions from ancestral populations; however, the separation was less than ideal. Notably, Honduras contributed 71% of the SNPs in the 1217 SNP panel. When analyzed with 1000 Genomes data, Honduras grouped with the Chinese population for K=1-3, but was the first GOAL population to separate from the ancestral line.

Conclusions: An efficient SNP panel consistently separated Honduras from the other populations, demonstrating proof of concept. Greater separation by country of origin may be achievable with a larger data set and with an alternative selection scheme in which each population’s number of SNPs is set by a cumulative mean FST threshold.
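
The FST-based selection described in the Methods can be illustrated with a minimal Python sketch. It is not the study's pipeline: the study used PLINK, whereas this sketch uses the simpler Wright estimator, FST = (HT - HS) / HT, computed from allele frequencies. The function names, the allele frequencies, the SNP identifiers, and the reading of "the four comparisons" as one focal population against each of the other four are all assumptions made for illustration.

```python
def wright_fst(p1, p2):
    """Wright's FST for one biallelic SNP, given the allele frequency
    in two populations (equal weighting of the two populations)."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)        # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                               # SNP is monomorphic in both
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def focal_fsts(freqs, focal):
    """FST of the focal population against every other population for one SNP."""
    return [wright_fst(freqs[focal], p) for pop, p in freqs.items() if pop != focal]


def select_panel(snp_freqs, focal, threshold=0.15):
    """Keep SNPs whose FST meets the threshold in every focal-vs-other comparison
    (assumed interpretation of the threshold rule)."""
    return [snp for snp, freqs in snp_freqs.items()
            if all(f >= threshold for f in focal_fsts(freqs, focal))]


def top_n_by_mean_fst(snp_freqs, focal, n=10):
    """Rank SNPs by mean focal-vs-other FST and return the top n identifiers."""
    def mean_fst(snp):
        values = focal_fsts(snp_freqs[snp], focal)
        return sum(values) / len(values)
    return sorted(snp_freqs, key=mean_fst, reverse=True)[:n]


# Hypothetical allele frequencies for two made-up SNPs (not GOAL data).
snp_freqs = {
    "snp_a": {"Colombia": 0.20, "Cuba": 0.25, "DominicanRepublic": 0.22,
              "Honduras": 0.80, "PuertoRico": 0.18},
    "snp_b": {"Colombia": 0.50, "Cuba": 0.52, "DominicanRepublic": 0.51,
              "Honduras": 0.55, "PuertoRico": 0.49},
}

honduras_panel = select_panel(snp_freqs, "Honduras")
honduras_top = top_n_by_mean_fst(snp_freqs, "Honduras", n=1)
```

In this toy input, only the SNP whose Honduras frequency differs sharply from every other population passes the threshold. Running the same selection with each population as the focal one would show how many SNPs each population contributes to a combined panel.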