    TRcaller: a novel tool for precise and ultrafast tandem repeat variant genotyping in massively parallel sequencing reads
    (Frontiers Media S.A., 2023-08-03) Wang, Xuewen; Huang, Meng; Budowle, Bruce; Ge, Jianye
    Calling tandem repeat (TR) variants from DNA sequences is of both theoretical and practical significance. Some bioinformatics tools have been developed for detecting or genotyping TRs. However, few studies have addressed genotyping TR alleles from long-read sequencing data, and the accuracy of genotyping TR alleles from next-generation sequencing data still needs to be improved. Herein, a novel algorithm is described to retrieve TR regions from sequence alignment, and a software program, TRcaller, has been developed and integrated into a web portal to call TR alleles from both short- and long-read sequences, covering both whole-genome and targeted sequences generated from multiple sequencing platforms. All TR alleles are genotyped as haplotypes, and the robust alleles are reported, even when multiple alleles are present in a DNA mixture. TRcaller could provide substantially higher accuracy (>99% in 289 human individuals) in detecting TR alleles, while running orders of magnitude faster (e.g., approximately 2 s for 300x human sequence data), than the mainstream software tools. The web portal offers 119 preselected TR loci drawn from forensics, genealogy, and disease-related TR loci. TRcaller is validated to be scalable in various applications, such as DNA forensics and disease diagnosis, and can be expanded into other fields such as breeding programs. Availability: TRcaller is available at
  • Item
    A prospective cost-benefit analysis for nylon 4N6FLOQSwabs(R): example of the process and potential benefits
    (Springer Nature, 2022-09-04) Budowle, Bruce; Ge, Jianye; Sajantila, Antti
    Laboratories and their criminal justice systems are confronted with challenges in implementing new technologies, practices, and policies, even when there appear to be demonstrable benefits to operational performance. Decisions are affected by the often higher costs associated with, for example, new technologies, by limited current budgets, and by hard choices about what to sacrifice to take on the seemingly better approach. A prospective cost-benefit analysis (CBA) could help an agency better formulate its strategies and plans and, more importantly, delineate how a relatively small increase in cost to take on, for example, a new technology can have a large impact on the system (e.g., the agency, other agencies, victims and families, and taxpayers). To demonstrate the process and potential value, a CBA was performed on the use of an alternate and more expensive swab with reported better DNA yield and certified to be human DNA free (i.e., nylon 4N6FLOQSwabs(R)), versus the traditional less costly swab (i.e., cotton swab). Assumptions are described, potential underestimates and overestimates noted, different values applied (for low and modest to high), and potential benefits (monetary and qualitative) presented. The overall outcome is that the cost of using the more expensive technology pales compared with the potential tangible and intangible benefits. This approach could be a guide for laboratories (and associated criminal justice systems) worldwide to support increased funding, although the costs and benefits may vary locally and for different technologies, practices, and policies. With well-developed CBAs, goals of providing the best services to support the criminal justice system and society can be attained.
  • Item
    USAT: a bioinformatic toolkit to facilitate interpretation and comparative visualization of tandem repeat sequences
    (BioMed Central Ltd., 2022-11-20) Wang, Xuewen; Budowle, Bruce; Ge, Jianye
    BACKGROUND: Tandem repeats (TR), highly variable genomic variants, are widely used in individual identification, disease diagnostics, and evolutionary studies. Recent advances in sequencing technologies and bioinformatic tools facilitate genome-wide calling of TR haplotypes. Both length-based and sequence-based TR alleles are used in different applications. However, sequence-based TR alleles could provide the highest precision in characterizing TR haplotypes. An easy-to-use bioinformatic tool is needed to identify differences at the single-nucleotide level between or among TR haplotypes. RESULTS: In this study, we developed a Universal STR Allele Toolkit (USAT) for TR haplotype analysis, which takes TR haplotype output from existing tools to perform allele size conversion, sequence comparison of haplotypes, figure plotting, comparison of allele distributions, and interactive visualization. An exemplary application of USAT to the analysis of the CODIS core STR loci for DNA forensics with benchmarking human individuals demonstrated the capabilities of USAT. USAT has user-friendly graphic interfaces and runs fast on major operating systems with parallel computing enabled. CONCLUSION: USAT is a user-friendly bioinformatics software tool for interpretation, visualization, and comparison of TRs.
  • Item
    Precision DNA Mixture Interpretation with Single-Cell Profiling
    (MDPI, 2021-10-20) Ge, Jianye; King, Jonathan L.; Smuts, Amy; Budowle, Bruce
    Wet-lab based studies have exploited emerging single-cell technologies to address the challenges of interpreting forensic mixture evidence. However, little effort has been dedicated to developing a systematic approach to interpreting the single-cell profiles derived from the mixtures. This study is the first attempt to develop a comprehensive interpretation workflow in which single-cell profiles from mixtures are interpreted individually and holistically. In this approach, the genotypes from each cell are assessed, the number of contributors (NOC) of the single-cell profiles is estimated, a consensus profile of each contributor is developed, and finally the consensus profile(s) can be used for a DNA database search or compared with known profiles to determine their potential sources. The potential of this single-cell interpretation workflow was assessed by simulation with various mixture scenarios and empirical allele drop-out and drop-in rates, evaluating the accuracy of estimating the NOC, the accuracy of recovering the true alleles by consensus, and the capability of deconvolving mixtures with related contributors. The results support that single-cell based mixture interpretation can provide a precision that cannot be achieved with current standard CE-STR analyses. A new paradigm for mixture interpretation is available to enhance the interpretation of forensic genetic casework.
  • Item
    How many familial relationship testing results could be wrong?
    (PLOS, 2020-08-13) Ge, Jianye; Budowle, Bruce
  • Item
    skater: an R package for SNP-based kinship analysis, testing, and evaluation
    (F1000 Research Ltd., 2022-01-07) Turner, Stephen D.; Nagraj, V. P.; Scholz, Matthew; Jessa, Shakeel; Acevedo, Carlos; Ge, Jianye; Woerner, August E.; Budowle, Bruce
    Motivation: SNP-based kinship analysis with genome-wide relationship estimation and IBD segment analysis methods produces results that often require further downstream processing and manipulation. A dedicated software package that consistently and intuitively implements this analysis functionality is needed. Results: Here we present the skater R package for SNP-based kinship analysis, testing, and evaluation with R. The skater package contains a suite of well-documented tools for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing IBD segment data. Availability: The skater package is implemented as an R package and is released under the MIT license at Documentation is available at
  • Item
    Evaluating the Impact of Dropout and Genotyping Error on SNP-Based Kinship Analysis With Forensic Samples
    (Frontiers Media S.A., 2022-06-30) Turner, Stephen D.; Nagraj, V. P.; Scholz, Matthew; Jessa, Shakeel; Acevedo, Carlos; Ge, Jianye; Woerner, August E.; Budowle, Bruce
    Technological advances in sequencing and single nucleotide polymorphism (SNP) genotyping microarray technology have facilitated advances in forensic analysis beyond short tandem repeat (STR) profiling, enabling the identification of unknown DNA samples and distant relationships. Forensic genetic genealogy (FGG) has facilitated the identification of distant relatives of both unidentified remains and unknown donors of crime scene DNA, invigorating the use of biological samples to resolve open cases. Forensic samples are often degraded or contain only trace amounts of DNA. In this study, the accuracy of genome-wide relatedness methods and identity by descent (IBD) segment approaches was evaluated in the presence of challenges commonly encountered with forensic data: missing data and genotyping error. Pedigree whole-genome simulations were used to estimate the genotypes of thousands of individuals with known relationships using multiple populations with different biogeographic ancestral origins. Simulations were also performed with varying error rates and types. Using these data, the performance of different methods for quantifying relatedness was benchmarked across these scenarios. When the genotyping error was low (<1%), IBD segment methods outperformed genome-wide relatedness methods for close relationships and were more accurate at distant relationship inference. However, with increasing genotyping error (1-5%), methods that do not rely on IBD segment detection were more robust and outperformed IBD segment methods. A reduced call rate had little impact on either class of methods. These results have implications for the use of dense SNP data in forensic genomics for distant kinship analysis and FGG, especially when the sample quality is low.