MaCHTools: Additional functionality for the imputation software MaCH




Mitchel, Jeffrey S.




Imputation of unknown genotypes is becoming a standard procedure in exploratory genetic association studies. Imputation is accomplished by comparing observed data from the study population to reference panels of individuals who are from a genetically similar population and were genotyped at a dense set of polymorphic sites. Linkage disequilibrium within the reference panels is used to construct haplotypes and extrapolate allelic correlations in the test sample. Imputation has been shown to be accurate for the inference of genotypes at unobserved SNPs, as well as for quality control (QC) measures at genotyped locations. Imputing genotypes also allows cohorts that were genotyped on different platforms to be combined in a joint or meta-analysis. One of the most widely used imputation software packages is MaCH. MaCH uses a powerful and accurate Markov chain-based algorithm; however, its usability is lacking. MaCHTools allows the user to streamline their workflow with MaCH through input file specification, error checking, and QC measures. MaCHTools began as a series of Java scripts used to check input files and QC raw data as an initial step before imputing additional genotypes in MaCH. This set of scripts became invaluable to the genome-wide association study (GWAS) workflow, but the scripts were unpolished and ill-suited for public release to benefit the scientific community. This project aimed to bundle the scripts into a single executable program with a graphical user interface (GUI) to help students and researchers streamline the GWAS workflow. Additional functionality includes more efficient launching of jobs on compute clusters, compatibility with different Linux job handlers, the ability to switch easily between GWAS projects (including their genotype data and reference datasets), simpler specification of parameters and thresholds, and several other usability improvements.
The GWAS workflow, which couples dataset preparation in MaCHTools with haplotype estimation and imputation in MaCH, was validated by replicating results from a published study of the genetic basis of Alzheimer’s endophenotypes in the Texas Alzheimer’s Research and Care Consortium. A similar analysis was then performed to determine the genetic basis of D, a latent variable that represents the dementing process.