DESCRIPTION: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.
URL: https://www.cog-genomics.org/plink/
CITATION: Purcell, Shaun, et al. "PLINK: a tool set for whole-genome association and population-based linkage analyses." The American journal of human genetics 81.3 (2007): 559-575.
PLINK2
FULLNAME:PLINK2
SHORTNAME:PLINK2
URL: https://www.cog-genomics.org/plink/2.0/
CITATION: Chang, Christopher C., et al. "Second-generation PLINK: rising to the challenge of larger and richer datasets." Gigascience 4.1 (2015): s13742-015.
EMMAX
FULLNAME:efficient mixed-model association eXpedited
SHORTNAME:EMMAX
URL: https://genome.sph.umich.edu/wiki/EMMAX
DESCRIPTION: EMMAX is a statistical test for large-scale human or model organism association mapping that accounts for sample structure. In addition to the computational efficiency of the EMMA algorithm, EMMAX takes advantage of the fact that each locus explains only a small fraction of a complex trait, which makes it possible to avoid repeating the variance component estimation for every marker, resulting in a substantial reduction in the computational time of mixed-model association mapping.
CITATION:
GEMMA
FULLNAME: genome-wide efficient mixed-model association
SHORTNAME: GEMMA
DESCRIPTION: GEMMA is the software implementing the Genome-wide Efficient Mixed Model Association algorithm for a standard linear mixed model and some of its close relatives for genome-wide association studies (GWAS). It fits a standard linear mixed model (LMM) to account for population stratification and sample structure for single marker association tests. It fits a Bayesian sparse linear mixed model (BSLMM) using Markov chain Monte Carlo (MCMC) for estimating the proportion of variance in phenotypes explained (PVE) by typed genotypes (i.e. chip heritability), predicting phenotypes, and identifying associated markers by jointly modeling all markers while controlling for population structure. It is computationally efficient for large scale GWAS and uses freely available open-source numerical libraries.
CITATION: Zhou, Xiang, and Matthew Stephens. "Genome-wide efficient mixed-model analysis for association studies." Nature genetics 44.7 (2012): 821-824.
BOLT-LMM
FULLNAME: BOLT-LMM
SHORTNAME: BOLT-LMM
DESCRIPTION: The BOLT-LMM software package currently consists of two main algorithms, the BOLT-LMM algorithm for mixed model association testing, and the BOLT-REML algorithm for variance components analysis (i.e., partitioning of SNP-heritability and estimation of genetic correlations).
CITATION: Loh, Po-Ru, et al. "Efficient Bayesian mixed-model analysis increases association power in large cohorts." Nature genetics 47.3 (2015): 284-290.
KEY WORDS: non-infinitesimal model, mixture of two Gaussian distributions
SAIGE
FULLNAME:Scalable and Accurate Implementation of GEneralized mixed model
SHORTNAME: SAIGE
URL: https://github.com/weizhouUMICH/SAIGE
DESCRIPTION: SAIGE is an R package with Scalable and Accurate Implementation of Generalized mixed model (Chen, H. et al. 2016). It accounts for sample relatedness and is feasible for genetic association tests in large cohorts and biobanks (N > 400,000). SAIGE performs single-variant association tests for binary traits and quantitative traits. For binary traits, SAIGE uses the saddlepoint approximation (SPA) (Imhof, J. P., 1961; Kuonen, D. 1999; Dey, R. et al. 2017) to account for case-control imbalance.
CITATION: Zhou, Wei, et al. "Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies." Nature genetics 50.9 (2018): 1335-1341.
KEY WORDS: case-control imbalance, saddlepoint approximation (SPA)
fastGWA / fastGWA-GLMM
SHORTNAME: fastGWA / fastGWA-GLMM
CITATION: Jiang, Longda, et al. "A resource-efficient tool for mixed model association analysis of large-scale data." Nature genetics 51.12 (2019): 1749-1755.
KEY WORDS: grid-search-based REML algorithm
CITATION: Jiang, Longda, et al. "A generalized linear mixed model association tool for biobank-scale data." Nature genetics 53.11 (2021): 1616-1621.
REGENIE
FULLNAME:REGENIE
SHORTNAME:REGENIE
URL: https://github.com/rgcgithub/regenie
DESCRIPTION:regenie is a C++ program for whole genome regression modelling of large genome-wide association studies. It is developed and supported by a team of scientists at the Regeneron Genetics Center.
CITATION: Mbatchou, Joelle, et al. "Computationally efficient whole-genome regression for quantitative and binary traits." Nature genetics 53.7 (2021): 1097-1103.
KEY WORDS: whole genome regression
POLMM
FULLNAME:proportional odds logistic mixed model (POLMM)
SHORTNAME:POLMM
URL: https://github.com/WenjianBI/POLMM
DESCRIPTION: Proportional Odds Logistic Mixed Model (POLMM) for ordinal categorical data analysis
CITATION: Bi, W., Zhou, W., Dey, R., Mukherjee, B., Sampson, J. N., & Lee, S. (2021). Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes. The American Journal of Human Genetics, 108(5), 825-839.
KEY WORDS: ordinal categorical phenotypes
Gene-based analysis (rare variant)
REGENIE
FULLNAME:REGENIE
SHORTNAME:REGENIE
URL: https://github.com/rgcgithub/regenie
DESCRIPTION:regenie is a C++ program for whole genome regression modelling of large genome-wide association studies. It is developed and supported by a team of scientists at the Regeneron Genetics Center.
CITATION: Mbatchou, Joelle, et al. "Computationally efficient whole-genome regression for quantitative and binary traits." Nature genetics 53.7 (2021): 1097-1103.
KEY WORDS: whole genome regression
SAIGE-GENE / SAIGE-GENE+
FULLNAME: SAIGE-GENE / SAIGE-GENE+
SHORTNAME: SAIGE-GENE / SAIGE-GENE+
URL: https://github.com/weizhouUMICH/SAIGE
CITATION: Zhou, Wei, et al. "SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests." Nature Genetics (2022): 1-4.
CITATION: Zhou, Wei, et al. "Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts." Nature genetics 52.6 (2020): 634-639.
STAAR
FULLNAME: variant-set test for association using annotation information
SHORTNAME: STAAR
URL: https://github.com/xihaoli/STAAR
DESCRIPTION: STAAR is an R package for performing variant-Set Test for Association using Annotation infoRmation (STAAR) procedure in whole-genome sequencing (WGS) studies. STAAR is a general framework that incorporates both qualitative functional categories and quantitative complementary functional annotations using an omnibus multi-dimensional weighting scheme. STAAR accounts for population structure and relatedness, and is scalable for analyzing large WGS studies of continuous and dichotomous traits.
CITATION: Li, Xihao, et al. "Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale." Nature genetics 52.9 (2020): 969-983.
KEY WORDS: functional annotations
SKAT
FULLNAME: sequence kernel association test
SHORTNAME: SKAT
URL: https://www.hsph.harvard.edu/skat/
DESCRIPTION: SKAT is a SNP-set (e.g., a gene or a region) level test for association between a set of rare (or common) variants and dichotomous or quantitative phenotypes. SKAT aggregates the individual score test statistics of the SNPs in a SNP set and efficiently computes SNP-set level p-values (e.g., a gene-level or region-level p-value) while adjusting for covariates, such as principal components to account for population stratification. SKAT also allows for power/sample size calculations when designing sequence association studies.
CITATION:Wu, Michael C., et al. "Rare-variant association testing for sequencing data with the sequence kernel association test." The American Journal of Human Genetics 89.1 (2011): 82-93.
KEY WORDS:
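As a rough illustration of the aggregation step described above, the sketch below computes a SKAT-style statistic Q = sum_j (w_j * S_j)^2, where S_j is the score of variant j against null-model residuals. It is a simplification under stated assumptions: an intercept-only null model for a quantitative trait (no covariates), genotypes coded as minor-allele counts, and Beta(1, 25) allele-frequency weights as proposed in Wu et al. (2011). The function names are hypothetical and are not part of the SKAT package, and the p-value computation is not shown.

```python
def beta_1_25_weight(maf):
    """Density of a Beta(1, 25) distribution at maf, i.e. 25 * (1 - maf)**24."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(genotypes, phenotypes):
    """SKAT-style statistic for one variant set (illustrative only).

    genotypes:  one row per sample, each row a list of 0/1/2 minor-allele counts.
    phenotypes: one quantitative value per sample.
    """
    n_samples = len(phenotypes)
    n_variants = len(genotypes[0])

    # Assumption: intercept-only null model, so residuals are y - mean(y).
    mean_y = sum(phenotypes) / n_samples
    residuals = [y - mean_y for y in phenotypes]

    q = 0.0
    for j in range(n_variants):
        column = [row[j] for row in genotypes]
        maf = sum(column) / (2.0 * n_samples)
        # Per-variant score: genotype counts weighted by the residuals.
        score = sum(g * r for g, r in zip(column, residuals))
        q += (beta_1_25_weight(maf) * score) ** 2
    return q


example_genotypes = [[1, 0], [0, 0], [0, 1], [0, 0]]
example_phenotypes = [2.0, 0.0, 1.0, 1.0]
print(skat_q(example_genotypes, example_phenotypes))
```

Because each score is squared before summing, variants with opposite effect directions do not cancel, which is the property that distinguishes this statistic from a burden test.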
SKAT-O
FULLNAME: sequence kernel association test - optimal test
SHORTNAME: SKAT-O
URL:https://www.hsph.harvard.edu/skat/
DESCRIPTION: SKAT-O estimates the correlation parameter in the kernel matrix so as to maximize power. This parameter corresponds to the weight in the linear combination of the burden test and SKAT test statistics.
CITATION: Lee, Seunggeun, Michael C. Wu, and Xihong Lin. "Optimal tests for rare variant effects in sequencing association studies." Biostatistics 13.4 (2012): 762-775.
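The linear combination behind SKAT-O can be sketched as Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden, evaluated over a grid of rho values. The snippet below is only an illustration: it takes per-variant scores and weights as given, uses a hypothetical function name and an arbitrary rho grid, and stops at the statistics. The actual method selects rho through the p-values of each Q_rho, which is not reproduced here.

```python
def skat_o_statistics(scores, weights, rhos=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return {rho: Q_rho} for a grid of rho values (illustrative only).

    scores:  per-variant score statistics S_j.
    weights: per-variant weights w_j.
    rhos:    grid of correlation parameters (an arbitrary choice in this sketch).
    """
    # SKAT: sum of squared weighted scores, so effect directions do not cancel.
    q_skat = sum((w * s) ** 2 for w, s in zip(weights, scores))
    # Burden: square of the summed weighted scores, so effects are pooled first.
    q_burden = sum(w * s for w, s in zip(weights, scores)) ** 2
    return {rho: (1.0 - rho) * q_skat + rho * q_burden for rho in rhos}


stats = skat_o_statistics(scores=[1.0, -2.0], weights=[1.0, 1.0])
for rho, q in stats.items():
    print(rho, q)
```

With rho = 0 the statistic reduces to SKAT and with rho = 1 to the burden test, so the grid interpolates between the two.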
GWAS using family history
GWAX
FULLNAME: genome-wide association by proxy
SHORTNAME: GWAX
DESCRIPTION: In randomly ascertained cohorts, replacing cases with their first-degree relatives enables studies of diseases that are absent (or nearly absent) in the cohort.
CITATION: Liu, J. Z., Erlich, Y., & Pickrell, J. K. (2017). Case–control association mapping by proxy using family history of disease. Nature genetics, 49(3), 325-331.
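The proxy idea can be sketched as a phenotype-labelling step: unaffected participants who report an affected first-degree relative are treated as proxy-cases, and participants with no affected relative serve as controls. The function and label names below are hypothetical, and how true cases are handled (pooled with proxy-cases or analysed separately) is an analysis choice that this sketch leaves open by giving them their own label.

```python
from collections import Counter


def gwax_label(is_case, relatives_affected):
    """Assign a GWAX-style phenotype label to one participant (illustrative only).

    is_case:            True if the participant has the disease.
    relatives_affected: iterable of booleans, one per first-degree relative
                        with reported disease status.
    """
    if is_case:
        return "case"
    if any(relatives_affected):
        return "proxy-case"
    return "control"


# Hypothetical cohort: (own disease status, [father affected, mother affected]).
cohort = [
    (False, [True, False]),
    (False, [False, False]),
    (True, [False, False]),
    (False, [False, True]),
]
labels = [gwax_label(is_case, relatives) for is_case, relatives in cohort]
print(Counter(labels))
```

The resulting labels can then be passed to any standard case-control association test in place of the directly observed disease status.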
LT-FH
FULLNAME: liability threshold model, conditional on case–control status and family history
SHORTNAME: LT-FH
DESCRIPTION: An association method based on posterior mean genetic liabilities under a liability threshold model, conditional on case-control status and family history (LT-FH).
CITATION: Hujoel, M. L., Gazal, S., Loh, P. R., Patterson, N., & Price, A. L. (2020). Liability threshold modeling of case–control status and family history of disease increases association power. Nature genetics, 52(5), 541-547.
SiblingGWAS
FULLNAME: Within-sibship genome-wide association analyses
SHORTNAME: SiblingGWAS
URL: https://github.com/LaurenceHowe/SiblingGWAS
YEAR: 2022
DESCRIPTION: Scripts for running GWAS using siblings to estimate Within-Family (WF) and Between-Family (BF) effects of genetic variants on continuous traits. Allows the inclusion of more than two siblings from one family.
CITATION: Howe, L. J., Nivard, M. G., Morris, T. T., Hansen, A. F., Rasheed, H., Cho, Y., ... & Davies, N. M. (2022). Within-sibship genome-wide association analyses decrease bias in estimates of direct genetic effects. Nature genetics, 54(5), 581-592.
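The within-family/between-family decomposition can be sketched for a single variant as follows: each genotype is split into the sibship mean (between-family part) and the individual's deviation from that mean (within-family part), and the phenotype is regressed on both. This is a minimal sketch with hypothetical names, not the code in the SiblingGWAS repository. It omits covariates and standard errors, and it assumes at least one sibship has genotype variation and that sibship means differ across families.

```python
from collections import defaultdict


def within_between_effects(family_ids, genotypes, phenotypes):
    """Estimate (beta_within, beta_between) for one variant (illustrative only)."""
    by_family = defaultdict(list)
    for fam, g in zip(family_ids, genotypes):
        by_family[fam].append(g)
    family_mean = {fam: sum(gs) / len(gs) for fam, gs in by_family.items()}

    between = [family_mean[fam] for fam in family_ids]   # sibship mean genotype
    within = [g - b for g, b in zip(genotypes, between)]  # deviation from the mean

    # The deviations sum to zero inside every family, so they are orthogonal to
    # the intercept and to the family means. The joint least-squares fit of
    # y ~ 1 + within + between therefore splits into two simple regressions.
    beta_within = (sum(w * y for w, y in zip(within, phenotypes))
                   / sum(w * w for w in within))

    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    mean_b = sum(between) / n
    beta_between = (sum((b - mean_b) * (y - mean_y) for b, y in zip(between, phenotypes))
                    / sum((b - mean_b) ** 2 for b in between))
    return beta_within, beta_between


families = ["A", "A", "B", "B", "C", "C"]
genos = [0, 2, 1, 2, 0, 0]
phenos = [2.5, 3.5, 3.75, 4.25, 1.0, 1.0]
print(within_between_effects(families, genos, phenos))
```

Sibships of any size are handled, since the family mean is taken over however many siblings are present.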
snipar
FULLNAME: single nucleotide imputation of parents
SHORTNAME: snipar
DESCRIPTION: snipar (single nucleotide imputation of parents) is a Python package for inferring identity-by-descent (IBD) segments shared between siblings, imputing missing parental genotypes, and for performing family based genome-wide association and polygenic score analyses using observed and/or imputed parental genotypes.
YEAR: 2022
URL: https://github.com/AlexTISYoung/snipar
CITATION: Young, A. I., Nehzati, S. M., Benonisdottir, S., Okbay, A., Jayashankar, H., Lee, C., ... & Kong, A. (2022). Mendelian imputation of parental genotypes improves estimates of direct genetic effects. Nature genetics, 54(6), 897-905.
CITATION: Guan, J., Nehzati, S. M., Benjamin, D. J., & Young, A. I. (2022). Novel estimators for family-based genome-wide association studies increase power and robustness. bioRxiv, 2022-10.
Case-case GWAS
CC-GWAS
FULLNAME: case–case genome-wide association study
SHORTNAME: CC-GWAS
URL:https://github.com/wouterpeyrot/CCGWAS
DESCRIPTION: The CCGWAS R package provides a tool for case-case association testing of two different disorders based on their respective case-control GWAS results
CITATION: Peyrot, W. J., & Price, A. L. (2021). Identifying loci with different allele frequencies among cases of eight psychiatric disorders using CC-GWAS. Nature genetics, 53(4), 445-454.
GWAS of longitudinal trajectories
TrajGWAS
FULLNAME: GWAS of longitudinal trajectories
SHORTNAME: TrajGWAS
DESCRIPTION: TrajGWAS.jl is a Julia package for performing genome-wide association studies (GWAS) for continuous longitudinal phenotypes using a modified linear mixed effects model. It builds upon the within-subject variance estimation by robust regression (WiSER) method and can be used to identify variants associated with changes in the mean and within-subject variability of the longitudinal trait.
YEAR: 2022
URL: https://github.com/OpenMendel/TrajGWAS.jl
CITATION: Ko, S., German, C. A., Jensen, A., Shen, J., Wang, A., Mehrotra, D. V., ... & Zhou, J. J. (2022). GWAS of longitudinal trajectories at biobank scale. The American Journal of Human Genetics, 109(3), 433-445.
KEY WORDS: biomarker trajectories, mean, within-subject (WS) variability, linear mixed effect model, within-subject variance estimation by robust regression (WiSER) method
PGS-adjusted GWAS
PGS-adjusted RVATs
FULLNAME: PGS-adjusted rare variant association tests
SHORTNAME: PGS-adjusted RVATs
DESCRIPTION: Adjusting for common variant polygenic scores improves yield in gene-based rare variant association tests.
YEAR: 2023
CITATION: Jurgens, S. J., Pirruccello, J. P., Choi, S. H., Morrill, V. N., Chaffin, M., Lubitz, S. A., ... & Ellinor, P. T. (2023). Adjusting for common variant polygenic scores improves yield in rare variant association analyses. Nature Genetics, 55(4), 544-548.
KEY WORDS: PGS, rare variants
PGS-adjusted GWAS
FULLNAME: PGS-adjusted GWAS
SHORTNAME: PGS-adjusted GWAS
DESCRIPTION: Adjusting GWAS analyses for polygenic scores (PGSs) increases the statistical power for discovery across all ancestries.
YEAR: 2023
CITATION: Campos, A. I., Namba, S., Lin, S. C., Nam, K., Sidorenko, J., Wang, H., ... & Yengo, L. (2023). Boosting the power of genome-wide association studies within and across ancestries by using polygenic scores. Nature Genetics, 1-8.
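The adjustment can be sketched as a regression of the phenotype on the tested variant with the PGS included as a covariate. The snippet below is a minimal illustration, not the pipeline used in the cited paper: it first removes the PGS (and intercept) from both phenotype and genotype and then regresses residual on residual, which gives the same variant coefficient as the joint least-squares fit. Function names are hypothetical, and the sketch assumes the PGS was built without the tested variant and is not perfectly collinear with it.

```python
def residualize(values, covariate):
    """Residuals of values after a simple linear regression on covariate."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    slope = sxy / sxx
    return [(v - mean_v) - slope * (c - mean_c) for v, c in zip(values, covariate)]


def pgs_adjusted_effect(genotypes, phenotypes, pgs):
    """Variant effect from y ~ 1 + genotype + PGS (illustrative only)."""
    g_res = residualize(genotypes, pgs)
    y_res = residualize(phenotypes, pgs)
    return sum(g * y for g, y in zip(g_res, y_res)) / sum(g * g for g in g_res)


# Hypothetical data: phenotype = 2 + 0.3 * genotype + 1.5 * PGS, without noise.
genos = [0, 1, 2, 1, 0, 2]
pgs = [0.1, -0.4, 0.3, 1.2, -0.7, 0.5]
phenos = [2.0 + 0.3 * g + 1.5 * p for g, p in zip(genos, pgs)]
print(pgs_adjusted_effect(genos, phenos, pgs))
```

In real data the PGS absorbs part of the polygenic background variance, so the residual variance around the tested variant shrinks, which is the source of the power gain described above.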