Genetic Analysis Center

The Genetic Analysis Center (GAC) develops and applies statistical methods to genetic data with the aim of discovering how genetic variation contributes to human disease and well-being. We also provide scientific and administrative coordination to ensure the success of large-scale genomics research consortia and other programs.

Spotlight

We are the Data Coordinating Center for the NHGRI Genomics Research to Elucidate the Genetics of Rare diseases (GREGoR) Consortium and the Coordinating Center for both the NIH Polygenic Risk Methods Development (PRIMED) Consortium and the Alzheimer’s Disease Sequencing Project (ADSP).

Overview

About us

Established by Professor Emeritus Dr. Bruce Weir in 2007, the GAC contributes to major genomic research initiatives, offering data analysis support, software and methods development, statistical consulting, study design, data coordination, project management, and ongoing data quality assurance throughout the duration of a project. We collaborate with University of Washington (UW) faculty and students who have advanced expertise and a dedicated interest in biostatistics, statistical genetics, and public health genetics. Additional collaborators come from other UW departments and schools, academic institutions, government, nonprofits, and the private sector.

People

Members

Michael Bowers, Project Administrator

Brian Browning, Faculty

Sharon Browning, Faculty

Sarah Catherine Nelson, Research Scientist, Project Manager

Sarah Conner, Project Administrator

Matt Conomos, Research Scientist, Project Manager

Stephanie Gogarten, Research Scientist

Ben Heavner, Research Scientist, Project Manager

Kathleen Kerr, Faculty

Cathy Laurie, Senior Consultant

David Levine, Senior Consultant

Susanne May, Faculty, PI

Sheryl Payne, Project Manager, Associate Director

Guanghao Qi, Faculty

Ken Rice, Faculty, PI

Ali Shojaie, Faculty, PI

Adrienne Stilp, Research Scientist

Timothy Thornton, Faculty Consultant

Catherine Tong, Program Analyst

Bruce Weir, Emeritus Faculty

Marsha Wheeler, Research Scientist

Ellen Wijsman, Emeritus Faculty

Quenna Wong, Research Scientist

Training

Workshops

The GAC provides hands-on training to focused groups and the broad scientific community on topics related to quantitative genetics. Some of the workshops we have taught are:

Modules at the Georgia Tech Bruce Weir Summer Institute in Statistical Genetics (SISG)
- WGS Data Analysis (2025; 2024)
- Bayesian Statistics (2025; 2024)
- Pathways and Network Analysis (2025; 2024)
Modules at the UW Summer Institute in Statistical Genetics (SISG)
- Computational Pipeline for WGS Data (2021; 2020; 2019; 2018)
- Bayesian Statistics
- Introduction to R

Projects

Current

Alzheimer's Disease Sequencing Project (ADSP) Coordinating Center
Genomics Research to Elucidate the Genetics of Rare diseases (GREGoR) Consortium Data Coordinating Center
Polygenic Risk Methods Development (PRIMED) Consortium Coordinating Center

Past

Development of a scalable and user-friendly engine to support genotype-phenotype association testing on BioData Catalyst powered by Seven Bridges
Trans-Omics for Precision Medicine (TOPMed) Data Coordinating Center
Hispanic Community Health Study/Study Of Latinos (HCHS/SOL) Genetic Analysis Center
Center for Inherited Disease Research (CIDR)
Genomics and Randomized Trial Networks (GARNET) Coordinating Center
Gene Environment Association Studies (GENEVA) Coordinating Center
Population Genetic Issues for Forensic DNA Profiles (NIJ)
Theoretical Population Genetics
CODIS Support (FBI Project)
Statistical Evaluation of Forensic Sequencing Profiles (NIJ)
Predoctoral Training in Statistical Genetics (NIH Training Grant)

Expertise

Data coordination, management, and sharing
Data cleaning (Quality Assurance/Quality Control) and harmonization
Data analysis support and training
Statistical methods development
Scientific software development
Cloud computing
Research study design and planning
Project management
Program operations, logistics, and administration
Consortium governance and policy
Ethical, Legal, and Social Implications (ELSI)
Statistical genetics
Quantitative genetics
Population genetics

Software

Overview

We develop open source software for analyzing genetic data.

UW GAC GitHub Repository

Central collection of publicly available source code across various GAC projects.

Docker images

Docker images containing GAC software.

R Packages

gdsfmt

The package gdsfmt provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms and include hierarchical structure to store multiple scalable array-oriented data sets with metadata information.

Zheng, X., Levine, D., Shen, J. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012). PMID: 23060615

GENESIS

An R package for single- and aggregate-variant genetic association testing using computationally efficient mixed models in samples with complex population and pedigree structure. Also provides tools for de-convoluting population and pedigree structure in genetic data.

Gogarten, S. M., Sofer, T., Chen, H. et al. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics 35, 5346–5348 (2019). PMID: 31329242

GWASTools

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.

Gogarten, S. M., Bhangale, T., Conomos, M. P. et al. GWASTools: an R/Bioconductor package for quality control and analysis of genome-wide association studies. Bioinformatics 28, 3329–3331 (2012). PMID: 23052040

SeqArray

Big data management of whole-genome sequence variant calls with thousands of individuals: genotypic data (e.g., SNVs, indels and structural variation calls) and annotations in GDS files are stored in an array-oriented and compressed manner, with efficient data access using the R programming language.

Zheng, X., Gogarten, S. M., Lawrence, M. et al. SeqArray-a storage-efficient high-performance data format for WGS variant calls. Bioinformatics 33, 2251–2257 (2017). PMID: 28334390

SeqVarTools

An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.

Gogarten SM, Zheng X, Stilp A (2021). SeqVarTools: Tools for variant data. R package version 1.30.0, https://github.com/smgogarten/SeqVarTools.

SNPRelate

A parallel computing toolset for relatedness and principal component analysis of SNP data.

Zheng, X., Levine, D., Shen, J. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012). PMID: 23060615

TOPMed WGS analysis pipeline

Analysis pipeline for the TOPMed whole-genome sequencing project.

WGSAParsr

An R package, developed and used by the TOPMed DCC, for parsing genetic variant annotation files produced by the WGSA annotation tool.

Workflows on Dockstore

Data Validation (PRIMED; GREGoR)

Workflows for validating that data and metadata uploaded to AnVIL workspaces conform to a provided data model.

Data Import

Workflows for importing data (e.g. from dbGaP) into AnVIL workspaces.

Genetic Ancestry Inference

Workflows for running genetic ancestry inference analyses (e.g. PCA, ADMIXTURE) and projecting samples onto genetic ancestry models.

Genotype Data Simulation

Workflows for simulating realistic genotype data from admixed populations.

Hou K, Gogarten S, Kim J, et al. Admix-kit: an integrated toolkit and pipeline for genetic analyses of admixed populations. Bioinformatics. 2024 Mar 29;40(4):btae148. doi: 10.1093/bioinformatics/btae148. PMID: 38490256; PMCID: PMC10980565.

Genotype File Conversion

Workflows for converting genotype data files among common data formats (e.g. VCF, PLINK .bed and .pgen), merging files, and extracting information.

Genotype Imputation

Workflows for submitting genotype imputation jobs to, and retrieving results from, the TOPMed imputation server.

GENESIS GWAS

Workflow for running a GWAS using the GENESIS R package.

PRS

Workflows for calculating polygenic risk scores (PRS) and developing PRS models.
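As a rough illustration of what calculating a polygenic risk score involves, the sketch below computes a score as a weighted sum of effect-allele dosages. It is not the GAC workflow: the function name `polygenic_score`, the variant IDs, and the weights are all hypothetical, and real pipelines also handle allele matching, missing data, and normalization.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Return the weighted sum of effect-allele dosages.

    Only variants present in both mappings contribute; variants
    missing from either side are silently skipped (an assumption
    made here for simplicity).
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical per-variant effect weights (e.g. from a PRS model).
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 1.0}

# Hypothetical effect-allele dosages (0 to 2) for one sample.
sample = {"rs1": 2.0, "rs2": 1.0, "rs4": 0.0}

# rs1 and rs2 are shared: 0.5 * 2.0 + (-0.25) * 1.0
score = polygenic_score(sample, weights)
print(score)
```

Keying both inputs by variant ID keeps the example short; a production workflow would typically match on chromosome, position, and alleles instead.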

Web applications

AnVIL Consortium Manager (ACM)

A highly customizable, reusable Django app to manage workspace access on the NHGRI AnVIL cloud platform. We have used the web app in two consortia (primed-django and gregor-django), each with their own custom extensions of ACM.

GENESIS Model Explorer

An R-Shiny web app that allows users to interactively explore a GENESIS null model. This app is currently deployed on the NHLBI BioData Catalyst cloud platform.

Tools on BioData Catalyst powered by Seven Bridges

Ancestry and Relatedness workflows

Workflows for genetic ancestry and relatedness inference, implementing methods including LD-pruning, PC-AiR, PC-Relate, KING-robust, and KING-ibdseg.

Annotation Explorer

Interactive application to explore, query, and study characteristics of an inventory of annotations for all possible SNVs, indels in dbSNP, and variants called in TOPMed studies. This application can be used pre-GWAS to generate annotation-informed variant filters and groups for rare variant association testing, and post-GWAS for fine-mapping and variant prioritization.

Data Management tools

Tools to manipulate and format data files, such as a tool for merging multiple VCF/BCF files and filtering monomorphic variants, and a tool for converting variant calls from VCF into GDS format.
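To make "filtering monomorphic variants" concrete, here is a minimal sketch that drops VCF records in which every called allele is the same. It is an illustration only, not the tool deployed on BioData Catalyst: the helper names are hypothetical, it handles uncompressed VCF text lines only, and it assumes GT is the first FORMAT subfield.

```python
from typing import Iterable, Iterator, List


def is_monomorphic(genotypes: List[str]) -> bool:
    """True if all non-missing alleles across samples are identical."""
    alleles = set()
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators alike.
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                alleles.add(allele)
    # Zero observed alleles (all calls missing) also counts as monomorphic.
    return len(alleles) <= 1


def filter_monomorphic(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and records that are not monomorphic."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Sample columns start at index 9; GT is assumed to come first.
        gts = [sample.split(":")[0] for sample in fields[9:]]
        if not is_monomorphic(gts):
            yield line


# Tiny made-up example with two samples.
lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/0",  # all reference
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t1|1",  # polymorphic
    "1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t1/1\t./.",  # all alternate
]
kept = list(filter_monomorphic(lines))
for record in kept:
    print(record)
```

Only the header and the record at position 200 survive, since the other two records carry a single allele among their called genotypes.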

GENESIS Association Testing workflows

Workflows for genetic association testing using the GENESIS R package. Available workflows include: fitting a null model, single variant association testing, aggregate variant association testing (including burden, SKAT, fastSKAT, and SMMAT methods), sliding window association testing, and tools for making Manhattan, QQ, and LocusZoom plots.

Quality Control tools

Workflows for variant and sample QC using WGS data. Available workflows include pedigree check, heterozygosity by sample, and XY chromosome depth.
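As a sketch of the idea behind a heterozygosity-by-sample check, the example below computes, for each sample, the fraction of non-missing genotype calls that are heterozygous. It is not the GAC workflow: the function name, the input layout (rows are variants, columns are samples, values are alternate-allele counts), and the data are assumptions made for illustration.

```python
from typing import List, Optional


def heterozygosity_by_sample(
    genotypes: List[List[Optional[int]]],
) -> List[float]:
    """Fraction of heterozygous calls per sample.

    genotypes[v][s] is the alternate-allele count (0, 1, or 2) for
    variant v in sample s, or None when the call is missing.
    Samples with no called genotypes get NaN.
    """
    n_samples = len(genotypes[0])
    het = [0] * n_samples
    called = [0] * n_samples
    for row in genotypes:
        for s, g in enumerate(row):
            if g is None:
                continue  # skip missing calls
            called[s] += 1
            if g == 1:
                het[s] += 1
    return [h / c if c else float("nan") for h, c in zip(het, called)]


# Made-up genotypes: 4 variants by 3 samples.
genos = [
    [0, 1, None],
    [1, 1, 2],
    [2, 1, 0],
    [1, 0, None],
]
rates = heterozygosity_by_sample(genos)
print(rates)
```

Samples whose rate is far from the rest of the cohort are candidates for closer inspection, for example for contamination or inbreeding.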

Publications

Selected Publications

PRIMED

Kullo, I. J., Conomos, M.P., Nelson, S.C. et al. The PRIMED Consortium: Reducing disparities in polygenic risk assessment. Am J Hum Genet 111, 2594–2606 (2024). PMID: 39561770

Smith, J. L., Wong, Q., Hornsby, W. et al. Data Sharing in the PRIMED Consortium: Design, implementation, and recommendations for future policymaking. arXiv [preprint] (2025) doi: 10.48550/arXiv.2502.09351.

GREGoR

Wojcik, M. H., Reuter, C.M., Marwaha, S. et al. Beyond the exome: What’s next in diagnostic testing for Mendelian conditions. Am J Hum Genet 110, 1229–1248 (2023). PMID: 37541186

Dawood, M., Heavner, B., Wheeler, M.W. et al. GREGoR: Accelerating Genomics for Rare Diseases. arXiv [preprint] (2024) doi: 10.48550/arXiv.2412.14338.

TOPMed

de Vries PS, Conomos MP, Singh K, et al. Whole-genome sequencing uncovers two loci for coronary artery calcification and identifies ARSE as a regulator of vascular calcification. Nat Cardiovasc Res. 2023 Dec;2(12):1159-1172. PubMed Central PMCID: PMC11138106.

Taub MA, Conomos MP, Keener R, et al. Genetic determinants of telomere length from 109,122 ancestrally diverse whole-genome sequences in TOPMed. Cell Genom. 2022 Jan 12;2(1) PubMed Central PMCID: PMC9075703.

Khan, A. T., Gogarten, S.M., McHugh, C.P. et al. Recommendations on the use and reporting of race, ethnicity, and ancestry in genetic research: Experiences from the NHLBI TOPMed program. Cell Genom 2, 100155 (2022). PMID: 36119389

Nelson, S. C., Gogarten, S.M., Fullerton, S.M. et al. Social and scientific motivations to move beyond groups in allele frequencies: The TOPMed experience. Am J Hum Genet 109, 1582–1590 (2022). PMID: 36055210

Stilp, A. M., Emery, L. S., Broome, J. G. et al. A System for Phenotype Harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) Program. Am J Epidemiol 190, 1977–1992 (2021). PMID: 33861317. See associated harmonization documentation at https://github.com/UW-GAC/topmed-dcc-harmonized-phenotypes.

Hu, Y., Stilp, A. M., McHugh, C. P. et al. Whole-genome sequencing association analysis of quantitative red blood cell phenotypes: The NHLBI TOPMed program. Am J Hum Genet 108, 874–893 (2021). PMID: 33887194

Taliun, D., Harris, D. N., Kessler, M. D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021). PMID: 33568819

Statistical genetics and methods

Sofer, T., Zheng, X., Laurie, C. A. et al. Variant-specific inflation factors for assessing population stratification at the phenotypic variance level. Nat Commun 12, 3506 (2021). PMID: 34108454

Sofer, T., Zheng, X., Gogarten, S. M. et al. A fully adjusted two-stage procedure for rank-normalization in genetic association studies. Genet Epidemiol 43, 263–275 (2019). PMID: 30653739

Chen, H., Wang, C., Conomos, M. P. et al. Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models. Am. J. Hum. Genet. 98, 653–666 (2016). PMID: 27018471

Conomos, M. P., Reiner, A. P., Weir, B. S. et al. Model-free Estimation of Recent Genetic Relatedness. Am. J. Hum. Genet. 98, 127–148 (2016). PMID: 26748516

Browning, B. L. & Browning, S. R. Genotype Imputation with Millions of Reference Samples. Am J Hum Genet 98, 116–126 (2016). PMID: 26748515

Buckleton, J., Curran, J., Goudet, J. et al. Population-specific FST values for forensic STR markers: A worldwide survey. Forensic Sci Int Genet 23, 91–100 (2016). PMID: 27082756

Graffelman, J. & Weir, B. S. Testing for Hardy-Weinberg equilibrium at biallelic genetic markers on the X chromosome. Heredity (Edinb) 116, 558–568 (2016). PMID: 27071844

Conomos, M. P., Miller, M. B. & Thornton, T. A. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol. 39, 276–293 (2015). PMID: 25810074

Browning, S. R. & Browning, B. L. Accurate Non-parametric Estimation of Recent Effective Population Size from Segments of Identity by Descent. Am J Hum Genet 97, 404–418 (2015). PMID: 26299365

Zheng, X. & Weir, B. S. Eigenanalysis of SNP data with an identity by descent interpretation. Theor Popul Biol 107, 65–76 (2016). PMID: 26482676

Zhu, Z., Bakshi, A., Vinkhuyzen, A. A. E. et al. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am J Hum Genet 96, 377–385 (2015). PMID: 25683123

Thornton T, Conomos MP, Sverdlov S, et al. Estimating and adjusting for ancestry admixture in statistical methods for relatedness inference, heritability estimation, and association testing. BMC Proc. 2014;8(Suppl 1):S5. PubMed Central PMCID: PMC4143704.

Nelson, S. C., Doheny, K. F., Pugh, E. W. et al. Imputation-based genomic coverage assessments of current human genotyping arrays. G3 (Bethesda) 3, 1795–1807 (2013). DOI: 10.1101/150219

HCHS/SOL

Conomos, M. P., Laurie, C. A., Stilp, A. M. et al. Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos. Am. J. Hum. Genet. 98, 165–184 (2016). PMID: 26748518

Browning, S. R., Grinde, K., Plantinga, A. et al. Local Ancestry Inference in a Large US-Based Hispanic/Latino Study: Hispanic Community Health Study/Study of Latinos (HCHS/SOL). G3 (Bethesda) 6, 1525–1534 (2016). PMID: 27172203

Nelson, S. C., Stilp, A. M., Papanicolaou, G. J. et al. Improved imputation accuracy in Hispanic/Latino populations with larger and more diverse reference panels: applications in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Hum. Mol. Genet. 25, 3245–3254 (2016). PMID: 27346520

GENEVA

Laurie, C. C., Laurie, C. A., Smoley, S. A. et al. Acquired chromosomal anomalies in chronic lymphocytic leukemia patients compared with more than 50,000 quasi-normal participants. Cancer Genet 207, 19–30 (2014). PMID: 24613276

Laurie, C. C., Laurie, C. A., Rice, K. et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat Genet 44, 642–650 (2012). PMID: 22561516

Laurie, C. C., Doheny, K. F., Mirel, D. B. et al. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet. Epidemiol. 34, 591–602 (2010). PMID: 20718045