Genetic Analysis Center

The Genetic Analysis Center (GAC) develops and applies statistical methods to genetic data with the aim of discovering how genetic variation contributes to human disease and well-being. We also provide scientific and administrative coordination to ensure the success of large-scale genomics research consortia and other programs.

Spotlight

We are the Data Coordinating Center for the NHGRI GREGoR (Genomics Research to Elucidate the Genetics of Rare diseases) Consortium and the Coordinating Center for the NIH Polygenic Risk Methods in Diverse Populations Consortium.

Overview

About us

The GAC contributes to major genomic research initiatives, offering data analysis support, software and methods development, statistical consulting, study design, data coordination, and ongoing data quality assurance throughout a project. We collaborate with University of Washington (UW) faculty and students who have advanced expertise and a strong interest in biostatistics, statistical genetics, and public health genetics, as well as with researchers at other academic institutions, government agencies, nonprofits, and the private sector.

Expertise

What we do

  • Data coordination
  • Data cleaning (Quality Assurance/Quality Control) and harmonization
  • Data analysis support and training
  • Statistical software and methods development
  • Consulting
  • Research study design and planning
  • Population and quantitative genetics methods and analysis
  • Forensic genetics methods and analysis

Projects

Current

Past

Publications

Selected Publications

TOPMed

Stilp AM, Emery LS, Broome JG, et al. A System for Phenotype Harmonization in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program. Am J Epidemiol. 2021 Apr 16. PMID: 33861317.

Hu Y, Stilp AM, McHugh CP, et al. Whole-genome sequencing association analysis of quantitative red blood cell phenotypes: The NHLBI TOPMed program. Am J Hum Genet. 2021 Apr 16. PMID: 33887194.

Taliun D, Harris DN, Kessler MD, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature. 2021 Feb;590(7845):290-299. PMCID: PMC7875770.

HCHS/SOL

Conomos MP, Laurie CA, Stilp AM, et al. Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet. 2016 Jan 7;98(1):165-84. doi: 10.1016/j.ajhg.2015.12.001. PMID: 26748518; PMCID: PMC4716704.

Browning SR, Grinde K, Plantinga A, Gogarten SM, Stilp AM, Kaplan RC, Avilés-Santa ML, Browning BL, Laurie CC. Local Ancestry Inference in a Large US-Based Hispanic/Latino Study: Hispanic Community Health Study/Study of Latinos (HCHS/SOL). G3 (Bethesda). 2016 Jun 1;6(6):1525-34. doi: 10.1534/g3.116.028779. PMID: 27172203; PMCID: PMC4889649.

Nelson SC, Stilp AM, Papanicolaou GJ, et al. Improved Imputation Accuracy in Hispanic/Latino Populations with Larger and More Diverse Reference Panels: Applications. Hum Mol Genet. 2016. PMCID: PMC5179925.

GENEVA

Laurie CC, Doheny KF, Mirel DB, et al; GENEVA Investigators. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet Epidemiol. 2010 Sep;34(6):591-602. doi: 10.1002/gepi.20516. PMID: 20718045; PMCID: PMC3061487.

Laurie CC, Laurie CA, Rice K, et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat Genet. 2012 May 6;44(6):642-50. doi: 10.1038/ng.2271. PMID: 22561516; PMCID: PMC3366033.

Laurie CC, Laurie CA, Smoley SA, et al. Acquired chromosomal anomalies in chronic lymphocytic leukemia patients compared with more than 50,000 quasi-normal participants. Cancer Genet. 2014 Jan-Feb;207(1-2):19-30. doi: 10.1016/j.cancergen.2014.01.004. PMID: 24613276; PMCID: PMC4074414.

Statistical genetics and methods

Browning BL, Browning SR. Genotype imputation with millions of reference samples. Am J Hum Genet. 2016;98:116-126. PMCID: PMC4716681.

Browning SR, Browning BL. Accurate non-parametric estimation of recent effective population size from segments of identity by descent. Am J Hum Genet. 2015;97:404-418. PMCID: PMC4564943.

Buckleton JS, Curran JM, Goudet J, et al. Population-specific Fst values: A worldwide survey. Forensic Sci Int Genet. 2016;23:91-100.

Conomos MP, Reiner AP, Weir BS, et al. Model-free estimation of recent genetic relatedness. Am J Hum Genet. 2016;98:127-148. PMCID: PMC4716688.

Graffelman J, Weir BS. Testing for Hardy-Weinberg equilibrium at bi-allelic genetic markers on the X chromosome. Heredity. 2016;116:558-568.

Zheng X, Weir BS. Eigenanalysis of SNP data with an interpretation of identity by descent. Theor Popul Biol. 2015;107:65-76. PMCID: PMC4716003.

Zhu ZH, Bakshi A, Vinkhuyzen AE, et al. Dominance genetic variation contributes little to the missing heritability for human complex traits. Am J Hum Genet. 2015;96:377-385. PMCID: PMC4375616.

Software

Overview

We develop open-source software for analyzing genetic data.

UW GAC GitHub Repository

Central collection of publicly available source code across various GAC projects.

Docker images

Docker images containing GAC software.

R Packages

gdsfmt

The package gdsfmt provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms and use a hierarchical structure to store multiple scalable, array-oriented data sets together with their metadata.

Zheng X, Levine D, Shen J, et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012 Dec 15;28(24):3326-8. doi: 10.1093/bioinformatics/bts606. Epub 2012 Oct 11. PMID: 23060615; PMCID: PMC3519454.

GENESIS

An R package for single- and aggregate-variant genetic association testing using computationally efficient mixed models in samples with complex population and pedigree structure. It also provides tools for disentangling population structure from pedigree structure in genetic data.

Gogarten SM, Sofer T, Chen H, et al. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019 Dec 15;35(24):5346-5348. doi: 10.1093/bioinformatics/btz567. PMID: 31329242; PMCID: PMC7904076.

GWASTools

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.

Gogarten SM, Bhangale T, Conomos MP, et al. GWASTools: an R/Bioconductor package for quality control and analysis of genome-wide association studies. Bioinformatics. 2012 Dec 1;28(24):3329-31.

SeqArray

Management of large whole-genome sequencing variant call data sets covering thousands of individuals. Genotypes (e.g., SNVs, indels, and structural variant calls) and annotations are stored in GDS files in a compressed, array-oriented format, with efficient data access from the R programming language.

Zheng X, Gogarten SM, Lawrence M, et al. SeqArray-a storage-efficient high-performance data format for WGS variant calls. Bioinformatics. 2017 Aug 1;33(15):2251-2257. doi: 10.1093/bioinformatics/btx145. PMID: 28334390; PMCID: PMC5860110.

SeqVarTools

An interface to the fast-access storage format for VCF data provided in SeqArray, with tools for common operations and analysis.

Gogarten SM, Zheng X, Stilp A. SeqVarTools: Tools for variant data. R package version 1.30.0. 2021. https://github.com/smgogarten/SeqVarTools.

SNPRelate

A parallel computing toolset for relatedness and principal component analysis of SNP data.

Zheng X, Levine D, Shen J, et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012 Dec 15;28(24):3326-8. doi: 10.1093/bioinformatics/bts606. Epub 2012 Oct 11. PMID: 23060615; PMCID: PMC3519454.

TOPMed WGS analysis pipeline

Analysis pipeline for the TOPMed whole-genome sequencing project.

WGSAParsr

An R package used by the TOPMed Data Coordinating Center (DCC) to parse genetic variant annotation files produced by the WGSA annotation tool.

Tools on BioData Catalyst powered by Seven Bridges

Ancestry and Relatedness workflows

Workflows for genetic ancestry and relatedness inference, implementing methods including LD-pruning, PC-AiR, PC-Relate, KING-robust, and KING-ibdseg.
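To illustrate one of the methods listed above, the Python sketch below computes the symmetric (within-family) form of the KING-robust kinship estimator for one pair of samples: the count of shared heterozygous sites minus twice the count of opposite-homozygous sites, divided by the total number of heterozygous sites in the two samples. It assumes biallelic genotypes coded as alternate-allele dosages 0/1/2 with NaN for missing calls, and it counts only sites called in both samples. The function name is hypothetical; this is not the code these workflows run.

```python
import numpy as np


def king_robust_kinship(g_i, g_j):
    """Symmetric KING-robust kinship estimate for one pair of samples (sketch).

    Assumption: genotypes are alt-allele dosages 0/1/2, np.nan = missing.
    """
    g_i = np.asarray(g_i, dtype=float)
    g_j = np.asarray(g_j, dtype=float)
    # Keep only variants where both samples have a called genotype.
    ok = ~np.isnan(g_i) & ~np.isnan(g_j)
    a, b = g_i[ok], g_j[ok]
    n_het_het = np.sum((a == 1) & (b == 1))  # N_{Aa,Aa}: both heterozygous
    n_opp_hom = np.sum(np.abs(a - b) == 2)   # N_{AA,aa}: opposite homozygotes
    n_het_i = np.sum(a == 1)                 # heterozygous sites in sample i
    n_het_j = np.sum(b == 1)                 # heterozygous sites in sample j
    denom = n_het_i + n_het_j
    if denom == 0:
        # The estimator is undefined when neither sample has a heterozygous site.
        return float("nan")
    return float((n_het_het - 2 * n_opp_hom) / denom)


# Identical genotype vectors give 0.5, the kinship of a sample with itself.
print(king_robust_kinship([0, 1, 2, 1, 0, 1], [0, 1, 2, 1, 0, 1]))
```

Under this parameterization, first-degree relatives are expected near 0.25 and unrelated pairs near 0; with only a handful of sites, as in toy inputs, estimates can fall outside that range.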

Annotation Explorer

Interactive application for exploring, querying, and studying the characteristics of an inventory of annotations for all possible SNVs, indels in dbSNP, and variants called in TOPMed studies. Before a GWAS, it can be used to generate annotation-informed variant filters and variant groups for rare-variant association testing; after a GWAS, it can be used for fine-mapping and variant prioritization.

Data Management tools

Tools for manipulating and formatting data files, such as a tool for merging multiple VCF/BCF files and filtering monomorphic variants, and a tool for converting variant calls from VCF to GDS format.
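The monomorphic-variant filter mentioned above can be sketched in a few lines of Python. This is a hypothetical illustration, not the tool itself: it assumes a biallelic variants-by-samples matrix of alternate-allele dosages (0/1/2, NaN for missing) and keeps a variant only if its called alleles include both reference and alternate alleles.

```python
import numpy as np


def polymorphic_mask(genotypes):
    """Boolean mask of polymorphic variants (rows) in a dosage matrix (sketch).

    Assumption: rows are biallelic variants, columns are samples,
    values are alt-allele dosages 0/1/2, and np.nan marks a missing call.
    """
    g = np.asarray(genotypes, dtype=float)
    # Alternate-allele count per variant, ignoring missing calls.
    alt = np.nansum(g, axis=1)
    # Number of called alleles per variant (two per called genotype).
    n_alleles = 2 * np.sum(~np.isnan(g), axis=1)
    # Monomorphic means all called alleles are reference (alt == 0) or all
    # are alternate (alt == n_alleles); rows with no calls are dropped too.
    return (alt > 0) & (alt < n_alleles)


G = np.array([
    [0, 0, 0],             # all reference: monomorphic
    [0, 1, 2],             # polymorphic
    [2, 2, np.nan],        # all called alleles alternate: monomorphic
    [1, 1, 1],             # all heterozygous: polymorphic
])
print(G[polymorphic_mask(G)])
```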

GENESIS Association Testing workflows

Workflows for genetic association testing using the GENESIS R package. Available workflows include: fitting a null model, single variant association testing, aggregate variant association testing (including burden, SKAT, fastSKAT, and SMMAT methods), sliding window association testing, and tools for making Manhattan, QQ, and LocusZoom plots.
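As a rough illustration of the burden idea, the Python sketch below collapses the variants in one aggregation unit into a weighted allele count per sample and computes a score test for a quantitative trait. It is deliberately simplified: unlike the GENESIS mixed-model tests, it uses an intercept-only linear null model with no covariates and no kinship matrix, and it assumes no missing genotypes. The function name and the default of equal weights are assumptions for illustration.

```python
import numpy as np


def burden_score_test(y, genotypes, weights=None):
    """Unadjusted burden score test for a quantitative trait (sketch).

    y: length-n phenotype vector.
    genotypes: n x m alt-allele dosages for the m variants in one unit.
    weights: optional length-m variant weights (default: all ones).
    Returns (score, z); z is approximately N(0, 1) under the null.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(genotypes, dtype=float)
    w = np.ones(G.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    burden = G @ w                            # weighted allele count per sample
    resid = y - y.mean()                      # residuals under the null model
    sigma2 = resid @ resid / (len(y) - 1)     # null residual variance estimate
    score = resid @ burden                    # score statistic U
    b_c = burden - burden.mean()
    var = sigma2 * (b_c @ b_c)                # Var(U) under the null
    if var == 0:
        # No variation in burden (e.g., no carriers): the test is undefined.
        return float(score), float("nan")
    return float(score), float(score / np.sqrt(var))


y = [1.0, 2.0, 3.0, 4.0]
G = [[0, 0], [0, 0], [1, 0], [0, 1]]          # two rare variants, one carrier each
print(burden_score_test(y, G))
```

Collapsing first and testing one combined score is what distinguishes a burden test from variance-component tests such as SKAT, which aggregate per-variant scores without assuming they share a direction of effect.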

Quality Control tools

Workflows for variant and sample QC using WGS data. Available workflows include pedigree checks, heterozygosity by sample, and X and Y chromosome depth.
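The heterozygosity-by-sample metric can be sketched as the fraction of each sample's called genotypes that are heterozygous; samples far from the bulk of the distribution are candidates for review (unusually high values can suggest contamination). The Python below is a hypothetical illustration that assumes a variants-by-samples matrix of alternate-allele dosages with NaN for missing calls, not the workflow's implementation.

```python
import numpy as np


def heterozygosity_by_sample(genotypes):
    """Fraction of called genotypes that are heterozygous, per sample (sketch).

    Assumption: rows are variants, columns are samples, values are
    alt-allele dosages 0/1/2, and np.nan marks a missing call.
    """
    g = np.asarray(genotypes, dtype=float)
    called = np.sum(~np.isnan(g), axis=0)
    # NaN == 1 evaluates to False, so missing calls never count as heterozygous.
    het = np.sum(g == 1, axis=0)
    # Samples with no called genotypes get NaN instead of a divide warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return het / called


G = np.array([
    [0, 1],
    [1, 1],
    [2, np.nan],
    [1, 0],
])
print(heterozygosity_by_sample(G))  # sample 0: 2 of 4 calls; sample 1: 2 of 3
```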