2018 SISG Modules

Scholarship applications open January 5. Registration opens February 1.

Session 1 – Monday, July 9, 8:30 a.m.-5 p.m.; Tuesday, July 10, 8:30 a.m.-5 p.m.; and Wednesday, July 11, 8:30 a.m.-Noon

Module 1: Probability and Statistical Inference

Instructor(s): Hughes, James; Willis, Amy

This module features in-class exercises and serves as a foundation for almost all of the later modules. It covers:

  • The laws of probability and the binomial, multinomial, and normal distributions.

  • Descriptive statistics and methods of inference including maximum likelihood, confidence intervals and simple Bayes methods.

  • Classical hypothesis testing topics, including type I and II errors, two-sample tests, chi-square tests and contingency table analysis, and exact and permutation tests.

  • Resampling methods such as the bootstrap and jackknife.
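As an illustration of the resampling methods listed above, here is a minimal Python sketch (not taken from the course materials) that estimates the standard error of a statistic two ways, assuming a plain list of numeric observations and a user-supplied statistic; the function names are hypothetical. The bootstrap resamples the data with replacement many times, while the jackknife recomputes the statistic leaving out one observation at a time.

```python
import math
import random
import statistics


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error: resample n observations with replacement, n_boot times."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [
        statistic([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    ]
    return statistics.stdev(replicates)


def jackknife_se(data, statistic):
    """Jackknife standard error from the n leave-one-out estimates."""
    n = len(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out))


data = [float(x) for x in range(1, 21)]
print(bootstrap_se(data, statistics.mean))
print(jackknife_se(data, statistics.mean))
```

For the sample mean, the jackknife estimate reproduces the familiar s/√n exactly, which makes a convenient sanity check; the bootstrap value varies slightly with the random seed.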

Also offered as part of the Summer Institute in Statistics and Modeling in Infectious Diseases (SISMID 2018).

Jim Hughes is Professor of Biostatistics at the University of Washington. He is interested in the application of statistical methods to problems in AIDS and other sexually transmitted diseases. He is particularly interested in cluster randomized trial designs and statistical methods for dealing with misclassified data. He is heavily involved in undergraduate teaching and graduate student advising and has won teaching awards. He recently published “Projected demographic composition of the United States population of people living with diagnosed HIV.” AIDS Care: Psychological and Socio-Medical Aspects of AIDS/HIV 29:1543-1550, 2017.

Amy Willis is Assistant Professor of Biostatistics at the University of Washington. She is interested in biological communities, the relatedness of different taxa, the number and proportion of different species in an ecosystem, and causes of shifts in ecologies. She is especially interested in microbial communities, which are incredibly diverse, responsive, and critical to ecosystem function. Her most recent publication is “Improved detection of changes in species richness in high diversity microbial communities.” J Royal Statistical Society: Series C, 2017.

Module 2: Introduction to Genetics and Genomics

Instructor(s): Gibson, Greg; Queitsch, Christine

This module covers the theory and practice of modern genetics. It is designed to provide biologists with the foundations upon which statistical genetics is built, and/or an introduction to the concepts of classical and contemporary genetics for statisticians and informaticians.

The module starts with the key concepts of quantitative and Mendelian genetics and then illustrates how these have been reconciled with molecular biology. Three half-days are then spent on the basics of genome-wide association mapping as well as exome and whole genome sequencing; on evolutionary and population genetics particularly as they pertain to human biology; and on gene expression profiling and integrative genomics leading to systems biology, also touching on personalized medicine.

Greg Gibson is Professor and Director of the Center for Integrative Genomics at Georgia Tech. He conducts research on genomic approaches to human genetics; variability of gene expression; systems biology of disease; theory of canalization and biological robustness. He recently published “Constraints on eQTL fine mapping in the presence of multisite local regulation of gene expression.” G3: Genes, Genomes, Genetics 7:2532-2544, 2017.

Christine Queitsch is Associate Professor of Genome Sciences at the University of Washington. Her research focuses on two related fields: the genetic architecture of complex traits and the role of gene regulation and protein folding in generating heritable phenotypic variation. She advances complex trait genetics by ascertaining uncharacterized sequence variation and by resolving the relative importance of additive variation and epistasis in complex traits. Her most recent publication is “Variability in a short tandem repeat mediates complex epistatic interactions in Arabidopsis thaliana.” Genetics 205:455, 2017.

Module 3: Introduction to R

Instructor(s): Rice, Ken; Thornton, Timothy

This module introduces the R statistical environment, assuming no prior knowledge. It provides a foundation for the use of R for computation in later modules.

In addition to discussing basic data management tasks in R, such as reading in data and producing summaries through R scripts, we will also introduce R’s graphics functions, its powerful package system, and simple methods of looping.

Examples and exercises will use data drawn from biological and medical applications, including infectious diseases and genetics. Hands-on use of R is a major component of this module; participants need a laptop, which they will use in all sessions.

Also offered as part of the Summer Institute in Statistics and Modeling in Infectious Diseases (SISMID 2018).

Ken Rice is Professor of Biostatistics at the University of Washington. His research focuses primarily on developing and applying statistical methods for complex disease epidemiology, notably cardiovascular disease. He leads the Analysis Committee for the CHARGE consortium, a large group of investigators studying genetic determinants of heart and aging outcomes. He recently published “Large-scale genome-wide analysis identifies genetic variants associated with cardiac structure and function.” J. Clinical Investigation 127:1798-1812.

Tim Thornton is Associate Professor of Biostatistics at the University of Washington. His research interest is in the area of statistical genetics, with an emphasis on statistical methodology for genetic association studies of complex traits in samples with relatedness, ancestry admixture, and/or population structure. He recently published “Admixture mapping in the Hispanic Community Health Study/Study of Latinos reveals regions of genetic associations with blood pressure traits.” PLoS One 12:e0188400, 2017.



Session 2 – Wednesday, July 11, 1:30-5 p.m.; Thursday, July 12, 8:30 a.m.-5 p.m.; and Friday, July 13, 8:30 a.m.-5 p.m.

Module 4: Regression Methods: Concepts & Applications

Instructor(s): Hubbard, Rebecca; Inoue, Lurdes

This module is designed as a foundation for the quantitative genetics and QTL modules as well as for the association mapping modules. It assumes the material in Module 1 and will cover the basic commands in R. It focuses on linear regression and analysis of variance and includes an introduction to logistic regression. This module includes both lectures and interactive data analysis using R. Specific topics discussed are: simple linear regression; multiple linear regression; residual analysis; transformations; one-way ANOVA; two-way ANOVA; analysis of covariance; multiple comparisons; logistic regression.

Rebecca Hubbard is Associate Professor of Biostatistics at the University of Pennsylvania. Her research focuses on development and application of statistical methodology for studies that use observational data from clinical medical practice. Her work emphasizes development of statistical tools for biomedical inference and has been applied to studies of cancer screening, aging and dementia, pharmacoepidemiology, women’s health and behavioral health. Her most recent publication is “An electronic health record-based algorithm to ascertain the date of second breast cancer events,” Medical Care 55:E91-E87.

Mary Lou Thompson is Professor of Biostatistics at the University of Washington. She works on epidemiology, longitudinal data, diagnostic methods, maternal and child health, occupational health, aging and cognition. Her most recent publication is “Cognitive trajectory changes over 20 years before dementia diagnosis: A large cohort study.” J American Geriatrics Society 65:2627-2633.

Module 5: Population Genetic Data Analysis

Instructor(s): Goudet, Jérôme; Weir, Bruce

This module serves as a foundation for many of the later modules. It includes:

  • A unified treatment for the analysis of discrete genetic data, starting with estimates and sample variances of allele frequencies to illustrate genetic vs statistical sampling and Bayesian approaches.
  • A detailed look at Hardy-Weinberg and linkage disequilibrium, including the use of exact tests with mid-p-values and a new look at X-chromosome Hardy-Weinberg testing.
  • A new characterization of population structure with F-statistics, based on allelic matching within and between populations with individual inbreeding and relationship estimation as a special case.

Analyses are illustrated with applications to forensic science and association mapping, with particular reference to rare variants, and concepts are illustrated with R exercises.
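The exact Hardy-Weinberg test with mid-p-values mentioned in this module can be sketched for a single biallelic SNP. This is a minimal Python illustration, not the course's code, assuming complete genotype counts (n_AA, n_AB, n_BB): it conditions on the allele counts, enumerates every possible heterozygote count, sums the probabilities of samples no more probable than the observed one, and takes the mid-p-value as that exact p-value minus half the probability of the observed sample. The function names are hypothetical.

```python
import math


def het_count_distribution(n_individuals, n_allele_a):
    """Exact distribution of the heterozygote count under HWE for a biallelic SNP,
    conditional on the sample size N and the count n_A of allele A."""
    n_allele_b = 2 * n_individuals - n_allele_a
    dist = {}
    # The heterozygote count shares the parity of n_A and cannot exceed either allele count.
    for n_ab in range(n_allele_a % 2, min(n_allele_a, n_allele_b) + 1, 2):
        n_aa = (n_allele_a - n_ab) // 2
        n_bb = (n_allele_b - n_ab) // 2
        log_p = (
            math.lgamma(n_individuals + 1)
            - math.lgamma(n_aa + 1) - math.lgamma(n_ab + 1) - math.lgamma(n_bb + 1)
            + n_ab * math.log(2)
            + math.lgamma(n_allele_a + 1) + math.lgamma(n_allele_b + 1)
            - math.lgamma(2 * n_individuals + 1)
        )
        dist[n_ab] = math.exp(log_p)
    return dist


def hwe_exact_test(n_aa, n_ab, n_bb):
    """Return (exact p-value, mid-p-value) for observed genotype counts."""
    n = n_aa + n_ab + n_bb
    dist = het_count_distribution(n, 2 * n_aa + n_ab)
    p_obs = dist[n_ab]
    # Samples at least as improbable as the observed one; the tolerance guards float ties.
    p_exact = min(1.0, sum(p for p in dist.values() if p <= p_obs * (1 + 1e-7)))
    p_mid = p_exact - 0.5 * p_obs
    return p_exact, p_mid


print(hwe_exact_test(25, 50, 25))  # counts consistent with HWE
print(hwe_exact_test(50, 0, 50))   # no heterozygotes: strong departure from HWE
```

Working on the log scale with `math.lgamma` keeps the factorials from overflowing for realistic sample sizes.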

Jérôme Goudet is Associate Professor of Ecology and Evolution at the University of Lausanne, Switzerland. His research seeks to understand the interplay of population structure, trait architecture, and selection. To do so, he uses different approaches, from theory and the development of statistical tools to field observations. He recently published “apex: phylogenetics with multiple genes.” Molecular Ecology Resources 17:19-26, 2017. He developed the R package hierfstat.

Bruce Weir is Professor of Biostatistics and Director of the Institute for Public Health Genetics at the University of Washington. He develops statistical methodology for genetic data with an emphasis on allelic dependencies, population structure, disease associations and relationships, and the use of genetic data for human identification. His most recent publication is “Detection and quantification of inbreeding depression for complex traits from SNP data.” Proc. Natl. Acad. Sci. USA 114:8602-8607, 2017.

Goudet and Weir recently jointly published “A unified characterization of population structure and relatedness.” Genetics 206:2085-2103, 2017. They are currently working on the 3rd edition of “Genetic Data Analysis,” published by Sinauer.

Module 6: TBD

Session 3 – Monday, July 16, 8:30 a.m.-5 p.m.; Tuesday, July 17, 8:30 a.m.-5 p.m.; and Wednesday, July 18, 8:30 a.m.-Noon

Module 7: Association Mapping: GWAS and Sequencing Data

Instructor(s): Thornton, Timothy; Wu, Michael

This module will provide students with the basic tools to carry out genetic association analysis within the context of genome-wide association studies (GWAS) and next-generation sequencing studies, with considerable emphasis on hands-on learning.

Topics covered include: case-control (disease) association testing; quantitative trait analysis; quality control processes in GWAS; multi-locus testing using gene and pathway information; population structure and ancestry inference; association testing in the presence of population structure and/or relatedness; gene-environment and gene-gene interactions; basic rare variant association analysis in sequencing studies; advanced rare variant methods; sequence kernel association tests (SKAT); meta-analysis; design considerations; and other emerging topics.

An important component of this module is in-class software exercises, which provide students with hands-on experience analyzing real data using state-of-the-art analysis tools for GWAS and next-generation sequencing data.

Assumes basic familiarity with R. Other freely available software used in the module includes PLINK.

Timothy Thornton is Associate Professor of Biostatistics at the University of Washington. His research interest is in the area of statistical genetics, with an emphasis on statistical methodology for genetic association studies of complex traits in samples with relatedness, ancestry admixture, and/or population structure. He recently published “Admixture mapping in the Hispanic Community Health Study/Study of Latinos reveals regions of genetic associations with blood pressure traits.” PLoS One 12:e0188400, 2017.

Michael Wu is an Associate Member in the Biostatistics and Biomathematics Program at the Fred Hutchinson Cancer Research Center. The major thrust of his research lies in the development and application of statistical methods for translational science, particularly for the analysis of high-dimensional genomic data within the broader context of clinical trials as well as population-based genetic, genomic, epigenetic, and microbiome studies. He recently published “A fast small-sample kernel independence test with application to microbiome association studies.” Biometrics 73:1453-1463, 2017.

Thornton and Wu recently jointly published “Powerful genetic association analysis for common or rare variants with high-dimensional structured traits.” Genetics 206:1779-1790, 2017.

Module 8: Quantitative Genetics

Instructor(s): Rosa, Guilherme; Walsh, Bruce

This module assumes the material in Module 1: Probability and Statistical Inference and Module 4: Regression Methods: Concepts & Applications, and provides a foundation for many later modules.

Quantitative Genetics is the analysis of complex characters where both genetic and environmental factors contribute to trait variation. Because this includes most traits of interest, such as disease susceptibility, crop yield, growth and reproduction in animals, human and animal behavior, and all gene expression data (transcriptome and proteome), a working knowledge of quantitative genetics is critical in fields as diverse as plant and animal breeding, human genetics, genomics, and behavior, as well as ecology and evolutionary biology.

The course will cover the basics of quantitative genetics including: genetic basis for complex traits, population genetic assumptions including detection of admixture, Fisher’s variance decomposition, covariance between relatives, calculation of the numerator relationship matrix based on IBD alleles and an arbitrary pedigree, the genomic relationship matrix based on IBS alleles, heritability in the broad and narrow sense, inbreeding and cross-breeding, and response to selection.
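The numerator relationship matrix for an arbitrary pedigree is commonly computed with the recursive "tabular method." A minimal Python sketch, assuming individuals are numbered so that parents precede their offspring and that unknown parents are coded as None (the function name is hypothetical): each off-diagonal element is half the sum of one individual's relationships to the other's parents, and each diagonal element is 1 + F, where F is half the relationship between the individual's parents.

```python
def numerator_relationship_matrix(pedigree):
    """Tabular method. pedigree[i] = (sire, dam) as indices of earlier individuals, or None
    if unknown; individuals must be ordered so that parents come before offspring."""
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        for j in range(i):
            # Relationship of i with an earlier j: half the sum of j's relationships to i's parents.
            a_sire = A[j][sire] if sire is not None else 0.0
            a_dam = A[j][dam] if dam is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_sire + a_dam)
        # Diagonal: 1 + F_i, where F_i is half the relationship between i's parents.
        if sire is not None and dam is not None:
            A[i][i] = 1.0 + 0.5 * A[sire][dam]
        else:
            A[i][i] = 1.0
    return A


# Founders 0 and 1; full sibs 2 and 3; individual 4 is the offspring of the full sibs.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A = numerator_relationship_matrix(pedigree)
print(A[4][4])
```

In this toy pedigree the full sibs have relationship 0.5, so their offspring has a diagonal element of 1.25, i.e., an inbreeding coefficient of 0.25, the textbook value for a full-sib mating.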

The module also includes an introduction to advanced topics such as: Mixed Models, Best Linear Unbiased Prediction (BLUP), Genomic selection (GBLUP), Genome Wide Association Analysis (GWAS), QTL mapping, detection of selection from genomic data, correlated characters, and the multivariate response to selection.

Guilherme Rosa is Professor of Animal Science at the University of Wisconsin, Madison. He develops research programs at the interface between statistical/theoretical and molecular genetics, focusing on applications to animal models in domestic/managed and natural populations. He recently published “A predictive assessment of genetic correlations between traits in chickens using markers.” Genetics, Selection, Evolution 49:Article 16.

Bruce Walsh is Professor of Ecology and Evolutionary Biology at the University of Arizona. His interests are broadly in using mathematical models to explore the interface of genetics and evolution, with particular focus on two areas: the evolution of genome structure and the analysis of complex genetic characters (aka quantitative genetics). He is well known as co-author of “Genetics and Analysis of Quantitative Characters.” 980 pp. Sinauer Associates.

Module 9: Advanced Population Genetics

Instructor(s): Hernandez, Ryan; O'Connor, Timothy

This module considers the analyses now possible for whole-genome sequence data collected on large numbers of individuals. Specific topics include characterization of de novo mutations and the comparison of growth rates among populations. Also covered is how sequence data allow detailed examination of the signatures of natural selection, along with methods to compare selective constraints across populations and to seek evidence for recent, population-specific adaptation. The analysis of identity-by-descent (IBD) segment sharing, including random projection for IBD detection (RaPID), to infer demographic history will be covered, as will methods to reconstruct the genetic architecture of major human diseases.

Ryan Hernandez is Associate Professor of Bioengineering and Therapeutic Sciences at the University of California, San Francisco. His research focuses on computational genomics: characterizing patterns of genetic variation within and between populations using large-scale genome resequencing data; developing novel population genetic simulation techniques; and exploiting population genetic models of demographic history and natural selection to interrogate the genetic basis of disease. He recently published “Prominent features of the amino acid mutation landscape in cancer.” PLoS One 12:Article e0183273.

Tim O’Connor is Assistant Professor at the University of Maryland School of Medicine. His research explores the effects of evolution and population structure on the genomic architecture of disease and other phenotypes. He has a track record of developing new algorithms and statistics for interdisciplinary biological problems and of using large, multifaceted data sets, particularly the output of next-generation sequencing. He is especially interested in the recent evolution of New World populations such as Hispanic Americans, African Americans, and the Old Order Amish. He recently published “Accurate and equitable medical genomic analysis requires an understanding of demography and its influence on sample size and ratio.” Genome Biology 18:Article 42.

Hernandez and O’Connor recently jointly published “Using genotype array data to compare multi- and single-sample variant calls and improve variant call sets from deep coverage whole-genome sequencing data.” Bioinformatics 33:1147-1153.

Session 4 – Wednesday, July 18, 1:30-5 p.m.; Thursday, July 19, 8:30 a.m.-5 p.m.; and Friday, July 20, 8:30 a.m.-5 p.m.

Module 10: Genetic Epidemiology

Instructor(s): Edwards, Karen; Hutter, Carolyn

This module provides an overview of genetic epidemiology with a focus on design, analysis and interpretation in studies of complex disease. It is meant as an introduction to the field with a focus on surveying the various methods for discovering how genetic factors influence health and disease.

The module discusses classic genetic epidemiology methods and study designs, including twin studies, family studies, segregation analysis, linkage analysis and population-based association studies, as well as more contemporary topics including gene-environment interactions, rare variant analysis and precision medicine applications. These topics will be reinforced through in-class exercises along with critical reading and discussion of recent publications.

Karen Edwards is Professor and Chair of Epidemiology at the University of California, Irvine. Her primary research focus is in genetic epidemiology and the use of multivariate approaches to define phenotypes for complex diseases. Her research covers a broad range of conditions, including diabetes, metabolic syndrome, cardiovascular disease, melanoma and Parkinson’s disease. She has recently published “Large-scale exploratory genetic analysis of cognitive impairment in Parkinson’s disease.” Neurobiology of Aging 56:Article 211.e1, 2017.

Carolyn Hutter is Acting Director of the Division of Genome Sciences of the National Human Genome Research Institute. She is the NHGRI team lead for The Cancer Genome Atlas (TCGA), and a program director on the Clinical Sequencing Exploratory Research (CSER) project. She has recently published “Current challenges and new opportunities for gene-environment interaction studies of complex diseases.” American Journal of Epidemiology 18:753-761.

Module 11: Mixed Models in Quantitative Genetics

Instructor(s): Walsh, Bruce; Rosa, Guilherme

Assumes the material in Module 1: Probability and Statistical Inference and Module 4: Regression Methods: Concepts & Applications.

Provides a foundation for Module 14: Advanced Quantitative Genetics and Module 18: Statistical & Quantitative Genetics of Disease.

“Mixed models” refers to the analysis of linear models with arbitrary (co)variance structures among and within random effects; these structures may arise from such factors as relationships or shared environments, cytoplasm, maternal effects, and history. Mixed models are used in complex data analysis when the usual assumptions of independence and/or homogeneous variances fail.

Mixed models allow effects of nature to be separated from those of nurture and are emerging as the default method of analysis for human data. These issues are pervasive in human studies because subjects cannot be randomized to households, choices, or prior histories.

In plant breeding, growth and yield data are correlated because of shared locations, and these correlations diminish with distance, producing spatial correlation. In animal breeding, performance data are correlated because individuals may be related and may share a common maternal environment as well as common pens or cages. Further, when individuals share a common space, they may experience indirect genetic effects (IGEs): an inherited effect in one individual that is experienced as an environmental effect by an associated individual. The evolution of cooperation and competition is based on IGEs, whose estimation requires mixed model analysis. Detection of cytoplasmic and epigenetic effects relies heavily on mixed model methods because of shared maternal or parental histories.

Topics to be discussed include a basic matrix algebra review, the general linear model, derivation of the mixed model, BLUP and REML estimation, estimation and design issues, and Bayesian formulations.
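To make the BLUP step concrete, here is a minimal NumPy sketch of Henderson's mixed model equations for the model y = Xb + Zu + e, assuming Var(u) = Aσ²_u, Var(e) = Iσ²_e, and a known variance ratio λ = σ²_e/σ²_u (in practice the variance components would first be estimated, e.g. by REML, which this sketch does not do). The function name and the toy data are invented for illustration.

```python
import numpy as np


def solve_mme(y, X, Z, A, lam):
    """Henderson's mixed model equations for y = Xb + Zu + e, assuming
    Var(u) = A * sigma_u^2, Var(e) = I * sigma_e^2 and lam = sigma_e^2 / sigma_u^2 known."""
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(A)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    return solution[:n_fixed], solution[n_fixed:]  # BLUE of b, BLUP of u


# Toy example (made-up numbers): two unrelated parents and their offspring, five records.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
y = np.array([4.5, 2.9, 3.9, 3.5, 5.0])
X = np.ones((5, 1))                      # overall mean only
Z = np.zeros((5, 3))
Z[np.arange(5), [0, 1, 2, 2, 0]] = 1.0   # which animal produced each record
b_hat, u_hat = solve_mme(y, X, Z, A, lam=2.0)
print(b_hat, u_hat)
```

Solving the mixed model equations avoids inverting the full n x n phenotypic covariance matrix V; the results coincide with the generalized least squares estimate of b and with u = AZ'V⁻¹(y − Xb)σ²_u.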

Applications to be discussed include estimation of breeding values and genetic variances in general pedigrees, association mapping, genomic selection, spatial correlations and corrections, maternal genetic effects, detecting selection from genomic data, admixture detection and correction, direct and indirect genetic effects, models of general group and kin selection, and genotype by environment interaction models.

Guilherme Rosa is Professor of Animal Science at the University of Wisconsin, Madison. He develops research programs at the interface between statistical/theoretical and molecular genetics, focusing on applications to animal models in domestic/managed and natural populations. He recently published “A predictive assessment of genetic correlations between traits in chickens using markers.” Genetics, Selection, Evolution 49:Article 16.

Bruce Walsh is Professor of Ecology and Evolutionary Biology at the University of Arizona. His interests are broadly in using mathematical models to explore the interface of genetics and evolution, with particular focus on two areas: the evolution of genome structure and the analysis of complex genetic characters (aka quantitative genetics). He is well known as co-author of “Genetics and Analysis of Quantitative Characters.” 980 pp. Sinauer Associates.

Module 12: Computational Pipeline for WGS Data - New for 2017

Instructor(s): Gogarten, Stephanie; Rice, Ken

This new module is designed to follow on from Module 9, and it provides an alternative to Module 11 targeted at complex human populations. It uses the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) as a case study, but the methods apply to all situations with multiple ethnic groups, admixture, and relatedness. The HCHS/SOL study has collected extensive clinical data and has been extensively characterized with genome-wide array data and, more recently, with whole-genome sequencing data. The module covers the use of linear mixed models (LMM) and generalized estimating equations (GEE), with further discussion of pooled and stratified analysis, as well as admixture mapping. The SUGGEN and GENESIS packages are used. After this module, students should be able to begin the analysis of HCHS/SOL genetic data and will be familiar, more generally, with the analysis of complex data that may encompass multiple ethnic groups, admixture, and relatedness.

Stephanie Gogarten is a Research Scientist in the Genetic Analysis Center at the University of Washington. She develops computational pipelines for GWAS and WGS data. She was lead author on “GWASTools: an R/Bioconductor package for quality control and analysis of Genome-Wide Association Studies.” Bioinformatics 28:3329-3331, 2012. She recently published “Analysis commons, a team approach to discovery in a big-data environment for genetic epidemiology.” Nature Genetics 49:1560-1563, 2017.

Ken Rice is Professor of Biostatistics at the University of Washington. His research focuses primarily on developing and applying statistical methods for complex disease epidemiology, notably cardiovascular disease. He leads the Analysis Committee for the CHARGE consortium, a large group of investigators studying genetic determinants of heart and aging outcomes. He recently published “Large-scale genome-wide analysis identifies genetic variants associated with cardiac structure and function.” J. Clinical Investigation 127:1798-1812.



Session 5 – Monday, July 23, 8:30 a.m.-5 p.m.; Tuesday, July 24, 8:30 a.m.-5 p.m.; and Wednesday, July 25, 8:30 a.m.-Noon

Module 13: Forensic Genetics

Instructor(s): Aalbers, Sanne; Weir, Bruce

This module covers the basic statistical and genetic methods leading to likelihood ratios (LRs) for the presentation of genetic evidence. It provides the background necessary for using analysis results from packages such as CODIS Popstats.

This module also:

  • Describes forensic STR markers: mutation process, genotyping technology, and electropherogram artifacts, particularly new considerations for back, forward, and double-back stutter and exotics.
  • Reviews principles of population genetics and the measurement of relatedness.
  • Covers general principles of evidence evaluation using LRs, computing LRs for identification using presence/absence of autosomal STR genotypes and for mitochondrial and Y-chromosome markers.
  • Addresses the complications of mixture interpretation when the queried contributor is a relative of a true contributor.
  • Describes the consequences of database searches.
  • Briefly discusses the probabilistic interpretation of STR profiles.
  • Provides information about new molecular techniques for human identification.
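For the simplest case covered above, a single-source stain whose profile matches the suspect's, the LR is the reciprocal of the profile's match probability. A minimal Python sketch, assuming the Balding-Nichols θ-corrected conditional match probabilities for each locus and independence across loci; the locus names, alleles, and frequencies below are hypothetical. Setting θ = 0 recovers the product-rule values p² for homozygotes and 2pq for heterozygotes.

```python
def match_probability(genotype, freqs, theta=0.0):
    """Balding-Nichols conditional probability that an unrelated person from the same
    subpopulation has this genotype, given that the suspect has it."""
    a, b = genotype
    denom = (1 + theta) * (1 + 2 * theta)
    if a == b:
        p = freqs[a]
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    return 2 * (theta + (1 - theta) * freqs[a]) * (theta + (1 - theta) * freqs[b]) / denom


def single_source_lr(profile, allele_freqs, theta=0.0):
    """LR = 1 / (match probability), multiplied across loci assumed independent."""
    lr = 1.0
    for locus, genotype in profile.items():
        lr /= match_probability(genotype, allele_freqs[locus], theta)
    return lr


# Hypothetical two-locus profile and allele frequencies, for illustration only.
allele_freqs = {"L1": {12: 0.10, 14: 0.20}, "L2": {9: 0.10}}
profile = {"L1": (12, 14), "L2": (9, 9)}
print(single_source_lr(profile, allele_freqs, theta=0.0))
print(single_source_lr(profile, allele_freqs, theta=0.03))
```

A positive θ raises the match probability for alleles like these and therefore lowers the LR relative to the θ = 0 calculation.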

Sanne Aalbers is a Research Scientist in the Genetic Analysis Center at the University of Washington. She received degrees in Applied Mathematics from Delft University of Technology and in Forensic Science from the University of Amsterdam. She has worked on financial crime analytics for Deloitte Forensic. At the University of Washington she derived ROC curves for familial DNA database searching and developed match probabilities for Y-STR profiles. Currently she is developing new stutter models for NGS forensic data.

Bruce Weir is Professor of Biostatistics and Director of the Institute for Public Health Genetics at the University of Washington. He is a member of the Biology/DNA Scientific Area Committee of the NIST/NIJ OSAC organization. He develops statistical analysis methods for the interpretation of forensic genetic profiles. He is co-author of “Interpreting DNA Evidence,” Sinauer, 1998. His recent forensic publications include “Population-specific FST values: A worldwide survey.” Forensic Science International: Genetics 23:91-100, 2016.

Module 14: Advanced Quantitative Genetics

Instructor(s): Visscher, Peter; Robinson, Matthew

This module focuses on the genetics and analysis of quantitative traits in human populations, with emphasis on estimation and prediction analysis using genetic markers. It is a good match with Module 18: Statistical & Quantitative Genetics of Disease, which deals with similar topics but with a focus on disease (binary) outcomes.

Topics include: the resemblance between relatives; estimation of genetic variance associated with genome-wide identity by descent; GWAS for quantitative traits; the use of GWAS data to estimate and partition genetic variation; principles and pitfalls of prediction analyses using genetic markers.

A series of computer exercises will provide hands-on experience of implementing a variety of approaches using R, the Merlin suite of software, PLINK and GCTA (http://www.complextraitgenomics.com/software/gcta/).
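Estimating and partitioning genetic variation from genome-wide markers starts from a genomic relationship matrix (GRM). A minimal NumPy sketch of the standardized form, A_jk = (1/m) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)), assuming a complete n x m matrix of 0/1/2 genotype counts with allele frequencies estimated from the same sample; the function name and simulated data are illustrative. GCTA itself uses an adjusted estimator for the diagonal, so this is not a reimplementation of that software.

```python
import numpy as np


def genomic_relationship_matrix(genotypes):
    """Standardized GRM from an (n_individuals, n_snps) array of 0/1/2 allele counts,
    assuming no missing genotypes and allele frequencies estimated from the same sample."""
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0                 # frequency of the counted allele at each SNP
    keep = (p > 0.0) & (p < 1.0)             # monomorphic SNPs carry no information
    G, p = G[:, keep], p[keep]
    W = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return W @ W.T / W.shape[1]              # average over SNPs of w_ij * w_ik


rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(50, 200))  # simulated, unrelated individuals
K = genomic_relationship_matrix(genotypes)
print(K.shape, K.diagonal().mean())
```

Because each SNP is centered at its sample mean, every row of this matrix sums to zero, so relationships are measured relative to the average pair in the sample.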

Matthew Robinson is Professor of Computational Biology at the University of Lausanne. He develops and applies statistical methodology for large human phenotype-genotype datasets to address questions in population, quantitative, and medical genetics. His current work focuses on improved testing for sex-, age-, or environment-specific genetic effects, quantifying maternal genetic and social genetic effects, and investigating the role of genetic interactions between microbial and host genotype in shaping phenotype in the human population. He has recently published “Genotype-covariate interaction effects and the heritability of adult body mass index.” Nature Genetics 49:1174, 2017.

Peter Visscher is Professor and Chair of Quantitative Genetics at the University of Queensland. His research focuses on understanding individual differences between people in traits that are important for health outcomes and aging. A better understanding of the genes that underlie variation in risk to diseases may lead to better treatments. The traits he studies include gene expression, gene methylation, height and body-mass index, psychiatric disease and neurogenetic conditions. He recently published “Concepts, estimation and interpretation of SNP-based heritability.” Nature Genetics 49:1304, 2017.

Module 15: Integrative Genomics

Instructor(s): Gibson, Greg; Powell, Joseph

This module emphasizes how the theory and application of transcriptomics can be extended to include other types of omic analysis, and then integrated using statistical and machine learning tools.

It starts with the statistical basis of hypothesis testing, covering the central role of normalization strategies and the specifics of differential expression analysis. Students will be given the opportunity to work through examples using open-source R code that is in standard use for RNA-seq data.

The module then discusses options for downstream processing by clustering and module detection/comparison; extensions to methylation profiling, proteomics, and metabolomics; eQTL analysis including fine mapping of regulatory variation; and finally, integrative methodologies addressing the relationship between genomic, meta-genomic, and phenotypic variation.

This module deals primarily with upstream data processing methods that lead to the delineation of networks and pathways that are then considered in Module 19: Pathway & Network Analysis for Omics Data.

Greg Gibson is Professor and Director of the Center for Integrative Genomics at Georgia Tech. He conducts research on genomic approaches to human genetics; variability of gene expression; systems biology of disease; theory of canalization and biological robustness. He recently published “Constraints on eQTL fine mapping in the presence of multisite local regulation of gene expression.” G3: Genes, Genomes, Genetics 7:2532-2544, 2017.

Joseph Powell is Head of Computational and Single Cell Genomics at the University of Queensland. He uses large-scale, high-throughput genomic data to investigate how DNA sequence variants contribute to human disease. His research engages sophisticated statistical methodology and the use of high performance computing resources for novel analyses and methods development. He has recently published “Genetic correlations reveal the shared genetic architecture of transcription in human peripheral blood.” Nature Communications 8:Article 483.

Module 16: Cancer Genetic Counseling & Interpretation

Instructor(s): Bennett, Robin; Amendola, Laura

This new module anticipates the new training program in genetic counseling at the University of Washington.

Genetic testing is on the rise. Women are routinely offered genetic testing during pregnancy. And more people are offered genetic screening to identify risks for common disorders like cancer and heart disease, in addition to genetic disorders like cystic fibrosis and dementia.

This means that more people are facing tough choices and questions, such as: Is this the right test for me? What will I find out about my health or the health of my children? Will my insurer discriminate against me?

Genetic counselors help people find answers so that they can make informed decisions. There is a substantial unmet need for genetic counselors, and this module, taught by leaders of the UW Genetics Clinic, surveys the methodology used in the field and provides illustrative case examples.

Robin Bennett is Clinical Professor of Medicine and Senior Genetic Counselor and Clinic Manager for the Medical Genetics Clinics at the University of Washington. She is a Past President of the US National Society of Genetic Counselors, and is author of “The Practical Guide to the Genetic Family History.” She recently published “Medical genetics and genomics education: how do we define success? Where do we focus our resources?” Genetics in Medicine 19:751-753, 2017.

Laura Amendola is a Genetic Counselor and the manager of the NHGRI-funded UW New Exome Technology in Medicine (NEXT Medicine) study at the University of Washington. She recently published “Genome sequencing and carrier testing: decisions on categorization and whether to disclose results of carrier testing.” Genetics in Medicine 19:803-808.



Session 6: Wednesday, July 25, 1:30-5 p.m.; Thursday, July 26, 8:30 a.m.-5 p.m., and Friday, July 27, 8:30 a.m.-5 p.m.

Module 17: MCMC for Genetics

Instructor(s): Anderson, Eric; Robinson, Matthew; Stephens, Matthew

This module examines the use of Bayesian statistics and Markov chain Monte Carlo (MCMC) methods in modern analyses of genetic data. It assumes a solid foundation in basic statistics and the concept of likelihood. Some knowledge of population genetics and a basic familiarity with the R statistical package, or another programming language, will be helpful.

The first day includes an introduction to Bayesian statistics, Monte Carlo, and MCMC. Mathematical concepts covered include expectation, laws of large numbers, and ergodic and time-reversible Markov chains. Algorithms include the Metropolis-Hastings algorithm and Gibbs sampling. Some mathematical detail is given; however, there is considerable emphasis on concepts and practical issues arising in applications.
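To make the Metropolis-Hastings idea concrete, here is a minimal sketch of a random-walk sampler in Python (not the module's R practical). It assumes a toy target chosen for illustration: the posterior of an allele frequency p given k copies among n sampled alleles under a uniform prior, whose exact mean (k + 1)/(n + 2) lets you check the chain. The function names, step size, and burn-in length are hypothetical choices.

```python
import math
import random


def log_posterior(p, k, n):
    """Unnormalized log posterior of an allele frequency p given k copies
    among n sampled alleles, under a uniform Beta(1, 1) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis_hastings(k, n, n_iter=20000, step=0.05, p0=0.5, seed=1):
    """Random-walk Metropolis-Hastings chain targeting log_posterior."""
    rng = random.Random(seed)
    p = p0
    current = log_posterior(p, k, n)
    samples = []
    for _ in range(n_iter):
        # Symmetric Gaussian proposal, so the Hastings correction cancels
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)  # a rejected move repeats the current state
    return samples


chain = metropolis_hastings(k=30, n=100)
burned = chain[2000:]  # discard burn-in
print(sum(burned) / len(burned))  # should be near the exact mean 31/102
```

Because the proposal is symmetric, only the ratio of posterior densities enters the acceptance step; an asymmetric proposal would need the full Hastings correction.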

Mathematical ideas are illustrated with simple examples and reinforced with a computer practical using the R statistical language. With that background, two applications of MCMC are investigated in detail: inference of population structure (using the program STRUCTURE) and haplotype inference (using the program PHASE). Computer exercises using both programs are included.

Further topics include the use of MCMC in model evaluation and model checking, strategies for assessing MCMC convergence and diagnosing MCMC mixing problems, importance sampling, and Metropolis-coupled MCMC.

Eric Anderson is Research Molecular Geneticist at the National Oceanic and Atmospheric Administration. He is interested in statistical and computational methods for inference from genetic data, with applications to management and conservation of fish species. His current work is directed toward the use of single nucleotide polymorphisms (SNPs) for fisheries management, large-scale parentage inference and for genetic stock identification in mixed-stock fisheries. He has recently published “Genetic and individual assignment of tetraploid green sturgeon with SNP assay data.” Conservation Genetics 18:1119-1130.

Matthew Robinson is Professor of Computational Biology at the University of Lausanne. He develops and applies statistical methodology for large human phenotype-genotype datasets to address questions in population, quantitative, and medical genetics. His current work focuses on improved testing for sex-, age-, or environment-specific genetic effects, quantifying maternal genetic and social genetic effects, and investigating the role of genetic interactions between microbial and host genotypes in shaping phenotype in human populations. He recently published “Genotype-covariate interaction effects and the heritability of adult body mass index.” Nature Genetics 49:1174, 2017.

Matthew Stephens is Professor of Statistics and Human Genetics at the University of Chicago. He was a developer of STRUCTURE, a widely used computer program for determining population structure and estimating individual admixture. He was also a developer of the influential Li and Stephens model, an efficient model of linkage disequilibrium. His recent publications include “Bayesian large-scale multiple regression with summary statistics from genome-wide association studies.” Annals of Applied Statistics 11:1561-1592.

Module 18: Statistical & Quantitative Genetics of Disease

Instructor(s): Witte, John; Wray, Naomi

This module builds on the material in Module 14: Advanced Quantitative Genetics, but focuses on the analysis of genetic data for qualitative phenotypes, such as disease status from case-control or cohort studies, and on the interpretation of the ensuing results, particularly with respect to risk prediction.

The module considers, in detail, the statistical genetics of binary disease with emphasis on the equivalences and relationships between different models. It contrasts and synthesizes the traditional viewpoints of quantitative geneticists and epidemiologists. The module demonstrates the caution needed in interpreting “precision medicine” risk predictors for common complex diseases.

Topics will include: risk models on different scales including the observed (or disease) scale and the liability threshold scale; estimation of heritability from familial risk ratios; estimation of the contribution of individual and multiple risk loci to disease; estimation of variance attributable to genome-wide SNPs individually and together; approaches for the analysis of rare genetic variants; polygenic modeling; risk profile scoring; power; GxE and pleiotropy.
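The liability threshold scale listed above can be illustrated with the classic transformation of heritability from the observed 0/1 disease scale to the liability scale (often attributed to Dempster and Lerner, 1950). The sketch below is a minimal illustration only, assuming a random population sample with known prevalence K and a standard normal liability; case-control samples need an additional ascertainment correction that is not shown. The function name and inputs are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_obs, K):
    """Rescale heritability from the observed 0/1 disease scale to the
    liability scale, assuming a population sample with disease prevalence K."""
    std_normal = NormalDist()            # liability ~ N(0, 1)
    t = std_normal.inv_cdf(1.0 - K)      # threshold above which individuals are affected
    z = std_normal.pdf(t)                # normal density at the threshold
    return h2_obs * K * (1.0 - K) / z ** 2


# Hypothetical inputs: observed-scale h2 of 0.05 for a disease with 1% prevalence
print(observed_to_liability_h2(0.05, 0.01))
```

The rescaling factor K(1 - K)/z² grows quickly as prevalence falls, which is one reason heritabilities reported on different scales are not directly comparable.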

Participants should have basic R programming skills and familiarity with matrix algebra, statistical methods, and the analysis of GWAS data.

John Witte is Professor of Epidemiology and Biostatistics at the University of California at San Francisco. His research program encompasses a synthesis of methodological and applied genetic epidemiology, with the overall aim of deciphering the mechanisms underlying complex diseases and traits. His methods work is focused on the design and statistical analysis of next-generation sequencing and genetic association studies, and is applied to studies of prostate cancer, birth defects and pharmacogenomics. He recently published “Non-additive and epistatic effects of HLA polymorphisms contributing to risk of adult glioma.” J Neuro-Oncology 135:237-244, 2017.

Naomi Wray is Professor in the Institute for Molecular Bioscience at the University of Queensland. Her research focuses on the development of quantitative genetics and genomics methodology with application to psychiatric and neurological disorders. She plays a key role in the International Psychiatric Genomics Consortium and co-leads the Ice Bucket Challenge-funded Sporadic ALS Australia Systems Genomics Consortium (SALSA). She recently published “Inference in psychiatry via 2-sample Mendelian randomization - From association to causal pathway?” JAMA Psychiatry 74:1191-1192.

Module 19: Pathway & Network Analysis for Omics Data

Instructor(s): Shojaie, Ali; Motsinger-Reif, Alison

Networks represent the interactions among components of biological systems. In the context of high dimensional omics data, relevant networks include gene regulatory networks, protein-protein interaction networks, and metabolic networks.

These networks provide a window into biological systems as well as complex diseases, and can be used to understand how biological functions are implemented and how homeostasis is maintained. On the other hand, pathway-based analyses can be used to leverage biological knowledge available from the literature, gene ontologies, or previous experiments in order to identify the pathways associated with disease or an outcome of interest.

In this module, various statistical learning methods for reconstruction and analysis of networks from omics data are discussed, as well as methods of pathway enrichment analysis. Particular attention is paid to omics datasets with a large number of variables, e.g. genes, and a small number of samples, e.g. patients.
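One widely used form of pathway enrichment analysis is an over-representation test: given a list of selected genes, ask whether more of them fall in a pathway than chance would predict. The sketch below computes the one-sided hypergeometric p-value with Python's `math.comb` (Python 3.8+). It is one technique among several (rank-based methods such as GSEA are another family) and is not necessarily the method taught in the module; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric P(X >= k).

    N: genes measured (the background universe)
    K: background genes annotated to the pathway
    n: genes selected, e.g. called differentially expressed
    k: selected genes that fall in the pathway
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 20,000 genes, a 100-gene pathway, 200 selected genes,
# 8 of which are in the pathway (about 1 expected by chance).
print(enrichment_p_value(N=20000, K=100, n=200, k=8))
```

When many pathways are tested at once, these p-values would still need a multiple-testing correction, which matters especially in the many-variables, few-samples setting described above.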

The techniques discussed will be demonstrated in R. This course assumes familiarity with R or other command-line programming languages.

Ali Shojaie is Associate Professor of Biostatistics at the University of Washington. He is interested in developing statistical methods for analysis of large, complex systems, particularly biological and social systems. His research focuses on statistical methods for high-dimensional networks and statistical machine learning methods for estimation and inference in high-dimensional problems. He recently published “Using Twitter for demographic and social science research: Tools for data collection and processing.” Sociological Methods and Research 46:390-421, 2017.

Alison Motsinger-Reif is Associate Professor of Statistics at North Carolina State University. The primary goal of her research is the development of computational methods to detect genetic risk factors of common, complex traits in human populations. She focuses on the development of methods to detect complex predictive models in high-throughput genomic data. She recently published “Metabolic network failures in Alzheimer’s disease: A biochemical road map.” Alzheimer’s & Dementia 13:965-984, 2017.

Module 20: Bayesian Statistics for Genetics

Instructor(s): Rice, Ken; Wakefield, Jonathan

The module provides a foundation for Module 17: MCMC for Genetics.

The use of Bayesian methods in genetics has a long history. This introductory module begins with a review of basic probability. It then describes Bayesian approaches to binomial proportions, multinomial proportions, two-sample comparisons (binomial, Poisson, normal), the linear model, and Monte Carlo methods of summarization. Advanced topics include hierarchical models, generalized linear models, and missing data.
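The Bayesian treatment of a binomial proportion, combined with Monte Carlo summarization, can be sketched in a few lines. This minimal example assumes a conjugate Beta(a, b) prior (uniform by default), so k successes in n trials give a Beta(a + k, b + n - k) posterior; the posterior mean is exact, while the 95% interval is approximated from simulated draws using the standard library's `random.betavariate`. Function names and the allele-count example are hypothetical.

```python
import random


def beta_binomial_posterior(k, n, a=1.0, b=1.0):
    """Conjugate update: Beta(a, b) prior plus k successes in n trials
    gives a Beta(a + k, b + n - k) posterior."""
    return a + k, b + n - k


def posterior_summary(k, n, a=1.0, b=1.0, draws=20000, seed=2):
    """Exact posterior mean plus a Monte Carlo 95% equal-tailed interval."""
    a_post, b_post = beta_binomial_posterior(k, n, a, b)
    mean = a_post / (a_post + b_post)
    rng = random.Random(seed)
    sims = sorted(rng.betavariate(a_post, b_post) for _ in range(draws))
    lower = sims[int(0.025 * draws)]
    upper = sims[int(0.975 * draws) - 1]
    return mean, (lower, upper)


# e.g. 30 copies of an allele among 100 sampled chromosomes, uniform prior
print(posterior_summary(k=30, n=100))
```

Simulation is unnecessary for this conjugate case, but the same draw-and-summarize pattern carries over to posteriors with no closed form.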

Illustrative applications will include: Hardy-Weinberg testing and estimation, detection of allele-specific expression, QTL mapping, testing in genome-wide association studies, mixture models, and multiple testing in high-throughput genomics.

Ken Rice is Professor of Biostatistics at the University of Washington. His research focuses primarily on developing and applying statistical methods for complex disease epidemiology, notably cardiovascular disease. He leads the Analysis Committee for the CHARGE consortium, a large group of investigators studying genetic determinants of heart and aging outcomes. He recently published “Large-scale genome-wide analysis identifies genetic variants associated with cardiac structure and function.” J. Clinical Investigation 127:1798-1812.

Jon Wakefield is Professor of Statistics and Biostatistics at the University of Washington. His research interests include spatial epidemiology, space-time models for infectious disease data, small area estimation, hierarchical models for survey data, estimating national and subnational disease burden, ecological inference for non-infectious and infectious disease data, genome-wide association studies, analysis of next-generation RNA-seq data, and the links between Bayesian and frequentist procedures. He recently published “Impacts of Neanderthal-introgressed sequences on the landscape of human gene expression.” Cell 168:916, 2017.

