23rd Summer Institute in Statistical Genetics (SISG)

Module 17: MCMC for Genetics

Session 6: Wed Jul 25 to Fri Jul 27

Module dates/times: Wednesday, July 25, 1:30-5 p.m.; Thursday, July 26, 8:30 a.m.-5 p.m.; and Friday, July 27, 8:30 a.m.-5 p.m.

This module examines the use of Bayesian statistics and Markov chain Monte Carlo (MCMC) methods in modern analyses of genetic data. It assumes a solid foundation in basic statistics and the concept of likelihood. Some knowledge of population genetics and basic familiarity with the R statistical package, or another computing language, will be helpful.

The first day includes an introduction to Bayesian statistics, Monte Carlo, and MCMC. Mathematical concepts covered include expectation, laws of large numbers, and ergodic and time-reversible Markov chains. Algorithms include the Metropolis-Hastings algorithm and Gibbs sampling. Some mathematical detail is given; however, there is considerable emphasis on concepts and practical issues arising in applications.

Mathematical ideas are illustrated with simple examples and reinforced with a computer practical using the R statistical language. With that background, two applications of MCMC are investigated in detail: inference of population structure (using the program STRUCTURE) and haplotype inference (using the program PHASE). Computer exercises using both programs are included.

Further topics include the use of MCMC in model evaluation and model checking, strategies for assessing MCMC convergence and diagnosing MCMC mixing problems, importance sampling, and Metropolis-coupled MCMC.

Eric Anderson is a Research Molecular Geneticist at the National Oceanic and Atmospheric Administration. He is interested in statistical and computational methods for inference from genetic data, with applications to the management and conservation of fish species. His current work is directed toward the use of single nucleotide polymorphisms (SNPs) for fisheries management, large-scale parentage inference, and genetic stock identification in mixed-stock fisheries. He recently published "Genetic and individual assignment of tetraploid green sturgeon with SNP assay data." Conservation Genetics 18:1119-1130.

Matthew Robinson is Professor of Computational Biology at the University of Lausanne. He develops and applies statistical methodology for large human phenotype-genotype datasets to address questions in population, quantitative, and medical genetics. His current work focuses on improved testing for sex-, age-, or environment-specific genetic effects, quantifying maternal and social genetic effects, and investigating how interactions between microbial and host genotypes shape phenotypes in human populations. He recently published "Genotype-covariate interaction effects and the heritability of adult body mass index." Nature Genetics 49:1174, 2017.

Matthew Stephens is Professor of Statistics and Human Genetics at the University of Chicago. He was a developer of STRUCTURE, a widely used computer program for inferring population structure and estimating individual admixture. He was also a developer of the influential Li and Stephens model, an efficient model of linkage disequilibrium. His recent publications include "Bayesian large-scale multiple regression with summary statistics from genome-wide association studies." Annals of Applied Statistics 11:1561-1592.

Access 2017 Course Materials (2018 materials will be uploaded to this page prior to the start of the module)