21st Summer Institute in Statistical Genetics

Module 14: MCMC for Genetics

Week 3, Session 5, Monday 8:30 AM - Wednesday 12:00 PM: Mon Jul 25 to Wed Jul 27

This module examines the use of Bayesian statistics and Markov chain Monte Carlo (MCMC) methods in modern analyses of genetic data. It assumes a solid foundation in basic statistics and the concept of likelihood. Some population genetics and a basic familiarity with the R statistical package, or another programming language, will be helpful.

The first day includes an introduction to Bayesian statistics, Monte Carlo, and MCMC. Mathematical concepts covered include expectation, laws of large numbers, and ergodic and time-reversible Markov chains. Algorithms include the Metropolis-Hastings algorithm and Gibbs sampling. Some mathematical detail is given; however, there is considerable emphasis on concepts and practical issues arising in applications. Mathematical ideas are illustrated with simple examples and reinforced with a computer practical using the R statistical language.

With that background, two applications of MCMC are investigated in detail: inference of population structure (using the program STRUCTURE) and haplotype inference (using the program PHASE). Computer practicals using both programs are included. Further topics include the use of MCMC in model evaluation and model checking, strategies for assessing MCMC convergence and diagnosing MCMC mixing problems, importance sampling, and Metropolis-coupled MCMC.

Software used: R, STRUCTURE, PHASE.
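To give a flavor of the Metropolis-Hastings algorithm named in the description, here is a minimal sketch of a random-walk Metropolis sampler for the posterior distribution of an allele frequency. It is an illustration only, not material from the module: the data (30 copies of an allele among 100 sampled chromosomes), the uniform prior, the step size, and the function names `log_posterior` and `metropolis` are all assumptions made for the example. It is written in Python rather than the R used in the module's practicals.

```python
import math
import random


def log_posterior(p, k, n):
    """Unnormalized log posterior of allele frequency p.

    Assumes a uniform prior on (0, 1) and a binomial likelihood with
    k copies of the allele observed among n chromosomes (hypothetical data).
    """
    if p <= 0.0 or p >= 1.0:
        return -math.inf  # outside the support: zero posterior density
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis(k, n, n_iter=20000, step=0.1, start=0.5, seed=1):
    """Random-walk Metropolis sampler; returns the full chain as a list."""
    rng = random.Random(seed)
    p = start
    logp = log_posterior(p, k, n)
    samples = []
    for _ in range(n_iter):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = p + rng.gauss(0.0, step)
        logq = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, logq - logp)):
            p, logp = proposal, logq
        samples.append(p)
    return samples


if __name__ == "__main__":
    chain = metropolis(k=30, n=100)
    kept = chain[2000:]  # discard an arbitrary burn-in period
    print("posterior mean estimate:", sum(kept) / len(kept))
    print("exact posterior mean:   ", (30 + 1) / (100 + 2))
```

Under these assumptions the exact posterior is Beta(31, 71), so the sampled mean can be compared with the known value 31/102. That kind of check is only available in toy problems; in realistic applications, convergence and mixing have to be assessed by other means.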

Background reading:
Shoemaker, J.S., I.S. Painter and B.S. Weir. (1999). Bayesian statistics in genetics. Trends in Genetics 15:354-358.
Beaumont, M.A. and B. Rannala. (2004). The Bayesian revolution in genetics. Nature Reviews Genetics 5:251-261.
Gilks, W.R., S. Richardson and D.J. Spiegelhalter. (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall.