Presentation: Scalable and Accurate Methods for Big Biobank Data Analysis
Speaker: Seunggeun Shawn Lee, PhD, Associate Professor of Biostatistics, University of Michigan School of Public Health
Abstract: Large-scale biobanks have emerged as a powerful resource for complex disease studies and precision medicine. Genomic information, coupled with clinical, behavioral, and environmental measurements, enables the discovery of novel genetic associations and disease mechanisms across the entire phenome. However, the scale and complex structure of biobank data remain substantial challenges. In this talk, I will introduce our new methods for rare-variant testing and gene-set analysis of biobank-scale data. The proposed rare-variant test, SAIGE-GENE, can analyze 500,000 samples for binary phenotypes while adjusting for family relatedness and case-control imbalance. By using a subset-based approach, the gene-set analysis method, GAUSS, can identify the subset of genes associated with a phenotype, which greatly improves the interpretability of the results. Phenome-wide analysis in the UK Biobank shows that these two methods can analyze biobank-scale data and identify novel associations.