5th Annual Summer Institute in Statistics for Clinical Research (SISCR)

Module 3: Modern Statistical Learning Methods for Observational Biomedical Data (EXPANDED TWO-DAY MODULE)

Session 1: Mon Jul 23

** Expanded two-day format **

** This module is in both Session 1 (Monday) & Session 2 (Tuesday) **

Module date/time: Monday, July 23, 8:30 a.m.-5 p.m. and Tuesday, July 24, 8:30 a.m.-5 p.m.

While clinical trials provide the highest level of evidence for comparing clinical treatments or public health interventions, they are often not feasible because of ethical, logistical or economic constraints. Observational studies provide an opportunity to learn about the effect of interventions for which little or no trial data are available. These studies constitute a potentially rich and relatively cheap source of information. However, in such studies, treatment or intervention allocation may be strongly confounded by other important patient characteristics, and great care is needed to disentangle observed relationships and infer causal effects.

In this course, we will provide an overview of modern statistical techniques for analyzing observational data. We will focus primarily on recent advances in the field of targeted learning, which uses state-of-the-art machine learning tools to flexibly adjust for confounding while yielding valid statistical inference. In contrast, conventional techniques for confounding adjustment rely on restrictive statistical models and may therefore lead to severely biased inference when those models are misspecified. Use of the Super Learner framework, an implementation of model stacking, will be discussed as a particularly appealing means of performing flexible, pre-specified adjustment for confounding.
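To make the idea of model stacking concrete, here is a minimal sketch in Python with NumPy (the course itself uses R, and this is not the API of any Super Learner software). It assumes simulated data with a single confounder `W`, a binary treatment `A` and an outcome `Y` whose true treatment effect is set to 1.0. All function and variable names are hypothetical. Two candidate learners are combined with a convex weight chosen by cross-validated mean squared error, and the stacked outcome regression is then used for a simple plug-in estimate of the average treatment effect. The targeting step and the inference that targeted learning adds are deliberately left out.

```python
import numpy as np

def fit_linear(X, y):
    """Candidate learner 1: ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def fit_mean(X, y):
    """Candidate learner 2: ignores covariates and predicts the training mean."""
    mean_y = y.mean()
    return lambda X_new: np.full(len(X_new), mean_y)

def cross_validated_predictions(learners, X, y, n_folds=5, seed=0):
    """Return an (n, n_learners) matrix of held-out predictions."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    Z = np.zeros((len(y), len(learners)))
    for k in range(n_folds):
        train, held_out = fold != k, fold == k
        for j, fit in enumerate(learners):
            Z[held_out, j] = fit(X[train], y[train])(X[held_out])
    return Z

def convex_weight(Z, y, grid_size=101):
    """Weight on learner 0 (learner 1 gets the rest) minimizing CV squared error."""
    alphas = np.linspace(0.0, 1.0, grid_size)
    cv_mse = [np.mean((y - (a * Z[:, 0] + (1 - a) * Z[:, 1])) ** 2) for a in alphas]
    return float(alphas[int(np.argmin(cv_mse))])

# Simulated observational data (an assumption for illustration only):
# W confounds both treatment assignment and outcome; the true effect of A is 1.0.
rng = np.random.default_rng(1)
n = 2000
W = rng.normal(size=n)
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.5 * W)))
Y = 1.0 * A + 2.0 * W + rng.normal(size=n)
X = np.column_stack([A, W])

learners = [fit_linear, fit_mean]
Z = cross_validated_predictions(learners, X, Y)
alpha = convex_weight(Z, Y)

# Refit each candidate on all of the data and combine with the stacking weight.
fitted = [fit(X, Y) for fit in learners]

def stacked(X_new):
    return alpha * fitted[0](X_new) + (1 - alpha) * fitted[1](X_new)

# Plug-in estimate: predict every subject's outcome under A=1 and under A=0.
X_treated = np.column_stack([np.ones(n), W])
X_control = np.column_stack([np.zeros(n), W])
ate_stacked = float(np.mean(stacked(X_treated) - stacked(X_control)))

# Unadjusted comparison of group means, which ignores the confounder W.
ate_naive = float(Y[A == 1].mean() - Y[A == 0].mean())

print("stacking weight on the linear learner:", alpha)
print("adjusted plug-in estimate:", ate_stacked)
print("unadjusted difference in means:", ate_naive)
```

In this simulation the unadjusted difference in means overstates the effect because treated subjects tend to have larger values of `W`, while the stacked regression adjusts for it. A real analysis would use a much richer library of candidate learners and would pre-specify that library before looking at the outcome.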

We will discuss methods for comparative effectiveness studies of single time-point interventions. We will also introduce the extension of these methods to multiple time points and discuss strategies for dealing with missing data. Methods will be illustrated using data from recent observational studies and data extracted from electronic medical records. Analyses will be illustrated in R, but knowledge of R is not required for this module. In addition to lectures, the course will include hands-on in-class activities that allow students to familiarize themselves with the methods and tools.