Presentation: A Unified Approach to Model-agnostic Variable Importance
Candidate: Brian Williamson, Graduate Student, UW Biostatistics
Committee Members: Marco Carone (co-Chair), Noah Simon (co-Chair), Peter Gilbert, Scott Emerson, Annette Fitzpatrick (GSR)
Abstract: In predictive modeling applications, it is often of interest to assess the relative contribution of subsets of features towards predicting the response. Simple population models are commonly used because the associated variable importance measure is easy to interpret; however, estimates may be misleading if the model is overly simplistic. To improve prediction performance, complex prediction algorithms are frequently used instead; in these cases, however, variable importance is typically defined as a function of the algorithm rather than a summary of the population, rendering formal statistical inference on population importance difficult. In this dissertation, we propose a unified model-agnostic framework for statistical inference on population-level variable importance. Specifically, we define variable importance as a contrast between the predictiveness of the best possible prediction function based on all available features and that of the best possible prediction function based on all features but those under consideration. We discuss general conditions under which a simple estimator of this importance is nonparametric efficient and allows the construction of valid confidence intervals. We also propose a valid strategy for hypothesis testing. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
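As a rough illustration of the contrast described in the abstract, the Python sketch below estimates the importance of a feature subset as the difference in held-out predictiveness between a model fit on all features and a model fit on all features but those under consideration. It is a minimal sketch under stated assumptions, not the estimator developed in the dissertation: R-squared is assumed as the predictiveness measure, ordinary least squares stands in for a flexible prediction algorithm, a single train/test split replaces any more careful sample-splitting scheme, and the data are simulated. The function names (r_squared, fit_predict_ols, variable_importance) are hypothetical and chosen for this example only.

```python
import numpy as np


def r_squared(y, pred):
    """Predictiveness measure assumed here: 1 - MSE / Var(y)."""
    return 1.0 - np.mean((y - pred) ** 2) / np.var(y)


def fit_predict_ols(x_train, y_train, x_test):
    """Least-squares fit with an intercept; a stand-in for any learner."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    a_test = np.column_stack([np.ones(len(x_test)), x_test])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return a_test @ coef


def variable_importance(x, y, drop_cols, n_train):
    """Held-out R-squared with all features minus R-squared without drop_cols."""
    keep = [j for j in range(x.shape[1]) if j not in drop_cols]
    x_tr, x_te = x[:n_train], x[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]
    full = r_squared(y_te, fit_predict_ols(x_tr, y_tr, x_te))
    reduced = r_squared(
        y_te, fit_predict_ols(x_tr[:, keep], y_tr, x_te[:, keep])
    )
    return full - reduced


# Simulated data (an assumption for illustration): the response depends
# strongly on feature 0, weakly on feature 1, and not at all on feature 2.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 3))
y = 2.0 * x[:, 0] + 1.0 * x[:, 1] + rng.normal(size=n)

vim_x1 = variable_importance(x, y, drop_cols=[0], n_train=n // 2)
vim_x3 = variable_importance(x, y, drop_cols=[2], n_train=n // 2)
print(f"estimated importance of feature 0: {vim_x1:.3f}")
print(f"estimated importance of feature 2: {vim_x3:.3f}")
```

In this simulation the population R-squared is 5/6 with all features and 1/6 without feature 0, so the importance of feature 0 should be close to 2/3, while the importance of the irrelevant feature 2 should be close to zero. The sketch produces point estimates only; the confidence intervals and hypothesis tests mentioned in the abstract are not reproduced here.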