Ameer Dharamshi, a University of Washington Biostatistics PhD student, has received the 2024 best student paper award in the Student Paper Competition of the American Statistical Association (ASA) Section on Statistical Learning and Data Science.
Dharamshi’s paper, “Generalized Data Thinning Using Sufficient Statistics,” explores data thinning, a strategy for decomposing individual data points into independent pieces, and offers data analysts a new tool for common statistical tasks such as model selection, model testing, and inference.
Data analysts often wish they had access to two independent realizations of their data. For example, they may want one to fit a complicated model, and one to see how that model performs on new data, or they may want one to generate a hypothesis, and one to test that hypothesis.
“In practice, there is typically only one dataset, so the classical solution is to ‘sample split’ the data by randomly allocating some observations to fitting, and the rest to validation,” said Dharamshi.
“Unfortunately, sample splitting is not a universal solution; it often fails when the data are not independent and identically distributed. So, in this work, we identify the underlying principles that enable data thinning alternatives, characterize the set of problems where this strategy is applicable, and show that sample splitting is actually a special case of data thinning.”
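To make the idea concrete, here is a minimal sketch of thinning in the simplest well-known setting, Poisson counts. It is an illustration only, not the generalized procedure from the paper. If a count X follows a Poisson distribution with mean λ, then drawing X1 from a Binomial(X, ε) distribution and setting X2 = X − X1 yields two independent Poisson counts with means ελ and (1 − ε)λ. The function name `thin_poisson`, the fraction `eps`, and the simulated data are assumptions made for this example.

```python
import numpy as np


def thin_poisson(x, eps, rng):
    """Split each Poisson count into two parts that sum to the original.

    Hypothetical helper for illustration: each unit of every count is
    assigned to the first part with probability eps, which is the
    classical binomial thinning of a Poisson variable.
    """
    x = np.asarray(x)
    first = rng.binomial(x, eps)  # X1 | X ~ Binomial(X, eps)
    second = x - first            # X2 = X - X1
    return first, second


rng = np.random.default_rng(0)

# Simulated data (an assumption for this sketch): Poisson counts with mean 10.
counts = rng.poisson(lam=10.0, size=100_000)

# Every observation contributes to both parts, unlike sample splitting,
# which sends each whole observation to one part or the other.
fit_part, check_part = thin_poisson(counts, eps=0.5, rng=rng)

print("mean of fitting part:   ", fit_part.mean())
print("mean of validation part:", check_part.mean())
print("correlation between parts:", np.corrcoef(fit_part, check_part)[0, 1])
```

With ε = 0.5 both parts should average close to 5, and their sample correlation should be close to zero, consistent with the two parts being independent. An analyst could then fit a model on one part and evaluate it on the other.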
Dharamshi is the lead author of the paper. The co-authors are Anna Neufeld, Keshav Motwani, Lucy Gao, Daniela Witten, and Jacob Bien.
As a student paper competition winner, Dharamshi will receive a cash prize and has been invited to present at the 2024 Joint Statistical Meetings (JSM) in August. JSM draws more than 6,500 participants and is the largest gathering of statisticians and data scientists held in North America.