Excavating answers from massive health data sets

In his first year as a biostatistics PhD student, Jasper Yang hoped to conduct meaningful research. After all, the chance to collaborate with researchers at the University of Washington and elsewhere in the area was one of the reasons he chose to attend UW.

Biostatistics PhD student Jasper Yang (left) attending a department-sponsored poster session in February 2024.

Even so, he felt lucky to land a position with a Kaiser Permanente Health Research Institute team that was developing statistical methods for analyzing electronic health records (EHR) to answer questions about HIV treatments and to better understand the disease more broadly.

“Traditionally, collecting medical data for research has involved careful instructions for clinicians and participants regarding how the data is going to be collected. But in this modern age of medicine, we now have access to these massive data sets that are collected routinely through hospital visits or regular doctor checkups, and these data sets have the potential to be extremely powerful,” said Yang.

EHR research allows investigators to study populations that traditional research may have overlooked, without the time limits of a conventional study. But because the data were not collected for research purposes, they contain what Yang calls “baked-in biases,” which require cutting-edge statistical tools to keep any analysis of the data as accurate and precise as possible.

Research interview: Jasper Yang

Yang discusses his work with Kaiser Permanente Health Research Institute on improving the accuracy and precision of statistical analyses that use routinely collected electronic health records.

“One of the primary reasons why this area is so interesting to me is that when you get a data set, there isn’t just one variable that has some systemic error that you can look at and fix,” said Yang. “Health-related data sets have a number of variables which were measured with error. The measurement error problem is both challenging and pervasive, and this generates really interesting statistical questions with important implications.”

To Yang, sharing the team’s new statistical methods with biostatisticians, epidemiologists, doctors, and other practitioners is every bit as important as developing the methods themselves.

“If we develop methods but don’t offer any tools for other people to use them, then how useful are those methods?” said Yang.

Jasper Yang (center) with other UW Biostatistics students at one of the autumn social gatherings held by the department.

“I was very grateful to end up working at Kaiser Permanente,” said Yang. “It’s been everything I’ve expected and more. I’ve met so many people and learned so much from them. Just hearing their thoughts and their decision-making processes when they’re conducting statistical analyses has been extremely valuable. I know they’ll be mentors for me for a long time.”

In addition to the EHR research, Yang started an independent study project with Professor of Biostatistics Ken Rice that focuses on multiple testing and decision theory.

“This opportunity to explore a new area while working one-on-one with a professor who is an expert in the field has been extremely valuable to me as a first-year PhD student,” said Yang.

Yang was first exposed to public health through his work as an EMT, and he initially wanted to be a doctor.

“But I quickly found that I love mathematics and statistics, and so biostatistics was kind of a perfect intersection of all those ideas. And now here I am hoping that through my research I can make an important impact on public health,” said Yang.