Biostatistics students take on real-world data challenges

Applying quantitative skills to real-world questions and challenges is fundamental to the UW Biostatistics student experience, and summer internships are one way students can practice and expand their statistical and professional skill sets.

This past summer, students tackled a broad range of challenges, including:

  • Working with the King County public health department's Overdose Surveillance Team to develop a supervised learning model aimed at identifying locations where deadly overdoses commonly occur.
  • Conducting data analyses for a project exploring a way to accelerate early development of novel Chronic Obstructive Pulmonary Disease (COPD) therapies.
  • Using residual analysis to improve development of a new pricing model for auto fleets from major companies like FedEx and Amazon.
  • Applying causal inference methodology to address questions related to Amazon’s fulfillment center operations and their impact on delivery.

Learn more about the work some of the students conducted and the professional insights they gained.

 

Breanna Brown – MS Capstone student

Company: Public Health — Seattle & King County

The work: I worked on a direct study with King County public health department's Overdose Surveillance Team to develop a supervised learning model that classifies textual responses into predetermined categories. Specifically, we looked at the location section of the form that the Medical Examiner's Office fills out when it investigates a death. We took the overdose-related responses and used the model to sort them into broad location types (such as supportive housing, public transit, and private businesses) to determine the types of locations where deadly overdoses commonly occur in King County.

Most valuable takeaway: My most valuable takeaway was the importance of being open to learning new skills and software. I thought this project would use only R, but I ended up needing to learn some SQL and Python to build the best model for the task at hand. This year, I plan to practice other programming languages outside of the classroom so I can be a well-rounded candidate for future jobs.


Ethan Ashby – PhD student

Company: Genentech

Position: Data and Statistical Science Intern

The work: I worked on a project examining whether combining a traditional definition of pulmonary exacerbation with a patient-reported respiratory symptom measure could accelerate early development of novel Chronic Obstructive Pulmonary Disease (COPD) therapies. The clinically accepted endpoint in COPD studies – exacerbations requiring healthcare utilization – is believed to underreport the true exacerbation burden, necessitating larger and longer trials to evaluate the efficacy of new treatments. My project examined whether the EXACT patient-reported outcome instrument, a daily respiratory symptom diary, could be used to detect additional exacerbation events and obtain earlier readouts of treatment effects. As a case study, my project focused on a recently completed randomized, phase 2b study of an investigational biologic developed by the company for patients living with COPD.

In collaboration with data scientists, clinical scientists, and members of the patient-reported outcomes group, I led the analyses of the EXACT-PRO data. My main contributions were developing and validating a codebase to analyze the EXACT-PRO data, developing creative ways to improve signal detection from the daily time series of symptom scores, and developing data-driven approaches to identify a promising composite endpoint for boosting the statistical power of future trials. I also built interactive applications to facilitate exploration of individual-level participant symptom data. Throughout the summer, I presented my work to teams of data scientists, clinical scientists, and the company’s early clinical development leadership.

Most valuable takeaway: During my time at Genentech, I realized that a statistician’s ability to communicate technical concepts to different audiences is as important as the ability to execute technical analyses. Presenting my work and fielding questions from colleagues across a variety of disciplines helped me deeply understand my project’s goals, advance my understanding of the disease area, build interest in my project, and identify compelling research questions to explore in the future.


Albert Osom – PhD student

Company: Amazon

Position: Research Scientist intern, OpsLab - Modeling and Optimization team

The work: I applied causal inference methodology to address questions related to Amazon’s fulfillment center operations and their impact on delivery promise. My responsibilities included formulating statistical questions that aligned with business objectives, implementing analyses, and presenting findings to leadership.

Most valuable takeaway: I learned that true scientific impact comes from deeply understanding the business problem and tailoring statistical solutions to address it. I was fortunate to work under a manager who emphasized and created opportunities to engage directly with business stakeholders. The fast pace of industry underscored the importance of having a strong statistical foundation and a versatile set of methods. While much of the work involved adapting existing approaches, I also recognized the value of bridging academic literature with real-world challenges and knowing when to innovate methodologically.


Michael Yung – PhD student

Company: Liberty Mutual Insurance Group

Position: Graduate Data Science Intern

The work: I primarily focused on a key project within the "Commercial Auto Products" space, which involves insurance accounts and contracts for auto fleets from major companies like FedEx and Amazon. My main responsibility was conducting residual analysis on an existing production model and using my findings to improve the development of a new pricing model. These pricing models play a crucial role in underwriting decisions made by actuaries, where inference is a priority. As a result, the pricing models in this space are straightforward regularized general linear models.

Most valuable takeaway: I had the opportunity to collaborate with a diverse team of experts, including PhDs in fields ranging from statistics to engineering. This experience demonstrated that industry-level work is often not far from the advanced research we do as PhD students. It not only enhanced my technical skills but also boosted my confidence in applying theoretical knowledge to real-world problems.