Microbes are everywhere. Your body is home to thousands of unique bacteria (your “microbiome”), and the microbes that you carry and are exposed to daily can cause or prevent dangerous infections.
But current methods for determining which microorganisms are present, and at what levels, are not exact. Tools such as high-throughput sequencing introduce bias into the results, as Willis demonstrated in collaboration with researchers at North Carolina State University.
Specifically, the team found that sequencing tools commonly used by researchers favor certain microorganisms over others, but in a consistent and correctable way. The paper, “Consistent and correctable bias in metagenomic sequencing experiments,” was published in eLife earlier this month.
Willis explains, “Lactobacillus iners—a commensal—is overdetected by common sequencing protocols, while Streptococcus agalactiae—a pathogen—is underdetected. Failing to account for this puts us at risk of making incorrect conclusions about risk factors for Streptococcus infections, for example.”
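To make the idea of "consistent and correctable" bias concrete, here is a minimal sketch in Python. It assumes that bias acts multiplicatively: each taxon is detected with its own fixed relative efficiency, so dividing by that efficiency and renormalizing recovers the true proportions. The function name `correct_bias` and all of the numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def correct_bias(observed, efficiency):
    """Divide each taxon's observed proportion by its relative detection
    efficiency, then renormalize so the proportions sum to 1."""
    adjusted = {taxon: observed[taxon] / efficiency[taxon] for taxon in observed}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical numbers for illustration only (not results from the paper).
# Observed proportions of sequencing reads in one sample:
observed = {"L. iners": 0.80, "S. agalactiae": 0.20}
# Assumed relative efficiencies: L. iners is detected 4x as readily.
efficiency = {"L. iners": 4.0, "S. agalactiae": 1.0}

corrected = correct_bias(observed, efficiency)
for taxon, proportion in corrected.items():
    print(f"{taxon}: {proportion:.2f}")
```

Under these assumed efficiencies, a sample that appears to be dominated by L. iners turns out to contain the two species in equal proportions, which is the kind of distortion Willis describes.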
“Statistical and computational tools like the ones we are developing improve the accuracy of microbial abundance measurements, and provide scientists studying the microbiome with more reliable information about what keeps a microbiome healthy,” says Willis. “If we cannot accurately profile microbial communities, we can't understand how to prevent and treat infections and microbially-mediated diseases.”
An interest in the microbiome is not new to Willis. Her first project as a graduate student at Cornell University involved estimating species diversity in the context of the microbiome, and other projects throughout graduate school involved improving methods and software for microbial ecology.
When Willis joined the Department of Biostatistics in 2017, she knew statistical methods for the analysis of microbiome data would be a focus of her research.
“This is a fantastic time to be a methods developer for microbiome data because of the confluence of the high-throughput sequencing revolution and our improving understanding of the role of microbiomes in various human diseases and health outcomes.”
Willis sees one goal of her research at UW as developing methods that make use of data microbial ecologists are already collecting but not using. This is also a focus of the NIH grant research. For example, it is common to collect various types of control data when doing a microbiome experiment (e.g., sequencing blanks or mock communities), but very few methods exist to use that data.
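As a rough illustration of how control data could be put to use, the Python sketch below estimates detection efficiencies from a single mock community, that is, a sample whose true composition is known in advance. It assumes multiplicative, taxon-specific bias and ignores measurement noise. The function `estimate_efficiency`, the taxon names, and the numbers are all hypothetical; a real method would need to combine many samples and account for variability.

```python
def estimate_efficiency(observed, known, reference):
    """Estimate each taxon's detection efficiency relative to `reference`
    from one mock community whose true composition is known."""
    # Ratio of observed to true proportion for each taxon.
    ratio = {taxon: observed[taxon] / known[taxon] for taxon in known}
    # Only relative efficiencies are identifiable, so scale to the reference.
    return {taxon: r / ratio[reference] for taxon, r in ratio.items()}


# Hypothetical mock community: two taxa mixed in equal amounts.
known = {"taxon_A": 0.5, "taxon_B": 0.5}
# Hypothetical proportions of reads actually observed after sequencing.
observed = {"taxon_A": 0.8, "taxon_B": 0.2}

efficiency = estimate_efficiency(observed, known, reference="taxon_B")
for taxon, value in efficiency.items():
    print(f"{taxon}: {value:.1f}x relative to taxon_B")
```

Efficiencies estimated this way could then be used to adjust the proportions measured in other samples run through the same protocol.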
In other research, recently released as a preprint on bioRxiv, Willis and co-authors from UW Biostatistics proposed a method to jointly model relative abundance data for many taxa and absolute abundance data for a subset of taxa.
“Essentially, we demonstrate that there are many different tools that we can use to correct taxon-specific biases in high-throughput sequencing data. This is an important idea because different microbiome fields have access to different technologies, and demonstrating the breadth of the calibration idea will be important to its long-term adoption.”
Co-authors on the paper include Jim Hughes, professor of biostatistics at the University of Washington, and Brian Williamson, PhD candidate.