Every click of a computer mouse adds to the growing mountains of data now being mined for answers to important societal, business or biomedical questions.
“The goal is to take these large data sets and try to find signal in the noise,” says Associate Professor Daniela Witten, a faculty member since 2010. “Unfortunately, it’s often easier to collect the data than it is to draw insights from it.”
Bigger, in other words, doesn’t necessarily mean better. “The era of Big Data is really challenging, because as the data get bigger, the signals are there, but there’s even more noise,” Witten says.
Witten’s expertise is in machine learning. She seeks to develop statistical methods to make sense of all this information. One motivating example for her work, she says, comes from genomics.
“Incredible scientific breakthroughs in the last decade make it possible to sequence an entire human genome for a relatively low cost,” Witten says. “This means that we should be able to identify the genetic underpinnings of a lot of human diseases and obtain a much better understanding of the science than was ever possible before.”
Witten works to develop algorithms that distinguish the signal from the noise and that answer scientific questions on the basis of these data.
The challenge, Witten says, is that you may see an answer that is not quite right – noise that happens to look like a signal. “Often it’s possible to get results that look great, but won’t replicate in future studies,” she says.
That kind of science carries heavy costs, she adds: large sums of money can be wasted on a study. Furthermore, published results that are later shown to be incorrect could undermine faith in the field. “It’s really important that we get these things right and figure out best practices for how to deal with these very large data sets,” she says.
Witten adds, “As biostatisticians, we bring to the table an understanding of not only the key statistical issues, but also the underlying biological and medical contexts.”
Witten has helped put a friendly face on these complex issues. She’s been interviewed on TV and radio about her work, and has garnered numerous awards from the popular media. For three years in a row, she made Forbes’ “30 Under 30” list of top young scientists in America. (The streak ended when she turned 30.) Elle Magazine gave her a “genius award.”
“It’s a great time to be a statistician,” Witten says. “Historically, statisticians haven’t gotten much attention. But this is beginning to change due to the importance of statistics in the analysis of big data, and the potential for new statistical methods to yield key insights across a number of fields.”
Federal funding agencies have certainly taken notice of Witten. She’s received an NIH Director’s Early Independence Award and a National Science Foundation CAREER Award, each providing five years of funding.
In 2013, she co-authored An Introduction to Statistical Learning, a best-seller in the statistical world. “There really was no textbook that provided the cutting edge, state-of-the-art information for machine learning in a way that was accessible to a non-specialist,” Witten says. The book is targeted at advanced undergraduates majoring in math or statistics, and graduate students in fields such as biology or oceanography.
After double-majoring in math and biology as an undergraduate, Witten wasn’t sure what to do. She was “terrible at lab work,” she says, and had a short attention span, so she knew biology wasn’t the right path. At the same time, she wanted to develop a broad skill set. “I don’t know what the interesting problems are going to be in 10 years,” she says, “and I want to solve those when they come around. That’s how I ended up in statistics.”
Why the UW? “Seattle has a vibrant tech community,” she says. “It’s a beautiful city with great outdoors. It’s the full package.” The UW, meanwhile, is strong in statistics, biostatistics, computer science, genomics, and other fields.
Witten collaborates with faculty across campus, including those in the fields of genome sciences and electrical engineering. She loves the strong link between statistics and biostatistics. “The PhD students in Statistics and Biostatistics basically take the same classes,” says Witten, who is also an associate professor of statistics. “For me, that was a really big draw.”
Despite her many honors and accolades, Witten says that “with every passing month and year, you realize how little you know.” The UW has been “super supportive,” she adds, a place where colleagues are also your friends, you learn on the job, and you “develop the skills you need to keep growing as an individual and as a professional.”