A recent analysis by University of Washington researchers supports longtime concerns surrounding the acquisition and use of information collected through the Human Genome Diversity Project (HGDP). The analysis was published as a Comment in Nature Genetics.
HGDP was established decades ago with the aim of creating a collection of cell lines that reflected global ethnic and geographic variation, with a particular focus on isolated, Indigenous populations. But from the beginning, unease about data collection, informed consent, and the scope of data use has beset the project.
Lead investigators Sarah Catherine Nelson, a research scientist with the UW Department of Biostatistics, and Stephanie Malia Fullerton, a UW professor of Bioethics and Humanities, set out to learn more about HGDP data and the ways in which they have been used over the years. To accomplish this, they reviewed research papers published between 2010 and 2024.
Both Nelson and Fullerton were surprised by how difficult it was to determine whether a particular peer-reviewed publication had actually used data derived from the HGDP collection.
“This was not only a challenge we had to overcome for our project, but a broader sign of just how embedded and often unacknowledged the use of HGDP is in genomics research. This is a problem because it obscures where and how HGDP is being used, and makes it harder for downstream users or readers to have transparency into that,” said Nelson.
Transparency is especially important considering the ways in which HGDP samples were obtained and the uncertainty about consent parameters agreed to by donors.
Indigenous groups criticized scientists for failing to coordinate the collection of samples within their communities, leading many to distrust researchers and dub HGDP a “vampire project.”
In addition, the source of many HGDP samples is uncertain because the database includes pre-existing materials collected by geneticists and anthropologists for other purposes.
“The HGDP cell line collection is an unusual ‘legacy’ dataset insofar as there is no direct access to the original consent forms or protocols,” said Fullerton.
“This is partly a function of the fact that the data were collected in the 1980s (or possibly even earlier) and by many different investigators. While it may be tempting to continue to use data of such uncertain provenance, it is risky.”
Nelson noted that while little is known publicly about the informed consent procedures for the individuals and communities whose samples make up the HGDP, the initial project aims are known to have focused on population-based descriptions of genetic variation.
But their analysis revealed that in nearly 40 percent of the articles reviewed, HGDP data were used in ways that were inconsistent with this initial intent. North American researchers accounted for the majority of these instances, leading to concerns about Western appropriation of the biological heritage of Indigenous groups.
“That current uses extend beyond that initial aim is troubling as it heightens the concern that HGDP data contributors were not informed about the scope of how their samples and data would be used,” said Nelson.
The paper notes that while many scientists may not be aware of HGDP’s contested history or the origins of its samples, steps still need to be taken to raise awareness and prevent misuse.
“The biggest danger of using HGDP-derived data for purposes other than those originally intended is that it undermines trust in genomic research and makes it less likely that other communities will choose to participate in such research in the future,” said Fullerton.
The analysis outlines steps for phasing out the use of HGDP-derived data. These steps emphasize transparency, accountability, training, and the identification and evaluation of alternative datasets. But taking them will be challenging.
“HGDP has become so embedded in much of genomics research that it can seem daunting to try and remove it from our paradigms,” said Nelson.
Fullerton agreed. “I think that the biggest challenge will be the one of deciding whether or not to continue to use the rich data derived from this sample collection. Because the data are in the public domain and often embedded in other, larger resources, this will be a matter of individual conscience and due diligence on the part of individual researchers and research consortia.”
Study co-authors include Stephanie Gogarten, a research scientist with the UW Department of Biostatistics, and Jacky Dahlquist, a former research coordinator with the UW Department of Bioethics and Humanities.
This work was supported by a tier 1 pilot grant from the UW Population Health Initiative, as well as partial matching funds from the UW Departments of Bioethics & Humanities and Biostatistics.