When Data Parasites Are a Positive
A growing trend of information sharing will move science forward, one cardiologist says.
It started as an insult, but advocates for data sharing are encouraging people to become research parasites and are inspiring symbiosis in science.
With the increasing sophistication of methods in disciplines like computational biology and epidemiology, and the vast possibilities that come with newer fields like deep learning, there’s more of a need than ever for skilled professionals who focus full-time on data analysis.
That means existing data sets from studies, including clinical trials, can now become meaningful in ways far beyond the original idea that sparked the research, says cardiologist J. Brian Byrd, M.D., M.S., at Michigan Medicine’s Frankel Cardiovascular Center.
That’s true as long as more researchers become interested in sharing their data for others to use, he says.
“Over time, I anticipate funding agencies will begin to reward those who share more and share better with an increased chance of future funding,” Byrd says.
To help foster an environment where people would want to share with excellence, he hosted a podcast with The Lancet about open data last year, and co-founded the Research Symbiont Awards, which he presently chairs. The annual award recognizes someone who goes beyond typical standards for data sharing, making it easy for others to use the data.
Byrd says he isn’t worried that growing demands to share data openly could lead some researchers to exclusively analyze existing data without generating any of their own. Dedicated analysis of that kind is a growing need in the medical community, now that more can be done with large amounts of data, he says.
Byrd discusses the award he created and why he’s passionate about shifting the culture toward more sharing.
What’s the current status of data sharing?
Byrd: While I’d still say there is no roadmap for data sharing, things are changing. I see a stronger confluence of interests toward more, better, safer, and more ethical and careful data sharing today than even months ago or last year.
For example, I’ve personally found basic infrastructure is coming along. Several months ago, while starting to plan a study for which I knew we’d like to share the data, it was unclear who could figure out the logistics of doing that and make sure any regulatory implications were managed. Now, I’ve circled back months later and have found more people are able to help us figure this out.
I think we’re at a tipping point where sharing becomes more normal and more desirable for the person who holds the data.
What makes for successful data sharing?
Byrd: The most important thing is to make it as easy as possible, within legal and ethical constraints, for other people to use your data. That could look like a downloadable data set with additional information on a website.
For the Research Symbiont Awards, we look at researchers who went beyond typical standards to create openly shared resources or data sets that could allow other people to take science further using what was shared.
I find that what people do currently varies quite a bit, and it varies by field. For example, it’s very common today for people who do sequencing to upload the data to a public repository. But there are other fields in which sharing is still more unusual.
What are the main advantages for researchers to share their data?
Byrd: You allow for a longer life cycle of your research. Other people could use what you’ve gathered in ways you may not ever have thought of, or may not have time to collaborate on right now.
Some newer forms of analysis, like deep learning, require a vast amount of data to train the models that will then be used for a helpful application in the medical sphere. To the extent that data sets become available, we can do interesting things beyond the original research aims.
For example, I worked with colleagues at the University of Pennsylvania to create a synthetic data set from the original data out of the SPRINT blood pressure trial. We used a novel method called generative adversarial neural networks, in which two neural networks train each other to make synthetic data similar to the original data, but not the original data. Researchers can analyze the synthetic data to find real, meaningful results without using any trial participant’s real data.
What risks or concerns come with data sharing at this point?
Byrd: We must be concerned about any threats to the privacy of participants in a study. This is something we’re giving a lot of thought to: how can we best enable sharing of data without compromising privacy?
What people originally consented to matters a lot. The literature shows that the large majority of clinical trial participants are open to their data being shared for a variety of purposes. People may in fact expect broader use of the information generated through their process of volunteerism for a study. It’s important to discuss these topics in a public conversation so everyone has the same information.