Making data sharing count

Consider a typical fMRI study: 
  • Twenty participants scanned for an hour = 10000 USD.
  • Research Assistant to run participants = 20000 USD.
  • Postdoc to invent the study and write it up = 40000 USD.
70000 USD later, science is richer by an eight-page paper, peer reviewed and published in an academic journal. The authors might look at the data again some time later, maybe combine it with another of their datasets to improve power. Maybe. Or maybe they will not have time. We may never learn if there was anything more in the data (all 360 million datapoints of it) than what those eight pages described.

Most scientists agree that sharing data makes sense and leads to better, more reproducible, transparent, and objective science. Funding agencies (the guys who turn your taxes into academic papers) understand how expensive data collection is and want to squeeze as much as possible out of existing data. But the perspective of an individual scientist is different. Sharing data does not come for free. You need to clean the data and describe it properly so others can make good use of it. You also risk that someone will try and fail to replicate your findings - unearthing a mistake in your analysis. All that for what? So someone else could take YOUR data, find something interesting that you have missed, and publish it? Leaving you with no credit for the data collection and nothing to put on your CV when you face the tenure-track committee?

Luckily not all scientists think this way, but plenty do. Even though there are many visionaries and idealists in science, in many situations it is a dog-eat-dog, publish-or-perish dynamic. I don't believe this is fundamentally wrong - competition drives development. Besides, the entities distributing money in science have to make their decisions somehow. Therefore we should not fight this, but try to tap into the existing system of academic credit.

Together with Daniel and Mike we have recently written a paper describing an attempt to increase the motivation of an individual researcher to share data. Instead of just putting your data on a website and getting nothing in return, you would write a short paper describing in detail how the data was acquired and how it is organized. Such a data paper is a publication like any other. It has a DOI, can be cited, and has to be peer reviewed before being accepted. This simple idea solves multiple problems:
  • Through citations, data producers get appropriate credit. Interesting data sets will lead to highly cited papers.
  • The peer review process ensures that the quality of the data and metadata is sufficient for trouble-free reuse.
  • A separate publication allows more space for a detailed description of acquisition methods, in contrast to the few paragraphs available in a typical cognitive neuroscience paper.
  • All people involved in the data collection (including research and lab assistants) can co-author the paper without concerns of "dilution of credit".
This is by no means a new idea: it has been implemented in other fields (see our paper for more details). It just needs to gain momentum (and this is the main reason for this shameless plug ;). There are already several neuroimaging journals that will accept data papers: GigaScience (they will also host your data), Neuroinformatics and Frontiers in Brain Imaging Methods. There is really not much to lose. With little effort you can get a publication and promote and share your data. So what are you waiting for? Publish a data paper to increase the impact of your research and receive credit for your data sharing efforts!
