Making data sharing count

Consider a typical fMRI study:

Twenty participants scanned for an hour = 10000 USD.
Research Assistant to run participants = 20000 USD.
Postdoc to invent the study and write it up = 40000 USD.

70000 USD later, science is richer by an eight-page paper, peer reviewed and published in an academic journal. The authors might look at the data again some time later, maybe combine it with another of their datasets to improve power. Maybe. Or maybe they will not have time. We may never learn whether there was anything more in the data (all 360 million datapoints of it) than what those eight pages described.

Most scientists agree that sharing data makes sense and leads to better, more reproducible, transparent, and objective science. Funding agencies (the guys who turn your taxes into academic papers) understand how expensive data collection is and want to squeeze as much as possible out of existing data. But the perspective of an individual scientist is different. Sharing data does not come for free. You need to clean the data and describe it properly so others can make good use of it. You also risk that someone will try and fail to replicate your findings - unearthing a mistake in your analysis. All that for what? So someone else could take YOUR data, find something interesting that you have missed, and publish it? Leaving you with no credit for the data collection and nothing to put on your CV when you face the tenure committee?

Luckily not all scientists think this way, but plenty do. Even though there are many visionaries and idealists in science (luckily!), in many situations it is a dog-eat-dog, publish-or-perish dynamic. I don't believe this is fundamentally wrong - competition drives development. Besides, the entities distributing money in science have to make their decisions somehow. Therefore we should not fight this, but try to tap into the existing system of academic credit.

Together with Daniel and Mike, I have recently written a paper describing an attempt to increase the motivation of individual researchers to share data. Instead of just putting your data on a website and getting nothing in return, you would write a short paper describing in detail how the data was acquired and how it is organized. Such a data paper is a publication like any other. It has a DOI, can be cited, and has to be peer reviewed before being accepted. This simple idea solves multiple problems:

Through citations, data producers get appropriate credit. Interesting data sets will lead to highly cited papers.
The peer review process ensures that the quality of the data and metadata lends itself to trouble-free reuse.
A separate publication leaves more room for a detailed description of acquisition methods than the few paragraphs available in a typical cognitive neuroscience paper.
All people involved in the data collection (including research and lab assistants) can co-author the paper without concerns about "dilution of credit".

This is by no means a new idea: it has been implemented in other fields (see our paper for more details). It just needs to gain momentum (and this is the main reason for this shameless plug ;). There are already several neuroimaging journals that will accept data papers: GigaScience (they will also host your data), Neuroinformatics and Frontiers in Brain Imaging Methods. There is really not much to lose. With little effort you can get a publication, and promote and share your data. So what are you waiting for? Publish a data paper to increase the impact of your research and receive credit for your data sharing efforts!

Comments

gwern 06 February, 2013 08:51
This is an interesting idea, but the paper doesn't seem to address the fundamental question: what makes you think that data papers will make data sharing count? Reading this post and skimming the paper, I didn't see anything about this: is there any evidence that data papers boost tenure prospects? Salaries? Chance of still being in a field at a later followup? Publication of additional papers?

If data papers are being used in other fields, this data should exist; or another angle would be to look at software packages since at least among R people it's not uncommon to see a published paper justifying and explaining a package which is then cited by subsequent users.
Chris Gorgolewski 08 February, 2013 04:20
True, we did not include any data on the impact of publishing data papers on researchers' careers. I will have a look at this, but I'm afraid it would be a very difficult comparison. Many factors contribute to academic success so it would be hard to make a fair comparison between authors that published data papers and those that don't. Additionally, some factors may correlate with the tendency to publish data papers in a non-causal way.

There is also the question of how to measure academic success. Normally it would be based on the number and popularity of published papers, but it is not clear we can use that in this context. Clearly, being able to publish data papers can increase the number of publications you have. The question is whether those publications will have any impact, or in other words how they will be perceived by grant reviewers and tenure committees. Quantifying success without using publications can turn out to be quite tricky.

The software example you have mentioned also fills me with hope - some of the most cited papers in neuroimaging are describing methods, which would not succeed without a good software package.
gwern 08 February, 2013 08:36
> I will have a look at this, but I'm afraid it would be a very difficult comparison. Many factors contribute to academic success so it would be hard to make a fair comparison between authors that published data papers and those that don't.

Which also means that anyone who publishes a data paper will be taking as much of a gamble as anyone who just shares data, and since papers are harder to write than a short webpage informally describing the data and linking files...

> The software example you have mentioned also fills me with hope - some of the most cited papers in neuroimaging are describing methods, which would not succeed without a good software package.

If you can't show any benefit to the author from those most-cited papers, then a fortiori, that undermines any case for data papers.

Also, the existence of software papers could easily not show that data papers have a chance: software is not data. Software has a much better history or story about how providing software can help you: other people can contribute bug fixes, keep it up to date and still compiling & running, optimize it, etc. If you plan to reuse the software in the future, then it can easily be a good idea to clean it up a bit and publish it; and even if it isn't a good idea strictly from the cost-benefit view, there's a widespread programming culture of sharing code under liberal licenses.

Most of these reasons do not exist for data: the most valuable 'bug fixes' are pointing out serious errors or inconsistencies in the data of the sort that would discredit papers and hence careers, there's not really an equivalent of compiling/running (a text file will always be readable), data can't really be optimized short of just deleting parts (which is bad from an archival point of view) or compressed (which is trivial), and there obviously is no such culture in science encouraging data release as the default.

Chris Gorgolewski: Multiple Comparisons