Wednesday, February 26, 2014

The data sharing revolution has begun

Beginning on the 1st of March, all papers published in journals run by the Public Library of Science (PLoS) will have to make their data publicly available. This includes PLoS ONE - currently the biggest (in terms of volume) academic journal in the world. But is it a big deal? Many leading journals such as Nature, Science, and PNAS have long required their authors to provide data to fellow scientists upon request. Is there a difference between depositing data in a public repository and making it available upon individual request? Yes, there is. There are dozens of excuses researchers can use to delay sharing data almost indefinitely. Additionally, without a proper description (which public repositories will enforce), data is useless. I could go on and on about how imperfect the "available upon request" solution is, but this video depicts it in a much better way:

Making data available upon request looks good only on paper; it just does not work in practice. Public sharing of data through domain-specific repositories will make the data actually reusable. This of course will be problematic for some authors. People will reanalyze the data and possibly challenge some findings. Some will use existing data in a novel way, potentially "scooping" papers from the people who acquired the data. These "problems" would not exist if data were not shared. The new PLoS policy is certainly not going to be popular among some scientists. They will be afraid that people will find mistakes in their work, they will have to put extra work into describing their data, and people will publish papers using their data without giving them coauthorship. This new policy will cause a decrease in submission numbers. It's a bad business decision.

Luckily, the Public Library of Science is not a business. It's a not-for-profit organisation with fully transparent financing. They care more about science than about their profits. That's why they are able to make unpopular decisions for the sake of the greater good. This cannot always be said of commercial publishers. It does not mean they don't do anything to help science, but whatever they do has to fit within their financial goals. Take for example Nature's Scientific Data (of which I am a great supporter). It's a journal solely devoted to publishing data papers. The journal is open access, but you have to pay to publish. It's a great idea and it will make more data publicly available. Nonetheless, it is also a great business endeavor for Nature Publishing Group (NPG). They manage to monetize their greatest asset - their brand - so scientists can say they have published in Nature <cough>Scientific Data</cough>. NPG will still make a lot of money on this journal, and there is nothing wrong with that. It's one of those occasions where financial goals overlap with the needs of science. PLoS, however, does not have to prioritize its financial goals and is making difficult decisions that will benefit science in the long term. Let's hope other noncommercial journals such as PNAS (run by the United States National Academy of Sciences) and eLife (Max Planck Society, Wellcome Trust, and Howard Hughes Medical Institute) will follow this trend.

The new PLoS data sharing policy means some changes for scientists planning to publish in PLoS journals. Apart from the need to prepare data for submission to data repositories, scientists working with human subjects need to make sure their participants agree to their data being shared. This should be done during the informed consent process and is independent of anonymization of the data. When phrased correctly, such a "data sharing" clause should not discourage subjects. Quite the opposite - subjects should be proud that the impact of their contribution will be maximized.

PS: It is worth noting that other, smaller journals such as F1000Research also have a similar data sharing policy. Hopefully more will join soon!