
Posts

To pin or not to pin dependencies: reproducible vs. reusable software

We recently had a very interesting conversation in our lab about how to describe software dependencies (the libraries one needs to install) for a software project in the context of research. One camp proposed explicitly listing which version of each dependency is required (a scheme also referred to as "pinning"), while the other camp was more in favor of either not specifying versions at all or specifying only the minimal required version. Luckily both camps agreed on the importance of specifying dependencies, but what's the big deal about pinning vs. not pinning?

Advantages of pinning dependencies
When you pin a dependency (for example by saying "numpy==1.1.3") you explicitly point to a version of a library that a) you know works with your script and b) was used to generate the results you present in your paper. This is very useful for people trying to replicate your results using your code, as well as for yourself when revisiting a project that was put aside for a while (for e…
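The two schemes can be sketched side by side in a pip-style requirements file; the version numbers below are purely illustrative, not taken from any real project:

```
# requirements.txt — pinned style: exact versions known to reproduce the results
numpy==1.1.3
scipy==0.17.1      # illustrative version, chosen only for this sketch

# requirements.txt — unpinned style: any sufficiently new version is accepted
numpy>=1.1
scipy
```

With the pinned file, `pip install -r requirements.txt` recreates the same environment every time; with the unpinned file, the resolver is free to pick newer releases.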

Forever free: building non-profit online services in a sustainable way

In the past decade, we have seen a big switch from client-run software to online services. Some services, such as scientific data repositories, were a natural fit for a centralized online implementation (though one day we may see distributed versions of those). Others, such as Science-as-a-Service platforms, were more convenient and scalable versions of client/desktop-based software. One thing is certain: online platforms, available 24/7 via a web browser, have proven to be very convenient for a range of tasks such as communication, sharing data, and data processing. The non-profit sector (such as projects funded by scientific grants) has also entered this domain. There are countless examples where modern web technologies based on centralized services can benefit scientists and the general public, even if the service they provide is not part of a commercial operation. This is especially true given the increasing trend to share data and materials in science. Those outputs need to be stored and made …

Sharing academic credit in an open source project

We live in truly wonderful times to develop software. Thanks to the growth of the Open Source movement and the emergence of platforms such as GitHub, coding has become something more than just an engineering task. Social interactions, bonds between developers, and guiding new contributors are sometimes as important as sheer technical acumen. A strong and healthy developer community revolving around a software tool or library is very important. It makes the tool more robust (tested in many more environments), sustainable (progress does not depend on a single person), and feature-rich (more developers == more features). Even though there are some excellent guides on how to build a welcoming and thriving community, they miss one aspect that is specific to software development performed by academics: academic credit. For those not familiar with how things run in academia, a quick refresher: the main currency of science is papers (manuscripts) and the number of times they are referenced …

How to meet the new data sharing requirements of NIMH

The National Institute of Mental Health (NIMH) has recently mandated uploading data collected from all clinical trials it sponsors to the NIMH Data Archive (NDA). Similar policies are now in place for many of their grant calls. This initiative differs from previous NIH attempts to get more data shared. In contrast to the "data management plans" that have to be included in all NIH grants, which historically remained unimplemented without any consequences for the grantees, this new policy has teeth. Folks at NDA have access to all ongoing grants and are motivated to go after researchers who are late with their data submissions. Since there is nothing scarier than an angry grant officer, it's worth taking this new policy seriously!

In this brief guide I'll describe how to prepare your neuroimaging data for the NDA submission with minimal fuss.


Minimal required data
NDA requires each study to collect and share some small subset of values for all subjects and…

Highlights from the NeuroImage Data Sharing Issue

This week the first part of the NeuroImage special issue on Data Sharing was published. It's a great achievement, and I am glad to see more focus being put on sharing data in our field. However, the issue is a mixed bag of papers describing different types of resources. Some of my friends were confused by this heterogeneity, so I decided to highlight some of the resources presented in the issue.

The issue included papers about many data sharing platforms/databases (XNAT Central, LORIS, NIDB, LONI IDA, COINS, UMCD, and NeuroVault) that are well known and covered by previous publications. Similarly, some datasets (FBIRN and CBFBIRN) have also been previously covered in the literature. I understand that those were included in the issue for completeness, but I will leave them out of this review.


Developmental and aging datasets
The issue includes an impressive developmental dataset consisting of 9498 subjects with medical, psychiatric, neurocognitive, and genomic data (ages 8-2…

The unsung heroes of neuroinformatics

There are many fascinating and exciting developments in human cognitive and clinical neurosciences. We are constantly drawn to novel and groundbreaking discoveries. There is nothing wrong with this; I would even say it's part of human nature. This kind of research is not, however, what I want to talk about today. This post is dedicated to the people building tools that serve as the backbone of research, helping novel discoveries happen. They go beyond providing a proof of concept, publishing a paper, and pointing to an undocumented piece of code that works only in their labs. They provide maintenance, respond to user needs, and constantly update their tools, fixing bugs and adding features. Here I will highlight two tools which, in my personal (and very biased) opinion, play an important role in supporting human neuroscience and could do with some more appreciation.

nibabel
Anyone dealing with MRI data in Python must know about this library. Nibabel allows you to read and…

Software workaround for corrupted RAM in OS X

Recently my computer has been acting up: software crashing, compilations failing, and many small errors that I could not replicate. I wasn't too concerned, because I'm a natural tinkerer - I play with software and install many different additions, and one of the side effects can be an unstable operating system. Eventually my system stopped booting - the partition table was corrupted. I had to wipe it and reinstall (which was a massive pain in the ass). I also tried to run some hardware checks just in case (the computer is over three years old), but the "Apple Hardware Test" hung each time I ran it (bad sign, huh?). I eventually ran memtest86 overnight and discovered that part of my RAM is corrupted. My computer is a MacBook Pro Retina with an expired warranty.


Normally I would buy new RAM and install it myself, but the Retina MBPs have RAM permanently soldered to the logic board. Instead of paying through the nose to get it fixed, I researched software so…