Thursday, November 19, 2015

Highlights from the NeuroImage Data Sharing Issue

This week the first part of the NeuroImage special issue on Data Sharing was published. It's a great achievement and I am glad to see that more focus is being put on sharing data in our field. However, the issue is a mixed bag of papers that describe different types of resources. Some of my friends were confused by this heterogeneity, so I decided to highlight some of the resources presented in the issue.

The issue included papers about many data sharing platforms/databases (XNAT Central, LORIS, NIDB, LONI IDA, COINS, UMCD and NeuroVault) that are well known and covered by previous publications. Similarly, some datasets (FBIRN and CBFBIRN) have also been covered previously in the literature. I understand that those have been included in the issue for completeness, but I will leave them out of this review.

The original art used in the NeuroImage cover.

Developmental and aging datasets

  • The issue includes an impressive developmental dataset consisting of 9498 subjects (ages 8-21) with medical, psychiatric, neurocognitive, and genomic data. 1000 of those subjects also have neuroimaging data (T1, PCASL, DWI, fMRI: resting, n-back and emotion ID). Data is available through dbGaP (you need to be an NIH-approved PI to apply for access; the application process can be lengthy and involve a substantial amount of paperwork).
  • Another developmental dataset (PedsDTI) consists of 274 subjects aged from 10 days through 22 years. It includes high resolution DWI scans and reference T1 scans as well as precomputed derivatives and age matched atlases (DWI only). The imaging data is accompanied by a set of behavioral, hormonal and clinical measurements. The data is located on NDAR servers and you need to apply to gain access.
  • PING is yet another developmental database that includes data from 1493 children aged 3 to 20 years. The scanning protocol includes T1, T2 and fMRI (rest). Behavioral measures include the NIH Toolbox for Cognition and PhenX. Genotyping information is also included (1000 SNPs). Due to IRB constraints only a subset of this data is available (neither the paper nor the website says how much though). To gain access you will have to apply (only postdocs and higher can apply). All publications using this data require PING approval and co-authorship.
  • The Age-ility project includes 131 participants (ages 15-37). Imaging data includes T1, DWI, fMRI (resting state) and EEG (resting state). On the behavioral side only basic demographic information is available. Data is easy to access through NITRC and requires only registration for the notification mailing list.

Clinical datasets

  • Parkinson's Disease Biomarkers Program provides data from 460 controls and 878 diagnosed cases (mostly Parkinson's). Data were acquired across many sites without normalization of the protocols so different subsets will have different measurements. You will need to apply to gain access.
  • Northwestern University Schizophrenia Data and Software Tool provides access to data from 171 schizophrenia subjects, 170 controls, 44 non-schizophrenic siblings and 66 control siblings. MRI data includes only T1 scans, but is accompanied by cognitive and genotypic measurements. You need to apply to gain access to this resource.
  • PLORAS is a dataset of 750 stroke patients accompanied by 450 healthy controls. The data includes T1 scans and fMRI (two simple language protocols). You will need to apply to gain access to this resource.

Other datasets

  • OMEGA is a dataset consisting of resting state MEG and T1 data collected from 97 participants. You will have to apply to gain access.
  • Open Science CBS Neuroimaging Repository is a dataset consisting of high resolution (7T) MP2RAGE (T1 maps) images from 28 healthy participants. The data is available publicly without the need for registration.
  • Cimbi is a somewhat heterogeneous dataset of PET (mostly serotonin receptors) and T1 scans. The dataset consists of 402 healthy individuals and 206 patients, with varying coverage of different behavioral measures. You need to apply to gain access and you might have to put members of the Cimbi consortium as coauthors on your paper.
  • BIL&GIN is a dataset consisting of 453 subjects (205 of which are left handed!) with T1, DWI, and fMRI (resting state) scans. Additionally 303 have 8 task fMRI scans (probing language, visuospatial, motor and arithmetic activities). You will need to apply to gain access to this resource and the authors will require co-authorship on your papers.

Non-human datasets

  • The Cambridge MRI database for animal models of Huntington's disease provides T1 and DWI data from mouse and sheep models of Huntington's disease. The data is publicly available without any restriction.

Data aggregation

  • Global Alzheimer's Association Interactive Network facilitates finding and accessing multiple Alzheimer's datasets.
  • SchizConnect joins together 4 different datasets with participants diagnosed with schizophrenia.
  • ANIMA is a database of statistical maps from meta-analyses.
  • is a repository of intracranial EEG datasets. It is not clear from the paper what data is in the database and you cannot browse it without an account (I had problems registering a new account).

Summing up - it's nice to see that there is more data sharing going on in our field. I hope that NeuroImage will keep publishing more data papers in the future without the need for a special issue. Together with Mike Milham and Daniel Margulies we have written extensively about this form of data dissemination - have a look at our paper for more information (including guidelines for reviewers).

The thing that struck me the most when reviewing the contents of this special issue was how restrictive the access to most of the datasets is. Most of them require you to apply to gain access. The official explanation for this procedure is that the repositories make sure that you can be trusted with data obtained from human subjects (even though all of it is anonymized before sharing). In practice no one checks if you have appropriate facilities to keep the data safe (for example encrypted storage servers). On the other hand, the access request approval system can potentially be abused to deny access to competing researchers and to force applicants to hand out co-authorships.

Many projects have been promoting unrestricted public access to data (Open Science CBS and Cambridge datasets from this review, OpenfMRI, Study Forrest, NeuroVault etc.) - this means no "requests for approval". There were no privacy disasters or lawsuits reported in the context of the fully open datasets mentioned above, which shows that unrestricted sharing can be done. At the same time, removing the need to request access to data lowers the usage barriers and makes the whole process more transparent.

Monday, September 28, 2015

The unsung heroes of neuroinformatics

There are many fascinating and exciting developments in human cognitive and clinical neurosciences. We are constantly drawn to novel and groundbreaking discoveries. There is nothing wrong with this - I would even say that's part of human nature. This kind of research is not, however, what I want to talk about today. This post is dedicated to people building tools that play a crucial role as a backbone of research - helping novel discoveries happen. They go beyond providing a proof of concept, publishing a paper and pointing to an undocumented piece of code that works only in their labs. They provide maintenance, respond to user needs, and constantly update their tools, fixing bugs and adding features. Here I will highlight two tools which in my personal (and very biased) opinion play an important role in supporting human neuroscience, and could do with some more appreciation.

Early years of Captain Neuroimaging


Anyone dealing with MRI data in Python must know about this library. Nibabel allows you to read and write a variety of different file formats used in neuroimaging (most importantly NIFTI). It hides the obscurity of those standards and provides easy to use objects and methods that let you efficiently access, modify and visualize neuroimaging data. It seems like nothing, but not having to deal with finding the right header format each time you want to read a file can be easily overlooked. I use nibabel all the time and I am very grateful for its existence!
It's a really good example of something that, even though it is not "novel" or sexy, is absolutely crucial and enables many researchers to get closer to understanding how the human brain works. Despite the fact that nibabel plays an essential role in the Python neuroimaging ecosystem, it does not get enough credit. Nibabel is an open source project led by Matthew Brett, who is tirelessly keeping it up to date with a frequent release cycle.
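
To give a sense of what nibabel spares you, here is a rough sketch of reading just three fields from a NIfTI-1 header by hand, using only the Python standard library. It is not how nibabel is implemented; the helper names (`make_demo_header`, `read_header_fields`) are made up for this illustration, and the demo header is built in memory rather than read from a real scan. The byte offsets follow the published NIfTI-1 header layout.

```python
import struct

NIFTI1_HEADER_SIZE = 348  # a NIfTI-1 header is always 348 bytes long


def make_demo_header(shape=(64, 64, 30), datatype=16, bitpix=32):
    """Build a minimal little-endian NIfTI-1 header for the demo.

    Only the fields read below are filled in; this is not a complete,
    valid image file. datatype 16 is the NIfTI code for float32.
    """
    hdr = bytearray(NIFTI1_HEADER_SIZE)
    struct.pack_into("<i", hdr, 0, NIFTI1_HEADER_SIZE)   # sizeof_hdr
    dim = [len(shape)] + list(shape) + [1] * (7 - len(shape))
    struct.pack_into("<8h", hdr, 40, *dim)               # dim[0..7]
    struct.pack_into("<hh", hdr, 70, datatype, bitpix)   # datatype, bitpix
    hdr[344:348] = b"n+1\x00"                            # magic string
    return bytes(hdr)


def read_header_fields(raw):
    """Return shape, datatype code and bits per voxel from raw header bytes."""
    # The byte order is not stored explicitly: try both and keep the one
    # for which sizeof_hdr comes out as 348.
    for endian in ("<", ">"):
        (size,) = struct.unpack_from(endian + "i", raw, 0)
        if size == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError("not a NIfTI-1 header")

    if raw[344:348] not in (b"n+1\x00", b"ni1\x00"):
        raise ValueError("missing NIfTI-1 magic string")

    dim = struct.unpack_from(endian + "8h", raw, 40)
    datatype, bitpix = struct.unpack_from(endian + "hh", raw, 70)
    ndim = dim[0]
    return {
        "shape": tuple(dim[1:1 + ndim]),
        "datatype": datatype,
        "bitpix": bitpix,
    }


if __name__ == "__main__":
    print(read_header_fields(make_demo_header()))
```

And this covers a single format, three fields, and none of the actual image data. With nibabel the whole thing collapses to `nibabel.load(path)` followed by looking at the returned image's header.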


Papaya is a relatively new project providing a modular, reusable JavaScript-based NIFTI and DICOM viewer. Being able to read the data, apply the right affine transformation and perform efficient interpolation is probably not the most fascinating work in the world, but it's incredibly important. Web based applications are the future and I am sure that Papaya will play a crucial role in bringing neuroimaging to the cloud. Papaya has already been used in projects such as NeuroVault, ANIMA, and NIFTI-drop. I wonder whether those projects would exist at all if they had to develop their own JavaScript viewer! Thanks to the work of the Papaya team they can all reuse the same reliable and fast viewer.
Papaya is also an open source project, but it is mainly developed by the Biomedical Image Analysis Division of the Research Imaging Institute at University of Texas San Antonio, led by Jack Lancaster. Their (sadly) unnamed developers are doing a great job by constantly improving the viewer and providing new features upon user request.

I love those two projects and I have written this post to tip my hat towards the people spending their time making this software happen. It enabled me to do research over the years and build tools of my own. Behind my urge to compliment the unappreciated there is a bigger issue. Science is currently so obsessed with novelty and groundbreaking discovery that there is no space for appreciating, crediting and, most importantly, funding those who provide essential support for this science to happen. If we want to have solid, reproducible and robust findings we need to improve our tools and focus on maybe less fascinating, but nonetheless important work.

Sunday, September 13, 2015

Software workaround for corrupted RAM in OS X

Recently my computer has been acting up. Software started crashing, compilations failing, etc. Many small errors that I could not replicate. I wasn't too concerned, because I'm a natural tinkerer - I play with software, install many different additions and one of the side effects can be an unstable operating system. Eventually my system stopped booting - the partition table was corrupted. I had to wipe it and reinstall (which was a massive pain in the ass). I also tried to run some hardware checks just in case (the computer is over three years old), but the "Apple Hardware Test" was hanging each time I ran it (bad sign huh?). I eventually ran memtest86 overnight and discovered that part of my RAM is corrupted. My computer is a MacBook Pro Retina with an expired warranty.

Normally I would buy new RAM and install it myself, but the retina MBPs have RAM permanently soldered to the logic board. Instead of paying through the nose to get it fixed I researched software solutions. Linux users have a very handy kernel option that will tell the OS not to use a particular range of memory addresses - it's called memmap. The situation on OS X is not so rosy. The only option available is to restrict memory up to the point where it's corrupted (but this way you lose everything after it). In my case I had a corrupted range of around 60 MB in the 13th gigabyte. My only option was to restrict the system to use 12 GB. This is the procedure:

  1. Run memtest86 overnight to figure out where your memory is corrupted.
  2. Estimate the highest amount of memory you can use without reaching the corrupted range (in my case it was 12000 MB).
  3. Restrict the memory by setting a kernel flag:
    sudo nvram boot-args="maxmem=12000"

This did the trick and made my laptop usable again!
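
The arithmetic in step 2 can be sketched in a few lines of Python. This is only an illustration under a few assumptions: that `maxmem` counts in units of 2^20 bytes, that memtest86 gives you the lowest failing address in bytes, and that a safety margin below that address is a good idea. The function names, the 64 MB default margin and the example address are all made up for the sketch (the address is not the one from my machine).

```python
MIB = 1024 * 1024  # assumption: maxmem is expressed in units of 2**20 bytes


def safe_maxmem_mb(lowest_bad_address, margin_mb=64):
    """Largest memory limit (in MB) that stays below the first bad address.

    lowest_bad_address: lowest failing address in bytes, as reported by
        the memory test.
    margin_mb: extra headroom kept below that address (an arbitrary
        default chosen for this sketch).
    """
    limit_mb = lowest_bad_address // MIB - margin_mb
    if limit_mb <= 0:
        raise ValueError("corruption is too low in memory for this workaround")
    return limit_mb


def nvram_command(limit_mb):
    """Format the kernel flag command from step 3 for a given limit."""
    return 'sudo nvram boot-args="maxmem={}"'.format(limit_mb)


if __name__ == "__main__":
    example_address = 0x310000000  # hypothetical address in the 13th gigabyte
    limit = safe_maxmem_mb(example_address)
    print(limit)
    print(nvram_command(limit))
```

I ended up simply rounding down to 12000, which leaves an even bigger margin. Keep in mind that setting `boot-args` this way replaces whatever value was stored there before, so if you already use other boot arguments you need to include them in the same string.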