Skip to main content

To pin or not to pin dependencies: reproducible vs. reusable software


We recently had a very interesting conversation in our lab about how to describe software dependencies (libraries one needs to install) for a software project in the context of research. One camp was proposing explicitly listing which version of a dependency is required (a scheme also referred to "pinning") and the other camp was more in favor of either not specifying version at all or specifying the minimal required version. Luckily both camps agreed on the importance of specifying dependencies, but what's the big deal about pinning vs not pinning?

Advantages of pinning dependencies

When you pin a dependency (for example by saying "numpy==1.1.3") you explicitly point to a version of a library that a) you know works with your script b) was used to generate the result you present in your paper. This is very useful for people trying to replicate your results using your code as well as yourself attempting to revisit a project that was put aside for a while (for example when reviewers come back after 3 months). By knowing which versions of libraries worked last time and generated the results in question one can quickly run and modify the code.

Advantages of not pinning dependencies

On the other side when you opt not to pin dependencies (for example by specifying "numpy" or "numpy>=1.1.3") you provide fewer constraints on your software and make it easier to incorporate into an existing software stack. Let's say you specified "numpy==1.1.3" in your setup.py and your friend Jack who has numpy 1.1.4 installed tries to install your package. They will have to downgrade numpy to meet the package dependencies or create a new Python environment just to use your package. However, if you specify "numpy>=1.1.3" Jack will be able to install your package without problems.

When to pin dependencies


  • If you are not developing a reusable package, but distributing a set of scripts to perform a particular analysis (for example to accompany a manuscript). This will increase the reproducibility of your work. Additionally, since you are not building a package, the dependencies will be only specified in requirements.txt or a Dockerfile allowing people to ignore them if they wish so. It's worth noting that you do not need to create an installable package (with setup.py) in such scenario (see for example this repository).
  • In Dockerfiles. This way each time someone builds the image using this Dockerfile they will get the same result.
  • If you are developing a reusable package, but only for specifying testing environments for continuous integration. If you don't pin dependencies a release of a broken (or backward incompatible) library could break your tests. This can happen at a random moment causing Pull Requests to fail randomly and leaving contributors perplexed what is wrong with their code.

When not to pin dependencies


  • Inside setup.py, when developing a reusable package (library, framework, tool etc.). You want your package to be easy to integrate with other people software stack and thus you should only specify minimally required dependency versions and only if you know that your package will not work with older versions.

At the end of the day, this is a good example of a difference between reusability and reproducibility. Reusable software is much more valuable - it can be integrated into many different software stacks and become a building block of other tools. It is also hard to make and maintain - for once you need to validate that it runs with all the versions of ever updated dependencies. Mere reproducibility is much easier and still has its place in a context of one-off analyses to create results for a scientific paper.

Comments

  1. In fact on some of the university websites you have to be going through one of the college courses to gain access to the main web pages. cursos de ti online

    ReplyDelete
  2. IEEE Final Year projects Project Centers in India are consistently sought after. Final Year Students Projects take a shot at them to improve their aptitudes, while specialists like the enjoyment in interfering with innovation. For experts, it's an alternate ball game through and through. Smaller than expected IEEE Final Year project centers ground for all fragments of CSE & IT engineers hoping to assemble. Final Year Projects for CSE It gives you tips and rules that is progressively critical to consider while choosing any final year project point.

    JavaScript Online Training in India

    JavaScript Training in India

    The Angular Training covers a wide range of topics including Components, Angular Directives, Angular Services, Pipes, security fundamentals, Routing, and Angular programmability. The new Angular TRaining will lay the foundation you need to specialise in Single Page Application developer. Angular Training

    ReplyDelete

Post a Comment