The glass box design philosophy

There is an interesting paradox in developing data analysis software. On one hand, there are clear benefits to designing tools that are easy to use, robust, and require as little manual intervention or user expertise as possible. Such a design philosophy allows more users to take advantage of the tools and to apply them automatically to large, heterogeneous datasets. On the other hand, blindly applying tools that are not fully understood, or that give no indication of whether the input data meets their assumptions, raises serious concerns. Developers not only take great pride in the quality of their software but also feel responsible for how it is used. Inexperienced users can misuse a “black box” tool and obtain misleading results. Whether we like it or not, such situations can earn the tool itself a bad reputation it does not deserve.

Ease of use seems to be at odds with avoiding misuse. Extending your user base to less experienced users can lead to mistakes. Is there a way of designing “black box” tools that minimizes misuse? I believe there is, and I call it the glass box philosophy.

The glass box principles

Write educational documentation

The documentation for a data analysis tool should aspire not only to describe how to run the tool but also to explain the theory and assumptions behind the analysis. In this respect, it should resemble an academic handbook more than an instruction manual.

This does not mean that developers must write original content if suitable material already exists. The purpose of educational documentation is that a user who wants to understand how a tool works can learn it from the documentation in a programming-language-agnostic way.

Verify or visualize assumptions

Most data analysis tools consist of several steps, each with its own distinct assumptions. If a step silently fails to produce the expected results, the consequences down the road can be catastrophic. Unfortunately, in many cases it is very hard or even impossible to verify programmatically whether the results of a particular step meet the quality requirements of the subsequent step. It is therefore important that data analysis software provides an option to verify or visualize those assumptions. Such reporting capabilities might not be used by the user directly, but they provide an extra layer of transparency. For example, more experienced users, or reviewers of a paper describing the results, could audit the reports generated by the data analysis.
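To make the idea concrete, here is a minimal Python sketch of a processing step that returns a report alongside its result. Everything in it (the StepReport class, the zscore_step function, the specific checks and the outlier threshold) is hypothetical and chosen only for illustration; it is not taken from any particular tool.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class StepReport:
    """Record of one processing step: what was assumed and what was observed."""
    step: str
    checks: dict = field(default_factory=dict)   # assumption name -> True/False
    summary: dict = field(default_factory=dict)  # numbers worth eyeballing

    @property
    def passed(self):
        return all(self.checks.values())


def zscore_step(values, max_abs_z=3.0):
    """Standardize values and report on the assumptions the step relies on.

    Hypothetical example step; the threshold is an arbitrary illustration.
    """
    report = StepReport(step="zscore")
    report.checks["at least 3 samples"] = len(values) >= 3
    report.checks["non-constant input"] = len(set(values)) > 1
    if not report.passed:
        # Stop here instead of silently passing bad data downstream.
        return None, report
    mu, sigma = mean(values), stdev(values)
    z = [(v - mu) / sigma for v in values]
    largest = max(abs(x) for x in z)
    # Cannot be "fixed" automatically, so it is flagged for a human to judge.
    report.checks["no extreme outliers"] = largest <= max_abs_z
    report.summary = {"mean": mu, "stdev": sigma, "max_abs_z": largest}
    return z, report


def render_report(reports):
    """Render the collected reports as plain text that can be audited later."""
    lines = []
    for r in reports:
        lines.append(f"[{'OK' if r.passed else 'CHECK'}] {r.step}")
        for name, ok in r.checks.items():
            lines.append(f"  {'pass' if ok else 'FAIL'}: {name}")
        for name, value in r.summary.items():
            lines.append(f"  {name} = {value:.3f}")
    return "\n".join(lines)


if __name__ == "__main__":
    _, report = zscore_step([1.0, 2.0, 3.0, 4.0])
    print(render_report([report]))
```

The design choice worth noting is that the report is produced on every run, not only on failure, so someone who did not run the analysis can still inspect what each step assumed and observed.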

Guide dissemination of the results

Obviously, the purpose of running a data analysis tool is to learn something about the data from the results. In the clear majority of cases, users will share the findings with people who were not involved in the analysis and might not know the details of the tool. Whether it is an internal report or an academic paper, the user who ran the tool and obtained the results bears the responsibility of explaining to others what the analysis entailed. Here the tool developers can also help by providing boilerplate language that summarizes the inner workings of the tool (referencing relevant external materials when possible). Peers of the user who performed the analysis will appreciate such a feature because it helps them understand exactly what happened to the data. For robust tools that use heuristics to adapt to the input data, the boilerplate language should also adapt automatically so that it accurately describes the analysis path that was taken.
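One way to keep such boilerplate in sync with the analysis is to generate it from a record of the steps that actually ran. The Python sketch below assumes a hypothetical tool that logs each executed step with its parameters; the step names, sentence templates and the methods_boilerplate function are invented for illustration and do not describe any real tool.

```python
# Hypothetical sentence templates, one per processing step the tool may run.
TEMPLATES = {
    "motion_correction": "Volumes were realigned to a reference volume ({reference}).",
    "smoothing": "Data were spatially smoothed with a {fwhm_mm} mm FWHM Gaussian kernel.",
    "highpass": "A high-pass filter with a {cutoff_s} s cutoff was applied.",
}


def methods_boilerplate(tool, version, steps_run):
    """Build a methods paragraph from the steps that actually ran.

    `steps_run` maps a step name to the parameters used for that step,
    in execution order. Steps that were skipped are simply absent, so
    they never appear in the generated text.
    """
    sentences = [f"Data were processed with {tool} version {version}."]
    for name, params in steps_run.items():
        if name not in TEMPLATES:
            # Refuse to emit a description that silently omits a step.
            raise ValueError(f"no boilerplate template for step {name!r}")
        sentences.append(TEMPLATES[name].format(**params))
    return " ".join(sentences)


if __name__ == "__main__":
    print(methods_boilerplate("exampletool", "1.0", {"smoothing": {"fwhm_mm": 6}}))
```

Raising an error for a step without a template is deliberate: a description that quietly leaves out a step would undermine the transparency the boilerplate is meant to provide.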

FMRIPREP – an example of a glass box application

To better understand what a glass box application should look like, let’s look at an example. FMRIPREP is an MR data preprocessing tool that takes whatever comes out of a magnetic resonance scanner and prepares it for higher-level analysis. It was designed to adapt to a range of scan types and to use heuristics to provide quality results on data produced by different scanners. Its robustness and ease of use make it appealing, but also susceptible to misuse. So how does FMRIPREP implement the glass box principles?

Educational documentation

The documentation provided by the developers of FMRIPREP goes beyond instructions on how to use it. It includes a detailed explanation of the data processing workflow, together with figures and references to relevant literature. This rich documentation allows interested users to understand what happens to their data. The documentation does not rely on any knowledge of Python (the language FMRIPREP is written in).


Visual reports

The preprocessing performed by FMRIPREP consists of many interdependent steps. Some of them cannot be validated automatically, which creates a need for visual reports. For every processed piece of data, FMRIPREP produces an HTML report that includes figures and animations designed to highlight the different data processing steps. These reports enable users to quickly verify the validity of individual steps without writing any custom code or opening intermediate results in specialized software.

Citation boilerplate

FMRIPREP is targeted at research use, so its use will most likely lead to scientific publications. One cannot assume that readers of those publications will be familiar with FMRIPREP, so there is a need for an abbreviated description of the processing it performs. The documentation website provides such boilerplate text, ready to be reused in publications that used FMRIPREP. Because FMRIPREP runs slightly different processing depending on the inputs, the boilerplate text can be easily adapted via JavaScript controls that list the different input options.


I hope I have made a convincing argument for building robust, easy-to-use software that also excels in transparency. It is worth noting that turning a black box analysis tool into a glass box analysis tool requires extra effort. After all, guides for interpreting results and educational documentation will not write themselves, and code for reporting tools will not appear out of nowhere. Nonetheless, I do feel that the glass box philosophy is worth pursuing, and it may reduce the amount of user support the analysis tool requires. Furthermore, some of the additions necessary to turn your app into a glass box (for example, documentation) could be contributed by users themselves. This is a great opportunity to grow your open source contributor network.

PS: I by no means invented the term “glass box”; it has been used previously (for example, in the context of software testing). However, because the term fits these design principles so well, I decided to hijack it.

