Sunday, January 27, 2013

"The Shallows: What the Internet is Doing to Our Brains" - Book Review

I'm impatient by nature. When I'm on the internet, however, I'm like Twitchy on coffee (now imagine what happens when I'm browsing the web after having a cup of the dark brew). I don't stay on one website longer than five minutes, I struggle to finish watching a YouTube video without checking something in the background, and I catch myself mindlessly going to websites and asking "Why did I open this tab?". It's bad. It's a constant craving for new information, new stimuli, something surprising, something funny.

But it didn't use to be this way. I used to be able to finish reading a book in one sitting (even though it did not happen often). Now I struggle to get through one academic paper without a break. Something has changed in how hard it is for me to focus on one task for an extended period of time. The obvious thing to blame was the Internet. And I know I'm not the only one. Why are services like rescuetime.com and keepmeout.com so popular? Why, when I mention pomodoro, do you think about productivity instead of pasta sauce?

I turned to research for answers, but I was too lazy to do a proper literature review myself. I needed something to start with. That's how I stumbled across "The Shallows: What the Internet is Doing to Our Brains" by Nicholas Carr. The book starts very slowly with the history of neuroscience, computers, and books. Then, just when it gets to the meat of the problem, it suddenly ends. The introductory part is interesting (especially if you have not had much to do with those topics), but some of it feels like filler that is forcefully related to the real topic. Nonetheless, I learned a handful of interesting little facts. For example, did you know that Freud, before he started his psychiatric practice, worked in the field of molecular neuroanatomy? That was as close to neurophysiology as one could get in those days! Another fun fact concerned Nietzsche: when his sight deteriorated he started using one of the first typewriters and soon became a proficient touch typist (which apparently changed his style of writing).

But apart from those little gems there are other, bigger ideas laid out on the pages of "The Shallows". First of all, this is not the first time a technological revolution has changed how we exchange and consume information. The invention and popularization of printed books led to widespread (and for obvious reasons undocumented) changes in our cognitive capabilities. People were able to acquire knowledge not only through personal interaction but, from then on, also by spending some quality time with a book. So maybe the skill of focusing on a read that I am missing now, and that seems so natural, wasn't so natural back in the day.

This leads to another interesting thought. Maybe evolution shaped us in a way that promotes multitasking and rapid switches of attention. After all, those seem like very useful and practical skills in an environment full of predators. So maybe the reason the Internet has such a strong effect on us is that it taps into an intrinsically natural set of skills. I know that comparing watching YouTube while checking Facebook to peeling fruit while looking out for lions seems far-fetched, but there might be something to it.

Unfortunately the book provides only a handful of useful references to directly related studies (which I'll try to review later), but maybe this accurately describes the state of our knowledge. We still have not figured out how human vision or memory works, so maybe we should not expect much from research on a complex interaction between culture, technology, and attention (although crazier things have been attempted using fMRI). Besides, if it were a proper literature review and not a pop-sci book, it would probably not have been shortlisted for the Pulitzer Prize ;)

PS A friend reviewed this book for Science magazine - please have a look, if you have access... I cannot wait for open access to become the standard!

PPS I'm done with the book and, since I hate accumulating things (it makes moving really hard), I'm happy to give it away (and mail it if need be) to whoever gets in touch with me first.

Sunday, January 20, 2013

On how we estimate value

A friend of mine once had a crush on a girl. The girl wasn't really interested, but valued him as a friend and was too polite to tell him he did not stand a chance (or maybe she just enjoyed the extra attention - who knows!). So the chase went on, reaching pretty pathetic levels. At the same time another girl was basically throwing herself at him. Yet he wasn't paying attention to her and preferred to chase an illusion.

It's not a single case. Even if you have not experienced it yourself (on which I congratulate you - I'll try to touch on individual differences later on), you must have heard similar stories from your friends. Actually, if you look at the most popular dating advice, you will learn that the trick is to let go and maintain the magical balance of not caring and being interested (or, as John Green would say in his witty way: "dumpees should fight the clingy urge"). Speaking more generally, my friend was assessing a potential relationship mostly based on how hard it would be to get. Other, more objective criteria (such as, for example, the disturbing lack of a sense of humour in the aforementioned femme fatale) did not matter that much.

But this does not happen just in the field of romance. We tend to apply a similar rule to jobs and positions. A highly competitive position will attract even more people. After all, if so many people want it, it must be good! Some people reapply for the same jobs multiple times until they get them, often discarding posts that would be a better fit. You can observe this in Google's recruitment for engineering positions. Thousands of applicants competing for a few positions, at least seven stages of interviews, high rejection rates, and an open admission that many good candidates get rejected - yet many people try up to five times (and probably over a couple of years) to get a position. At the same time many of them would be equally happy at a startup somewhere in California.

The same rule applies to physical possessions and services. At the beginning of the month I accidentally took the wrong train and ended up spending the night at various train stations trying to get to my final destination. I started talking to a guy who was in a similar situation. He ran a barber shop in a small town and told me the "secret" to the success of his business. There were many barber shops in that town, most of them providing services at a similar or indistinguishable level. His shop, however, charged more than the others, providing an illusion of a premium service. Despite the higher price and similar quality, they were always fully booked. And, as in the examples above, the clients used the price (more expensive == harder to get) to estimate the quality (value). Of course this trick is nothing new and is known in economics as premium pricing.

What I'm really interested in is what science has to say about this phenomenon. The Nobel prize winner Daniel Kahneman had several theories on how probabilities and absolute values of events translate into their utilities. His Prospect Theory gives a mathematical model explaining how we overweight extreme events (those that are highly unlikely) while disregarding their true value (if I understand it correctly, of course). The problem is that we know neither the true value nor the probability, and we have to estimate them. Searching for scientific work on this topic has proven hard, mostly because it must exist across fields under different names. There is at least one study showing that the "playing hard to get" tactic in romance is popular (surprise, surprise...). If you have any hints on where I should look, please let me know!
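
For the curious, here is a rough sketch of the value and probability-weighting functions from Prospect Theory as I understand them. The parameter values are the commonly cited estimates from Tversky and Kahneman's 1992 paper, so treat this as my reading of the model rather than anything authoritative:

import numpy as np

def value(x, alpha=0.88, lam=2.25):
    # losses loom larger than gains of the same size (loss aversion)
    x = np.asarray(x, dtype=float)
    v = np.empty_like(x)
    gains = x >= 0
    v[gains] = x[gains] ** alpha
    v[~gains] = -lam * (-x[~gains]) ** alpha
    return v

def weight(p, gamma=0.61):
    # small probabilities are overweighted, moderate and large ones underweighted
    p = np.asarray(p, dtype=float)
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

print(weight(0.01))  # ~0.055: a 1% chance gets weighted like a 5% chance

With these numbers an unlikely event "feels" several times more probable than it really is, which is exactly the kind of overweighting of extreme events mentioned above.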

Last but not least, I am not claiming that this heuristic is always bad. After all, we don't have access to objective value (if something like that even exists). Estimating it based on how much effort we need to put into getting something might, in many situations, be the best heuristic available. Maybe a more expensive hairdresser is in fact better, maybe a position that is harder to get would indeed be more fulfilling, and maybe the girl my friend was chasing so relentlessly would be more likely to be a keeper.

Sunday, January 13, 2013

A more probabilistic view on multiple comparisons problem

Even though this blog is not going to be only about multiple comparisons (I just could not think of another name), I decided to write about an old problem in a slightly new way.

Multiple Comparisons

Whenever we test many hypotheses and try to figure out which of them are true, we stumble upon the so-called multiple comparisons problem. This is especially evident in fields where we perform tens of thousands of tests (such as neuroimaging or genetics). So what is the big deal? Imagine that you divide the brain into a bunch of regions (voxels) and for each of them you perform some statistical test (checking, for example, whether this part of the brain is involved in the perception of kittens). Some of the regions will yield high statistical values (suggesting a relation to kittens) and some will not. Let's try to show this with a simple simulation.

Let's assume for now that we will test 100 voxels and only 10 of them will be related to kittens. We will model both populations of voxels using Gaussian distributions. The noise distribution will be centred on zero, as opposed to the signal distribution, centred on three.

In [2]:
import numpy as np
# 90 noise voxels centred on zero and 10 kitten-related voxels centred on three
noise_voxels = np.random.normal(size=90, loc=0.0, scale=1.0)
signal_voxels = np.random.normal(size=10, loc=3.0, scale=1.0)

Let's plot this.

In [6]:
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.hist([noise_voxels, signal_voxels], bins=20, label=['noise', 'kittens'], histtype='step', fill=True, stacked=True)
plt.legend()
Out[6]:
<matplotlib.legend.Legend at 0x105eb9790>

Even though noise dominates in this example, it would be very easy to draw a line distinguishing non-kitten-related voxels from those that really do say "meow". What this has to do with multiple comparisons will become clear in a moment.
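
To make "drawing a line" concrete, here is a minimal sketch (the cutoff of 1.5 is an arbitrary value I picked for this illustration, nothing principled) that counts how many voxels end up on each side of the line:

threshold = 1.5  # an arbitrary cutoff placed between the two distributions
false_alarms = (noise_voxels > threshold).sum()   # noise voxels crossing the line
hits = (signal_voxels > threshold).sum()          # kitten voxels crossing the line
print("false alarms: %d out of %d noise voxels" % (false_alarms, len(noise_voxels)))
print("hits: %d out of %d kitten voxels" % (hits, len(signal_voxels)))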

First, let's show that this is just a simulation and that, depending on what mood my computer is in, the results will be different. Here are four instances.

In [7]:
for i in range(4):
    plt.subplot(2, 2, i + 1)  # subplot indices start at 1
    noise_voxels = np.random.normal(size=90, loc=0.0, scale=1.0)
    signal_voxels = np.random.normal(size=10, loc=3.0, scale=1.0)
    plt.hist([noise_voxels, signal_voxels], bins=20, label=['noise', 'kittens'], histtype='step', fill=True, stacked=True)
    plt.legend()

We can operate on the theoretical distributions instead of just the simulations. Since we are dealing with two Gaussians, let's plot two Gaussians.

In [12]:
x_range = np.linspace(-3,6,100)
noise_samples = 90.0
signal_samples = 10.0
snr = signal_samples/noise_samples  # ratio of signal to noise voxels, used to weight the two densities
from scipy.stats import norm
plt.plot(x_range, norm.pdf(x_range)*(1-snr), 'b', label="noise")
plt.plot(x_range, norm.pdf(x_range,loc=3)*(snr), 'g', label="kittens")
plt.legend()
Out[12]:
<matplotlib.legend.Legend at 0x105b7ac50>

Now we can clearly see that the overlap between the two distributions is fairly small. Notice that there are two important parameters that can influence this: the Signal to Noise Ratio (SNR) and the location of the signal distribution (also known as the effect size).
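
To see the influence of the second parameter, here is a quick sketch that redraws the theoretical curves for a few hypothetical effect sizes (the values 1, 2, and 4 are arbitrary choices for illustration; the SNR stays as above):

# redraw the weighted densities for different hypothetical signal locations
for effect_size in [1.0, 2.0, 4.0]:
    plt.figure()
    plt.plot(x_range, norm.pdf(x_range)*(1-snr), 'b', label="noise")
    plt.plot(x_range, norm.pdf(x_range, loc=effect_size)*snr, 'g', label="kittens")
    plt.legend()
    plt.title("effect size = %g" % effect_size)

The smaller the effect size, the larger the overlap and the harder it becomes to separate kittens from noise.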

The multiple comparisons problem is all about... well, multiple comparisons, in other words the number of tests we make (in our example this is equivalent to how many voxels we have). So let's show this by upsampling our data! Let's say we can divide each old (big) voxel into eight small voxels.

In [13]:
noise_samples = 90.0*8
signal_samples = 10.0*8
snr = signal_samples/noise_samples
plt.plot(x_range, norm.pdf(x_range)*(1-snr), 'b', label="noise")
plt.plot(x_range, norm.pdf(x_range,loc=3)*(snr), 'g', label="kittens")
plt.legend()
Out[13]:
<matplotlib.legend.Legend at 0x10756a150>

Surprisingly, nothing has changed... But we have more voxels and did more comparisons (tested more hypotheses)! True, but because we only upsampled the data, we just created identical copies of the old values. The SNR thus stayed the same. However, things change when we consider a more realistic situation than "10% of the brain selectively responds to young cats". Out of 60000 voxels (average head size, 4x4x4mm resolution, skull stripped) 100 will respond to kittens.

In [77]:
noise_samples = 60000.
signal_samples = 100.
snr = signal_samples/noise_samples
plt.plot(x_range, norm.pdf(x_range)*(1-snr), 'b', label="noise")
plt.plot(x_range, norm.pdf(x_range,loc=3)*(snr), 'g', label="kittens")
plt.legend()
Out[77]:
<matplotlib.legend.Legend at 0x1093126d0>

Where have the cats gone?!? Let's have a closer look.

In [78]:
plt.plot(x_range, norm.pdf(x_range)*(1-snr), 'b', label="noise")
plt.plot(x_range, norm.pdf(x_range,loc=3)*(snr), 'g', label="kittens")
plt.legend()
plt.xlim([0,6])
plt.ylim([0.00,0.01])
Out[78]:
(0.0, 0.01)

Haha! If we zoom in we can find the signal distribution, dwarfed by the noise. The problem is not the number of comparisons we do but the fraction of those comparisons that yield no signal. If you look carefully you will notice that the crossing point between the distributions increases as the SNR decreases. This crossing is a potential candidate for a threshold. Let's try to find this point.

In [85]:
from scipy.optimize import fsolve
# find where the two weighted densities cross
fsolve(lambda x : norm.pdf(x)*(1-snr) - norm.pdf(x, loc=3)*(snr), 2.0)
Out[85]:
array([ 3.26443494])

The interesting aspect is the relation between this crossing point and SNR.

In [80]:
snrs = np.linspace(0.3, 0.005, 1000)
crossing_points = []
for snr in snrs:
    crossing_point = fsolve(lambda x : norm.pdf(x)*(1-snr) - norm.pdf(x, loc=3)*(snr),2.0)
    crossing_points.append(crossing_point)

plt.plot(snrs, crossing_points)
plt.xlabel("SNR")
plt.ylabel("crossing point")
Out[80]:
<matplotlib.text.Text at 0x109629110>

As we can see, it rises sharply for very small SNR values. Another popular option for picking a threshold is controlling the False Discovery Rate (FDR): the fraction of false discoveries among all voxels labelled as significant. This is equivalent to the ratio of the area under the blue curve to the right of the threshold to the sum of the areas under the blue and green curves to the right of the threshold. These areas are summarized by the Cumulative Distribution Functions (CDFs).
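
Written out with the standard normal CDF $\Phi$ (and assuming the mixture used here, with the signal centred on 3), the quantity computed in the next cell is:

$$\mathrm{FDR}(t) = \frac{(1 - \mathrm{snr})\,\bigl(1 - \Phi(t)\bigr)}{(1 - \mathrm{snr})\,\bigl(1 - \Phi(t)\bigr) + \mathrm{snr}\,\bigl(1 - \Phi(t - 3)\bigr)}$$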

In [87]:
thr = 3.26
(1-norm.cdf(thr))*(1-snr)/((1-norm.cdf(thr))*(1-snr) + (1-norm.cdf(thr, loc=3))*snr)
Out[87]:
0.45640041955468291

Another important value is the percentage of missed signal voxels, i.e. kitten voxels that do not survive the threshold.

In [89]:
norm.cdf(thr, loc=3)
Out[89]:
0.6025681132017604

As mentioned before, a popular way of dealing with multiple comparisons is to keep the FDR at a certain level (usually 0.05). Let's see what happens to the percentage of missed voxels if we do this at different SNRs.

In [83]:
missed_voxels = []
fdr_thresholds = []
for snr in snrs:
    fdr_thr = fsolve(lambda x : (1-norm.cdf(x))*(1-snr)/((1-norm.cdf(x))*(1-snr) + (1-norm.cdf(x, loc=3))*snr)-0.05,2.0)
    missed_voxels.append(norm.cdf(fdr_thr, loc=3))
    fdr_thresholds.append(fdr_thr)

plt.plot(snrs, missed_voxels)
plt.xlabel("SNR")
plt.ylabel("percentage of missed voxels")
plt.figure()
plt.plot(snrs, fdr_thresholds)
plt.xlabel("SNR")
plt.ylabel("FDR corrected threshold")
Out[83]:
<matplotlib.text.Text at 0x10966df90>

From this plot we can see that, as we decrease the SNR, we miss a lot of voxels even though we control the FDR. For extremely low SNR and a small absolute number of signal voxels, the chances of finding anything are very slim.
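
To put a rough number on this, here is a quick sketch for the realistic scenario from above (100 kitten voxels among 60000): with the FDR kept at 0.05, we can ask how many of the 100 signal voxels we would expect to survive the threshold (the 2.0 is just a starting guess for the solver).

# expected detections in the 100-in-60000 scenario with FDR controlled at 0.05
snr = 100. / 60000.
fdr_thr = fsolve(lambda x : (1-norm.cdf(x))*(1-snr)/((1-norm.cdf(x))*(1-snr) + (1-norm.cdf(x, loc=3))*snr) - 0.05, 2.0)[0]
expected_hits = 100 * (1 - norm.cdf(fdr_thr, loc=3))
print("FDR corrected threshold: %.2f" % fdr_thr)
print("expected number of detected kitten voxels (out of 100): %.1f" % expected_hits)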

Take home message

In this inaugural post I tried to show the multiple comparisons problem from a slightly different angle. I hope that these simple simulations make it clear that the problem is not really about the number of tested hypotheses, but about the ratio between noise and signal. Next week I'll try to write about something more light-hearted :)