## Sunday, January 27, 2013

### "The Shallows: What the Internet is Doing to Our Brains" - Book Review

## Sunday, January 20, 2013

### On how we estimate value

It's not a single case. Even if you have not experienced it yourself (on which I congratulate you - I'll try to touch on individual differences later on), you must have heard similar stories from your friends. In fact, if you look at the most popular dating advice, you will learn that the trick is to let go and maintain the magical balance of not caring and being interested (or, as John Green would put it in his witty way: "dumpees should fight the clingy urge"). Speaking more generally, my friend was assessing a potential relationship mostly based on how hard it would be to get. Other, more objective criteria (such as the disturbing lack of a sense of humour in the aforementioned femme fatale) did not matter that much.

But this does not happen just in the field of romance. We tend to apply a similar rule to jobs and positions. A highly competitive position will attract even more people. After all, if so many people want it, it must be good! Some people reapply for the same job multiple times until they get it, often discarding posts which would be a better fit. You can observe this in Google's recruitment for engineering positions. Thousands of applicants competing for a few openings, at least seven stages of interviews, high rejection rates, and an open admission that many good candidates are rejected - yet many people try up to five times (over what is probably a couple of years) to get a position. At the same time, many of them would be equally happy at a startup somewhere in California.

The same rule applies to physical possessions and services. At the beginning of the month I accidentally took the wrong train and ended up spending the night at various train stations trying to get to my final destination. I started talking to a guy who was in a similar situation. He ran a barber shop in a small town, and he told me the "secret" of his business's success. There were many barber shops in his town, most of them providing services of a similar or indistinguishable standard. His shop, however, charged more than the others, providing an illusion of a premium service. Despite the higher price and similar quality, they were always fully booked. As in the examples above, the clients used the price (more expensive == harder to get) to estimate the quality (value). Of course this trick is nothing new and is known in economics as premium pricing.

What I'm really interested in is what science has to say about this phenomenon. The Nobel prize winner Daniel Kahneman had several theories on how the probabilities and absolute values of events translate into their utilities. His Prospect Theory gives a mathematical model explaining how we overweight extreme events (those that are highly unlikely), disregarding their true value (if I understand it correctly, of course). The problem is that we know neither the true value nor the probability, so we have to estimate them. Searching for scientific work on this topic has proven hard, mostly because it must exist across fields under different names. There is at least one study showing that the "playing hard to get" tactic in romance is popular (surprise, surprise...). If you have any hints on where I should look, please let me know!
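
As an aside, the model itself is easy to play with: in cumulative prospect theory, Tversky and Kahneman (1992) proposed the probability weighting function w(p) = p^γ / (p^γ + (1-p)^γ)^(1/γ). A minimal sketch in Python (the γ = 0.61 below is their estimate for gains; the code is just my own illustration, not something the studies themselves provide):

```python
import numpy as np

def weight(p, gamma=0.61):
    """Probability weighting function: w(p) = p^g / (p^g + (1-p)^g)^(1/g)."""
    p = np.asarray(p, dtype=float)
    return p**gamma / (p**gamma + (1 - p)**gamma)**(1.0 / gamma)

# Rare events get overweighted, near-certain ones underweighted:
for p in [0.01, 0.1, 0.5, 0.9, 0.99]:
    print(p, weight(p))
```

For p = 0.01 this gives roughly 0.055, i.e. a rare event carries several times more decision weight than its actual probability.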

Last but not least, I am not claiming that this heuristic is always bad. After all, we don't have access to objective value (if something like that even exists). Estimating it based on how much effort we need to put into getting something might, in many situations, be the best available heuristic. Maybe a more expensive hairdresser is in fact better, maybe a position that is harder to get would indeed be more fulfilling, and the girl my friend was chasing so relentlessly would be more likely to be a keeper.

## Sunday, January 13, 2013

### A more probabilistic view on multiple comparisons problem

Even though this blog is not going to be only about multiple comparisons (I simply could not think of another name), I decided to write about an old problem in a slightly new way.

# Multiple Comparisons

Whenever we test many hypotheses and try to figure out which of them are true, we stumble upon the so-called Multiple Comparisons problem. It is especially evident in fields where we do tens of thousands of tests (such as neuroimaging or genetics). So what is the big deal? Imagine that you divide the brain into a bunch of regions (voxels) and for each of them you perform some statistical test (checking, for example, whether this part of the brain is involved in the perception of kittens). Some of the regions will yield high statistical values (suggesting a relation to kittens) and some will not. Let's try to show this with a simple simulation.

Let's assume for now that we will test 100 voxels and that only 10 of them are related to kittens. We will model both populations of voxels using Gaussian distributions: the noise distribution will be centred on zero, as opposed to the signal distribution centred on three.

```
import numpy as np
noise_voxels = np.random.normal(size=90, loc=0.0, scale=1.0)
signal_voxels = np.random.normal(size=10, loc=3.0, scale=1.0)
```

Let's plot this:

```
import pylab as plt
plt.figure(figsize=(10,6))
plt.hist([noise_voxels, signal_voxels], bins=20, label=['noise', 'kittens'], histtype='step', fill=True, stacked=True)
plt.legend()
```


Even though noise dominates in this example, it would be very easy to draw a line distinguishing non-kitten-related voxels from those that really do say "meow". What this has to do with multiple comparisons will become clear in a moment.

Firstly, let's show that this is just a simulation, and that depending on what mood my computer is in, the results will differ. Here are four instances.

```
for i in range(4):
    plt.subplot(2, 2, i + 1)  # subplot indices start at 1
    noise_voxels = np.random.normal(size=90, loc=0.0, scale=1.0)
    signal_voxels = np.random.normal(size=10, loc=3.0, scale=1.0)
    plt.hist([noise_voxels, signal_voxels], bins=20, label=['noise', 'kittens'], histtype='step', fill=True, stacked=True)
    plt.legend()
```

We can also operate on the theoretical distributions instead of just simulations. Since we are dealing with two Gaussians, let's plot two Gaussians.

```
x_range = np.linspace(-3, 6, 100)
noise_samples = 90.0
signal_samples = 10.0
snr = signal_samples/noise_samples
from scipy.stats import norm
# weight each theoretical density by its mixing proportion
plt.plot(x_range, norm.pdf(x_range)*(1-snr), 'b', label="noise")
plt.plot(x_range, norm.pdf(x_range, loc=3)*(snr), 'g', label="kittens")
plt.legend()
```


Now we can clearly see that the overlap between the two distributions is fairly small. Notice that there are two important parameters that influence this: the Signal to Noise Ratio (SNR) and the location of the signal distribution (also known as the effect size).
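
As a quick sanity check on the effect-size part (this snippet is my own addition, reusing the mixing weights from above), we can numerically integrate the area under the pointwise minimum of the two weighted densities and watch the overlap shrink as the signal distribution moves away from zero:

```python
import numpy as np
from scipy.stats import norm

snr = 10.0 / 90.0  # same mixing proportion as above
x = np.linspace(-5, 10, 3000)
dx = x[1] - x[0]
overlaps = []
for effect_size in [1.0, 2.0, 3.0, 4.0]:
    noise_pdf = norm.pdf(x) * (1 - snr)
    signal_pdf = norm.pdf(x, loc=effect_size) * snr
    # the overlap is the area under the pointwise minimum of the two curves
    overlaps.append(np.minimum(noise_pdf, signal_pdf).sum() * dx)
print(overlaps)
```

The printed areas decrease monotonically: a bigger effect size means less overlap and an easier separation.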

The multiple comparisons problem is all about... well, multiple comparisons - in other words, the number of tests we make (in our example, equivalent to how many voxels we have). So let's show this by upsampling our data! Let's say we are able to divide each old (big) voxel into eight small voxels.

```
noise_samples = 90.0*8
signal_samples = 10.0*8
snr = signal_samples/noise_samples
plt.plot(x_range, norm.pdf(x_range)*(1-snr), 'b', label="noise")
plt.plot(x_range, norm.pdf(x_range,loc=3)*(snr), 'g', label="kittens")
plt.legend()
```


Surprisingly, nothing has changed... But we have more voxels and did more comparisons (tested more hypotheses)! True, but because we only upsampled the data, we just created identical copies of the old values, and the SNR stayed the same. However, things change when we consider a more realistic situation than "10% of the brain selectively responds to young cats". Out of 60000 voxels (average head size, 4x4x4 mm resolution, skull stripped) 100 will respond to kittens.

```
noise_samples = 60000.
signal_samples = 100.
snr = signal_samples/noise_samples
plt.plot(x_range, norm.pdf(x_range)*(1-snr), 'b', label="noise")
plt.plot(x_range, norm.pdf(x_range,loc=3)*(snr), 'g', label="kittens")
plt.legend()
```


Where have all the cats gone?!? Let's have a closer look.

```
plt.plot(x_range, norm.pdf(x_range)*(1-snr), 'b', label="noise")
plt.plot(x_range, norm.pdf(x_range,loc=3)*(snr), 'g', label="kittens")
plt.legend()
plt.xlim([0,6])
plt.ylim([0.00,0.01])
```


Haha! If we zoom in, we can find the signal distribution dwarfed by the noise. The problem is not the number of comparisons we do, but the fraction of those comparisons that yield no signal. If you look carefully, you will notice that the crossing point between the two distributions increases as the SNR decreases. This crossing is a potential candidate for a threshold. Let's try to find this point.

```
from scipy.optimize import fsolve
# find where the weighted noise and signal densities are equal, starting the search at x=2
fsolve(lambda x : norm.pdf(x)*(1-snr) - norm.pdf(x, loc=3)*(snr), 2.0)
```

array([ 3.26443494])

The interesting aspect is the relation between this crossing point and SNR.

```
snrs = np.linspace(0.3, 0.005, 1000)
crossing_points = []
for snr in snrs:
    crossing_point = fsolve(lambda x : norm.pdf(x)*(1-snr) - norm.pdf(x, loc=3)*(snr), 2.0)
    crossing_points.append(crossing_point)
plt.plot(snrs, crossing_points)
plt.xlabel("SNR")
plt.ylabel("crossing point")
```


As we can see, it rises sharply for very small SNR values. Another popular option for picking a threshold is controlling the False Discovery Rate (FDR): the fraction of false discoveries among all voxels labelled as significant. This is equivalent to the ratio of the area under the blue curve to the right of the threshold to the sum of the areas under the blue and green curves to the right of the threshold. These areas are summarized by the Cumulative Distribution Functions (CDFs).

```
thr = 3.26  # the crossing-point threshold found above
(1-norm.cdf(thr))*(1-snr)/((1-norm.cdf(thr))*(1-snr) + (1-norm.cdf(thr, loc=3))*snr)
```

0.45640041955468291

Another important value is the percentage of missed voxels.

```
norm.cdf(thr, loc=3)  # fraction of signal voxels below the threshold (false negatives)
```

0.6025681132017604

As mentioned before, a popular way of dealing with Multiple Comparisons is to keep the FDR at a certain level (usually 0.05). Let's see what happens to the percentage of missed voxels if we do this at different SNRs.

```
missed_voxels = []
fdr_thresholds = []
for snr in snrs:
    fdr_thr = fsolve(lambda x : (1-norm.cdf(x))*(1-snr)/((1-norm.cdf(x))*(1-snr) + (1-norm.cdf(x, loc=3))*snr) - 0.05, 2.0)
    missed_voxels.append(norm.cdf(fdr_thr, loc=3))
    fdr_thresholds.append(fdr_thr)
plt.plot(snrs, missed_voxels)
plt.xlabel("SNR")
plt.ylabel("percentage of missed voxels")
plt.figure()
plt.plot(snrs, fdr_thresholds)
plt.xlabel("SNR")
plt.ylabel("FDR corrected threshold")
```


This plot shows that as we decrease the SNR, we miss a lot of voxels even though we are controlling the FDR. For extremely low SNR and a small absolute number of signal voxels, the chances of finding anything are very slim.
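
In real analyses, where only p-values are available rather than the true distributions, FDR is typically controlled with the Benjamini-Hochberg procedure. Here is a minimal sketch of it (my own illustration, not code from the analysis above) applied to the 60000-voxel scenario:

```python
import numpy as np
from scipy.stats import norm

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of rejected (significant) hypotheses."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    # find the largest k such that p_(k) <= (k/m)*q, then reject the k smallest p-values
    below = p[order] <= (np.arange(1, m + 1) / float(m)) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject

np.random.seed(0)
z = np.concatenate([np.random.normal(size=60000),         # noise voxels
                    np.random.normal(loc=3.0, size=100)])  # kitten voxels
p_values = norm.sf(z)  # one-sided p-values
reject = benjamini_hochberg(p_values, q=0.05)
print(reject[60000:].sum(), "of 100 signal voxels survive")
```

Running this confirms the theoretical picture above: at this SNR only a fraction of the kitten voxels survive the correction.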

# Take home message

In this inaugural post I tried to show the multiple comparisons problem from a slightly different angle. I hope these simple simulations make it clear that the problem is not really about the number of tested hypotheses, but about the ratio between noise and signal. Next week I'll try to write about something more light-hearted :)