## Wednesday, June 6, 2018

### What is Sampling Noise?

This post is the second in a series of six in which I argue against using p-values to report the results of statistical analyses. You can find a summary of my argument and links to the other posts in the first post of the series. In this post, I introduce the problem of sampling noise, the issue that p-values are trying to solve. It was only when I started to understand better what sampling noise really is that I came to hate p-values so much.

What is sampling noise? Let’s take the example of a Randomized Controlled Trial (RCT). In an RCT, we randomly allocate individuals to two groups: one that receives the treatment of interest (a drug, a job training program, ...) and one that does not. We interpret the difference in average outcomes between the treatment and control groups as the causal effect of the treatment. The common intuition behind an RCT is that the treatment and control groups are identical in every respect except whether they receive the treatment. But when you actually run an RCT, the treatment and control groups are identical only in the limit, as their sizes go to infinity. In real-life applications, with finite samples, treatment and control groups differ. Some confounding variables are distributed differently in the treatment and control samples, and these differences bias the estimate of the treatment effect. Thanks to randomization, this bias has no systematic direction and is zero on average over sample replications, a property that we call unbiasedness. But in a given sample, the very sample that you might have inherited and that you might be using, the size and direction of the bias are unknown. Knowing that it is zero on average is poor consolation. You’re not dealing in averages; you’re dealing with the sample that you have.
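
To make the imbalance concrete, here is a minimal Python sketch (standard library only) under a purely hypothetical setup: a single confounder x drawn from a standard normal distribution, samples of 100 units, and half of each sample assigned to treatment at random. It records the treated-minus-control gap in mean x for 1000 replications. The setup and the helper name `covariate_gap` are illustrative assumptions, not the model used in my class.

```python
import random
import statistics

def covariate_gap(n, rng):
    """Draw n units with a hypothetical confounder x ~ N(0, 1), randomly assign
    half of them to treatment, and return the treated-minus-control gap in mean x."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = set(rng.sample(range(n), n // 2))  # random allocation
    x_t = [x[i] for i in range(n) if i in treated]
    x_c = [x[i] for i in range(n) if i not in treated]
    return statistics.fmean(x_t) - statistics.fmean(x_c)

rng = random.Random(42)
gaps = [covariate_gap(100, rng) for _ in range(1000)]

# Over replications, the imbalance averages out to roughly zero (unbiasedness)...
print(f"average gap over replications: {statistics.fmean(gaps):.3f}")
# ...but any single allocation is typically unbalanced by a nonnegligible amount.
print(f"standard deviation of the gap: {statistics.stdev(gaps):.3f}")
```

With 50 units per arm, the standard deviation of the gap should come out near sqrt(1/50 + 1/50) = 0.2, that is, a fifth of a standard deviation of x in a typical sample, even though the average gap is close to zero.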

Here is an illustration from my class. To build it, I generated 1000 random allocations to treatment and control groups for each of four samples of increasing size drawn from the same population (i.e., governed by the same model). For each random allocation, I computed the difference in average outcomes between treated and control units. The histogram presents the distribution of these estimates; the true effect is shown in red. (Strictly speaking, the histogram shows the results of drawing a new sample at each replication in addition to a new treatment allocation. The two versions of the graph are extremely similar; I just happen to have this one readily available in a suitable format. The comparison with the graph obtained by keeping the same sample can be found in Lecture 0 of my course.)
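
A replication exercise of this kind can be sketched in a few lines of Python. The data-generating process below (outcome = MU + TAU * D + normal noise, with TAU = 0.2) and the sample sizes are hypothetical choices made for illustration, so the numbers will not match the histograms in this post. As in the figure, each replication draws a fresh sample and a fresh random allocation.

```python
import random
import statistics

# Hypothetical data-generating process (not the one behind the figure):
# y = MU + TAU * D + e, with e ~ N(0, SIGMA) and true treatment effect TAU.
MU, TAU, SIGMA = 1.0, 0.2, 1.0

def estimate_once(n, rng):
    """Draw a fresh sample of size n, randomly assign half of it to treatment,
    and return the difference in mean outcomes (treated minus control)."""
    d = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(d)  # random allocation to treatment
    y = [MU + TAU * di + rng.gauss(0.0, SIGMA) for di in d]
    y_t = [yi for yi, di in zip(y, d) if di == 1]
    y_c = [yi for yi, di in zip(y, d) if di == 0]
    return statistics.fmean(y_t) - statistics.fmean(y_c)

def simulate(n, reps=1000, seed=0):
    """Replicate the whole sampling-plus-allocation process reps times."""
    rng = random.Random(seed)
    return [estimate_once(n, rng) for _ in range(reps)]

results = {n: simulate(n) for n in (100, 400, 1600)}
for n, est in results.items():
    wrong_sign = sum(e < 0 for e in est) / len(est)
    print(f"N={n}: mean estimate {statistics.fmean(est):.3f}, "
          f"share with wrong sign {wrong_sign:.1%}")
```

The mean estimate stays close to TAU at every sample size, while the share of estimates with the wrong sign shrinks quickly as N grows.
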
With a small sample size (N=100), sampling noise is large and the estimates stemming from a given random allocation are extremely imprecise, to the point that almost a quarter of them have the wrong sign. In my class, I formally define sampling noise as the width of the symmetric interval around the true value that contains 99% (or 95%) of the estimates. You can also use the standard deviation of this distribution as a measure of sampling noise. In fact, when the distribution is normal, sampling noise at the 99% level is about 5 times the standard deviation (about 4 times at the 95% level). Here are the estimates of sampling noise for the examples above:
As you can see, sampling noise is large with a small sample size and decreases as sample size increases (more precisely, it is inversely proportional to the square root of sample size, so that multiplying the sample size by 4 halves sampling noise). With a small sample, sampling noise is large, precision is low, and many of the estimated values could be produced by sampling noise alone. As a consequence, we are not going to be able to rule out many possible true values of the effect, because noise affects our estimates too much. With a very large sample, noise is trivial and the order of magnitude of the true effect is estimated much more precisely.
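
The definition of sampling noise and its square-root scaling can be checked numerically. The Python sketch below assumes a hypothetical normal data-generating process (control outcomes ~ N(1, 1), true effect TAU = 0.2), computes sampling noise as twice the 99% quantile of the absolute deviations of the estimates from the true value, compares it with the normal approximation (about 5.15 times the standard deviation), and then compares N = 100 with N = 400. The helper names are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical DGP (an assumption, not the model behind the figures):
# control outcomes ~ N(1, 1), true treatment effect TAU.
TAU = 0.2

def mc_estimates(n, reps=2000, seed=0):
    """Difference-in-means estimates over reps replicated samples of total
    size n (n // 2 treated, the rest controls)."""
    rng = random.Random(seed)
    n_t = n // 2
    out = []
    for _ in range(reps):
        y_t = [1.0 + TAU + rng.gauss(0, 1) for _ in range(n_t)]
        y_c = [1.0 + rng.gauss(0, 1) for _ in range(n - n_t)]
        out.append(statistics.fmean(y_t) - statistics.fmean(y_c))
    return out

def sampling_noise(estimates, true_value, level=0.99):
    """Width of the symmetric interval around true_value that contains
    (at least) a share `level` of the estimates."""
    deviations = sorted(abs(e - true_value) for e in estimates)
    return 2 * deviations[math.ceil(level * len(deviations)) - 1]

# Normal multipliers linking sampling noise to the standard deviation.
mult_99 = 2 * NormalDist().inv_cdf(0.995)  # about 5.15
mult_95 = 2 * NormalDist().inv_cdf(0.975)  # about 3.92

noise = {}
for n in (100, 400):
    est = mc_estimates(n)
    noise[n] = sampling_noise(est, TAU)
    print(f"N={n}: 99% sampling noise {noise[n]:.3f}, "
          f"normal approximation {mult_99 * statistics.stdev(est):.3f}")

# Quadrupling N should roughly halve sampling noise (1/sqrt(N) scaling).
print(f"ratio N=100 / N=400: {noise[100] / noise[400]:.2f}")
```

Under this setup the standard deviation of the estimator is 2 * 1 / sqrt(N), so the 99% sampling noise should be close to 1.03 at N = 100 and close to half of that at N = 400.
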

So the question then becomes: what can we do when there is sampling noise? At least two things:
1. Make sampling noise as small as possible (the best approach)
2. Quantify the size of sampling noise (when you cannot do 1.).
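
For option 1 in an RCT, the main lever is sample size. Under the normal approximation, with N units split equally between the two arms and a common outcome standard deviation sigma in both (both assumptions), sampling noise at a given level is roughly 2 * z * 2 * sigma / sqrt(N), where z is the standard normal quantile matching that level. Inverting this gives the total sample size needed to reach a target noise. The helper `required_sample_size` is a hypothetical sketch, and in practice sigma has to be guessed from earlier data.

```python
import math
from statistics import NormalDist

def required_sample_size(target_noise, sigma, level=0.99):
    """Smallest even total N (split equally between arms) for which the normal
    approximation of sampling noise, 2 * z * (2 * sigma / sqrt(N)), is at most
    target_noise. Assumes equal outcome standard deviation sigma in both arms."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n = math.ceil((4 * z * sigma / target_noise) ** 2)
    return n + (n % 2)  # round up to an even number so the arms are equal

# Example: outcomes with standard deviation 1, target 99% sampling noise of 0.1.
print(required_sample_size(0.1, 1.0))
```

Because noise scales with 1 / sqrt(N), halving the target noise multiplies the required sample size by roughly four.
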

Let me talk about quantifying sampling noise, because this is what p-values are about. Several of the most important tools in statistics enable you to compute an estimate of sampling noise using information from only one sample. Think about how beautiful this is: from a single sample, you can recover an estimate of an unobserved quantity that is defined over replications of the sample! We have several ways to do that:

• Chebyshev’s inequality, which gives you an upper bound on sampling noise.
• The Central Limit Theorem (CLT), which approximates the distribution of a sample average by a normal distribution, from which sampling noise follows. Combined with other tools, such as Slutsky’s Theorem and the Delta Method, the CLT lets you approximate the sampling noise of estimators that are combinations of sample averages.
• Resampling methods, which treat the current sample as a population and mimic the sampling process by drawing from this pseudo-population. These include the bootstrap, randomization inference, subsampling, ...
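
As a rough sketch of the three approaches applied to a single sample, here is Python code for the difference in means. The sample itself is hypothetical (50 treated and 50 control units, normal outcomes, true effect 0.2). Chebyshev's inequality with k = 1 / sqrt(1 - level) gives an upper bound, the CLT gives the normal approximation, and the bootstrap shown is one simple variant (resampling within each group); randomization inference and subsampling are not shown. All function names are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

def diff_in_means(y_t, y_c):
    return statistics.fmean(y_t) - statistics.fmean(y_c)

def se_diff(y_t, y_c):
    """Estimated standard error of the difference in means (unequal variances)."""
    return math.sqrt(statistics.variance(y_t) / len(y_t)
                     + statistics.variance(y_c) / len(y_c))

def noise_chebyshev(y_t, y_c, level=0.99):
    """Chebyshev: P(|X - mean| >= k * sd) <= 1 / k**2. With k = 1 / sqrt(1 - level),
    at least `level` of the estimates fall within k * sd of the mean, so
    2 * k * sd is an (estimated) upper bound on sampling noise."""
    return 2 * se_diff(y_t, y_c) / math.sqrt(1 - level)

def noise_clt(y_t, y_c, level=0.99):
    """CLT: the difference in means is approximately normal with sd = se_diff."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * se_diff(y_t, y_c)

def noise_bootstrap(y_t, y_c, level=0.99, reps=2000, seed=0):
    """Nonparametric bootstrap: resample each group with replacement (treating the
    sample as the population) and measure the spread of the re-estimates
    around the original estimate."""
    rng = random.Random(seed)
    est = diff_in_means(y_t, y_c)
    devs = sorted(
        abs(diff_in_means(rng.choices(y_t, k=len(y_t)),
                          rng.choices(y_c, k=len(y_c))) - est)
        for _ in range(reps)
    )
    return 2 * devs[math.ceil(level * reps) - 1]

# One hypothetical sample: 50 treated and 50 controls, true effect 0.2 (assumed).
rng = random.Random(7)
y_t = [1.2 + rng.gauss(0, 1) for _ in range(50)]
y_c = [1.0 + rng.gauss(0, 1) for _ in range(50)]
for name, method in [("Chebyshev (upper bound)", noise_chebyshev),
                     ("CLT", noise_clt),
                     ("Bootstrap", noise_bootstrap)]:
    print(f"{name}: {method(y_t, y_c):.3f}")
```

The CLT and bootstrap figures should be close to each other, while the Chebyshev bound is much wider (about 20 standard errors instead of about 5.15) because it makes no assumption about the shape of the distribution.
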
Here is an example from my class where you can see the true sampling noise (in red) along with its estimate from a single sample, obtained here using the CLT (in blue).
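
The same kind of comparison can be sketched in Python under a hypothetical setup (50 units per arm, normal outcomes with standard deviation 1, true effect 0.2; these are assumptions, not the model behind the figure). The "true" sampling noise is computed by Monte Carlo over 5000 replicated samples, and the CLT estimate uses only one additional sample.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical DGP (an assumption, not the one behind the figure):
# treated outcomes ~ N(1.2, 1), control outcomes ~ N(1.0, 1), true effect 0.2.
TRUE_EFFECT, N_PER_ARM = 0.2, 50

def draw_sample(rng):
    y_t = [1.0 + TRUE_EFFECT + rng.gauss(0, 1) for _ in range(N_PER_ARM)]
    y_c = [1.0 + rng.gauss(0, 1) for _ in range(N_PER_ARM)]
    return y_t, y_c

rng = random.Random(123)
z99 = NormalDist().inv_cdf(0.995)

# "True" 99% sampling noise: spread of the estimates over many replicated samples.
devs = sorted(
    abs(statistics.fmean(t) - statistics.fmean(c) - TRUE_EFFECT)
    for t, c in (draw_sample(rng) for _ in range(5000))
)
true_noise = 2 * devs[math.ceil(0.99 * len(devs)) - 1]

# CLT-based estimate computed from a single sample only.
y_t, y_c = draw_sample(rng)
se_hat = math.sqrt(statistics.variance(y_t) / N_PER_ARM
                   + statistics.variance(y_c) / N_PER_ARM)
clt_noise = 2 * z99 * se_hat

print(f"true sampling noise (Monte Carlo): {true_noise:.3f}")
print(f"CLT estimate from one sample: {clt_noise:.3f}")
```

Re-running with a different seed for the single sample shows how much the one-sample estimate itself varies around the true value.
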
Pretty impressive, right?

Now that we understand what sampling noise is and how to estimate it, we will see in the next installment of the series how p-values deal with it.