Sunday, September 28, 2014

The (Crazy?) Dream of George Udny Yule: Is Non-Experimental Causal Inference Feasible?



In 1899, George Udny Yule published a paper of fundamental importance for economics and statistics. In it, he proposed using the method of ordinary least squares to estimate the causal effect of policy variables on outcomes. Strictly speaking, the theory had been outlined in an 1897 paper; the 1899 paper applies this theory of non-experimental causal inference to estimating the effect of anti-poverty policies on poverty. In doing so, Yule set up a method and a problem that would haunt applied economists for decades and that, more than one hundred years later, is still unsolved. Some of my own research revisits this problem. I like Yule's problem because it is an engineering problem, and some of the most beautiful engineering results in economics stem from trying to solve it, or from checking whether the solutions we have actually work. I also like it because Yule's writing is crystal clear: one would be hard-pressed to come up with a cleaner formulation of both the issue and the solution. This is the first post in a series examining the history of non-experimental causal inference that will end with descriptions of some of my current work.

Before detailing the 1899 paper, I would like to quote the introduction to the 1897 paper, which is extremely clear about what Yule is doing:



For Yule, the main problem facing economists and social scientists is their inability to run experiments. This state of affairs has long plagued economics and led to the emergence of econometrics. Starting in the 1970s, and sometimes even earlier, experimental methods began to be used in economics, and they are now spreading very rapidly.

If you cannot perform experiments, you have to rely on the data of daily experience, a.k.a. observational data. The most pressing problem with observational data is that of confounding factors: factors that simultaneously affect the variable of interest and the outcome variable. To make this concrete, consider Yule's 1899 paper, in which he examines whether the strictness with which antipoverty laws were applied has an impact on poverty rates. In England at the time, two types of relief were offered to the poor: In-relief and Out-relief (see Stephen Stigler's wonderful book for more detail). In-relief, or "indoors relief", was given to able-bodied individuals taken into workhouses, where they performed various tasks in return for their subsistence. Out-relief, or "out of doors relief", was assistance given in their own homes to persons judged unable to work, e.g. the old or the infirm. A larger proportion of Out-relief was considered a sign of lenient administration. In Yule's time, it was claimed that the strictness of the administration was not linked to the poverty rate. In previous work, Yule had shown that this was false: there was a positive correlation between poverty and the proportion of Out-relief. This seemed to favor a stricter application of the antipoverty laws.

But, as Yule was well aware, correlation is not causation. In his 1899 paper, Yule tackles the difficult task of inferring the causal effect of the change in the proportion of Out-relief on the change in poverty. He first lists five distinct groups of causes that might affect changes in the rate of poverty:



He then goes on to explain his interest in the first group in terms that no modern economist would dispute:

Finally, Yule describes the problem of confounding factors with great accuracy:


In Yule's example, an increase in the proportion of the aged would simultaneously increase the proportion of Out-relief and the poverty rate. The correlation between Out-relief and poverty would then be positive but not causal: there would be more poverty not because there is more Out-relief, but because there are more elderly people. Distinguishing mere correlation from true causality is crucial here, since we want to know whether a policy, and the way it is implemented, can affect poverty, not whether being older increases poverty. Yule claims to have an adequate method for solving the problem of confounding factors.
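A small simulation makes the aged-population example concrete. It is a sketch under invented assumptions, not Yule's data: the (standardized) share of aged people raises both Out-relief and poverty, while Out-relief is given no effect at all on poverty. A simple least-squares regression of poverty on Out-relief nevertheless finds a clearly positive slope.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1899)
n = 10_000
# Hypothetical data (standardized units): the aged share drives BOTH variables.
aged = [random.gauss(0, 1) for _ in range(n)]
out_relief = [0.8 * a + random.gauss(0, 1) for a in aged]
# Out-relief has a true effect of zero on poverty; only the aged share matters.
poverty = [0.0 * o + 1.5 * a + random.gauss(0, 1) for o, a in zip(out_relief, aged)]

print(f"Raw slope of poverty on Out-relief: {ols_slope(out_relief, poverty):.2f}")
```

With these made-up parameters, the population value of that slope is 0.8 × 1.5 / (0.8² + 1) = 1.2 / 1.64 ≈ 0.73, even though the true effect of Out-relief is zero.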

What is this method? Suppose, says Yule, that you had data on all these variables. If you used these data to estimate the parameter B in the following simple model, you would have a problem:

 
So what can we do?


Yule argues that by estimating the parameters of this simple multiple linear regression, he can measure the effect of the policy net of all the other influences, ceteris paribus. Regression and econometric adjustment have taken the place of the experiment. By using data and econometrics to control for the influence of confounding factors, Yule claims, we can conduct causal inference almost as well as in an experiment. Hence, we call this type of approach non-experimental causal inference. The key question, obviously, is whether the chance of error is now much smaller than before, as Yule argues, or whether some missing external influence still contaminates the relationship of interest. This is what we will examine in the next posts. The methodological impact of Yule's work has been far-reaching: today, most social scientists still use the linear model to estimate causal effects. Yule was well aware that the true relationships were not linear, but he argued that he could capture an average effect.
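The danger that an omitted influence contaminates the estimate can be stated with the standard omitted-variable-bias formula. The notation is mine, not Yule's, and assumes a single confounder: y is the change in poverty, x the change in Out-relief, and z a confounding factor such as the change in the proportion of the aged.

```latex
\begin{aligned}
y_i &= \alpha + \beta x_i + \gamma z_i + u_i,
  \qquad \operatorname{Cov}(x_i, u_i) = \operatorname{Cov}(z_i, u_i) = 0,\\
\operatorname{plim} \hat{\beta}_{\text{short}}
  &= \beta + \gamma \, \frac{\operatorname{Cov}(x_i, z_i)}{\operatorname{Var}(x_i)}
  \quad \text{(regressing } y \text{ on } x \text{ alone).}
\end{aligned}
```

The second term is the bias from leaving z out: it vanishes only if z has no effect on y (γ = 0) or is uncorrelated with x. Adding z to the regression, as Yule does with the proportion of old people and the population, removes this particular term, but the same formula applies to any factor that is still left out.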

In practice, Yule estimates the parameters of equation (2) above by minimizing the sum of the squared residuals of this equation. He finds the following result:


An increase in Out-relief of 10 percent increases poverty by roughly 3 percent. Thus, the positive raw correlation between Out-relief and poverty did not seem to mask a negative relationship. The effect is much smaller, though, after accounting for the proportion of old people and the population: the raw correlation coefficient was .38, almost a third larger. Thus, multiple regression makes it possible to pin down causal effects more efficiently and precisely.
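The least-squares step itself is easy to sketch. The code below (standard-library Python, with invented data rather than Yule's) solves the normal equations (X'X)b = X'y, whose solution is exactly the minimizer of the sum of squared residuals, and compares the Out-relief coefficient with and without controls for the aged and the population. The true Out-relief effect of 0.5 and all other parameters are hypothetical.

```python
import random

def ols(X, y):
    """Least-squares coefficients b minimizing sum((y - X b)**2).

    X is a list of rows whose first entry is 1.0 (the intercept). The normal
    equations (X'X) b = X'y are solved by Gauss-Jordan elimination with
    partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    # A is now diagonal in its first k columns.
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(1881)
n = 5_000
# Hypothetical standardized changes; Out-relief depends on both confounders.
aged = [random.gauss(0, 1) for _ in range(n)]
pop = [random.gauss(0, 1) for _ in range(n)]
out = [0.8 * a - 0.4 * p + random.gauss(0, 1) for a, p in zip(aged, pop)]
# Assumed true effect of Out-relief on poverty: 0.5.
poverty = [0.5 * o + 1.5 * a - 0.5 * p + random.gauss(0, 1)
           for o, a, p in zip(out, aged, pop)]

short_fit = ols([[1.0, o] for o in out], poverty)
long_fit = ols([[1.0, o, a, p] for o, a, p in zip(out, aged, pop)], poverty)
print(f"Out-relief coefficient, no controls:   {short_fit[1]:.2f}")
print(f"Out-relief coefficient, with controls: {long_fit[1]:.2f}")
```

Without controls, the confounders push the coefficient toward 2.3 / 1.8 ≈ 1.28; with both confounders included, it comes out close to the assumed 0.5. That is Yule's claim in miniature, and it holds here only because the simulation puts every confounder into the regression.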

Yule's paper is extremely rich, and any student could take inspiration from it. For example, Yule uses his estimates to decompose the total change in the poverty rate into the parts due to the policy and to other factors:


We can see that even if unknown factors were responsible for most of the decrease in poverty in rural and mixed areas, the increased severity of the relief policy (and the resulting decrease in Out-relief) was a major factor in reducing the urban poverty rate, at least between 1871 and 1881.
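A decomposition of this kind is simple arithmetic once a regression with an intercept has been fitted: least-squares fitted values pass through the sample means, so the average change in poverty splits exactly into the intercept plus each coefficient times the average change in its regressor. The sketch below assumes this is the accounting behind such a table and uses made-up coefficients and average changes, not Yule's figures; reading the intercept as the "unknown factors" term is likewise an interpretation.

```python
# Hypothetical fitted coefficients (NOT Yule's estimates) for a regression of the
# change in poverty on changes in Out-relief, proportion of aged, and population.
coefs = {"intercept": -20.0, "out_relief": 0.5, "aged": 0.2, "population": -0.3}
# Hypothetical average changes of each regressor over the decade, for one group of areas.
mean_changes = {"out_relief": -30.0, "aged": 5.0, "population": 10.0}

def decompose(coefs, mean_changes):
    """Split the predicted average change in poverty into additive parts.

    With an intercept, OLS fitted values pass through the sample means, so
    mean(dy) = intercept + sum_k coef_k * mean(dx_k). Each coef_k * mean(dx_k)
    is the part attributed to factor k; the intercept collects everything else.
    """
    parts = {k: coefs[k] * m for k, m in mean_changes.items()}
    parts["unmodeled (intercept)"] = coefs["intercept"]
    return parts, sum(parts.values())

parts, total = decompose(coefs, mean_changes)
for name, value in parts.items():
    print(f"{name:>22}: {value:+.1f}  ({value / total:.0%} of total)")
print(f"{'total':>22}: {total:+.1f}")
```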

Yule's beautiful analysis has withstood the test of time. The main question for its validity is whether the important factors that have been left out, which account for the majority of the variation in poverty, are also correlated with Out-relief. If they are, they still bias Yule's estimates. This is the key question that non-experimental causal inference has to answer. That is, we still do not know whether the method Yule devised is really competent to extract causal relationships from the data of daily experience.
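The worry can be illustrated by adding an unmeasured factor to a simulation. In this sketch (invented parameters, not Yule's data), a hypothetical hidden variable, think of local labor-market conditions, raises both Out-relief and poverty; the analyst controls for the aged but cannot observe the hidden factor.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)

def two_regressor_ols(x1, x2, y):
    """Slopes of y on x1 and x2 (with intercept), from the covariance normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(1871)
n = 10_000
aged = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical unobserved factor: it raises both Out-relief and poverty.
hidden = [random.gauss(0, 1) for _ in range(n)]
out = [0.8 * a + 0.6 * h + random.gauss(0, 1) for a, h in zip(aged, hidden)]
# Assumed true effect of Out-relief on poverty: 0.5.
poverty = [0.5 * o + 1.5 * a + 1.0 * h + random.gauss(0, 1)
           for o, a, h in zip(out, aged, hidden)]

# The analyst controls for the aged share but cannot include `hidden`.
b_out, b_aged = two_regressor_ols(out, aged, poverty)
print(f"Out-relief coefficient controlling for aged only: {b_out:.2f} (true value 0.5)")
```

In large samples the Out-relief coefficient approaches 0.5 + 0.6 / 1.36 ≈ 0.94 rather than the true 0.5: controlling for the aged does not help with a confounder nobody measured, which is precisely the open question about Yule's estimates.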

I've also just become aware that Yule can be credited as the inventor of the Difference-in-Differences Matching method that I study in my own work. More on this in subsequent posts.
