Sunday, September 28, 2014

The (Crazy?) Dream of George Udny Yule: Is Non-Experimental Causal Inference Feasible?



In 1899, George Udny Yule published a fundamental paper for economics and statistics. In this paper, he proposed using the method of ordinary least squares to estimate the causal effect of policy variables on outcomes. Actually, the theory was outlined in an 1897 paper, and the 1899 paper applies this theory of non-experimental causal inference to the estimation of the effect of anti-poverty policies on poverty. In so doing, Yule set up a method and a problem that would haunt applied economists for decades and that, more than one hundred years later, is still unsolved. Some of my own research revisits this problem. I like Yule's problem because it is an engineering problem, and some of the most beautiful engineering results in economics stem from trying to solve it, or from checking whether the solutions we have actually work. I also like it because Yule's writing is crystal clear: one would be hard-pressed to come up with a cleaner formulation of both the issue and the solution. This is the first post in a series examining the history of non-experimental causal inference that will end with a description of some of my current work.

Before detailing the 1899 paper, I would like to quote the introduction to the 1897 paper, which is extremely clear about what Yule is doing:



For Yule, the main problem facing economists and social scientists is their inability to run experiments. This state of affairs has long plagued economics and led to the emergence of econometrics. Starting in the 1970s, and sometimes even earlier, experimental methods began to be used in economics, and they are now spreading very fast.

If you cannot perform experiments, you have to rely on the data of daily experience, a.k.a. observational data. The most pressing problem with observational data is that of confounding factors: factors that simultaneously affect the variable of interest and the outcome variable. To be more concrete, consider Yule's 1899 paper, in which he examines whether the strictness with which antipoverty laws were applied had an impact on poverty rates. In England at the time, two types of relief were offered to the poor: In-relief and Out-relief (see Stephen Stigler's wonderful book for more detail). In-relief, or "indoors relief," was given to able-bodied individuals taken into workhouses, where they performed various tasks in return for their subsistence. Out-relief, or "out of doors relief," was assistance given in their own homes to persons judged unable to work, e.g. the old or the infirm. A larger proportion of Out-relief was considered to signal a lenient administration. It was claimed in Yule's time that the strictness of the administration was not linked with the poverty rate. In previous work, Yule had shown that this was false and that there was a positive correlation between poverty and the proportion of Out-relief. This seemed to favor a stricter application of the antipoverty laws.

But, as Yule was well aware, correlation is not causation. In his 1899 paper, Yule tackles the difficult task of inferring the causal effect of the change in the proportion of Out-relief on the change in poverty. Yule first lists five distinct groups of causes that might affect changes in the rate of poverty:



He then goes on to explain his interest in the first group in a way that no modern economist would deny:

Finally, Yule describes the problem of confounding factors with great accuracy:


An increase in the proportion of the aged, in Yule's example, would simultaneously increase the proportion of Out-relief and the poverty rate. The correlation between Out-relief and poverty would then be positive but not causal: there would be more poverty not because there is more Out-relief, but because there are more elderly people. Distinguishing mere correlation from true causality is crucial here, since we want to know whether a policy, and the way it is implemented, can affect poverty, not whether being older increases poverty. Yule claims that he has an adequate method to solve the problem of confounding factors.
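A small simulation makes the mechanism concrete. It is a minimal sketch on invented data, not Yule's: it assumes that, across hypothetical unions, a rise in the share of the aged pushes up both Out-relief and poverty, while Out-relief itself has no effect on poverty at all. The variable names and coefficients are illustrative choices.

```python
import numpy as np

# Hypothetical simulated unions; none of these numbers come from Yule's data.
rng = np.random.default_rng(0)
n = 2000

d_old = rng.normal(0.0, 1.0, n)                # change in the share of the aged
d_out = 0.8 * d_old + rng.normal(0.0, 1.0, n)  # the aged raise Out-relief...
d_pov = 0.9 * d_old + rng.normal(0.0, 1.0, n)  # ...and poverty; Out-relief plays no role

raw_corr = np.corrcoef(d_out, d_pov)[0, 1]
print(f"raw correlation between Out-relief and poverty: {raw_corr:.2f}")
```

With these assumed coefficients the population correlation is 0.72 / √(1.64 × 1.81) ≈ 0.42, so the simulated raw correlation comes out clearly positive even though the causal effect of Out-relief is zero by construction.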

What is this method? Suppose, says Yule, that you had data on all these variables. If you used these data to estimate the parameter B in the following simple model, you would have a problem:
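As a sketch in notation chosen here for illustration (it need not match Yule's own), write ΔPaup_i for the change in the poverty rate of union i, ΔOut_i for the change in the proportion of Out-relief, and ΔOld_i and ΔPop_i for the changes in the proportion of the aged and in the population. The simple model, and the textbook omitted-variable result that explains the problem, would then read:

```latex
\begin{aligned}
\Delta\mathrm{Paup}_i &= A + B\,\Delta\mathrm{Out}_i + U_i,
  \qquad U_i = C\,\Delta\mathrm{Old}_i + D\,\Delta\mathrm{Pop}_i + \varepsilon_i,\\
\operatorname*{plim}_{n\to\infty}\hat{B}_{\text{simple}}
  &= B + \frac{\operatorname{Cov}(\Delta\mathrm{Out}_i,\,U_i)}{\operatorname{Var}(\Delta\mathrm{Out}_i)}.
\end{aligned}
```

In Yule's example, where a larger share of the aged raises both Out-relief and poverty, the covariance term is positive (leaving the population term aside), so the simple estimate overstates the effect of Out-relief.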

 
So what can we do?
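Yule's answer is to bring the confounders into the regression itself. In illustrative notation (ΔPaup_i, ΔOut_i, ΔOld_i and ΔPop_i for the changes in the poverty rate, the proportion of Out-relief, the proportion of the aged and the population of union i; not necessarily Yule's symbols), the multiple regression takes the form:

```latex
\Delta\mathrm{Paup}_i = A + B\,\Delta\mathrm{Out}_i + C\,\Delta\mathrm{Old}_i + D\,\Delta\mathrm{Pop}_i + \varepsilon_i .
```

Least squares on this equation recovers B only if the remaining error ε_i is uncorrelated with ΔOut_i once ΔOld_i and ΔPop_i are held fixed.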


Yule argues that by estimating the parameters of this multiple linear regression, he can measure the effect of the policy net of all the other influences, ceteris paribus. The regression and econometric adjustment have taken the place of the experiment. By using data and econometrics to control for the influence of the confounding factors, Yule claims, we can conduct causal inference almost as well as in an experiment. Hence the name we give to this type of approach: non-experimental causal inference. The key question, obviously, is whether the chance of error is now much smaller than before, as Yule argues, or whether some omitted external influence still contaminates the relationship of interest. This is what we will examine in the next posts.

The methodological impact of Yule's work has been far-reaching. Today, most social scientists still use the linear model to estimate causal effects. Yule was well aware that the true relationships were not linear, but he argued that he could capture an average effect.

In practice, Yule estimates the parameters of equation (2) above by minimizing the sum of the squared residuals of this equation. He finds the following result:


An increase in Out-relief of 10 percent increases poverty by roughly 3 percent. Thus the positive raw correlation between Out-relief and poverty did not seem to mask a negative relationship. The size of the effect is much smaller, though, once the proportion of the aged and the population are accounted for. Indeed, the raw correlation coefficient was .38, almost a third more. Thus, multiple regression enables one to pin down causal effects more efficiently and precisely.
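To see what minimizing the sum of squared residuals does in this setting, here is a minimal sketch on simulated data comparing the simple and the multiple regression. The data-generating process, including the assumed causal effect of 0.5, is hypothetical and unrelated to Yule's actual figures.

```python
import numpy as np

# Hypothetical simulated unions, not Yule's data.
rng = np.random.default_rng(1)
n = 2000
TRUE_B = 0.5  # assumed causal effect of Out-relief

d_old = rng.normal(0.0, 1.0, n)                # change in the share of the aged
d_pop = rng.normal(0.0, 1.0, n)                # change in population
d_out = 0.8 * d_old + rng.normal(0.0, 1.0, n)  # the aged also raise Out-relief
d_paup = (1.0 + TRUE_B * d_out + 0.9 * d_old - 0.2 * d_pop
          + rng.normal(0.0, 1.0, n))


def ols(y, *regressors):
    """Coefficients (intercept first) minimizing the sum of squared residuals."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_simple = ols(d_paup, d_out)[1]
b_multiple = ols(d_paup, d_out, d_old, d_pop)[1]
print(f"simple regression:   B = {b_simple:.2f}")
print(f"multiple regression: B = {b_multiple:.2f}")
```

Under these assumptions the simple slope should settle near 0.5 + 0.9 × 0.8 / 1.64 ≈ 0.94, while the multiple regression, which includes the aged among the regressors, recovers roughly 0.5.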

Yule's paper is extremely rich, and any student could take inspiration from it. For example, Yule uses his estimates to decompose the total change in the poverty rate into the part due to the policy and the part due to other factors:


We can see that although unknown factors were responsible for the majority of the poverty decrease in rural and mixed areas, the increase in the severity of the relief policy (and the resulting decrease in Out-relief) has been a major factor in decreasing the urban poverty rate, at least between 1871 and 1881.
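The bookkeeping behind such a decomposition can be sketched as follows. This is one simple way of doing it, not necessarily Yule's exact procedure, and every number below is invented for illustration: each measured factor is credited with its estimated coefficient times its observed change, and whatever is left over is attributed to unknown factors.

```python
def decompose(total_change, coefs, changes):
    """Split a total change in the poverty rate into one part per regressor
    (its coefficient times its observed change) plus an unexplained remainder.
    Any constant term is lumped into the remainder in this sketch."""
    parts = {name: coefs[name] * changes[name] for name in coefs}
    parts["unexplained"] = total_change - sum(parts.values())
    return parts


# Hypothetical coefficients and changes, chosen only to show the bookkeeping.
coefs = {"out_relief": 0.4, "old": 0.5, "population": -0.2}
changes = {"out_relief": -20.0, "old": 4.0, "population": 10.0}
parts = decompose(-15.0, coefs, changes)
for name, value in parts.items():
    print(f"{name:>12}: {value:+.1f}")
```

By construction the parts add up to the total change; a large remainder means that most of the change comes from factors outside the model.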

Yule's beautiful analysis withstands the test of time. The main question for its validity is whether the important factors that have been left out, and that account for the majority of the variation in poverty, are also correlated with Out-relief. If so, they still bias Yule's estimates. This is the key question that non-experimental causal inference has to answer: we still do not know whether the method Yule devised is really competent to extract causal relationships from the data of daily experience.
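A minimal simulation, on hypothetical data with an assumed causal effect of 0.5, illustrates why the correlation between the left-out factors and Out-relief is what matters: the same omitted factor leaves the estimate roughly unbiased when it is unrelated to Out-relief, and biases it when it moves together with Out-relief. The function name and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
TRUE_B = 0.5  # assumed causal effect of Out-relief (hypothetical)


def estimate_b(link):
    """OLS slope on Out-relief when a left-out factor, whose influence on
    Out-relief is governed by `link`, also raises poverty."""
    hidden = rng.normal(0.0, 1.0, n)                      # never observed
    d_out = link * hidden + rng.normal(0.0, 1.0, n)
    d_paup = TRUE_B * d_out + hidden + rng.normal(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), d_out])              # hidden factor omitted
    coef, *_ = np.linalg.lstsq(X, d_paup, rcond=None)
    return coef[1]


b_uncorrelated = estimate_b(0.0)
b_correlated = estimate_b(0.8)
print(f"left-out factor unrelated to Out-relief: B = {b_uncorrelated:.2f}")
print(f"left-out factor related to Out-relief:   B = {b_correlated:.2f}")
```

With these assumed numbers the second estimate should land near 0.5 + 0.8 / 1.64 ≈ 0.99, as the omitted-variable formula predicts, while the first stays near 0.5.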

I have also just become aware that Yule can be credited as the inventor of the Difference-in-Differences Matching method that I study in my own work. More on this in subsequent posts.

Monday, September 15, 2014

History of econometrics: special conference session at the EEA-ESEM meeting

One of my secret manias (and one that I am less ashamed to admit than others) is an interest in the history of econometrics. With such an inclination, the two successive sessions on the history of econometric thought at the EEA-ESEM meeting were obviously an absolute delight to me. Here, I am going to try to convey some of the excitement I got out of these sessions by giving a quick summary of the papers presented. This post will also appear in the next issue of the TSE Mag.

In the first talk of the first session, John Aldrich detailed how the father of modern econometrics, Trygve Haavelmo, contributed to the understanding and formalization of the concept of causality. John especially emphasized the modernity of Haavelmo's views on this topic and how they have resurfaced in more recent literature. Marcel Boumans described the interaction between Milton Friedman and the members of the Cowles Commission, the founders of modern econometrics, in Chicago in the late forties. Marcel gave a fascinating account of how deeply Friedman was involved in the methodological debates of the time, which eventually led to his Essays in Positive Economics. Friedman was then extremely critical of the heavily theoretical approach to econometrics promoted by the Commission's president at the time, Tjalling Koopmans (Marcel quoted from an unpublished assessment by Friedman of the work of the Cowles Commission). At some point, Friedman challenged Koopmans and one of his protégés, Lawrence Klein, to check whether the complex macroeconomic simultaneous-equations model estimated on pre-war data could predict macroeconomic conditions in 1948. This was a dismal failure and led Klein to reassess his model. Marcel put this debate in perspective by recalling how it related to the views of the early economists Alfred Marshall on one side and Léon Walras on the other. Till Dueppe presented a thorough account of a trip taken by Tjalling Koopmans to the Soviet Union in 1976. Koopmans was eager to find out whether the linear programming methods developed by Leonid Kantorovich were applied in practice by the Soviet planning bureau. What he found was a very dispirited Kantorovich, who said that the planners might have read about them: at least they existed somewhere in published form. Koopmans was also somewhat disappointed by the quality of the economics research he found there.
On the other hand, he was extremely impressed to meet some of the most outstanding mathematicians in the world.

The next session started with Olav Bjerkholt's meticulously detailed account of the first meeting of the Econometric Society, held in Lausanne in 1931. What struck me most was the deep imprint that Ragnar Frisch left on this meeting and on the early years of the Econometric Society. While François Divisia, the Vice President officially in charge of organizing the meeting, was preoccupied with such urgent matters as choosing a suitable French translation of "econometrician," Frisch used his spare time while recovering from a skiing accident to write a sketch of a programme and one of the three papers he presented at the conference. Frisch also insisted that the conference open with a presentation of some of the work of six of the founding fathers of economics (Marshall was not one of them). This echoes evidence given by Marcel Boumans that early issues of Econometrica published handwritten notes by Walras, along with his correspondence with other economists. Duo Qin advocated bringing the terms "autonomy" and "confluence" back into modern econometrics textbooks. Frisch coined these terms to distinguish what we now call structural and reduced-form relationships; the latter notions were in fact derived from the earlier ones by Tjalling Koopmans. The great advantage of the term "autonomy" is that it emphasizes that we are looking for invariant relationships, which remain true when the other relations in the economy change. The Lucas critique is in a sense a mere application of the notion of autonomy, as Lucas acknowledged explicitly in his paper (see the reference to Marschak in footnote 3). Cléo Chassonery-Zaïgouche traced the evolution of the empirical analysis of discrimination over the years. She showed in particular that the traditional econometric techniques developed in the Cowles tradition in the 70s are progressively being replaced by experiments.
This trend is not apparent in the number of papers published in top journals, but it is clear in their influence: experimental papers receive many more citations. Finally, Jan Höffler presented an exciting collaborative wiki project on replication in economics. With his colleagues, he has compiled a list of papers published in top journals, with links to their data sets and code. He and his students are now trying to replicate the authors' findings. Since it is a collaborative project, anyone can report the results of their own replication exercise on the web page. An exciting exercise for students in econometrics.