An unprecedented crisis in social science, comparable to the 2008 financial crisis

Timeline of the reproducibility crisis:

By around 2011, the replication crisis was perhaps not yet a crisis. Here are some events that I think were important:

1960s-1970s: Paul Meehl argued that the standard paradigm of experimental psychology research was fundamentally flawed. In his view, the paradigm amounted to long chains of loosely connected experiments carried out by enthusiastic, clever researchers (to an uncritical reader, these can look like an impressively comprehensive research program), which, with little refutation or corroboration, slowly feel their way along a slender theoretical web. Psychologists all know of Meehl, but most ignored his warning. Robert Rosenthal, for example, published a well-known article on the "file drawer problem" (the selective non-publication of null results), yet this too was treated as a small, separate issue rather than a symptom of the same larger problem.

1960s: Jacob Cohen studied statistical power and advanced what was then a new idea: that sound design and data collection are essential to good psychological research. Some research groups brought Cohen's methods and terminology into practice, but they sidestepped a crucial issue: the systematic overestimation of real-world effect sizes.
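
To make Cohen's point concrete, here is a minimal sketch of a power calculation for a two-sided, two-sample comparison (my own illustration, not from the original post; the effect size and sample size are made up). With a "small" standardized effect of 0.2 and 50 subjects per group, power is only about 17%, far below the conventional 80% target.

```python
# Minimal sketch (illustrative numbers): approximate power of a
# two-sided, two-sample z test with unit-variance groups.
import math
from scipy import stats

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power for a standardized mean difference."""
    se = math.sqrt(2 / n_per_group)          # SE of the difference in means
    z_crit = stats.norm.ppf(1 - alpha / 2)   # critical value, e.g. 1.96
    z_eff = effect_size / se
    return (1 - stats.norm.cdf(z_crit - z_eff)
            + stats.norm.cdf(-z_crit - z_eff))

print(f"{power_two_sample(0.2, 50):.2f}")    # -> about 0.17
```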

In 1971, Tversky and Kahneman published "Belief in the law of small numbers," their first study of persistent errors in human cognition. Their early work focused on researchers' misunderstandings of uncertainty and variability (especially, but not limited to, p-values and statistical significance), but they soon moved on to a more general line of research. They do not seem to have realized what their early findings implied for research practice.

1980s-1990s: Null hypothesis significance testing became more and more controversial in psychology. Unfortunately, this was treated as merely a methods problem rather than a research problem. The message, in effect, was: the research program is fine; all we need is to tweak the analysis a bit!

In 2006, I first heard of Satoshi Kanazawa, an evolutionary psychologist who published a series of papers with provocative claims (for example, that engineers are more likely to have sons and nurses more likely to have daughters). Almost every one of those papers was later found to contain statistical errors. I knew, of course, that statistical errors exist, but at the time I did not realize that for this type of research, the extremely low signal-to-noise ratio meant these studies were doomed from the start.
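
Why does a low signal-to-noise ratio doom such studies? A minimal simulation sketch (my own illustration, with made-up numbers): when the true effect is tiny relative to the noise, the rare estimates that clear the significance bar are wild exaggerations, and a large share of them even point the wrong way.

```python
# Minimal sketch (made-up numbers): a tiny true effect measured with
# large noise. Condition on "statistical significance" and see what the
# surviving estimates look like.
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.01    # e.g., a 1-point shift in percentage terms
noise_sd = 0.20       # standard error of each study's estimate
estimates = rng.normal(true_effect, noise_sd, 100_000)

significant = np.abs(estimates) > 1.96 * noise_sd   # |z| > 1.96
print(f"share significant:              {significant.mean():.3f}")
print(f"mean |estimate| if significant: "
      f"{np.abs(estimates[significant]).mean():.3f}")
print(f"wrong sign if significant:      "
      f"{(estimates[significant] < 0).mean():.3f}")
# The significant estimates average roughly 45x the true effect, and
# nearly half of them have the wrong sign.
```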

In 2008, Edward Vul, Christine Harris, Piotr Winkielman, and Harold Pashler circulated a controversial article titled "Voodoo Correlations in Social Neuroscience." They argued that these technical problems were not confined to a few published papers: the statistical issues were undermining an entire research field, and many prominent findings would lose credibility.

Also in 2008, the blog Neuroskeptic began criticizing scientific hype. I don't know how influential Neuroskeptic was, but it does mark the shift of science blogging from traditional political topics toward internal criticism of science itself.

In 2011, Joseph Simmons, Leif Nelson, and Uri Simonsohn published an article in Psychological Science titled "False-Positive Psychology," which introduced a very useful term: "researcher degrees of freedom." Later they coined another: "p-hacking." The point is that researchers can exploit their many degrees of freedom in collecting and analyzing data to attain (manipulate their way to) statistical significance.
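
To see how potent these degrees of freedom are, here is a minimal simulation sketch in the spirit of Simmons et al.'s demonstrations (my own illustration, not their code). With no true effect at all, merely peeking at the data early and trying a second outcome measure pushes the false-positive rate to roughly triple the nominal 5%.

```python
# Minimal sketch (illustrative): "researcher degrees of freedom" with NO
# true effect. The analyst (a) tests at n=20 and again at n=40 (optional
# stopping) and (b) tries two different outcome measures, reporting a
# finding if ANY of those tests gives p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, hits = 10_000, 0
for _ in range(n_sims):
    data = rng.normal(size=(2, 2, 40))   # [group, outcome, subject], all null
    sig = False
    for outcome in range(2):
        for n in (20, 40):               # peek early, then add subjects
            _, p = stats.ttest_ind(data[0, outcome, :n],
                                   data[1, outcome, :n])
            sig = sig or p < 0.05
    hits += sig

print(f"false-positive rate: {hits / n_sims:.3f}")   # well above 0.05
```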

In the same year, Simonsohn published a critique of the famous "Dennis the dentist" paper (which claimed that a person named Dennis is more likely to become a dentist). This may not be an unusually important moment in the history of psychology, but it mattered a great deal to me, because I had accepted that paper's conclusion without a second thought. I had not realized that empirical research could harbor such serious problems.

2011: Daryl Bem published "Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect" in a top psychology journal. Few people believed Bem had actually discovered ESP, yet his methods were generally regarded as sound, which made the episode genuinely worrying for psychological research. As the New York Times reported, the journal's editor, Charles Judd, a psychologist at the University of Colorado, said the paper had passed the journal's regular review process: four reviewers evaluated the manuscript, and they were trustworthy people.

However, Bem's article in fact had a glaring multiple-comparisons problem, and neither the journal's editor nor his four reviewers knew to look for it. In 2011, we were simply not good at thinking about this type of problem.

Looking back from this point, we can see that many earlier articles had similar problems, which means the flaws in these research methods were no longer isolated issues: they were seriously damaging the process of scientific research. Related papers followed, such as John Ioannidis's 2005 article "Why Most Published Research Findings Are False" and a 2007 paper by Nicholas Christakis and James Fowler arguing that obesity is contagious. Ioannidis's article is now a classic, but when it first appeared, most people did not accept it. Christakis and Fowler's paper was considered a great idea at the time, but now we do not take it so seriously. My point is that although all of this happened, we still did not take the issue seriously enough.

So by 2011, people were gradually realizing that something might be wrong, though it was not yet clear how big the problem was. Most of us (myself included) had not grasped that this fatal multiple-comparisons problem would show up in so many published studies; or, put differently, that uncontrolled "researcher degrees of freedom" were what produced all those "statistically significant" results.

2011: A wave of academic-misconduct stories suddenly hit the news. Diederik Stapel was forced out of Tilburg and Marc Hauser out of Harvard. These events drew everyone's attention to the Retraction Watch blog. I noticed that, as a rule, these researchers, so confident in their hypotheses, failed to offer effective explanations or evidence in response to the doubts raised about their work.

2012: Gregory Francis published an article called "Too good to be true." It sparked a series of controversies by suggesting that suspiciously consistent strings of significant results may be the product of selection bias.

During this period, I received a stream of terrible papers, all of which used very weak data to reach very extreme conclusions. The journal title "Psychological Science" began to turn into a punch line.

What followed was a series of replication campaigns, and some celebrated findings failed to replicate. First, Bem's startling results could not be reproduced: although he claimed to have successfully replicated his own experiments, his meta-analysis was entirely unconvincing. Then came a string of other failed replications of psychological research.

Meanwhile, the very famous Proceedings of the National Academy of Sciences (PPNAS) began publishing articles that were of poor quality but very popular in the media. These articles were edited by Susan Fiske of Princeton University.

And so it went, one thing after another, over the following two years.

It rained real hard, and it rained for a real long time

The above is a long and detailed timeline. For a long time, it seemed, nothing happened; as late as 2011, Daniel Kahneman was still praising the outstanding contributions of these studies.

Then, all of a sudden, the whole world flipped over.

If you have grown comfortable inside the traditional system, suddenly having to confront change can be distressing. It is a bit like this: if Fiske held shares in a failing company, it would be no surprise to see her step up and talk the company up. The analogy is imperfect, but what Fiske can still do is cut her losses, admit her mistakes, and move on.

So who is Susan Fiske, anyway? And why does she think "methodological terrorism" is all around her? On the latter point I am not sure, because she never points to a specific "terrorist" or a specific "terrorist act." Her article offers no evidence, though it does mention a few things.

I first came across Fiske as the editor of some of those weak PPNAS (Proceedings of the National Academy of Sciences) articles mentioned earlier. So in some cases, at least, her judgment in social science is poor.

Or put it this way: she is living in 2016 but still thinking about problems the way we did in 2006. Ten years ago, I too might not have paid much attention to the papers on himmicanes and air rage. Under the influence of Simonsohn and others, I have become far more cautious than before, even about published papers. It took many of us a long time to reach the place where Meehl had already been standing decades earlier.

Fiske's own published papers are problematic too. Since I have not read much of her work, I will not comment broadly on her research. Below are some observations on one of her papers that Nick Brown sent me.

Brown read an article titled "This Old Stereotype: The Pervasiveness and Persistence of the Elderly Stereotype" by Amy J. C. Cuddy, Michael I. Norton, and Susan T. Fiske (Journal of Social Issues, 2005) and found numerous errors in it.

First, the article's main conclusions rest on t statistics of 5.03 and 11.14. However, recomputed from the reported data, these values should actually be about 1.8 and 3.3. So one of the conclusions is not even statistically significant.
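
This kind of check requires no special access: a t statistic can be recomputed directly from the summary numbers a paper reports. A minimal sketch with made-up numbers (not the actual data from the Cuddy, Norton, and Fiske article):

```python
# Minimal sketch (made-up numbers): recomputing a two-sample t statistic
# from reported means, standard deviations, and group sizes.
import math

def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch two-sample t statistic from summary statistics."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (m1 - m2) / se

# Any reader can re-derive the t value from the numbers in a paper's table:
print(round(t_from_summary(3.9, 1.2, 25, 3.3, 1.1, 25), 2))   # -> 1.84
```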

But that is not the worst of it. It turns out that some of the numbers reported in the article simply cannot be correct. Perhaps the authors made computational mistakes, for example in rounding. A rounding error may not sound like a big deal, but it provides exactly the kind of "degree of freedom" that lets researchers nudge the data toward the result they want.
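
"Numbers that simply cannot be correct" is a checkable claim. Here is a minimal sketch in the spirit of the GRIM test that Nick Brown later developed with James Heathers (my own illustration; the post itself does not name the test): for integer-valued items, a reported mean must equal some integer total divided by the sample size, so many candidate means are arithmetically impossible.

```python
# Minimal sketch (illustrative): a GRIM-style consistency check. For
# integer responses (e.g., a 1-7 scale), the mean must be an integer
# total divided by n; a reported mean that no total can produce is wrong.
def grim_consistent(reported_mean, n, decimals=2):
    """Can any integer item total, divided by n, round to the reported mean?"""
    return any(round(total / n, decimals) == reported_mean
               for total in range(n, 7 * n + 1))   # totals on a 1-7 scale

print(grim_consistent(3.27, 15))   # 49/15 = 3.2667 rounds to 3.27 -> True
print(grim_consistent(3.21, 15))   # no total/15 rounds to 3.21    -> False
```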

There are more problems. In short, Cuddy, Norton, and Fiske made many data errors, and, while the errors themselves are not the worst of it, they refused to reconsider their claims when the errors were pointed out to them. Their theory is so expansive that it can explain any result from any angle.

That is why the authors can claim that correcting these errors does not change the article's final conclusion, which is absurd and yet, in a way, "reasonable." Absurd, because the original conclusion rested on a statistically significant p-value that no longer exists once the errors are fixed. "Reasonable," because the article's conclusion never depended on any of the details: all that matters is that a p-value below 0.05 turns up somewhere, and then they can publish whatever they want.

When the authors claim these errors do not matter much, you realize that in this project, perhaps the data itself never mattered at all.

Why do I dwell on these details? Is it malicious slander? Or is it that Fiske attacked the scientific reformers, so the reformers are giving her a hard time? Neither. The problem is not that Fiske mishandles data or that she is a bad journal editor; it is that she is still using a paradigm that should no longer be used, one that should have been abandoned as early as the 1960s, when Meehl first discussed it.

I am not saying that none of Fiske's research is reproducible, or that most of it, or a third of it, fails to replicate. I have not studied this and do not know. What I am saying is that her way of doing research shows she is working in the paradigm of ten years ago. That paradigm was indeed the standard at the time, but it no longer is, and this helps explain her discomfort with the present moment.

Fiske's collaborators and students seem to use the same research paradigm: not very rigorous about hypothesis testing, about statistical results, or in the face of criticism.

Another point to emphasize is that statistics is central to this issue. If people like Fiske hate statistical methods, fine: they could design experiments clean enough to be understood without them. But they did not. Their conclusions rest on p-values computed from noisy data, and as long as a p-value comes in below 0.05, they will believe anything.

Errors breed recklessly. Once a researcher lets one error in, more follow; once you stop caring about your numbers, anything can happen. I am reminded of a notorious paper by Richard Tol. It is scarcely an exaggeration to say that it contained as many errors as data points. And yet, no matter how those errors were corrected, his conclusion never changed, as if the conclusion had been decided in advance!

To be clear, I am not saying these people are villains. Sure, they trim a little here and a little there, or let the occasional error slide, but those are just technical problems; that, I suspect, is how they see it. For Cuddy, Norton, and Fiske, stepping back and accepting that years of their work were built on a mistake would take enormous courage. They may never do it.

I also wrote this long piece because, in one of her articles, Fiske worries about the career prospects of some of her friends, fearing that public anger at their research errors might damage their futures. But remember: if they carry on this way, many careful young researchers will fail to get promoted and fail to publish, because those young people simply cannot compete fairly against flashy, error-ridden papers that sail into print.

Another unhappy thing is that Fiske expects her own principles to be applied selectively. On the one hand, she strongly condemns her critics with labels like "unthinking trash," "excessive ***," and "malicious confrontation," and insists that editorial review and peer review are very important. On the other hand, she posted these views on a forum with no peer review, one that does not even allow comments, while a colleague of hers insinuated that certain people are "methodological terrorists." It rather looks as though she is the one talking nonsense.

All in all, the methods Fiske and her friends and students have been using are the very methods that earned them fame and success, so questioning those methods, and the results they produced, is obviously unwelcome.

Fiske also dislikes social media, which I completely understand. After all, she has great influence in the traditional media: she can appear in the newspapers and even give a TED talk; the traditional media treats her like a good friend. Social media she cannot control. People like Fiske spend their whole lives accumulating publications and citations to build an academic kingdom, and watching it crumble must be painful.

But for now, let us set academic careers aside and talk about the research itself. Our main goal is to do good research, and we cannot do good research while ignoring errors or insisting we are always right. There is nothing wrong with making mistakes: I myself have published articles that later had to be retracted, so strictly speaking I am in no position to criticize sloppy data analysis. But when others point out my mistakes, I thank them; it is exactly such corrections that make my research better. I suggest Fiske do the same.

For me, the part about Fiske is almost finished. Fiske once wrote: "The progress of psychological research benefits from everyone's cooperation and also from constructive criticism," with the emphasis on "constructive." Perhaps we understand the word "constructive" differently, but I hope we can agree on one point: pointing out errors in published articles, and conducting replication studies, is constructive.

Only when we admit our mistakes can we benefit from them. Debugging is an ordinary, collaborative process. If you commit some code and I find a bug in it, I am not maliciously fighting you; I am cooperating with you. If you cast me as an "adversary" in order to avoid admitting a mistake, that is your own problem.

Finally, let me end with the biggest difference between Fiske and me: Fiske likes decorous private discussion, and I like open discussion. Personally, I am not a fan of Twitter either, because short replies tend to encourage fights. I much prefer blogs and blog comments, because a blog gives us enough space to discuss things fully.

So I have posted this piece on my blog, where anyone can reply. You read that right: anyone. Susan Fiske can too. And so can those who are interested in psychology but, unlike a tenured professor, never get the opportunity to publish un-peer-reviewed articles in an APS outlet.

Original post: What has happened down here is the winds have changed
