What's Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers

Some fairly harsh criticisms of the replicability of social science publications. While I’m not sure the author is particularly qualified to make such criticisms, he does at least seem impressively capable of predicting studies’ replicability (judging by market results, not actual replications):

Over the past year, I have skimmed through 2578 social science papers, spending about 2.5 minutes on each one. This was due to my participation in Replication Markets, a part of DARPA’s SCORE program, whose goal is to evaluate the reliability of social science research. 3000 studies were split up into 10 rounds of ~300 studies each. Starting in August 2019, each round consisted of one week of surveys followed by two weeks of market trading. I finished in first place in 3 out of 10 survey rounds and 6 out of 10 market rounds. In total, about $200,000 in prize money will be awarded.

These are his suggestions for institutional improvement:

  • Earmark 60% of funding for registered reports
  • Earmark 10% of funding for replications
  • Earmark 1% of funding for progress studies
  • Increase sample sizes and lower the significance threshold to .005
  • Ignore citation counts
  • Open data, enforced by the NSF/NIH
  • Financial incentives for universities and journals to police fraud
  • Why not do away with the journal system altogether?
  • Have authors bet on replication of their research

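One suggestion above, lowering the significance threshold from .05 to .005, trades statistical power for fewer false positives. A quick simulation (my own sketch, not from the post) shows that under a true null hypothesis the false-positive rate simply tracks the chosen threshold:

```python
import math
import random

def two_sample_p(a, b):
    """Two-sided p-value from a large-sample z-test (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
trials = 20_000
# Both "groups" are drawn from the same distribution, so every
# significant result below is a false positive by construction.
pvals = [
    two_sample_p([random.gauss(0, 1) for _ in range(50)],
                 [random.gauss(0, 1) for _ in range(50)])
    for _ in range(trials)
]

for alpha in (0.05, 0.005):
    rate = sum(p < alpha for p in pvals) / trials
    print(f"alpha = {alpha}: false-positive rate ~ {rate:.3f}")
```

Across 20,000 null experiments the observed rates land near .05 and .005 respectively: the stricter threshold cuts spurious findings roughly tenfold, which is presumably why the proposal pairs it with larger sample sizes to preserve power.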
And what individual researchers can start doing today:

  • Just stop citing bad research
  • Read the papers you cite
  • When doing peer review, reject claims that are likely to be false

His conclusion:

The importance of metascience is inversely proportional to how well normal science is working, and right now it could use some improvement. The federal government spends about $100b per year on research, but we lack a systematic understanding of scientific progress, we lack insight into the forces that underlie the upward trajectory of our civilization. Let’s take 1% of that money and invest it wisely so that the other 99% will not be pointlessly wasted. Let’s invest it in a robust understanding of science, let’s invest it in progress studies, let’s invest it in—the future.


Great suggestions! :slight_smile:

Vox’s Future Perfect based its latest newsletter on the blog post (copied in full below, as I can’t find a page for it):

Much ink has been spilled over the replication crisis in the last decade and a half, including here at Vox. Researchers have discovered, over and over, that lots of findings in fields like psychology, sociology, medicine, and economics don’t hold up when other researchers try to replicate them.

This conversation was most famously kicked off by John Ioannidis’s 2005 article “Why Most Published Research Findings Are False,” but since then many researchers have explored it from different angles. Why are research findings so often unreliable? Is the problem just that we test for “statistical significance” in a nuance-free way? Is it that null results (that is, when a study finds no detectable effects) are ignored while positive ones make it into journals?

Is it that social science researchers don’t have the statistics training to adjust for multiple comparisons and other common statistical mistakes? And now that we’ve realized the extent of errors in published papers, will it be easy to improve?

Some recent evidence has led me to a depressing conclusion: The reasons that lead to unreliable research findings are routine, well understood, predictable, and in principle pretty easy to avoid. And yet we’re not actually improving at all.

That’s the takeaway from a recent study that found that laypeople do quite well at predicting which published research will or won’t replicate, and from a fascinating blog post by Alvaro de Menard, a participant in DARPA’s Replication Markets, a US military project to design better tools for predicting which research will hold up.

The conclusion both of them reach about the replication crisis: It’s not very mysterious.

Just carefully reading a paper — even as a layperson without deep knowledge of the field — is sufficient to form a pretty accurate guess about whether the study will replicate.

Meanwhile, DARPA’s replication markets found that guessing which papers will hold up and which won’t is often just a matter of looking at whether the study makes any sense. Some important statistics to take note of: Did the researchers squeeze out a result barely below the significance threshold of p = 0.05? (A paper can often claim a “significant” result if this threshold is met, and many use various statistical tricks to push their paper across that line.) Did they find no effects in most groups but significant effects for a tiny, hyper-specific subgroup?
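The subgroup pattern described here is the familiar multiple-comparisons problem. As a rough illustration (my own sketch, assuming pure-noise data): if a paper tests 20 independent subgroups at p < .05, the chance of at least one “significant” subgroup is about 1 − 0.95^20 ≈ 64%, even when there is no real effect anywhere.

```python
import math
import random

def noise_p(n=30):
    """p-value for a 'treatment vs control' comparison of pure noise
    (two-sided large-sample z-test)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / (n - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n - 1)
    z = (ma - mb) / math.sqrt(va / n + vb / n)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
trials = 2_000
subgroups = 20
# Fraction of simulated "papers" that find at least one significant
# subgroup despite there being no true effect in any of them.
hit_rate = sum(
    any(noise_p() < 0.05 for _ in range(subgroups)) for _ in range(trials)
) / trials
print(f"papers with >=1 spurious subgroup finding: {hit_rate:.2f}")
```

The simulated rate comes out near the theoretical 64% (slightly higher, since the normal approximation is a bit liberal at these sample sizes), which is exactly why a lone significant subgroup in an otherwise null study is a red flag.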

“Predicting replication is easy,” Menard writes. “There’s no need for a deep dive into the statistical methodology or a rigorous examination of the data, no need to scrutinize esoteric theories for subtle errors—these papers have obvious, surface-level problems.”

And the bad news is that this hasn’t changed anything! We’ve known for at least a decade about the replication crisis. Many researchers have worked full time on explaining the statistical mistakes that make papers unreliable, proposing and pushing for new methods that should work better, and tracking the reasons papers are retracted or refuted.

Here at Vox, we’ve written about how the replication crisis can guide us to do better science. And yet blatantly shoddy work is still being published in peer-reviewed journals despite errors that a layperson can see. Journals effectively aren’t held accountable for bad papers — many, like the Lancet, have retained their prestige even after a long string of embarrassing public incidents where they published research that turned out fraudulent or nonsensical.

Furthermore, once published, the obviously bad papers are cited by others. Menard finds almost no correlation between a paper’s probability of replicating and its number of citations. It’s hard to guess why, but he theorizes that many scientists don’t thoroughly check — or even read — papers once published, expecting that if they’re peer-reviewed, they’re fine. So bad papers are published — and once they’re published, no one is looking closely enough to care that they’re bad papers.

That’s discouraging and infuriating. It suggests that the replication crisis isn’t one specific methodological reevaluation, but a symptom of a scientific system that needs rethinking on many levels. We can’t just teach scientists how to write better papers, we also need to change the fact that those better papers aren’t cited more often than bad papers, that bad papers are almost never retracted even when their errors are visible to any lay reader, and that there are no consequences for bad research.

We need a more sophisticated understanding of the replication crisis, not as a moment of realization after which we were able to move forward with higher standards, but as an ongoing rot in the scientific process which a decade of work hasn’t fixed.

There should be some caveats. Science has done, and is still doing, great things. Many scientists do important and valuable work — and in a way, the fact that bad papers are identifiable by lay people is good news because it means we have the tools to identify and appreciate the good stuff, too.

Our scientific institutions are valuable and we need, more than ever, the tools they’ve built to help us understand the world. There’s no cause for hopelessness here, even if some frustration is very thoroughly justified. Science needs saving, sure — but science is very much worth saving.


I loved that blog post. He specifically singles out criminology, claiming it is the worst empirical field of all. That’s very close to my field, legal/forensic psychology, which is not much better.
