MIT Technology Review: AI is wrestling with a replication crisis

Maybe AI can learn something from the openness, replication and reproducibility issues that psychology and biomedicine have faced. It's interesting that a lot of the ‘problematic’ research practices come from researchers in private industry. Industrial labs (e.g. Bell Labs, Xerox PARC) have been acclaimed for their research in the past. I wonder whether a retrospective study would find their research practices closer to the ideals of academia or to those of current industry.

Haibe-Kains and his colleagues are among a growing number of scientists pushing back against a perceived lack of transparency in AI research. “When we saw that paper from Google, we realized that it was yet another example of a very high-profile journal publishing a very exciting study that has nothing to do with science,” he says. “It’s more an advertisement for cool technology. We can’t really do anything with it.”

At least, that’s the idea. In practice, few studies are fully replicated because most researchers are more interested in producing new results than reproducing old ones. But in fields like biology and physics—and computer science overall—researchers are typically expected to provide the information needed to rerun experiments, even if those reruns are rare.

Do such efforts make a difference? Pineau found that last year, when the checklist was introduced, the number of researchers including code with papers submitted to NeurIPS jumped from less than 50% to around 75%. Thousands of reviewers say they used the code to assess the submissions. And the number of participants in the reproducibility challenges is increasing.

Haibe-Kains is less convinced. When he asked the Google Health team to share the code for its cancer-screening AI, he was told that it needed more testing. The team repeats this justification in a formal reply to Haibe-Kains’s criticisms, also published in Nature: “We intend to subject our software to extensive testing before its use in a clinical environment, working alongside patients, providers and regulators to ensure efficacy and safety.” The researchers also said they did not have permission to share all the medical data they were using.

It’s not good enough, says Haibe-Kains: “If they want to build a product out of it, then I completely understand they won’t disclose all the information.” But he thinks that if you publish in a scientific journal or conference, you have a duty to release code that others can run. Sometimes that might mean sharing a version that is trained on less data or uses less expensive hardware. It might give worse results, but people will be able to tinker with it. “The boundaries between building a product versus doing research are getting fuzzier by the minute,” says Haibe-Kains. “I think as a field we are going to lose.”

As more research is done in house at giant tech companies, certain trade-offs between the competing demands of business and research will become inevitable. The question is how researchers navigate them. Haibe-Kains would like to see journals like Nature split what they publish into separate streams: reproducible studies on one hand and tech showcases on the other.

Pineau believes there’s something to that. She thinks AI companies are demonstrating a third way to do research, somewhere between Haibe-Kains’s two streams. She contrasts the intellectual output of private AI labs with that of pharmaceutical companies, for example, which invest billions in drugs and keep much of the work behind closed doors.


For sure, if the data are not big enough, any kind of ANN algorithm will fail miserably. Take paleoclimate data, for example. I just played around with a 300-year tree-ring water isotope record, trying to model it with an LSTM to detect time-series anomalies, and the model tended to overfit no matter how I tuned it. Perhaps I’m not capable enough to play around with ‘AI’ :sweat_smile:
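
A rough way to see why a series this short is hard for an LSTM is to compare the number of trainable weights with the number of observations. The sketch below is only an illustration: it assumes one isotope value per year (so about 300 points) and a single input feature, and the hidden sizes are hypothetical, not the ones used in the experiment described above. It counts parameters for a textbook single-layer LSTM with one bias vector per gate; some frameworks (PyTorch, for instance) keep two bias vectors, so their counts come out slightly higher.

```python
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    """Trainable parameters in one textbook LSTM layer (one bias per gate)."""
    # Each of the four gates (input, forget, cell candidate, output) has
    # an input weight matrix, a recurrent weight matrix, and a bias vector.
    per_gate = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return 4 * per_gate


# Assumption: one value per year over 300 years, one input feature.
n_observations = 300

# Hypothetical hidden sizes, chosen only to show how fast the count grows.
for hidden in (4, 16, 64):
    params = lstm_param_count(input_size=1, hidden_size=hidden)
    ratio = params / n_observations
    print(f"hidden={hidden:>2}  params={params:>6}  params per observation={ratio:.2f}")
```

Even a modest hidden size gives the model more free parameters than there are data points, which is at least consistent with the overfitting described here. This is a back-of-the-envelope check, not a diagnosis of that particular model.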