Monthly Reading Recommendations

I was discussing Peterson and Panofsky’s paper with somebody who thought it didn’t clearly articulate what scientific efficiency meant from a metascience perspective, as it simply stated:

Metascientific activists have conceptualized efficiency in terms of improving the proportion of replicable claims to nonreplicable claims in the literature (e.g., Ioannidis, 2012).

Which was set against the status quo process for scientific progress:

a biologist at MIT who contrasted these organized replication efforts with what he viewed as the current ‘Darwinian process […] which progressively sifts out the findings which are not replicable and not extended by others’. Under this alternative theory of scientific efficiency, there is a natural process in which researchers produce many claims. Some may be flat wrong. Some may be right, yet hard to reproduce, or only narrowly correct and, therefore, be of limited use. However, some provide robust and exciting grounds to build upon and these become the shoulders on which future generations stand (Peterson and Panofsky, 2021)

But one point that came up is that surely reproducibility, whether it comes from directed efforts or natural selection, isn’t enough to ensure efficient scientific progress if you aren’t testing hypotheses that will lead to useful theoretical and/or practical progress in the first place. (Note that the paper’s first point is essentially that we don’t know what progress is in basic science; see my post above.)

This reminded me of the original 2009 article on avoidable research waste, which proposed four stages at which waste arises: 1) irrelevant questions, 2) inappropriate design and methods, 3) inaccessible or incomplete publications, and 4) biased or unusable reports (inefficient research regulation and management was later inserted at position 3). This paper is known for estimating that 85% of investment in biomedical research is wasted, but this estimate only accounts for losses at stages 2, 3, and 4. It is these three stages that are then addressed by the two efficiency-promoting manifestos cited by Peterson and Panofsky (Ioannidis et al. 2015 and Munafò et al. 2017) under the themes of improved Methods, Reporting and Dissemination, Reproducibility, and Evaluation, all of which are supported by Incentives. Figure 1 of the latter manifesto does show ‘Generate and specify hypothesis’ in a circular diagram of the scientific method, but in the context of scientific reproducibility, the discussion focuses on the risks that uncontrolled cognitive biases pose to hypothesising:

a major challenge for scientists is to be open to new and important insights while simultaneously avoiding being misled by our tendency to see structure in randomness. The combination of apophenia (the tendency to see patterns in random data), confirmation bias (the tendency to focus on evidence that is in line with our expectations or favoured explanation) and hindsight bias (the tendency to see an event as having been predictable only after it has occurred) can easily lead us to false conclusions.

Besides the metascience manifestos above, a 2014 Lancet series on increasing value and reducing waste in biomedical research also provided recommendations to address each stage of research waste. The first article in the series considered the problem of choosing what to research, but primarily set this out as a challenge for funders and regulators when setting research priorities. While some suggestions are made that could be useful for researchers doing clinical, applied, or even use-inspired studies (namely, considering the potential user’s needs), the most broadly applicable advice for individual researchers seems to be using systematic reviews and meta-reviews to ensure that existing knowledge is recognized and then used to justify additional work.

I feel that the question of what to research (particularly in basic research, and for the individual researcher) has been neglected by metascientific reformers in their current focus on improving replicability. Don’t get me wrong, replicability is important, as unreplicable results from testing innovative hypotheses don’t mean much, but I think the two aspects of efficient science need to move forward together.

Refreshingly, a recent article that introduced the Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology notes that promoting good theory development is an outstanding question for meta-research and provides a reference to the beguilingly titled paper Why Hypothesis Testers Should Spend Less Time Testing Hypotheses. I’ve yet to look at this last paper and its citations in detail, but I still wonder if I’ve missed something. Has work in metascience really not looked into problem selection as much as the other stages of research waste? Or is this being addressed under a different terminology, or by a different field? Or do we continue to rely on researchers developing the tacit skill of selecting good research questions during their training?


I discovered this month’s preprint Empowering early career researchers to improve science in a session of the same name at Metascience 2021 (recording). I was particularly impressed with the diverse range of perspectives (both in terms of early career researcher (ECR) demographics and types of initiatives) that the organizers drew together with an asynchronous unconference (which they reflect on here).

The preprint categorizes reform efforts into ‘(1) publishing, (2) reproducibility, (3) public involvement and science communication, (4) diversity and global perspectives, (5) training and working conditions for early career researchers (ECRs), and (6) rewards and incentives’ (section 1).

The preprint also discusses seven reasons why ECRs need to be involved in scientific improvements (section 2, briefly summarized):

  1. ECRs are future research leaders and should be able to shape the future of the research enterprise,
  2. ECRs are a more diverse group than senior scientists,
  3. ECRs may be more open to new ideas and less ‘set in their ways’ than senior scientists,
  4. ECRs may still be motivated by unchallenged idealism (i.e. they haven’t yet become jaded and career-driven PIs),
  5. ECRs are at the forefront of technical advances and tool development,
  6. Some ECRs have more time and energy to commit to reform initiatives,
  7. ECRs represent the majority of the scientific workforce.

And six obstacles faced by ECRs involved in efforts to improve research culture (section 3, also summarized):

  1. Scientific improvement initiatives are rarely rewarded or incentivized and aren’t seen as a priority for career progression: ‘This contributes to an endless cycle, where ECRs who work to improve science are pushed out before securing faculty positions or leadership roles where they gain more ability to implement systemic changes.’
  2. Limited funding and resources are available for ECRs (isn’t this true for everybody?) working on scientific reform,
  3. ECRs are generally excluded from decision making in existing institutions,
  4. ECRs often can’t set aside time for scientific improvement work, their positions/career stages are unstable, and their supervisors may see scientific improvement as a diversion from their main research projects,
  5. ECRs are perceived as lacking the required experience to improve science,
  6. Western scientific culture creates added challenges for ECRs who are from marginalized groups.

Yet other stakeholders and good practices can help ECRs overcome these obstacles! A list of suggested actions that institutions (e.g., universities, funders, publishers, academic societies, and peer communities) and senior individuals (e.g., supervisors) can take to support ECRs who are working on reform projects is provided in section 4, and the paper ends with a list of lessons learned by ECRs who have previously worked on improving science (section 6, subsection headings):

  • Know what has been done before
  • Start with a feasible goal
  • Collaborate wisely
  • Work towards equity, diversity, and inclusion
  • Build a positive and inclusive team dynamic
  • Anticipate concerns or resistance to change
  • Be persistent
  • Plan for sustainability

Additional (and more detailed) tips are also included in the document ‘Tips and tricks for ECRs organizing initiatives’ on OSF.

Section 5 discusses additional obstacles ECRs in countries with limited funding face and mentions ‘In some countries, postdocs and occasionally PIs lack institutional affiliations, which can be an added challenge when trying to initiate systemic change.’

IGDORE actually provides free institutional affiliation to researchers and researchers in training. IGDORE currently has affiliates on all continents and we’d be very happy to provide institutional support services for ECRs working on scientific reform (or field-specific research, for that matter) who don’t have access to an institutional affiliation in their own country. (The Ronin Institute is another option for getting an institutional affiliation.) Additionally, this very forum (On Science & Academia) can provide a place for ECRs who want to have more nuanced discussions about scientific reform than is possible/probable on Twitter or Slack (we can make public/private categories for specific initiatives, just message a forum moderator or admin about your group’s needs). Readers should feel free to point ECRs towards IGDORE and this forum if they think one or both could assist them.

To conclude:

ECRs are important stakeholders working to catalyse systemic change in research practice and culture. The examples presented reveal that ECRs have already made remarkable progress. Future efforts should focus on incentivizing and rewarding systemic efforts to improve science culture and practice. This includes providing protected time for individuals working in these areas and amplifying ECR voices and meaningfully incorporating them into decision-making structures. ECRs working on improving science in communities or countries with limited research funding should be supported by organizations with access to greater resources to improve science for all. We hope that the tools, lessons learned, and resources developed during this event will enhance efforts spearheaded by ECRs around the world, while prompting organizations and individuals to take action to support ECRs working to improve science.

What about the OS&A forum readers? A few of you are involved in scientific reform initiatives (perhaps more so outside of traditional academia than inside of it). What resonates here? What would you add?

As a Global Board member at IGDORE, I’ve personally noticed that having well-thought-out plans and goals is important, as is developing a strong, diverse team (with respect to volunteers) and planning for leadership sustainability (tbh we’re still working to improve on all of these points). Persistence is critical even in the most basic things, like building up newsletter readership :rofl: Interestingly, global/language diversity of affiliations seems to have come relatively easily to IGDORE, although this may be specific to our particular project. An additional point that I think has been useful for IGDORE, but didn’t see mentioned prominently in the preprint, is forming cooperations/collaborations between organisations/initiatives working on similar or allied projects - IGDORE has several organisational collaborations and I’ve found that engaging with these other organisations has also helped my work at IGDORE.

NOTE: This article is now published in PLoS Biology: Recommendations for empowering early career researchers to improve research culture and practice


Nice find! Will check this out.


This month’s recommendation provides a very practical (and maybe even fun!) exercise to avoid errors in your research (another h/t goes to Nowhere Lab for this):

Error Tight: Exercises for Lab Groups to Prevent Research Mistakes

The purpose of this project is to provide hands-on exercises for lab groups to identify places in their research workflow where errors may occur and pinpoint ways to address them. The appropriate approach for a given lab will vary depending on the kind of research they do, their tools, the nature of the data they work with, and many other factors. Therefore, this project does not provide a set of one-size-fits-all guidelines, but rather is intended to be an exercise in self-reflection for researchers and provide resources for solutions that are well-suited to them.

The exercise has six steps and in my opinion step 1, which involves reading a table with examples of things that have actually gone wrong at different stages of psychology research and suggested ways to avoid them, is possibly the greatest contribution of the preprint, because as noted:

A recurring theme when reading about the errors others have made is that mistakes happen in unexpected places and in unexpected ways. Therefore, reading examples of ways that others have made mistakes may be fruitful in stoking your creativity about where mistakes may happen in your process.

If I had been asked to come up with a list of things that (I knew) had gone wrong in my research, I doubt it would have been half as long, and although I work in a different field, reading through the table did prompt me to think of a few things that could go wrong in my work that I don’t usually check. Could a central database of error case reports for a variety of fields be a useful resource for an Error Tight scientific community?

“You must learn from the mistakes of others. You can’t possibly live long enough to make them all yourself.” - Samuel Levenson


I absolutely loved the recommended reading of the month, thanks!

I am going to get a tattoo with this sentence, I think :wink:


Not an article, but I nevertheless found this piece interesting to read: The 20% Statistician: Not All Flexibility P-Hacking Is, Young Padawan (of course the Star Wars meme contributed LOL)

People in psychology are re-learning the basic rules of hypothesis testing in the wake of the replication crisis. But because they are not yet immersed in good research practices, the lack of experience means they are overregularizing simplistic rules to situations where they do not apply. Not all flexibility is p-hacking […]

Perhaps of interest to @rebecca & @Enrico.Fucci :slight_smile:


Thanks for sharing @pcmasuzzo! This reminded me of a recent article from @Lakens and colleagues: Why Hypothesis Testers Should Spend Less Time Testing Hypotheses which I think provides some great advice about things to consider before one even thinks about doing a hypothesis test.

A modern student of psychology, wanting to learn how to contribute to the science of human cognition and behavior, is typically presented with the following procedure. First, formulate a hypothesis, ideally one deductively derived from a theory. Second, devise a study to test the hypothesis. Third, collect and analyze data. And fourth, evaluate whether the results support or contradict the theory. The student will learn that doubts about the rigor of this process recently caused our discipline to reexamine practices in the field. Excessive leniency in study design, data collection, and analysis led psychological scientists to be overconfident about many hypotheses that turned out to be false. In response, psychological science as a field tightened the screws on the machinery of confirmatory testing: Predictions should be more specific, designs more powerful, and statistical tests more stringent, leaving less room for error and misrepresentation. Confirmatory testing will be taught as a highly formalized protocol with clear rules, and the student will learn to strictly separate it from the “exploratory” part of the research process. Seemingly well prepared to make a meaningful scientific contribution, the student is released into the big, wide world of psychological science.

But our curriculum has glossed over a crucial step: The student, now a junior researcher, has learned how to operate the hypothesis-testing machinery but not how to feed it with meaningful input. When setting up a hypothesis test, the junior researcher has to specify how their independent and dependent variables will be operationalized, how many participants they will collect, which exclusion criteria they will apply, which statistical method they will use, how to decide whether the hypothesis was corroborated or falsified, and so on. But deciding between these myriad options often feels like guesswork. Looking for advice, they find little more than rules of thumb and received wisdom. Although this helps them to fill in the preregistration form, a feeling of unease remains. Should science not be more principled?

We believe that the junior researcher’s unease signals an important problem. What they experience is a lack of knowledge about the elements that link their test back to the theory from which their hypothesis was derived. By using arbitrary defaults and heuristics to bridge these gaps, the researcher cannot be sure how their test result informs the theory. In this article, we discuss which inputs are necessary for informative tests of hypotheses and provide an overview of the diverse research activities that can provide these inputs.


Thanks to @Wolfgang.Lukas for pointing me towards this month’s paper Aspiring to greater intellectual humility in science (preprint). The article makes a good case for intellectual humility, that is, owning our limitations: “owning one’s intellectual limitations characteristically involves dispositions to: (1) believe that one has them, and to believe that their negative outcomes are due to them; (2) to admit or acknowledge them; (3) to care about them and take them seriously; and (4) to feel regret or dismay, but not hostility, about them.” Unfortunately, the current incentive structure of academia can discourage it:

In an ideal world, scientists have an obligation to put the limitations and uncertainty in their findings front and center. Although there are positive developments such as journals asking for more honest papers, and a growing number of journals accepting Registered Reports (in which the decision to publish a paper is made independent of the results), there are still many forces pushing against intellectual humility. Exaggeration is beneficial for the individual (at least in the short run), but it can be detrimental for the group, and eventually for science as a whole, given that credibility is at stake. The downward spiral in this collective action dilemma is not easily reversed, and some people are in a more secure position to make principled but disincentivized choices. Given that scientists have professional obligations and ethics to live up to, intellectual humility should be a factor in the decisions we all make. Researchers should not hide behind a flawed system, the incentive structure, or their lower-ranked position (see, e.g., ref. 38). Although they certainly play a role, these factors do not release us from our moral and professional obligations to own the limitations of our work by putting those limitations front and center, and incorporating their consequences into our conclusions.

But this article stands out by not just calling for more intellectual humility, but also providing ‘a set of recommendations on how to increase intellectual humility in research articles and highlight the central role peer reviewers can play in incentivizing authors to foreground the flaws and uncertainty in their work, thus enabling full and transparent evaluation of the validity of research’. These recommendations are:

  0. Title and abstract
  • 0.1 The abstract should describe limitations of the study and boundary conditions of the conclusion(s).
  • 0.2 Titles should not state or imply stronger claims than are justified (e.g., causal claims without strong evidence).

  1. Introduction
  • 1.1 The novelty of research should not be exaggerated.
  • 1.2 Selective citation should not be used to create a false sense of consistency or conflict in the literature.

  2. Method
  • 2.1 The Methods section should provide all details that a reader would need to evaluate the soundness of the methods, and to conduct a direct replication.
  • 2.2 The timing of decisions about data collection, transformations, exclusions, and analyses should be documented and shared.

  3. Results
  • 3.1 Detailed information about the data and results (including informative plots and information about uncertainty) should be provided.
  • 3.2 It should be transparent which analyses were planned and where those plans were documented. Weaker conclusions should be drawn to the extent that analyses were susceptible to data-dependent decision-making.
  • 3.3 Inferential statistics should not be used in a way that exaggerates the certainty of the findings. Alternatives to dichotomous tests should be considered.

  4. Discussion
  • 4.1 The statistical uncertainty of results should be incorporated into the narrative conclusions drawn from the results.
  • 4.2 The research summary should capture the full range of results (e.g., include our “most damning result”).
  • 4.3 Causal claims should be only as strong as the internal validity of the study allows.
  • 4.4 Claims about generalizability should be only as strong as the sampling of participants, stimuli, and settings allows.
  • 4.5 All conclusions should be calibrated to the confidence in the construct validity of the measures and manipulations.
  • 4.6 Alternative interpretations should be presented in their strongest possible form (“steelmanned”).
  • 4.7 Discussion of the limitations should be incorporated throughout the discussion section, rather than bracketed off in a subsection.

  5. Post-publication guidance for authors
  • 5.1 Insist that press releases and reporters capture the limitations of the work, and correct outlets that exaggerate or misrepresent.
  • 5.2 Encourage criticism, correction, and replication of our work, and respond non-defensively when errors or contradictory evidence are brought to light.
  • 5.3 When appropriate, retract papers, issue corrections, or publish “loss of confidence” statements.

Moreover, even if junior academics are cautious about embracing intellectually humble practices in case it puts them at a disadvantage when publishing, they can still promote intellectual humility while reviewing others’ papers (and no, the authors say we shouldn’t consider that hypocritical!):

But many may feel they are not in the position (yet) to “do the right thing.” That is an understandable position, given the non-scientific interests most of us must factor into our decisions. Luckily, reviewing provides an almost cost-free opportunity for all of us to contribute to incentivizing intellectual humility. Even if we are not always able to apply these practices as authors, out of fear of lowering our chances of success, we can flip the script for the authors whose manuscripts we review, and make their success at least partly dependent on the amount of humility they show in their papers, thus resulting in more honest and more credible papers. In our view, this is not hypocritical, but trying to act in accordance with our norms, within the limits of our capabilities. Of course, these same reviewers would have to be ready and willing to accept reviewers’ suggestions to be more intellectually humble if the roles are reversed. We suspect that most people who are willing to promote intellectual humility as reviewers would be happy to do so when they find themselves on the receiving end of similar reviews.

Note - I renamed this thread from ‘Open Science Reading Recommendations’ to ‘Monthly Reading Recommendations’. A lot of the articles I’m suggesting stray outside of what is usually considered Open Science, cross into Metascience, and end up in Philosophy of Science :smiley:


This month’s recommendation is much more empirically focused than many of the articles I’ve indicated previously. How do researchers approach societal impact? (h/t @Daniel-Mietchen) ‘examines how researchers approach societal impact, that is, what they think about societal impact in research governance, what their societal goals are, and how they use communication formats. Hence, this study offers empirical evidence on a group that has received remarkably little attention in the scholarly discourse on the societal impact of research—academic researchers. Our analysis is based on an empirical survey among 499 researchers in Germany conducted from April to June 2020.’

Societal impact has become an important consideration when setting research priorities, and the article provides a good introduction to two discourses that have been used for framing it: ‘1) the discourse in communication and science and technology studies (STS) on the relationship between science and society and 2) the discourse on societal impact measurement in scientometrics and evaluation research’.

The study explored three research questions:

  1. RQ1: What are researchers’ opinions on societal impact?
  2. RQ2: Which societal goals do researchers aim to achieve with their research?
  3. RQ3: Which formats do researchers use to achieve societal impact?

And analysed these in terms of three dimensions: content (discipline and applied vs. basic), organizational, and individual. One of the organizational categories is non-university research institutes (alongside universities and universities of applied sciences), and while I assume that most or all of the research institutes represented in this study have physical workplaces, I was interested to see how the results for independent institutes differed from universities, to put my own experiences at IGDORE into context (I mostly quote the results describing differences in the organizational dimension below).

RQ1—opinion: Individual commitment without an institutional mandate

Researchers at applied universities agree that societal relevance should have more weight in evaluation more often than those at independent institutes and universities—the approval rates are 62%, 49%, and 53% respectively. Researchers at universities show the lowest agreement to the statement that knowledge transfer plays an important role at their institution: Only 19% approve compared to 36% at applied universities and 37% at independent institutes. Furthermore, researchers at universities particularly disagree with the statement that their communication departments are able to reach relevant stakeholders in society: Only 15% approve compared to 28% at applied universities and 44% at independent research institutes.

RQ2—goals: Disciplines define societal goals

When looking at the different types of organizations, it is noticeable that researchers from applied universities are the most economically oriented: 41% of the researchers from universities of applied sciences indicate that they want to contribute to economic value creation (19% at nonuniversity institutions and 17% at universities). In contrast, researchers from independent institutes are the most policy-oriented: 54% of the researchers from independent institutes aim to contribute to political decision-making (31% at universities and 37% at applied universities).

RQ3—formats: University researchers are the least active

Researchers from applied universities are the primary users of advisory formats: 57% of researchers at applied universities have used advisory formats compared to 40% of researchers at independent institutes and only 26% of the researchers at universities. Only 28% of the researchers from universities have used collaboration formats, compared to 53% of researchers from applied universities and 35% of researchers from independent institutes. As Table 2 shows, university researchers have remarkably low scores on every communication format.

Table 2 also indicates that researchers at independent institutes are the primary users of events and social media for communication.

In the discussion, the authors note that there appears to be some mismatch between the generally high importance attributed to societal impact by researchers vs. how effectively they believe their institution conducts knowledge transfer and communication with stakeholders in society (this made me recall another working paper which finds that technology transfer offices at US universities don’t operate very efficiently):

it is remarkable that the majority of researchers (89%) consider societal engagement to be part of scientific activity. More than half of the researchers (53%) agree that societal impact should be given more weight in evaluations. Even though the majority of researchers regard public engagement as part of scientific work, they are not equally positive about whether societal impact should have more weight in evaluations. One reason for this discrepancy may be that researchers fear that evaluations will lead to additional work or that they will not adequately record their transfer activities [23, 60, 64]. In addition, it is striking that only 27% of the respondents assume that knowledge transfer plays an important role at their institution; also 27% believe that the institutional communication department is managing to reach relevant stakeholders in society. Humanities scholars (15%) and university researchers (15%) particularly doubt that their communication departments are reaching relevant societal stakeholders. This mirrors previous findings suggesting a certain decoupling between central transfer infrastructures and researchers [38, 39] and leads us to hypothesize that there is a certain mismatch between individual and institutional commitment.

Although the article is mainly descriptive, the authors do propose some practical actions:

First, considering the discontent with institutional communication departments, it might be worthwhile to implement decentralized support structures on the mesolevel of research organizations. This could more adequately address the complexities of the sciences and their many publics [5, 6, 57, 63]. The findings further suggest that, where applicable, organizational factors (e.g., institutional investments in transfer, training offerings, support infrastructures) should be more strongly incorporated into assessments of societal impact—for example, through formative evaluations [88]. Second, our results suggest that it is strongly advisable that evaluation exercises are responsive to disciplinary differences. For example, if economic and technical impact were the sole basis for assessing societal impact, social sciences and humanities scholars would be discriminated against [6, 63]. Our framework for societal goals and our results can also be the basis for disciplinary self-understanding (e.g., in learned societies), in that they can stimulate a normative discussion about good transfer and its evaluation. Third, considering the comparatively low importance of social media as a means of communicating about research, care should be taken not to overuse online discourse as a way of easily generating impact proxies.

As for comparing the results for independent institutes to my own experiences, I generally agree that IGDORE researchers are probably more interested in knowledge transfer than many academics but, unfortunately, we don’t (yet) provide any communication services to facilitate this. IGDORE researchers also seem to be quite active in trying to influence research policy, especially policies related to open science, and when I joined IGDORE, I was actually surprised at how active members of the community were on social media! So overall I think the results of this survey are generally representative of the differences I’ve noticed between IGDORE researchers and those in academia.

This month’s paper is Self-correction in science: The diagnostic and integrative motives for replication (preprint), another thought-provoking article from Peterson and Panofsky. The essential argument is that replications come in both diagnostic and integrative forms:

the goal of integrative replication is to reproduce the ends of research while being pragmatic about the means, whereas the goal of diagnostic replication is to faithfully reproduce the means while remaining agnostic about the ends

and that 'current debates, as well as research in science and technology studies, have paid little heed to a key dimension [integrative forms] of replication practice.’ Interviewing sixty members of Science’s Board of Reviewing Editors led the authors to the following six theses:

  1. Strictly diagnostic replications are rare, but replications motivated by the desire to integrate are common.
  2. Integrative replication attempts provide varying degrees of diagnostic evidence.
  3. Integrative replication provides stronger diagnostic evidence when task uncertainty is lower.
  4. Integrative replication provides weaker diagnostic evidence when task uncertainty is higher.
  5. When diagnostic replication requires special effort, experimentalists often embrace a logic of investment rather than a logic of truth.
  6. When diagnostic replication is difficult and ambiguous, researchers may prefer an organic mode of self-correction.

The concept of task uncertainty is important for understanding a field’s goals for replication and deserves further clarification: ‘High task uncertainty is characteristic of experimental contexts in which significant variables are either unknown or uncontrollable and/or experimental techniques and technologies are either unstandardized or unstandardizable. In conditions of low task uncertainty, on the other hand, variables are known and controlled and experimental techniques and technologies are standardized and predictable.’ A common generalization is that many subfields of physics have low task uncertainty while many areas of biology have high task uncertainty, although the article notes that there are also exceptions.

The article describes the rationale for each thesis in detail, but I think that the implications the article notes for science policy related to replications are the most useful to highlight:

By encouraging replications and making them easier to perform, metascientific activists hope that replication will move from the realm of a possible, but rarely actualized, deterrent to something with real teeth. Given our analysis, such policies would be especially effective in fields with low task uncertainty, where replication is interpreted to be most diagnostic. Yet, respondents in these fields were the least likely to endorse the need for such developments since the threat of a potent and unambiguous diagnostic replication is seen as sufficient to discourage bad behavior in many of these fields. Conversely, in fields with greater perceived task uncertainty – including much of the biomedical and behavioral research fields that most replication activism targets – replication initiatives are likely to have less impact because these replications are more susceptible to falling into the experimenters’ regress.

This highlights the tension that can arise between the two motives for replication. Researchers have an inherent motivation during integrative replication. The goal is to ‘get it to work’ in order to extend their own research capacities. When replication is undertaken purely for diagnostic reasons, the motivations are unclear. What would motivate researchers to stop their own research in order to explicitly test a finding that is not integral to their own projects? Rather than ‘get it to work’, scholars conducting purely diagnostic replication attempts may, in fact, have a perverse incentive to have it fail. At the very least, they may lack the motivation to do it correctly. This can create a culture of paranoia which, while in line with the abstract ideal of ‘organized skepticism’, reflects a mistrust that is actually quite unusual in the history of science

Having never previously considered the value of integrative replications (despite having performed them regularly in my own research), I do think that the role of such integrative work deserves more consideration in relation to the current replication crises. Yet the paper’s final paragraph notes that it is not yet clear how the data from failed integrative replications can be used to identify unreplicable results in the literature:

Metascientists have argued that having failed studies languishing in file drawers means that we are getting an incomplete picture of the data in a field. Our respondents made clear that, in many cases, they choose not to share such failures because the data was never meant to be diagnostic. They were quick and sloppy attempts to try something out, not sober processes of verification, and what might constitute a publicly sharable output is not clear. This raises significant and complex questions regarding what sorts of failures should count and which should not be included in metanalytic analyses.

This made me recall part of @bastien.lemaire’s recent review article on the file drawer effect:

Null but conclusive findings have the power to dissuade researchers from unfruitful avenues. Contrarily, null but inconclusive findings are hardly interpretable. However, they are useful in laying down scientific questions that remain to be tackled by the scientific community. Surprisingly or not, once published and grouped in systematic reviews, those underpowered studies can be lifesaving. This is well illustrated by the Stroke Unit Trialists’ collaborative systematic review who gathered a series of null and underpowered studies[26] which once together demonstrated that having a stroke unit care reduces the odds of death by 17% (95% confidence interval 4-29%) … true (or diagnostic) replications are not always necessary to validate or invalidate previously described results. Indeed, the convergence of evidence through triangulations of experiments can act as indirect replications and, at the same time, push the boundaries of knowledge.

Formal protocols for combining the evidence from failed integrative replications (possibly with other null or underpowered results) do seem like they could indeed help resolve many questions about research replicability (particularly in fields with high task uncertainty) without requiring direct/diagnostic replications to be conducted in every case.
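As a minimal sketch of what such a combining protocol might look like, the snippet below pools several individually underpowered studies with a fixed-effect inverse-variance meta-analysis on the log odds ratio scale—the same basic logic by which the Stroke Unit Trialists’ review turned a series of null studies into a conclusive estimate. All study values here are invented for illustration, not taken from that review:

```python
import math

# Hypothetical (log odds ratio, standard error) pairs for eight small studies.
# Each study on its own is "null": its 95% CI crosses an odds ratio of 1.
studies = [
    (-0.25, 0.28), (-0.15, 0.30), (-0.22, 0.26), (-0.18, 0.24),
    (-0.30, 0.35), (-0.12, 0.27), (-0.20, 0.25), (-0.24, 0.32),
]

# Fixed-effect pooling: weight each study by the inverse of its variance.
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * lor for (lor, _), w in zip(studies, weights)) / sum(weights)
se_pooled = math.sqrt(1 / sum(weights))  # standard error of the pooled estimate

# 95% confidence interval on the log odds ratio scale, then back-transform.
lo, hi = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
print(f"Pooled OR: {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(lo):.2f}-{math.exp(hi):.2f})")
```

Here the pooled interval excludes 1 even though no single study is significant, illustrating how aggregating file-drawer results can be informative. A real protocol would of course need random-effects models and heterogeneity checks, which is exactly where the ambiguity about which failed integrative replications "should count" becomes consequential.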

This month’s article is Amateur hour: Improving knowledge diversity in psychological and behavioural science by harnessing contributions from amateurs (preprint) from Mohlhenrich & Krpan, who are the founders of Seeds of Science @SeedsofScience, which was previously described in this forum post (and shout out to Mohlhenrich, who is actually an amateur researcher himself). They claim (and I generally agree) that:

Psychological and behavioral science (PBS) suffers from a lack of diversity in its key intellectual and research activities (Krpan, 2020; Medin, Ojalehto, Marin, & Bang, 2017). This low “knowledge diversity” is reflected in numerous aspects of the field—certain research topics (e.g., those that may be easily publishable) are prioritized over other important but less desirable topics (e.g., those that are not heavily cited or easy to publish); some methodologies such as experimentation are widely used whereas less common methods (e.g., self-observation) are neglected; short-term projects with quick gains are prioritized over the long-term ones; some participant populations are understudied (e.g., non-WEIRD samples; i.e., non-western, educated, industrialized, rich and democratic, Henrich, Heine, & Norenzayan, 2010); and theorizing is driven by arbitrary conventions and overly reliant on available research findings while avoiding speculation that could lead to new insights (Krpan, 2020, 2021a; Medin et al., 2017; Stanford, 2019).

Several strategies for long term systematic change have already been proposed to increase knowledge diversity in PBS. But the authors propose an alternative, short term strategy: ‘harnessing contributions from amateurs who can explore the diverse aspects of psychology that are neglected in academia’ and go on to describe (and provide examples of) five types of amateurs who can be categorized on the basis of their expertise distance and expertise level (Figure 1):

  1. Independent scientists
  2. Outsiders
  3. Undergraduate students
  4. Quantified self practitioners
  5. Citizen scientists

They then identify six types of problems (‘blind-spots’) that are not well incentivized in academia but would be suitable for amateur research (Table 1):

  1. Long-term projects
  2. Basic observational research
  3. Speculation
  4. Interdisciplinary projects
  5. Aimless projects
  6. Uncommon research areas

Finally, the authors provide suggestions across five areas for facilitating amateur participation in PBS, including:

  1. Encouraging non-traditional academic relationships
  2. Creating a digital amateur research hub
  3. Providing editorial support for amateurs at academic journals
  4. Founding an amateur PBS institute to support their work (joining the Ronin Institute or IGDORE is also an option for amateur researchers seeking institutional support)
  5. Reducing scepticism of professional PBS academics towards amateurs

To make the case for amateurs increasing knowledge diversity, the authors note:

The blind spots we have identified (and further ones that we are not even aware of, the “unknown unknowns”) arise from constraints that have both functional and mental aspects. For example, a PBS researcher may be discouraged from pursuing a long, aimless project (perhaps one that deals with a taboo subject) in a functional sense (e.g., they will not get jobs or tenure if they do not publish), but also in the mental sense—being systematically disincentivized to undertake such projects over time may influence them to adopt a mode of thinking that makes it difficult to spontaneously generate ideas for “blind spot” research. The main argument we are making in this article is that amateurs can more easily address the blind spots that hamper knowledge diversity than professionals because they are free from the functional constraints and are therefore also less likely to be hampered by the mental constraints.

As an independent scientist (well, as a neuroscientist working on biophysics, I’d probably fit better into the ‘outsider’ category from this article), I’ve noticed that amateur/independent researchers often seem to come up with interesting ideas and innovative approaches for new research projects (myself included, although I may be a bit biased…). I don’t know if this is because we’ve been liberated from any mental constraints of academia, and there can certainly be other functional constraints (such as balancing salaried work w/ research, getting journal access, etc.), but I think it would be good to consider further. If there were evidence that amateurs can address blind spots more easily than academic researchers, it would make a strong case for increasing support for amateur research contributions (which wouldn’t be hard, as there isn’t much currently).

I think this paper is well worth reading for anybody interested in increasing knowledge diversity, regardless of whether they work inside or outside academia. If nothing else, both groups are likely to pick up ideas for new strategies they could use to generate research ideas and new ways they could seek support from the other group.


Thanks for the kind words and glad you enjoyed the paper, Gavin! Happy to answer any questions on here.


This month’s article is Biosecurity in an age of open science by Smith and Sandbrink, which discusses a valid criticism of Open Science - that in some fields making research open might lead to risks that are greater than any benefits that openness can bring. The idea of openness having negative consequences might sound strange to many advocates of OS but, thankfully, the scope of concern is (currently) limited to subfields in the life sciences, like synthetic mammalian virology, where the misuse of research findings could have widespread negative consequences:

Certain life sciences research may be misused and increase the risk from deliberate biological events. For example, though advances in viral engineering may be important in areas like vaccine design and cancer therapy, they could be applied to engineer pathogens with increased virulence or transmissibility. Deliberate release of such pathogens could result in a pandemic of unprecedented severity. Research with the greatest misuse potential has been labelled dual-use research of concern (DURC), defined by the National Institutes of Health in the US as “life sciences research that, based on current understanding, can be reasonably anticipated to provide knowledge, information, products or technologies that could be directly misapplied to pose a significant threat with broad potential consequences to public health and safety”

The article identifies the potential for misuse in four common OS practices:

  1. Open code: “Openly shared computational methods may therefore make pathogen engineering more accessible by reducing or even removing the need for laboratory expertise and equipment.
  2. Open data: “publicly available datasets could be used by malicious actors to inform the enhancement of pandemic pathogens. Beyond the generation of datasets with greater potential for misuse, improved computational methods mean that data can be more effectively used for malicious bioengineering
  3. Open methods: “Publication of detailed methods, for example, for the synthesis and engineering of pandemic pathogens, may also increase the risk of accidents and misuse. Detailed protocols may lower the tacit knowledge required to perform certain procedures, making them more accessible to bad actors, inappropriately qualified personnel, or personnel working in inappropriate facilities
  4. Preprinting: “Preprints may therefore remove the “gatekeeper” role that journals could play in mitigating risks from the publication of research with potential for misuse. Authors may select preprint servers that do not screen research. … there are examples where journals and editors have been important in evaluating risks from publication … Preprints may therefore increase the probability that dangerous methods or results are described publicly.

Having encountered the conflict between OS and dual-use research previously, I initially viewed this as being difficult to resolve while maintaining openness, but I was happy to see that the authors provided suggestions for strategies to mitigate risks from these practices (rather than abandoning them completely) and even sometimes encourage openness (see Figure 1). Specifically, using Open Code, Data and Methods practices in dual-use research could be facilitated by developing access-controlled repositories that “facilitate interoperability, reuse, and findability through enforcing or encouraging standards for metadata with common vocabularies and appropriate documentation”, while existing preprint servers could coordinate their screening to ensure that “If an article is flagged by at least one server as potentially concerning, other servers could agree not to post that article until it was appropriately peer-reviewed.

OS practitioners will also be happy to see that preregistration is identified as an OS practice that could be encouraged in dual-use areas, as it is unlikely to create any additional concern while providing an opportunity for oversight that may mitigate existing risks early in the research lifecycle: “It seems likely that greater consideration of the research before it is started, as encouraged by preregistration, could help to mitigate misuse risks. Currently, biosecurity risk assessment and management is not consistently conducted at any stage throughout the research lifecycle; preregistration could encourage greater consideration of risks at an early stage. Submission platforms could ask researchers to reflect on the dual-use potential of their work. In certain high-risk fields, platforms could request that details of hazard assessment be provided, which could be incentivised by journals requesting evidence of such assessments on publication.” However, the authors note that the format of preregistrations is generally focused on confirmatory research, while it is exploratory studies that may be more likely to present dual-use risks. I’ve also found it difficult to find resources on preparing preregistrations for my own exploratory (but not dual-use!) research, and I feel that this is an area of open science methodology that deserves further development and could be useful to a broad range of life scientists (not just those working in dual-use areas). Regardless of the format, “interventions aimed at encouraging review at the conception of research seem particularly promising.

Overall, the article provides a new perspective on balancing the benefits of Open Science against an unusual but valid ‘cost’ (that of increased misuse risk), and I think most would agree that “Open science and biosecurity experts have an important role to play in enabling responsible research with maximal societal benefit.


This month there is a double recommendation! Innovations in peer review in scholarly publishing: a meta-summary and Innovating peer review, reconfiguring scholarly communication: An analytical overview of ongoing peer review innovation activities are two complementary preprints, both from the Peer Review project at the Research on Research Institute (RORI). The first article provides a ‘review of reviews’ of six previous literature reviews (including two from @jon_tennant) on innovations in peer review, while the second article reports an inductively developed taxonomy of peer review innovation based on the responses to a survey sent to scholarly communication organisations. Together, the two articles provide an excellent overview of the state of peer review innovation and include far more material than I can summarise here, so I will simply present the frameworks in which the articles present their results and then comment on their conclusions.

The meta-summary places peer review innovations in the context of three categories and a variety of subcategories:

  1. Approaches to peer review

    1. Open/masked peer review

    2. Pre/post publication review

    3. Collaboration and decoupling

    4. Focussed and specialised review

  2. Review focused incentives

    1. Reviewer incentives

    2. Reviewer support

  3. Technology to support peer review

    1. Current uses

    2. Potential models

An interesting point in the meta-summary is the description of the peer review innovation evaluations covered in Bruce et al (2016): ‘The authors found, based on these outcome measures, that compared with standard peer review, reviewer training was not successful in improving the quality of the peer review report and use of checklists by peer reviewers to check the quality of a manuscript did not improve the quality of the final article. However, the addition of a specialised statistical reviewer did improve the quality of the final article and open peer review was also successful in improving the quality of review reports. It did not affect the time reviewers spent on their report. Open peer review also decreased the number of papers rejected. Finally, blinded peer review did not affect the quality of review reports or rejection rates.

The survey overview led to a slightly more complex taxonomy with five main elements.

An interesting point from the overview is about the participation of patients in reviews at biomedical journals: ‘Our sample contains two examples that involve patients as reviewers for journals. Both obviously focus on biomedical research with a particular emphasis on its practical or clinical relevance - one is the BMJ, and the other Research Involvement and Engagement (published by BMC). Whereas in the former, patient reviewers are invited on a selective basis depending on the submission, the latter journal foresees the regular use of two patient reviewers and two academic reviewers for all manuscripts.

The overview notes that the diverse range of peer review innovations they have catalogued pull in mutually opposed directions. These oppositions include:

  1. Efficiency vs. rigour: ‘numerous innovations in the categories “role of reviewers” and “transparency of review” aim to increase the efficiency of peer review, which can to some extent be seen as a remedy to the growing amount and cost of review work’ ← → ‘Our data firstly suggests that many innovations in the categories “objects” and “nature of review” amount to promoting more rigorous quality control, namely by multiplying the objects of and occasions for review.
  2. Singular vs. pluralistic considerations of quality: ‘Registered reports assume a specific understanding of how the research process should be organized, based on an epistemological ideal of particular forms of experimental science.’ ← → ‘Innovations that remove social and disciplinary boundaries to reviewing … [and] deanonymize the review process … encourage a deliberative approach where potentially opposed speakers can explicitly address each other, even though there is also a risk that reviewers may not feel comfortable giving their frank opinion in a deanonymized setting.
  3. Transparency vs. objectivity: ‘making review reports and reviewer identities transparent is now a widely offered possibility … assuming that disclosing identities of authors and reviewers is useful for accountability in peer review’ ← → ‘there are also some signs of a trend towards abandoning mandatory disclosure of reviewer identities (BMC) and towards double-blind peer review (IOP Publishing) … presupposing that objectivity of peer review requires anonymity of authors and reviewers

While the overview mediates the opposing directions of peer review development by calling for coordination between scholarly communication innovators (an activity that the RORI would be well-positioned to facilitate), the meta-summary also identified a strong conclusion that there is the need ‘for a wider reflection on the peer review process as a research community, with both Barroga (2020) and Tennant (2018) underscoring the need to consider what different stakeholders bring to the peer review process and which role they inhabit.’ It seems that ‘coordination’ should not just be limited to scholarly communication organisations but extend into consultation with peer review stakeholders in the broader academic community. Furthermore, the meta-summary also reports ‘Three reviews (Bruce et al, 2016; Horbach & Halffman, 2018; Tennant et al, 2017) conclude that there is a lack of empirical evidence to assess the effectiveness of innovations in peer review.’ This lack of empirical evidence provides little basis for assessing the trade-offs made by moving along the opposing directions of innovation, so empirical studies should be prioritised if the evidence base is to keep up with the rapid pace of innovation.

Despite the lack of empirical evidence and opposing directions of development, it seems to be a promising time for peer review innovation as the meta-summary notes: ‘The fact that there are enough review articles to warrant a review of reviews, indicates the growing maturity of the field of peer review research.

Ludo Waltman let me know that there is a newer article in this series that I missed: How to improve scientific peer review: Four schools of thought. Based on the inquiry into peer review innovation in the preceding articles, this article proposes four schools of thought that shape the innovation landscape:

  1. The Quality & Reproducibility school
  2. The Democracy & Transparency school
  3. The Equity & Inclusion school
  4. The Efficiency & Incentives school

The title of this month’s article says a lot: Correction of scientific literature: Too little, too late! The article is short and well worth reading through, so I’ll also keep this recommendation short and to the point.

Essentially, the COVID-19 pandemic led to a lot of fast and high-profile science getting published, some of which cut corners in terms of quality control and transparency. Yet, the article notes that the traditional responses to poor-quality or fraudulent research are too little: ‘Nowadays, preprints and peer-reviewed research papers are rapidly shared on online platforms among millions of readers within days of being published. A paper can impact worldwide health and well-being in a few weeks online; that it may be retracted at some point months in the future does not undo any harm caused in the meantime. Even if a paper is removed entirely from the publication record, it will never be removed from the digital space and is still likely to be cited by researchers and laypeople alike as evidence. Often, its removal contributes to its mystique.’ And because papers are shared so fast, the traditional responses usually come too late: ‘Identifying flaws in a paper may only take hours, but even the most basic formal correction can take months of mutual correspondence between scientific journal editors, authors, and critics. Even when authors try to correct their own published manuscripts, they can face strenuous challenges that prompt many to give up.

A key point highlighted by the article is that scientific critics are rarely rewarded, and often penalized or stigmatized, for their work to correct scientific errors. Indeed, the authors of this article speak from personal experience: ‘[We] have all been involved in error detection in this manner. For our voluntary work, we have received both legal and physical threats and been defamed by senior academics and internet trolls.’ I agree with their position that ‘Public, open, and moderated review on PubPeer [8] and similar websites that expose serious concerns should be rewarded with praise rather than scorn, personal attacks, or threats

The article provides several recommendations to facilitate faster and more visible scientific correction, and, importantly, three more aimed at destigmatizing the work of error correctors:

  • Reward scientific error correction during assessments for hiring, promotion and funding.
  • Train scientists to recognize mistakes and scientific institutions and funders to value error-checking.
  • Provide legal protection for scientific critics who raise concerns in a professional and non-defamatory manner.

Is this enough? It seems like the least the scientific community could do…

This month’s recommendation is for the light-hearted Night Science series of editorials written by Yanai and Lercher. They started by introducing the concept of Night Science, the unstructured and apparently haphazard search for possible hypotheses and theories, as the counterpart to Day Science, where hypotheses are rigorously tested through experimentation. Later articles in the series have explored a variety of related themes.

I have thoroughly enjoyed reading all the Night Science series so far, but will limit myself to sharing some thoughts on Yanai and Lercher’s latest article, What puzzle are you in? (I’d love to discuss them all, but I realised I’d committed to more than I wanted to last time I tried to do this recommendation for two articles, let alone a series!)

The authors start by presenting several examples of artificial puzzles, along with dimensions for classifying them that can also be applied to the natural problems researchers solve:

Despite their complexity, nature’s puzzles can be classified in the same way as puzzles humans invented for entertainment: jigsaw puzzles [Class I], logical puzzles [Class II], puzzles where we need to find connections to phenomena outside the problem description [Class III], and puzzles that require us to think outside the box [Class IV], often by identifying and dropping implicit assumptions. These archetypes can be distinguished along two dimensions: whether they are closed-world or open-world and whether the solutions require either making connections or deeper insights into the problem structure.

Some specific examples of scientific discoveries that match these puzzle classes are:

  1. Jigsaw puzzles - genome assembly and protein crystallography
  2. Logical puzzles - comma-free coding of codons into proteins
  3. Outside connections - Gödel’s incompleteness theorems and natural selection
  4. Out of the box - CRISPR

But the article makes an important observation about the real practice of science: ‘When we actively work on a scientific problem, we have no way to be certain what kind of a puzzle we are in, or if the puzzle as we see it even has a solution. Solving research puzzles is a hierarchical problem. You not only have to find the solution to a puzzle that belongs to one of the four classes. You also have to solve the meta-puzzle of discovering what class of puzzle you are in.’ And frustratingly (illustrated with examples of the authors’ own genomics research): ‘At any instance, the puzzle may switch, making you realize that you are in a different kind of puzzle than you originally thought.

The authors also speculate that there may be common patterns for how research problems switch between classes of puzzles, which seems like it could be a useful approach to explore further for systematising scientific problem-solving. Treating research like puzzles might ultimately be both an enjoyable and productive strategy to approach the ambiguity inherent in it:

Adopting the mindset of a puzzle solver may help us to reframe this uncertainty—we may view it as part of a playful process, allowing us to have an open mind and to not stick rigidly to the project’s original framing. Without this playful, puzzle-solving attitude, we may not only limit the joy of doing science. We may also miss out on quite a few insights, big or small.

The dimensions of puzzle classification prompted me to think about how this related to deductive and inductive reasoning. I initially expected these modes of reasoning would correspond to closed and open-world puzzles, respectively. Indeed, solving closed-world puzzles clearly seems to be deductive, while reframing open-world puzzles does seem inductive (as would be the meta-puzzle of determining which puzzle class you are working with). Yet, (at least to me) the examples provided for finding connections in open-world puzzles appear to be split between both types of reasoning: relabelling the pot and Gödel’s proofs both still seem to be applications of deduction, while Darwin’s theory is more obviously induction. I feel it is in the spirit of the series to suggest that looking at additional examples of this class of puzzles could be a useful exercise to refine the framing of the puzzle classification dimensions. (or maybe I have just done a bad job of classifying reasoning required for the Class III puzzle examples!)

I’d also like to add a shoutout to a class of puzzles I enjoy (and struggle with…): Bongard problems. These are games of inductive reasoning popular in computer science, and there is a large collection of problems available here if the reader would like to try some. (David Chapman also has a nice post relating Bongard problems to his idea of meta-rationality, which treats some ideas that are similar to the discussion of the meta-puzzle of puzzle class identification in this article)

If you are too excited about Night Science to wait for the next article, then I also recommend the Night Science podcast that the authors host.


This month’s article is Ten simple rules for implementing open and reproducible research practices after attending a training course (preprint) from Heise et al. Many people attend training courses about robust research and open science at conferences or other events, but learning the material is just the first step—implementing the practices in your research often leads to unexpected challenges to overcome (particularly if your colleagues aren’t as enthusiastic about making changes as you are!) This article presents ten clear and concise rules to help everybody make the most of their robust research training:

  1. Join a robust research community to access expertise and support (this forum and IGDORE are two possibilities)
  2. Shortlist the practice you’d like to try implementing first in a project you are currently working on
  3. Discuss the changes you want to make with your research team
  4. Prepare for concerns your colleagues may have and address them constructively
  5. Set up an implementation plan after your team has reached agreement
  6. Compromise if needed and stay patient while working towards long-term improvements
  7. Make your changes sustainable by creating documentation and peer support structures
  8. Continue developing your competencies and seek recognition for doing so
  9. Practice self-care and avoid burnout
  10. Find future employers/colleagues who share your values and will utilize your robust research skills!

As a bonus, the article also provides 10 tips to help course organizers prepare their participants for the challenges of implementation:

  1. Consider the background of your participants when designing course material
  2. Cover a range of topics so different participants all find something of interest
  3. Talk about how the course content relates to institutional and funder policies
  4. Train participants in ‘soft-skills’ they can use to encourage behavioral change in their research team
  5. Allow time for participants to start implementing the practices they are being taught
  6. Avoid overwhelming your audience by breaking the training up into a series of short sessions
  7. Make the teaching material and resources reusable, so participants can host their own training events
  8. Create communities and networks for alumni to stay in touch with each other and the course organizers
  9. Organize times for participants to focus on implementing specific practices
  10. Plan to host follow-up events to keep the momentum going!

Check the article for more details on all the points above.


This month’s article is Reimagining peer review as an expert elicitation process by Marcoci et al. I came across it when interacting with several of the authors on another peer review project and thought the idea of using structured expert elicitation as a peer review method very interesting. Indeed, it seems to go well beyond how structured reporting, cross-reviewer commenting and collaborative reviews are described in previous research on peer review innovation conducted by the RoRI (which I recommended here a few months ago), and may provide a more robust extension of the ‘discussion during review’ model being used at several journals (see Horbach and Halffman 2018).

A structured expert elicitation process 'can demonstrably improve the quality of expert judgements, especially in the context of critical decisions’. The authors base their recommendations on their ‘collective experience developing and implementing the IDEA protocol (Investigate—Discuss—Estimate—Aggregate) for structured expert elicitation in diverse settings including conservation, intelligence analysis, biosecurity, and, most recently, for the collaborative evaluation of research replicability and credibility’. The latter setting refers to the well-known repliCATS project, in which the IDEA protocol ‘has been shown to facilitate accurate predictions about which research findings will replicate by prompting experts to investigate and discuss the transparency and robustness of the findings in a structured manner’.

A summary of the basic steps of the IDEA protocol (from Hemming et al. 2017):

A diverse group of experts is recruited to answer questions with probabilistic or quantitative responses. The experts are asked to first Investigate the questions and to clarify their meanings, and then to provide their private, individual best guess point estimates and associated credible intervals (Round 1). The experts receive feedback on their estimates in relation to other experts. With assistance of a facilitator, the experts are encouraged to Discuss the results, resolve different interpretations of the questions, cross-examine reasoning and evidence, and then provide a second and final private Estimate (Round 2). Notably, the purpose of discussion in the IDEA protocol is not to reach consensus but to resolve linguistic ambiguity, promote critical thinking, and to share evidence. This is based on evidence that incorporating a single discussion stage within a standard Delphi process generates improvements in response accuracy. The individual estimates are then combined using mathematical Aggregation.
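
As a minimal illustration of the final Aggregation step, here is a sketch in Python assuming simple unweighted averaging of the experts’ Round 2 estimates (Hemming et al. discuss more sophisticated aggregation options; the three-expert setup and the numbers below are invented):

```python
from statistics import mean

def aggregate_estimates(round2):
    """Combine final (Round 2) IDEA estimates by unweighted averaging.

    round2: list of (lower, best, upper) tuples, one per expert, giving
    each expert's credible interval and best-guess point estimate.
    """
    lowers, bests, uppers = zip(*round2)
    return (mean(lowers), mean(bests), mean(uppers))

# Three hypothetical experts estimating a replication probability
estimates = [
    (0.40, 0.55, 0.70),
    (0.30, 0.50, 0.65),
    (0.50, 0.60, 0.75),
]
print(aggregate_estimates(estimates))
```

The group estimate here is a 0.55 best guess with an aggregated interval of roughly 0.40 to 0.70; weighting experts by calibration performance would be a natural refinement.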

The present article ‘outline[s] five recommendations focusing on individual and group characteristics that contribute to higher quality judgements, and on ways of structuring elicitation protocols that promote constructive discussion to enable editorial decisions that represent a transparent aggregation of diverse opinions’. These are:

  • Elicit diverse opinions: Leverage the wisdom of the crowd by incorporating reviewers with diverse backgrounds and perspectives
  • Challenge conventional definitions of expertise: The judgement of individual or small groups of experts isn’t always very good, but aggregating the feedback of larger groups of reviewers, drawn from outside traditional expert reviewer pools, may provide more accurate decisions
  • Provide structure: Quantitative estimates of research quality can be aggregated mathematically, which also quantifies the uncertainty in the reviewer judgements
  • Encourage and facilitate interaction: Group discussion often identifies errors and leads to novel ideas that individuals wouldn’t reach by themselves.
  • Anonymise judgements: Social influences can undermine the wisdom of the crowd (i.e. groupthink)

While the IDEA protocol has been able to increase the collective accuracy of expert judgements in a variety of settings, ‘[t]o what extent similar effects can be achieved in peer review is an empirical question that remains unaddressed.’ I would certainly be excited to hear about a journal experimenting with peer review based on the IDEA protocol, although, as the article concludes, it ‘will require some editorial bravery’!


This month we will be looking at Open science and public trust in science: Results from two studies by Rosman et al. The article reports the results of two empirical studies on how open science practices influence public trust in science, and also provides an excellent introduction that thoroughly covers this topic.

Many open science advocates would say that open science makes research more trustworthy, but ‘despite increasing adherence to open science practices (OSPs) in the scientific community, little is known about the public’s expectations of such practices and their effects on the perceived trustworthiness of science.’ Indeed, ‘the few experimental studies on the relationship between OSPs and trust in science have yielded rather inconclusive results’: one found that open science badges increased trust in scientists, two others were inconclusive, and a fourth found that informing participants about the replicability crisis in psychology (including proposed reforms) reduced their trust in future research.

The current article built on this previous work by, firstly, replicating survey results on the beneficial effects of OSPs on trust and, secondly, conducting an experimental study with a vignette-based manipulation to test whether OSPs are causally related to trust. It also extended prior research by addressing the field specificity of the relationship between OSPs and trust (comparing science as a whole, psychology, and medicine) and the influence of public versus private funding (including whether OSPs buffer the trust-damaging effect of private funding). To their credit, the authors clearly practice the OSPs they study: the article contains links to their preregistration, materials, data, and code.

The survey found:

  • An overwhelming majority of our sample found it important that researchers make their findings openly accessible and that they implement OSPs
  • a large proportion of participants indicated that their trust in a scientific study would increase if they saw that researchers made their materials, data, and code openly accessible.

However, the experimental study was less conclusive:

  • there are some indications in our data that the use of OSPs may increase trust—although it should also be noted that the corresponding effect sizes were rather small.
  • analyses yielded evidence for the effects of a study’s funding type on trust, such that publicly funded studies are trusted more than privately funded ones.
  • the trust-damaging effects of research being privately funded may be buffered by OSPs … this hypothesis was clearly not supported by our data.

After placing these findings in the context of the other existing experimental studies, the two studies ‘imply that people may well recognize open science as a trust-increasing factor, especially when directly asked about it, but that other factors such as communication strategies may play a comparatively stronger role in the development of trust in science.’ As well as focusing on communication, the discussion also notes that ‘combining increased transparency with such participatory approaches may thus be even more promising to increase trust in science compared to transparency alone.’

While the authors conclude that their ‘results suggest that OSPs may well contribute to increasing trust in science and scientists’ and ‘recognize the potential role of OSPs in attenuating the negative effects of the replication and trust crisis’, reassuringly their findings also show that the public’s trust in science is already rather high.
