Monthly Reading Recommendations

Many members of the forum community are interested in Open Science, but it’s not always easy to keep up with the latest OS literature if you’re not a metascience researcher yourself (well, unless you’re active on Twitter…). So I wanted to encourage people to post new OS/metascience papers and preprints here, particularly if they think the article will be of general interest and stimulate discussion in the forum.

A previous example of a preprint that generated some discussion is Metascience as a scientific social movement.

Like the one above, articles are welcome to cover advanced topics and issues around Open Science. I’d refer anybody looking for an introduction to OS to Easing Into Open Science: A Guide for Graduate Students and Their Advisors and the material in the Intro Papers folder of the ReproducibiliTea Zotero library.

If a given article generates a bit of discussion then I or another moderator can split it into its own thread. I’ll also pick an interesting article to feature in the latest IGDORE Newsletter each month (and give a shout-out to whoever posted it originally).


Here is a paper on journal impact factors to start things off (h/t @pcmasuzzo):

Journal impact factors, publication charges and assessment of quality and accuracy of scientific research are critical for researchers, managers, funders, policy makers, and society. Editors and publishers compete for impact factor rankings, to demonstrate how important their journals are, and researchers strive to publish in perceived top journals, despite high publication and access charges. This raises questions of how top journals are identified, whether assessments of impacts are accurate and whether high publication charges borne by the research community are justified, bearing in mind that they also collectively provide free peer-review to the publishers. Although traditional journals accelerated peer review and publication during the COVID-19 pandemic, preprint servers made a greater impact with over 30,000 open access articles becoming available and accelerating a trend already seen in other fields of research. We review and comment on the advantages and disadvantages of a range of assessment methods and the way in which they are used by researchers, managers, employers and publishers. We argue that new approaches to assessment are required to provide a realistic and comprehensive measure of the value of research and journals and we support open access publishing at a modest, affordable price to benefit research producers and consumers.

Some old arguments against the impact factor:

In 1997, Per Seglen summarized in four points why JIFs should not be used for the evaluation of research:

  1. “Use of journal impact factors conceals the difference in article citation rates (articles in the most cited half of articles in a journal are cited 10 times as often as the least cited half).
  2. Journals’ impact factors are determined by technicalities unrelated to the scientific quality of their articles.
  3. Journal impact factors depend on the research field: high impact factors are likely in journals covering large areas of basic research with a rapidly expanding but short lived literature that uses many references per article.
  4. Article citation rates determine the journal impact factor, not vice versa.”

Some existing alternatives:

A number of alternative metrics to JIF have been developed (Table 1). All of these are based on citation counts for individual papers but vary in how the numbers are used to assess impact. As discussed later, the accuracy of data based on citation counts is highly questionable.

  • CiteScore calculates a citations/published items score conceptually similar to JIF but using Scopus data to count four years of citations and four years of published items.
  • The Source Normalized Impact Factor also uses Scopus data to take a citation/published items score and normalizes it against the average number of citations/citing document.
  • The Eigenfactor (EF) and Scimago Journal Rank work in a manner analogous to Google’s PageRank algorithm, employing iterative calculations with data from Journal Citation Reports and Scopus respectively to derive scores based on the weighted valuations of citing documents.
  • Finally, h-indexes attempt to balance the number of papers published by an author or journal against the distribution of citation counts for those papers. This metric is frequently used and is discussed in more detail in a following section.
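To make the last metric concrete: the h-index of an author (or journal) is the largest number h such that h of their papers have each received at least h citations. A minimal sketch of the calculation (the function name and example citation counts are mine, not from the paper):

```python
def h_index(citations):
    """Return the largest h such that h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:  # the top `rank` papers all have >= rank citations
            h = rank
        else:
            break
    return h

# Example: five papers with these citation counts
print(h_index([25, 8, 5, 3, 0]))  # → 3 (three papers have at least 3 citations)
```

Note how the metric caps the influence of any single highly cited paper: the 25 citations on the first paper contribute no more to h than the 5 on the third, which is part of why it "balances" output against citation distribution.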

Some new suggestions for evaluative criteria:

If so, and recognizing that any evaluation based on a single criterion alone can be criticized, what are the criteria we should consider in order to devise a more effective system for recognition and assessment of accomplishments which also supports an equitable publishing process that is not hidden behind expensive paywalls and OA fees? The following are all metrics that could be collectively looked at to aid in assessment although as we have discussed, if used alone, all have their limitations:

  1. Contribution of an author to the paper including preprints, i.e. first author, last author, conducted experiments, analyzed data, contributed to the writing, other?
  2. Number of years active in research field and productivity
  3. Number of publications in journals where others in the same field also publish
  4. Views and downloads
  5. Number of citations as first, last, or corresponding author.
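As an illustration of how these five criteria might be "collectively looked at" rather than used alone, here is a toy weighted composite. The weights, field names, and the assumption that each criterion has already been normalized to [0, 1] are entirely hypothetical, not proposed by the authors:

```python
# Hypothetical weights for the paper's five criteria; illustrative only.
WEIGHTS = {
    "contribution": 0.3,   # criterion 1: author's role on the paper
    "experience": 0.1,     # criterion 2: years active and productivity
    "venue_overlap": 0.2,  # criterion 3: publishing where peers publish
    "usage": 0.2,          # criterion 4: views and downloads
    "citations": 0.2,      # criterion 5: citations as first/last/corresponding author
}

def composite_score(normalized):
    """Combine criteria (each pre-normalized to [0, 1]) into a single score."""
    return sum(WEIGHTS[k] * normalized.get(k, 0.0) for k in WEIGHTS)

# A researcher scoring highly on contribution but with modest usage numbers:
print(composite_score({"contribution": 1.0, "usage": 0.5}))  # → 0.4
```

Any real composite would of course need to justify its weighting and normalization, which is exactly where the single-metric criticisms discussed above resurface.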

The authors get bonus points for including a reference to a Bob Dylan song as well!

I think those final points could be good factors from which to start building an assessment metric. One thing I note is that these are all quantitative factors that are easy to collect, but I wonder if there is scope to include other, qualitative assessments (although these generally require more effort to create)? For instance, I think that peer usage and validation of research findings and/or outputs would be a very positive indicator. A replication study is basically the essence of peer validation, but in some cases information indicating usage could be easier to get (e.g. looking at forks of/contributions to a code repository that go on to be used in other papers).

Any thoughts? Or other suggestions for assessment metrics?


I came across this month’s preprint at a Nowhere Lab meeting (h/t Dwayne Lieck). The paper is quite heavy on Philosophy of Science, but I think it does a good job of showing the value of (combining) different types of replications.

A Falsificationist Treatment of Auxiliary Hypotheses in Social and Behavioral Sciences: Systematic Replications Framework

In short:

we investigate how the current undesirable state is related to the problem of empirical underdetermination and its disproportionately detrimental effects in the social and behavioral sciences. We then discuss how close and conceptual replications can be employed to mitigate different aspects of underdetermination, and why they might even aggravate the problem when conducted in isolation. … The Systematic Replications Framework we propose involves conducting logically connected series of close and conceptual replications and will provide a way to increase the informativity of (non)corroborative results and thereby effectively reduce the ambiguity of falsification.

The introduction is catchy:

At least some of the problems that social and behavioral sciences tackle have far-reaching and serious implications in the real world. Among them one could list very diverse questions, such as “Is exposure to media violence related to aggressive behavior and how?” … Apart from all being socially very pertinent, substantial numbers of studies investigated each of these questions. However, the similarities do not end here. Curiously enough, even after so much resource has been invested in the empirical investigation of these almost-too-relevant problems, nothing much is accomplished in terms of arriving at clear, definitive answers … Resolving theoretical disputes is an important means to scientific progress because when a given scientific field lacks consensus regarding established evidence and how exactly it supports or contradicts competing theoretical claims, the scientific community cannot appraise whether there is scientific progress or merely a misleading semblance of it. That is to say, it cannot be in a position to judge whether a theory constitutes scientific progress in the sense that it accounts for phenomena better than alternative or previous theories and can lead to the discovery of new facts, or is degenerating in the sense that it focuses on explaining away counterevidence by finding faults in replications (Lakatos, 1978). Observing this state, Lakatos maintained decades ago that most theorizing in social sciences risks making merely pseudo-scientific progress (1978, p. 88-9, n. 3-4). What further solidifies this problem is that most “hypothesis-tests” do not test any theory and those that do so subject the theory to radically few number of tests (see e.g., McPhetres et. al., 2020). 
This situation has actually been going on for a considerably long time, which renders an old observation of Meehl still relevant; namely, that theoretical claims often do not die normal deaths at the hands of empirical evidence but are discontinued due to a sheer loss of interest (1978).

As researchers whose work fails to replicate often point out, a failed replication doesn’t necessarily mean a theory is falsified:

this straightforward falsificationist strategy is complicated by the fact that theories by themselves do not logically imply any testable predictions. As the Duhem-Quine Thesis (DQT from now on) famously propounds, scientific theories or hypotheses have empirical consequences only in conjunction with other hypotheses or background assumptions. These auxiliary hypotheses range from ceteris paribus clauses (i.e., all other things being equal) to various assumptions regarding the research design and the instruments being used, the accuracy of the measurements, the validity of the operationalizations of the theoretical terms linked in the main hypothesis, the implications of previous theories and so on. Consequently, it is impossible to test a theoretical hypothesis in isolation. In other words, the antecedent clause in the first premise of the modus tollens is not a theory ( T ) but actually a bundle consisting of the theory and various auxiliary hypotheses ( T , AH 1, …, AH n). For this reason, falsification is necessarily ambiguous. That is, it cannot be ascertained from a single test if the hypothesis under test or one or more of the auxiliary hypotheses should bear the burden of falsification (see Duhem, 1954, p. 187; also Strevens, 2001, p. 516).1 Likewise, Lakatos maintained that absolute falsification is impossible, because in the face of a failed prediction, the target of the modus tollens can always be shifted towards the auxiliary hypotheses and away from the theory (1978, p. 18-19; see also Popper, 2002b, p. 20).

Popper considered auxiliary hypotheses to be unimportant background assumptions that researchers had to demarcate from the theory being tested by designing a good methodology. But this is hard to do in the social sciences (my experience suggests this is probably true in many areas of biology as well):

In the social and behavioral sciences, relegating AH s to unproblematic background assumptions is particularly difficult, and consequently the implications of the DQT are particularly relevant and crucial (Meehl, 1978; 1990). For several reasons we need to presume that AH s nearly always enter the test along with the main theoretical hypothesis (Meehl, 1990). Firstly, in the social and behavioral sciences the theories are so loosely organized that they do not say much about how the measurements should be (Folger, 1989; Meehl, 1978). Secondly, AH s are seldom independently testable (Meehl, 1978) and, consequently, usually no particular operationalization qualitatively stands out. Besides, in these disciplines, theoretical terms are often necessarily vague (Qizilbash, 2003), and researchers have a lesser degree of control on the environment of inquiry, so hypothesized relationships can be expected to be spatiotemporally less reliable (Leonelli, 2018). Moreover, in the absence of a strong theory of measurement that is informed by the dominant paradigm of the given scientific discipline (Muthukrishna & Henrich, 2019), the selection of AH s is usually guided by the assumptions of the very theory that is put into test. Consequently, each contending approach develops its own measurement devices regarding the same phenomenon, heeding to their own theoretical postulations. Attesting to the threat this situation poses for the validity of scientific inferences, it has recently been shown that the differences in research teams’ preferences of basic design elements drastically influence the effects observed for the same theoretical hypotheses (Landy et al., 2020).

The proposed Systematic Replications Framework (also depicted in Fig. 2):

SRF consists of a systematically organized series of replications that function collectively as a single research line. The basic idea is to bring close and conceptual replications together in order to weight the effects of the AH pre and AH out sets on the findings . SRF starts with a close replication, which is followed by a series of conceptual replications in which the operationalization of one theoretical variable at a time is varied while keeping that of the other constant and then repeats the procedure for the other leg.

Its benefits for hypothesis testing are:

SRF reduces ambiguities implied by the DQT in original studies as well as in close and conceptual replications. Primarily, it allows for non-corroborative evidence to have differential implications for the components of the TH & AH s bundle. Thereby these components can receive blame not collectively but in terms of a weighted distribution. In cases where it is not possible to achieve this, it allows demarcating on which pairings from possible AH pre and AH out sets the truth-value of the TH is conditional. In all cases, the confounding effects deriving from the AH s can be relatively isolated. Lastly, SRF can enable that we approximate to an ideal test of a theoretical hypothesis within the methodological falsificationist paradigm by embedding alternative operationalizations and associated measurement approaches into a severe testing framework (see Mayo, 1997; 2018).

Besides replications, the SRF could also be useful for doing systematic literature reviews:

Another potential practical implication of SRF lies in using the same strategy of logically connecting different AH bundles in conducting and interpreting systematic literature reviews (particularly when the previous findings are mixed). Such a strategy can help researchers distinguish the effects that seem to be driven by certain AH s from the ones in which the TH is more robust to such influences. To put it differently, in a contested literature there are already numerous conceptual replications that have been conducted, and at least some of these replications rely on the same AH s in their operationalizations. Therefore, to the extent that they have overlaps in their AH s, their results can be organized in such a way that resembles a pattern of results that can be obtained with a novel research project planned according to SRF. The term “systematic” in systematic literature review already indicates that the scientific question to be investigated (i.e., the subject-matter, the problem or hypothesis), the data collection strategy (e.g., databases to be searched, inclusion criteria) as well as the method that will be used in analyzing the data (e.g., statistical tests or qualitative analyses) are standardized. However, for various reasons (e.g., to limit the inquiry to those studies that use a particular method), not every systematic literature review is conducive to figuring out whether the TH is conditional on particular AH sets. An SRF-inspired strategy of tabulating the results in a systematic literature review will also help researchers in appraising the conceptual networks of theoretical claims, theoretically relevant auxiliary assumptions and measurements. Thus, it can eventually help in appraising the verisimilitude of the TH by revealing how it is conditional on certain AH s, and can lead to the reformulation or refinement of the TH as well as guide and constrain subsequent modifications to it.

In closing:

The decade-long discussion on a replicability and confidence crisis in several disciplines of social, behavioral and life sciences (e.g., Camerer et al., 2018; OSC, 2015; Ioannidis, 2005) has identified the prioritization of the exploratory over the critical mission as one of the key causes, and led to proposals for slowing science down (Stengers, 2018), applying more caution in giving policy advice (Ijzerman et al., 2020), and inaugurating a credibility revolution (Vazire, 2020). All potential contributions of SRF will be part of a strategy to prioritize science’s critical mission on the way towards more credible research in social, behavioral, and life sciences. This would imply that the scientific community focuses less on producing huge numbers of novel hypotheses with little corroboration and more on having a lesser number of severely tested theoretical claims. Successful implementation of SRF also requires openness and transparency regarding both positive and negative results of original and replication studies (Nosek et al., 2015) and demands increased research collaboration (Landy et al., 2020). Ideally, this would also take the form of adversarial collaboration.

@surya re: adversarial collaborations. They’re discussed in more detail in their own section.

I’d be interested to hear from some people involved in current psychology replication projects about their thoughts on using conceptual replications to test auxiliary hypotheses vs. just using close/direct replications.

This paper also reminded me of the old concept of strong inference, which also focuses on testing a variety of alternative hypotheses in a given study.


On reflection, I realized that this statement, combined with the idea of auxiliary hypotheses, reminded me a lot of Technology Readiness Levels, a framework used to assess the progress from the initial observation of a phenomenon to its use in a mature design deployed in a real-world system (originally developed for aerospace design by NASA, although I believe it is now used more broadly, e.g. by the European Commission to assess all types of innovation). I think that this write-up by Ben Reinhardt is quite a helpful introduction: Technology Readiness Levels

How does this relate? Well, testing a hypothesis in the lab vs. using the idea reliably in the real world requires a greater understanding of the phenomena involved, and this could be considered as refining the set of auxiliary hypotheses that determine when the core hypothesis can still be observed under conditions of decreasing experimental control (although I’ve never seen it framed like this). I wonder if refining auxiliary hypotheses could be a useful framing for applied research in academia, and I’m now quite motivated to read more about Lakatos’s model of research programs.

This month’s article is from the authors of the preprint Metascience as a scientific social movement, which critically reflects on the structure of the current scientific reform movement (we previously discussed it here). Peterson and Panofsky now put forward Arguments against efficiency in science (not OA, but preprinted), a short response to Hallonsten’s Stop evaluating science: A historical-sociological argument.

Peterson and Panofsky note:

The arguments of the proponents of evaluations and metascientific reform are firmly rooted in the values of liberal society: transparency, accountability, and productivity. Counterarguments are easily cast as defensiveness or obscurantism. This is not just an academic problem. Scientists we interviewed told us that they felt constrained expressing their skepticism of reforms because, while reformers can draw on popular rhetoric of how science should operate, critics must wade into the murky waters of real scientific practice. … Ultimately, our goal is not to suggest that the concept of efficiency has no place in science but, rather, that efficiency is only one value in a cluster of values that includes utility, significance, elegance and, even, sustainability and justice. That efficiency is the easiest to articulate because it accords with other dominant bureaucratic and economic values should not allow it to win policy discussions by default. The fact that the argument against efficiency is challenging makes it all the more pressing to make it.

I am inclined to agree. I just skimmed Stop Evaluating Science: it is interesting, but makes a rather nebulous argument that weaves together discussions of the economization, distrust, democratization and commodification of science, ultimately ending in a rhetorical argument against scientific evaluation: Questions like ‘has science been productive enough?’ and ‘how can it be proven that science has been productive enough?’ shall first and foremost be answered with a rhetorical question, namely, ‘how else do you suppose that we have achieved this level of wealth and technical standard in Europe and North America?’. But as Hallonsten then notes:

In spite of the overwhelming logic of this rhetorical counter-question, and the historical evidence that supports it, champions of the view that science is insufficiently productive and must be made productive and held accountable through limitations to its self-governance and the use of quantitative performance appraisals will demand evidence that they can comprehend and, preferably, compare with their own simple and straightforward numbers. A list of counter-examples will therefore probably not suffice, since it can be discarded as mere ‘anecdotal evidence’ against which also the shallowest and most oversimplified statistics usually win.

Peterson and Panofsky present a brief argument against efficiency based on two key points. Firstly, efficiency shouldn’t be equated with scientific progress because we don’t agree on what progress is:

Our inability to chart basic scientific progress undermines the ability to measure efficiency. The notion of efficiency only makes sense in the context of established means/ends relationships. The goal is to organize the means in the optimal way to achieve the desired end. The problem is that, in the area of basic science, the end is unknown. … There is little agreement among the scientists themselves about what constitutes a significant contribution. There is reason to believe this dissensus is not a mere technical deficiency, but is a constitutive feature of the cutting edge of science (Cole, 1992: 18). Rather than clarity, these accounts underscore the complexity of conceptualizing progress in science.

Secondly, incentivizing efficiency may have counterproductive outcomes to which existing, inefficient practices are preferable. Incentives may be particularly difficult to apply in academic environments as:

scientific cultures are not Lego sets that can be broken down and rebuilt anew. They have organically evolved their own systems of communication and evaluation. They interpret broadly accepted, but abstract, values like skepticism, verification, and transparency in ways sensible to their particular contexts. Applying blanket rules to maximize efficiency in such systems can lead to unintended and, even, counterproductive outcomes.

The mistaken assumption of trying to make science more efficient stems from misinterpreting scientists as nothing more than value-maximizing, incentive-driven agents. Reformers in science have adopted economic language and, in so doing, have treated scientists as actors primarily motivated by material rewards (e.g., Harris, 2017; Nosek et al., 2012). This can be compared to a Mertonian account which would view them motivated by the interlocking system of scientific norms. Under an economic account, the best way to change behavior in science is to alter the incentive structure to reward or punish specific behaviors. Rational scientists will then react to those incentives and outcomes can be ensured.

The problem with incentive-based legislation has been detailed in a recent book by economist Samuel Bowles (2016). He argues that trying to engineer social systems by treating actors as thoroughly self-interested and incentive-driven ignores the useful role that preexisting cultural values play. In the reformer’s mind, newly introduced incentives and existing preferences are ‘additively separable’ from existing values. That is, if actors already value a behavior, then adding an incentive can only have a positive, cumulative effect. Yet, this need not be the case. Bowles details laboratory and field studies that show how the introduction of incentives can reduce or even reverse existing values.

I’m quite partial to the second point as I feel that grassroots cultural change in academia is more likely to lead to beneficial scientific reform than the use of top-down incentives and rules. Still, data on the effectiveness of institutional policies at promoting Open Science practices should be starting to become available, so this point may prove easier to resolve than the first.

I’m in favour of promoting Open Science, but I do think this paper was a thought-provoking critique that provided:

the beginnings of a counterargument, so that any reform dressed in the language of efficiency must address what it means by efficiency and how it might impinge on other values. Science reform should be a slow, reversable process with input from funders, institutions, those who study science, and, most importantly, the scientists themselves. And although defensiveness and obfuscation are enemies of science, resistance to reforms may have reasonable roots.

I was discussing Peterson and Panofsky’s paper with somebody who thought it didn’t clearly articulate what scientific efficiency meant from a metascience perspective, as it simply stated:

Metascientific activists have conceptualized efficiency in terms of improving the proportion of replicable claims to nonreplicable claims in the literature (e.g., Ioannidis, 2012).

Which was set against the status quo process for scientific progress:

a biologist at MIT who contrasted these organized replication efforts with what he viewed as the current ‘Darwinian process […] which progressively sifts out the findings which are not replicable and not extended by others’. Under this alternative theory of scientific efficiency, there is a natural process in which researchers produce many claims. Some may be flat wrong. Some may be right, yet hard to reproduce, or only narrowly correct and, therefore, be of limited use. However, some provide robust and exciting grounds to build upon and these become the shoulders on which future generations stand (Peterson and Panofsky, 2021)

But one point that came up is that surely reproducibility, whether it comes from directed efforts or natural selection, isn’t enough to ensure efficient scientific progress if you aren’t testing hypotheses that will lead to useful theoretical and/or practical progress in the first place. (Note that the paper’s first point is essentially that we don’t know what progress is in basic science; see my post above.)

This reminded me of the original 2009 article about avoidable research waste, which proposed four stages of research waste: 1) irrelevant questions, 2) inappropriate design and methods, 3) inaccessible or incomplete publications, 4) biased or unusable reports (inefficient research regulation and management was later inserted at position 3). This paper is known for estimating that 85% of investment in biomedical research is wasted, but this only takes into account losses at stages 2, 3, and 4. It is these three stages that are then addressed by the two efficiency-promoting manifestos cited by Peterson and Panofsky (Ioannidis et al. 2015 and Munafò et al. 2017) under the themes of improved Methods, Reporting and Dissemination, Reproducibility and Evaluation, all of which are supported by Incentives. Figure 1 of the latter manifesto does show ‘Generate and specify hypothesis’ in a circular diagram of the scientific method, but in the context of scientific reproducibility, the discussion focuses on the risks that uncontrolled cognitive biases pose to hypothesising:

a major challenge for scientists is to be open to new and important insights while simultaneously avoiding being misled by our tendency to see structure in randomness. The combination of apophenia (the tendency to see patterns in random data), confirmation bias (the tendency to focus on evidence that is in line with our expectations or favoured explanation) and hindsight bias (the tendency to see an event as having been predictable only after it has occurred) can easily lead us to false conclusions.

Besides the metascience manifestos above, a 2014 Lancet series on increasing value and reducing waste in biomedical research also provided recommendations to address each stage of research waste. The first article in the series considered the problem of choosing what to research, but primarily set this out as a challenge for funders and regulators when setting research priorities. While some suggestions are made that could be useful for researchers doing clinical, applied, or even use-inspired studies (namely, to consider potential users’ needs), the most broadly applicable advice for individual researchers seems to be to use systematic reviews and meta-reviews to ensure that existing knowledge is recognized and then used to justify additional work.

I feel that the question of what to research (particularly in basic research and for the individual researcher) has been neglected by metascientific reformers and their current focus on improving replicability. Don’t get me wrong: replicability is important, as producing unreplicable results from testing innovative hypotheses doesn’t mean much, but I think the two aspects of efficient science need to move forward together.

Refreshingly, a recent article introducing the Society of Open, Reliable, and Transparent Ecology and Evolutionary biology notes that promoting good theory development is an outstanding question for meta-research and provides a reference to the beguilingly titled paper Why Hypothesis Testers Should Spend Less Time Testing Hypotheses. I’ve yet to look at this last paper and its citations in detail, but I still wonder if I’ve missed something. Has work in metascience really not looked into problem selection as much as the other stages of research waste? Or is this being addressed under a different terminology or by a different field? Or do we continue to rely on researchers developing the tacit skill of selecting good research questions during their training?


I discovered this month’s preprint, Empowering early career researchers to improve science, in a session of the same name at Metascience 2021 (recording). I was particularly impressed with the diverse range of perspectives (both in terms of early career researcher (ECR) demographics and types of initiatives) that the organizers drew together with an asynchronous unconference (which they reflect on here).

Besides categorizing reform efforts into ‘(1) publishing, (2) reproducibility, (3) public involvement and science communication, (4) diversity and global perspectives, (5) training and working conditions for early career researchers (ECRs), and (6) rewards and incentives’ (section 1), the preprint also discusses seven reasons why ECRs need to be involved in scientific improvements (section 2, briefly summarized):

  1. ECRs are future research leaders and should be able to shape the future of the research enterprise,
  2. ECRs are a more diverse group than senior scientists,
  3. ECRs may be more open to new ideas and less ‘set in their ways’ than senior scientists,
  4. ECRs may still be motivated by unchallenged idealism (i.e. they haven’t yet become jaded and career-driven PIs),
  5. ECRs are at the forefront of technical advances and tool development,
  6. Some ECRs have more time and energy to commit to reform initiatives,
  7. ECRs represent the majority of the scientific workforce.

And six obstacles faced by ECRs involved in efforts to improve research culture (section 3, also summarized):

  1. Scientific improvement initiatives are rarely rewarded or incentivized and aren’t seen as a priority for career progression: ‘This contributes to an endless cycle, where ECRs who work to improve science are pushed out before securing faculty positions or leadership roles where they gain more ability to implement systemic changes.’
  2. Limited funding and resources are available for ECRs (isn’t this true for everybody?) working on scientific reform,
  3. ECRs are generally excluded from decision making in existing institutions,
  4. ECRs often can’t set aside time for scientific improvement work, their positions/career stages are unstable, and their supervisors may see scientific improvement as a diversion from their main research projects,
  5. ECRs are perceived as lacking the required experience to improve science,
  6. Western scientific culture creates added challenges for ECRs who are from marginalized groups.

Yet other stakeholders, and the adoption of good practices, can help ECRs overcome these obstacles! A list of suggested actions that institutions (e.g. universities, funders, publishers, academic societies, and peer communities) and senior individuals (e.g. supervisors) can take to support ECRs who are working on reform projects is provided in section 4, and the paper ends with a list of lessons learned by ECRs who have previously worked on improving science (section 6, subsection headings):

  • Know what has been done before
  • Start with a feasible goal
  • Collaborate wisely
  • Work towards equity, diversity, and inclusion
  • Build a positive and inclusive team dynamic
  • Anticipate concerns or resistance to change
  • Be persistent
  • Plan for sustainability

Additional (and more detailed) tips are also included in the document ‘Tips and tricks for ECRs organizing initiatives’ on OSF.

Section 5 discusses additional obstacles ECRs in countries with limited funding face and mentions ‘In some countries, postdocs and occasionally PIs lack institutional affiliations, which can be an added challenge when trying to initiate systemic change.’

IGDORE actually provides free institutional affiliation to researchers and researchers in training. IGDORE currently has affiliates on all continents and we’d be very happy to provide institutional support services for ECRs working on scientific reform (or field-specific research for that matter) who don’t have access to an institutional affiliation in their own country. (The Ronin Institute is another option for getting an institutional affiliation.) Additionally, this very forum (On Science & Academia) can also provide a place for ECRs who want to have more nuanced discussions about scientific reform than is possible/probable on Twitter or Slack (we can make public/private categories for specific initiatives, just message a forum moderator or admin about your group’s needs). Readers should feel free to point ECRs towards IGDORE and this forum if they think one or both could assist them.

To conclude:

ECRs are important stakeholders working to catalyse systemic change in research practice and culture. The examples presented reveal that ECRs have already made remarkable progress. Future efforts should focus on incentivizing and rewarding systemic efforts to improve science culture and practice. This includes providing protected time for individuals working in these areas and amplifying ECR voices and meaningfully incorporating them into decision-making structures. ECRs working on improving science in communities or countries with limited research funding should be supported by organizations with access to greater resources to improve science for all. We hope that the tools, lessons learned, and resources developed during this event will enhance efforts spearheaded by ECRs around the world, while prompting organizations and individuals to take action to support ECRs working to improve science.

What about the OS&A forum readers? A few of you are involved in scientific reform initiatives (perhaps more so outside of traditional academia than inside of it). What resonates here? What would you add?

As a Global Board member at IGDORE I’ve personally noticed that having well thought out plans and goals is important, as is developing a strong diverse team (with respect to volunteers) and planning for leadership sustainability (tbh we’re still working to improve all of these points). Persistence is critical even in the most basic things like building up newsletter readership :rofl: Interestingly, global/language diversity of affiliations seems to have come relatively easily to IGDORE, although this may be specific to our particular project. An additional point that I think has been useful for IGDORE but didn’t see mentioned prominently in the preprint is forming cooperations/collaborations between organisations/initiatives working on similar or allied projects - IGDORE has several organisational collaborations and I’ve found engaging with these other organisations has also helped my work at IGDORE.

NOTE: This article is now published in PLoS Biology: Recommendations for empowering early career researchers to improve research culture and practice


Nice find! Will check this out.


This month’s recommendation provides a very practical (and maybe even fun!) exercise to avoid errors in your research (and another h/t goes to Nowhere Lab for this):

Error Tight: Exercises for Lab Groups to Prevent Research Mistakes

The purpose of this project is to provide hands-on exercises for lab groups to identify places in their research workflow where errors may occur and pinpoint ways to address them. The appropriate approach for a given lab will vary depending on the kind of research they do, their tools, the nature of the data they work with, and many other factors. Therefore, this project does not provide a set of one-size-fits-all guidelines, but rather is intended to be an exercise in self-reflection for researchers and provide resources for solutions that are well-suited to them.

The exercise has six steps and in my opinion step 1, which involves reading a table with examples of things that have actually gone wrong at different stages of psychology research and suggested ways to avoid them, is possibly the greatest contribution of the preprint, because as noted:

A recurring theme when reading about the errors others have made is that mistakes happen in unexpected places and in unexpected ways. Therefore, reading examples of ways that others have made mistakes may be fruitful in stoking your creativity about where mistakes may happen in your process.

If I had been asked to come up with a list of things that (I knew) had gone wrong in my research, I doubt it would have been half as long, and although I work in a different field, reading through the table did prompt me to think of a few things that could go wrong in my work that I don’t usually check. Could a central database of error case reports for a variety of fields be a useful resource for an Error Tight scientific community?

“You must learn from the mistakes of others. You can’t possibly live long enough to make them all yourself.” - Samuel Levenson


I absolutely loved the recommended reading of the month, thanks!

I am going to get a tattoo with this sentence, I think :wink:

Not an article, but I nevertheless found this piece interesting to read: The 20% Statistician: Not All Flexibility P-Hacking Is, Young Padawan (of course the Star Wars meme contributed LOL)

People in psychology are re-learning the basic rules of hypothesis testing in the wake of the replication crisis. But because they are not yet immersed in good research practices, the lack of experience means they are overregularizing simplistic rules to situations where they do not apply. Not all flexibility is p-hacking […]

Perhaps of interest to @rebecca & @Enrico.Fucci :slight_smile:


Thanks for sharing @pcmasuzzo! This reminded me of a recent article from @Lakens and colleagues: Why Hypothesis Testers Should Spend Less Time Testing Hypotheses which I think provides some great advice about things to consider before one even thinks about doing a hypothesis test.

A modern student of psychology, wanting to learn how to contribute to the science of human cognition and behavior, is typically presented with the following procedure. First, formulate a hypothesis, ideally one deductively derived from a theory. Second, devise a study to test the hypothesis. Third, collect and analyze data. And fourth, evaluate whether the results support or contradict the theory. The student will learn that doubts about the rigor of this process recently caused our discipline to reexamine practices in the field. Excessive leniency in study design, data collection, and analysis led psychological scientists to be overconfident about many hypotheses that turned out to be false. In response, psychological science as a field tightened the screws on the machinery of confirmatory testing: Predictions should be more specific, designs more powerful, and statistical tests more stringent, leaving less room for error and misrepresentation. Confirmatory testing will be taught as a highly formalized protocol with clear rules, and the student will learn to strictly separate it from the “exploratory” part of the research process. Seemingly well prepared to make a meaningful scientific contribution, the student is released into the big, wide world of psychological science.

But our curriculum has glossed over a crucial step: The student, now a junior researcher, has learned how to operate the hypothesis-testing machinery but not how to feed it with meaningful input. When setting up a hypothesis test, the junior researcher has to specify how their independent and dependent variables will be operationalized, how many participants they will collect, which exclusion criteria they will apply, which statistical method they will use, how to decide whether the hypothesis was corroborated or falsified, and so on. But deciding between these myriad options often feels like guesswork. Looking for advice, they find little more than rules of thumb and received wisdom. Although this helps them to fill in the preregistration form, a feeling of unease remains. Should science not be more principled?

We believe that the junior researcher’s unease signals an important problem. What they experience is a lack of knowledge about the elements that link their test back to the theory from which their hypothesis was derived. By using arbitrary defaults and heuristics to bridge these gaps, the researcher cannot be sure how their test result informs the theory. In this article, we discuss which inputs are necessary for informative tests of hypotheses and provide an overview of the diverse research activities that can provide these inputs.


Thanks to @Wolfgang.Lukas for pointing me towards this month’s paper Aspiring to greater intellectual humility in science (preprint). The article makes a good case for intellectual humility, that is, owning our limitations: “owning one’s intellectual limitations characteristically involves dispositions to: (1) believe that one has them, and to believe that their negative outcomes are due to them; (2) to admit or acknowledge them; (3) to care about them and take them seriously; and (4) to feel regret or dismay, but not hostility, about them”, although unfortunately the current incentive structure of academia can discourage it:

In an ideal world, scientists have an obligation to put the limitations and uncertainty in their findings front and center. Although there are positive developments such as journals asking for more honest papers, and a growing number of journals accepting Registered Reports (in which the decision to publish a paper is made independent of the results), there are still many forces pushing against intellectual humility. Exaggeration is beneficial for the individual (at least in the short run), but it can be detrimental for the group, and eventually for science as a whole, given that credibility is at stake. The downward spiral in this collective action dilemma is not easily reversed, and some people are in a more secure position to make principled but disincentivized choices. Given that scientists have professional obligations and ethics to live up to, intellectual humility should be a factor in the decisions we all make. Researchers should not hide behind a flawed system, the incentive structure, or their lower-ranked position (see, e.g., ref. 38). Although they certainly play a role, these factors do not release us from our moral and professional obligations to own the limitations of our work by putting those limitations front and center, and incorporating their consequences into our conclusions.

But this article stands out by not just calling for more intellectual humility, but also providing ‘a set of recommendations on how to increase intellectual humility in research articles and highlight the central role peer reviewers can play in incentivizing authors to foreground the flaws and uncertainty in their work, thus enabling full and transparent evaluation of the validity of research’. These recommendations are:

  0. Title and abstract
  • 0.1 The abstract should describe limitations of the study and boundary conditions of the conclusion(s).
  • 0.2 Titles should not state or imply stronger claims than are justified (e.g., causal claims without strong evidence).
  1. Introduction
  • 1.1 The novelty of research should not be exaggerated.
  • 1.2 Selective citation should not be used to create a false sense of consistency or conflict in the literature.
  2. Method
  • 2.1 The Methods section should provide all details that a reader would need to evaluate the soundness of the methods, and to conduct a direct replication.
  • 2.2 The timing of decisions about data collection, transformations, exclusions, and analyses should be documented and shared.
  3. Results
  • 3.1 Detailed information about the data and results (including informative plots and information about uncertainty) should be provided.
  • 3.2 It should be transparent which analyses were planned and where those plans were documented. Weaker conclusions should be drawn to the extent that analyses were susceptible to data-dependent decision-making.
  • 3.3 Inferential statistics should not be used in a way that exaggerates the certainty of the findings. Alternatives to dichotomous tests should be considered.
  4. Discussion
  • 4.1 The statistical uncertainty of results should be incorporated into the narrative conclusions drawn from the results.
  • 4.2 The research summary should capture the full range of results (e.g., include our “most damning result”).
  • 4.3 Causal claims should be only as strong as the internal validity of the study allows.
  • 4.4 Claims about generalizability should be only as strong as the sampling of participants, stimuli, and settings allows.
  • 4.5 All conclusions should be calibrated to the confidence in the construct validity of the measures and manipulations.
  • 4.6 Alternative interpretations should be presented in their strongest possible form (“steelmanned”).
  • 4.7 Discussion of the limitations should be incorporated throughout the discussion section, rather than bracketed off in a subsection.
  5. Post-publication guidance for authors
  • 5.1 Insist that press releases and reporters capture the limitations of the work, and correct outlets that exaggerate or misrepresent.
  • 5.2 Encourage criticism, correction, and replication of our work, and respond non-defensively when errors or contradictory evidence are brought to light.
  • 5.3 When appropriate, retract papers, issue corrections, or publish “loss of confidence” statements.
Moreover, even if junior academics are wary of embracing intellectually humble practices in case it puts them at a disadvantage when publishing, they could still promote such practices while reviewing others’ papers (and no, the authors say we shouldn’t consider that to be hypocritical!):

But many may feel they are not in the position (yet) to “do the right thing.” That is an understandable position, given the non-scientific interests most of us must factor into our decisions. Luckily, reviewing provides an almost cost-free opportunity for all of us to contribute to incentivizing intellectual humility. Even if we are not always able to apply these practices as authors, out of fear of lowering our chances of success, we can flip the script for the authors whose manuscripts we review, and make their success at least partly dependent on the amount of humility they show in their papers, thus resulting in more honest and more credible papers. In our view, this is not hypocritical, but trying to act in accordance with our norms, within the limits of our capabilities. Of course, these same reviewers would have to be ready and willing to accept reviewers’ suggestions to be more intellectually humble if the roles are reversed. We suspect that most people who are willing to promote intellectual humility as reviewers would be happy to do so when they find themselves on the receiving end of similar reviews.

Note - I renamed this thread from Open Science Reading Recommendations to Monthly Reading Recommendations. A lot of the articles I’m suggesting stray outside of what is usually considered Open Science, cross Metascience and end up in Philosophy of Science :smiley:


This month’s recommendation is much more empirically focused than many of the articles I’ve indicated previously. How do researchers approach societal impact? (h/t @Daniel-Mietchen) ‘examines how researchers approach societal impact, that is, what they think about societal impact in research governance, what their societal goals are, and how they use communication formats. Hence, this study offers empirical evidence on a group that has received remarkably little attention in the scholarly discourse on the societal impact of research—academic researchers. Our analysis is based on an empirical survey among 499 researchers in Germany conducted from April to June 2020.’

Societal impact has become an important consideration when setting research priorities, and the article provides a good introduction to two discourses that have been used for framing it: ‘1) the discourse in communication and science and technology studies (STS) on the relationship between science and society and 2) the discourse on societal impact measurement in scientometrics and evaluation research’.

The study explored three research questions:

  1. RQ1: What are researchers’ opinions on societal impact?
  2. RQ2: Which societal goals do researchers aim to achieve with their research?
  3. RQ3: Which formats do researchers use to achieve societal impact?

And analysed these in terms of three dimensions: content (discipline and applied vs. basic), organizational and individual. One of the organizational categories is non-university research institutes (as well as universities and universities of applied sciences) and while I assume that most or all of the research institutes represented in this study have physical workplaces, I was interested to see how the institutes’ results differed from those of universities, to put my own experiences at IGDORE into context (I mostly quote the results describing differences in the organization dimension below).

RQ1—opinion: Individual commitment without an institutional mandate

Researchers at applied universities agree that societal relevance should have more weight in evaluation more often than those at independent institutes and universities—the approval rates are 62%, 49%, and 53% respectively. Researchers at universities show the lowest agreement to the statement that knowledge transfer plays an important role at their institution: Only 19% approve compared to 36% at applied universities and 37% at independent institutes. Furthermore, researchers at universities particularly disagree with the statement that their communication departments are able to reach relevant stakeholders in society: Only 15% approve compared to 28% at applied universities and 44% at independent research institutes.

RQ2—goals: Disciplines define societal goals

When looking at the different types of organizations, it is noticeable that researchers from applied universities are most economically oriented: 41% of the researchers from universities of applied sciences indicate that they want to contribute to economic value creation (19% at nonuniversity institutions and 17% at universities). In contrast, researchers from independent institutes are the most policy-oriented: 54% of the researchers from independent institutes aim to contribute to political decision-making (31% at universities and 37% at applied universities).

RQ3—formats: University researchers are the least active

Researchers from applied universities are the primary users of advisory formats: 57% of researchers at applied universities have used advisory formats compared to 40% of researchers at independent institutes and only 26% of the researchers at universities. Only 28% of the researchers from universities have used collaboration formats, compared to 53% of researchers from applied universities and 35% of researchers from independent institutes. As Table 2 shows, university researchers have remarkably low scores on every communication format.

Table 2 also indicates that researchers at independent institutes are the most active users of events and social media for communication.

In the discussion, the authors note that there appears to be some mismatch between the generally high importance attributed to societal impact by researchers vs. how effectively they believe their institution conducts knowledge transfer and communication with stakeholders in society (this made me recall another working paper which finds that technology transfer offices at US universities don’t operate very efficiently):

it is remarkable that the majority of researchers (89%) consider societal engagement to be part of scientific activity. More than half of the researchers (53%) agree that societal impact should be given more weight in evaluations. Even though the majority of researchers regard public engagement as part of scientific work, they are not equally positive about whether societal impact should have more weight in evaluations. One reason for this discrepancy may be that researchers fear that evaluations will lead to additional work or that they will not adequately record their transfer activities [23, 60, 64]. In addition, it is striking that only 27% of the respondents assume that knowledge transfer plays an important role at their institution; also, 27% believe that the institutional communication department is managing to reach relevant stakeholders in society. Humanities scholars (15%) and university researchers (15%) particularly doubt that their communication departments are reaching relevant societal stakeholders. This mirrors previous findings suggesting a certain decoupling between central transfer infrastructures and researchers [38, 39] and leads us to hypothesize that there is a certain mismatch between individual and institutional commitment.

Although the article is mainly descriptive, the authors do propose some practical actions:

First, considering the discontent with institutional communication departments, it might be worthwhile to implement decentralized support structures on the mesolevel of research organizations. This could more adequately address the complexities of the sciences and their many publics [5, 6, 57, 63]. The findings further suggest that, where applicable, organizational factors (e.g., institutional investments in transfer, training offerings, support infrastructures) should be more strongly incorporated into assessments of societal impact, for example, through formative evaluations [88]. Second, our results suggest that it is strongly advisable that evaluation exercises are responsive to disciplinary differences. For example, if economic and technical impact were the sole basis for assessing societal impact, social sciences and humanities scholars would be discriminated against [6, 63]. Our framework for societal goals and our results can also be the basis for disciplinary self-understanding (e.g., in learned societies), in that they can stimulate a normative discussion about good transfer and its evaluation. Third, considering the comparatively low importance of social media as a means of communicating about research, care should be taken not to overuse online discourse as a way of easily generating impact proxies.

As for comparing the results for independent institutes to my own experiences, I generally agree that IGDORE researchers are probably more interested in knowledge transfer than many academics but, unfortunately, we don’t (yet) provide any communication services to facilitate this. IGDORE researchers also seem to be quite active in trying to influence research policy, especially policies related to open science, and when I joined IGDORE, I was actually surprised at how active members of the community were on social media! So overall I think the results of this survey are generally representative of the differences I’ve noticed between IGDORE researchers and those in academia.

This month’s paper is Self-correction in science: The diagnostic and integrative motives for replication (preprint), another thought-provoking article from Peterson and Panofsky. The essential argument is that replications come in both diagnostic and integrative forms:

the goal of integrative replication is to reproduce the ends of research while being pragmatic about the means, whereas the goal of diagnostic replication is to faithfully reproduce the means while remaining agnostic about the ends

and that ‘current debates, as well as research in the science and technology studies, have paid little heed to a key dimension [integrative forms] of replication practice.’ Interviewing sixty members of Science’s Board of Reviewing Editors led the authors to the following six theses:

  1. Strictly diagnostic replications are rare, but replications motivated by the desire to integrate are common.
  2. Integrative replication attempts provide varying degrees of diagnostic evidence.
  3. Integrative replication provides stronger diagnostic evidence when task uncertainty is lower.
  4. Integrative replication provides weaker diagnostic evidence when task uncertainty is higher.
  5. When diagnostic replication requires special effort, experimentalists often embrace a logic of investment rather than a logic of truth.
  6. When diagnostic replication is difficult and ambiguous, researchers may prefer an organic mode of self-correction.

The reference to task uncertainty is important for understanding fields’ goals for replications and deserves further clarification: ‘High task uncertainty is characteristic of experimental contexts in which significant variables are either unknown or uncontrollable and/or experimental techniques and technologies are either unstandardized or unstandardizable. In conditions of low task uncertainty, on the other hand, variables are known and controlled and experimental techniques and technologies are standardized and predictable.’ A common generalization is that many subfields of physics have low task uncertainty while many areas of biology have high task uncertainty, although the article notes that there are also exceptions.

The article describes the rationale for each thesis in detail, but I think the implications the article notes for science policy related to replications are the most useful to highlight:

By encouraging replications and making them easier to perform, metascientific activists hope that replication will move from the realm of a possible, but rarely actualized, deterrent to something with real teeth. Given our analysis, such policies would be especially effective in fields with low task uncertainty, where replication is interpreted to be most diagnostic. Yet, respondents in these fields were the least likely to endorse the need for such developments since the threat of a potent and unambiguous diagnostic replication is seen as sufficient to discourage bad behavior in many of these fields. Conversely, in fields with greater perceived task uncertainty – including much of the biomedical and behavioral research fields that most replication activism targets – replication initiatives are likely to have less impact because these replications are more susceptible to falling into the experimenters’ regress.

This highlights the tension that can arise between the two motives for replication. Researchers have an inherent motivation during integrative replication. The goal is to ‘get it to work’ in order to extend their own research capacities. When replication is undertaken purely for diagnostic reasons, the motivations are unclear. What would motivate researchers to stop their own research in order to explicitly test a finding that is not integral to their own projects? Rather than ‘get it to work’, scholars conducting purely diagnostic replication attempts may, in fact, have a perverse incentive to have it fail. At the very least, they may lack the motivation to do it correctly. This can create a culture of paranoia which, while in line with the abstract ideal of ‘organized skepticism’, reflects a mistrust that is actually quite unusual in the history of science

Having never previously considered the value of integrative replications (despite having performed them regularly in my own research), I do think that the role of such integrative work deserves more consideration in relation to the current replication crises. Yet the paper’s final paragraph notes that it is not yet clear how the data from failed integrative replications can be used to identify unreplicable results in the literature:

Metascientists have argued that having failed studies languishing in file drawers means that we are getting an incomplete picture of the data in a field. Our respondents made clear that, in many cases, they choose not to share such failures because the data was never meant to be diagnostic. They were quick and sloppy attempts to try something out, not sober processes of verification, and what might constitute a publicly sharable output is not clear. This raises significant and complex questions regarding what sorts of failures should count and which should not be included in metanalytic analyses.

This made me recall part of @bastien.lemaire’s recent review article on the file drawer effect:

Null but conclusive findings have the power to dissuade researchers from unfruitful avenues. Contrarily, null but inconclusive findings are hardly interpretable. However, they are useful in laying down scientific questions that remain to be tackled by the scientific community. Surprisingly or not, once published and grouped in systematic reviews, those underpowered studies can be lifesaving. This is well illustrated by the Stroke Unit Trialists’ collaborative systematic review who gathered a series of null and underpowered studies[26] which once together demonstrated that having a stroke unit care reduces the odds of death by 17% (95% confidence interval 4-29%) … true (or diagnostic) replications are not always necessary to validate or invalidate previously described results. Indeed, the convergence of evidence through triangulations of experiments can act as indirect replications and, at the same time, push the boundaries of knowledge.

Formal protocols for combining the evidence from failed integrative replications (possibly with other null or underpowered results) do seem like they could indeed help resolve many questions about research replicability (particularly in fields with high task uncertainty) without requiring direct/diagnostic replications to be conducted in every case.
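To illustrate what such a protocol might look like (this is my own sketch, not anything proposed in the paper), here is a minimal inverse-variance fixed-effect pooling of log odds ratios in Python, in the spirit of the stroke-unit example above. All the effect sizes and variances are made up for illustration; each simulated study is individually inconclusive (its 95% CI crosses zero), but the pooled estimate is not:

```python
# Hypothetical sketch: inverse-variance (fixed-effect) pooling of several
# small, individually inconclusive studies. Numbers are invented for
# illustration only.
import math

def pool_fixed_effect(effects, variances):
    """Combine per-study log odds ratios by inverse-variance weighting.

    Returns the pooled estimate and its 95% confidence interval.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Five underpowered studies: each one's 95% CI crosses 0 (null, inconclusive)...
log_ors = [-0.25, -0.10, -0.30, -0.15, -0.20]
variances = [0.04, 0.05, 0.06, 0.04, 0.05]

pooled, (lo, hi) = pool_fixed_effect(log_ors, variances)
# ...but the pooled estimate can exclude 0.
print(f"pooled log OR = {pooled:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

A real protocol would of course need more than this (random-effects models, heterogeneity checks, and some way of deciding which "quick and sloppy" failures are diagnostic enough to include), but the basic mechanics of evidence synthesis from underpowered results are this simple.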

This month’s article is Amateur hour: Improving knowledge diversity in psychological and behavioural science by harnessing contributions from amateurs (preprint) from Mohlhenrich & Krpan, who are the founders of Seeds of Science @SeedsofScience, which was previously described in this forum post (and shout out to Mohlhenrich, who is actually an amateur researcher himself). They claim (and I generally agree) that:

Psychological and behavioral science (PBS) suffers from a lack of diversity in its key intellectual and research activities (Krpan, 2020; Medin, Ojalehto, Marin, & Bang, 2017). This low “knowledge diversity” is reflected in numerous aspects of the field—certain research topics (e.g., those that may be easily publishable) are prioritized over other important but less desirable topics (e.g., those that are not heavily cited or easy to publish); some methodologies such as experimentation are widely used whereas less common methods (e.g., self-observation) are neglected; short-term projects with quick gains are prioritized over the long-term ones; some participant populations are understudied (e.g., non-WEIRD samples; i.e., non-western, educated, industrialized, rich and democratic, Henrich, Heine, & Norenzayan, 2010); and theorizing is driven by arbitrary conventions and overly reliant on available research findings while avoiding speculation that could lead to new insights (Krpan, 2020, 2021a; Medin et al., 2017; Stanford, 2019).

Several strategies for long-term systematic change have already been proposed to increase knowledge diversity in PBS. But the authors propose an alternative, short-term strategy: ‘harnessing contributions from amateurs who can explore the diverse aspects of psychology that are neglected in academia’, and they go on to describe (and provide examples of) five types of amateurs, who can be categorized on the basis of their expertise distance and expertise level (Figure 1):

  1. Independent scientists
  2. Outsiders
  3. Undergraduate students
  4. Quantified self practitioners
  5. Citizen scientists

They then identify six types of problems (‘blind-spots’) that are not well incentivized in academia but would be suitable for amateur research (Table 1):

  1. Long-term projects
  2. Basic observational research
  3. Speculation
  4. Interdisciplinary projects
  5. Aimless projects
  6. Uncommon research areas

Finally, the authors provide suggestions across five areas for facilitating amateur participation in PBS, including:

  1. Encouraging non-traditional academic relationships
  2. Creating a digital amateur research hub
  3. Providing editorial support for amateurs at academic journals
  4. Founding an amateur PBS institute to support their work (joining the Ronin Institute or IGDORE is also an option for amateur researchers seeking institutional support)
  5. Reducing scepticism of professional PBS academics towards amateurs

To make the case for amateurs increasing knowledge diversity, the authors note:

The blind spots we have identified (and further ones that we are not even aware of, the “unknown unknowns”) arise from constraints that have both functional and mental aspects. For example, a PBS researcher may be discouraged from pursuing a long, aimless project (perhaps one that deals with a taboo subject) in a functional sense (e.g., they will not get jobs or tenure if they do not publish), but also in the mental sense—being systematically disincentivized to undertake such projects over time may influence them to adopt a mode of thinking that makes it difficult to spontaneously generate ideas for “blind spot” research. The main argument we are making in this article is that amateurs can more easily address the blind spots that hamper knowledge diversity than professionals because they are free from the functional constraints and are therefore also less likely to be hampered by the mental constraints.

As an independent scientist (well, as a neuroscientist working on biophysics, I’d probably fit better into the ‘outsider’ category in this article), I’ve noticed that amateur/independent researchers often seem to come up with interesting ideas and innovative approaches for new research projects (myself included, although I may be a bit biased…). I don’t know if this is because we’ve been liberated from the mental constraints of academia, and there are certainly other functional constraints (such as balancing salaried work with research, getting journal access, etc.), but I think it would be worth considering further. If there were evidence that amateurs can come up with ideas to address blind spots more easily than academic researchers, it would make a strong case for increasing support for amateur research contributions (which wouldn’t be hard, as there isn’t much currently).

I think this paper is well worth reading for anybody interested in increasing knowledge diversity, regardless of whether they work inside or outside academia. If nothing else, both groups are likely to pick up ideas for new strategies they could use to generate research ideas and new ways they could seek support from the other group.

Thanks for the kind words and glad you enjoyed the paper, Gavin! Happy to answer any questions on here.

This month’s article is Biosecurity in an age of open science by Smith and Sandbrink, which discusses a valid criticism of Open Science - that in some fields making research open might lead to risks that are greater than any benefits that openness can bring. The idea of openness having negative consequences might sound strange to many advocates of OS but, thankfully, the scope of concern is (currently) limited to subfields in the life sciences, like synthetic mammalian virology, where the misuse of research findings could have widespread negative consequences:

Certain life sciences research may be misused and increase the risk from deliberate biological events. For example, though advances in viral engineering may be important in areas like vaccine design and cancer therapy, they could be applied to engineer pathogens with increased virulence or transmissibility. Deliberate release of such pathogens could result in a pandemic of unprecedented severity. Research with the greatest misuse potential has been labelled dual-use research of concern (DURC), defined by the National Institutes of Health in the US as “life sciences research that, based on current understanding, can be reasonably anticipated to provide knowledge, information, products or technologies that could be directly misapplied to pose a significant threat with broad potential consequences to public health and safety”

The article identifies the potential for misuse in four common OS practices:

  1. Open code: “Openly shared computational methods may therefore make pathogen engineering more accessible by reducing or even removing the need for laboratory expertise and equipment.”
  2. Open data: “publicly available datasets could be used by malicious actors to inform the enhancement of pandemic pathogens. Beyond the generation of datasets with greater potential for misuse, improved computational methods mean that data can be more effectively used for malicious bioengineering”
  3. Open methods: “Publication of detailed methods, for example, for the synthesis and engineering of pandemic pathogens, may also increase the risk of accidents and misuse. Detailed protocols may lower the tacit knowledge required to perform certain procedures, making them more accessible to bad actors, inappropriately qualified personnel, or personnel working in inappropriate facilities”
  4. Preprinting: “Preprints may therefore remove the “gatekeeper” role that journals could play in mitigating risks from the publication of research with potential for misuse. Authors may select preprint servers that do not screen research. … there are examples where journals and editors have been important in evaluating risks from publication … Preprints may therefore increase the probability that dangerous methods or results are described publicly.”

Having encountered the conflict between OS and dual-use research before, I initially viewed it as difficult to resolve while maintaining openness, so I was happy to see that the authors provide strategies to mitigate the risks of these practices (rather than abandoning them completely) and sometimes even encourage openness (see Figure 1). Specifically, Open Code, Data and Methods practices in dual-use research could be facilitated by developing access-controlled repositories that “facilitate interoperability, reuse, and findability through enforcing or encouraging standards for metadata with common vocabularies and appropriate documentation”, while existing preprint servers could coordinate their screening to ensure that “If an article is flagged by at least one server as potentially concerning, other servers could agree not to post that article until it was appropriately peer-reviewed.”

OS practitioners will also be happy to see that preregistration is identified as an OS practice that could be encouraged in dual-use areas, as it is unlikely to create any additional concern while providing an opportunity for oversight that may mitigate existing risks early in the research lifecycle: “It seems likely that greater consideration of the research before it is started, as encouraged by preregistration, could help to mitigate misuse risks. Currently, biosecurity risk assessment and management is not consistently conducted at any stage throughout the research lifecycle; preregistration could encourage greater consideration of risks at an early stage. Submission platforms could ask researchers to reflect on the dual-use potential of their work. In certain high-risk fields, platforms could request that details of hazard assessment be provided, which could be incentivised by journals requesting evidence of such assessments on publication.”

However, the authors note that the format of preregistrations is generally focused on confirmatory research, while it is exploratory studies that may be more likely to present dual-use risks. I’ve also found it difficult to find resources on preparing a preregistration for my own exploratory (but not dual-use!) research, and I feel this is an area of open science methodology that deserves further development and could be useful to a broad range of life scientists (not just those working in dual-use areas). Regardless of the format, “interventions aimed at encouraging review at the conception of research seem particularly promising.”

Overall, the article provides a new perspective on balancing the benefits of Open Science against an unusual but valid ‘cost’ (that of increased misuse risk), and I think most would agree that “Open science and biosecurity experts have an important role to play in enabling responsible research with maximal societal benefit.”


This month there is a double recommendation! Innovations in peer review in scholarly publishing: a meta-summary and Innovating peer review, reconfiguring scholarly communication: An analytical overview of ongoing peer review innovation activities are two complementary preprints, both from the Peer Review project at the Research on Research Institute (RORI). The first article provides a ‘review of reviews’ of six previous literature reviews (including two from @jon_tennant) on innovations in peer review, while the second article reports an inductively developed taxonomy of peer review innovation based on the responses to a survey sent to scholarly communication organisations. Together, the two articles provide an excellent overview of the state of peer review innovation and include far more material than I can summarise here, so I will simply present the framework each article uses to organise its results and then comment on their conclusions.

The meta-summary places peer review innovations in the context of three categories and a variety of subcategories:

  1. Approaches to peer review
    1. Open/masked peer review
    2. Pre/post publication review
    3. Collaboration and decoupling
    4. Focussed and specialised review
  2. Review focused incentives
    1. Reviewer incentives
    2. Reviewer support
  3. Technology to support peer review
    1. Current uses
    2. Potential models

An interesting point in the meta-summary is the description of the peer review innovation evaluations covered in Bruce et al (2016): ‘The authors found, based on these outcome measures, that compared with standard peer review, reviewer training was not successful in improving the quality of the peer review report and use of checklists by peer reviewers to check the quality of a manuscript did not improve the quality of the final article. However, the addition of a specialised statistical reviewer did improve the quality of the final article and open peer review was also successful in improving the quality of review reports. It did not affect the time reviewers spent on their report. Open peer review also decreased the number of papers rejected. Finally, blinded peer review did not affect the quality of review reports or rejection rates.’

The survey overview led to a slightly more complex taxonomy with five main elements.

An interesting point from the overview is about the participation of patients in reviews at biomedical journals: ‘Our sample contains two examples that involve patients as reviewers for journals. Both obviously focus on biomedical research with a particular emphasis on its practical or clinical relevance - one is the BMJ, and the other Research Involvement and Engagement (published by BMC). Whereas in the former, patient reviewers are invited on a selective basis depending on the submission, the latter journal foresees the regular use of two patient reviewers and two academic reviewers for all manuscripts.’

The overview notes that the diverse range of peer review innovations they have catalogued pull in mutually opposed directions. These oppositions include:

  1. Efficiency vs. rigour: ‘numerous innovations in the categories “role of reviewers” and “transparency of review” aim to increase the efficiency of peer review, which can to some extent be seen as a remedy to the growing amount and cost of review work’ ← → ‘Our data firstly suggests that many innovations in the categories “objects” and “nature of review” amount to promoting more rigorous quality control, namely by multiplying the objects of and occasions for review.’
  2. Singular vs. pluralistic considerations of quality: ‘Registered reports assume a specific understanding of how the research process should be organized, based on an epistemological ideal of particular forms of experimental science.’ ← → ‘Innovations that remove social and disciplinary boundaries to reviewing … [and] deanonymize the review process … encourage a deliberative approach where potentially opposed speakers can explicitly address each other, even though there is also a risk that reviewers may not feel comfortable giving their frank opinion in a deanonymized setting.’
  3. Transparency vs. objectivity: ‘making review reports and reviewer identities transparent is now a widely offered possibility … assuming that disclosing identities of authors and reviewers is useful for accountability in peer review’ ← → ‘there are also some signs of a trend towards abandoning mandatory disclosure of reviewer identities (BMC) and towards double-blind peer review (IOP Publishing) … presupposing that objectivity of peer review requires anonymity of authors and reviewers’

While the overview mediates the opposing directions of peer review development by calling for coordination between scholarly communication innovators (an activity that the RORI would be well-positioned to facilitate), the meta-summary also identified a strong conclusion that there is the need ‘for a wider reflection on the peer review process as a research community, with both Barroga (2020) and Tennant (2018) underscoring the need to consider what different stakeholders bring to the peer review process and which role they inhabit.’ It seems that ‘coordination’ should not just be limited to scholarly communication organisations but extend into consultation with peer review stakeholders in the broader academic community. Furthermore, the meta-summary also reports that ‘Three reviews (Bruce et al, 2016; Horbach & Halffman, 2018; Tennant et al, 2017) conclude that there is a lack of empirical evidence to assess the effectiveness of innovations in peer review.’ This lack of empirical evidence provides little basis for assessing the trade-offs made by moving along the opposing directions of innovation, and empirical studies should be prioritised if the evidence base is to keep up with the rapid pace of innovation.

Despite the lack of empirical evidence and opposing directions of development, it seems to be a promising time for peer review innovation, as the meta-summary notes: ‘The fact that there are enough review articles to warrant a review of reviews, indicates the growing maturity of the field of peer review research.’

Ludo Waltman let me know that there is a newer article in this series that I missed: How to improve scientific peer review: Four schools of thought. Based on the inquiry into peer review innovation in the preceding articles, this article proposes four schools of thought that shape the innovation landscape:

  1. The Quality & Reproducibility school
  2. The Democracy & Transparency school
  3. The Equity & Inclusion school
  4. The Efficiency & Incentives school