Volume 38, Issue 2 p. 278-283
FORUM
Open Access

Editors are biased too: An extension of Fox et al. (2023)'s analysis makes the case for triple-blind review

Diane S. Srivastava (Corresponding Author)
Department of Zoology, Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
Email: [email protected]

Joana Bernardino
InBIO Laboratório Associado, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Instituto Superior de Agronomia, Universidade de Lisboa, Lisbon, Portugal

Ana Teresa Marques
InBIO Laboratório Associado, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Instituto Superior de Agronomia, Universidade de Lisboa, Lisbon, Portugal

António Proença-Ferreira
InBIO Laboratório Associado, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Instituto Superior de Agronomia, Universidade de Lisboa, Lisbon, Portugal
Conservation Biology Lab, Department of Biology, School of Sciences and Technology, University of Évora, Évora, Portugal
InBIO Laboratório Associado, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal

Ana Filipa Filipe
Forest Research Centre and Associate Laboratory TERRA, School of Agriculture, University of Lisbon, Lisbon, Portugal

Luís Borda-de-Água
InBIO Laboratório Associado, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Instituto Superior de Agronomia, Universidade de Lisboa, Lisbon, Portugal
InBIO Laboratório Associado, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal

João Gameiro
InBIO Laboratório Associado, CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Instituto Superior de Agronomia, Universidade de Lisboa, Lisbon, Portugal
First published: 06 February 2024
Handling Editor: Katie Field

Abstract

  1. Functional Ecology conducted a randomised trial comparing single- and double-blind peer review; a recent analysis of these data found substantial evidence for bias by reviewers.
  2. We show that this dataset can also be analysed for editor bias, after controlling for both reviewer bias and paper quality.
  3. Our analysis shows that editors are more likely to invite high-scoring manuscripts for revision or resubmission when the first author is a man from a country with a very high Human Development Index (HDI); first authors who were women or not from very high HDI countries were more likely to be rejected at this stage.
  4. We propose that journals consider a triple-blind review process in which neither editors nor reviewers know the identity of authors, and authors do not know the identity of reviewers or editors.

1 INTRODUCTION

Science has an equity problem. Despite the mantra of scientific objectivity, there is accumulating evidence of systemic bias based on the gender, race, first language or nationality of scientists. Such bias affects not only access to and representation within science (Hughes et al., 2023; Kozlowski et al., 2022) but also the publishing process (Bancroft et al., 2022; Lee et al., 2013). However, the data compiled to demonstrate such bias can also help us evaluate potential solutions. Such is the case in the recent study by Fox et al. (2023), which used a large-scale randomised trial both to test for bias in the manuscript review process at Functional Ecology (FE) and to inform proposed solutions of mandatory versus optional double-blind review. The study compared the fates of 674 submitted manuscripts randomly assigned to a ‘single-blind’ treatment (author identities revealed to peer reviewers, but not vice versa) with 708 manuscripts assigned to a ‘double-blind’ treatment (author identities hidden from peer reviewers, and vice versa), all of which were sent out for review. It found substantial bias by reviewers when author identities were known to them: reviewers favoured first authors based in wealthier and English-speaking countries, but showed no bias with respect to author gender. Based on this compelling evidence of bias by peer reviewers, FE recently decided that author identities would henceforth be anonymised to reviewers for all submissions. We argue here that there is an unrealised opportunity in this dataset (deposited in Fox, 2022) to extend the analysis of bias from reviewers to editors. When we did so, we found that editor bias can affect paper outcomes as much as reviewer bias, arguing for journal policies that improve equity in editorial decisions, such as ‘triple-blind’ review, in which neither editors nor reviewers know author identities and vice versa (Bancroft et al., 2022).

1.1 The original analysis cannot be used to assess editor bias

We begin by summarising the analytical approach of Fox et al. (2023). In their study, papers were assigned demographic variables based on first-author characteristics: (1) the assumed binary gender (man, woman); (2) the social and economic wellbeing of the country of affiliation (HDI: Human Development Index, treated as either a categorical or a continuous variable); and (3) the use of English in the country of affiliation. In the FE review process, reviewers score papers on a four-point scale (Fox, 2022: mean score = 2.27, SD = 0.69) and provide these scores, along with a written assessment, to editors, who then decide whether to invite the paper for revision or resubmission, or to reject it, based on the reviews and their own reading of the paper. In the Fox (2022) dataset, 50% of the papers sent for review were subsequently invited for revision or resubmission (hereafter simply ‘resubmission’, following Fox et al.), with invited papers having a mean reviewer score of 2.70 (SD = 0.53) compared to 1.83 (SD = 0.56) for papers rejected at this stage. To test for reviewer bias, Fox et al. analysed whether mean reviewer scores differed more between demographic categories when reviewers knew author identities than when identities were masked (hereafter the ‘review treatment’). Such an interaction was found when papers were categorised by author country but not by author gender: authors based in countries with either very high HDI or English dominance received higher scores when their identity was known to reviewers (single-blind treatment) than when papers were anonymised (double-blind treatment). Since editors base their decisions to invite papers for resubmission at least in part on reviewer scores, similar results were found when the probability of being invited for resubmission was used as the response variable.
We represent this causality in Figure 1a as a path from the demographic category by review treatment interaction to reviewer scores, and then a path from reviewer scores to the probability of resubmission.

FIGURE 1. Causal diagrams underlying different analytical approaches for evaluating editor bias using the randomised double-blind review trial data presented in Fox et al. (2023). Measured variables are indicated with rectangles, latent variables with ovals, and the direction of causality with arrows. Editors invite papers to be resubmitted based both on reviewer scores (blue path) and on the editor's own appraisal of paper quality (orange path). (a) A model reported in Fox et al. (2023), with a direct path from demography × review treatment to probability of resubmission, independent of reviewer score, may erroneously be interpreted as a test of editor bias. However, such an interpretation overlooks that the review treatment is meaningless for editors, who are never blinded to author identities, and it does not account for editors' independent appraisal of paper quality. (b) Our proposed approach, based only on double-blinded reviews, allows reviewer scores to directly affect editor decisions while simultaneously acting as an unbiased estimator of the latent variable of paper quality and its effect on editor decisions. This concordance enables us to interpret any effect of author demographics on the relationship between resubmission probability and reviewer scores as evidence of editor bias. (c) By contrast, in a model where only biased scores (single-blind reviews) are included, it is impossible to test for editor bias without independent information on paper quality and how it contributes to the final decision of editors. Causal diagrams can be analysed, in part, with the generalised linear model (GLM) formulae given.

It might be imagined that, in such a causal model, bias by editors would simply be represented by any residual direct effect of the demographic category × review treatment interaction on resubmission probability (i.e. after accounting for reviewer score). The logic here would be that the effects of reviewer bias on the fates of papers are already completely accounted for by reviewer scores in the model, so a significant residual effect of the demographic category × review treatment interaction must reflect a different source of bias, and the only bias remaining is that of editors. Indeed, Fox et al. (2023) fit such models (Figure 1a) and found a significant residual effect of the interaction between review treatment and gender, but not of the interactions with country HDI level or English dominance. In the absence of any interpretation of these results by Fox et al. (2023), we initially assumed, perhaps like other readers, that their intention was to test for editor bias, although subsequent correspondence with the authors revealed that they considered the model inadequate for such inference. Yet it is worth asking whether this model could be a robust test for editor bias, for if so, Fox et al. (2023) would already have provided important results on the overlooked issue of editor bias.

Unfortunately, the answer is no; this is not a robust test for editor bias. The problem with this type of inference is that bias by editors is actually not expected to result in a significant demographics × review treatment interaction: the single- versus double-blind treatment does not apply to editors as ‘author identities were not blinded to editors in either treatment’. It may seem that a simple solution to the last point would be to replace the demographics × review treatment interaction with just the demographics variable, but this does not account for a further problem in combining the single-blind and double-blind data in a test of editor bias. Editors are mandated to make their decisions on resubmission based both on reviewer scores and on their direct assessment of paper quality. In the double-blind treatment, we can assume that reviewer scores are an unbiased estimate of the latent variable of paper quality and so adequately represent the combination of both pathways to editor decisions (Figure 1b). In this treatment, we can unequivocally interpret any effect of author demographics on the relationship between resubmission probability and reviewer scores as evidence of editor bias. However, in the single-blind treatment, reviewer scores are a biased estimator of paper quality, so without independent information on paper quality or how it contributes to the final decision of editors, it is impossible to test for bias by editors (Figure 1c). Specifically, even an unbiased editor would still make a slightly biased decision due to the contribution of the biased reviewer scores to their decision, but establishing this exact null expectation is not possible without more information. We thus conclude that editor bias can only rigorously be assessed using the data from Fox et al. (2023) in the double-blind treatment.

1.2 A robust analysis of editor bias

With this logic in mind, we expanded the analyses of the dataset of Fox et al. (2023) to test for editor bias. Using only the double-blind papers, we tested whether author demographics affect the relationship between reviewer scores and resubmission decisions by editors. To formulate this model, we drew on the rationale provided by intersectionality theory, which posits that those holding multiple marginalised identities face unique forms of discrimination, not just those particular to each facet of their identity. To allow for non-additive intersectionality (sensu Bright et al., 2016), we included both gender and country HDI status (very high or not, as in Fox et al., 2023) and their interaction in our model, unlike the analysis of Fox et al., which examined these separately. While a combined analysis of gender and country HDI status would not have changed Fox et al.'s conclusions related to reviewer bias, it does affect our results concerning editor bias.

When using only double-blind papers, we find that the effect of reviewer score on resubmission probability changes significantly with author gender and country HDI status (Table 1; Figure 2a). Specifically, papers that received an average (mean score = 2.2 out of 4) or higher score tended to be invited for resubmission more often if the first author was a man based in a very high HDI country. By contrast, first authors who were women or who were not from very high HDI countries fared poorly. This effect varies in magnitude with reviewer score, but can increase the likelihood of a positive editor decision by up to 15% for men in very high HDI countries, a magnitude similar to the effects of reviewer bias (c. 11%) previously detected by Fox et al. (2023), albeit with more variation around the means (note the slight overlap in confidence intervals in Figure 2c). Paradoxically, the quartile of papers that received a low score (<2) were slightly less likely to be invited for revision/resubmission if the first author was a man based in a very high HDI country, although the differences here are more muted (Figure 2b). We also considered country language (English vs. non-English dominant, as in Fox et al.) in combination with gender in a second model, and obtained a result very similar to that of the HDI × gender model: papers with high reviewer scores were less likely to be invited for resubmission when authors were women (gender × score: χ²₁ = 4.05, p = 0.04) or from countries where English was not dominant (language × score: χ²₁ = 4.27, p = 0.04). However, at least some of the effects of country language are due to the correlation between country HDI status and English dominance (double-blind dataset: r = 0.41, p < 10⁻¹⁵), and the HDI model is a better fit to the data (AIC = 634.9 for the HDI model vs. 637.7 for the language model).

TABLE 1. The probability of a paper being invited for resubmission is influenced by interactions between the mean review score for the paper and the gender or country Human Development Index (‘HDI’) of the first author.
Term                          χ²      df   p
Gender                        3.17    1    0.07
HDI                           6.50    1    0.01
Review score                  175.4   1    <10⁻¹⁵
Gender × HDI                  1.76    1    0.18
Gender × review score         4.47    1    0.03
HDI × review score            7.28    1    0.007
Gender × HDI × review score   2.33    1    0.13
  • Note: As this analysis considers only double-blind reviews, such interactions are evidence of editor bias, as shown in Figure 1b. The strength of these effects was assessed with logistic regression (a generalised linear model with binomial errors and a logit link) followed by a Type 3 ANOVA. Significant p-values (alpha = 0.05) are indicated in boldface. Model formula: logit(resubmission probability) ~ gender × HDI category × reviewer score; χ² = chi-square test statistic, df = degrees of freedom, p = p-value.
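The published analysis was carried out in R (script archived on Zenodo); for illustration only, the Table 1 model can be re-expressed in Python as below. The data here are synthetic stand-ins for the Dryad dataset, and we use a likelihood-ratio test of one interaction term as a substitute for the Type 3 tests in the paper (statsmodels has no built-in Type 3 ANOVA for logistic models):

```python
# Illustrative sketch of the Table 1 logistic regression on synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(1)
n = 600
df = pd.DataFrame({
    "gender": rng.choice(["man", "woman"], n),
    "hdi": rng.choice(["very_high", "other"], n),
    "score": np.clip(rng.normal(2.27, 0.69, n), 1, 4),  # 4-point reviewer scale
})
# Synthetic outcome: invitation driven mainly by reviewer score.
logit_p = -5 + 2.2 * df["score"]
df["invited"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Full model with the three-way interaction, mirroring the Table 1 formula:
# logit(resubmission probability) ~ gender x HDI category x reviewer score.
full = smf.logit("invited ~ gender * hdi * score", data=df).fit(disp=0)

# Likelihood-ratio test for the gender x score interaction, obtained by
# refitting without that single term.
reduced = smf.logit("invited ~ gender * hdi * score - gender:score",
                    data=df).fit(disp=0)
lr = 2 * (full.llf - reduced.llf)
p = stats.chi2.sf(lr, df=1)
print(f"gender x score: chi2 = {lr:.2f}, p = {p:.3f}")
```

On the real dataset this term is the gender × review score row of Table 1 (χ² = 4.47, p = 0.03); with the synthetic data above, which contain no built-in bias, the test should generally be non-significant.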
Our analysis of editor bias, specifically how author demographics modify the relationship between review score and resubmission probability of papers. We limit the analysis to papers scored under double-blind review and consider the interactive effects of gender and country Human Development Index (HDI). The main plot (a) predicts this relationship for the full range of review scores, whereas the subplots illustrate differences between demographic categories at the two values of review score where these differences are maximised, (b) and (c). Errors in all cases are 95% CI. The model is summarised in Table 1.

1.3 Potential reasons and solutions for editor bias

There are several possible reasons for the editor bias we detected against first authors from lower HDI countries. In their analysis of reviewer bias, Fox et al. suggest that an upward bias in scores for papers from high HDI countries may be due to ‘prestige bias’: the conscious or unconscious expectation of high quality work from such countries (Lee et al., 2013). Editors are likely just as vulnerable to these expectations as reviewers, as they are exposed to and conditioned by the same social, cultural, and institutional biases (Tóth, 2020). Additionally, because the identity of editors (unlike the identity of reviewers) is known to authors, editors may hesitate to reject papers by researchers in very high HDI countries if they fear negative consequences to their own career. Since most editors are from such countries, they are more likely to interact professionally with authors from similar countries (e.g. at conferences). Male authors from very high HDI countries may also be perceived as either more likely to retaliate or more likely to be in positions where they can retaliate (e.g. on editorial boards of other journals: Dada et al., 2022).

Although Fox et al. (2023) found no evidence for gender bias by reviewers, we did find evidence for such bias by editors. However, this editor bias was only revealed when we controlled for differences in the effect of country HDI between men and women. Interestingly, this advantage for men and for authors from very high HDI countries was revealed only for papers with average or above average reviewer scores, perhaps because the rejection of a paper despite high reviewer scores requires editors to be particularly forceful in their criticism of a paper; in this situation, editors may either be more likely to give prestige authors ‘the benefit of the doubt’ to avoid a contested decision, or may feel more vulnerable to professional retaliation by well-placed authors. Rejection of a paper with low reviewer scores, by contrast, simply requires editors to stand behind the reviewers' evaluation. There was a slight paradoxical tendency for such papers to be more often rejected when authored by men from very high HDI countries. We do not have an explanation for this pattern, and suspect it is an artefact originating from the interaction found at high reviewer scores.

In summary, our analysis shows some evidence of editorial bias at FE, at least at the resubmission stage. This bias by editors—at least for high quality papers—is of similar magnitude to the bias by reviewers reported by Fox et al. (2023) (with the caveat that specific predictions overlap slightly in 95% confidence intervals). It should be noted that editors play a large role, perhaps even larger than reviewers, in the fate of submitted papers: they decide whether a paper should be sent out to review (FE has a 60% rejection rate at this step, particularly reducing the representation of authors from lower HDI, non-English countries according to Fox et al., 2023), which reviewers to contact, whether to invite a revision or resubmission, whether further rounds of reviews and revisions are needed, and finally the ultimate fate of revised papers (acceptance or rejection). Here, we could only evaluate editor bias at one stage of the review process—because we do not have unbiased metrics of paper quality at the other stages—but we have no reason to think that bias does not exist at other stages or does not compound. In fact, a previous study of the FE editorial process (Fox et al., 2016) shows that the gender, age and country affiliation of editors affect their selection of reviewers.

So, what can be done? We suggest that the best solution is to make the entire review process blind to reviewers, editors, and authors: a triple-blind process (e.g. Cássia-Silva et al., 2023; Conklin & Singh, 2022). Although triple-blind review is used by a few academic journals, it has not yet been implemented by any ecology and evolution journals (Smith et al., 2023). Instead, editors currently use author identities to ensure that they invite arm's-length reviewers and that the submitted publication does not overlap with previous publications by the same author. The challenge in implementing triple-blind review is to find new ways to carry out these functions when author identities are unknown (see also Brodie et al., 2021). Most journals already use sophisticated manuscript submission portals which, with recent advances in machine learning and artificial intelligence, could be further programmed to predict potential conflicts of interest (and suggest only reviewers with none), as well as flag substantial overlap between submitted manuscripts and previous publications. Alternatively, a reorganisation of editorial roles could ensure that author identity is provided only to editors responsible for choosing reviewers and verifying manuscript uniqueness, whereas decisions to send out for review, invite resubmission, or accept/reject are made by editors without access to author identity (Brodie et al., 2021; Richardson, 2017). Lastly, we note that the onus is already on reviewers and editors to notify journals in the case of a conflict of interest, and this policy could be extended to cover cases where the reviewer or editor believes that they have guessed the identity of an anonymised author and have a conflict of interest (as implemented by the journal Ethics: Richardson, 2017). Fox et al. (2023) report that more than half of reviewers correctly guessed the identities of authors in their double-blind study.

We also urge journals to conduct randomised trials of triple-blind review; to the best of our knowledge, these are yet to be conducted. Ultimately, only by quantifying where bias occurs in the publishing pipeline can we evaluate the effectiveness of potential solutions. Best practices in such quantification include self-identification of demographic attributes (e.g. gender, race, first language, country of origin, socioeconomic status), and taking into consideration intersectional identities (Hughes et al., 2023). We realise that costs may make implementation of a triple-blind process challenging for smaller or non-profit journals. As an interim measure in such journals, increasing the diversity and training of editorial boards (Dada et al., 2022; Liévano-Latorre et al., 2020) may help reduce editor bias—although data are lacking on the effectiveness of these strategies too.

In sum, we commend Fox et al. (2023) and FE for carrying out an important experimental test of reviewer bias, but we suggest that editor bias is just as pervasive a problem and deserves the attention of journals like FE. Policies to overcome editor bias could substantially improve equity in the scientific publication process.

AUTHOR CONTRIBUTIONS

This is a reply to Fox et al. (2023), which was analysed and discussed at a journal club meeting among all authors. Diane S. Srivastava conceived the idea for the reply and drafted the initial manuscript and analysis. Ana Teresa Marques and Diane S. Srivastava made the figures, with input from Luís Borda-de-Água. Joana Bernardino integrated relevant literature, with help from Ana Filipa Filipe and António Proença-Ferreira. All authors contributed critically to the manuscript, figures and analysis and gave final approval for publication. Statement on inclusion: This manuscript aims to improve our understanding of editor bias regarding gender, country income and language, and brings together authors from Portugal and Canada with even gender representation.

ACKNOWLEDGEMENTS

We thank members of the Journal Club at Instituto Superior de Agronomia (ISA), organised by LBA, for their input in discussions, plus useful suggestions from Charles Fox and one anonymous reviewer. DSS thanks ISA sabbatical hosts Pedro Segurado and Teresa Ferreira and acknowledges funding from NSERC (Natural Sciences and Engineering Research Council of Canada). JB and ATM are both co-funded by the project NORTE-01-0246-FEDER-000063, supported by Norte Portugal Regional Operational Programme (NORTE2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). Funding from the FCT (Foundation for Science and Technology, Portugal) supported A P-F (grant SFRH/BD/109242/2015), AFF (Stimulus of Scientific Employment contract 2020.03872.CEECIND) and LBA (Norma Transitória—L57/2016/CP1440/CT0022).

CONFLICT OF INTEREST STATEMENT

The authors have no conflict of interest.

DATA AVAILABILITY STATEMENT

This paper presents a re-analysis of data previously deposited in the Dryad Digital Repository: https://doi.org/10.5061/dryad.m63xsj466 (Fox, 2022). The R script used in our analysis is available at https://doi.org/10.5281/zenodo.8075130.