Exploratory and confirmatory research in the open science era

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2020 The Authors. Journal of Applied Ecology published by John Wiley & Sons Ltd on behalf of British Ecological Society
1 Norwegian Institute for Nature Research, Trondheim, Norway
2 German Centre for Integrative Biodiversity Research (iDiv), Leipzig, Germany
3 Institute of Biodiversity, Friedrich Schiller University Jena, Jena, Germany
4 Department of Ecosystem Services, Helmholtz Center for Environmental Research UFZ, Leipzig, Germany


| RIGOROUS SCIENCE IN APPLIED ECOLOGY
As a response to the global biodiversity loss, conservation science and applied ecological research focus on describing patterns of biodiversity change, isolating the factors causing this change and ultimately suggesting management solutions (Kareiva & Marvier, 2012).
Because biodiversity loss and ecosystem transformations are causing major challenges to present and future human societies (IPBES, 2019), the rigour of the science that underpins policy and management decisions is decisive for the well-being of future generations of humans and the fate of our planet's biodiversity. Following some high-profile publications pointing towards a reproducibility crisis in fields such as psychology (Nosek & Open Science Collaboration, 2015) and social sciences (Camerer et al., 2018), there is currently much focus in scholarly publications on the repeatability and reproducibility of scientific results (see e.g. the news feature in Nature by Baker, 2016). Applied ecological research is not immune to these challenges, but so far the discussion has not been high on the agenda within this field. One key aspect of the discussion about scientific rigour (Nosek, Ebersole, DeHaven, & Mellor, 2018) is a re-evaluation of the distinction between research that mainly seeks to explore patterns in the data (hereafter exploratory research) and research that tests scientific hypotheses that are clearly stated before the study is conducted (hereafter confirmatory research).
In the philosophy of science, this distinction has been extensively discussed, and following the classical paper by Platt (1964) on strong inference the importance of confirmatory research has long been appreciated. Also within conservation science and applied ecology, several authors (including Betini, Avgar, & Fryxell, 2017; Caughley, 1994; Sells et al., 2018) have called for more formal use of confirmatory research and application of the strong inference paradigm (sensu Platt, 1964). However, a rapid screening of a sample from the applied ecological literature (Box 1) suggests that most researchers within the field do not follow the strong inference paradigm (Platt, 1964; Sells et al., 2018), nor do they rely on clearly stated a-priori hypotheses that are tested with empirical data.
Here, we discuss how both exploratory and confirmatory research are needed in applied ecological research, and how scientists, journal editors and funders should all assist in the task of extracting the maximum value from different scientific approaches without blurring the distinction between exploration and confirmation.

| A mature research community should value both exploration and confirmation
One consequence of the 'Open Science' movement is the focus on open sharing of research data (Wilkinson et al., 2016). Increasing accessibility to data allows researchers to apply an ever-widening range of models to data for exploratory science. This contrasts with the pleas for more widespread adoption of confirmatory research, where hypotheses are described a-priori and then carefully tested based on empirical data collected specifically for that purpose (Caughley, 1994; Houlahan, McKinney, Anderson, & McGill, 2017). We agree with the plea for more formal testing of scientific hypotheses in applied ecological research, but would also like to highlight the fundamental role that descriptive studies documenting the state of local or global biodiversity, or the natural history of species, have for conservation science (Beissinger & Peery, 2007; Pereira et al., 2013). Exploratory research can also generate new hypotheses that can be formally tested later.

BOX 1 Hypotheses and experiments in applied ecology
To gain a rapid insight into the current state of affairs in the scientific literature in applied ecology, we randomly sampled 159 papers published in eight journals covering conservation biology, applied ecology and wildlife management. We only included studies from terrestrial ecology that were data-driven (i.e. not reviews or pure simulation studies), that presented the results from at least one statistical test, that presented original data or data from literature surveys, and that focused on aspects of applied ecology relevant for biodiversity management and conservation.
Based on these studies we assessed how often (a) one or more clearly stated hypotheses were presented in the introduction, (b) multiple competing hypotheses were presented and (c) strict experimental designs were applied. In addition, we extracted the number of citations registered by Web of Science. A more comprehensive description of the inclusion criteria and data extraction procedures can be found in Appendix S1 in the Supporting Information.
Based on our sample of research papers, it seems that clearly stating a research hypothesis in the introduction is surprisingly rare in the literature (Figure 1a). Overall, only about 19% of the studies presented clear hypotheses, whereas about 26% presented what we term 'implied' or 'partly indicated' hypotheses, where the hypothesis could be inferred from the text but was not presented clearly. After removing articles mainly focusing on methods development, the corresponding proportions were 23% (explicit hypotheses) and 28% (implied hypotheses), respectively.
Presenting multiple competing hypotheses, as described in the original presentation of the strong inference paradigm (Platt, 1964), is even rarer, and was evident in only two of the studies we reviewed.
Another hallmark of science is the use of well-planned, randomized and replicated experimental manipulation to test for causal relationships (Caughley, 1994; Platt, 1964). Based on our review, however, the use of full experimental designs is rare, and only 12% of the studies we reviewed were based on randomized controlled experimental designs. In addition, 15% of the studies in our sample included Before-After-Control-Impact or quasi-experimental protocols. The majority of the randomized controlled experiments were performed on a local spatial scale (Figure 1b), although a few studies presented landscape scale experiments. In our sample, local scale studies in general received less attention in the literature compared to studies spanning larger spatial scales when measured in terms of citation rates (Figure 1b).

Moreover, a movement towards more planetary scale assessments, such as those carried out by the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES), makes it unfeasible for policy to rely mainly on insights gained from experimental research (Mazor et al., 2018; Box 1). Our rapid screening of the literature indeed suggests that large-scale studies often have a large impact, at least if measured through citation rates (Box 1).
Nevertheless, to avoid an ever-growing list of untested hypotheses emerging from exploratory research, we must also re-evaluate the fundamental (but different) role that hypothesis-testing and prediction play in applied ecological research (Houlahan et al., 2017).
Only by testing a-priori articulated hypotheses can we robustly retain or reject the potential of a scientific hypothesis to describe natural phenomena. Unfortunately, researchers do not always follow this approach, and surveys have revealed a number of questionable research practices (Fraser, Parker, Nakagawa, Barnett, & Fidler, 2018; Ioannidis, Munafò, Fusar-Poli, Nosek, & David, 2014). Such practices include 'Harking' (Hypothesizing After the Results are Known), where ad-hoc postdictions are presented as if they were already planned before the study was conducted, and 'p-hacking', where researchers carelessly search for significant associations in the data (and often present them as if they were from a-priori hypotheses). Recent surveys suggest that such practices might also be common among ecologists and evolutionary biologists (Fraser et al., 2018). Without more frequent use of true hypothesis-testing, we risk that confirmation bias will result in overly self-confident 'storytelling' (Sells et al., 2018).
Basing management actions on such research may lead to costly mismanagement.
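The cost of searching for significant associations without a-priori hypotheses can be illustrated with a small simulation. The sketch below is ours and is not taken from any of the cited surveys: it assumes pure-noise data, a simple two-sided z-test with known unit variance, and hypothetical function names and parameter values (20 candidate variables, 30 observations each, 2,000 simulated studies). It estimates how often a study with no true effects would still report at least one result with p < 0.05.

```python
import random
from statistics import NormalDist, fmean


def p_value_mean_zero(sample):
    """Two-sided z-test p-value for H0: mean = 0, assuming known unit variance."""
    z = fmean(sample) * len(sample) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def familywise_false_positive_rate(n_tests, n_obs=30, n_studies=2000,
                                   alpha=0.05, seed=1):
    """Share of simulated null studies with at least one 'significant' test.

    Every variable is pure noise, so any p < alpha is a false positive.
    """
    rng = random.Random(seed)
    studies_with_hit = 0
    for _ in range(n_studies):
        p_values = [
            p_value_mean_zero([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
            for _ in range(n_tests)
        ]
        if min(p_values) < alpha:
            studies_with_hit += 1
    return studies_with_hit / n_studies


# One pre-specified test versus an unplanned search across 20 variables.
single_test_rate = familywise_false_positive_rate(n_tests=1)
search_rate = familywise_false_positive_rate(n_tests=20)
print(f"one planned test:     {single_test_rate:.2f}")
print(f"best of 20 unplanned: {search_rate:.2f}")
```

Under these assumptions the tests are independent, so the expected rate for the unplanned search is 1 - 0.95^20, or roughly 0.64, compared with 0.05 for a single pre-specified test. Real ecological variables are usually correlated, which changes the exact figure but not the direction of the problem.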

| Novel ways to test ecological theories
Our brief survey of the literature (Box 1; see also Betini et al., 2017; Sells et al., 2018) suggests that most research does not conform to strict hypothesis-testing. However, in the open science era, there are ample possibilities to increase the use and impact of confirmatory research, by more widely embracing new tools, methods and increased data availability.
Strict experiments in applied ecology (Box 1) are generally conducted at small spatial scales (although there are some notable exceptions, e.g. Krebs, Boutin, & Boonstra, 1995; Wiik et al., 2019). This contrasts with the fact that many ecological and policy processes operate at far larger scales (Estes et al., 2018).
Better utilization of large-scale unreplicated natural experiments could improve understanding of causal relationships in ecological systems (Barley & Meeuwig, 2017), especially the impacts of rare and extreme events (e.g. Gaillard et al., 2003). Such natural experiments provide researchers with the opportunity for a real-world test of a hypothesis, and can be seen as 'conceptual' replications where different systems and approaches are used to test the same theory. A complementary approach is to integrate findings from small-scale manipulative experiments into analysis of large-scale observational data (Kotta et al., 2019). Such integration will necessitate closer collaboration between ecologists working at different spatial scales, and between experimentalists and modellers (Heuschele, Ekvall, Mariani, & Lindemann, 2017). The increased availability of hierarchical statistical models that integrate data from disparate sources has high potential to facilitate such an integration (Isaac et al., 2019). In the new era of open science, large amounts of data from both field surveys and experiments are now becoming available, widening the range of opportunities for data integration.
FIGURE 1 In (a) the proportion of articles that reported clear hypotheses, implied or partly indicated hypotheses that were tested and articles that did not present hypotheses. In (b) the proportion of articles that used experimental, quasi-experimental/Before-After-Control-Impact or no experimental designs are matched with the corresponding spatial scales of the studies.
Given our reliance on observational data, more insight into causal processes could be gained by more widely applying novel statistical methods that seek to strengthen causal inference from observational data (Law et al., 2017). Causal inference approaches force researchers to think more deeply about the direct and indirect relationships of variables in their study systems (Ferraro, Sanchirico, & Smith, 2019). These approaches include matching (to control for observable confounders), the use of panel data and synthetic controls (to control for unobservable confounders), and instrumental variables (to eliminate the influence of unobservable confounders) (reviewed by Law et al., 2017). Time-series observational data are particularly useful because they are unidirectional (cause must precede effect; Dornelas et al., 2013), and approaches such as convergent cross mapping are designed to test for causal effects (Sugihara et al., 2012).
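As an illustration of how matching controls for an observable confounder, the following minimal sketch uses simulated data only. Everything in it is an assumption made for the example rather than a method from the studies cited above: the hypothetical sites, the single confounder, the treatment probabilities, the true effect of 2.0 and the one-to-one nearest-neighbour matching with replacement. Sites with high confounder values are both more likely to be treated and have higher outcomes, so a naive comparison of group means overstates the effect.

```python
import random
from statistics import fmean


def simulate_sites(n_sites=2000, true_effect=2.0, seed=42):
    """Simulate (confounder, treated, outcome) for hypothetical study sites."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        confounder = rng.random()  # e.g. a habitat-quality score in [0, 1)
        # Treatment is more likely where the confounder is high.
        treated = rng.random() < 0.2 + 0.6 * confounder
        outcome = 10.0 * confounder + true_effect * treated + rng.gauss(0.0, 1.0)
        sites.append((confounder, treated, outcome))
    return sites


def naive_difference(sites):
    """Difference in mean outcome between treated and control sites."""
    treated = [y for _, t, y in sites if t]
    control = [y for _, t, y in sites if not t]
    return fmean(treated) - fmean(control)


def matched_difference(sites):
    """Mean outcome difference between each treated site and the control
    site with the closest confounder value (matching with replacement)."""
    controls = [(x, y) for x, t, y in sites if not t]
    differences = []
    for x, t, y in sites:
        if t:
            _, y_control = min(controls, key=lambda c: abs(c[0] - x))
            differences.append(y - y_control)
    return fmean(differences)


sites = simulate_sites()
naive = naive_difference(sites)
matched = matched_difference(sites)
print(f"naive estimate:   {naive:.2f}")
print(f"matched estimate: {matched:.2f} (simulated true effect: 2.00)")
```

In this simulation the naive estimate is inflated towards roughly twice the true effect, whereas the matched estimate should lie close to 2.0. Matching only helps for confounders that are actually measured; unobserved confounders call for the panel-data, synthetic-control or instrumental-variable approaches mentioned above.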
Insights into causality should not be seen as a 'one-off' test, and an accumulation of knowledge through replication is fundamental for a robust knowledge base. Triangulation, whereby several approaches are formally applied to the same problem, is therefore useful for assessing the reliability of causal claims (Munafò & Smith, 2018). In general, a wider adoption of systematic reviews and other structured evidence synthesis methods would allow more robust assessment of the evidence base (Pullin & Stewart, 2006). In the open science era, evidence synthesis can increasingly be based on open data rather than on published effect sizes (Culina, Crowther, Ramakers, Gienapp, & Visser, 2018).

| Journal editors and reviewers should assist in the change
Journal editors play an important role in promoting the scientific rigour of the studies that underpin real-life management decisions. This role could be further strengthened by creating new incentives for more honest and open reporting of the research process. We acknowledge that many of these processes are already starting to happen across the ecosystem of journals.
Pre-registration of research hypotheses has been advocated (Nosek et al., 2018), partly to distinguish between exploratory and confirmatory research. In the open science era, studies are increasingly based on pre-existing data, including data that have been previously analysed and with results published in scientific journals. This should not discourage a-priori hypothesis development and pre-registration (Nosek et al., 2018 Harking. Finally, we propose (as a counterpart to pre-registration of hypotheses) a model where hypotheses arising from exploratory research could also be registered so that they are readily available for testing in subsequent studies. Given the rise of global databases and repositories, such a model could make it feasible to track hypotheses to their source, and to attribute credit fairly to those who originally proposed the hypothesis. It would also provide a clearer link between exploratory (hypothesis generating) and confirmatory (hypothesis testing) research.

| OUTLOOK
We should value the complementary and important contributions of both exploratory and confirmatory studies, but be much clearer about the differences between them. In the open science era, where more and more research is based on pre-existing (and often open) data, and where large-scale studies are needed to address key conservation policy challenges, a simple plea to follow the strong inference paradigm (Platt, 1964) might not be sufficient.
However, current incentives that promote the presentation of studies that are, by design and conduct, exploratory as if they were confirmatory are a disservice to scientific progress and delay the solving of real-world problems. The open science era has already radically improved the reproducibility of research; however, we argue that a cultural shift, involving researchers, journals and funding bodies, is still needed towards full transparency and valuation of the plurality of research methods.

ACKNOWLEDGEMENTS
We are grateful to many people at our research department for fruitful discussions about this topic over the last years. We are also grateful to two referees who made constructive comments on a previous