StoX: An open source software for marine survey analyses

Scientists across the globe conduct survey programs to monitor and characterize abundance, population structure, biodiversity and geographical distributions. To assess the state of marine fish and zooplankton, population surveys are often repeated annually using standardized sampling protocols and analysis techniques to establish trustworthy stock status. However, although transparency and repeatability are recognised as important principles of this process, it is often difficult to obtain comprehensive documentation of metadata and data processing steps. This is particularly challenging for workflows that include manual processing steps. StoX was principally built to process research-vessel survey data, and we have included several standard survey estimation models. The software was developed to be robust and versatile and is aimed at the open source community, such that users can easily build their own models. StoX is fully integrated with R to utilize the large number of R-packages and to enable any StoX function and stock estimation model to be controlled using R. There has been a great need for freely available software for research-vessel survey estimation; StoX has been tested in surveys carried out on four continents and is the official tool for many important fish stock surveys. The basic workflow and transparency principles of StoX, together with a customizable GUI, make StoX applicable to any geographically coded survey. Future versions of StoX will include statistical models to estimate the catch composition in commercial fisheries. In fields such as conservation management, there is also a need to document the estimation methods, and additional estimation and analysis models, including biodiversity indices, are currently being implemented. In parallel, we envision a closer web service integration with existing international and national data centres.


| INTRODUCTION
To monitor change in abundance, population structure, geographical distribution (Gunderson, 1993) and biodiversity (Johannesen, Høines, Dolgov, & Fossheim, 2012) of fish and zooplankton populations (Dalpadado, Ingvaldsen & Hassel, 2003), scientific acoustic and swept area trawl surveys are conducted worldwide. These surveys are often repeated annually following standard protocols to produce time series, which are used to assess the state of fish stocks (Gunderson, 1993) and the health of ecosystems (Kirkman et al., 2016). However, data analyses are carried out using a range of software applications (Excel, R, SAS, etc.) and typically involve several manual processing steps. Consequently, it is often challenging to obtain a detailed record of the complete stock estimation procedure. In addition, the lack of streamlined data structures, of documentation of parameter settings and of information regarding subjective user decisions makes it nearly impossible to recalculate long time series with updated parameter settings, to test the effect of new methods or to carry out sensitivity tests. This bottleneck in development, together with the lack of transparency in the data and methods used to produce time series, slows down the implementation of new methods, especially in multinational surveys.
A frequently used tool for other types of survey, such as underwater visual census (Bozec, Kulbicki, Laloë, Mou-Tham, & Gascuel, 2011) and terrestrial distance sampling, is the software Distance (Thomas et al., 2010). For scientific acoustic and swept area trawl surveys, obtaining an overview of the survey estimation methods used by different research institutes is challenging. There are only a few standardized software packages that are easily accessible. One is EchoR (Doray, 2013), an R-package produced by IFREMER (France) for estimation of acoustic trawl surveys, which was developed specifically for IFREMER surveys. Consequently, it requires certain input data structures and is restricted to particular survey designs, which do not necessarily meet the requirements of surveys carried out elsewhere. The software Beam (Totland & Godø, 2001), which was used for acoustic survey estimates in the Norwegian Sea and in the Barents Sea, is dependent on outdated SAS software with an expensive GIS module. Furthermore, Beam does not support transect-based acoustic estimation or swept area calculations and contains no built-in variance estimation methods. In-house programs are available at different laboratories, but they typically lack standardization, documentation and efficient deployment outside their respective institutions.
Several research institutes working with marine surveys in the North Sea, Norwegian Sea and Barents Sea, and surveys within the FAO-Nansen Programme (http://www.fao.org/in-action/eaf-nansen/en) have requested a standard software for survey estimation. Finally, we discuss present and future development of StoX and give an overview of upcoming modules.

| THE MODULAR SOFTWARE DESIGN OF StoX
StoX is designed as a tool for transparent and reproducible survey estimation across nations and survey objectives. All methods, user settings and links to input data are documented in a description file named "project.xml". To enable method flexibility, users can create calculation models (hereafter referred to as StoX models) consisting of sequentially structured StoX processes (hereafter referred to as processes), which are documented in the project.xml file. A process is a user-defined call to one function available in a library of functions defined by StoX. Any StoX model can be modified by changing parameters of the processes, adding or removing processes and by rearranging their order of execution. Different StoX functions require one or more input datasets, which can be data files or output from previously executed processes (Figure 1).
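The relationship between models, processes and functions described above can be illustrated with a minimal Python sketch. It is not the StoX implementation: the function library (`ReadNumbers`, `Scale`, `Total`), the `Process` class and `run_model` are hypothetical names introduced only for illustration. Each process names one library function, carries its own parameters, and may take the output of an earlier process as input.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical function library; these are NOT functions shipped with StoX.
def read_numbers(file_name: str) -> List[float]:
    # Stand-in for a function that would read an input data file.
    return [1.0, 2.0, 3.0]


def scale(data: List[float], factor: float) -> List[float]:
    return [x * factor for x in data]


def total(data: List[float]) -> float:
    return sum(data)


LIBRARY: Dict[str, Callable[..., Any]] = {
    "ReadNumbers": read_numbers,
    "Scale": scale,
    "Total": total,
}


@dataclass
class Process:
    name: str        # user-chosen process name
    function: str    # name of one function in the library
    parameters: Dict[str, Any] = field(default_factory=dict)
    # Maps an argument name to the name of a previously executed process.
    inputs: Dict[str, str] = field(default_factory=dict)


def run_model(processes: List[Process]) -> Dict[str, Any]:
    """Execute the processes in order and keep every intermediate output."""
    outputs: Dict[str, Any] = {}
    for p in processes:
        # A KeyError here means a process refers to one that has not run yet.
        args = {arg: outputs[source] for arg, source in p.inputs.items()}
        outputs[p.name] = LIBRARY[p.function](**args, **p.parameters)
    return outputs


model = [
    Process("Read", "ReadNumbers", parameters={"file_name": "example.xml"}),
    Process("Scaled", "Scale", parameters={"factor": 2.0}, inputs={"data": "Read"}),
    Process("Sum", "Total", inputs={"data": "Scaled"}),
]
results = run_model(model)
```

In this sketch, changing `factor`, removing the `Scaled` process or reordering the list changes the model without touching the function library, which mirrors the flexibility described in the text.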

| INPUT AND OUTPUT DATA
A StoX project is a collection of files organized in a folder structure with three subfolders (Figure S1, Appendix S1): one for input data files, one for output files and one holding the project.xml file. In addition, StoX projects may require resource files such as strata polygons, which are shipped with StoX and can be complemented by the user. StoX projects and resource files are stored by default in the folder "workspace" > "stox", in subfolders named "project" and "reference", respectively.
FIGURE 1 Schematic presentation of a StoX model with processes that use functions and their associated parameters. The blue arrows indicate the output and input data for each process.
Currently, the input data are XML files of three different data types: biotic, acoustic and landing files (Figure S1, Appendix S1).
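As an illustration of the folder layout described above, the following Python sketch creates an empty project skeleton. The subfolder names `input`, `output` and `process`, the lower-case data-type folder names and the project name `example_survey` are assumptions made for this example; the actual names are those shown in Figure S1, and the empty project.xml is only a placeholder, not a valid description file.

```python
from pathlib import Path
import tempfile


def create_project_skeleton(root: Path, project_name: str) -> Path:
    """Create an empty folder skeleton resembling the layout described in the text."""
    stox_home = root / "workspace" / "stox"
    project = stox_home / "project" / project_name

    # One input subfolder per data type (folder names are assumed).
    for data_type in ("acoustic", "biotic", "landing"):
        (project / "input" / data_type).mkdir(parents=True, exist_ok=True)

    (project / "output").mkdir(parents=True, exist_ok=True)

    # Folder holding the project description file (folder name is assumed).
    (project / "process").mkdir(parents=True, exist_ok=True)
    (project / "process" / "project.xml").touch()  # empty placeholder only

    # Resource files such as strata polygons live outside the project.
    (stox_home / "reference").mkdir(parents=True, exist_ok=True)
    return project


demo_root = Path(tempfile.mkdtemp())
demo_project = create_project_skeleton(demo_root, "example_survey")
```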

Acoustic and biotic input XML files (see Appendix S2, Supporting Information for data formats) are available from the International Council for the Exploration of the Sea (ICES) Data Portal, Acoustic Trawl Surveys Database (http://acoustic.ices.dk/submissions). The acoustic XML files are also provided as output from the LSSS software (Korneliussen et al., 2016), and Echoview (https://www.echoview.com) has compatible output with the ICES Acoustic Trawl Survey Database. The "Landing" input folder (Figure S1, Appendix S1) is designated for commercial fisheries landings data, which will be used in future versions to estimate catch at age from fish populations (Hirst, Storvik, Aldrin, Aanes, & Huseby, 2005). Future versions of StoX may also read oceanographic variables, for example, conductivity, temperature and fields from ecosystem models.

| THE StoX USER INTERFACE
The StoX user interface is a platform-independent tool currently built on Java. For details on installation and dependencies, see Appendix S1, Supporting Information. The StoX user interface consists of several menus, windows and tabs organized to support the basic structure of the software (Figure 2). In addition to supporting the construction of StoX models and the setting of parameters, StoX provides a specialized user interface for manual steps such as tagging of acoustic positions.
The main tasks of each user interface component are described in detail in Appendix S3, Supporting Information.

| THE StoX-R INTERFACE
The StoX functionality can be accessed from R through the Rstox package, and all StoX projects can be executed using R code. New StoX projects can also be created using Rstox; however, some settings may depend on interactive tagging using the map in the StoX user interface (Figure 2). The R interface gives users easy access to input and output data, and to all intermediate steps and parameters of a StoX project. A detailed description of the Rstox package is outside the scope of this paper.

| CURRENT USE OF StoX
Until now, StoX has mainly been used to estimate stock abundance of adult fish, and several official international and national survey estimates of commercially and ecologically important fish stocks are produced using StoX. These include North Atlantic stocks such as herring, sprat, blue whiting, cod, haddock, lesser sandeel, boarfish, horse ... StoX is also used to analyse abundance of fish larvae and demersal fish assemblages, and is currently being tested for surveys in Argentina, Sri Lanka, South Africa, Angola and the Baltics.

| StoX MODEL EXAMPLES
The workflow and StoX processes for the swept area abundance estimate of Barents Sea cod in the 1999 winter survey, and the acoustic trawl abundance estimate of herring in the 2016 International ecosystem survey in the Nordic Seas are presented in Table 1.
From the output of SuperIndAbundance, the abundance estimate can be split by one (or a combination) of the population parameters, such as length, weight, sex and age. In Figure 3, the abundance estimates of the surveys are presented by age and length groups.
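The splitting of an abundance estimate by population parameters can be sketched as a simple grouped sum. The records below are invented for illustration and do not reproduce the actual SuperIndAbundance output format; the field names, the 5 cm length interval and the helper functions are assumptions.

```python
from collections import defaultdict

# Hypothetical records: each individual carries a share of the total abundance.
super_individuals = [
    {"age": 3, "length_cm": 31.5, "sex": "F", "abundance": 1200.0},
    {"age": 3, "length_cm": 33.0, "sex": "M", "abundance": 800.0},
    {"age": 4, "length_cm": 36.5, "sex": "F", "abundance": 500.0},
    {"age": 4, "length_cm": 31.0, "sex": "M", "abundance": 250.0},
]


def length_group(length_cm, width_cm=5.0):
    """Return the lower bound of the length interval containing length_cm."""
    return int(length_cm // width_cm) * width_cm


def split_abundance(records, key_func):
    """Sum the abundance of all records that share the same grouping key."""
    totals = defaultdict(float)
    for record in records:
        totals[key_func(record)] += record["abundance"]
    return dict(totals)


# Split by a single parameter ...
by_age = split_abundance(super_individuals, lambda r: r["age"])

# ... or by a combination of parameters.
by_age_and_length = split_abundance(
    super_individuals, lambda r: (r["age"], length_group(r["length_cm"]))
)
```

Because every record carries its own abundance, any grouping sums to the same total, so the choice of grouping variables does not alter the overall estimate.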

| DISCUSSION
The rapid adoption of StoX as the official software for the assessment of many fish stocks in the North Atlantic (Table S2, Appendix S4) is indicative of the pressing need for a robust and freely available software that documents the process end-to-end. In parallel with the development of StoX, ICES has established databases to store acoustic and biotic trawl data to support data input files for StoX.
This provides complete transparency from data input to the survey estimates used in stock assessment models, and StoX will be integrated into the new Transparent Assessment Framework that "is a framework to organize data, methods, and results used in ICES assessments, so that they are easy to reference and re-run" (http://ices.dk/marine-data/assessment-tools/Pages/transparent-assessment-framework.aspx). A similar structure is in place at IMR for national ...

Identify biotic stations used in the abundance estimation and assign individuals from these stations to a proportion of the calculated abundance by length group. Each of these individuals represents an estimated abundance and is referred to as a super individual.
Processes: IndividualDataStations, IndividualData, SuperIndAbundance
Note: Processes only used for A or B, respectively, are marked with (*) and (+). Details about the StoX models and survey maps are presented in Appendix S5, Supporting Information.

In future versions, more template models will be available. StoX can also perform calculations over different vertical and horizontal resolutions, and is, in principle, only restricted by the resolution of the input data and computer memory. Therefore, it is possible to estimate fish density by depth channel, considering, for example, a depth-dependent target strength (Ona, 2003). In addition, new methods in biological sampling using optical systems (e.g. Rosen, Jörgensen, Hammersland-White, & Holst, 2013) ... We also envision a closer interaction between other user groups and developers. We hope that the flexibility of the program will encourage others to develop their own modules and interact with existing users and developers. This will ultimately further enhance the features and functions of both the StoX and Rstox software packages.
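The estimation of fish density by depth channel with a depth-dependent target strength, mentioned above, can be sketched as follows. The sketch assumes a target strength relation of the form TS = m log10(L) + b - c log10(1 + z/10), with fish length L in cm and depth z in m. The coefficient values are placeholders chosen for illustration, not values taken from Ona (2003) or from StoX. The conversion from nautical area scattering coefficient (NASC, m² nmi⁻²) to areal density uses the standard relation density = NASC / (4π σ_bs), with σ_bs = 10^(TS/10).

```python
import math


def target_strength_db(length_cm, depth_m, m=20.0, b=-65.0, c=2.3):
    """Depth-dependent target strength in dB (placeholder coefficients)."""
    return m * math.log10(length_cm) + b - c * math.log10(1.0 + depth_m / 10.0)


def density_per_nmi2(nasc, length_cm, depth_m):
    """Convert NASC (m^2 nmi^-2) to number of fish per square nautical mile."""
    # Backscattering cross-section in m^2 from the target strength in dB.
    sigma_bs = 10.0 ** (target_strength_db(length_cm, depth_m) / 10.0)
    return nasc / (4.0 * math.pi * sigma_bs)


# Hypothetical depth channels: (mid-depth in m, NASC in the channel).
channels = [(25.0, 100.0), (75.0, 100.0), (225.0, 100.0)]
mean_length_cm = 30.0

densities = [
    density_per_nmi2(nasc, mean_length_cm, depth) for depth, nasc in channels
]
total_density = sum(densities)
```

With these placeholder coefficients, the same NASC yields a higher density in deeper channels, because the assumed target strength decreases with depth.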

ACKNOWLEDGEMENTS
We are grateful to Are Salthaug and Sigbjørn Mehl for software test-