Volume 13, Issue 8 p. 1765-1777
RESEARCH ARTICLE
Open Access

Streamlining analysis methods for large acoustic surveys using automatic detectors with operator validation

Thomas Webber
Sea Mammal Research Unit, Scottish Oceans Institute, University of St. Andrews, St. Andrews, UK

Douglas Gillespie (Corresponding Author)
Sea Mammal Research Unit, Scottish Oceans Institute, University of St. Andrews, St. Andrews, UK
Correspondence: Douglas Gillespie, Email: [email protected]

Timothy Lewis
Ale Oak Cottage, Ale Oak, UK

Jonathan Gordon
Sea Mammal Research Unit, Scottish Oceans Institute, University of St. Andrews, St. Andrews, UK

Tararak Ruchirabha
Greenpeace International, Amsterdam, The Netherlands

Kirsten F. Thompson
Biosciences, College of Life & Environmental Sciences, University of Exeter, Exeter, UK
Greenpeace Research Laboratories, University of Exeter, Exeter, UK

First published: 15 June 2022
Citations: 2

Handling Editor: Chloe Robinson

Abstract

  1. Passive acoustic surveys are becoming increasingly popular as a means of surveying for cetaceans and other marine species. These surveys yield large amounts of data, the analysis of which is time consuming and can account for a substantial proportion of the survey budget. Semi-automatic processes enable the bulk of processing to be conducted automatically while allowing analyst time to be reserved for validating and correcting detections and classifications.
  2. Existing modules within the Passive Acoustic Monitoring software PAMGuard were used to process a large (25.4 Terabyte) dataset collected during towed acoustic ship transits. The recently developed ‘Multi-Hypothesis Tracking Click Train Detector’ and the ‘Whistle and Moan Detector’ modules were used to identify occasions within the dataset at which vocalising toothed whales (odontocetes) were likely to be acoustically present. These putative detections were then reviewed by an analyst, with false positives being corrected. Target motion analysis provided a perpendicular distance to odontocete click events enabling the estimation of detection functions for both sperm whales and delphinids. Detected whistles were assigned to the lowest taxonomical level possible using the PAMGuard ‘Whistle Classifier’ module.
  3. After an initial tuning process, this semi-automatic method required 91 hr of an analyst's time to manually review both automatic click train and whistle detections from 1,696 hr of survey data. Use of the ‘Multi-Hypothesis Tracking Click Train Detector’ reduced the amount of data for the analyst to search by 74.5%, while the ‘Whistle and Moan Detector’ reduced data to search by 85.9%. In total, 443 odontocete groups were detected, of which 55 were from sperm whale groups, six were from beaked whales, two were from porpoise and the remaining 380 were identified to the level of delphinid group. An effective survey strip half width of 3,277 and 699 m was estimated for sperm whales and delphinids respectively.
  4. The semi-automatic workflow proved successful, reducing the amount of analyst time required to process the data, significantly reducing overall project costs. The workflow presented here makes use of existing modules within PAMGuard, a freely available and open-source software, readily accessible to acoustic analysts.

1 INTRODUCTION

Researchers are increasingly using bioacoustics to monitor remote marine ecosystems. Passive acoustic monitoring (PAM) has been used to study cetaceans for several decades using a range of approaches. PAM methods allow researchers to investigate a variety of ecological questions across a range of taxa including density estimation (Leaper et al., 2000, 2003; Marques et al., 2009), spatial and temporal distributions (e.g. Merkens et al., 2019; Todd et al., 2020), effects of anthropogenic activity and potential mitigation (e.g. Baumgartner et al., 2019; Macaulay et al., 2017; Malinka et al., 2018) as well as supporting visual surveys (Barlow & Taylor, 2005; Gridley et al., 2020). Mobile sampling, using towed arrays (e.g. Gordon et al., 2020; Rone et al., 2014; Thode, 2004) or gliders (e.g. Bittencourt et al., 2018; Cauchy et al., 2020) can provide designed coverage of larger survey areas. Charter and running costs for dedicated vessels can become a major budgetary component for towed array surveys. However, the use of automated data collection systems can allow towed PAM surveys to be carried out from platforms of opportunity with little human supervision of data collection. Platforms of opportunity are vessels at sea for other purposes which can allow ancillary data collection. Typically, vessel costs are already covered by the vessel's primary task, thus platform of opportunity surveys can be very cost effective. Using platforms of opportunity largely eliminates vessel costs but highlights the importance of cost-effective analysis of large passive acoustic datasets where the project budget is now dominated by analyst time rather than survey costs.

Long-term platform of opportunity projects and autonomous static recorders can collect datasets which extend over months or years. Data volumes are increased further by the high sample rates required to capture high frequency vocalisations such as the ~130 kHz echolocation clicks of harbour porpoise (Clausen et al., 2011; Villadsgaard et al., 2007) and Cephalorhynchus dolphins (Kyhn et al., 2009).

In the absence of automated detectors, data are typically processed by an analyst viewing a short-time Fourier transform (STFT) on a spectrogram display and listening to sections of interest. This takes considerable time, particularly for high frequency data, where an analyst must listen at slower than real-time speed. Semi-automated analysis reduces the manual effort required, with the analyst's time reserved for making the final decisions on targeted data. Automation can also reduce the biases and errors which often arise with human analysts (Aide et al., 2013; Heinicke et al., 2015) and provide additional information from the data, such as bearings calculated from time of arrival differences for signals received on multiple hydrophones. A suite of automatic processes is currently available for analysing acoustic data for a range of species, including click detectors and click classifiers (Gillespie et al., 2008; Madhusudhana et al., 2015; Miller & Miller, 2018), energy band comparisons (Klinck & Mellinger, 2011), extraction of spectral features (Gillespie et al., 2013; Lin & Chou, 2015) and, more recently, machine learning methods (Bergler et al., 2019; Bermant et al., 2019; Jiang et al., 2018; Shamir et al., 2014). These methods differ in their computational requirements, performance and ability to process sounds from a range of species, the wider environment and anthropogenic sources.

All automatic detection methods are subject to errors: both false negatives, where detections from the species of interest are missed, and false positives, where detections are erroneously assigned to the species of interest. Choices about how to balance these errors and their impact on the overall results are dependent on the study. Density estimation methods based on distance sampling (Buckland et al., 1993), which were first developed for visual surveys, generally deal well with missed detections by directly estimating the probability of detection as a function of distance from the track line. As long as there is a high probability of detection along the survey track, missed detections (false negatives) at greater distances are of no consequence, since the reduction in detections is measured by the reduction in the estimate of detection probability (González et al., 2018; Thomas & Marques, 2012). Marques et al. (2009) showed that acoustic data can also be used if a false positive rate is known, but this is likely to vary for different datasets based on the characteristics of interfering noise. Therefore, whether the analyst should examine all detections and remove false detections, or examine a subset and estimate the fraction of detections that were false positives, depends on the desired balance between analyst effort and statistical robustness.

Sperm whales lend themselves well to acoustic surveys. Their loud clicks can be heard at distances of several km, and they can be tracked and localised from a moving vessel with very modest equipment using target motion methods. Several studies have published abundance estimates for sperm whales using standard line transect survey approaches (Lewis et al., 1998, 2007, 2018) or methods with small modifications (Barlow et al., 2001; Barlow & Taylor, 2005). Data are often processed with either RainbowClick (Gillespie & Leaper, 1996) or, more recently, PAMGuard (Gillespie et al., 2008). Both programs combine a simple transient click detector with a sophisticated user interface, enabling an operator to efficiently select and group click trains on consistent bearings likely to come from an individual or closely associated group. The click detector generally produces many false positive detections, which may come from a variety of sources: other cetacean species, propeller and engine noise from the survey vessel and other craft, and other naturally occurring sounds such as breaking waves. It is therefore necessary for an operator to examine every screen page of data to eliminate false detections. However, because only detections are displayed, each page can cover a longer period and be much less cluttered than would be possible with a standard spectrogram, and additional information such as bearings to detected sounds can be shown. This makes it possible for an operator to scan data offline at many times real-time.

In this study, we took a total of 1,696 hr of continuous data collected from a hydrophone array deployed from a platform of opportunity while it made routine passages for other purposes. It is hoped that this opportunistic PAM data collection will be the start of a long-term project to collect PAM data on a wide scale, thereby contributing to world-wide cetacean population monitoring efforts. Data were processed for the extraction of multiple classes of cetacean sounds including sperm whale echolocation clicks, broadband delphinid echolocation clicks, narrow band high frequency (NBHF) echolocation clicks and delphinid whistles. We report on the distribution of these sound types and provide detection functions for sperm whales and delphinids along the survey track. Importantly, we demonstrate how a carefully selected combination of automatic and manual processing allows for the time efficient processing of large datasets.

2 MATERIALS AND METHODS

2.1 Data collection

Acoustic data were collected opportunistically on-board M/V Arctic Sunrise during passages in the Atlantic, Southern, Arctic and Indian Oceans using a towed hydrophone array (Vanishing Point Ltd; Figure 1). All work in Antarctic waters was carried out under permit number RWS-2019/40813. In other regions, no fieldwork permission was required. The array's streamer section comprised two pairs of hydrophone elements mounted within an oil (Isopar M) filled 5 m long, flexible, 35 mm diameter polyurethane tube. This was towed at the end of a 350 m Kevlar-strengthened tow cable. The two hydrophones of the ‘medium frequency’ pair (Benthos AQ4 elements and Magrec HP02 preamplifiers, nominal frequency range 50 Hz to 40 kHz) were spaced 3 m apart, while those of the ‘high frequency’ pair (Magrec HP03 hydrophone and preamplifier units, nominal frequency range 1–200 kHz) were separated by 50 cm. Each array element was connected to one channel of a four-channel SAIL data acquisition card (St Andrews Instrumentation Ltd) where analogue filtering and gain were applied before each channel was sampled at 500 kHz. A high pass filter of 10 Hz and gain of 6 dB were applied to the ‘medium frequency’ channels 0 and 1, while a high pass filter of 2 kHz and gain of 12 dB were applied to each of the ‘high frequency’ channels 2 and 3. Data from the SAIL acquisition card were written as four channel .wav files using PAMGuard (Gillespie et al., 2008) (available at www.pamguard.org), which also carried out real-time acoustic processing, displayed results and logged the ship location from GPS.

FIGURE 1. Schematic of the towed PAM array and recording system used onboard M/V Arctic Sunrise during acoustic surveys

2.2 Click processing

The raw wav data files were reprocessed onshore in conjunction with GPS data collected during the survey using PAMGuard (version 2.01.05).

2.2.1 Detection and detector configuration

Odontocete clicks were detected on recordings from the high frequency hydrophones (channels 2 and 3) using a PAMGuard click detector module. Time of arrival differences for the signal on the two hydrophones were used to estimate an angle of arrival for each detected click relative to the hydrophone array.
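The conversion from a time of arrival difference to an angle of arrival can be sketched as follows. This is a minimal illustration and not PAMGuard's implementation: the sound speed of 1,500 m/s and the sign convention (a positive delay meaning the forward hydrophone received the click first) are assumptions, while the 0.5 m spacing and 500 kHz sample rate come from the array description above.

```python
import math

SOUND_SPEED = 1500.0   # m/s, assumed nominal speed of sound in seawater
SPACING = 0.5          # m, separation of the high frequency pair (from the text)
SAMPLE_RATE = 500_000  # Hz, sample rate of each channel (from the text)


def bearing_from_delay(delay_s, spacing_m=SPACING, c=SOUND_SPEED):
    """Angle of arrival in degrees relative to the array axis (0 = ahead)."""
    # cos(theta) = c * delay / spacing; clip to [-1, 1] to absorb timing noise
    cos_theta = max(-1.0, min(1.0, c * delay_s / spacing_m))
    return math.degrees(math.acos(cos_theta))


max_delay = SPACING / SOUND_SPEED            # largest physically possible delay
max_delay_samples = max_delay * SAMPLE_RATE  # a little under 167 samples

print(round(bearing_from_delay(0.0), 1))        # broadside arrival: 90.0
print(round(bearing_from_delay(max_delay), 1))  # arrival from dead ahead: 0.0
```

With only two hydrophones the result is an angle from the array axis, so a source to port and its mirror image to starboard give the same value.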

To achieve a good compromise between detection efficiency and processing workload, an exploratory analysis was conducted on a representative subset of data from transits 1 and 2 to determine the PAMGuard click detector settings which enabled detection of all manually identified vocalisations while removing as many detections from noise sources as possible. This subset was representative of typical encounters identified during an initial pass through the data. The vessel propulsion system (propeller and engine noise) was the source of many false detections. PAMGuard allows detections on a certain range of bearings to be vetoed, so all detections in a 40-degree sector ahead of the vessel, with bearings between −20° and +20°, were discarded. To further reduce false detections from background noise, detector trigger thresholds between 10 and 19 dB were tested in 3 dB steps, and the value chosen was the one that removed the maximum number of false detections while still retaining all but one odontocete click train. This optimised threshold was used to reanalyse all the recordings, and timing, bearing and waveform information for the detected clicks was written to PAMGuard output files for further review and analysis.
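The two noise-reduction steps described above, a bearing veto and a threshold chosen from trial runs, can be sketched as below. The function names, the signed bearing convention (0° dead ahead) and the tuning counts are all hypothetical values for illustration; only the ±20° sector, the tested thresholds (10, 13, 16 and 19 dB) and the "all but one click train" criterion come from the text.

```python
def passes_veto(bearing_deg, half_width_deg=20.0):
    """True if a click lies outside the vetoed sector centred on the bow."""
    return abs(bearing_deg) > half_width_deg


def choose_threshold(results, max_missed_trains=1):
    """Pick the threshold removing most noise while missing few true trains.

    results maps threshold (dB) -> (noise_clicks_removed, trains_missed).
    """
    acceptable = {t: r for t, r in results.items() if r[1] <= max_missed_trains}
    return max(acceptable, key=lambda t: acceptable[t][0])


# Hypothetical tuning results for the thresholds tested in the text
trial = {10: (0, 0), 13: (40_000, 0), 16: (90_000, 1), 19: (95_000, 4)}

# Hypothetical click bearings in degrees, 0 = dead ahead
clicks = [-35.0, -12.0, 5.0, 19.9, 48.0, 170.0]
kept = [b for b in clicks if passes_veto(b)]

print(kept)                     # [-35.0, 48.0, 170.0]
print(choose_threshold(trial))  # 16
```

In this made-up example the 19 dB setting removes slightly more noise but loses four click trains, so the 16 dB setting is preferred.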

Cetacean click vocalisations typically occur in trains, with fairly consistent and characteristic inter-click intervals. Thus, trains of clicks on a consistent bearing are usually a more reliable cue than individual clicks. The ‘Multi-Hypothesis Tracking (MHT) Click Train Detector’ module within PAMGuard (Macaulay, 2020) was used to automatically group detected clicks into click trains. The module assesses the Inter-Click-Interval (ICI), amplitude, frequency content and bearing information of clicks to assemble putative click trains and then calculates the likelihood of being a true click train for every possible click combination. As more clicks are included in the model, the number of possible click trains increases exponentially, and a pruning process is implemented so that only the most likely combinations of clicks are retained in putative click trains. Each click train is given a χ² score. The lower the score, the more likely it is that the clicks within the train come from the same source or target animal. This process is computationally intensive, and while the pruning process increases the efficiency of the model (Macaulay, 2020), removing as many false positive clicks as possible before running the MHT click train detector proved essential.
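The idea behind a χ² style consistency score can be illustrated with a deliberately simplified sketch. This is not the MHT algorithm of Macaulay (2020): it builds no hypothesis tree and does no pruning, and it uses only ICI and bearing. The function name and the two tolerance values are assumptions chosen for illustration.

```python
def train_chi2(times_s, bearings_deg, sigma_ici_s=0.05, sigma_bearing_deg=2.0):
    """Mean squared, tolerance-normalised change in ICI and bearing along a train.

    Lower values mean the clicks vary smoothly, as expected from one source.
    """
    icis = [b - a for a, b in zip(times_s, times_s[1:])]
    d_ici = [b - a for a, b in zip(icis, icis[1:])]
    d_bearing = [b - a for a, b in zip(bearings_deg, bearings_deg[1:])]
    terms = [(d / sigma_ici_s) ** 2 for d in d_ici]
    terms += [(d / sigma_bearing_deg) ** 2 for d in d_bearing]
    return sum(terms) / len(terms)


# A regular train on a slowly changing bearing (hypothetical values)
regular = train_chi2([0.0, 0.5, 1.0, 1.5, 2.0], [60.0, 60.5, 61.0, 61.5, 62.0])

# Unrelated clicks with erratic timing and bearings (hypothetical values)
jumbled = train_chi2([0.0, 0.2, 1.1, 1.3, 2.4], [60.0, 120.0, 30.0, 150.0, 75.0])

print(regular < jumbled)  # True
```

A full tracker would evaluate a score of this kind for many competing groupings of the same clicks and keep only the best, which is where the computational cost described above arises.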

Settings for the MHT click train detector were based on those suggested by Macaulay (2020) and adjusted iteratively for the data subset. The settings for the MHT click train detector and its classifier were then validated against a full manual analysis of the subset using existing MATLAB functions for PAMGuard (available at: https://github.com/PAMGuard/PAMGuardMatlab) within custom MATLAB scripts (version 9.9.0, MATLAB, 2020).

2.2.2 Classification

Following previous work to identify beaked whales in similar acoustic surveys (Keating & Barlow, 2013; Rone et al., 2014; Yack et al., 2010), two narrow band click classifiers with frequency sweeps were applied to detect beaked whales. The first used the PAMGuard defaults for beaked whales, with a test band between 24 and 48 kHz, and the second used a higher frequency test band (40–80 kHz) to search for higher frequency beaked whale clicks. The presence of a frequency sweep, assessed by eye in Wigner plots of individual clicks, was useful in identifying beaked whales. A narrow band classifier was also used to detect narrow band high frequency (NBHF) clicks, with a test band between 100 and 150 kHz, providing a classifier for any NBHF species such as harbour porpoise Phocoena phocoena, dwarf and pygmy sperm whales (Kogia spp.) and NBHF delphinids (e.g. Cephalorhynchus spp.). The classifier within the MHT module uses spectral template classifiers which correlate the average spectrum of each click train with species specific spectral templates and inter-click interval parameters. Classifiers were run within the MHT click train detector for sperm whales, beaked whales and dolphins.

2.3 Whistle processing

PAMGuard's ‘Whistle and Moan Detector’ (Gillespie et al., 2013) was run to detect odontocete whistle contours up to 24 kHz on the wav data files from the ‘medium frequency’ hydrophone pair (decimated to 48 kHz), using settings provided in Gillespie et al. (2013). The detector identifies tonal sounds within recordings using a multi-stage process which removes noise, calculates an FFT, applies an amplitude threshold and joins narrow band peaks in FFTs which are close in time and frequency to show ‘whistle contours’.

Whistle contours were then classified to the species level using PAMGuard's whistle classifier (Gillespie et al., 2013). The classifier works by breaking up the detected contours into fragments of equal length before the mean frequency (Hz), frequency slope (Hz/s) and curvature (Hz/s²) of each fragment are extracted. The distribution patterns of these parameters are calculated for whistles in encounters. The mean, standard deviation and skew of these parameters have been shown to vary between species (Gillespie et al., 2013). Thus, whistles can be classified from multiple whistle fragments by comparing distributions of fragments measured during acoustic encounters with distributions of contours from known species. Whistle classification was run separately for different geographical regions likely to have a different combination of ‘whistling’ species. In each region the classifier was trained using pre-existing and pre-labelled whistle contours of species likely to be found in that region. The training data were not, however, necessarily collected from that region. Contours used in training, also used in Gillespie et al. (2013), had been sampled at 48 kHz with fragment length and section length parameters set at 30 bins (160 ms) and 60 fragments respectively.
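The fragment parameters described above can be sketched as follows. This is an illustration under stated assumptions rather than the PAMGuard whistle classifier: fragments are taken as non-overlapping, slope and curvature are estimated by simple finite differences, and the function name is hypothetical. The time per bin follows from the stated fragment length of 30 bins (160 ms).

```python
from statistics import mean

BIN_S = 160e-3 / 30  # seconds per spectrogram bin, implied by 30 bins = 160 ms


def fragment_features(contour_hz, frag_len=30, bin_s=BIN_S):
    """Return (mean Hz, slope Hz/s, curvature Hz/s^2) for each full fragment."""
    features = []
    for start in range(0, len(contour_hz) - frag_len + 1, frag_len):
        frag = contour_hz[start:start + frag_len]
        half = frag_len // 2
        span = (frag_len - 1) * bin_s
        slope = (frag[-1] - frag[0]) / span
        # Slopes of the two halves; their difference over the time between
        # the half centres (span / 2) approximates the second derivative
        slope_a = (frag[half] - frag[0]) / (half * bin_s)
        slope_b = (frag[-1] - frag[half]) / ((frag_len - 1 - half) * bin_s)
        curvature = (slope_b - slope_a) / (span / 2)
        features.append((mean(frag), slope, curvature))
    return features


# Hypothetical linear upsweep: 8 kHz rising by 50 Hz per bin for 60 bins
contour = [8000.0 + 50.0 * i for i in range(60)]
feats = fragment_features(contour)
print(len(feats))                          # 2
print(round(feats[0][0]), round(feats[0][1]))  # 8725 9375
```

The classifier then works on the distributions of these three values over many fragments in an encounter, not on any single fragment.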

Where an event could not be classified, due to an insufficient number of whistle fragments, a range of frequency metrics of each event, such as mean whistle frequency and mean whistle slope were extracted. This information was used to aid species identification. Identification to the family level was attempted where species level identification was not possible.

2.4 Manual audit

All sections of data that contained click train detections and/or whistle detections were manually audited using the PAMGuard viewer displays and data map. The automatic classification of triggered click trains (e.g. sperm whale, beaked whale, NBHF, delphinid) was corrected where necessary. Echolocating odontocetes can most easily be distinguished on the bearing-time display of the click detector, and click characteristics allow events to be placed into a species group. For example, the PAMGuard Wigner plot was used to verify the upsweep of beaked whale clicks (Papandreou-Suppappola & Antonelli, 2001; Yack et al., 2013). The MHT click train detector often fragmented a single click train into separate sections. In these cases, the analyst ‘marked up’ trains more accurately (Figure 2).

FIGURE 2. Bearing-time windows within PAMGuard showing a sperm whale encounter over a 30-min period. Window A shows the fragmented trains produced by the click train detector, with window B showing the same event after a manual revision and mark-up process to identify single click trains for each vocalising whale where possible. Due to the overlap of click trains, especially at the upper and lower ends of the bearing scale, it is not always possible to distinguish between vocalising individuals

As the whistle and moan detector can trigger on any tonal sound within its detection range, it was important to inspect detected contours to ensure that only those from delphinids were labelled, using PAMGuard's ‘Spectrogram Annotation’ module, and included in later analyses. Where delphinid click trains and whistles overlapped in time, they were merged into delphinid encounter events.
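Merging temporally overlapping detections into encounter events is a standard interval-merging problem, sketched below. The function name and the example times (in seconds) are hypothetical; in the study this step was carried out within PAMGuard during the manual audit.

```python
def merge_events(intervals):
    """Merge (start_s, end_s) detections that overlap in time into encounters."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the current encounter: extend it if necessary
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


# Hypothetical delphinid click trains and whistle detections
click_trains = [(0, 120), (300, 420)]
whistles = [(100, 180), (400, 410), (900, 960)]

encounters = merge_events(click_trains + whistles)
print(encounters)  # [(0, 180), (300, 420), (900, 960)]
```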

2.5 Localisation

Click trains were localised using PAMGuard's Target Motion Analysis (TMA) module with the two-dimensional simplex method. This finds the stationary location that minimises the least squares error of the bearings within a click train, estimating a separate location for each side of the track. A simple two element linear array was used in these transits, so the bearings calculated from time of arrival differences actually place the target on a semi-circular arc passing beneath the vessel's track line. Fortunately, line transect surveys are quite forgiving of these ambiguities. It has been shown (Leaper et al., 1992; Lewis et al., 2018) that when the perpendicular distances to detections are typically greater than the likely depth of the detected animal, the ‘vertical ambiguity’ can be accounted for by the detection function and has little impact on density estimation. Thus, for localisation purposes bearings can be considered as being horizontal. A left-right ambiguity still remains; however, distance sampling methodology only requires a perpendicular distance from the track line (Buckland et al., 1993), so there is no requirement to know which side of the track line an animal is on.
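The principle of target motion analysis can be sketched with a closed-form least-squares crossing of bearing lines. Note that this is a simpler stand-in for the simplex search described above and is not PAMGuard's implementation: it minimises squared perpendicular distances to the bearing lines rather than bearing errors, assumes a straight track along the x axis with horizontal bearings, and uses hypothetical names and positions.

```python
import math


def localise(track_x_m, bearings_deg):
    """Least-squares crossing point of bearing lines from a straight track.

    The vessel moves along the x axis and bearings are measured from the
    heading. Returns (along_track_m, perpendicular_m) for one side of the
    track; the mirror-image solution has the same perpendicular distance.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x, bearing in zip(track_x_m, bearings_deg):
        theta = math.radians(bearing)
        nx, ny = -math.sin(theta), math.cos(theta)  # unit normal to the line
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * nx * x  # x component of n (n . p), with p = (x, 0)
        b2 += nx * ny * x  # y component of n (n . p)
    det = a11 * a22 - a12 * a12
    sx = (a22 * b1 - a12 * b2) / det
    sy = (a11 * b2 - a12 * b1) / det
    return sx, abs(sy)


# Hypothetical stationary whale 800 m off the track, abeam at x = 1,000 m
whale = (1000.0, 800.0)
track = [200.0 * i for i in range(11)]
bearings = [math.degrees(math.atan2(whale[1], whale[0] - x)) for x in track]

est_x, est_perp = localise(track, bearings)
print(round(est_x), round(est_perp))  # 1000 800
```

Only the perpendicular distance is needed for distance sampling, which is why the left-right ambiguity of a two element array does not matter here.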

Perpendicular distances can be used to determine a detection function, an equation showing how the probability of detection falls with range from the track line. Perpendicular distances to click trains were processed using the ‘Distance’ package (version 1.0.2, Miller, 2020) within the R statistical analysis environment (version 4.0.2, R Core Team, 2020) to estimate detection functions. Detection functions were calculated for both sperm whales and for non-NBHF delphinids.

Dolphins often bow ride and mill around the vessel. Clearly, target motion analysis cannot be used in these cases. However, a ‘delphinid’ detection function was calculated using a subset of detections which had well-defined click trains and clearly moved past the array.

Detection functions were calculated using data from all transits, as the same vessel and equipment were used throughout the study. Half normal and hazard rate models were explored for each species group, and the best model for each selected based on Akaike information criterion (AIC) scores.
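The two candidate key functions and the quantities used to compare them can be sketched in a few lines. This does not reproduce the fitting done by the R ‘Distance’ package: the parameter values, truncation distance and log-likelihoods below are hypothetical, and the effective strip half-width is obtained by simple trapezoidal integration of the detection function.

```python
import math


def half_normal(x, sigma):
    """Half-normal detection function g(x) = exp(-x^2 / (2 sigma^2))."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


def hazard_rate(x, sigma, b):
    """Hazard-rate detection function g(x) = 1 - exp(-(x / sigma)^(-b))."""
    if x == 0:
        return 1.0
    return 1.0 - math.exp(-((x / sigma) ** (-b)))


def eshw(g, w, steps=10_000):
    """Effective strip half-width: integral of g from 0 to truncation w."""
    h = w / steps
    total = 0.5 * (g(0.0) + g(w))
    total += sum(g(i * h) for i in range(1, steps))
    return total * h


def aic(log_likelihood, n_params):
    """Akaike information criterion; the lower score is preferred."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical parameters, distances in metres
hn_eshw = eshw(lambda x: half_normal(x, 1000.0), 5000.0)
hr_eshw = eshw(lambda x: hazard_rate(x, 1000.0, 3.0), 5000.0)
print(round(hn_eshw))  # 1253

# Hypothetical log-likelihoods: hazard rate has 2 parameters, half-normal 1
print(aic(-100.0, 2), aic(-103.0, 1))  # 204.0 208.0
```

In this made-up comparison the hazard-rate model has the lower AIC despite its extra parameter, which is the pattern reported for both species groups in the results.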

3 RESULTS

A total of 1,696 hr of four channel acoustic data were collected during more than 30,000 km of survey effort across the Atlantic, Southern, Arctic and Indian Oceans (Table 1). This resulted in 25.4 Terabytes (TB) of 16 bit .wav files.

TABLE 1. Summary of acoustic data collection on board M/V Arctic Sunrise by transit, with data size given in gigabytes (GB)

Transit                               Start–End                   Recording effort (hours)  Size of recordings (GB)  Number of WAV files  Distance covered (km)  Average speed (knots)
1) English Channel to Islas Canarias  22-Aug-2019 to 28-Aug-2019  137                       1,971                    1,874                2,659                  8.1
2) Islas Canarias to Dakar            01-Sep-2019 to 04-Sep-2019  52                        753                      1,051                1,264                  8.9
3) Dakar to Cape Town                 16-Sep-2019 to 15-Oct-2019  191                       2,732                    3,661                4,057                  5.9
4) Cape Town to Vema to Cape Town     23-Oct-2019 to 07-Nov-2019  79                        2,133                    1,598                3,903                  7.6
5) Cape Town to South Atlantic        03-Dec-2019 to 16-Dec-2019  261                       3,763                    5,025                4,783                  10.4
6) South Atlantic to Ushuaia          17-Dec-2019 to 26-Dec-2019  218                       3,143                    4,827                4,783                  10.4
7) Ushuaia to WAP                     06-Jan-2020 to 10-Feb-2020  180                       2,596                    3,338                3,138                  8.6
8) Norway to Svalbard                 03-Sep-2020 to 30-Sep-2020  91                        1,317                    3,039                2,114                  9.5
9) Seychelles to Saya de Malha        02-Mar-2021 to 30-Mar-2021  487                       7,006                    10,223               4,961                  7.4
Total                                                             1,696                     25,414                   34,636               31,662                 8.5

3.1 Click processing

After exploratory analysis, a 16 dB click detection threshold was chosen over that of the 10 dB threshold used for real-time processing and other tested thresholds (13 and 19 dB) on the basis of the number of retained odontocete clicks and number of noise-originating clicks removed. The processed binary files produced by PAMGuard using a 16 dB compared to a 10 dB threshold were reduced in size between 78.5% and 99.6% for each transit (Table 2). Processing the 1,696.3 hr of .wav files in PAMGuard took approximately 353 hr, an average of 4.8× real-time.

TABLE 2. Number of detected clicks in millions, with file size in gigabytes (GB) in parentheses, showing the reduction in noise when a 16 dB click detector threshold and ±20° bearing veto were applied in place of the default 10 dB threshold

Transit  Clicks at 10 dB, millions (GB)  Clicks at 16 dB, millions (GB)  % reduction
1        62.0 (13.0)                     2.2 (0.5)                       96.4
2        23.7 (5.0)                      0.6 (0.1)                       97.6
3        93.2 (21.7)                     0.7 (0.2)                       99.2
4        76.2 (19.8)                     0.3 (0.1)                       99.6
5        14.4 (3.0)                      0.2 (0.03)                      98.9
6        14.9 (3.1)                      3.2 (0.7)                       78.5
7        18.7 (3.9)                      1.1 (0.2)                       94.3
8        8.0 (1.7)                       2.7 (0.1)                       97.1
9        56.2 (11.8)                     5.1 (1.1)                       90.9
Total    367.3 (83.0)                    16.1 (3.0)                      95.6

The MHT click train detector found 5,531 click trains within the subset, 636 of which were automatically classified as sperm whales. A spectral template threshold of 0.7 was chosen, as this was the largest value which retained all true positives. Eighty-six sperm whale click trains were manually identified in the subset of data, and all of these were also identified by the automated click train detector. Five hundred and fifty click trains were, therefore, incorrectly identified as sperm whale click trains. False positive click trains were a mixture of echoes from true sperm whale events, delphinid clicks with peaks at lower frequency and detections from the noise of the towing vessel. Of the 5,531 click trains from the subset, 1,144 were classified as delphinids. All of these detections were part of manually audited delphinid events, and no manually identified delphinid event was missed by the MHT click train detector. Most of the 3,751 click trains which were not classified to species were from the towing vessel on occasions when vessel noise occurred outside the 40° veto zone, such as reflections from the seabed in shallow water and when the vessel was changing heading. Another source of unclassified click trains was short segments of sperm whale click trains which did not meet the correlation threshold of the spectral classifier. In all such cases, there were many other short click train segments which did meet the classifier threshold, and so no sperm whale event was missed. Based on this exploratory analysis of the subset, unclassified click trains were discarded from any further analyses and were not manually validated when processing the entire dataset.
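The subset counts reported above can be restated as precision and recall for the automatic sperm whale classification. The metric names are not used in the text and are introduced here only as a convenient summary; all counts come from the paragraph above.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from detection counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


total_trains = 5531
classified_sperm = 636
classified_delphinid = 1144
true_sperm = 86  # manually identified, none missed by the detector

false_sperm = classified_sperm - true_sperm
unclassified = total_trains - classified_sperm - classified_delphinid
precision, recall = precision_recall(true_sperm, false_sperm, 0)

print(false_sperm, unclassified)  # 550 3751
print(f"precision {precision:.1%}, recall {recall:.0%}")
```

A detector tuned this way deliberately accepts low precision in exchange for missing nothing, which is acceptable only because an analyst reviews every classified train afterwards.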

The total recording effort during the subset was 52.3 hr, while the duration of the click train encounters (identified automatically and highlighted for the analyst's attention and interpretation) was 17.7 hr, a reduction of approximately 66%, with none of the odontocete detections identified during the initial manual analysis being missed. Analysing this reduced dataset took the analyst 2.2 hr rather than 4.3 hr for the fully manual mark up, almost halving the workload.
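The two reductions quoted for the subset follow directly from the reported hours, as this short check shows. The helper name is hypothetical; the figures are those given above.

```python
def fraction_saved(before, after):
    """Fractional reduction from `before` to `after`."""
    return 1.0 - after / before


data_reduction = fraction_saved(52.3, 17.7)  # hours of data to review
time_reduction = fraction_saved(4.3, 2.2)    # analyst hours

print(f"{data_reduction:.1%} less data, {time_reduction:.1%} less analyst time")
# 66.2% less data, 48.8% less analyst time
```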

The MHT click train detector settings were applied across the entire dataset. A standard laptop computer (Intel i7 2.80 GHz 2nd generation processor with 8GB of RAM) running the PAMGuard MHT click train detector unsupervised took 22.3 hr to process 1,696.3 hr of data. Across the entire dataset the MHT click train detector reduced the amount of data to process manually by 74.5%.

The three click classifiers with a frequency sweep took 56 min to classify all detected clicks for the entire dataset. In total, the manual effort marking up classified clicks from the click classifier and click trains from the click train detector took 55.2 hr for the entire dataset.

It is likely, based on the acoustic characteristics of their clicks, that the six beaked whale detections in the Atlantic were of Cuvier's Ziphius cavirostris, Blainville's Mesoplodon densirostris or True's beaked whales Mesoplodon mirus (Baumann-Pickering et al., 2013; DeAngelis et al., 2018; Shaffer et al., 2013). There were 19 NBHF events detected in total, two of which were likely harbour porpoise Phocoena phocoena off the northern Norwegian coast (Clausen et al., 2011; Quintela et al., 2020; Storrie et al., 2018; Villadsgaard et al., 2007). Sixteen detections made in shallow waters along the South American coast and one in the Drake Passage were likely NBHF delphinids, thought to be Commerson's Cephalorhynchus commersonii, Peale's Lagenorhynchus australis or hourglass dolphins Lagenorhynchus cruciger based on click characteristics and species distribution (Cipriano, 2018; Dellabianca et al., 2012; Kyhn et al., 2009, 2010; Reyes Reyes et al., 2015). Of these 16 acoustic encounters, two were confidently attributed to Peale's dolphins on the basis of sightings made during the acoustic encounter.

3.2 Whistle processing

PAMGuard's whistle classifier took approximately 62 min to process the entire dataset. The mean correct classification rate in the training dataset varied for each regional species group (64.1%–82.9%) due to the different mix of species included for each region. Confusion matrices showed that within the training dataset all species were correctly classified on most occasions. However, some species showed consistently higher (>20%) misclassification rates across regional groups. For example, pilot whales were misclassified as killer whales between 29.7% and 32.5% of the time.

Two hundred and one of the 349 manually verified whistle events detected during this project were classified by the whistle classifier, with most of those not classified containing fewer than 10 whistle contours. For example, four delphinid detections in the Western Antarctic Peninsula (WAP) were not classified but were manually attributed to killer whales Orcinus orca based on vocalisation characteristics (Reyes Reyes et al., 2017; Trickey et al., 2014; Wellard et al., 2020).

Manual validation of the automatically detected whistles across all transits took approximately 35.7 hr with the whistle and moan detector reducing the amount of data to search by 86%.
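Adding the two manual audit times gives the overall analyst effort quoted in the abstract, and dividing the survey duration by it gives the effective review rate. The variable names are hypothetical; the hours are those reported in the results.

```python
click_audit_hr = 55.2    # manual mark-up of classified clicks and click trains
whistle_audit_hr = 35.7  # manual validation of detected whistles
survey_hr = 1696.3       # total recording effort

total_audit_hr = click_audit_hr + whistle_audit_hr
speed_up = survey_hr / total_audit_hr

print(round(total_audit_hr))  # 91
print(f"{speed_up:.1f} hours of recordings reviewed per analyst hour")
```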

Table 3 summarises all acoustic events resulting from automatic click and whistle detection. Classification was based on a combination of sightings data, aural inspection and automated click and whistle classifier outputs.

TABLE 3. Summary of acoustic detections made from M/V Arctic Sunrise. Likely classifications were made using a combination of sightings data, aural inspection and automated whistle and click classifier outputs

Species grouping  Species                                                                Common name                                 Number of acoustic group events
Physeteridae      Physeter macrocephalus                                                 Sperm whale                                 55
Ziphiidae         Unclassified                                                           Beaked whale                                6
Phocoenidae       Phocoena phocoena                                                      Harbour porpoise                            2
Delphinidae       Of which:                                                                                                          380
                  Delphinus delphis                                                      Common dolphin                              8
                  Feresa attenuata                                                       Pygmy killer whale                          18
                  Globicephala spp.                                                      Pilot whale                                 3
                  Grampus griseus                                                        Risso's dolphin                             62
                  Lagenorhynchus acutus or L. albirostris                                White beaked or white sided dolphins        7
                  Orcinus orca                                                           Killer whale                                6
                  Pseudorca crassidens                                                   False killer whale                          4
                  Stenella coeruleoalba                                                  Striped dolphin                             7
                  Stenella frontalis                                                     Atlantic spotted dolphin                    13
                  Stenella longirostris                                                  Spinner dolphin                             66
                  Tursiops truncatus                                                     Bottlenose dolphin                          5
NBHF Delphinidae  Lagenorhynchus australis, L. cruciger or Cephalorhynchus commersonii   Peale's, Hourglass or Commerson's dolphin   17
                  Of which: L. australis                                                 Peale's dolphin                             2
                  Unclassified                                                                                                       164
Total                                                                                                                                443

The best fitting detection functions for both sperm whales and non-NBHF delphinids used a hazard rate model with two parameters and no adjustments. Fit was assessed using minimum AIC. This gave an effective strip half-width of 3,277 m for sperm whales and 699 m for non-NBHF delphinids (Table 4). For sperm whales, there is a reduction in the number of detections within ~500 m of the track line (Figure 3).

TABLE 4. Summary of detection functions for sperm whales and non-NBHF delphinids across the entire M/V Arctic Sunrise acoustic transits using click detections
Model | AIC score | Effective strip half width (m) | CV of estimate | 95% CI of ESHW
Sperm whale | | | |
Hazard rate (hr) | 4,834 | 3,277 | 0.06 | 3,135–3,419
Half-normal (hn) | 4,852 | 2,868 | 0.04 | 2,456–3,010
Non-NBHF delphinids | | | |
Hazard rate (hr) | 1,849 | 699 | 0.13 | 569–829
Half-normal (hn) | 1,889 | 1,222 | 0.04 | 1,759–1,352
FIGURE 3. Combined hazard rate detection functions for sperm whales and non-NBHF delphinids across all transits on the M/V Arctic Sunrise, with effective strip half-widths of 3,277 m for sperm whales and 699 m for non-NBHF delphinids
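To illustrate how an effective strip half-width relates to a fitted detection function, the sketch below integrates the standard half-normal and hazard-rate forms numerically and picks the lower-AIC model. The parameter values (`sigma_m`, `shape_b`, `truncation_m`) are hypothetical and are not the fitted values from this study. Only the AIC scores are taken from Table 4. This is not the fitting procedure itself, which would normally be done with dedicated distance sampling software.

```python
import math


def half_normal(x, sigma):
    """Half-normal detection function g(x) = exp(-x^2 / (2 sigma^2))."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def hazard_rate(x, sigma, b):
    """Hazard-rate detection function g(x) = 1 - exp(-(x / sigma)^(-b))."""
    if x <= 0.0:
        return 1.0  # g(0) = 1: certain detection on the track line
    return 1.0 - math.exp(-((x / sigma) ** (-b)))


def effective_strip_half_width(g, truncation_m, steps=20000):
    """Integrate g from 0 to the truncation distance using the midpoint rule."""
    dx = truncation_m / steps
    return sum(g((i + 0.5) * dx) for i in range(steps)) * dx


# Hypothetical parameters for illustration only (not fitted values).
sigma_m, shape_b, truncation_m = 2500.0, 3.0, 8000.0
eshw_hr = effective_strip_half_width(
    lambda x: hazard_rate(x, sigma_m, shape_b), truncation_m
)
eshw_hn = effective_strip_half_width(
    lambda x: half_normal(x, sigma_m), truncation_m
)
print(f"hazard rate ESHW: {eshw_hr:.0f} m, half-normal ESHW: {eshw_hn:.0f} m")

# AIC scores for the sperm whale models as reported in Table 4.
aic_scores = {"hazard rate": 4834, "half-normal": 4852}
best_model = min(aic_scores, key=aic_scores.get)
print(f"lowest AIC: {best_model}")
```

The hazard-rate form has a flat "shoulder" near the track line before dropping off, which is why the two forms can give quite different half-widths for the same data.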

4 DISCUSSION

The semi-automatic processes applied to the large acoustic dataset in this study used existing modules within PAMGuard. This methodology significantly improved acoustic analyst efficiency. Semi-automated analysis methods, such as these, are essential for large-scale acoustic surveys of wild animal populations, where data may be gathered near continuously over months and can total tens of terabytes, and budgetary limitations mean that specialist analyst time must be used cost-efficiently.

We appreciate that the analyses of such data will never be completely optimised for a number of reasons. All detection systems suffer from false positive and false negative detections. While a false positive is easily defined, the definition of a false negative in an acoustic survey is much more complicated. Lowering detection thresholds in order to detect more distant sounds will inevitably lead to a higher rate of false positives. Conversely, false negatives will inevitably increase as detector thresholds are increased to remove false positives. However, distance sampling methodology, used in this study, generally deals well with false negatives (Thomas & Marques, 2012). The detection threshold chosen for the click detector was appropriate for the somewhat noisy towing vessel. Although false positives can generally be removed through manual audit, the additional cost of processing more detections can outweigh the statistical benefit of a larger sample size. A simple ‘optimal’ configuration is therefore illusory unless we know what we are optimising for: is it the maximum number of detections, the maximum ratio between the numbers of true detections and false positives, the maximum number of detections per unit of overall monetary cost, or something else? Scientifically, it is important to have an adequate sample for statistical analysis with few false positives, which requires analyst input. A more attainable goal is to achieve the best balance between false positives and false negatives, using appropriate threshold levels which can provide sufficient true detections for statistical analysis while managing the analyst time required to eliminate false positives. If these criteria can be met with reduced human effort, then a ‘sweet spot’ will be achieved between true detections and false positives that statistically describes animals along the track line.
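The trade-off described above can be made concrete with a small sketch. It assumes each candidate detection carries a detector score and a validated true/false label, and that analyst cost scales with the number of flagged candidates. The scores, labels and `minutes_per_check` value are all hypothetical and do not come from the study.

```python
# Hypothetical candidates: (detector score, whether an analyst confirmed it).
candidates = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True), (0.70, False),
    (0.65, True), (0.50, False), (0.40, False), (0.35, True), (0.20, False),
]


def summarise(candidates, threshold, minutes_per_check=5.0):
    """Count outcomes and analyst time for one detection threshold."""
    flagged = [is_true for score, is_true in candidates if score >= threshold]
    true_detections = sum(flagged)
    total_true = sum(is_true for _, is_true in candidates)
    return {
        "threshold": threshold,
        "true_detections": true_detections,
        "false_positives": len(flagged) - true_detections,
        "missed": total_true - true_detections,
        # Assumed cost model: every flagged candidate takes the same time.
        "analyst_minutes": len(flagged) * minutes_per_check,
    }


for threshold in (0.3, 0.6, 0.9):
    print(summarise(candidates, threshold))
```

Lowering the threshold recovers more true events but adds false positives and analyst time, so which threshold is "best" depends on the criterion being optimised.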

It is likely that a mixture of automated processing and human validation will remain the standard method for acoustic data analysis for the foreseeable future. However, improved algorithms, with a higher efficiency and/or a lower false alarm rate, will reduce the amount of human effort required for data analysis. For example, Shiu et al. (2020) showed that a new algorithm for detecting the calls of North Atlantic right whales reduces the false alarm rate to the point where a human analyst can check 1 month of continuous data in 7.44 hr. However, potential shortcomings of automated detections must be kept in mind, for example the potential to miss changes in vocalisations over time (Sirvic, 2015), and for poorly understood or rare species to be missed (Shiu et al., 2020).

The MHT click train detector produced a large number of false positive sperm whale click trains in the subset, but did not miss any sperm whale events, ensuring important data were not lost during the analysis. Detector settings were heavily influenced by the background noise from the towing vessel and so thresholds should be adjusted to suit the background noise within each particular study.

Validation against the fully manual analysis of the subset showed that all automatically detected delphinid click trains occurred within manually marked up delphinid events and no delphinid events were missed. These comparisons provided a high degree of confidence that few delphinid events would be missed using the MHT click train detector. This detector provides a streamlined way to detect delphinids in large datasets. Grouping individual clicks into click trains and measuring them improves classification and with continued development, there is potential to more accurately associate clicks to individuals (Macaulay, 2020).

While the MHT click train detector does not mark up each click train perfectly, by highlighting click trains automatically, the amount of data an analyst needs to search and mark up is greatly reduced. The 66% reduction in data to search resulted in a 49% reduction in analyst time. Without reliable detectors, analysis effort is largely governed by data volume; our study shows that with them, analysis effort becomes a function of the number of cetacean encounters.

The Whistle and Moan detector (Gillespie et al., 2013) has been used in previous studies to find delphinid whistles (e.g. Erbs et al., 2017; Keating et al., 2015) and vocalisations of baleen whales (e.g. Miller et al., 2016). Gillespie et al. (2013) reported a recall of 88% (the percentage of manual detections which were detected automatically). In this study, false positive whistle detections were manually removed by the analyst. This proved to be an important and necessary step as depth sounders of passing ships and other high frequency signals, likely from telemetry transceivers, triggered the detector on 25 separate occasions. By using this detector, and only investigating triggered events, the amount of data to search was reduced by 85.9% (the analyst needed 35.7 hr to process whistle detections in the full 1,696.3-hr dataset).

For sperm whales, reduction in detection range due to prevailing noise or propagation conditions is shown by the detection function and accounted for in density estimation. For whistles, where we are unable to use target motion analysis to estimate a detection function, it will be necessary to develop alternative methods to measure detection probability.

Although the whistle classifier has been shown to perform well, with Gillespie et al. (2013) reporting correct classification rates up to 94.5%, it is likely some classifications in this study could be incorrect. There is a paucity of specific acoustic data for some species within the survey regions; thus, the data used to train the classifier may not have included all species present, nor reflect likely differences in whistles between regions within the same species (Erbs et al., 2017). To increase the confidence in species classification, region-specific recordings of as many cetacean species as possible are required to retrain the classifier. This will be particularly important for localised studies estimating abundance or investigating habitat preference (Erbs et al., 2017). When training data become available, delphinid events within this study can be reclassified, compared and corrected where appropriate.

The combined hazard rate detection function for sperm whales gave an effective strip half-width (ESHW) of 3,277 m. Other studies using similar methodologies have reported ESHWs of 4.2–10 km (Fais et al., 2015; Gordon et al., 2020; Lewis et al., 2018). The 3.3 km ESHW reported here is narrower than those reported by other studies, which is likely due to higher levels of noise emitted by the towing vessel in this study. This detection function shows that a vocally active sperm whale within 1,500 m of the track line will be detected, and so g(0) for a vocal sperm whale is equal to 1. However, sperm whales are known to have silent periods, which can be a function of social behaviour and may vary regionally (Jaquet et al., 2001; Whitehead & Weilgart, 1991), but these data are limited.

The sperm whale detection function measured during this study showed relatively low detections immediately adjacent to the track line. This is a common characteristic of sperm whale acoustic detection functions derived using data from simple two element towed arrays (Gordon et al., 2020). It has been suggested that this is a consequence of plotting in two dimensions, which uses bearings that have an unknown vertical component. This effect will be most evident for sperm whales vocalising at depth close to the track line. TMA calculates the distance to the animal from the hydrophone but a substantial component of this will be due to the animal's depth rather than its horizontal distance from the track line. Leaper et al. (1992) and Lewis et al. (2018) explored this with simple simulations and concluded that if the ‘shoulders’ of the detection function were greater than typical vocalisation depths for the target species then this effect should not lead to substantial biases. Thus, problems could arise if detection range was substantially reduced as a result of high noise levels or unfavourable propagation conditions. If there is a reason to believe that propagation conditions are markedly different in different regions, then efforts should be made to stratify detection functions by region.
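The geometric effect described above can be sketched with simple trigonometry. The sketch assumes the range returned by target motion analysis is a slant range from hydrophone to animal and that sound travels in a straight line. The animal depth used below is a hypothetical value chosen for illustration, not a measurement from this study.

```python
import math


def horizontal_distance(slant_range_m, animal_depth_m, hydrophone_depth_m=0.0):
    """Horizontal distance implied by a slant range and an assumed depth difference."""
    depth_difference = animal_depth_m - hydrophone_depth_m
    if abs(depth_difference) > slant_range_m:
        raise ValueError("depth difference cannot exceed the slant range")
    return math.sqrt(slant_range_m ** 2 - depth_difference ** 2)


assumed_depth_m = 800.0  # hypothetical vocalisation depth
for slant_range_m in (1000.0, 2000.0, 4000.0):
    horizontal_m = horizontal_distance(slant_range_m, assumed_depth_m)
    overestimate_m = slant_range_m - horizontal_m
    print(
        f"slant {slant_range_m:.0f} m -> horizontal {horizontal_m:.0f} m "
        f"(overestimate {overestimate_m:.0f} m)"
    )
```

Under these assumptions the overestimate shrinks quickly as range grows relative to depth, which is consistent with the effect being concentrated close to the track line.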

There is rather little information on detection range for delphinids during acoustic surveys. Martin et al. (2020) measured an ESHW of 367 m for dusky dolphins Lagenorhynchus obscurus, while Rankin et al. (2008) reported that the majority of dolphin detections were within a few kilometres of the array. Given the varied nature of group size and behaviour for delphinid species, and uncertainty in classifications, it is unlikely that reliable estimates of density could be obtained for delphinids using this acoustic method alone.

After an initial tuning process, automatic processing took 268 hr to run unsupervised on standard PC hardware (Intel i7 2.80 GHz second generation processor with 8 GB of RAM). A further 90.9 hr of analyst effort was then required to manually audit automatic click train and whistle detections from 1,696 hr of data. A major advantage of PAM surveys on platforms of opportunity is that low field costs can allow very large datasets to be acquired. Analysis costs then become a large proportion of the overall budget. Thus, reducing required analyst time will have a large impact on overall costs, allowing such surveys to become a routine activity during transits and producing valuable data on poorly known oceanic species. We suggest a similar approach is applicable for other applications where large bioacoustics datasets are one of the primary survey tools, for example, wild bats, insects or even in domestic animal welfare monitoring (Mcloughlin et al., 2019; Zilli et al., 2014).
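As a quick worked check of the effort figures quoted above, the following sketch expresses processing and analyst time relative to the duration of the recordings. The three input values are taken from the text. The derived ratios are simple arithmetic and are not figures reported by the authors.

```python
recording_hr = 1696.0   # duration of acoustic data (from the text)
processing_hr = 268.0   # unsupervised automatic processing time (from the text)
analyst_hr = 90.9       # manual audit of click train and whistle detections (from the text)

# Hours of recording processed per hour of computing time.
processing_speedup = recording_hr / processing_hr

# Analyst hours needed per hour of recording.
analyst_fraction = analyst_hr / recording_hr

print(f"processing ran at about {processing_speedup:.1f}x real time")
print(f"analyst effort was about {analyst_fraction:.1%} of recording duration")
```

On these figures, automatic processing ran roughly six times faster than real time and the manual audit took a little over 5% of the recording duration.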

The coverage provided by this initial survey is extremely broad, but sparse. The real value of initiatives like this will come once data have been collected for several years. Even so, interesting information on distributions in rarely surveyed areas is evident. Delphinids were detected frequently across every transit. Sperm whales were detected close to the shelf break and in oceanic waters in all transits except that in the Southern Ocean. Higher detection rates were evident off north-west Africa, South Africa, South America and Svalbard. Sperm whales were also detected near seamounts such as Vema and Filippov in the South Atlantic and the Bathymetrists Seamounts Chain in the Tropical Atlantic. These detections address key knowledge gaps on the distribution of sperm whales in poorly surveyed ocean areas, helping researchers target future survey efforts. Further survey effort would provide data which could be used in habitat models to enable a more robust comparison between regions and expand our understanding of fine-scale distribution and its drivers. The data gathered during this study are freely available through the online repository OBIS-SEAMAP (https://seamap.env.duke.edu/).

Over the past decade, more reliable and cost-effective hardware and sophisticated software have enabled non-specialist researchers to conduct bioacoustic surveys. In the marine environment, such acoustic surveys can be conducted during opportunistic transits on a variety of survey platforms using highly automated and relatively inexpensive towed hydrophone systems. The task of detecting and classifying detections in such data so that species distributions and densities can be inferred is a time-consuming process for specialist acoustic analysts. Our study provides a template for efficient analysis of such large-scale acoustic datasets, reducing the time required by specialist analysts, and ultimately the cost of any acoustic-based study.

AUTHORS' CONTRIBUTIONS

K.F.T., D.G. and J.G. conceived the ideas and designed methodology; T.L., K.F.T. and T.R. collected the data; T.W. and D.G. analysed the data; T.W. and D.G. led the writing of the manuscript. All authors contributed to the drafts and gave final approval for publication.

ACKNOWLEDGEMENTS

We thank the crew of the Arctic Sunrise for helping to collect these data during approximately 30,000 km of transits and campaign work in four oceans. In particular, we acknowledge the teamwork involved in the deployment, checking and retrieval of the hydrophone. We also thank José Antonio Vázquez Bonales and Kike Perez Gil for data collection (Amsterdam–Senegal). Logistical support was provided by Grant Oakes, David Santillo, and Paul Johnston along with other members of the Greenpeace Research Laboratories, University of Exeter and Greenpeace Operations Department, Greenpeace International. We also thank the anonymous reviewers for their comments which helped to improve the clarity of the manuscript.

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

PEER REVIEW

The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13907.

DATA AND SOFTWARE AVAILABILITY STATEMENT

Detection data are available from the OBIS-SEAMAP repository https://seamap.env.duke.edu/dataset/2155 & https://seamap.env.duke.edu/dataset/2156 (Webber et al., 2021a, 2021b). The PAMGuard software is freely available as an executable program which can be downloaded from www.pamguard.org. Source code is available at https://github.com/PAMGuard/PAMGuard. The PAMGuard settings file used in this study can be found at 10.5281/zenodo.6558962 (Webber et al., 2022).