Emerging opportunities and challenges for passive acoustics in ecological assessment and monitoring

High‐throughput environmental sensing technologies are increasingly central to global monitoring of the ecological impacts of human activities. In particular, the recent boom in passive acoustic sensors has provided efficient, noninvasive, and taxonomically broad means to study wildlife populations and communities, and monitor their responses to environmental change. However, until recently, technological costs and constraints have largely confined research in passive acoustic monitoring (PAM) to a handful of taxonomic groups (e.g., bats, cetaceans, birds), often in relatively small‐scale, proof‐of‐concept studies. The arrival of low‐cost, open‐source sensors is now rapidly expanding access to PAM technologies, making it vital to evaluate where these tools can contribute to broader efforts in ecology and biodiversity research. Here, we synthesise and critically assess the current emerging opportunities and challenges for PAM for ecological assessment and monitoring of both species populations and communities. We show that terrestrial and marine PAM applications are advancing rapidly, facilitated by emerging sensor hardware, the application of machine learning innovations to automated wildlife call identification, and work towards developing acoustic biodiversity indicators. However, the broader scope of PAM research remains constrained by limited availability of reference sound libraries and open‐source audio processing tools, especially for the tropics, and lack of clarity around the accuracy, transferability and limitations of many analytical methods. In order to improve possibilities for PAM globally, we emphasise the need for collaborative work to develop standardised survey and analysis protocols, publicly archived sound libraries, multiyear audio datasets, and a more robust theoretical and analytical framework for monitoring vocalising animal communities.


| INTRODUCTION
There is a growing need for cost-effective, scalable ecological monitoring techniques, in light of global declines in biodiversity (Cardinale et al., 2012). Alongside addressing fundamental ecological questions, survey and monitoring data are essential in evaluating trends and drivers of population change, informing conservation planning and efficacy assessment, and addressing biodiversity policy commitments (Honrado, Pereira, & Guisan, 2016). Traditional survey methods (e.g., manual counts, trapping) are limited by being resource intensive and invasive, but are now complemented by a suite of high-throughput sensing technologies including satellite sensing, LIDAR, and camera traps. Passive acoustic sensors have become an increasingly important component of this survey toolbox. Many animals emit acoustic signals that encode information about their presence and activities (Bradbury & Vehrencamp, 1998). Sound is also an important feature of the sensory environment, and anthropogenic acoustic phenomena are a critical yet understudied dimension of global change (e.g., Buxton et al., 2017).
Opportunities to acoustically survey wildlife and environments have historically been limited by technological costs and constraints, but this situation is fast improving. For example, the recently released AudioMoth low-cost sensor has seen broad uptake for study objectives ranging from population ecology to anthropogenic activity (Hill et al., 2018). Such initiatives now enable deployment of multisensor networks at scale, involving both experts and volunteers (Newson, Evans, & Gillings, 2015). Passive acoustic monitoring (PAM) is thus increasingly suited to objectives-driven survey and monitoring programmes, whose protocols must be standardisable, scalable, and financially sustainable (Honrado et al., 2016).
However, the resulting massive audio datasets still present formidable logistical and analytical difficulties, and it remains unclear how effectively current PAM methodologies, which have mostly been developed in small-scale, taxonomically focused contexts (mostly bats and cetaceans), can translate to the broader challenges of acoustic biodiversity monitoring. In this review, we synthesise current research to highlight emerging opportunities and critical knowledge gaps. We discuss current applications of PAM technologies, identify challenges and research priorities at each stage of the PAM pipeline ( Figure 1), and lastly discuss significant emerging trends for PAM in ecological research.

| PASSIVE ACOUSTICS APPLICATIONS IN ECOLOGY
Many animals actively produce sound for communication, and echolocating species also emit sounds for navigation and prey search (Bradbury & Vehrencamp, 1998). Vocalising animals thus leak information into their surroundings regarding their presence, behaviour, and interactions in space and time (Kershenbaum et al., 2014). Long-established acoustic survey methods, for example, bird or amphibian point counts, typically involve experienced surveyors identifying species in the field (Gregory, Gibbons, & Donald, 2004). In contrast, PAM involves recording sound using passive acoustic sensors (recorders, ultrasound detectors, microphones and/or hydrophones; henceforth "acoustic sensors") (Blumstein et al., 2011) and subsequently deriving relevant data from audio (e.g., species detections, environmental sound metrics) (Bittle & Duncan, 2013; Digby, Towsey, Bell, & Teal, 2013; Merchant et al., 2015) (Figure 1). Passive acoustics approaches have long been applied to studying visually cryptic animals such as cetaceans and echolocating bats (Nowacek, Christiansen, Bejder, Goldbogen, & Friedlaender, 2016; Walters et al., 2013), but in recent years their scope has expanded with the arrival of purpose-designed acoustic sensors. These are noninvasive, autonomous, usually omni-directional (sampling a three-dimensional sphere around the sensor), and offer the advantage of a larger detection area and fewer taxonomic restrictions than camera traps (which are usually limited to detecting larger birds and mammals at close range) (Lucas, Moorcroft, Freeman, Rowcliffe, & Jones, 2015).

FIGURE 1 A typical passive acoustic monitoring workflow
Species detections derived from PAM are analogous to other forms of survey data, with applications ranging from species occupancy estimation to biodiversity assessment (detailed in Table 1). Their benefits over traditional surveys include continuous surveying for long periods with low manual effort, and the associated higher likelihood of detecting rarer or less vocally active species (Klingbeil & Willig, 2015). Standardised post hoc analysis also avoids the skill level biases in species identification that often impact citizen science data (Isaac, van Strien, August, de Zeeuw, & Roy, 2014). Conversely, current limitations of PAM data include their unsuitability for studying nonacoustic species, and the inability to identify individual calling animals for most taxa (in contrast to visual recognition or mark-recapture).

| Passive acoustic sensor hardware
In contrast to early PAM studies that repurposed field recorders (Riede, 1993) or naval or seismological equipment (Sousa-Lima, Fernandes, Norris, & Oswald, 2013), commercial acoustic sensors are now comparable to camera traps in durability and user-accessibility (Figure 1a). Improved battery life and storage, on-board metadata collection and programmable schedules allow for extended autonomous deployments with flexible sampling regimes (Aide et al., 2013; Baumgartner et al., 2013). However, hardware costs have limited scalability, with ubiquitous models such as Wildlife Acoustics Song Meters often substantially more expensive than equivalent-spec camera traps. When synchronous multisensor surveys are unnecessary, one common solution is repeated redeployment of a handful of sensors; for example, the Norfolk Bat Survey loans out ultrasonic detectors to hundreds of volunteers (Newson et al., 2015).
Looking forward, emerging open-source, microcomputer-based sensors are significantly cheaper than commercial alternatives (Sethi, Ewers, Jones, Orme, & Picinali, 2018;Whytock & Christie, 2017). For instance, the AudioMoth can be mass-produced to reduce unit cost to around US$30 (Hill et al., 2018), thereby drastically lowering the initial financial barriers to large multisensor surveys, although maintenance costs (e.g., regular replacement of batteries and SD cards) may substantially increase in larger projects. Furthermore, in some cases the use of inexpensive components (e.g., microelectromechanical systems (MEMS) microphones) might involve tradeoffs between sensor cost and data quality, for example, if these show inconsistent frequency response, lower signal-to-noise ratios, or are vulnerable to environmental damage. A critical open question concerns how much data quality can be sacrificed without compromising the ability to derive sufficient information from audio (i.e., accurate species identification) (e.g., Figure 2). Addressing this question requires comparative analyses of data collected simultaneously with different sensor models (Adams, Jantzen, Hamilton, & Fenton, 2012), and the answer may vary taxonomically since certain species are intrinsically harder to distinguish acoustically than others (see below in "Automated sound identification") (Kershenbaum et al., 2014).

| Survey design and data standardisation
Understanding the comparability of audio data collected using different sensor models and sampling protocols, across different environments, is an ongoing challenge (Figure 1b).

TABLE 1 (partial: only the rows below survive in this text; column labels are inferred from the row content)

[Data type and description not recovered]
Applications: Measuring spatiotemporal trends in acoustic indices as proxies for community diversity, e.g., relationship between indices and habitat, or community vocalising phenology (Nedelec et al., 2015; Sueur et al., 2014)
Limitations: Relationships between index values and community diversity poorly understood. Indices are strongly sensitive to variation in nonbiotic sound (e.g., from anthropogenic sources)

Environmental sound
Description: Metrics of sound pressure and spectral density; also acoustic indices (e.g., NDSI)
Applications: Measuring the acoustic environment (e.g., anthropogenic sound) and relationships with wildlife abundance and behaviour (Pirotta et al., 2015)
Limitations: More complex metrics (i.e., acoustic indices) may be sensitive to variation in different sound types (e.g., weather)

Intraspecific individual identification
Description: Detection counts, identified to individual by differences in call structures or repertoire
Applications: Study of individual call repertoires, social behaviour, or facilitating density estimation, e.g., in birds and cetaceans (King et al., 2013; Petrusková et al., 2016)
Limitations: Currently not possible for most species, due to limited reference data and/or poor knowledge of individual variation in calls/repertoire

Sound waves attenuate as they travel through the environment, until at a certain distance from the caller they are no longer detectable above ambient background noise. This distance varies depending on the sound's amplitude and frequency (higher frequencies attenuate more rapidly), the environmental medium (sound velocity in seawater is over four times greater than in air), the caller's position relative to the sensor (e.g., differences in depth underwater) and environmental features such as vegetation, topography, bathymetry, temperature, and pressure (Farcas, Thompson, & Merchant, 2016) (Supporting Information Appendix S1). Sounds can also be masked by nontarget sound, from anthropogenic sources as well as other vocalising animals. The effective sampling area around an acoustic sensor therefore varies among species and call types, and across space and time (Figure 3). If unaccounted for, any resulting detection biases (e.g., towards animals that call at higher amplitudes and/or lower frequencies) may cause biased population or diversity estimates.
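The dependence of detection distance on call amplitude, attenuation, and background noise can be illustrated with a deliberately simplified sketch. It assumes spherical spreading plus a constant absorption coefficient, and ignores vegetation, topography, and masking; the function names and all numerical values are hypothetical, chosen only to show the direction of the effect.

```python
import math

def transmission_loss_db(distance_m, absorption_db_per_m):
    """Spherical spreading (20*log10(r), relative to 1 m) plus linear absorption."""
    if distance_m < 1.0:
        return 0.0
    return 20.0 * math.log10(distance_m) + absorption_db_per_m * distance_m

def detection_range_m(source_level_db, noise_level_db, absorption_db_per_m,
                      min_snr_db=0.0, max_range_m=10000.0, step_m=1.0):
    """Largest distance at which the received level still exceeds noise + min_snr_db."""
    r, last_ok = 1.0, 0.0
    while r <= max_range_m:
        received = source_level_db - transmission_loss_db(r, absorption_db_per_m)
        if received - noise_level_db < min_snr_db:
            break  # received level only falls with distance, so stop here
        last_ok = r
        r += step_m
    return last_ok

# Hypothetical values: the same call amplitude, with weaker and stronger
# absorption standing in for a lower- and a higher-frequency call.
low_freq_range = detection_range_m(90.0, 40.0, absorption_db_per_m=0.01)
high_freq_range = detection_range_m(90.0, 40.0, absorption_db_per_m=0.1)
louder_range = detection_range_m(100.0, 40.0, absorption_db_per_m=0.01)
```

Under these assumptions the more strongly absorbed call is detectable over a shorter distance and the louder call over a longer one, which is the source of the detection bias described above.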
Although previously often overlooked in the PAM literature, there are now increasing efforts to systematically quantify sources of bias and improve survey standardisation. These include sensor calibration guidelines, metadata standards (Roch et al., 2016), and assessments of the efficacy of sampling designs. Ultimately, these efforts should facilitate more robust, data-driven approaches to analysing large, multisensor acoustic datasets, which currently tend to assume constant species detectability over space and time (e.g., Davis et al., 2017; Newson et al., 2015).

| Trade-offs in audio recording and data storage
During digital sound recording, incoming sound waves are transduced into an electrical signal that is recorded at a specified sampling rate (in Hz) and bit-depth (number of bits per sample). These parameters determine a recording's frequency (pitch) and amplitude (volume) resolution, with much higher sampling rates required to resolve ultrasonic frequencies (those above human hearing range; >20,000 Hz) compared to audible range frequencies (20-20,000 Hz) (Supporting Information Appendix S1). The conventional sampling rate for audible sound (44.1 kHz) produces relatively manageable file sizes (c. 5 MB per minute in 16-bit mono), but recording full-spectrum ultrasound in bat and cetacean surveys (sampling rates often >200 kHz) produces very large files, resulting in a trade-off between data quality and storage capacity. Some ultrasound detectors use less data-intensive recording methods based on frequency division, which divide the incoming signal frequency by a specified factor; their lower storage requirements may suit extended or remote deployments, provided sufficient information can be derived from the data (e.g., Jaramillo- [...] behaviours (Adams et al., 2012; Walters et al., 2013). In future, these analytical tools may become less sensitive to recording method, but currently ensuring minimal information loss during recording and storage (Supporting Information Appendix S1) both facilitates species identification (Walters et al., 2012) and futureproofs the data by allowing for later reanalysis with improved tools.
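The storage trade-off follows from simple arithmetic, sketched below for uncompressed PCM audio. The helper names are hypothetical, the 250 kHz setting is an illustrative ultrasonic sampling rate rather than a recommendation, and the Nyquist criterion (sampling at no less than twice the highest frequency of interest) is stated here as background.

```python
def min_sampling_rate_hz(max_frequency_hz):
    """Nyquist criterion: sample at no less than twice the highest frequency of interest."""
    return 2 * max_frequency_hz

def uncompressed_size_mb(sampling_rate_hz, bit_depth, duration_s, channels=1):
    """Size of raw PCM audio in megabytes (10**6 bytes), ignoring file headers."""
    bytes_total = sampling_rate_hz * (bit_depth // 8) * channels * duration_s
    return bytes_total / 1e6

audible_mb = uncompressed_size_mb(44_100, 16, 60)      # 5.292 MB per minute
ultrasonic_mb = uncompressed_size_mb(250_000, 16, 60)  # 30.0 MB per minute
nights_per_32gb = 32_000 / (ultrasonic_mb * 60 * 10)   # continuous 10-hour nights
```

At the illustrative ultrasonic setting, a 32 GB card would hold fewer than two continuous 10-hour nights, which is why triggered recording or frequency division is often considered for long deployments.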
Crucially, recording and storing audio at sufficient quality (Figure 1c), alongside detailed metadata on surveys, sensor type and recording parameters, also provides opportunities to address additional questions. For example, a recent study collated multiyear hydrophone data to estimate the distribution of the critically endangered North Atlantic right whale Eubalaena glacialis (Davis et al., 2017). Leveraging decades of PAM survey data will require collaborative development and maintenance of web infrastructure for the collation and public archiving of massive (multi-gigabyte to petabyte) environmental audio datasets (e.g., https://ngdc.noaa.gov/mgg/pad/). Another possible solution to data capacity issues could be to reduce the amount of audio that is stored, for example, by applying on-board thresholds or algorithms that only trigger recording when potential sounds of interest are present (Baumgartner et al., 2013; Hill et al., 2018). Discarding audio data is scientifically undesirable, but some degree of prior filtering can prevent datasets becoming unmanageably large, and combined with wireless data transmission (Aide et al., 2013) could facilitate real-time ecological monitoring and reporting.

| DETECTING AND CLASSIFYING ACOUSTIC SIGNALS WITHIN AUDIO DATASETS
For studies focusing on specific species or taxonomic groups, target sounds must be identified from recordings (Aide et al., 2013;Salamon & Bello, 2015), which requires pipelines to process sound files and metadata and output useful annotations (e.g., calling animal species, location, precise date/time) (Figure 1d,e).
Conducted manually, this process is time-consuming and subjective, and it is difficult to quantify biases related to analyst knowledge level, which may be particularly problematic in resource-limited conservation settings (Kalan et al., 2015).

| Developing a pipeline for automated sound identification
A pipeline for automatically identifying target sounds within audio recordings (hereafter referred to as "automated sound identification" or "auto-ID") involves several stages ( Figure 4). Audio waveforms are commonly preprocessed to recover frequency information and produce a time-frequency-amplitude representation (spectrogram) (Figure 4a,b), usually via Fourier analysis or similar techniques (Supporting Information Appendix S1).
Relevant sounds must first be detected, that is, located in time within the recording (a task sometimes alternatively termed "segmentation") (Figure 4c), using methods ranging in complexity from simple thresholding to complex statistical models (Table 2). Detected sounds must then be classified, typically using models which return the estimated likelihood that a sound belongs to its assigned category (Table 2).
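The first two stages, spectrogram computation and detection by simple thresholding, can be sketched using only the Python standard library. This is a toy illustration on a synthetic signal, not a production detector: the function names, frame length, band limits, and threshold are all assumptions, and the naive DFT would be replaced by an FFT library in practice.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive DFT of Hann-windowed frames.
    Returns a list of frames, each holding frame_len // 2 + 1 magnitudes."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def detect_events(spec, band, threshold):
    """Return (first_frame, last_frame) pairs where the summed magnitude within
    the frequency-bin range band=(lo, hi) exceeds the threshold."""
    lo, hi = band
    events, start = [], None
    for i, mags in enumerate(spec):
        active = sum(mags[lo:hi + 1]) > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(spec) - 1))
    return events

# Synthetic test signal: silence, a 0.1 s tone at 2 kHz, then silence (fs = 8 kHz).
fs = 8000
tone = [math.sin(2 * math.pi * 2000 * n / fs) for n in range(800)]
signal = [0.0] * 1600 + tone + [0.0] * 1600
spec = spectrogram(signal)
# Each bin spans fs / frame_len = 125 Hz, so 2 kHz falls in bin 16.
events = detect_events(spec, band=(14, 18), threshold=5.0)
```

Each returned frame index can be converted to time as index * hop / fs. A fixed threshold of this kind is exactly the sort of detector that degrades when background noise differs from the conditions it was tuned on.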
Although methods are fast improving, poor or variable accuracy of auto-ID tools remains a major issue. In particular, the detection stage presents formidable difficulties (Stowell et al., 2016). The often substantially poorer performance of detection and classification algorithms on target audio recorded in novel contexts (e.g., different sensor models or more background noise than the training data) is a critical emerging problem as data collection capacities continue to grow (Stowell et al., 2018).
In ecology, auto-ID tools are commonly developed for study-specific objectives and trained on data representative of the

| Emerging innovations in sound identification
Looking forward, several emerging methods are substantially improving detection and classification accuracies by learning representations from spectrogram data, such as unsupervised feature extraction (Salamon & Bello, 2015;Stowell & Plumbley, 2014) and dynamic time warping based feature representations (Stathopoulos, Zamora-Gutierrez, Jones, & Girolami, 2017). Deep convolutional neural networks (CNNs) are particularly promising, since these can learn discriminating spectro-temporal information directly from annotated spectrograms (bypassing a separate feature extraction stage), improving their robustness to sound overlap and caller distance (Goeau et al., 2016) (Figure 4d). In recent tests, CNNs have markedly outperformed alternative methods on detection and classification of biotic and anthropogenic sounds in urban recordings (Fairbrass et al., 2018;Salamon & Bello, 2016) and animal calls in noisy monitoring datasets (Goeau et al., 2016;Mac Aodha et al., 2018;Marinexplore, 2013). Their performance in more complex tasks that involve distinguishing multiple overlapping vocalisations (e.g., songs in the dawn chorus) has not yet been tested, although their success in similarly challenging computer vision and individual human voice recognition tasks is a promising sign (e.g., Lukic, Vogt, Dürr, & Stadelmann, 2016). However, currently such applications in ecology are constrained by CNN sensitivity to overfitting to training data, and the consequent requirement for very large training datasets that represent natural variability in species call repertoires, background sound, and caller distance (Krause et al., 2016;Russakovsky et al., 2015). 
Although more accessible for image or voice classification (e.g., using online images or audio) (Krause et al., 2016), very few such datasets exist for environmental sound, since the practical difficulty of reference data collection means that verified wildlife call libraries, when available, are typically small in size and lack variability in call type, recording quality, and acoustic environment. Some studies have partially addressed this issue by augmenting training data with background noise to simulate different distances and acoustic environments (Salamon & Bello, 2016). Online data labelling projects such as Bat Detective (www.batdetective.org) and Snapshot Serengeti
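The noise augmentation idea mentioned above can be sketched as mixing a clean call with a background clip at a chosen signal-to-noise ratio (SNR). This is a generic illustration, not the procedure of any cited study: the function names are hypothetical, the signals are synthetic, and a lower SNR is used only as a crude stand-in for greater caller distance.

```python
import math
import random

def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(call, noise, snr_db):
    """Add noise to call, scaled so that the call-to-noise RMS ratio equals snr_db.
    Assumes the noise clip is not silent (non-zero RMS)."""
    if len(noise) < len(call):
        raise ValueError("noise clip must be at least as long as the call clip")
    noise = noise[:len(call)]
    gain = rms(call) / (rms(noise) * 10 ** (snr_db / 20.0))
    return [c + gain * n for c, n in zip(call, noise)]

# Synthetic stand-ins for a reference call and a field background recording.
random.seed(0)
call = [math.sin(2 * math.pi * 0.05 * n) for n in range(1000)]
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]

# One clean call becomes several training examples at decreasing SNR.
augmented = {snr: mix_at_snr(call, noise, snr) for snr in (20, 10, 0)}
```

Augmentation of this kind increases variability in background sound but cannot add variability in call type, so it complements rather than replaces larger verified call libraries.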

| Sound libraries and training data: identifying and filling the gaps
Perhaps the most fundamental knowledge gap for PAM is the limited availability of comprehensive, expert-verified species call databases for reference and training data. Much remains unknown about the intra-and interspecific call diversity of even well-studied taxa (Kershenbaum et al., 2014), and ground-truthed call databases are difficult and laborious to assemble, requiring the collection of high-quality audio recordings of animals identified to species either visually or through capture (e.g., Zamora-Gutierrez et al., 2016).
Where such verified datasets exist they are biased towards vertebrates (particularly cetaceans, bats, and birds), with especially scarce resources for anurans and invertebrates (Lehmann, Frommolt, Lehmann, & Riede, 2014;Penone et al., 2013) and regions outside Europe and North America, despite the urgent need for tools to facilitate monitoring of subtropical and tropical habitats (Zamora-Gutierrez et al., 2016). These gaps translate into equivalent biases in classifier availability, and to our knowledge no widely available tools exist for distinguishing intraspecific acoustic behaviours (e.g., social from echolocation calls in cetaceans and bats) (Figure 4e), although machine learning methods have successfully been applied to analysis of bat acoustic social behaviour (Prat et al., 2016).
Filling these data gaps is a priority for the entire PAM community, which would strongly benefit from collaborative efforts to collect verified call data for neglected taxa and regions (e.g., tropical terrestrial biomes). Additionally, the establishment of centralised sound libraries with consensus data and metadata standards (e.g., date/time of recording, geographic location, recording parameters, sensor position) (Roch et al., 2016), would improve the accessibility and comparability of reference [...] (Mellinger & Clark, 2006; Sayigh et al., 2016).

| Inferring population information from acoustic data
Following processing, a typical sound identification pipeline outputs a spatially and temporally explicit record of species call detections (Figure 1e). Population inference from PAM-derived species occurrence or count data presents its own difficulties, since acoustic surveys involve multiple sources of detection uncertainty. The first is imperfect detectability: the probability of successfully detecting a vocalising animal depends on its distance from the sensor, vocalising behaviour, call parameters, and site-specific environmental factors (Darras et al., 2016; Kéry & Schmidt, 2008). The second issue is that species vocalisations recorded in close spatial or temporal proximity are statistically nonindependent since they may come from the same individual (Lucas et al., 2015); for example, detection rates may be artificially inflated by individual animals vocalising close to a sensor for long periods. However, acoustic identification of individuals is currently not possible for most taxa, and where possible (e.g., for some birds, primates, cetaceans, and wolves) usually requires extensive manual analysis (e.g., Clink, Bernard, Crofoot, & Marshall, 2017; Petrusková, Pišvejcová, Kinštová, Brinke, & Petrusek, 2016; Root-Gutteridge et al., 2014). Furthermore, many vocalising animals produce multicall sequences (e.g., birdsong phrases, echolocation passes) which must be merged into discrete detections (Jaramillo-Legorreta et al., 2016; Newson et al., 2015). The third major source of uncertainty relates to errors in automated sound identification (Figure 4) (Digby et al., 2013). Predicted detections and classifications below a suitable confidence threshold can be removed prior to modelling; however, site-specific differences in false-positive and false-negative rates (e.g., due to environmental noise) may still impact model estimates.
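Two of the preprocessing steps described above, removing low-confidence classifications and merging multicall sequences into discrete detections, can be sketched as follows. The record structure, the 0.8 confidence threshold, and the 2 s maximum gap are hypothetical choices that would need tuning for each species and classifier.

```python
from typing import List, NamedTuple, Tuple

class Detection(NamedTuple):
    time_s: float      # call onset, in seconds from the start of the recording
    species: str
    confidence: float  # classifier score in [0, 1]

def merge_into_events(detections: List[Detection], min_confidence: float = 0.8,
                      max_gap_s: float = 2.0) -> List[Tuple[str, float, float, int]]:
    """Drop low-confidence calls, then group calls of the same species that are
    separated by at most max_gap_s into events: (species, start_s, end_s, n_calls)."""
    kept = sorted((d for d in detections if d.confidence >= min_confidence),
                  key=lambda d: (d.species, d.time_s))
    events: List[Tuple[str, float, float, int]] = []
    for d in kept:
        if (events and events[-1][0] == d.species
                and d.time_s - events[-1][2] <= max_gap_s):
            species, start, _, n = events[-1]
            events[-1] = (species, start, d.time_s, n + 1)  # extend the open event
        else:
            events.append((d.species, d.time_s, d.time_s, 1))
    return events

calls = [
    Detection(0.0, "A", 0.90), Detection(0.5, "A", 0.95), Detection(1.2, "A", 0.85),
    Detection(10.0, "A", 0.90), Detection(10.5, "A", 0.30),  # last one is below threshold
    Detection(0.7, "B", 0.99),
]
merged = merge_into_events(calls)
```

Merging reduces, but does not remove, the nonindependence problem: two events from the same sensor may still come from the same individual.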
Statistical analyses (Figure 1f) must account for these uncertainties. For example, patch occupancy models are useful tools for spatially explicit distribution modelling with PAM-derived data, since these incorporate detection probability parameters that can be estimated from repeat surveys (e.g., Campos-Cerqueira & Aide, 2016; Kalan et al., 2015). Also, the emergence of more accessible and less computationally expensive Bayesian inference methods for complex hierarchical and occupancy models is increasingly enabling multiple sources of uncertainty to be incorporated into spatiotemporal models (e.g., Isaac et al., 2014;Ruiz-Gutierrez, Hooten, & Campbell Grant, 2016). Such frameworks can be extended to include, for example, the confidence associated with automated call detections and classifications (Banner et al., 2018).
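How repeat surveys allow detection probability to be separated from occupancy can be shown with a minimal single-season occupancy likelihood. The sketch assumes constant occupancy (psi) and detection probability (p) across sites and visits, uses a coarse grid search in place of a proper optimiser or Bayesian sampler, and runs on invented detection histories.

```python
import math

def log_likelihood(psi, p, histories):
    """Log-likelihood of a constant-psi, constant-p occupancy model.
    histories: one list of 0/1 detections per site, one entry per repeat survey."""
    total = 0.0
    for h in histories:
        k, d = len(h), sum(h)
        occupied = psi * (p ** d) * ((1 - p) ** (k - d))
        # A site with no detections is either occupied but missed, or unoccupied.
        site_lik = occupied if d > 0 else occupied + (1 - psi)
        total += math.log(site_lik)
    return total

def fit_grid(histories, steps=99):
    """Grid-search maximum-likelihood estimates of (psi, p) on 0.01 .. 0.99."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(((psi, p) for psi in grid for p in grid),
               key=lambda t: log_likelihood(t[0], t[1], histories))

# Invented data: 10 sites, 3 survey nights each; the species is detected at 4 sites.
histories = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]] + [[0, 0, 0]] * 6
naive_occupancy = sum(1 for h in histories if sum(h) > 0) / len(histories)
psi_hat, p_hat = fit_grid(histories)
```

Because detection is imperfect in these data, the estimated occupancy exceeds the naive proportion of sites with at least one detection. Hierarchical extensions add covariates and further error sources such as classifier false positives.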
A core application of ecological survey data is abundance and population trend estimation. Abundance estimation from PAM count data is difficult due to the lack of a simple relationship between call counts and animal density; the last decade has seen a growing toolbox of methods to address this issue (reviewed in Marques et al., 2013). Spatially explicit capture-recapture models (across multisensor arrays and networks) (Stevenson et al., 2015) and other methods that adjust detected call density by the average calling rate of the target species (Thompson, Schwager, & Payne, 2010; Ward et al., 2012) have been shown to provide accurate density estimates when validated against nonacoustic methods.
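The logic of adjusting detected call density by average calling rate can be written as a one-line estimator. The sketch below is one simple cue-based form under strong assumptions (circular detection areas of known radius, a single average detection probability, and a known false-positive proportion); the formulations used in the cited studies may differ, and all parameter values are invented.

```python
import math

def cue_density_per_km2(n_calls, false_positive_rate, detection_radius_km,
                        detection_prob, n_sensors, hours, calls_per_animal_hour):
    """Animals per km^2: valid calls per unit of monitored area and time, divided
    by detection probability and by the mean individual calling rate."""
    area_km2 = n_sensors * math.pi * detection_radius_km ** 2
    valid_calls = n_calls * (1.0 - false_positive_rate)
    return valid_calls / (area_km2 * detection_prob * hours * calls_per_animal_hour)

# Invented survey: 10 sensors, 100 h each, 1000 classified calls in total.
density = cue_density_per_km2(
    n_calls=1000, false_positive_rate=0.1, detection_radius_km=0.1,
    detection_prob=0.5, n_sensors=10, hours=100, calls_per_animal_hour=6.0)
```

Every term in the denominator is a species-specific or site-specific quantity that must itself be estimated, which is why such methods are data-intensive.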
Another recent study developed a generalised extension of a random encounter model (REM) originally designed for camera trap data (Lucas et al., 2015). However, these methods are often data-intensive, requiring the deployment and retrieval of multisensor networks and the estimation of species-specific parameters such as detection distances and average call rates (Lucas et al., 2015). In cetacean studies, call rates are often estimated by tagging animals with acoustic loggers (Johnson & Tyack, 2003), but in terrestrial realms these remain too large to ethically deploy on many species.
Estimation of true abundance may be best suited to well-resourced projects with clear, species-focused objectives, rather than broader scope ecological monitoring.
Informed indices of abundance may suffice where these more complex analytical methods are unfeasible. Detection counts within specified sampling periods are often used as proxies for relative density or activity, such as nightly bat detections (Newson et al., 2015) or temporally aggregated click rates in cetacean surveys (Jaramillo-Legorreta et al., 2016). Such approaches generally assume consistent detection between individuals and over time, even though the relationship between detection rates and relative abundance may vary widely between species and habitats (Marques et al., 2013). However, with careful survey design and replication, these issues may be less problematic for estimation of broad-scale activity or occupancy trends.

| Acoustic ecological community and biodiversity assessment
Moving beyond a species focus and towards deriving community information (e.g., species diversity) from PAM data presents the challenge of classifying calls from multiple, or ideally all, vocalising species. For most taxa and geographical regions this is currently either impossible or extremely time-consuming due to the lack of reference data and auto-ID tools, which emphasises the need for acoustic biodiversity indicators (Figure 1g) to facilitate surveys of data-deficient (often highly biodiverse) regions (Harris, Shears, & Radford, 2016). Monitoring proposed indicator taxa such as bats or Orthoptera offers one potential solution (Fischer, Schulz, Schubert, Knapp, & Schmoger, 1997; Jones et al., 2013) but their usefulness as ecological indicators is not clearly established. Recent years have therefore seen the development of soundscape-based methods that seek to infer community information from a habitat's global sound dynamics (Pijanowski, Farina, Gage, Dumyahn, & Krause, 2011) (Figure 5). Under the theme of ecoacoustics, various summary indices have been designed to facilitate comparison of biotic sound between sites and over time (reviewed in Sueur et al., 2014). Most involve calculation of power ratios between multiple frequency and/or time bins across a recording, and thus are essentially more complex extensions of conventional sound pressure and spectral density metrics (Kasten, Gage, Fox, & Joo, 2012; Merchant et al., 2015; Pieretti, Farina, & Morri, 2011; Sueur et al., 2008) (Figure 5). Acoustic indices are derived from the theory that competition for acoustic space between sympatric signalling animals drives the evolution of signal divergence (acoustic niche partitioning), and therefore that the spectro-temporal diversity of biotic sound in a habitat correlates with vocalising species diversity (Pijanowski et al., 2011; Sueur et al., 2008).
For example, acoustic entropy and dissimilarity indices are designed as acoustic analogues of classical α- and β-diversity indices (Sueur et al., 2008).
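Two such indices can be sketched directly from a power spectrum: a band power ratio in the style of the Normalised Difference Soundscape Index (NDSI), and a spectral entropy. The band limits used here (1-2 kHz for anthropogenic sound, 2-8 kHz for biotic sound) are commonly quoted defaults and are treated as assumptions; the entropy is only the spectral component of an acoustic entropy index, and the input is a synthetic two-tone signal.

```python
import cmath
import math

def power_spectrum(samples, fs):
    """Naive DFT power spectrum; returns (frequencies_hz, powers) up to Nyquist."""
    n = len(samples)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        powers.append(abs(acc) ** 2)
    return freqs, powers

def band_power(freqs, powers, lo_hz, hi_hz):
    """Total power in the half-open frequency band [lo_hz, hi_hz)."""
    return sum(p for f, p in zip(freqs, powers) if lo_hz <= f < hi_hz)

def ndsi(freqs, powers, anthro=(1000, 2000), bio=(2000, 8000)):
    """(biotic - anthropogenic) / (biotic + anthropogenic) band power, in [-1, 1]."""
    a = band_power(freqs, powers, *anthro)
    b = band_power(freqs, powers, *bio)
    return (b - a) / (b + a)

def spectral_entropy(powers):
    """Shannon entropy of the normalised spectrum, scaled to [0, 1]."""
    total = sum(powers)
    probs = [p / total for p in powers if p > 0]
    return -sum(q * math.log(q) for q in probs) / math.log(len(powers))

# Synthetic soundscape: a 4 kHz "biotic" tone plus a quieter 1.5 kHz "anthropogenic" tone.
fs, n = 16000, 512
samples = [math.sin(2 * math.pi * 4000 * t / fs)
           + 0.5 * math.sin(2 * math.pi * 1500 * t / fs) for t in range(n)]
freqs, powers = power_spectrum(samples, fs)
index_value = ndsi(freqs, powers)       # 0.6 for this signal
entropy = spectral_entropy(powers)      # low: energy sits in two narrow peaks
```

Both values depend only on how energy is distributed across frequency, so any loud nonbiotic sound within the "biotic" band would shift them just as a calling animal would.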
FIGURE 5 Indices of biotic and environmental sound. Conventional metrics such as power spectral density (ai) can measure the acoustic environment. Ecoacoustic indices range from simple power ratios across broad frequency bands (e.g., Normalised Difference Soundscape Index; aii) to finer-band spectral/temporal diversity and entropy (aiii). Their practical applications are limited by poor understanding of the relationships between the diversity of recorded biotic sound, the diversity of vocalising species, and wider community diversity (b)

More fundamentally, the theorised link between community and biotic sound diversity remains controversial. The acoustic niche partitioning hypothesis that underpins acoustic indices has rarely been empirically tested, and the sensory, environmental and evolutionary processes that structure vocalising animal communities are poorly understood (Tobias et al., 2014).
It remains unclear if and how landscape-scale biotic sound diversity relates to either vocalising species diversity or wider community diversity, and how this relationship varies taxonomically, geographically, and between terrestrial and marine realms (Figure 5b) (Gasc et al., 2013; Harris et al., 2016; Sueur et al., 2014). Despite this lack of clarity, tools for calculation of acoustic indices are increasingly accessible in bioacoustic software packages. As with auto-ID software, their outputs should be treated critically, with index values at a minimum ground-truthed against expert-labelled audio subsets and/or other forms of survey data (e.g., Harris et al., 2016; Sueur et al., 2008). If these practical and theoretical problems can be resolved, acoustic community analyses promise to be one of PAM's unique ecological applications, with potential to offer rich local biodiversity information to complement landscape data from satellite and aerial LIDAR sensing (Bush et al., 2017).
For now, leveraging these opportunities will likely require the use of acoustic indices or similar proxies. Ongoing work to improve these prospects could include systematic evaluation of the performance of indices across taxa and habitats (including tests in well-characterised, low-diversity communities), alongside fundamental research into the structure and evolution of acoustic communities (Farina & James, 2016).
Looking forward, newer machine learning methods may offer alternative means to tackle the problem of soundscape monitoring.
For instance, a recent study used CNNs to separate and quantify biotic and anthropogenic sound in urban audio, thereby explicitly bypassing the issue of background noise sensitivity (although their transferability to different cities or environments remains unknown) (Fairbrass et al., 2018). Another promising avenue involves unsuper-

| EMERGING AND FUTURE OPPORTUNITIES FOR PASSIVE ACOUSTICS
Finally, we outline some major emerging opportunities, as PAM moves beyond proof-of-concept studies towards applications in management and conservation. Until recently, outcomes-driven acoustic monitoring projects have mostly occurred where PAM is either the only feasible approach, or provides clear advantages over other methods despite higher costs (i.e., bat and cetacean surveys, and field bioacoustics studies). However, low-cost sensors have pushed the bottlenecks into the analysis and management stages, and as we have emphasised, addressing these logistical and analytical barriers now increasingly requires collaborative, community-led efforts. Marine research remains a source of key innovations, including auto-ID software development (Baumgartner & Mussoline, 2011; Gillespie et al., 2009), acoustic sensor tags (Johnson & Tyack, 2003), density estimation methods (Marques et al., 2013), real-time reporting (Baumgartner et al., 2013; http://dcs.whoi.edu/), and collation of multisource datasets (Davis et al., 2017). Increased integration between marine and terrestrial PAM communities would be beneficial to jointly addressing pressing challenges, such as stand- [...]

Currently, we are seeing the arrival of massive acoustic datasets collected across research networks and citizen science programmes (Table 1).
As auto-ID tools and wireless data transmission improve, the increasing scope of these datasets could facilitate, for example, the tracking of range shifts under climate change (Davis et al., 2017), long-term studies of population ecology and habitat use, year-on-year tracking of population trends (Jaramillo-Legorreta et al., 2016), conservation planning and efficacy assessment (Astaras et al., 2017; Border et al., 2017), behaviour and phenology studies in taxa beyond birds and cetaceans (Nedelec et al., 2015), as well as monitoring of species of concern as ecosystem services providers (e.g., pollinators), pests, invasive species or public health threats (Mukundarajan, Hol, Castillo, Newby, & Prakash, 2017).
Looking further forward, emerging networked sensors and on-board analysis pipelines raise the possibility of using PAM-derived data for real-time monitoring and adaptive management (Table 1). Detections derived from sensor networks can provide highly spatially and temporally detailed data on wildlife activity (e.g., London's Nature-Smart Cities bat monitoring network: https://naturesmartcities.com). Real-time data feeds could, for instance, be applied to adjust urban lighting regimes to reduce impacts on bat activity, mitigate human-wildlife conflict, adaptively reroute shipping traffic to avoid threatened cetacean populations (Davis et al., 2017; Van Parijs et al., 2009), or report on illegal logging or hunting (Astaras et al., 2017; Rainforest Connection, https://rfcx.org). Beyond the institutional and political barriers, developing such an infrastructure would still face substantial technical difficulties, especially since the ultimate goal of developing comprehensive suites of robust auto-ID tools is likely many years or even decades away. Nonetheless, these possibilities represent exciting futures for a technology that, alongside other sensing technologies, is providing increasingly sensitive insights into the effects of human pressures on wildlife and ecosystems.

[...] gratefully thank the respondents of our 2016 WWF-UK online survey on best practices in PAM (Browning et al., 2017).

DISCLOSURE
The authors declare no conflicts of interest.

AUTHORS' CONTRIBUTIONS
All authors conceived the study and were involved in the development and writing of the manuscript. R.G. and E.B. conducted the literature review and user survey, and planned and wrote the initial manuscript.

DATA ACCESSIBILITY
Our manuscript does not contain any data.