Less is more: On-board lossy compression of accelerometer data increases biologging capacity
Abstract
- GPS-tracking devices have been used in combination with a wide range of additional sensors to study animal behaviour, physiology and interaction with their environment. Tri-axial accelerometers allow researchers to remotely infer the behaviour of individuals, at all places and times. Collection of accelerometer data is relatively cheap in terms of energy usage, but the amount of raw data collected generally requires much storage space and is particularly demanding in terms of energy needed for data transmission.
- Here, we propose compressing the raw accelerometer (ACC) data into summary statistics within the tracking device (before transmission) to reduce data size, as a means to overcome limitations in storage and energy capacity.
- We explored this type of lossy data compression in the accelerometer data of tagged Bewick's swans Cygnus columbianus bewickii collected in spring 2017. Using software settings in which bouts of 2 s of both raw ACC data and summary statistics were collected in parallel but with different bout intervals to keep total data size comparable, we created the opportunity for a direct comparison of time budgets derived by the two data collection methods.
- We found that the data compression in our case yielded a six times reduction in data size per bout, and concurrent, similar decreases in storage and energy use of the device. We show that with the same accuracy of the behavioural classification, the freed memory and energy of the device can be used to increase the monitoring effort, resulting in a more detailed representation of the individuals’ time budget. Rare and/or short behaviours, such as daily roost flights, were picked up significantly more when collecting summary statistics instead of raw ACC data (but note differences in sampling rate). Such level of detail can be of essential importance, for instance to make a reliable estimate of the energy budgets of individuals.
- In conclusion, we argue that this type of lossy data compression can be a well-considered choice in study situations where limitations in energy and storage space of the device pose a problem. Ultimately, these developments can allow for long-term and nearly continuous remote monitoring of the behaviour of free-ranging animals.
1 INTRODUCTION
The use of biologging has increased enormously in ecology and allows for remote observation of wild animals (Cooke et al., 2004; Wilmers et al., 2015). GPS-tracking devices have been used in combination with heart rate measurements (Duriez et al., 2014; Wascher, Kotrschal, & Arnold, 2018), temperature sensors (Ryan, Petersen, Peters, & Grémillet, 2004; Sala, Pisoni, & Quintana, 2017), magnetometers (Laplanche, Marques, & Thomas, 2015; Noda, Kawabata, Arai, Mitamura, & Watanabe, 2014), accelerometers (Brown, Kays, Wikelski, Wilson, & Klimley, 2013; Nathan et al., 2012) and even cameras (Patel, Stocks, Fisher, Nicolls, & Boje, 2017; Watanabe, Mitani, Sato, Cameron, & Naito, 2003) to learn about animal behaviour and the interaction of individuals with their environment. The remote tracking of individual animals has solved many questions that were previously beyond reach (e.g. Mansfield, Wyneken, Porter, & Luo, 2014; Williams et al., 2014) and the observations, objective and undisturbed by the observer, are valuable for both fundamental (Watanabe, Ito, & Takahashi, 2014) and applied ecological research (Wilson, Wikelski, Wilson, & Cooke, 2015). Technological developments have made the devices increasingly smaller (Kays, Crofoot, Jetz, & Wikelski, 2015) so that nowadays almost any mammal, bird or reptile species, and even amphibians and invertebrates can be remotely observed to answer research questions about their biology (Cagnacci, Boitani, Powell, & Boyce, 2010; Kissling, Pattemore, & Hagen, 2014). Although this development has also reduced the effects of a tracking device on the survival and behaviour of the animal, this can never be completely excluded and should be monitored closely (Lameris et al., 2018). 
Practical limitations regarding battery weight (and thus device weight) were reduced by the development and usage of solar energy to recharge the battery while attached to the animal (Bouten, Baaij, Shamoun-Baranes, & Camphuysen, 2013; Tomkiewicz, Fuller, Kie, & Bates, 2010). This reliable and predictable power source elongated deployment time of devices in many environments apart from, for example, the marine domain (Adoram-Kershner et al., 2017), under dense canopy cover (Kays et al., 2011), or in winter at high latitudes (Therrien, Gauthier, & Bêty, 2012). Moreover, the use of remote download techniques such as Bluetooth, radio- and GSM networks made re-catching of the individual redundant, allowing for increased data yield per device deployment (Bouten et al., 2013; Tomkiewicz et al., 2010) and allowing more species to be tracked (e.g. those that die during deployment or that do not return to accessible places for tag retrieval). With these practical limitations being addressed, the road is paved for longer deployment time and high(er) frequency measurements to answer more detailed research questions about individual animal behaviour (Allan et al., 2018; Wilmers et al., 2015). For example, due to long deployment it was shown that migratory performance of Black Kite Milvus migrans increases with age through a combination of individual improvement and selective mortality of poor performers (Sergio et al., 2014). And thanks to frequent measurements the extraordinary locomotor dynamics of hunting cheetahs Acinonyx jubatus were described (Wilson et al., 2013).
The high frequency required for answering detailed research questions comes at the cost of storage space and energy use for data collection and transmission. The use of multiple sensors or intensive use of a single sensor then becomes a trade-off: if additional sensor data are collected, fewer fixes can be stored on the memory of the tracking device (Bouten et al., 2013; Wilson et al., 2015). For devices that need to be retrieved to get the data, storage space is often limiting, so that the research is restricted in either deployment time or frequency of sampling. For example, many seabird studies use tags with which individuals are followed for only several days (Dean et al., 2012; Shaffer et al., 2017). Remote data download, on the other hand, mainly puts pressure on the energy balance of the device, since connecting to the download system and transmitting the data require a considerable amount of energy. In this case, the speed of the network and energy available for uploading become limiting with high-frequency data collection.
With respect to the limitation in data transmission, there are, broadly speaking, two kinds of solutions: (a) increasing the capacity of the network and (b) decreasing the amount of data that need to be transferred by clever data compression. The first solution is aimed at the bandwidth of a certain system, that is, the amount of data that can be transmitted per time interval through the network. Improvements of this kind have indeed been implemented, for example in the Global System for Mobile communications (GSM). Technological developments have advanced the communication via this network from analogue radio signals (1G) to digital radio signals (2G) and then step-wise increased the bandwidth enormously (3G, 4G and subsequently 5G networks) so that it can now support global telecommunication (Tondare, Panchal, & Kushnure, 2014). Although great profit can be achieved from this type of advancement, changing the bandwidth of the network used for biologging devices is often beyond the researcher's control. In case of the GSM network, for example, it depends on the availability of the network at the location of the animal. The second solution, on the other hand, is within control, and already widespread in many aspects of digital modern life. Files that are too big to send as an attachment are often compressed and then extracted (e.g. in the ZIP file format), and for images, size can be reduced by storing them in the Portable Network Graphics (PNG) format. These are both common examples of ‘lossless’ data compression techniques, referring to the fact that no information is lost by the data compression.
An alternative is the so-called ‘lossy’ data compression, when some data are lost. A popular lossy compression method for images is JPEG, where the image visualization is stored as block-wise quantized discrete cosine transform coefficients (Fridrich, Goljan, & Du, 2015). This reduces the quality of the image, but the content is mostly still clear enough for its purpose. Losing information may sound unwanted, but often there is quite some redundant information in a large data file that can be lost without compromising the output. For example in a video, background features shot by the same camera often do not change for the duration of a scene. These less complex ‘chunks’ of the video, in terms of motion and detail, can be encoded separately and with a lower bitrate, thus reducing the data size of that part. This ‘chunk-based’ encoding allows for high-quality video streaming even in low-bandwidth internet connections (De Cock, Li, Manohara, & Aaron, 2016; Norkin, Cock, Mavlankar, & Aaron, 2016).
Similar solutions of lossy data compression could be advantageous in biologging. In the challenging marine environment of the Antarctic, in terms of tag retrieval and data transmission, such solutions have already been applied to study behaviour in seals. To be able to collect data on prey catch dives in these animals, an abstract from peaks in acceleration indicative of rapid head movements was calculated on-board the data logger (Cox et al., 2018; Heerah, Cox, Blevin, Guinet, & Charrassin, 2019). Also in a less challenging environment, compressing acceleration data can be advantageous to overcome storage and bandwidth limitations. Liechti et al. (2018) recently showed that only storing a summary of acceleration data in the z-axis enabled the collection of data on the full migratory journey of small trans-Saharan migrants, something that was not possible before. One example of lossy data collection is a conditional sampling regime, where the frequency of sampling is not continuously the same. The exact frequency of sampling can then, for example, be determined by the researcher (Bouten et al., 2013), based on the energy level of the device (Dokter et al., 2018; see Appendix A), or the inferred behaviour of the focal animal (e.g. flight detection, based on GPS-ground speed [Harel, Horvitz, & Nathan, 2016] or the overall activity level [Brown et al., 2012]). Although this can reduce the data size over the study period, it is still a compromise as continuous and high frequency long-term sampling is not achieved, and one has to choose beforehand which time periods or behaviours will be monitored with high frequency and which are of less interest (and thus ‘lost’).
A biologging sensor that may be particularly suitable for data compression is the accelerometer (ACC) as has recently been suggested in the technological literature (le Roux, Wolhuter, Stevens, & Niesler, 2018). Tri-axial accelerometer sensors are becoming an increasingly common addition to GPS-tracking devices. Tri-axial accelerometers measure the rate of change in directional speed along three orthogonal axes, traditionally called x or ‘surge’, y or ‘sway’ and z or ‘heave’ (Yoda et al., 2001). The first reported use of accelerometer data in ecology was in Adélie penguins Pygoscelis adeliae, where ACC data enabled the researchers to distinguish seven types of behaviour (Yoda et al., 1999). ACC data have been measured in two ways, either continuously for short deployments of several days (Chimienti et al., 2016; Wilson, Shepard, & Liebsch, 2008) or for longer deployments up to several years, in short bouts (Flack, Nagy, Fiedler, Couzin, & Wikelski, 2018; Yoda et al., 1999). Collection of ACC data is relatively cheap in terms of energy usage; however, the storage of the data requires a lot of space and the data are particularly demanding in terms of energy needed for data transmission (Wilson et al., 2008). For example, if the ACC sensor collects tri-axial data at a resolution of 1 byte with a duration (referred to as bout length in the remainder of this study; for a full explanation of terms see Figure 1) of 2 s, and a signal frequency of 20 Hz, it means that 120 bytes are stored in the device (2 s × 20 Hz × 3 axes) per bout. For species that live in remote areas and are therefore not easy to reach or observe, and that one would like to follow long term (preferably year-round, if not multiple years, Wikelski et al., 2007), this amount of data can altogether easily become problematic and compromise either deployment time or the number of measurements taken.
In ecology, there is reluctance towards the idea of lossy data compression because of the loss of raw data and potentially important information in the process. Here we propose and test a method for lossy data compression by reducing the raw ACC data to summary statistics per ACC bout and discuss its advantages and disadvantages. This type of data compression reduces the amount of data that need to be stored, and thus the amount of bytes that need to be transmitted by the device. Using this type of data compression, the monitoring coverage of data collection (either by reducing the bout interval between measurements, increasing the frequency or increasing the bout length) can be greatly improved, by enabling higher frequency monitoring or longer tracking periods.
This study presents a methodological approach to compress the raw accelerometer data within the device to summary statistics and simultaneously decrease the bout interval between sampling bouts. We calculated and compared time budgets of free-ranging Bewick's swans Cygnus columbianus bewickii derived from both raw and summary statistic ACC data collected in parallel.
2 MATERIALS AND METHODS
2.1 Study species
The Bewick's swan is a long-distance migratory bird, which in the western part of its range winters in North-Western Europe and breeds on the European Russian tundra (Rees, 2006). The migration route and breeding area of this population are well known due to extensive tracking efforts with PTT transmitters and GPS loggers in the past (Beekman, Nolet, & Klaassen, 2002; Nuijten et al., 2014). In the summers of 2016 and 2017, captive Bewick's swans equipped with GPS/GSM-tracking devices were observed in three zoos to ground-truth the accelerometer data and build a behavioural classification model (R.J.M. Nuijten, E.F. Prins, J. Lammers, C. Mager, & B.A. Nolet, unpubl. data). In the winter of 2016–2017, 30 free-ranging Bewick's swans were equipped with these tracking devices in the province of Noord-Brabant (The Netherlands). Tracking data from spring 2017 (1 February–31 May) of 10 individuals for which both raw ACC and ACC summary statistics were collected at high rate were used to apply the behavioural classification model and create individual time budgets (this study).
2.2 Device and settings
We used custom designed, 3D-printed GPS/GSM neck-collars with a weight of 70 g, an inner diameter of 51 mm and a height of 80 mm. The weight of the collar (including the tag with sensors) represented 1.1% and 1.2% of the average weight of adult and non-adult Bewick's swans, respectively, based on a dataset of 295 Bewick's swans caught in the Netherlands between 2005 and 2017. During previous observations of captive Bewick's swans with such collars, the swans with the collars preened more at first but no effect on the behaviour of the swans was found after 4 weeks (Nuijten et al., 2014). The collar contained, apart from the GPS sensor, a tri-axial accelerometer and a water sensor, and sent its data remotely via the GSM network. The accelerometer collected data with a bout length of 2 s and a frequency of 20 Hz (Figure 1).
The accelerometer and water sensor sampling were programmed separately rather than simultaneously with the GPS fixes, to be able to maintain a fixed sampling scheme for the water sensor, while the GPS and accelerometer settings were made dependent on battery voltage of the device (see Appendix A for an overview of all settings). In the spring season, the period of which we used the data in this study, GPS fixes were collected every 15 min (Appendix A). Raw ACC data were stored also every 15 min, and ACC data summarized to summary statistics every 2 min (see Section 2.3). The water sensor recorded water (1) or no water (0) every second. The collar connected to the GSM network once a day to transmit the data. The settings could not be changed after deployment.
2.3 Raw ACC and summary statistics data collection
The accelerometer, as mentioned before, is a very demanding sensor in terms of energy needed for transmission of the data. Combining the frequency, the axes and bout length for the raw ACC data in this study, every bout adds up to 120 bytes (20 Hz × 3 axes × 2 s). With our settings (1 bout every 15 min) this equals 480 bytes per hour and 11,520 bytes per day for the raw acceleration data only, excluding metadata such as date, time, individual ID and column labels.
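The data volumes above follow directly from the sampling settings. The short Python sketch below reproduces that arithmetic; the function names are illustrative only, and a resolution of 1 byte per sample is assumed, as in the example given in the Introduction.

```python
def bytes_per_bout(freq_hz: int, n_axes: int, bout_s: int,
                   bytes_per_sample: int = 1) -> int:
    """Size of one raw ACC bout, excluding metadata.

    bytes_per_sample = 1 is an assumption taken from the worked
    example in the Introduction.
    """
    return freq_hz * n_axes * bout_s * bytes_per_sample


def bytes_per_day(bout_bytes: int, bout_interval_min: int) -> int:
    """Daily volume for one bout every `bout_interval_min` minutes."""
    bouts_per_day = (24 * 60) // bout_interval_min
    return bout_bytes * bouts_per_day


raw_bout = bytes_per_bout(freq_hz=20, n_axes=3, bout_s=2)  # 120 bytes
raw_hour = raw_bout * (60 // 15)                           # 480 bytes
raw_day = bytes_per_day(raw_bout, bout_interval_min=15)    # 11,520 bytes
```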
Compressing the data within the tracking device by reducing the raw ACC numbers to summary statistics (SS), such as average x or Overall Dynamic Body Acceleration (ODBA; see Appendix B for an overview of all SS used in this study), over the bout length reduces the amount of data that need to be stored and transmitted per bout. Here we used 20 summary statistics to compress the raw ACC data per ACC bout, which equals 20 bytes per bout (excluding metadata), a six-fold reduction (120/20) compared to a raw ACC bout. To keep the total amount of data approximately the same between the two data collection methods (for the purpose of comparison), we increased the number of ACC bouts per time unit for the SS method accordingly. We therefore programmed the ACC sensor of each collar to collect SS ACC every 2 min (excluding the time points when raw ACC was collected), to be able to compare two datasets collected with the same storage and energy capacity of the device.
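To illustrate what reducing a bout to summary statistics can look like, the sketch below computes four statistics from one raw tri-axial bout. It is not the firmware used in the collars, and the exact definitions used in this study are those in Appendix B. In particular, ODBA is computed here with one common definition (the mean over samples of the summed absolute deviations from each axis' bout mean), which is an assumption, as are the function name and dictionary keys.

```python
from statistics import fmean


def summarize_bout(x, y, z):
    """Reduce one raw tri-axial bout to a few summary statistics.

    x, y, z: equally long sequences of acceleration samples.
    The ODBA definition below is an assumption (a common one),
    not necessarily the one listed in Appendix B.
    """
    mean_x, mean_y, mean_z = fmean(x), fmean(y), fmean(z)
    # Dynamic acceleration: deviation of each sample from the per-axis
    # bout mean, which serves as the estimate of the static component.
    odba = fmean(
        abs(xi - mean_x) + abs(yi - mean_y) + abs(zi - mean_z)
        for xi, yi, zi in zip(x, y, z)
    )
    return {"mean_x": mean_x, "mean_z": mean_z, "max_z": max(z), "odba": odba}


# A synthetic 2 s bout at 20 Hz (40 samples per axis), for illustration only.
example = summarize_bout(x=[0, 2] * 20, y=[1] * 40, z=[1, 3] * 20)
```

Only the returned values would need to be stored and transmitted; the 120 raw samples can be discarded once the statistics are computed.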
2.4 Behavioural classification and statistical analysis
We used an ensemble learning decision tree method (random forest, Liaw & Wiener, 2002) to build a classification tree from the annotated acceleration data obtained in the zoo, complemented with flight data from free-ranging Bewick's swans as flapping flight is very easy to distinguish from other behaviours (Bishop et al., 2015; R.J.M. Nuijten, E.F. Prins, J. Lammers, C. Mager, & B.A. Nolet, unpubl. data; Shamoun-Baranes, Bouten, Loon, Meijer, & Camphuysen, 2016). When working with raw ACC data, it is a common practice to reduce this data to classifiers (i.e. summary statistics) before applying the classification model (Bom, Bouten, Piersma, Oosterbeek, & Gils, 2014; Shamoun-Baranes et al., 2012). We used 21 statistics (20 ACC summary statistics + the information from the water sensor) to classify the behaviours in this study (Appendix B). The same 20 summary statistics were calculated in the SS and raw ACC bouts, the sole difference between the datasets being the moment of calculation (i.e. before and after transmission respectively; cf. Figure 2a,b). Five-minute aggregates of the water sensor data (i.e. 300 s) were aligned to the ACC data based on the satellite timestamps of both measurements. If for ≥30 s within this 5-min aggregate water was recorded, the overlapping bouts were assigned a ‘1’, otherwise a ‘0’.
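The water-sensor rule described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes timestamps in whole seconds, 5-min windows aligned to multiples of 300 s, and a flag of 0 for bouts that fall in a window without water-sensor readings. All names are hypothetical.

```python
WINDOW_S = 300         # 5-min aggregate
WET_THRESHOLD_S = 30   # seconds of recorded water needed to flag a window


def window_flags(water):
    """Aggregate per-second water readings into one flag per window.

    water: dict mapping timestamp (s) -> 0 (no water) or 1 (water).
    Returns a dict mapping window index -> 0 or 1.
    """
    wet_seconds = {}
    for t, reading in water.items():
        w = t // WINDOW_S
        wet_seconds[w] = wet_seconds.get(w, 0) + reading
    return {w: int(s >= WET_THRESHOLD_S) for w, s in wet_seconds.items()}


def flag_for_bout(bout_time, flags):
    """Water flag for the ACC bout starting at `bout_time` (s).

    Windows without readings default to 0 (an assumption).
    """
    return flags.get(bout_time // WINDOW_S, 0)


# Ten minutes of readings: wet during the first 30 s only.
water = {t: 1 if t < 30 else 0 for t in range(600)}
flags = window_flags(water)
```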
The behavioural classification for both the raw ACC data and the SS data from the free-ranging individuals was performed with the same classification tree which had an overall classification accuracy of 91% (recall: 0.89; precision: 0.92) and included the behaviours sleeping, resting, terrestrial active (combination of terrestrial foraging and preening behaviour), swimming, aquatic foraging and flying (R.J.M. Nuijten, E.F. Prins, J. Lammers, C. Mager, & B.A. Nolet, unpubl. data). The classified data were used to visualize daily time budgets for free-ranging individual swans in spring, once for the raw ACC and once for the SS dataset over the same time period. Additionally, proportions of each behaviour per day were calculated for both the raw ACC and the SS data. Sample sizes for the daily proportions were maximally 96 per day for the raw ACC data (one ACC bout every 15 min) and 672 per day for the SS data (one ACC bout every 2 min, excluding the time points when raw ACC data were collected).
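As a small illustration of the daily proportions, the sketch below counts the classified bouts of one individual on one day and divides by the number of bouts. The behaviour labels follow the list above; the function name and input format are assumptions.

```python
from collections import Counter

BEHAVIOURS = (
    "sleeping", "resting", "terrestrial active",
    "swimming", "aquatic foraging", "flying",
)


def daily_proportions(labels):
    """Proportion of classified bouts per behaviour for one day.

    labels: list of behaviour labels, one per classified bout
    (at most 96 for raw ACC and 672 for SS in this study).
    """
    if not labels:
        raise ValueError("no classified bouts for this day")
    counts = Counter(labels)
    return {b: counts[b] / len(labels) for b in BEHAVIOURS}


# A made-up day of 96 raw ACC bouts, for illustration only.
day = ["sleeping"] * 48 + ["swimming"] * 24 + ["flying"] * 24
props = daily_proportions(day)
```

Because the proportions of one day sum to 1, an increase in one behaviour necessarily comes with a decrease in at least one other.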
To assess whether the different datasets yielded different time budgets, we calculated the mean difference between raw ACC and SS-based daily proportions per behaviour, and calculated the probability of this observed mean difference originating by chance using a non-parametric permutation test. We did this by randomizing the sign of the difference between raw ACC-based proportions and SS-based proportions per day, and taking the mean of these differences. By repeating this 10,000 times, we created a distribution of randomized mean differences between raw ACC and SS proportions against which the observed mean difference was tested.
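The sign-randomization procedure can be sketched in a few lines of Python. This is an illustrative implementation, not the analysis script of this study: the two-sided comparison, the +1 correction in the p-value and the fixed random seed are assumptions that the description above does not specify.

```python
import random
from statistics import fmean


def sign_flip_test(diffs, n_perm=10_000, seed=0):
    """Paired permutation test on daily differences (raw ACC minus SS).

    Each permutation flips the sign of every difference with
    probability 0.5 and records the mean. Returns the observed mean
    difference and a two-sided p-value (two-sidedness and the +1
    correction are assumptions).
    """
    rng = random.Random(seed)
    observed = fmean(diffs)
    extreme = 0
    for _ in range(n_perm):
        permuted = fmean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Made-up differences for illustration: SS consistently higher than raw.
obs, p = sign_flip_test([-0.05] * 30, n_perm=2_000)
```

In the study, `diffs` would hold one value per individual per day for a given behaviour (1,200 values per test).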
3 RESULTS
On-board calculation of summary statistics greatly reduced the amount of data per bout to be transmitted by the biologging devices. Concerning the accelerometer data only, we reduced our data size per 2 s bout six times from 120 bytes (2 s × 20 Hz × 3 axes) to 20 bytes, by storing 20 summary statistics on-board the biologgers rather than the raw tri-axial accelerometer data. Including metadata such as individual ID and timestamp, we realized a 4.7× reduction in the amount of data per bout (127 vs. 27 bytes, respectively). This resulted in a similar decrease in energy needed for transmission of the data. Transmission of the raw ACC data over the network took approximately 5 min and 2,639 µWh for all data of 1 day. Transmitting the SS data took roughly 1 min and required 528 µWh from the collar. The extra energy needed for the calculation of the SS within the device was only 0.239 µWh by which 672 SS bouts of 27 bytes were created (i.e. 1 day worth of SS data). So by ‘paying’ 0.239 µWh as a cost for calculation, and with similar circumstances in terms of bandwidth and connection with the network for both data collection methods, a 5× reduction (=2639/(528 + 0.239)) in energy use for transmission was realized.
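The reduction factors reported above can be checked with the following arithmetic, using only the values given in the text (variable names are illustrative).

```python
# Bytes per bout, including metadata such as individual ID and timestamp.
raw_bytes, ss_bytes = 127, 27
size_reduction = raw_bytes / ss_bytes               # about 4.7

# Energy (µWh) to transmit one day of data, plus the on-board
# calculation cost of one day of summary statistics.
raw_uwh, ss_uwh, calc_uwh = 2639, 528, 0.239
energy_reduction = raw_uwh / (ss_uwh + calc_uwh)    # about 5.0

# The calculation cost is a very small share of the SS energy budget.
calc_share = calc_uwh / (ss_uwh + calc_uwh)
```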
Both the raw ACC and SS data were used to create time budgets for each individual (see Figure 3 for an example). Within individuals, the difference in bout interval between the two methods is clearly visible in the time budget graphs (Figure 3). This difference in bout interval resulted in some biologically relevant behaviours being better represented by the SS data than by the raw ACC data. For instance, roost flights, a twice-daily behaviour of relatively short duration that Bewick's swans perform to travel between sleeping and foraging areas in the morning and evening, were detected on significantly more occasions (paired t test: N = 14 days; t = 4.8963; p = .001) in the SS data than in the raw ACC data at the end of the winter season (i.e. the first 14 days of our study period; SS: 20.11 ± 2.1 and raw ACC: 10.6 ± 2.2 days [M ± SE]), when the swans are known to perform this behaviour.
We found a significant difference between the raw ACC and SS-based average daily proportions for all behaviours over the study period (p ≪ .0001). All permutation tests had a sample size of 1,200 (120 days × 10 individuals). The proportion of flying (observed mean difference −0.007), standing (−0.011), terrestrial active (−0.056) and aquatic foraging (−0.001) was higher when based on SS data when compared to raw ACC data, while the proportion for swimming (observed mean difference 0.068) and sleeping (0.007) was lower (Figure 4).
4 DISCUSSION
We explored the use of lossy data compression in biologging devices as a solution to overcome limitations in energy capacity of the device, specifically with regard to the accelerometer sensor. Using ACC data collected in free-ranging Bewick's swans as an example, we show that lossy data compression reduces the size of the ACC data that need to be stored and transmitted by the tracking device without loss of biological information. The exact reduction factor depends on the settings of the accelerometer (bout length, frequency, amplitude and resolution; Figure 1) and the number of SS stored (Appendix D). The capacity freed by using the SS data collection method instead of raw ACC can be used to decrease the bout interval (as was done in this study), or to increase the frequency or resolution of the ACC measurements during the setup of the study, which will lead to an increased level of detail in the output data (see e.g. Bom et al., 2014; Broell et al., 2013). Alternatively, the freed capacity can be used to increase the frequency or resolution of another sensor, to elongate the deployment time of the device or to include other (data-rich) sensors such as a heart rate sensor or sensors that measure features of the environment. These latter scenarios were not considered in this study, but can have huge advantages in studies where ACC data transmission is currently limiting.
Both the raw ACC and SS data were classified with the same behavioural model that was built based on the zoo observations (R.J.M. Nuijten, E.F. Prins, J. Lammers, C. Mager, & B.A. Nolet, unpubl. data). In such a supervised classification model, raw ACC data are commonly reduced to summary statistics before classification can be done (Bom et al., 2014; Shamoun-Baranes et al., 2012), so our method does not differ from classical ACC analyses in that respect. In the classification, we used the same summary statistics as were calculated for the SS bouts within the device as these represent such a broad range of statistics that all behaviours should be represented by one or a combination of several of them. This was confirmed by the high performance of the model (91% correct classification overall). Also, our final behavioural classification model only used four out of the 21 statistics that were collected (ODBA, maximum z-value, mean z-value and the water sensor; R.J.M. Nuijten, E.F. Prins, J. Lammers, C. Mager, & B.A. Nolet, unpubl. data), so the selection of summary statistics before deployment of the devices could have been more restrictive, resulting in a more than 10-fold reduction in data size per bout. This shows that the use of SS in accelerometer data collection can even increase the biologging capacity of this sensor more than we demonstrate in this study.
Application of the model on both SS and raw ACC datasets yielded the classified datasets that were used to create the time budgets for the individual swans for spring 2017. Although it is generally assumed that a discontinuous but structured ACC sampling can be used validly as a proxy for continuous measurement of behaviour (Brown et al., 2013), we found a small but statistically significant difference for all behaviours when testing for differences between raw ACC and SS-based daily proportions of behaviour. Although both methods collect the same type of data every bout (20 Hz ACC data of 2 s duration), there are two differences that could have caused the differences that we found. First, the SS method takes more samples of ACC data in the same time interval (1 raw bout vs. 7 SS bouts per 15 min; Figure 1). This leads to a higher monitoring coverage in the SS method. Second, due to this higher monitoring coverage, the SS bouts are taken at different time points than the raw ACC bouts. The differences in the proportions are not unidirectional (i.e. SS is not always higher or always lower than raw ACC), and cannot be, because the behaviours are proportional and thus not independent from each other (Appendix E). For example, when a swan increases the time spent foraging, there is less time for other activities (e.g. sleeping). This is a property of proportional data, as all proportions together must sum to 1. We found that the two foraging behaviours in particular were negatively correlated (so when more time was spent on aquatic foraging, less time was spent on terrestrial foraging [classified as terrestrial active in this study]; Pearson correlation coefficient −0.58; Appendix E). Due to the higher monitoring coverage of the SS bouts (i.e. more samples to represent the continuum of an animal's behaviour), we believe that the proportions and time budgets calculated based on these data give a better representation of the real behaviour of the swans than the proportions and time budgets based on the raw ACC. And although the two datasets differ significantly, the actual differences between them are so small that it can be questioned whether they are biologically relevant.
The added value of the SS ACC collection method, through a decreased bout interval in our case, is especially visible in rare behaviours or short duration behaviours, since a sensor with a longer bout interval is more likely to miss these behaviours. Five of the behaviours tested here are considered neither rare nor of short duration (aquatic foraging, terrestrial active, swimming, standing resting and sleeping). Flight, however, might be considered rare, especially in non-migratory seasons, when flight is mainly used to get to and from the roost site (i.e. roost flights), a behaviour that tends to last less than 10 min (Nolet, Bevan, Klaassen, Langevoord, & Heijden, 2002). We indeed found a significant difference between the two methods in the number of days that these roost flights were detected. For such an important behaviour in terms of energy expenditure (Nolet et al., 2002), even small differences in duration can have important consequences. Because flight is a biologically relevant and expensive behaviour in terms of energy use, accurate estimation of its occurrence and duration is valuable. For detailed questions with potential management implications, an underestimation of flight behaviour can have important consequences. For example, geese that are ‘scared’ five times a day as part of damage control management fly more and need to compensate for this extra energy expenditure by eating 12%–16% more grass (Nolet, Kölzsch, Elderenbosch, & Noordwijk, 2016). This compensatory feeding could cause more damage to agricultural fields while the scaring was actually meant to decrease the damage (Nolet et al., 2016). To obtain accurate model input for such predictions and link them to the feeding and reproductive ecology of the species, it is important to be able to estimate the time spent on each behaviour as precisely as possible.
Reductions of ACC data size, such as using summary statistics as we show here, can be advantageous for future biologging studies. For example, in a study of migratory dark-bellied brent geese Branta b. bernicla, the short bouts of raw ACC data that were collected within the limits of collar storage and data transmission only allowed for a very rough behavioural classification into the categories ‘active’ and ‘inactive’ (Dokter et al., 2018). Although this yielded interesting results in combination with the GPS data of the same tags, more ACC measurements could have increased the understanding of the behavioural patterns of these geese in their fuelling and migration periods.
Despite the clear advantage of a decrease in data size and the accompanying possibility to elongate the deployment time or reduce the interval of measurements to obtain a more detailed dataset, the method described here might not be suitable for all study systems. Proper use of summary statistics requires a thorough understanding of the study system and a priori annotation of the behaviour so that the summary statistics can be chosen wisely. Only then will these predictors be useful in classifying the behaviour of interest after collection of the data. When no prior knowledge on behavioural patterns is present, or the behaviour of interest is difficult to capture with commonly used summary statistics or might differ significantly among individuals, it is recommended to collect raw ACC data.
If the data compression is used to increase the monitoring coverage (as in this study), the level of detail obtained using SS opens up the opportunity to study specific research questions that are out of reach with the data yield from raw ACC, such as the example of the roost flights in this study. Using lossy data compression as a means to elongate deployment time, one could answer a whole different set of questions by potentially tracking individuals for several years and comparing their time budgets or (migratory) performance (see Harel et al., 2016; Sergio et al., 2014) across seasons or developmental or life-history stages. A higher monitoring coverage using SS not only means a more accurate representation of the time budgets but also allows for a more in-depth study of causal factors and drivers of change. However, the data on these (ecological) drivers then also need to be very fine-scale, and such data are often not available (Wilmers et al., 2015). A solution is to use the animals themselves to collect valuable data on their environment by including extra sensors in the tracking devices (Kays et al., 2015). This has already been done successfully in some marine animals (Evans, Lea, & Patterson, 2013; Fedak, 2004; Sala et al., 2017). For example, elephant seals Mirounga leonina equipped with oceanographic sensors collected data on ocean structure and salinity that enabled researchers to map the ice front south of 60°S and calculate the sea ice formation rate from upper ocean salinity levels on rarely observed sites (Charrassin et al., 2008). Environmental data collected by animal-borne sensors are very time- and space-specific and can depend on the preferences of the animal, but at the same time they give a very accurate look inside the lives of these animals and the conditions they encounter.
The collection of environmental variables by tracking devices is facilitated using SS to store the data from the ACC sensor, since the freed storage space and bandwidth can be used for this purpose.
The field of biotelemetry is continuously developing. Like the computational developments for processing the large amounts of biologging data produced by sensors such as the accelerometer (see e.g. Wilson et al., 2008), the methodology in this study can be seen as part of this development. Especially in a well-studied system, the behaviours of importance are generally known and these can be reliably classified using familiar summary statistics. A next step is to use all known information not only to summarize but also to classify behaviour on board (Figure 2c). This might not be possible for all behaviours, but for some very common or easily recognizable behaviours, such as sleeping or flying in this study, it is feasible. The biologging device could be programmed in such a way that it would attempt to recognize the behaviour performed through time-series classification of raw sensor output (see e.g. Wilson et al., 2018). If it does recognize the behaviour, it suffices to store and transmit a single number or letter for that bout, indicating the specific behaviour. The device could even be programmed in such a way that settings (bout interval and bout duration, for example) depend on which behaviour is performed (see Harel et al., 2016 for an example of flight detection). If the algorithm does not recognize the behaviour, either the SS or the raw ACC data can be stored and (later) sent to the researcher (a combination of Figure 2b,c). Often the behaviours that can be classified with very high accuracy together make up a large part of the daily time budget, so this can potentially yield large reductions in data size. With such a ‘smart’ sampling schedule, prior knowledge about the species is used optimally and the storage space and available bandwidth are used for collecting new information about the study species and behaviours of interest. This makes the proposed lossy data collection method a very attractive way of reducing data size.
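The ‘smart’ sampling schedule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software running on the tags: the choice of summary statistics (per-axis mean and standard deviation), the thresholds, the behaviour codes 'S' and 'F', and all function names are hypothetical and chosen only for the example.

```python
from statistics import fmean, pstdev
from typing import Dict, Optional, Sequence, Tuple, Union

Sample = Tuple[float, float, float]  # one ACC reading (x, y, z), assumed in g

# Hypothetical thresholds, for illustration only (not calibrated on real data).
SLEEP_MAX_SD = 0.02     # all three axes nearly motionless
FLIGHT_MIN_SD_Z = 0.5   # strong oscillation on the z axis


def summarise(bout: Sequence[Sample]) -> Dict[str, float]:
    """Compress one bout of raw ACC samples into per-axis mean and SD."""
    if not bout:
        raise ValueError("bout must contain at least one sample")
    stats: Dict[str, float] = {}
    for axis, values in zip("xyz", zip(*bout)):
        stats[f"mean_{axis}"] = fmean(values)
        stats[f"sd_{axis}"] = pstdev(values)
    return stats


def classify(stats: Dict[str, float]) -> Optional[str]:
    """Return a one-letter behaviour code, or None if the bout is not recognized."""
    if max(stats["sd_x"], stats["sd_y"], stats["sd_z"]) < SLEEP_MAX_SD:
        return "S"  # hypothetical code for sleeping
    if stats["sd_z"] > FLIGHT_MIN_SD_Z:
        return "F"  # hypothetical code for flying
    return None


def encode_bout(bout: Sequence[Sample]) -> Union[str, Dict[str, float]]:
    """Store a single letter for recognized bouts, else fall back to the SS."""
    stats = summarise(bout)
    label = classify(stats)
    return label if label is not None else stats
```

In this sketch a recognized bout is reduced to a single character, while an unrecognized bout falls back to its summary statistics; a fallback to the raw ACC data could be added in the same way by returning the bout itself.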
Because the behaviours, classification and summary statistics will vary greatly per species and research question, a close collaboration with system developers is necessary to make the proposed progress in remote animal observation. These developments can pave the way for continuous remote monitoring of animal behaviour in the future.
ACKNOWLEDGEMENTS
We thank all persons who have contributed to the work in the field or during data analysis: Gerard Müskens, Youri van der Horst, Erik Kleyheeg, Peter de Vries, Thomas Lameris, Anna Hermsen, Jan Vegelin, Fred Cottaar, Sibrand Rinzema, Stefan Vriend, Anne-Lieke Knaven and Nina Thierij. We acknowledge the Royal Burgers' Zoo in Arnhem, Avifauna in Alphen aan de Rijn and GaiaZoo in Kerkrade for their permission to test the GPS/GSM collars and observe the captive Bewick's swans in their institutions. We thank anonymous referees and the associate editor for their constructive comments on earlier versions of this manuscript. The catching and tagging of the wild Bewick's swans was carried out under licences 2016518 of the Centrale Commissie Dierproeven and FF/75A/2016/044 of the Flora- en faunawet. R.J.M.N. was supported by NWO-NPP grant 866.15.206.
AUTHORS’ CONTRIBUTIONS
R.J.M.N. and B.A.N. conceived the ideas and designed the methodology, technically supported by T.G. T.G. wrote the software installed in the tracking devices and provided the calculations on data reduction. R.J.M.N. analysed the data, advised by J.S.-B. and B.A.N. R.J.M.N. led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.
Open Research
DATA AVAILABILITY STATEMENT
Data available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.6djh9w0x9 (Nuijten, Gerrits, Shamoun-Baranes, & Nolet, 2019). GPS data archived on Movebank Data Repository: https://doi.org/10.5441/001/1.8ms7mm80 (Nuijten, Gerrits, de Vries, Müskens, & Nolet, 2020).