Combining Unity with machine vision to create low latency, flexible and simple virtual realities
Yuri Ogawa, Raymond Aoukar, Richard Leibbrandt and Jake S Manger contributed equally to the work.
Abstract
- In recent years, virtual reality arenas have become increasingly popular for quantifying visual behaviours. By using the actions of a constrained animal to control the visual scenery, the animal perceives that it is moving through a virtual world. Importantly, as the animal is constrained in space, behavioural quantification is facilitated. Furthermore, using computer-generated visual scenery allows for identification of visual triggers of behaviour.
- We created a novel virtual reality arena combining machine vision with the gaming engine Unity. For tethered flight, we enhanced an existing multi-modal virtual reality arena, MultiMoVR, but tracked wing movements using DeepLabCut-live (DLC-live). For tethered walking animals, we used FicTrac to track the motion of a trackball. In both cases, real-time tracking was interfaced with Unity to control the location and rotation of the tethered animal's avatar in the virtual world. We developed a user-friendly Unity Editor interface, CAVE, to simplify experimental design and data storage without the need for coding.
- We show that both the DLC-live-Unity and the FicTrac-Unity configurations close the feedback loop effectively and quickly. We show that closed-loop feedback reduces behavioural artefacts exhibited by walking crabs in open-loop scenarios, and that flying Eristalis tenax hoverflies navigate towards virtual flowers in closed loop. We show examples of how the CAVE interface can enable experimental sequencing control including use of avatar proximity to virtual objects of interest.
- Our results show that combining Unity with machine vision tools provides an easy and flexible virtual reality environment that can be readily adjusted to new experiments and species. This can be implemented programmatically in Unity, or by using our new tool CAVE, which allows users to design new experiments without additional programming. We provide resources for replicating experiments and our interface CAVE via GitHub, together with user manuals and instruction videos, for sharing with the wider scientific community.
1 INTRODUCTION
Many animals use visual information to control fast and dynamic behaviours. Such behaviours include course stabilization and obstacle avoidance during navigation, homing, conspecific interactions, and predator avoidance. These behaviours are often studied in the laboratory, as this provides better control of the parameter space. However, a drawback of such experiments is that they are often so constrained that the ecological relevance comes into question (Gomez-Marin & Ghazanfar, 2019). Alternatively, visual behaviours can be studied under completely natural conditions in the field. While this produces unconstrained, naturalistic behaviours, the lack of experimental control makes mechanistic analyses difficult.
Due to recent technical advances, virtual reality (VR) arenas have become more popular for behavioural quantification. They are more likely to elicit naturalistic visual behaviours, while still being constrained enough to provide highly detailed behavioural quantification. In VR, the actions of a constrained animal lead to changes of the displayed scenery, thus providing the perception of moving within a simulated world (Dombeck & Reiser, 2012). As the animal is tethered, and thus not physically moving through space, monitoring the animal and quantifying its responses is greatly simplified. In addition, as the visual world is computer-generated, it is easy to manipulate, and thus allows systematic control over the parameter space, facilitating identification of visual triggers of behaviour.
A popular way to design insect VR is to use closed loop tethered flight arenas (Schuster et al., 2002). Tethered flight originally used a torque meter to quantify yaw turns (Heisenberg & Wolf, 1979), which proved useful for studying, for example, memory and pattern recognition (Ernst & Heisenberg, 1999). Current tethered flight arenas often use the difference between the left and the right wing's peak downstroke angle, or wing beat amplitude (WBA; for terminology and abbreviations, refer to Table 1), which is strongly correlated with yaw torque (Götz, 1968; Tammero et al., 2004), to control yaw steering motion (see e.g. Maimon et al., 2010). Alternative approaches include tracking the abdomen of moths with optical sensors, as the abdominal movements follow the forewing asymmetry (Gray et al., 2002).
TABLE 1 Abbreviations and definitions used in text

Term | Definition |
---|---|
Experimental/tethered animal | The real-world animal |
The tethered animal's avatar | The GameObject being moved in the virtual world |
Virtual crab/bird | Objects in the virtual world |
DLC-live | DeepLabCut-live |
OOI | Object of interest |
Pre-stimulus | A defined time window with open loop stimuli before a closed loop experiment begins |
Post-stimulus | A defined time window with open loop stimuli after a closed loop experiment ends |
Sequence | Several subsequent trials |
Empty scene | A panorama without any discrete objects (Figure 7a without objects) |
Trial | A defined time window in closed loop |
UDP | User datagram protocol |
VR | Virtual reality |
WBA | Wing beat amplitude |
However, these approaches often focus on yaw rotational motion, and do not appropriately capture translational motion. This is important, as flying insects perform both rotational and translational behaviours (Geurten et al., 2012), such as forward motion and sideslip. Several newer VR arenas have been developed for more freely moving animals, and thus include translational motion. These include, for example, TrackFly (Fry et al., 2009) and FreemoVR (Stowers et al., 2017), validated in zebrafish, flies, and mice, where the visual surround is updated based on the animal's current position (see also Cruz et al., 2021; Frasnelli et al., 2018; Pokusaeva et al., 2023). When insects perform translational behaviours, this generates 3D cues. Such 3D motion cues are provided in FreemoVR (Stowers et al., 2017) as well as in an earlier VR developed for head-fixed walking Drosophila (Haberkern et al., 2019). In this context, the Antarium is an interesting VR framework, where a tethered ant walking on an air-supported ball controls both rotational yaw and translational motion through a Unity-generated virtual environment (Kócsi et al., 2020). Analogously, a recently developed multi-sensory tethered flight arena, MultiMoVR, allows both yaw rotations and forward translations through the virtual environment (Kaushik et al., 2020). Unity (Unity Technologies) is relatively easy to learn, and has, for example, been used for creating immersive learning environments (Needle et al., 2022) and for studying peripheral visual field loss in humans, as it provides in-built perspective correction, shadows and other 3D cues (Doyon & Jung, 2023).
For VR to provide an immersive experience it is important to close the loop with minimal delays (for human examples, see e.g. Brunnström et al., 2020; Caserman et al., 2019). Interestingly, however, Drosophila walking towards high contrast objects in virtual reality can tolerate delays of up to 2 s (Schuster et al., 2002), but this has not been quantified for other insect behaviours. Importantly, there is a lower limit to the delays when using conventional cameras and visual displays, as these have in-built latencies that are difficult to assess (Stowers et al., 2014). In addition, when using the WBA in tethered flight, at least one full wing stroke needs to be captured by each video frame, leading to a minimal delay of one wing stroke period (Götz, 1968; Maimon et al., 2010). When measuring walking behaviour on a trackball, the ball's motion can theoretically be recorded at higher temporal frequency. In practice, however, the trackball is often filmed below 200 Hz (see e.g. Bagheri et al., 2022; Dahmen et al., 2017; Loesche & Reiser, 2021; Longden & Krapp, 2009).
One of our development aims was to combine several robust and easily accessed software packages to produce a flexible VR with short latencies. While MultiMoVR (Kaushik et al., 2020) was developed to reduce costs and uses the open source Panda3D game engine (Goslin & Mine, 2004), it relies on Kinefly (Maimon et al., 2010) for tracking wing movements, which is sensitive to camera settings and light fluctuations. Recent advances in machine learning, and especially the development of DeepLabCut (DLC; Mathis et al., 2018), provide an alternative, efficient and robust method for tracking the WBA (Salem et al., 2022). Indeed, DLC has revolutionized behavioural neuroscience, as it allows for markerless pose estimation after training on relatively small data sets. FicTrac, which is often used for trackball experiments (e.g. Fenk et al., 2022; Loesche & Reiser, 2021; Turner et al., 2022), is an open-source software package that reconstructs the fictive path of an animal walking on a patterned sphere filmed with a USB camera (Moore et al., 2014). Note, however, that there are many alternatives to FicTrac (e.g. Longden & Krapp, 2009).
We here show that the loop can be effectively closed using the Unity game engine together with machine vision (see also Müller et al., 2023). We used DLC-live (Kane et al., 2020) for tethered flight in hoverflies, with delays around 50 ms, and FicTrac (Moore et al., 2014) for trackball experiments in crabs, with even shorter delays. We validated the DLC-live-Unity connection by letting hoverflies navigate towards flowers, and the FicTrac-Unity integration by studying tethered crabs evading virtual crabs or birds. We also developed a Unity Editor interface, CAVE, that allows the user to define the gain between the WBA and the tethered animal avatar's yaw and thrust motion through the virtual world, and provides the ability to design experiments and save data with little or no code.
2 MATERIALS AND METHODS
2.1 Hardware and software
All hardware and software for the tethered flight (Figure 1a) and trackball arena (Figure 2a) can be found in Table 2. In the tethered flight arena (Figure 1a) each of the three monitors subtended 118° × 142° (width × height) of the visual field. In the trackball configuration (Figure 2a) each of the four monitors subtended 90° × 49° of the visual field.
TABLE 2 Hardware and software used for the tethered flight and trackball arenas

Camera and lights | Tethered flight arena | Trackball arena |
---|---|---|
Camera | Sony PlayStation 3 Move Eye Camera (SLEH-00448, Sony, Tokyo, Japan) with removed IR filter | FLIR Grasshopper3 GS3-U3-23S4C |
Lens | 5–50 mm varifocal C-mount lens | Fujifilm Fujinon 1:1.4/3.8–13 mm |
XY plane DSLR macro slider | Neewer pro 4-way macro focus rail 10033981 (as in Kaushik et al., 2020) | — |
Ball socket DSLR tripod head | Generic swivel head (as in Kaushik et al., 2020) | — |
Lens holder | 3D printed (see Kaushik et al., 2020) | — |
IR pass filter | R72 INFRARED, 49 mm, HOYA, Tokyo, Japan | — |
Infrared lights | Infrared T-1¾ LED 850 nm 6° SFH4550, Osram Opto Semiconductors GmbH, Regensburg, Germany | No external light. Light is provided by the 4 surrounding monitors |
USB lights | JANSJÖ LED USB lamp, IKEA, Sweden | — |
Black background | Shin Kokushoku Musou black (KOYO Orient Japan, Saitama, Japan) | — |
Hoverfly tether | 3D printed from material provided at Backyard Brains: https://backyardbrains.com/products/micromanipulator | — |
Computer hardware | Tethered flight arena | Trackball arena |
---|---|---|
Motherboard | Micro-Star Z490-A PRO | Hewlett-Packard EliteDesk 800 G1 SFF |
GPU | Nvidia GeForce RTX 3080 | Nvidia GeForce GTX 1050ti |
CPU | Intel i7-10700K | Intel i7-4770 |
Storage | Samsung 970 Evo Plus NVMe M.2 SSD 1 TB | SanDisk Extreme SSD 250 GB |
RAM | 2 × Corsair DDR4 3200 MHz 16 GB | 4 × Kingston DDR3 1600 MHz 4 GB |
Monitors | Three 27″ LCD monitors, PG279Q ROG SWIFT, Asus, Taipei, Taiwan | Four 24″ U2412M LED monitors, Dell, Texas, United States of America |
Software (general) | Tethered flight arena | Trackball arena |
---|---|---|
Operating System | Ubuntu 20.04 LTS | Windows 10 Enterprise |
GPU Driver version | 520.61.05 | 27.21.14.5671 |
Unity Editor | 2021.3.21f1 | Executable built from 2020.3.36f1 |
DeepLabCut Anaconda environment | ||
DeepLabCut version | 2.3.3 | — |
Python version | 3.8 | — |
Tensorflow version | 2.12 | — |
CUDA | 11.8 | — |
cuDNN | 8.6 | — |
DeepLabCut-live Anaconda environment | ||
DeepLabCut-live version | 1 | — |
Python version | 3.7 | — |
Tensorflow version | 2.5.0-gpu | — |
2.2 Animals
For behavioural testing of the tethered flight configuration we used Eristalis tenax females, 14–24 days old, reared and housed as described previously (Nicholas et al., 2018). The hoverfly was tethered to a needle (BD Microlance 23G × 1 1/4″–0.6 × 30 mm Blue hypodermic needles) at a 32° angle using a beeswax and resin mixture. To encourage flying, we provided airflow manually. Once the hoverfly flew consistently, we attached it to a syringe (BD tuberculin syringe, 1 mL) and placed it in the centre of the arena (Figure 1a,b), at 10 cm distance from the centre of each monitor. We kept encouraging flight using an empty scene, with a textured ground and faded sky, rotating at 0.33 Hz in open loop.
For behavioural testing of the trackball configuration, we used the fiddler crab Gelasimus dampieri, collected from intertidal mudflats near Broome (17.9° S, 122.2° E), Western Australia. Crabs were housed in an artificial mudflat at the University of Western Australia and exposed to a tidal cycle of seawater inundation. They had a 12-h light–dark cycle and their diet was supplemented with crab food. Crabs were treated according to UWA Animal Ethics Committee (AEC) approved methods (UWA AEC project number RA/3/100/1515). At experimental time, crabs were tethered to a sliding carbon-fibre rod by a magnet glued (Loctite Superglue, ethyl cyanoacrylate) to the crab's carapace. They were then placed on an air-cushioned polystyrene treadmill ball (Bagheri et al., 2022; Donohue et al., 2022). When initially placed on the trackball, crabs display a ‘limb tuck’ behaviour, with legs and claws held close to the body. The crabs were considered acclimatized after they began walking or feeding (monitored via video, and see How et al., 2012); however, we waited a minimum of 5 min before starting the experiment, with a maximum wait time of 7 min.
2.3 DLC model
We filmed tethered male and female Eristalis tenax responding to open loop stimuli in the tethered flight arena (Figure 1a), including a sinusoidal grating (wavelength 20°, 5 Hz), a starfield stimulus (yaw at 50°/s), and a bar (width 3°, varying in height from 0.8° to 142°). We used DeepLabCut (DLC) version 2.3.3 (Table 2, and see Mathis et al., 2018) to train a model to track the thorax and the peak downstroke angle, also referred to as the wing beat amplitude (WBA), of each wing (see e.g. Maimon et al., 2010). For this, we manually labelled the following six locations: tegula and tip of the left and right wing, anterior thorax, and anterior abdomen (Figure 1c), in 16 extracted video frames each from videos of four individual animals (2 males, 2 females). These frames included examples of yaw rotation, forward translation and no flight. We trained the DLC model using a maximum of 300,000 iterations. The evaluated train and test errors were 1.2 and 1.16 pixels, respectively.
The resulting model was used in DLC-live version 1 (Table 2, and see Kane et al., 2020), to track the same six points in real-time. We quantified DLC-live's ability to perform markerless pose estimation of every unique frame under different video resolutions (Figure 1d) by filming at two spatial resolutions (240 × 320 pixels and 480 × 640 pixels) and three temporal resolutions (10, 60 and 100 Hz, all below the wingbeat frequency of Eristalis hoverflies, see Walker et al., 2010). We quantified the time between sequential unique frames analysed by DLC-live (‘Time Frame was Captured’ column in CAVE's DLC-Data file, see below), and converted this to temporal frequency.
2.4 Unity and the CAVE interface
In the FicTrac-Unity configuration, the crab was attached to a magnetic tether, allowing it to rotate freely about the yaw axis (Figure 2a, and see Donohue et al., 2022). The ball's rotations in radians, obtained from the video recording (Movie S1) and multiplied by the ball's radius, were used to update the crab's avatar's sideslip and forward motion (Moore et al., 2014) in the Unity-generated virtual world. Jitter from FicTrac ball-tracking noise was reduced by suppressing movements below a minimum distance: the avatar's position was only updated if it had changed by at least 0.1 cm, a threshold determined visually during preliminary testing. Rapid, spurious movements caused by tracking errors were similarly limited by capping the per-frame displacement, with the avatar's position linearly interpolated towards the new position at a maximum speed of 80 cm/s. Because FicTrac errors are not cumulative, these adjustments did not affect overall tracking accuracy. The filtered positions are found in log files from Unity, and the raw tracking values in log files from FicTrac. Videos were filmed at a resolution of 544 × 638 pixels at 60 Hz.
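To make this two-stage filter concrete, the following minimal C# sketch shows how such a dead-band and speed clamp can be implemented in a Unity script. It is not the published FicTrac-Unity code: the class name, the packet hook and the use of centimetre units are assumptions made for illustration.

```csharp
using UnityEngine;

// Illustrative sketch (not the published FicTrac-Unity code) of the
// position filter described above. Units are assumed to be cm.
public class FilteredAvatarMover : MonoBehaviour
{
    const float MinStep = 0.1f;   // cm; smaller position changes are treated as jitter
    const float MaxSpeed = 80f;   // cm/s; larger jumps are treated as tracking errors

    private Vector3 targetPosition;

    // Hypothetical hook, called with each new position derived from FicTrac.
    public void OnTrackedPosition(Vector3 newPosition)
    {
        // Dead-band: only accept the update if the position moved far enough.
        if (Vector3.Distance(newPosition, targetPosition) >= MinStep)
            targetPosition = newPosition;
    }

    private void Update()
    {
        // Speed clamp: interpolate towards the target at no more than MaxSpeed,
        // so a single erroneous frame cannot teleport the avatar. Because
        // FicTrac errors are not cumulative, clamping does not bias the path.
        transform.position = Vector3.MoveTowards(
            transform.position, targetPosition, MaxSpeed * Time.deltaTime);
    }
}
```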
In the DLC-live-Unity configuration, we used the x-y-coordinates of the six points tracked in each frame (Figure 1c) to calculate the WBA and the resulting yaw and thrust values for the avatar controller based on the gain information from the settings manager, available in our CAVE interface (Figure 3). This is described in the Results section.
In both cases, we used UDP sockets to receive either the x-y-coordinates of the points tracked by DLC-live, together with their confidence, or the forward and sideward rotation of the treadmill ball provided by FicTrac (blue, Figure 3, and see Aoukar, 2021). Upon receiving a packet, the C# script responsible for controlling the tethered animal's avatar updates its rotation and/or position (green, Figure 3), and Unity's main thread updates at the monitor refresh rate (60 Hz for the crabs, 165 Hz for the hoverflies, Figures 1a and 2a). In the tethered flight configuration, the stimulus monitors thus refreshed at 165 Hz, while the video was filmed at 100 Hz. If no new DLC-live packet was available, the avatar kept performing the thrust and yaw motion defined by the previous packet (Figure 3).
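The following is a minimal sketch of this pattern, assuming a hypothetical port number and leaving the packet parsing as a placeholder; it is not the actual CAVE or FicTrac-Unity script.

```csharp
using System.Net;
using System.Net.Sockets;
using System.Threading;
using UnityEngine;

// Minimal sketch of the UDP pattern described above: a background thread
// receives tracker packets, and Unity's main thread applies the most recent
// one each frame, reusing the previous command when no new packet arrives.
public class TrackerReceiver : MonoBehaviour
{
    private UdpClient client;
    private Thread listener;
    private volatile string latestPacket;   // last payload seen by the listener thread

    private void Start()
    {
        client = new UdpClient(5005);       // hypothetical port
        listener = new Thread(Listen) { IsBackground = true };
        listener.Start();
    }

    private void Listen()
    {
        var remote = new IPEndPoint(IPAddress.Any, 0);
        while (true)
        {
            byte[] data = client.Receive(ref remote);   // blocks until a packet arrives
            latestPacket = System.Text.Encoding.ASCII.GetString(data);
        }
    }

    private void Update()
    {
        // If latestPacket is unchanged since the previous frame, the avatar
        // simply keeps the thrust and yaw defined by that packet.
        if (latestPacket != null)
            ApplyPacket(latestPacket);
    }

    private void ApplyPacket(string payload)
    {
        // Parsing of DLC-live coordinates or FicTrac rotations would go here.
    }

    private void OnDestroy() => client.Close();
}
```

Receiving on a background thread keeps the blocking socket call off Unity's main thread, so rendering can continue at the monitor refresh rate even when tracker packets arrive at a lower or irregular rate.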
In CAVE, the object of interest (OOI) manager tracks the conditions of objects of interest, such as the distance to the tethered animal's avatar (dashed line, Figure 3). The trial manager is used to define trials, which contain the closed loop component, flanked by pre- and post-stimulus intervals in open loop (Figure 3). The sequence manager is used to combine trials, and to define the open loop stimulus shown in between (Figure 3). The usability and testing of the CAVE interface are described in the Results.
2.5 Behavioural validation, trackball
For behavioural testing of the trackball configuration, we exposed six fiddler crabs (Gelasimus dampieri) to two virtual objects: a crab walking on the ground and a flying bird. The virtual crab was placed near an invisible burrow 30 cm from the tethered crab's avatar. Burrows are not normally visible from more than 10–15 cm away due to unevenness in the mudflat surface (e.g. Zeil & Layne, 2002). The burrow was centred on one of the four monitors. The virtual crab moved cyclically 10 cm away from, and then back to, the burrow, spending 9–12 s inside the burrow and 21–35 s above it, depending on the movement speed. Movement direction and speed (0.75–1.25 cm/s) were randomized. In between cycles the virtual crab briefly descended into the burrow.
The virtual bird, represented by a 3 cm diameter black sphere, appeared randomly on one of the four surrounding monitors, with the virtual crab randomly positioned either on the monitor to the left or right of the bird. The bird could move in three different ways, but always started 200 cm from the avatar and 15° above the avatar's visual horizon (Figure 2b). During the (1) ‘threatening’ condition, the virtual bird approached the crab in a straight line with a velocity of 19.9 cm/s from the start position and stopped moving at a virtual distance of 1.5 cm from the position of the crab's avatar at the beginning of the movement. In the two ‘non-threatening’ conditions, the bird was either (2) stationary at the start position or (3) moved forwards and backwards along a 30° arc around the start position without coming closer.
We displayed these stimuli in either open or closed loop. In open loop, the crabs' movements did not influence the appearance of the virtual stimuli on the monitors, whereas in closed loop, the visual scene was updated based on the crab's position within it. Each trial was 60 s long. Both the virtual crab and bird were stationary at the start. After 5 s the bird started moving, and it kept moving for 12 s. To ensure adequate recovery time, there was a pause of at least 5 min between successive trials. We conducted six trials for each crab, presenting each bird movement condition with the virtual crab positioned on both the left and right sides. The order of these trials was randomized using 3 × 3 Latin squares to prevent order bias.
2.6 Behavioural validation, tethered flight
For behavioural validation of tethered flight, we used a flower scene with two dandelion plants placed equidistantly, 45° and 2 m from the hoverfly's start position, on a textured ground plane with a sky. Each plant was approximately 20 cm tall and 20 cm in diameter, including the leaves. We used a proximity criterion to complete each trial, defined as the avatar remaining within a 3D sphere of 40 cm radius around either dandelion for at least 1 s. The hoverfly's avatar was programmed to fly 30 cm above the ground; at this height, the proximity radius in the horizontal plane is 26 cm. Each trial had a maximum duration of 30 s. As a control, we used an empty scene without any dandelion plants, but with the same textured ground plane and sky, and a 30 s duration. In six females, we performed control (C) and dandelion (D) trials in the following order: C-C-C-C-C-C-D-D-D-D-D-C-C-D-D-D-D-D-C-C-D-D-D-D-D-C-C-D-D-D-D-D-C. The first six control trials were used for data analysis, whereas the interspersed control trials were used to control for learning and fatigue. In between each trial, we showed the empty scene rotating at 0.33 Hz in open loop. We excluded trials where the hoverfly stopped flying.
2.7 CAVE data extraction and latency measurements
We extracted the WBA difference (WBAD, defined as |WBALeft − WBARight|) and the WBA sum (WBAS, defined as WBALeft + WBARight) from the DLC-data.csv file. We extracted the resulting avatar yaw and position data using the y-rotation and x-z coordinates reported in the Transform.csv file. These files are saved automatically by CAVE.
To determine the time needed to close the DLC-live-Unity loop, we replaced the fly in the set-up with a video of a small white square changing position every 0.5 s. The change in position was associated with a larger square alternating between white and black, upon which we placed a photodiode. We trained a DLC model to track the position change of the small square. When using CAVE's ‘latency test’ option, the movement of the small square resulted in a corresponding black-to-white change on the stimulus monitor, which we recorded with a second photodiode, both photodiodes sampling at 10 kHz. To determine the effect of running CAVE in the Unity Editor, we developed a simplified version and compared the latency when displaying stimuli from the Editor with that of the compiled version.
We quantified the time to close the FicTrac-Unity loop using a mirror and camera, allowing the ball's movements and the subsequent monitor updates of a vertical line to be recorded simultaneously. We used the cross-correlation between the pixel position of the vertical line on the monitor, reflected by the mirror, and the rotation of the ball about the x-axis from FicTrac to estimate latency. The video was recorded at 60 Hz, giving a 16.7 ms resolution.
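The lag estimation can be illustrated with a short sketch, assuming mean-subtracted signals of equal length sampled at the 60 Hz video rate; the class and function names, and the search over non-negative lags only, are assumptions for illustration rather than the analysis code used here.

```csharp
// Sketch of the latency estimate: find the lag (in video frames) that
// maximizes the cross-correlation between the on-screen line position and
// FicTrac's rotation signal. Signals are assumed mean-subtracted and of
// equal length; at 60 Hz, each frame of lag corresponds to 16.7 ms.
public static class LatencyEstimator
{
    public static int EstimateLagFrames(double[] linePosition, double[] ballRotation, int maxLag)
    {
        int bestLag = 0;
        double bestScore = double.NegativeInfinity;
        for (int lag = 0; lag <= maxLag; lag++)
        {
            double score = 0;
            // The monitor signal lags the ball signal, so shift the line forwards in time.
            for (int t = 0; t + lag < linePosition.Length; t++)
                score += ballRotation[t] * linePosition[t + lag];
            if (score > bestScore) { bestScore = score; bestLag = lag; }
        }
        return bestLag; // multiply by the frame interval (16.7 ms at 60 Hz) for latency
    }
}
```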
2.8 Statistical analysis
Crab movement trajectories relative to the virtual crab and bird were analysed in R (R Core Team, 2008). Path coordinates were transformed such that virtual crabs were aligned at 0° and virtual birds at 90°. We then used linear mixed effects models using the lme4 package (Bates et al., 2015) to calculate the statistical significance of feedback condition (open or closed loop) and bird actions on distances crabs travelled away from the virtual crab and bird over the duration of the experiment. Animal ID was included as a random effect. Significance (p < 0.05) of fixed factors was determined by comparing nested models that differed by only one factor. All p-values presented were estimated by comparison with the final model (using likelihood ratio tests) that only contained significant fixed factors. The assumptions of the models were checked by exploring the distribution of the residuals (using Q–Q plots) and examining plots of the standardized residuals against the fitted values for each model.
Analysis of tethered flight data was performed in Matlab (R2021b, Mathworks), Prism 10.1.0 for Mac OS X (GraphPad Software) and R (R Core Team, 2008), with sample size (N = number of animals, n = number of trials) indicated in each figure legend. The proximity rate was defined as the proportion of trials in which the tethered hoverfly's avatar triggered the proximity criterion. From each trial, we quantified the minimum distance to either of the two dandelions, or their corresponding positions in the empty scene. The total path length (L) for each trial was defined as the vector sum of the tethered hoverfly avatar's location changes. We determined the hoverfly avatar's position 5 s into each trial. We calculated the weighted path score for the first 5 s of each trial using concentric circles around the two dandelions, or their corresponding positions in the empty scene, with diameters increasing in 0.2 m steps. We scored each frame linearly between 0 (outside the largest circle) and 7 (inside the smallest circle) and defined the weighted path score as the average across frames.
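As an illustration, a minimal sketch of this scoring is shown below, assuming seven rings with radii from 0.1 to 0.7 m and per-frame distances to the nearest dandelion (or its control position) as input; the exact ring geometry and implementation used in our analysis may differ.

```csharp
// Illustrative sketch of the weighted path score: each frame's distance is
// mapped linearly onto a 0-7 score, with 7 inside the smallest ring (0.1 m
// radius) and 0 outside the largest (0.7 m radius; seven rings whose
// diameters grow in 0.2 m steps), then averaged across frames.
public static class PathScoring
{
    public static double WeightedPathScore(double[] distances)
    {
        const double rMin = 0.1, rMax = 0.7; // assumed smallest and largest ring radii in m
        double total = 0;
        foreach (double d in distances)
        {
            // Linear between the smallest and largest ring; clamped to [0, 7] outside.
            double score = 7.0 * (rMax - d) / (rMax - rMin);
            if (score < 0) score = 0;
            if (score > 7) score = 7;
            total += score;
        }
        return total / distances.Length;
    }
}
```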
We compared path straightness, defined as D/L (Benhamou, 2004), where D was the distance between the start and end position of each trial, and L was the total path length, for those dandelion trials that triggered the proximity criterion (<30 s) and those that did not (always 30 s). Path straightness depends on flight duration, so we corrected for this before statistical comparison: As 36 dandelion trials triggered the proximity criterion, we randomly selected 36 out of the remaining 77 dandelion trials and cropped these to the same durations.
We used the Wilcoxon matched-pairs signed rank test for statistical analysis of the proximity rate, and simple linear regression to correlate path length with trial duration. For all other comparisons, we used linear mixed effects models (nlme package) to calculate the statistical significance of flower presence or of the proximity criterion being triggered. Animal ID was included as a random effect. Since the residuals were not normally distributed, we then used a permutation approach to evaluate significance by randomly permuting flower presence or proximity criterion 5000 times.
3 RESULTS
3.1 A virtual reality arena using machine vision and Unity
We developed a closed loop virtual reality (VR) by combining machine vision with the Unity gaming engine (https://unity.com). For flying insects, we used MultiMoVR (Kaushik et al., 2020) as a foundation, but updated it using Unity and DLC-live (Kane et al., 2020; Mathis et al., 2018). We filmed the flying insect from above (here an Eristalis tenax hoverfly, Figure 1a–c) using a PS3 camera (as in Kaushik et al., 2020) equipped with an infrared pass filter. The hoverfly was illuminated with infrared lights, with a musou black surface below maximizing the contrast (Figure 1b).
We trained a DLC model (Kane et al., 2020; Nath et al., 2019) to track six points on the hoverfly (coloured dots, Figure 1c). Two of these were along its longitudinal axis, and two each along the anterior edge of each wing stroke (Figure 1c). To determine the performance of DLC-live, we measured its update frequency under different video resolutions. We first fixed the spatial resolution at 240 × 320 pixels and found that at 10 and 60 Hz DLC-live ran reliably without missing any frames, but at 100 Hz 15% of the video frames were updated at 50 Hz (Figure 1d). We next fixed the spatial resolution at 480 × 640 pixels and found that at 60 Hz DLC-live ran reliably without missing any frames, whereas at 100 Hz 36% of the frames were updated at 50 Hz (Figure 1e). If high temporal fidelity is crucial, increasing the graphics card performance or reducing the temporal or spatial resolution of the video can thus be advantageous (hardware specifications, Table 2). From here on we show tethered flight data after filming at 240 × 320 pixels at 100 Hz (Figure 1d), which is below the wingbeat frequency (149–180 Hz) of Eristalis hoverflies (Walker et al., 2010).
For the trackball set-up, we recorded movement live at a resolution of 544 × 638 pixels at 60 Hz using a dorsally mounted camera and FicTrac (Figure 2). During testing, we chose this resolution and temporal frequency to allow both Unity and FicTrac to run on a single computer without skipping frames. Resolution and framerate were limited by the use of low-specification computer hardware (Table 2, Trackball arena) and could likely be improved with higher quality monitors, graphics card or CPU.
3.2 Escape responses in crabs
To test the FicTrac-Unity closed loop (Figures 2 and 3), we used tethered fiddler crabs walking on a trackball and compared their responses to a virtual crab and bird in closed and open loop (Ogawa et al., 2024). In both open and closed loop scenarios, we found that the tethered crabs' avatars ran significantly further away from the bird when it was looming (Figure 4a,e; red, Figure 4d; χ2 = 6.52; df = 2; p = 0.0385, linear mixed model). The fast-moving nature of the looming virtual bird reduced the importance of the feedback, and there was no difference in how far the crabs ran away from the looming bird between open and closed loop (red, Figure 4d; χ2 = 1.25; df = 1; p = 0.264, linear mixed model).
However, the response to the virtual crab differed between open and closed loop. In closed loop (Figure 4a–c), the experimental crab's effort to move away from the virtual crab resulted in the virtual crab visually receding into the distance, accurately replicating the dynamics of a stationary object in the environment. In contrast, in open loop the tethered crabs could not increase their distance to the virtual crab, which would therefore have appeared to follow the experimental crab as it moved (Figure 4e–h). Consequently, in open loop, the experimental crabs ran approximately twice as far from the virtual crab (Figure 4h, χ2 = 4.29; df = 1; p = 0.0383, linear mixed model), irrespective of the virtual bird's behaviour (Figure 4; χ2 = 0.213; df = 2; p = 0.899, linear mixed model). Indeed, the experimental crabs moved away from the virtual crab even when there was a larger threat of a looming bird (Figure 4e). Our data (Figure 4) thus suggest that closed loop experiments offer a more accurate simulation of non-threatening stimuli.
3.3 CAVE user interface developed for tethered flight
For tethered flight VR, the six points (Figure 1c) were used to calculate the left and right wingbeat amplitudes (WBAL and WBAR, Figure 5a,e), relative to the longitudinal axis (black line, Figure 5a,e). Note that it is important that DLC-live tracks the two points underlying the longitudinal axis robustly, as this axis is used as a ground truth (black, Figure 5a,e). We assume that if one wing has a larger WBA than the other (Figure 5a), this represents an attempted yaw turn in the opposite direction (Maimon et al., 2010). We developed a Unity Editor user interface for tethered flight, CAVE, that allows user control (grey, Figure 3). In the CAVE settings manager (grey, Figure 3), the user can define the relationship between the WBA difference (WBAD, defined as |WBAL − WBAR|) and the resulting yaw (Figure 5b), which involves an inherent trade-off between manoeuvrability and stability (Kaushik et al., 2020). For example, if the gain is high, the insect can turn rapidly, but this may result in spiralling flight. If the user wants a WBAD above or below a certain value to have less effect on the resulting yaw, the variable setting can be used (Figure 5c,d).
We used the sum of the left and the right WBA (WBAS, Figure 5e) to define the tethered animal avatar's forward thrust as a linear (Figure 5f) or variable (Figure 5h) relationship. Alternatively, thrust can be constant (Figure 5g). The gains between WBA and avatar motion should ideally replicate the flight dynamics of the species investigated, to allow for naturalistic behaviour (Kaushik et al., 2020).
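A minimal sketch of these calculations is shown below, with each wing's WBA measured as the angle between the wing's leading edge (tegula to tip) and the longitudinal axis, followed by linear yaw and thrust mappings as in Figure 5b,f. The gain values and the sign convention are illustrative assumptions; in CAVE, the gains are set in the settings manager rather than hard-coded.

```csharp
using UnityEngine;

// Illustrative sketch (not CAVE's actual code) of the pipeline in Figure 5:
// each wing's WBA is the angle between its leading edge and the body's
// longitudinal axis; the WBA difference drives yaw and the WBA sum drives
// forward thrust, here with purely linear gains.
public class WbaAvatarController : MonoBehaviour
{
    public float yawGain = 2f;       // assumed: deg/s of yaw per deg of WBA difference
    public float thrustGain = 0.01f; // assumed: m/s of thrust per deg of WBA sum

    // All arguments are 2D image coordinates of the points tracked by DLC-live (Figure 1c).
    public void ApplyFrame(Vector2 thorax, Vector2 abdomen,
                           Vector2 tegulaLeft, Vector2 tipLeft,
                           Vector2 tegulaRight, Vector2 tipRight)
    {
        Vector2 bodyAxis = abdomen - thorax;  // longitudinal axis, used as ground truth
        float wbaLeft = Vector2.Angle(tipLeft - tegulaLeft, bodyAxis);
        float wbaRight = Vector2.Angle(tipRight - tegulaRight, bodyAxis);

        // A larger left WBA is interpreted as an attempted turn to the right.
        float yawRate = yawGain * (wbaLeft - wbaRight);            // deg/s
        float thrust = thrustGain * (wbaLeft + wbaRight);          // m/s

        transform.Rotate(0f, yawRate * Time.deltaTime, 0f);        // yaw about the vertical axis
        transform.Translate(Vector3.forward * (thrust * Time.deltaTime)); // forward thrust
    }
}
```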
To validate that CAVE executes the yaw and thrust calculations for the avatar movement in the virtual world correctly, we created a video (Movie S2) where we varied the WBAL and the WBAR of a stylized insect (Figure 5a,e). The graphs show the resulting WBAD (Figure 5i(i)) and WBAS (Figure 5j(i)) as a function of time. The WBAD and WBAS as extracted by CAVE (Figure 5i,j(ii)), and the resulting yaw (Figure 5i(iii)) and forward thrust (Figure 5j(iii)) of the avatar, after using the linear settings described above (Figure 5b,f), confirm that this was done correctly. The resulting trajectory seen from above (Figure 5k) highlights that, due to the mismatch between the sampling frequency of the input video (100 Hz) and the refresh rate of the monitors (165 Hz), the avatar's positions are not always evenly spaced (inset, Figure 5k).
3.4 CAVE experimental design
CAVE provides a user interface for designing experiments (grey, Figure 3). Before each experiment, the user chooses a scene and populates it with objects (Figure 6a). These can be scaled as desired (here scaled to have the same size, Figure 7a,b), and defined as objects of interest (OOI, Figure 6a), such as a stationary flower (yellow and red, Figure 7a,b) or a moving insect or predator (brown and black, Figure 7a,b). The user defines each OOI's initial behaviour, its encounter behaviour, and what constitutes an encounter (Figure 6a). For example, a flower can appear when the tethered animal's avatar is within a certain distance, or a predator can follow the tethered animal's avatar's motion at a defined speed.
Next, the user creates a trial, which is flanked by defined open loop pre- and post-stimulus times (Figures 3, 6b and 7c), which can be used to, for example, habituate the insect to the virtual surround or entice flight using the optomotor response. The trials are combined with interventions to, for example, move an OOI to a new location if the tethered animal's avatar comes within a certain distance (Figure 6b). Each trial ends when a proximity criterion is fulfilled or after a fixed duration. For example, here we have defined a proximity criterion as a 40 cm distance from the OOI (a dandelion, dashed circles, Figures 3 and 7c) maintained for at least 1 s. In trial 1 (Figure 7c), the avatar fulfilled the proximity criterion, whereas in trial 2 it never came close enough to the flower, and thus the trial continued for the defined duration. Trial 3 (Figure 7c) was designed to run for a fixed duration.
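A trial-ending proximity criterion of this kind can be sketched as follows, using the 40 cm radius and 1 s hold time from the example above; the field names and the Debug.Log placeholder are illustrative, not CAVE's actual implementation.

```csharp
using UnityEngine;

// Sketch of a trial-ending proximity criterion: the criterion is met once
// the avatar has remained within `radius` of the object of interest for
// `holdTime` seconds. Field names are illustrative, not CAVE's actual API.
public class ProximityCriterion : MonoBehaviour
{
    public Transform avatar;
    public Transform objectOfInterest;
    public float radius = 0.4f;      // m, as in the dandelion example above
    public float holdTime = 1f;      // s

    private float insideSince = -1f; // time the avatar entered the radius; -1 while outside

    private void Update()
    {
        bool inside = Vector3.Distance(avatar.position, objectOfInterest.position) <= radius;
        if (!inside)
        {
            insideSince = -1f;       // reset the timer whenever the avatar leaves
            return;
        }
        if (insideSince < 0f)
            insideSince = Time.time;
        if (Time.time - insideSince >= holdTime)
            Debug.Log("Proximity criterion met - end the trial here");
    }
}
```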
Trials can be put together into a sequence (Figures 3 and 6c). By using interpolations, an OOI can, for example, be placed at three evenly spaced distances within the trials of a sequence (flower, Figure 7c). Several sequences can be put together before the experiment (Figure 7d). The user defines what happens between sequences, in this example an open loop rotation (Figure 7d). This open loop stimulus ends when the user presses Next to initiate the next sequence.
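The interpolated OOI placement can be sketched as a simple linear interpolation across the trials of a sequence; the function below is illustrative, with hypothetical user-supplied start and end positions.

```csharp
using UnityEngine;

// Sketch of interpolated OOI placement across a sequence: trial 0 uses the
// start position, the last trial the end position, and intermediate trials
// are evenly spaced in between.
public static class OoiInterpolation
{
    public static Vector3 PositionForTrial(Vector3 start, Vector3 end, int trialIndex, int trialCount)
    {
        // With three trials, this yields positions at 0%, 50% and 100% of the way.
        float t = trialCount > 1 ? trialIndex / (float)(trialCount - 1) : 0f;
        return Vector3.Lerp(start, end, t);
    }
}
```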
During the experiment (Figure 6e), each trial's data are continuously saved as lists. When a trial ends, the most recent lists are saved as CSV files (Figure 6f), and the previous lists cleared to save memory. The Sequence and Trial files contain all experimental settings described above. The DLC-Data file contains frame-by-frame data from DLC-live (Figures 3 and 6f). The Transform files contain rotation and position data for the tethered hoverfly's avatar (used in e.g. Figure 5k), as well as for each OOI. This will, for example, include position and rotation data for dynamic OOIs (Figure 7d).
3.5 Hoverfly navigation
To test whether hoverflies navigate in the CAVE controlled VR, we used an empty scene (Figure 7a without the OOIs), as well as one with two OOIs, each a yellow dandelion plant (Figure 8a,b). We used linear settings for yaw and thrust (see Figure 5b,f). We found that the hoverflies' avatars flew towards the yellow dandelions, with a median proximity rate of 0.25 (dark pink, Figure 8b,c, Movie S3). When the proximity criterion was not fulfilled, the trials lasted for 30 s (e.g. pale pink, Figure 8b). For the control empty scene without dandelions, we analysed post hoc whether the hoverflies would have fulfilled the proximity criterion (solid circle, dark green, Figure 8a), and found a median proximity rate of 0.1667 (Figure 8c, p = 0.0312, Wilcoxon matched-pairs signed rank test). We next quantified the minimum distance to either dandelion, or their corresponding positions in the empty scene, for each trial, and found that it was significantly smaller when the dandelions were present (0.44 ± 1.8 m, median ± range) than when they were not (0.77 ± 1.8 m, p = 0.0062, permutation test, Figure 8d). We quantified the path length of each trial as a function of trial duration, and found that it was directly proportional to trial duration (Figure 8e, R2 = 0.955, simple linear regression), indicating that the hoverflies did not modulate their flight speed substantially.
We next examined the position of the hoverfly's avatar 5 s into each trial. A qualitative inspection of the data suggests that in dandelion trials, the hoverflies were closer to the dandelions (pink, Figure 8f) than to their corresponding locations in the empty scene (green, Figure 8f). We quantified this by determining the hoverflies' locations in the first 5 s of each trial within concentric circles with diameters increasing in 0.2 m steps (Figure 8f). We quantified the weighted path score, which was 0.43 ± 2.4 (median ± range) in control trials and 0.51 ± 5.5 in the dandelion trials (pink vs. green, Figure 8g, p < 0.0001, permutation test). Taken together (Figure 8a–g), this suggests that the hoverflies navigated to the flowers when these were present, compared with their corresponding locations in the empty control scene.
To test if the flights were different when the proximity criterion was triggered compared with when it was not (pale vs. dark pink, Figure 8b,d–g), we looked at the dandelion trials. We expected the paths to be straighter if the flight towards the flowers was directed. We found that the paths were significantly straighter when the proximity criterion was triggered compared with when it was not (median 0.1096 and 0.2810, respectively, Figure 8h, p < 0.0001, permutation test), suggesting that the hoverflies were performing a directed flight towards the virtual dandelions when triggering the proximity setting.
3.6 Closing the loop
We next investigated how long each step in the tethered flight loop-closing process takes (Figures 1a and 9a). By using the DLC-live data reported in the DLC-Data file (Figure 6f), we extracted the time between each new frame update from the video camera. As the camera filmed at 100 Hz, we expected this to be 10 ms. However, we found that 14% of the frames had a 20 ms delay, and 0.8% a 30 ms delay (11.5 ± 3.8 ms, mean ± std., ‘Video’, Figure 9b, see also Figure 1d). It took DLC-live 15.9 ± 3.4 ms (mean ± std., Figure 9b) to perform the markerless pose estimation of the six points (see Figure 1c). Unity used 3.7 ± 2.0 ms (mean ± std., Figure 9b) to calculate the WBA, and the resulting yaw and thrust movements of the avatar.
We next measured the time it took to close the loop, from the ‘hoverfly’ to the monitors (blue, Figure 9a), using photodiodes and found this to be 47.6 ± 7.1 ms (mean ± std., blue, Figure 9a,b), which is longer than the sum of the other three components (pink-scale, Figure 9a,b). This suggests that the video camera does not immediately send each frame to DLC-live, that there is a delay between Unity drawing the frame and this being shown on the monitors, or a combination of the two (stars, Figure 9a). As CAVE is designed to run in the Editor, which could introduce delays, we quantified the latency of a scaled-down version and found that the compiled version was significantly quicker (35.5 ± 5.2 ms, vs. 47.6 ± 6.6 ms, mean ± std., Figure 9c, p < 0.0001, unpaired t-test). However, tracking more points, showing the DLC-live video on the monitor, or simultaneously recording the video, did not make the process slower (Figure 9d).
We measured the time it took to close the FicTrac-Unity loop using a video recording at 60 Hz. The peak in the cross-correlation between the pixel position of a vertical line on the monitor and the rotation of the ball about the corresponding axis from FicTrac occurred at a lag of a single frame. Given the 16.7 ms temporal resolution of recording at 60 Hz, this indicates that the loop was effectively closed within one to two frames, that is, within 16.7–33.3 ms.
4 DISCUSSION
We here present a flexible VR system developed by combining the gaming engine Unity with machine vision. We validated this with crabs walking on a trackball whose movements were recorded with FicTrac (Figures 2 and 4), and with tethered hoverflies whose wing movements were analysed with DLC-live (Figures 1 and 8). In both cases, the actions of the tethered animal were used to control the movements of an avatar in a simulated environment (Figure 3). We also developed a Unity user interface, CAVE, which allows easy control of yaw and thrust gain settings (Figure 5), experimental design (Figures 6 and 7), and post-experiment analysis (Figures 6 and 8). We show that the CAVE loop is closed in around 50 ms (Figure 9).
4.1 Behavioural validation
We validated our virtual reality systems using crabs walking on a trackball (Figures 2 and 4) and hoverflies flying on a fixed tether (Figures 1 and 8). Although conducting matching experiments in unrestrained environments is challenging, the behaviour of fiddler crabs escaping from approaching threats is well-documented. For instance, crabs in natural environments that are not associated with a burrow, or crabs on a standard treadmill, run away in the direction opposite to the threat when approached (Donohue et al., 2022; Hemmi & Tomsic, 2012; Land & Layne, 1995; Nalbach, 1990). In contrast, crabs with burrows flee to them when approached by predators (e.g. Hemmi, 2005). Our FicTrac-Unity system, simulating a looming bird, or a virtual crab that appeared to follow the tethered crab in open loop, elicited similar escape behaviours (Figure 4), supporting the behavioural validity of our observations.
Hoverflies are generalist pollinators, visiting a range of flowering plants (Doyle et al., 2020). Even though they can fly very fast when pursuing conspecifics or territorial intruders (Collett & Land, 1978; Thyselius et al., 2023), when feeding they fly between flowers at 0.2–0.3 m/s (Golding et al., 2001; Thyselius et al., 2018), similar to the speeds observed here (Figure 8e). When flying between flowers, there is a 10°–80° difference between their orientation at departure and arrival (Gilbert, 1983), suggesting that the tortuous paths recorded here (Figure 8b) are realistic. Furthermore, the birds-eye view of Eristalis tenax flight trajectories between flowers in the field (Golding et al., 2001) is remarkably similar to what we observed (Figure 8a,b). Thus, even though our set-up lacked the olfactory cues that are important for flower recognition (Nordström et al., 2017), and the virtual flowers provided no sugar reward, the hoverflies were able to navigate towards them in the virtual reality.
Importantly, while many previous arenas allow for real-time closing of the loop based on the animal's spatial position (e.g. Cruz et al., 2021; Frasnelli et al., 2018; Haberkern et al., 2019; Pokusaeva et al., 2023; Stowers et al., 2017), it can be difficult to analyse finer behaviours offline when the animals are moving. This matters because many flying insects move the head relative to the body (Cellini et al., 2022; Talley et al., 2023), the antennae move actively (Sant & Sane, 2018), and the legs might perform landing movements (Shen & Sun, 2017). By tethering the animal in space (Figures 1a and 2), our approach allows for such analyses post experiment.
4.2 Low delays for closing the loop
To make the VR immersive it is important to close the loop with low delays, and to quantify these delays with photodiodes or similar. In humans, for example, cybersickness is induced when delays are longer than ca. 60 ms, and the feeling of body ownership is reduced when delays are above 100 ms (see e.g. Brunnström et al., 2020; Caserman et al., 2019). However, walking Drosophila appear to be able to handle much longer delays than this in some situations (Schuster et al., 2002).
Other recent VR systems report comparable delays: for example, a system that closed the loop based on centroid tracking of walking Drosophila reported a 40 ms delay (Tadres & Louis, 2020); a Raspberry Pi Virtual Reality (PiVR) for optogenetic stimulation reported a 30 ms delay; FlyVR has a latency of 40–80 ms (Stowers et al., 2014; Straw et al., 2011); and a similar system for bumblebees reported a 50 ms delay (Frasnelli et al., 2018). Our measured delays were around 50 ms for tethered flight (blue, Figure 9b), but substantially shorter for the FicTrac-Unity configuration (17–33 ms). In VR, the total latency comes from three major components: tracking, rendering, and display (Stowers et al., 2014). It is likely that FicTrac-Unity is faster in part because tracking the movement of patterns on the trackball's surface (Movie S1) requires less processing, whereas DLC-live (Kane et al., 2020) uses a neural network model running on the GPU to track wing movements (Figure 1a,c), which is processing-intensive and takes about 16 ms to complete (DLC-live, Figure 9b). There are, however, other differences between the FicTrac-Unity and DLC-live-Unity configurations: the two systems use different hardware (Table 2), and conventional cameras and computer monitors have latencies that can be difficult to assess (Stowers et al., 2014). In addition, the FicTrac-Unity experiments were run in a compiled version of Unity, while CAVE used the Unity Editor, which slows down closing the loop (Figure 9c).
If short delays are important, it could be possible to optimize the flight configuration with a magnetic tether as this allows unrestricted yaw turns (Duistermars & Frye, 2008), similar to our crab configuration (Figure 2). This could allow the hoverfly to perform fast saccades, while the avatar's translational motion would be controlled by Unity. Importantly, however, we show that the current configurations allow hoverflies to navigate towards flowers (Figure 8) and crabs to respond appropriately to other crabs (Figure 4).
4.3 Usability
Our VR system using Unity with machine vision (DLC-live or FicTrac) is highly adaptable to different systems or species, as highlighted by our work using hoverfly tethered flight (Figure 8) and crab trackball experiments (Figure 4). Unity can be run by itself (Figures 2–4), or with our user interface, CAVE, which provides a version that requires no coding to get started. CAVE allows the user to design visual experiments consisting of trials that can be put together into sequences (Figures 3, 6 and 7), and provides easy access to yaw and thrust gain settings (Figure 5). Importantly, all data are stored in CSV files for offline analysis (Figure 6). This allows the user to combine detailed knowledge of the position of each OOI and the tethered animal's avatar at each point in time (see e.g. Figures 5i–k and 8a,b) with offline analysis of the DLC movies, to quantify e.g. leg or head movements (Figure 1c). Our GitHub repositories include not only user manuals, instruction videos and other documentation, but also starting scenes, objects and OOIs, to get the novice user started.
While the supplied CAVE interface contains many useful tools, future improvements could, for example, include a repository of more naturalistic scenes from a range of habitats (Tolhurst et al., 1992). Importantly, Unity allows the user to create complex scenes with appropriate perspective correction, shadows and other 3D cues, which can be difficult to implement using, for example, LED panels. The monitors used here (Figures 1a and 2a) are developed for human vision, and thus do not cover the correct colour space for insects. In contrast, the Antarium (Kócsi et al., 2020) was developed to be optimized for ant vision. As such, it could be beneficial to use a digital display that is better matched to insect vision and updated at a higher rate. We hope that by making our interface available on GitHub, others may contribute to such future development.
AUTHOR CONTRIBUTIONS
Yuri Ogawa: Methodology, validation, formal analysis, investigation, writing—review and editing, visualization, supervision; Raymond Aoukar: Conceptualization, methodology, software, validation, formal analysis, investigation, writing—review and editing, visualization; Richard Leibbrandt: Conceptualization, methodology, software, writing—review and editing; Jake S. Manger: Methodology, software, validation, formal analysis, investigation, writing—review and editing, visualization; Zahra Bagheri: Methodology, validation, formal analysis, investigation, supervision, writing—review and editing; Luke Turnbull: Methodology, validation, investigation, writing—review and editing; Chris Johnston: Methodology, software, writing—review and editing; Pavan K. Kaushik: Conceptualization, methodology, software, writing—review and editing; Jaxon Mitchell: Software, validation, writing—review and editing; Jan M. Hemmi: Validation, resources, writing—review and editing, supervision, project administration, funding acquisition; Karin Nordström: Conceptualization, validation, resources, writing—original draft, visualization, supervision, project administration, funding acquisition.
ACKNOWLEDGEMENTS
We thank Biomedical Engineering at SAHLN and the Botanic Gardens of Adelaide for their ongoing support. We thank Charlotte Goh for helping with running crab behavioural experiments and Juan Francisco Guarracino for helping with crab collection. This research was funded by the US Air Force Office of Scientific Research (AFOSR, FA9550-19-1-0294 and FA9550-23-1-0473), the Australian Research Council (ARC, DP180100491, FT180100289, DP200102642, DP210100740 and DP230100006) and the Flinders Foundation.
CONFLICT OF INTEREST STATEMENT
Raymond Aoukar is the Director of Ibelin, which provides software consulting. The other authors declare no competing interests.
Open Research
PEER REVIEW
The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/2041-210X.14449.
DATA AVAILABILITY STATEMENT
Data and software for analysis are available via https://doi.org/10.5061/dryad.83bk3jb01 (Ogawa et al., 2024). The FicTrac-Unity interface can be found at GitHub: https://github.com/jakemanger/fiddlercrabvr. The CAVE interface, user manuals, user videos and other documentation are available via GitHub: https://github.com/HoverflyLab/CAVE_TetheredFlightArena.