CamoGAN: Evolving optimum camouflage with Generative Adversarial Networks
Abstract
- One of the most challenging issues in modelling the evolution of protective colouration is the immense number of potential combinations of colours and textures.
- We describe CamoGAN, a novel method to exploit Generative Adversarial Networks to simulate an evolutionary arms race between the camouflage of a synthetic prey and its predator.
- Patterns evolved using our methods are shown to provide progressively more effective concealment and outperform two recognized camouflage techniques, as validated by using humans as visual predators.
- We believe CamoGAN will be highly useful, particularly for biologists, for rapidly developing and testing optimal camouflage or signalling patterns in multiple environments.
Foreign Language Abstract (Hungarian)
- One of the greatest challenges in modelling the evolution of protective colouration is the immense number of possible combinations of colours and textures.
- This paper presents CamoGAN, a method based on Generative Adversarial Networks that can simulate an evolutionary arms race between a synthetic prey and its predator.
- We found that patterns evolved with this method provide progressively more effective concealment and outperform two common camouflage techniques, as confirmed by an experiment using humans as visual predators.
- We are convinced that CamoGAN will be a highly useful method, particularly for biologists, for rapidly developing and testing optimal camouflage and signalling patterns in multiple environments.
1 INTRODUCTION
Historically, camouflage has been considered a prominent example of an evolutionary arms-race between prey and predators (Dawkins & Krebs, 1979), whereby one species gradually evolves harder-to-see colouration which, as a consequence, exerts evolutionary pressure on the other species for a more effective detection system (Stankowich & Coss, 2007). Despite the expectation that camouflage will become progressively more effective, it has been challenging to model how the evolution of optimal camouflage might take place in a particular environment (Merilaita, Scott-Samuel, & Cuthill, 2017). This problem has inspired biologists for centuries, ever since Erasmus Darwin claimed that ‘the colours of many animals seem adapted to their purposes of concealing themselves either to avoid danger, or to spring upon their prey’ (Darwin, 1794).
In recent years, research has predominantly focused on testing the advantage of particular camouflage strategies using predefined patterns designed by the experimenter (Troscianko, Skelhorn, & Stevens, 2017). Although these studies are able to provide strong evidence that certain camouflage works better than others, they have limited power to explain what would be the optimum pattern for concealment. One of the challenges is simply the number of potential patterns in a complex visual environment: the parameter space for all possible colour and texture combinations is often gigantic (Fennell, Talas, Baddeley, Cuthill, & Scott-Samuel, 2019).
One solution to this problem is to employ dynamically evolving stimulus sets in detection experiments. Bond and Kamil presented blue jays with greyscale digital moths on computer screens, with birds trained to peck on detected prey items (Bond & Kamil, 2002). The digital moths evolved on the basis of predetermined ‘genes’. While this approach was effective, with survival improving over generations, manually encoding genes for a specific task limits generalizability: for example, increasing the parameter space beyond a certain complexity (using colour rather than greyscale, say) makes testing live subjects unrealistic because of the number of trials required. However, putting a credible artificial observer into the evolutionary loop would circumvent this problem.
Recently, methods that stem from Artificial Intelligence have proved capable of deceiving human observers: deep neural networks can mimic fine art (Gatys, Ecker, & Bethge, 2015) or create photorealistic images based on text descriptions (Zhang et al., 2017). Here, we report CamoGAN, an unsupervised method to create biologically-relevant camouflaged stimuli based on Generative Adversarial Networks (GANs) (Goodfellow et al., 2014). GANs employ competing agents, usually modelled as deep neural networks, to perform a zero-sum game. In their original example, Goodfellow and colleagues illustrated the underlying idea of GANs using a competition between police and a counterfeiter. The objective of the police (discriminative network) was to distinguish between counterfeit and real money, whilst the counterfeiter (generative network) aimed to produce counterfeit money that the discriminative network would falsely identify as real. Both agents evolved over time: the police became more sensitive to fake money, while the counterfeiter produced more and more authentic-looking forgeries. As pointed out by Goodfellow et al., if such a pair of strategies exists, the two systems will, over time, stabilize at a so-called Nash equilibrium. In a Nash equilibrium, given two agents with complete knowledge of their opponent's strategy, neither can improve by changing its own strategy. Nash equilibria often correspond to evolutionarily stable strategies, as proposed in evolutionary game theory (Maynard Smith & Price, 1973). The arms race between a counterfeiter and the police mirrors that between antagonistic agents, like predator and prey, and is therefore of inherent biological interest.
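Formally, following Goodfellow et al. (2014), the discriminator D and the generator G play a two-player minimax game over the value function

\[
\min_{G}\,\max_{D}\; V(D,G) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right] \;+\; \mathbb{E}_{z \sim p_{z}}\!\left[\log\!\left(1 - D\!\left(G(z)\right)\right)\right],
\]

where x is a real sample (here, an empty bark scene), z is the random input to the generator, and D(·) is the discriminator's estimated probability that its input is real. In the camouflage setting, G(z) corresponds to a scene containing a generated target.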
In particular, predators evolve, or learn, to locate camouflaged prey by detecting them against some background, while prey evolve to remain undetected, using camouflage strategies (e.g. background matching, disruptive colouration) to lower their signal-to-noise ratio (the visual features of the target relative to factors that interfere with extraction of the signal) against the background (Cuthill, 2019; Merilaita et al., 2017). In other words, the objective of the predator is to distinguish visual input that contains prey from scenes that are devoid of prey (Figure 1). Meanwhile, the prey's aim is a visual signature that makes a scene containing it look empty to a predator. In this analogy, the discriminative network can be thought of as the visual system of the predator, which evolves over time to detect prey more effectively, and the generative network represents the genotype of the prey, whereby new generations inherit properties of previous survivors and exhibit better camouflage.

To model the evolution of camouflage and demonstrate the effectiveness of CamoGAN in producing increasingly difficult-to-see patterns, we conducted a validation study. In our experiments, targets were triangles presented against images of ash tree (Fraxinus excelsior) bark, a complex texture (Figure 1). Targets were extracted from each network after a set number of iterations and contrasted with two control patterns: the average colour of the backgrounds, and a pattern developed through Fourier analysis (Figure 2). Averaging the background is considered to offer ‘good’ concealment (Merilaita & Stevens, 2011) and, as in our study, is often used in camouflage research as a baseline control (Cuthill et al., 2005). We adopted the Fourier approach because it has previously been shown to be highly effective for developing (military) camouflage against human observers (Toet & Hogervorst, 2012). As such, it is a straightforward, well-defined and readily implementable technique. To quantify detection difficulty, we measured the reaction time for human participants to detect the targets when displayed on a computer screen.

It is important to note that, contrary to other GAN implementations (e.g. Zhu, Park, Isola, & Efros, 2017), where the generative network modifies a whole image, in our implementation only the target was evolved by the generative network, leaving the background unmodified. Using this approach, we show that a purely artificial system can reproduce the gradual evolution of camouflage.
2 MATERIALS AND METHODS
2.1 Participants
45 participants (4 male, 41 female) were recruited from the student population at the University of Bristol. The sample size of 45 is a multiple of the number of generated ‘strains’ of GAN targets (see below). All participants had normal or corrected-to-normal vision. Informed consent was obtained from all participants in accordance with the Declaration of Helsinki. All experiments were approved by the Ethics Committee of the University of Bristol's Faculty of Science (application 60061) and were performed in accordance with relevant guidelines and regulations.
2.2 Stimulus construction and neural network parameters
We took photographs of the bark of 100 individual ash trees (Fraxinus excelsior) in October 2017 at Ashton Court Estate, Bristol, UK (2.648° W, 51.446° N). All images were taken on overcast days at around noon in a forest environment where trees were typically 3–5 m apart. No images were taken in direct sunlight or within a cast shadow. Images were taken from a distance of 1 m and a focal length of 18 mm using a Nikon D90 DSLR camera. Photographs contained an X-Rite ColorChecker Passport (X-Rite Inc.), which was used to standardize images to sRGB colour space using a cubic transformation function implemented in matlab 2015b (The MathWorks, Inc.). Images were cropped so they only contained tree bark and resized so that 1 pixel equalled 1.5 mm, using cubic interpolation.
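As an illustration of the standardization step, a per-channel cubic fit between the photographed ColorChecker patches and their sRGB reference values might look as follows in NumPy. The exact form of the cubic transformation used in MATLAB is not specified above, so this sketch is an assumption:

```python
import numpy as np

def fit_cubic_correction(measured, reference):
    # measured, reference: (24, 3) arrays of mean ColorChecker patch
    # values in [0, 1]; returns cubic coefficients per channel
    # (highest power first, as returned by np.polyfit).
    return [np.polyfit(measured[:, c], reference[:, c], deg=3)
            for c in range(3)]

def apply_correction(image, coeffs):
    # Apply the fitted per-channel cubic to an (H, W, 3) image.
    corrected = np.stack([np.polyval(coeffs[c], image[..., c])
                          for c in range(3)], axis=-1)
    return np.clip(corrected, 0.0, 1.0)
```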
Image size for the networks was selected to be 256 × 256 pixels, while the target triangle size was 32 × 64 pixels. Networks were trained on a custom-built PC with two graphical processing units (1× Nvidia Titan X Pascal and 1× Nvidia GeForce GTX 1080 Ti) using the open-source neural network library keras 2.0.8 (Chollet et al., 2015) with a tensorflow 1.4 backend, written in python 3.6 (Python Software Foundation). The size of the training set was 3,200 images, which comprised 32 randomly selected crops from each of the 100 images of ash bark.
The discriminative network was set to distinguish between empty scenes of tree bark and scenes with a target triangle present in the middle. To create effective camouflage, the task of the generative network was to modify the colour and patterns of target triangles over randomly selected backgrounds so that the discriminative network would identify them as empty images. The discriminator was a convolutional neural network (CNN). CNNs contain convolution layers, which allow the network to learn feature maps that code for particular visual properties (e.g. spatial frequency or orientation) and form complex interactions between these layers. The generator network contained transposed convolution layers, which can perform convolution in the backward direction, i.e. creating a two-dimensional image rather than decomposing it (Goodfellow, Bengio, & Courville, 2016).
We adapted the architecture of a simple GAN used to generate convincing examples of the MNIST database (Atienza, 2017). The architecture of the discriminative network was: Conv2D(64), MaxPooling2D(2,2), Conv2D(128), MaxPooling2D(2,2), Conv2D(256), Conv2D(512), Flatten, Dense(1) and a sigmoid activation function to obtain predictions. All Conv2D layers had LeakyReLU activations with an alpha of 0.2 and ‘same’ padding. Dropout was set to 0.5 for all Conv2D layers. The architecture of the generative network was: Dense(8192, with dropout of 0.6), BatchNormalization, Dense(4096), BatchNormalization, Reshape(64,32,2), Conv2DTranspose(4,3), BatchNormalization, Conv2DTranspose(3,3) and a sigmoid activation function to normalize pixel values between 0 and 1. All batch normalization layers had a momentum of 0.9 and Conv2DTranspose layers had padding set to ‘same’. The input to the generative network was 100 uniformly sampled random numbers between 0 and 1. In order to train the generative network, the generative and discriminative networks were linked together into an ‘adversarial network’ whose loss was calculated from how many images (with targets) the discriminative network misclassified as empty. Networks were trained for 10,000 steps with a batch size of 32. The RMSprop optimiser was used for both the discriminative and adversarial networks, with learning rates of 2 × 10⁻⁴ and 1 × 10⁻⁴, and decays of 6 × 10⁻⁸ and 3 × 10⁻⁸, respectively. Binary cross-entropy was used as the loss function; it is commonly used for binary classification in neural networks (in this case, whether the scene was empty or not) and calculates the log probability of an observation belonging to a particular class (Goodfellow et al., 2016). Ten networks were trained in total, with 15 evolved targets (strains) extracted after 500, 2,500, 5,000, 7,500 and 10,000 training steps from each network (Figure 2), resulting in a total of 750 GAN-derived targets.
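A minimal sketch of the two architectures in Keras 2 (TensorFlow backend) is given below. Layer widths, activations and optimizer settings follow the description above; kernel sizes, strides and the exact ordering of pooling and dropout are not fully specified in the text and are therefore assumptions. The maintained implementation (see Data Availability) is authoritative.

```python
# Sketch only: layer widths follow the text; unspecified details
# (kernel sizes, strides, pooling/dropout order) are assumptions.
from keras.models import Sequential
from keras.layers import (Activation, BatchNormalization, Conv2D,
                          Conv2DTranspose, Dense, Dropout, Flatten,
                          LeakyReLU, MaxPooling2D, Reshape)
from keras.optimizers import RMSprop

def build_discriminator(input_shape=(256, 256, 3)):
    model = Sequential()
    for filters, pool in [(64, True), (128, True), (256, False), (512, False)]:
        if not model.layers:
            model.add(Conv2D(filters, 3, padding='same', input_shape=input_shape))
        else:
            model.add(Conv2D(filters, 3, padding='same'))
        model.add(LeakyReLU(0.2))
        model.add(Dropout(0.5))
        if pool:
            model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))   # P(scene is empty)
    model.compile(optimizer=RMSprop(lr=2e-4, decay=6e-8),
                  loss='binary_crossentropy', metrics=['accuracy'])
    return model

def build_generator(latent_dim=100):
    model = Sequential()
    model.add(Dense(8192, input_dim=latent_dim))
    model.add(Dropout(0.6))
    model.add(BatchNormalization(momentum=0.9))
    model.add(Dense(4096))
    model.add(BatchNormalization(momentum=0.9))
    model.add(Reshape((64, 32, 2)))             # 64 x 32 x 2 = 4,096
    model.add(Conv2DTranspose(4, 3, padding='same'))
    model.add(BatchNormalization(momentum=0.9))
    model.add(Conv2DTranspose(3, 3, padding='same'))
    model.add(Activation('sigmoid'))            # pixel values in [0, 1]
    return model
```

The adversarial (stacked) model used to train the generator would composite the generated 64 × 32 texture onto a random 256 × 256 background before passing it to the discriminator; that compositing step is specific to CamoGAN and is omitted here for brevity.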
In addition to the GAN targets, we included two control treatments: ‘Fourier’ and ‘Average’. These were constructed as follows. Initially, 32 randomly positioned squares (sized 256 × 256 pixels) were cropped from each of the 100 images of tree bark. ‘Fourier’ targets were constructed by decomposing the 3,200 crops into energy and phase using a two-dimensional Fourier transformation, then taking the pixel-wise average energy across the images. Fifteen targets were created by randomizing the phase of each and, after an inverse Fourier transformation, the resulting images were indexed with 32 quantized colours obtained via minimum variance quantization and dithering of the original crops (Toet & Hogervorst, 2012). Accordingly, the final images featured the average spatial frequency and colours of the original ash bark images. The ‘Average’ targets were created by taking the average colour of the same 3,200 crops. Targets were then created by cropping a 32-pixel-high by 64-pixel-wide triangle from the images. Both processes were repeated ten times and the resulting targets were grouped together with the GAN-derived targets, giving seven treatment groups in total.
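The core of the ‘Fourier’ construction (average amplitude spectrum, randomized phase, inverse transform) can be sketched as follows. The sketch works on luminance images and omits the final 32-colour quantization and dithering step described above:

```python
import numpy as np

def fourier_target(crops, rng=np.random):
    # crops: (N, 256, 256) array of luminance crops in [0, 1].
    spectra = np.fft.fft2(crops, axes=(-2, -1))
    mean_amplitude = np.abs(spectra).mean(axis=0)   # pixel-wise average energy
    phase = rng.uniform(0.0, 2.0 * np.pi, size=mean_amplitude.shape)
    # Random phase breaks Hermitian symmetry, so keep the real part.
    image = np.real(np.fft.ifft2(mean_amplitude * np.exp(1j * phase)))
    # Rescale to [0, 1] before colour quantization.
    return (image - image.min()) / (image.max() - image.min())
```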
2.3 Experimental procedure
A bespoke program, written using the Psychtoolbox-3 extensions (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007; Pelli, 1997) for matlab 2015b (The MathWorks, Inc.), was used to construct and present the stimuli and to collect experimental data. Each experimental trial consisted of a single target presented at a random position on a randomly selected image of ash tree bark (Figure 3) on a gamma-corrected computer display (Iiyama). The background images were 512 × 1,024 pixels and subtended a visual angle of 26.5° × 53°. Targets had a size of 64 × 32 pixels, subtending a visual angle of 5.5° × 2.75°. A central fixation cross on a mid-grey background was displayed for 2 s prior to stimulus onset. To prevent targets from being spotted immediately because of proximity to the fixation cross, each target was placed at least 64 pixels away from the centre of the screen.
Participants were required to click on the detected target as quickly and accurately as possible, using a computer mouse. Their reaction times, and whether they hit the target, were recorded. Each stimulus was visible for a maximum of 10 s, after which the experiment moved on to the next trial. Timed-out trials and missed targets were removed from the results.
Each participant was randomly assigned to a single strain of targets: one target from each of the five GAN treatment groups for each of the 10 networks (50 targets), plus 10 targets from each of the two control groups (20 targets). Each of these 70 targets was presented five times in a random order, totalling 350 trials; a trial list of this kind is sketched below. Each of the 15 strains was presented to exactly three participants. In addition to the experimental trials, 10 practice trials using targets of a single random colour were presented at the beginning of the experiment to familiarize participants with the task.
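As a concrete illustration, the trial list for one participant could be generated as in the following sketch (the target identifiers here are hypothetical tuples, not the authors' data structures):

```python
import random

STEPS = (500, 2500, 5000, 7500, 10000)
strain = 7   # hypothetical strain assigned to this participant

# 50 GAN targets: one per (network, extraction point) for this strain.
gan = [('gan', net, steps, strain)
       for net in range(10) for steps in STEPS]
# 20 control targets: 10 'Fourier' and 10 'Average'.
controls = [('fourier', i) for i in range(10)] + \
           [('average', i) for i in range(10)]

trials = (gan + controls) * 5    # five repeats -> 350 trials
random.shuffle(trials)
```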

2.4 Statistical analyses
Analyses were carried out using the lme4 package (Bates, Machler, Bolker, & Walker, 2015) in r (R Core Team). General linear mixed model (GLMM) analyses were initiated with the most complex model, which was then gradually simplified, assessing at each step whether the simpler model gave a significantly poorer fit. Likelihood ratio tests were used to obtain p-values by comparing the full model against a model without the effect of interest. Nested models were compared using the change in deviance on removal of a term.
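The same simplify-and-compare logic can be sketched in Python with statsmodels; note that the analyses reported here were actually run with lme4 in R, and the data file and column names below are hypothetical:

```python
# Hypothetical data frame with columns 'rt' (s), 'steps' and 'subject'.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv('reaction_times.csv')
df['log_rt'] = np.log(df['rt'])

# Random intercepts per participant (the simpler model)...
simple = smf.mixedlm('log_rt ~ steps', df,
                     groups=df['subject']).fit(reml=False)
# ...versus random slopes and intercepts.
full = smf.mixedlm('log_rt ~ steps', df, groups=df['subject'],
                   re_formula='~steps').fit(reml=False)

# Change in deviance on removing the random slope (df = 2: the slope
# variance and the intercept-slope covariance).
delta_deviance = 2 * (full.llf - simple.llf)
p_value = stats.chi2.sf(delta_deviance, df=2)
```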
3 RESULTS
We found that targets produced by GANs after more iterations were increasingly hard to find. In the first analysis, we examined the effect of increasing training steps on reaction time by fitting general linear mixed models (GLMMs) to log-transformed reaction times, with the number of iterations as a fixed effect.
A random effects model with a common subject slope, but different intercepts, was not a significantly poorer fit to the data than a model with varying slopes and intercepts (Δdeviance = 3.1571, df = 2, p = .2063). Fitting the simpler model gave an estimate for the effect of training steps on reaction time of 2.077 × 10⁻⁵ (SEM = 1.006 × 10⁻⁶), which was highly significantly different from zero (Δdeviance = 418.42, df = 1, p < .0001).
Furthermore, targets evolved by GANs were more effective than controls (Figure 4). A random effects model with a common subject slope, but different intercepts, was not a significantly poorer fit to the data than a model with varying slopes and intercepts (Δdeviance = 18.029, df = 27, p = .9026). Fitting the simpler model showed that treatment means were significantly different (Δdeviance = 1,089.7, df = 6, p < .0001). Based on Tukey post hoc tests, all GAN-derived stimuli with more than 500 training steps had significantly higher mean reaction times than Average targets. Fourier targets were significantly harder to detect than Average targets (p < .001), but GAN-derived stimuli with 5,000 or more training steps were significantly harder to detect than Fourier targets (p < .001). For details of the Tukey post hoc tests, see Table S1 in the Supporting Information.

We also found that some GANs produced more effective camouflage than others. Reaction times to GAN-derived stimuli at 10,000 training steps were selected and grouped by the network of origin. A random effects model with a common slope but different intercepts was chosen as the initial model, as it provided a significantly better fit to the data than a model with varying slopes and intercepts (Δdeviance = 73.574, df = 54, p = .0395). The effect of network on reaction time was significantly different from zero (Δdeviance = 29.144, df = 9, p < .0001). Mean reaction times ranged between 1.25 (SEM = 0.04) and 1.57 (SEM = 0.09) s (see Figure S2 in the Supporting Information).
4 DISCUSSION
CamoGAN facilitates the rapid and automated development of optimal camouflage patterns. We found that the generator network evolved increasingly hard-to-find patterns over progressive training steps. Interestingly, discriminator networks maintained high accuracy (i.e. discriminative power between empty and non-empty backgrounds), which only started to diminish at later iterations. Generative networks, on the other hand, showed low but increasing accuracy, mirroring a strong selection pressure for effective camouflage (for an example, see Figure 5). From visual inspection, it is clear that the largest changes occur at earlier stages of pattern evolution, with the rate of change in patterns beginning to decrease beyond 5,000 iterations (Figure 2). Accordingly, increments in detection times also started to diminish (Figure 4). This result demonstrates that CamoGAN can successfully illustrate an evolutionary arms-race, producing camouflage that is difficult to identify. In other words, the discriminator network improves discrimination in response to the generated features, and the generative network evolves to reduce discrimination. The end point, just as in a predator-prey arms-race, is either an equilibrium or “extinction” of one or both components (that is, the discriminator finds 100% of the generated targets on every iteration).

In this study, both generator and discriminator networks were initialized with white noise, which is why patterns at low iteration counts show high inter-network variability (see the first column in Figure 2). We used this setup to demonstrate convergent evolution: the visual variance between the chosen backgrounds of tree bark was low, and hence we expected that networks would arrive at similar (and similarly effective) solutions after a higher number of training iterations. Nevertheless, certain networks were found to produce significantly harder-to-see patterns than others, which suggests that CamoGAN has the potential for modelling polymorphic scenarios, commonly found in nature (Karpestam, Merilaita, & Forsman, 2016). This could be done by changing the background set, e.g. using different types of bark. The method can also clearly be adapted to use fixed initializations; for example, one could initialize the discriminator with pre-trained networks capable of better target detection (Simonyan & Zisserman, 2015). Our implementation follows a deliberately simple design, and we acknowledge that many alternative and more complex GAN architectures could be employed (Creswell et al., 2018). However, we believe that maintaining a simple architecture aids understanding and allows easier implementation for early adopters.
As such, in our example, both discriminator and generator were left “unconstrained” and no biologically-relevant limitations were imposed. Furthermore, biological systems hardly ever evolve under a single selection pressure. One promising development for modelling biological systems would be to introduce multiple discriminator networks, representing multiple observers influencing the target (generator network). For example, one of the discriminators could be limited to dichromatic representations of the target, simulating a typical mammalian predator (Jacobs, 2009), or be given altered visual acuity or viewing distance. A setup like this could simulate how patterns evolve under different selection pressures (Cuthill et al., 2017), e.g. trichromat conspecifics and dichromat predators. It is also possible to introduce restrictions and limitations on the generator, other than the size and shape of the target; for example, bilateral symmetry or restriction of the available colour space. These constraints are readily implementable by either modifying the output of the generator before it is shown to the discriminator (e.g. mirroring the output in the case of bilateral symmetry) or modifying the input layer of the discriminator (e.g. applying a Fourier transformation that simulates different levels of visual acuity (Caves & Johnsen, 2017)); both are sketched below. Imposing such limitations will undoubtedly alter the range of optimal patterns that could be discovered by the system (Fennell et al., 2019; Troscianko, Wilson-Aggarwal, Griffiths, Spottiswoode, & Stevens, 2017). The present implementation generated patterns using transposed convolution, allowing the generator to establish its own optimized parameter space of hard-to-see solutions. It is also possible to force the generator to pick from a parameterized library of existing patterns that vary in only a few dimensions (e.g. colour), using the generator purely as an optimizer without active pattern generation (Fennell et al., 2019).
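To make the two kinds of constraint concrete, the sketches below show a bilateral-symmetry transform on the generator's output and a crude dichromatic transform on the discriminator's input. They are written in NumPy for single images; in a live GAN both would need to be implemented as differentiable layers between the two networks:

```python
import numpy as np

def enforce_bilateral_symmetry(target):
    # target: (H, W, 3) array in [0, 1]; mirror the left half onto the
    # right so every generated pattern is bilaterally symmetric.
    w = target.shape[1]
    left = target[:, :w // 2, :]
    return np.concatenate([left, left[:, ::-1, :]], axis=1)

def dichromat_view(image):
    # Crude mammalian-dichromat 'predator': collapse the red and green
    # channels into a single channel while keeping blue.
    rg = image[..., :2].mean(axis=-1, keepdims=True)
    return np.concatenate([rg, rg, image[..., 2:3]], axis=-1)
```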
We have demonstrated that CamoGAN outperforms two control methods for generating effective camouflage. This novel technique allows the exploration of high-dimensional texture and colour spaces in a way that is impossible for human, or non-human, observers. This has obvious applications for the development of military and civilian camouflage (Talas, Baddeley, & Cuthill, 2017), but will also allow biologists to assess the trade-offs in natural camouflage patterns (e.g. distance-dependent camouflage, or visibility to conspecifics vs. camouflage against predators) beyond a pure concealment function (Barnett, Michalis, Scott-Samuel, & Cuthill, 2018). More widely, by reversing the reward function for the generative and/or discriminative networks, one can determine the optimal conspicuous signal and/or sensory tuning for a given environment.
ACKNOWLEDGEMENTS
We thank Jack Daniels and Thomas Ma for helping to take photographs of tree bark, and we are grateful to Erik Stuchly, Siyan Ye, Khishika Naidoo and Frankie King for their help during data collection. L.T. and J.G.F. were supported by an EPSRC grant (EP/M006905/1) awarded to N.E.S.-S., R.J.B. and I.C.C. We thank three anonymous reviewers for their insightful comments, which improved the manuscript.
CONFLICT OF INTEREST
The authors declare no competing interests.
AUTHORS' CONTRIBUTIONS
L.T. conceived the project. L.T. and J.G.F. developed the method. L.T. and K.K. collected the data. L.T. analysed the data and led the writing with help from J.G.F., K.K., I.C.C., N.E.S.-S. and R.J.B. All co-authors assisted with edits and approve publication.
Open Research
DATA AVAILABILITY STATEMENT
Our implementation of the CamoGAN method is available and maintained at https://gitlab.com/asb-lab/camogan (https://doi.org/10.5281/zenodo.3529601) (Talas & Fennell, 2019). The dataset used in the analysis is available at https://doi.org/10.6084/m9.figshare.10099394 (Talas et al., 2019).