Volume 90, Issue 10 p. 2431-2445
Open Access

Male size and reproductive performance in three species of livebearing fishes (Gambusia spp.): A systematic review and meta-analysis

Bora Kim (Corresponding Author)

Department of Evolutionary Biology, Bielefeld University, Bielefeld, Germany

Email: [email protected]

Nicholas Patrick Moran

Department of Evolutionary Biology, Bielefeld University, Bielefeld, Germany

Centre for Ocean Life DTU-Aqua, Technical University of Denmark, Lyngby, Denmark

Klaus Reinhold

Department of Evolutionary Biology, Bielefeld University, Bielefeld, Germany

Alfredo Sánchez-Tójar

Department of Evolutionary Biology, Bielefeld University, Bielefeld, Germany

First published: 06 July 2021

Handling Editor: Antica Culina

Nicholas Patrick Moran and Alfredo Sánchez-Tójar contributed equally to this work.


  1. The genus Gambusia represents approximately 45 species of polyandrous livebearing fishes with reversed sexual size dimorphism (i.e. males smaller than females) and with copulation predominantly via male coercion. Male body size has been suggested as an important sexually selected trait, but despite abundant research, evidence for sexual selection on male body size in this genus is mixed.
  2. Studies have found that large males have an advantage in both male–male competition and female choice, but that small males perform sneaky copulations better and at higher frequency and thus may sire more offspring in this coercive mating system. Here, we synthesized this inconsistent body of evidence using pre-registered methods and hypotheses.
  3. We performed a systematic review and meta-analysis of summary and primary (raw) data combining both published (n = 19 studies, k = 106 effect sizes) and unpublished effect sizes (n = 17, k = 242) to test whether there is overall selection on male body size across studies in Gambusia. We also tested several specific hypotheses to understand the sources of heterogeneity across effects.
  4. Meta-analysis revealed an overall positive correlation between male size and reproductive performance (r = 0.23, 95% confidence interval: 0.10–0.35, n = 36, k = 348, 4,514 males, three Gambusia species). Despite high heterogeneity, the large-male advantage appeared robust across all measures studied (i.e. female choice, mating success, paternity, sperm quantity and quality), and was considerably larger for female choice (r = 0.43, 95% confidence interval: 0.28–0.59, n = 14, k = 43). Meta-regressions found several important factors explaining heterogeneity across effects, including type of sperm characteristic, male-to-female ratio, female reproductive status and environmental conditions. We found evidence of publication bias; however, its influence on our estimates was attenuated by including a substantial amount of unpublished effects, highlighting the importance of open primary data for more accurate meta-analytic estimates.
  5. In addition to positive selection on male size, our study suggests that we need to rethink the role and form of sexual selection in Gambusia and, more broadly, to consider the ecological factors that affect reproductive behaviour in livebearing fishes.


Body size is one of the most important traits affecting the fitness of organisms (Roff, 2002). Larger females are often more fecund than smaller females while larger males may outcompete smaller males for access to females and are preferred by females in many species (Andersson, 1994; Roff, 2002). An outstanding example of large-male advantage can be found in pinnipeds, where selection has led to males of some species being up to seven times heavier than females (Lindenfors et al., 2002). Nonetheless, the largest are not always the most successful. For example, trade-offs between small and large male body size led to an intermediate-sized-male advantage and stabilizing selection in midges (Neems et al., 1998). Furthermore, negative selection on male body size has been found in several fly species (McLachlan & Allen, 1987) and waders (Blomqvist et al., 1997), in which small males outperform large males in aerobatic display. In most species, we do not yet understand whether and how body size is selected for and how intraspecific variation in body size is maintained.

Sexual size dimorphism denotes a difference in adult body size between males and females of the same species. Female-biased sexual size dimorphism (i.e. females larger than males) is also called reversed sexual size dimorphism despite females being usually the larger sex in the majority of species except most birds and mammals (Blanckenhorn, 2005). An extreme case of reversed sexual size dimorphism is observed in a family of livebearing fishes, Poeciliidae, in which males of some species are among the smallest living vertebrates (Bisazza, 1993; Pilastro et al., 1997). Within this family, the genus Gambusia contains approximately 45 species of promiscuous fishes with generally non-descript appearance (Froese & Pauly, 2000). Unlike most fishes, they show internal fertilization with males using a gonopodium, an intromittent organ that transfers sperm into the female gonopore (Constanz, 1989). Whether courtship occurs is unclear (Bisazza & Marin, 1991; Martin, 1975); however, it appears that males commonly bypass female cooperation and forcibly inseminate females via coercive mating tactics (i.e. ‘gonopodial thrusting’; Bisazza, 1993; Bisazza & Marin, 1995; Itzkowitz, 1971; Martin, 1975; McPeek, 1992). Males can perform about one gonopodial thrust per minute (Wilson, 2005), and this incessant male harassment seemingly lowers female fitness by reducing foraging efficiency as well as increasing predation risk and energy expenditure (Dadda et al., 2005; Iglesias-Carrasco et al., 2019). Gambusia shows considerable interspecific and intraspecific male size variation, making them an often-used model to study male body size selection (Deaton, 2008; Zulian et al., 1995). However, despite abundant research, evidence of size-dependent sexual selection is mixed.

Low detection and increased agility in performing gonopodial thrusts have been proposed as explanations for the apparent mating advantage of small males, and thus, for the existence of reversed sexual size dimorphism in Gambusia (Hughes, 1985). Laboratory experiments have found that smaller males perform thrusts at higher frequency (Bisazza & Marin, 1995), are more likely to inseminate females (Pilastro et al., 1997; but see Head et al., 2015) and may sire more offspring than larger males in eastern mosquitofish (Gambusia holbrooki; Head et al., 2017). However, large male size may confer an advantage in intrasexual competition. For instance, large males have been observed to monopolize access to females and prevent other males from attempting gonopodial thrusting in both eastern and western mosquitofish (Gambusia affinis; Bisazza & Marin, 1995; Hughes, 1985) and to be more likely to sire offspring than small males in eastern mosquitofish (Booksmythe et al., 2016). It has also been observed that female presence can incite aggressive behaviour among eastern mosquitofish males and that larger males were more likely to be aggressive and dominant (Itzkowitz, 1971).

There is also evidence that Gambusia females may still exercise some control via pre- and postcopulatory female choice (Bisazza, 1993). At the precopulatory level, eastern and western mosquitofish females have been found to preferentially associate with large males (Chen et al., 2018; Hughes, 1985; McPeek, 1992). At the postcopulatory level, Gambusia females can store sperm for months, and a single brood can have multiple paternity (Constanz, 1989; Head et al., 2017; Zane et al., 1999), suggesting that sperm competition is likely intense. Larger males have been found to produce more sperm in a number of poeciliid species, including eastern mosquitofish (Locatello et al., 2008; O'Dea et al., 2014; Vega-Trejo et al., 2019). However, Head, Vega-Trejo, et al. (2015) found evidence of nonlinear selection on male sperm count in eastern mosquitofish, where males with an intermediate sperm count were more successful at insemination than those with higher or lower sperm counts. Furthermore, sperm quality might trade off with sperm quantity (Head et al., 2007). Sperm quality traits such as longevity, viability, morphology and velocity influence fertilization success under sperm competition in many species (Birkhead & Pizzari, 2002; Boschetto et al., 2011; García-González & Simmons, 2005). Although body size may be negatively correlated with sperm quality due to trade-offs between body growth/maintenance and sperm quality (Evans et al., 2003; Locatello et al., 2008), the relationship between male size and sperm quality in Gambusia is unclear (Locatello et al., 2008; Vega-Trejo et al., 2019).

Several environmental factors have been suggested to mediate the body size-fitness relationship in Gambusia, leading to context-dependency. The operational sex ratio (i.e. the ratio of sexually receptive males to females) is often proposed as an important factor mediating sexual selection across species by altering the opportunity for selection (Emlen & Oring, 1977; Kvarnemo & Ahnesjö, 1996; but see Klug et al., 2010; Jennions et al., 2012; meta-analysis: Rios Moura & Peixoto, 2013). In coercive mating systems, male-biased operational sex ratios can be particularly costly to males and lead to increased opportunity for selection on male traits (Cureton et al., 2010). For instance, more male-biased ratios resulted in elevated male–male interference (e.g. chasing) and reduced number of gonopodial thrusts in western mosquitofish (Smith & Sargent, 2006). Furthermore, male-biased ratios have been suggested both to benefit large males (Bisazza & Marin, 1995) and to play no clear role in the relationship between male body size and reproductive success in eastern mosquitofish (Head et al., 2017).

In sum, there is conflicting evidence for male body size selection in Gambusia. Frequency-dependent selection may maintain male body size polymorphism (Pilastro et al., 1997). Nonetheless, environmental and ecological factors such as population density, sex ratio, habitat complexity, photoperiod and temperature are at play and could exert different selective pressures, leading to context-dependency. Here, we performed a systematic review and meta-analysis combining published and unpublished effect sizes to test whether (and how) there is sexual selection on male body size in Gambusia and to understand the sources of heterogeneity. Our hypotheses and predictions, which we pre-registered prior to data collection (Kim et al., 2019), were as follows:
  1. Since most copulations in Gambusia seemingly involve forcible inseminations that bypass female cooperation and small males seem to be more successful at it, we expect that overall, small males show higher reproductive performance than large males. Thus, we predict that male size and reproductive performance are negatively correlated across studies, but we expect this overall effect to be small and uncertain with high heterogeneity in effect sizes.
  2. We expect the association between male size and reproductive performance to be context-dependent. Specifically, we predict a positive correlation when (a) females can choose between males without physical interaction (e.g. in dichotomous female mate choice test); (b) experimental density is low, allowing large males to physically dominate small males; (c) habitat complexity is high, allowing females to avoid or reduce sexual harassment, and thus to be preferentially choosy and (d) sex ratio is male-biased, which leads to increased male–male competition. Regarding postcopulatory selection, we predict (e) a negative correlation between male size and sperm quality due to a trade-off between growth and reproductive allocation, but (f) a positive correlation between male size and sperm quantity.
  3. Since we expect that female reproductive potential plays a role in male reproductive behaviour, (a) we predict larger effect sizes when females are either virgin or postpartum than when they are gravid. Additionally, we expect the association between male size and reproductive performance to be strengthened by male reproductive motivation. Therefore, (b) we predict larger effect sizes when males are kept separated from females prior to the experiment than when they are kept with females. Last, since the mating system is similar across Gambusia species, (c) we do not predict large differences among species.


2.1 Protocol

The study protocol was pre-registered on the Open Science Framework prior to data collection (Kim et al., 2019). The pre-registration specified our a priori hypotheses, search methods, and confirmatory and exploratory analysis plan. Unless stated otherwise, we adhered to these plans. The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) is a minimum set of items designed to help authors report systematic reviews and meta-analyses in a transparent manner, which we followed where relevant (Moher et al., 2009; Figure S2.3). All data processing, analysis and presentation were conducted using R v.3.6.3 (R Core Team, 2020).

2.2 Information sources and search

We performed a systematic literature search to find published studies in English from all years. Three blocks of search keywords were designed to search for the genus (i.e. Gambusia), the predictor (i.e. body size estimates) and the response of interest (i.e. proxies for fitness and reproductive performance) in titles, abstracts and keywords. Searches were conducted on 21st January 2019. See Supporting Information S1 for full details about the search.

2.3 Study selection and eligibility criteria

Our searches on Web of Science Core Collection and PubMed yielded 278 and 97 records, respectively, which were combined and deduplicated using the R package revtools v.0.3.0 (Westgate, 2018). The titles and abstracts of 310 unique records were screened using Rayyan (Ouzzani et al., 2016). In all, 90 records passed the title-and-abstract screening and were subjected to full-text screening. Full-text records varied in their specific research questions, but studies were included as long as they fulfilled the criteria of measuring male size (standard length, total length and body mass) and any measure of reproductive performance (see below) for any species in the genus Gambusia (see decision trees in Figure S2.1 and S2.2; more information below). Full-text screening identified 55 studies meeting our inclusion criteria (PRISMA diagram in Figure S2.3). All titles, abstracts and full texts were double-screened to reduce potential individual biases, with the primary screener (BK) screening all records and secondary screeners (NPM and AST) each independently screening 50%. Conflicting decisions were collectively discussed and resolved.

Studies where animals were exposed to environmental pollutants and/or pharmaceuticals (e.g. endocrine disrupting chemicals such as fluoxetine) were excluded because even very low levels of exposure can affect morphology and reproductive behaviour (Saaristo et al., 2013); however, data from non-exposed control groups from those studies were included, if available. Studies where male fish were size-matched in trials were excluded because potential effects of male body size were effectively eliminated, whereas studies testing non-size-related hypotheses were included as long as males were not size-matched.

Four categories of outcome measures were considered measures of male reproductive performance: female choice, mating success, sperm characteristics (quantity and quality) and paternity (number of offspring sired). In some cases, female choice was measured as the number of approaches made towards males or the number of arching displays by females (n = 3 studies, k = 12 effects), but the predominant female choice measure was association time in dichotomous mate choice tests (n = 13, k = 31). Female association preferences have been shown to be indicative of the likelihood of reproducing with preferred males in a poeciliid (Walling et al., 2010). Likewise, the number of mating attempts (gonopodial thrusts), the predominant measure of male mating success, has been shown to be a good predictor of successful copulation (Bisazza, 1993) and paternity (Deaton, 2008) in mosquitofish. Outcome measures not considered as measures of male reproductive performance and excluded were male mate choice, male aggressive behaviour and male gonadal size or mass.

2.4 Data collection and extraction

One observer (BK) performed all data extraction, and secondary observers (NPM and AST) each independently extracted data from 27% (n = 15, 54% total) of records to verify extraction and enhance reproducibility. Summary data were extracted from text, tables or figures in published articles, and the R package metaDigitise v.1.0.1 (Pick et al., 2019) was used to extract data from figures. Primary (raw) data were obtained directly from authors and from published (open) datasets, including datasets that, although they contained our variables of interest (i.e. reproductive performance and male body size), had not been used to test the relationship between reproductive performance and male body size. Complete data extraction from published material was possible for 18 studies, and partial extraction from seven additional studies. Requests for missing or partially reported data were sent to 24 authors of 37 studies via a standardized e-mail template, from which we obtained data for 11 studies (from nine authors). Six authors communicated that data were lost, and the remaining nine did not reply. During author correspondence, it was revealed that Head, Vega-Trejo, et al. (2015) re-analysed a subset of data from another study (Head et al., 2015), so the former was excluded from our analyses.

2.5 Extracted variables

Information was extracted regarding the study (publication year, journal and author information), study subject (species, collection site, fish considered native or invasive at the collection site, wild or laboratory born and female reproductive status), laboratory maintenance conditions (fish kept with/without the opposite sex, temperature and photoperiod), experimental condition (dimension of experimental aquarium, number of female and male fish within experimental trials, presence/absence of physical interaction among experimental fish and habitat complexity) and type/unit of experimental variable. The type of male body size trait (standard length, total length and body mass) and the type of reproductive performance measure were also recorded. The complete lists of continuous and categorical moderators are shown in Table S3.1 and Table S3.2.

2.6 Effect size calculation

We extracted all necessary statistical information to quantify the association between male size and reproductive performance using Pearson's correlation coefficients (hereafter r). Following Jacobs and Viechtbauer (2017), mean differences between small and large fish in studies that compared male size categories (e.g. dichotomous female choice trials) were transformed to biserial correlations using the function ‘escalc’ from the R package metafor v.2.4-0 (Viechtbauer, 2010). Biserial correlations are conceptually equivalent and directly comparable to r (Jacobs & Viechtbauer, 2017). Note that meta-analyses involving both Pearson's and biserial correlation coefficients need to be based on the raw coefficients, which is why we did not use Fisher's r-to-z transformation (Jacobs & Viechtbauer, 2017). When there were more than two male size groups, we specified in the pre-registration that all pairwise correlations would be calculated; however, this was not a common issue in our dataset (i.e. only two such designs), so instead, only data from the smallest and the largest groups were extracted to calculate the biserial correlation.
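The conversion from a two-group comparison to a biserial correlation can be illustrated with a short sketch. This is not the authors' code (they used `escalc` from metafor in R); it is a minimal Python illustration of the standard relationship between the point-biserial and biserial correlation, r_b = r_pb · sqrt(pq) / φ(z_p), which assumes the size categories dichotomize an underlying normally distributed variable. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def biserial_from_groups(large, small):
    """Biserial correlation between a dichotomized size variable
    (large vs. small males) and a continuous outcome.

    Assumption: the dichotomy reflects an underlying normal variable.
    """
    n1, n0 = len(large), len(small)
    p = n1 / (n1 + n0)          # proportion of observations in the 'large' group
    q = 1 - p
    # Population SD of all outcomes pooled, so r_pb equals the Pearson
    # correlation between a 0/1 group indicator and the outcome.
    s_y = pstdev(list(large) + list(small))
    r_pb = (mean(large) - mean(small)) / s_y * sqrt(p * q)
    # Height of the standard normal density at the cut point.
    h = NormalDist().pdf(NormalDist().inv_cdf(p))
    return r_pb * sqrt(p * q) / h


# Hypothetical outcome values (e.g. association times) for two size groups.
print(round(biserial_from_groups([2, 4, 6], [1, 3, 5]), 3))
```

Because the biserial coefficient rescales the point-biserial one, it is always larger in magnitude and can exceed 1 in small or strongly separated samples.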

Where more than one effect size could be calculated from the same data due to the reporting of multiple statistical outputs, we chose one using the following order of preference: (a) r; (b) other correlation coefficients (e.g. Spearman's rho); (c) mean differences between small and large males (used to calculate biserial correlations as above); (d) $R^2$ from simple or multiple regression and (e) inferential statistics (e.g. t-value, F-value). This order of preference was chosen to minimize the number of inferential steps (and thus of noise) required to transform the reported statistical outputs to our main effect size of interest (i.e. r). Effect sizes other than r and biserial correlations were converted into r using the equations provided in Lajeunesse (2013) and Nakagawa and Cuthill (2007; see Table S4). Sampling variances of r ($V_r$) were calculated as $(1 - r^2)^2/(n - 1)$ (Borenstein et al., 2009), and those of biserial correlations were calculated using the function ‘escalc’ from the R package metafor v.2.4-0 (Viechtbauer, 2010). The sample size of each effect size reflected the number of replicates rather than the number of males. These two numbers were the same except for dichotomous mate choice trials, in which one female chose between two males, and we assigned the number of females as the sample size rather than the number of males to avoid artificially inflating sample size. Effect sizes were coded so that a negative effect size denoted a negative correlation between male size and reproductive performance, and vice versa.
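The selection rule and the variance formula above can be sketched as follows. This is a minimal Python illustration rather than the authors' R code. The category labels in `PREFERENCE` are hypothetical names for the five options (a) to (e), and the t-to-r conversion shown is the widely used r = t / sqrt(t² + df); the exact set of conversion equations the authors applied is listed in their Table S4 and is not reproduced here.

```python
from math import sqrt

# Hypothetical labels for the order of preference (a) to (e) described above.
PREFERENCE = ("r", "other_correlation", "mean_difference",
              "r_squared", "inferential")


def pick_statistic(available):
    """Return the most preferred statistic type among those reported."""
    return min(available, key=PREFERENCE.index)


def r_from_t(t, df):
    """Convert a t-value to a correlation using r = t / sqrt(t^2 + df)."""
    return t / sqrt(t ** 2 + df)


def sampling_variance_r(r, n):
    """Sampling variance of a raw correlation: (1 - r^2)^2 / (n - 1)."""
    return (1 - r ** 2) ** 2 / (n - 1)


# Hypothetical study reporting both a regression R^2 and a t-value.
print(pick_statistic(["inferential", "r_squared"]))
r = r_from_t(2.0, 12)
print(r, sampling_variance_r(r, 17))
```

Note that `n` here is the number of replicates, which for dichotomous mate choice trials is the number of females rather than the number of males.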

2.7 Main effect model

A multilevel intercept-only meta-analytic model was fitted to estimate the overall effect size (i.e. meta-analytic mean) for the association between male size and reproductive performance using the R package metafor v.2.4-0 (Viechtbauer, 2010). Estimates (i.e. means) are presented with their 95% confidence intervals (CI) in square brackets throughout. Furthermore, we estimated 95% prediction intervals (PI), which incorporate heterogeneity (IntHout et al., 2016). Whereas confidence intervals show the range in which the overall effect is likely to be found, prediction intervals estimate the likely range in which 95% of effects are expected to occur in similar future (or unknown) studies (IntHout et al., 2016).
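The distinction between the two intervals can be made concrete with a small sketch. It assumes a normal quantile and a prediction interval whose variance is the squared standard error of the mean plus the sum of the random-effect variance components; software may use a t-distribution instead, so this is an approximation and not the authors' exact computation. The function name and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_and_prediction_interval(mean_effect, se, variance_components,
                                       level=0.95):
    """Return ((ci_low, ci_high), (pi_low, pi_high)) for a meta-analytic mean.

    variance_components: estimated variances of the random effects
    (e.g. study, group, experiment and effect level), as an assumption
    about how heterogeneity enters the prediction interval.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (mean_effect - z * se, mean_effect + z * se)
    # The prediction interval adds heterogeneity to the uncertainty of the mean.
    pi_sd = sqrt(se ** 2 + sum(variance_components))
    pi = (mean_effect - z * pi_sd, mean_effect + z * pi_sd)
    return ci, pi


# Hypothetical mean, standard error and variance components.
ci, pi = confidence_and_prediction_interval(0.2, 0.05, [0.03, 0.01])
print(ci, pi)
```

With any non-zero heterogeneity, the prediction interval is wider than the confidence interval, which is why a clearly positive mean can coexist with a prediction interval that spans zero.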

All models, including the meta-regressions (see below), included the following random effects: (a) study ID, which encompasses effect sizes extracted from the same study, (b) group ID, which encompasses effect sizes obtained from the same group of fish, (c) experiment ID, which encompasses effect sizes derived from the same experiment and (d) effect ID, which represents residual/within-study variance. Our models included one more random effect (i.e. group ID) than planned in our pre-registration, but this was considered necessary to account for this source of non-independence among effect sizes. We ran two additional sensitivity analyses that showed very similar results: (a) an analysis fitting sampling variances as a variance–covariance matrix assuming a 0.5 correlation between sampling variances from the same experiment ID (Supporting Information S9) and (b) an analysis that included an extra random effect (lab ID) to partition among-laboratory heterogeneity (S10).

For the intercept-only meta-analytic model, we calculated Cochran's Q and $I^2$ (Higgins & Thompson, 2002) and the equivalent for each random effect, as measures of absolute and relative heterogeneity, respectively. Heterogeneity refers to the unexplained variation among effect sizes after accounting for sampling variance.
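As a rough illustration of these two heterogeneity measures, the sketch below computes Cochran's Q and the classical single-level I² = (Q − df)/Q. This is a simplification: in a multilevel model with several random effects, the relative heterogeneity and its partition are derived from the estimated variance components, which is not shown here. Function names and example values are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance weighted sum of squared deviations
    from the weighted mean effect. Returns (Q, weighted_mean)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, pooled


def i_squared(q, k):
    """Classical I^2: share of total variation not due to sampling error,
    truncated at zero. k is the number of effect sizes."""
    df = k - 1
    return max(0.0, (q - df) / q) if q > 0 else 0.0


# Hypothetical effect sizes with equal sampling variances.
q, pooled = cochran_q([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(q, pooled, i_squared(q, 3))
```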

2.8 Meta-regressions for testing hypotheses

We fitted multilevel meta-regressions to investigate potential effects of moderators on the relationship between male size and reproductive performance. To test whether physical interaction among individual fish affected the results (Hypothesis 2a), we fitted a meta-regression including the moderator ‘physical interaction’ (levels: yes, no) for the subset of studies in which female choice was measured. For experiments where fish could physically interact, we fitted a meta-regression including the following moderators: experimental density (i.e. total number of fish in the trial divided by the aquarium volume (L); Hypothesis 2b), habitat complexity (levels: low, high; Hypothesis 2c) and male-to-female ratio (Hypothesis 2d) as well as the interaction between experimental density and habitat complexity, and the interaction between male-to-female ratio and habitat complexity. Since the latter two meta-regressions tested hypotheses related to precopulatory mechanisms, they did not include effect sizes on sperm quantity nor quality. For the subset of studies that measured sperm quantity and/or quality, we fitted a meta-regression including the type of sperm characteristic as a moderator (levels: quantity, quality; Hypotheses 2e and 2f).

Due to limited reporting on female reproductive status and male housing conditions in the literature, we deviated from our pre-registration for hypotheses 3a and 3b (details in Supporting Information S8). Instead, to test for effects of female reproductive status (Hypothesis 3a), we fitted a meta-regression with four levels of female status (virgin, gravid, male-deprived and non-deprived). To test for male housing condition effects (Hypothesis 3b), we fitted a meta-regression including a moderator with two levels (mixed-sex: kept with females, same-sex: kept separated from females). Last, we fitted a meta-regression including the moderator ‘species’ (levels: G. affinis, G. geiseri and G. holbrooki) to test whether effects differed among species (Hypothesis 3c).

2.9 Meta-regressions for exploratory analyses

Five additional pre-registered exploratory meta-regressions were performed to test hypotheses related to methodological design, but for which no specific direction was predicted (Kim et al., 2019). We tested whether results differed: (a) depending on the type of male size proxy used (levels: standard length, total length and body mass); (b) between native and invasive populations (levels: native and invasive); (c) depending on the fish's rearing environment (levels: wild and laboratory); (d) depending on temperature (°C) and photoperiod (i.e. number of daylight hours per day); and (e) depending on the type of outcome variable (i.e. reproductive performance measure; levels: female choice, mating success, sperm quality, sperm quantity and paternity).

For all meta-regressions, we estimated the percentage of heterogeneity explained by the moderators using $R^2_{\mathrm{marginal}}$ (Nakagawa & Schielzeth, 2013). Missing and unreported data were not included in the meta-regressions (i.e. we ran complete-case analyses). Continuous and categorical moderators involved in interaction terms (e.g. habitat complexity) were mean-centred to aid interpretation (Schielzeth, 2010). Results of the main effect model and meta-regressions with categorical moderators were graphically represented as orchard plots using the R package orchaRd v. (Nakagawa et al., 2021). Meta-regressions with continuous moderators were plotted with the R package ggplot2 v.3.3.2 (Wickham, 2016).
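The moderators used in the interaction models (experimental density, male-to-female ratio and mean-centring) are simple derived quantities, sketched below in Python. The function names, the 0/1 coding of habitat complexity and the example trial values are assumptions made for illustration only.

```python
from statistics import mean


def experimental_density(n_males, n_females, volume_l):
    """Total number of fish in the trial divided by aquarium volume (L)."""
    return (n_males + n_females) / volume_l


def male_to_female_ratio(n_males, n_females):
    """Number of males per female in the trial."""
    return n_males / n_females


def mean_centre(values):
    """Subtract the mean so that zero corresponds to the average value."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical trial: two males and one female in a 30 L aquarium.
print(experimental_density(2, 1, 30), male_to_female_ratio(2, 1))
# Hypothetical habitat complexity coded 0 = low, 1 = high, then centred.
print(mean_centre([0, 0, 1, 1]))
```

Centring a 0/1 moderator in this way means that the main effect of the other term in an interaction is estimated at the average of the two levels rather than at the reference level.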

2.10 Publication bias tests

To test for small-study bias, we fitted a multilevel meta-regression with sample size as the moderator (Nakagawa & Santos, 2012). Likewise, to test for time-lag bias (i.e. decline effects) in the published literature (Jennions & Møller, 2002; Koricheva & Kulinskaya, 2019), we fitted a multilevel meta-regression including the year of publication as a moderator in the subset of effect sizes categorized as ‘published’ (Sánchez-Tójar et al., 2018). Furthermore, the source of data was included as a moderator (levels: published and unpublished) in a meta-regression to test whether effect sizes were larger in published than unpublished effects (Moran et al., 2020; Sánchez-Tójar et al., 2018). We categorized supplementary material (i.e. open datasets) as ‘unpublished’ whenever the specific research question/hypothesis of the study did not involve male size per se, but male size was nevertheless measured and provided. We did not expect to find publication bias regarding male body size in these effects. Additionally, whether results were reported completely or incompletely (e.g. missing effect sizes, relationships reported as simply ‘non-significant’, etc.) was included as a moderator (levels: complete and incomplete) in a meta-regression to test whether effect sizes were larger in studies that incompletely reported results. Last, we originally intended to test whether data collected by observers blind to male size led to smaller effect sizes than data collected by observers not blind to male size (see Holman et al., 2015), but we did not encounter any study using blind data collection regarding male size, so this pre-registered hypothesis was not tested.
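The logic of the small-study test can be sketched with a plain weighted regression of effect size on sample size. This deliberately ignores the random-effects structure of the multilevel meta-regressions described above, so it only illustrates the idea and is not the authors' model. The function name and the data are hypothetical.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least-squares fit of y on x; returns (intercept, slope).

    In a small-study test, x would be sample size, y the effect size and
    the weights the inverse sampling variances (an assumption of this
    sketch).
    """
    sw = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data in which effects shrink as sample size grows.
sample_sizes = [10, 20, 40, 80]
effects = [0.45, 0.40, 0.30, 0.10]
weights = [9, 19, 39, 79]
intercept, slope = weighted_linear_fit(sample_sizes, effects, weights)
print(intercept, slope)
```

A negative slope, as in this made-up example, is the pattern expected if small studies with large effects are over-represented.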


Overall, 348 effect sizes were obtained from 36 studies including 179 groups of fish tested in 216 experiments (4,514 male fish in total). Median and mean sample sizes per effect size were 16 and 35, respectively (range: 3–294; only three data points had a sample size of three). Data were available only for three species: G. affinis (n = 7 studies, k = 29 effects), G. geiseri (n = 1, k = 5) and G. holbrooki (n = 29, k = 314; map of collection sites shown in Figure S5.1).

3.1 Main effect model (Hypothesis 1)

Contrary to our hypothesis, the intercept-only model revealed a positive association between male size and reproductive performance (r = 0.23 [0.10–0.35], 95% PI = −0.69 to 1.15, p < 0.001, n = 36, k = 348; Figure 1). That is, our meta-analysis suggests that there is positive selection on male size in Gambusia. Nonetheless, absolute (Q = 5,484, p < 0.001) and relative heterogeneity ($I^2$ = 92.2% [85.3–95.7]) were high. When $I^2$ was partitioned, 33.0% [23.7–41.2] was attributed to study ID, 53.1% [40.8–60.9] to group ID, 6.2% [0.8–11.9] to experiment ID and 0.0% [0.0–1.8] to effect ID.

Figure 1. Male size appears positively selected across included effects. Orchard plot of the meta-analytic model, showing the meta-analytic mean, 95% CI (thick whisker), 95% PI (thin whisker) and individual effect sizes scaled by their precision (circles).

3.2 Meta-regressions for testing hypotheses

3.2.1 Physical interaction (Hypothesis 2a)

The size–reproductive performance correlation was positive in both presence (r = 0.18 [0.01–0.35], p = 0.015, n = 19, k = 171) and absence (r = 0.38 [0.16–0.59], p < 0.001, n = 14, k = 37) of physical interaction between males and females during mate choice tests. Effect sizes tended to be larger in absence than in presence, but that difference was not statistically significant (p = 0.105). The moderator explained 2.3% of heterogeneity ($R^2_{\mathrm{marginal}}$ = 0.023).

3.2.2 Experimental density (Hypothesis 2b), habitat complexity (Hypothesis 2c) and male-to-female ratio (Hypothesis 2d)

For experiments where fish were allowed to physically interact, the size–reproductive performance correlation did not seem to be affected by experimental density, male-to-female ratio or the interaction between those and habitat complexity (Table S6.1). Effect sizes tended to be stronger in more complex habitats, but a subsequent non-pre-registered meta-regression including habitat complexity as the only moderator showed that the difference between low (r = 0.10 [−0.11 to 0.30], p = 0.354, n = 15, k = 144) and high habitat complexity (r = 0.23 [−0.05 to 0.50], p = 0.115; n = 6, k = 27) was not statistically significant (p = 0.383; $R^2_{\mathrm{marginal}}$ = 0.008). In contrast, an additional non-pre-registered meta-regression that included male-to-female ratio as the only moderator showed that, as predicted, the more male-biased the population, the better the reproductive performance of large males (intercept = 0.14 [−0.05 to 0.33], p = 0.137; slope = 0.13 [0.02 to 0.25], p = 0.022; n = 19, k = 171; $R^2_{\mathrm{marginal}}$ = 0.104; Figure 2). Since the latter two meta-regressions were not pre-registered, the results should be interpreted cautiously.

Large males showed greater reproductive performance in more male-biased populations. The solid line represents the model estimate, shading represents the 95% CI and individual effect sizes are scaled by their precision
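To make the male-to-female ratio meta-regression concrete, the following minimal sketch evaluates the fitted line using the reported intercept (0.14) and slope (0.13). It assumes the moderator enters the model linearly on whatever scale the authors used, which may have been centred or standardised, so the input values below are purely illustrative units and the function name is hypothetical.

```python
def predicted_correlation(moderator, intercept=0.14, slope=0.13):
    """Point prediction from a linear meta-regression line.

    `moderator` is the male-to-female ratio on the model's own scale
    (ASSUMPTION: the scaling used in the original analysis is not known here).
    """
    return intercept + slope * moderator


# Illustrative moderator values only; they are not observed sex ratios.
for x in (0.0, 1.0, 2.0):
    print(f"moderator = {x:.1f} -> predicted correlation = {predicted_correlation(x):.2f}")
```

The sketch only reproduces point predictions. The slope's 95% CI (0.02 to 0.25) is wide, so the steepness of the line is considerably less certain than its sign.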

3.2.3 Sperm quantity and quality (Hypotheses 2e and 2f)

Male size and sperm quantity were positively correlated (r = 0.17 [0.09–0.24], p < 0.001, n = 10, k = 74), while the estimate for sperm quality was small and its 95% CI overlapped zero (r = 0.04 [−0.04 to 0.12], p = 0.316, n = 8, k = 66). Indeed, the difference between quantity and quality was statistically significant (p < 0.001; Figure 3a), and the type of sperm characteristic as a moderator explained 8.8% of the heterogeneity ($R^2_{\mathrm{marginal}}$ = 0.088).

Orchard plots showing that (a) male size was positively correlated with sperm quantity but not quality; (b) female reproductive status did not strongly influence the correlation; (c) the correlation did not differ substantially across Gambusia species; (d) the correlation was generally positive across male reproductive performance measures in Gambusia species. Note that, although paternity contains more effect sizes than the other levels, only four studies measured paternity. Plots show means, 95% CI (thick whisker), 95% PI (thin whisker) and individual effect sizes scaled by their precision (circles)

3.2.4 Female reproductive status (Hypothesis 3a)

The size–reproductive performance correlation was positive in all four levels of female reproductive status, but the 95% CIs overlapped zero in virgin (r = 0.18 [−0.07 to 0.44], p = 0.160, n = 7, k = 84) and non-deprived females (r = 0.15 [−0.22 to 0.52], p = 0.414, n = 3, k = 10) while they did not in gravid (r = 0.46 [0.04–0.88], p = 0.031, n = 3, k = 8) and male-deprived females (r = 0.28 [0.03–0.52], p = 0.026, n = 8, k = 31; Figure 3b). Post-hoc Wald tests revealed no statistically significant differences between those four levels of female reproductive status (p > 0.282 in all cases), and the moderator explained 3.0% of heterogeneity ($R^2_{\mathrm{marginal}}$ = 0.030).

3.2.5 Male housing condition (Hypothesis 3b)

The size–reproductive performance correlation was positive in both mixed-sex (r = 0.38 [0.18–0.57], p < 0.001, n = 10, k = 98) and same-sex housing conditions (r = 0.16 [0.01–0.32], p = 0.038, n = 17, k = 164). Contrary to our hypothesis, effect sizes tended to be larger in mixed-sex than in same-sex conditions (p = 0.091). Male housing conditions explained 5.3% of heterogeneity ($R^2_{\mathrm{marginal}}$ = 0.053).

3.2.6 Species (Hypothesis 3c)

The size–reproductive performance correlation was positive in all three species, although the 95% CI substantially overlapped zero in G. geiseri (G. affinis: r = 0.31 [0.00–0.62], p = 0.048, n = 7, k = 29; G. geiseri: r = 0.08 [−0.62 to 0.78], p = 0.829, n = 1, k = 5; G. holbrooki: r = 0.22 [0.08–0.35], p = 0.002, n = 29, k = 314). As predicted, the differences across species were not statistically significant (p > 0.515 in all cases; Figure 3c), and the moderator explained only 0.4% of heterogeneity ($R^2_{\mathrm{marginal}}$ = 0.004).

3.3 Meta-regressions for exploratory analyses

3.3.1 Type of male size proxy

The size–reproductive performance correlation was positive and similar regardless of the type of male size proxy used (p > 0.949 in all cases; $R^2_{\mathrm{marginal}}$ = 0.000): standard length (r = 0.22 [0.09–0.35], p < 0.001, n = 32, k = 263), total length (r = 0.23 [0.06–0.39], p = 0.008, n = 4, k = 31) and body mass (r = 0.23 [0.09–0.36], p = 0.001, n = 7, k = 43).

3.3.2 Origin of population

The size–reproductive performance correlation was positive for both invasive (r = 0.21 [0.07–0.36], p = 0.004, n = 27, k = 274) and native populations (r = 0.26 [−0.02 to 0.53], p = 0.069, n = 8, k = 73). That difference was not statistically significant (p = 0.784), and the moderator explained only 0.1% of heterogeneity ($R^2_{\mathrm{marginal}}$ = 0.001).

3.3.3 Rearing environment

The size–reproductive performance correlation was positive for wild fish (r = 0.27 [0.13–0.41], p < 0.001, n = 28, k = 222), whereas the estimate for laboratory-bred fish was smaller and its 95% CI overlapped zero (r = 0.08 [−0.17 to 0.32], p = 0.551, n = 7, k = 125); however, the difference between the two was not statistically significant (p = 0.181). Rearing environment explained 3.9% of heterogeneity ($R^2_{\mathrm{marginal}}$ = 0.039).

3.3.4 Temperature and photoperiod

Neither temperature nor photoperiod seemed to strongly influence the size–reproductive performance correlation (intercept = 0.26 [0.12–0.41], p < 0.001; temperature = −0.03 [−0.10 to 0.04], p = 0.359; photoperiod = 0.11 [−0.02 to 0.24], p = 0.101; n = 26, k = 250). However, there was a tendency for the correlation to be greater with longer hours of daylight, and both moderators combined explained 5.2% of heterogeneity ($R^2_{\mathrm{marginal}}$ = 0.052).

3.3.5 Measures of male reproductive performance

The size–reproductive performance correlation was positive regardless of the measure of male reproductive performance. However, it was only statistically significant for female choice (r = 0.43 [0.28–0.59], p < 0.001, n = 14, k = 43), mating success (r = 0.16 [0.01–0.30], p = 0.035, n = 14, k = 50) and sperm quantity (r = 0.19 [0.03–0.36], p = 0.024, n = 10, k = 74), whereas the estimates for paternity (r = 0.12 [−0.14 to 0.38], p = 0.362, n = 4, k = 115) and sperm quality (r = 0.04 [−0.13 to 0.21], p = 0.651, n = 8, k = 66) were not statistically significant (Figure 3d). Post-hoc Wald tests showed that the estimate for female choice was statistically significantly larger than those of the other measures (p < 0.041 in all cases), and the estimate for sperm quantity was statistically significantly larger than that of sperm quality (p < 0.001). The measure of reproductive performance explained 6.3% of heterogeneity ($R^2_{\mathrm{marginal}}$ = 0.063).

3.4 Publication bias tests

Overall, we found some evidence of publication bias in the published literature, the influence of which was seemingly ameliorated by our approach of including both published and unpublished effect sizes. Effect sizes tended to become slightly smaller as sample size increased (i.e. small-study effect; intercept = 0.23 [0.11–0.35], p < 0.001; slope = −0.001 [−0.002 to 0.000], p = 0.082; n = 36, k = 348; $R^2_{\mathrm{marginal}}$ = 0.010; Figure 4). This small-study effect became prominent when only published effect sizes were considered (Figure S7.1). There was no clear evidence of time-lag bias (i.e. decline effects) in published effect sizes (intercept = 0.32 [0.05 to 0.59], p = 0.017; slope = −0.002 [−0.024 to 0.020], p = 0.834; n = 19, k = 106; $R^2_{\mathrm{marginal}}$ = 0.003). However, published effect sizes (r = 0.33 [0.16–0.51], p < 0.001, n = 19, k = 106) tended to be larger than unpublished ones (r = 0.12 [−0.05 to 0.29], p = 0.157, n = 17, k = 242), although not statistically significantly so (p = 0.086; $R^2_{\mathrm{marginal}}$ = 0.043; Figure 5). Finally, as expected, studies reporting data incompletely (r = 0.53 [0.12–0.95], p < 0.012, n = 5, k = 29) tended to show larger effect sizes than studies reporting data in full (r = 0.27 [0.02–0.51], p < 0.032, n = 14, k = 77), but that difference was not statistically significant (p = 0.284; $R^2_{\mathrm{marginal}}$ = 0.049).

Effect sizes became slightly smaller as sample size increased, demonstrating some evidence of small-study effect. The solid line represents the model estimate, shading represents the 95% CI and circles represent individual effect sizes scaled by their precision
Published effect sizes tended to be larger than unpublished ones for the correlation between male size and reproductive performance in Gambusia. Orchard plot showing means, 95% CI (thick whisker), 95% PI (thin whisker) and individual effect sizes scaled by their precision (circles)

4 Discussion


We found that male size and reproductive performance are positively correlated across studies of Gambusia. Throughout, all mean effect estimates were positive, including the overall effect and the category-specific meta-regression effects, which suggests that evidence for large-male advantage is robust. Positive selection on male size in the face of reversed sexual size dimorphism in Gambusia might seem unexpected, but it should be kept in mind that our study focused on sexual selection on body size. Variation in body size and sexual size dimorphism originates and is maintained by complex interactions between natural and sexual selection, so there could be opposing ecological selection pressures and viability costs that keep males small (Blanckenhorn, 2000). For example, natural selection via ecological niche partitioning between the sexes and small-male advantage in foraging have been associated with reversed sexual size dimorphism in birds of prey (Krüger, 2005). Whether this seemingly directional and positive selection is driving evolution of male body size in Gambusia is also unclear, in part because the heritability of body size appears negligible in the most studied Gambusia species, the eastern mosquitofish (Booksmythe et al., 2016; Vega-Trejo et al., 2018; Zulian et al., 1993). Indeed, environmental effects, including maternal effects, have been found to be important components of male body size in eastern mosquitofish (Vega-Trejo et al., 2018). Furthermore, differential selection on the age/size at maturity (e.g. Carmona-Catot et al., 2011; Hughes, 1985; Reznick et al., 2006) is likely a key component explaining why variation in male body size is commonplace in this genus. The overall positive effect contrasts with our prediction and with earlier studies that found a small-male advantage, mostly when focusing on gonopodial thrusting as a measure of reproductive performance (Bisazza & Marin, 1995). Nonetheless, the high heterogeneity found and the consequently wide prediction intervals for our main effect highlight that our results do not preclude a small-male advantage being the ‘true’ effect in certain contexts.

Meta-regressions revealed that the type of reproductive performance measure, the male-to-female ratio and the type of sperm characteristic are important moderators explaining a sizable amount of heterogeneity. The five categories of reproductive performance we used could be associated with different aspects of sexual selection: female choice is associated with precopulatory intersexual selection, mating success presumably with both male–male competition (intrasexual selection) and precopulatory intersexual selection, sperm quality and quantity with postcopulatory sexual selection, and paternity with overall reproductive success. The category-specific estimates were generally positive, suggesting large males have an advantage at each level of sexual selection. However, there are reasons to interpret this cautiously. The estimate for paternity, arguably the measure closest to fitness in our data, was positive but small and uncertain. The paternity category had the highest number of effect sizes (k = 115) among all five categories, but all of those effect sizes were based on a few males (range: 4–36 males/effect size) and came from only four studies. Furthermore, we expected a negative estimate for the mating success category because, according to the literature, Gambusia shows a coercive mating system where small males outperform large males at gonopodial thrusting (e.g. Bisazza & Marin, 1995; Pilastro et al., 1997). Surprisingly, the estimate was still slightly positive, even though this category included many effect sizes for which individual males were tested singly, which potentially benefitted smaller males due to the absence of competitors. As the number of males tested together increased, larger males generally prevailed and performed more gonopodial thrusting (Figure 2; as in Bisazza & Marin, 1995; Deaton, 2008; Booksmythe et al., 2013). The inconspicuousness and manoeuvrability that give smaller males an edge in gonopodial thrusting (Bisazza & Marin, 1995; Pilastro et al., 1997) may be eclipsed by larger males’ competitive dominance, and thus, this category may have underestimated the influence of male–male competition.
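The point about paternity effect sizes resting on only 4–36 males can be illustrated with the standard Fisher z-transformation, whose sampling standard error is 1/sqrt(n − 3). The sketch below is a minimal illustration, assuming each effect size is a simple Pearson correlation from n independent males; the correlation value 0.12 is borrowed from the paternity estimate purely as a placeholder, and the function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher z-transformation of a correlation coefficient."""
    return math.atanh(r)


def fisher_z_se(n):
    """Sampling standard error of Fisher's z for a sample of n individuals."""
    if n <= 3:
        raise ValueError("Fisher's z needs n > 3")
    return 1.0 / math.sqrt(n - 3)


def ci_for_r(r, n, z_crit=1.96):
    """Approximate 95% CI for r, computed on the z scale and back-transformed."""
    z = fisher_z(r)
    se = fisher_z_se(n)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Sample sizes spanning the range reported for paternity effect sizes.
for n in (4, 10, 36):
    lo, hi = ci_for_r(0.12, n)  # 0.12 is an illustrative value only
    print(f"n = {n:2d}: SE(z) = {fisher_z_se(n):.3f}, 95% CI for r = [{lo:.2f}, {hi:.2f}]")
```

At the lower end of that range a single effect size carries almost no information about the sign of the correlation, which is consistent with the caution urged above despite the large number of paternity effect sizes.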

As predicted, the association between male size and sperm quantity was positive while the relationship between male size and sperm quality was virtually non-existent. The latter finding contrasted with our prediction for a trade-off between sperm quality and male size/growth. It is possible that sperm competition in this genus is so intense irrespective of male size that no clear association exists between male size and sperm quality (Zane et al., 1999). Moreover, Gambusia males may facultatively adjust how much sperm they spend depending on the perceived sperm competition risk instead of altering the quality of their ejaculate (Evans et al., 2003). Future studies are needed to understand the role and mechanism of sperm competition and to disentangle the effect of male size, sperm quantity and sperm quality, especially since internal fertilization and livebearing make poeciliids an ideal model organism for studying sperm competition.

The female choice category showed a greater estimate than the other categories, challenging us to rethink the role that female choice may play in Gambusia and also the way female choice is measured in the laboratory. Of 13 studies that investigated female choice, 11 confirmed female preference for large males, so it is possible that there is a latent female preference whose expression is hindered in the wild but is detectable in the artificial settings of dichotomous mate choice tests. However, it is unclear whether the female association preference represents a preference to reproduce with large males. In the laboratory, eastern mosquitofish females were shown to aggregate with other females to dilute the costs of excessive male sexual harassment such as increased predation risk and reduced foraging efficiency (Dadda et al., 2005). Similarly, females associated with a larger male when a harassing male was present, which may be a strategy to curtail harassment via the larger male monopolizing access to the female and fending off smaller males (Dadda et al., 2005; Searcy, 1982). In nature, eastern mosquitofish females tended to shoal with similar-sized females (Bisazza & Marin, 1995), so female preference for large males may also be a by-product of female schooling behaviour. Future studies on the role of female choice in Gambusia should consider the effect of this gregarious tendency in females.

Female choice was mostly measured in dichotomous mate choice tests with no physical interaction between the sexes, which does not reflect the ecological reality of male–female interactions. Instead, researchers could make use of recent advances in tracking technology to study female choice in this group (e.g. Pérez-Escudero et al., 2014; Sridhar et al., 2019). Our analyses revealed a larger effect in the absence than in the presence of physical interaction, so it is possible that female preference for large males was somewhat artificially inflated. When experimental fish did freely interact, experimental density, male-to-female ratio and the interaction between these and habitat complexity explained a substantial percentage of heterogeneity. When considered singly, male-to-female ratio had a positive effect on the relationship between male size and reproductive performance, explaining the second greatest amount of heterogeneity in this meta-analysis (10.4%). That is, our results suggest that male size is a stronger predictor of reproductive performance when male–male competition is high. It should be kept in mind that separating the effects of male-to-female ratio from the effects of male and female density is difficult; for example, male and female density under varying sex ratios were shown to exert different influences on patterns of male behavioural change in western mosquitofish (Smith, 2007).

Some of the limitations of our meta-analysis reside in the experimental conditions of the included studies. First, all included studies were conducted in the laboratory, where Gambusia mating behaviour was often measured in unrealistically low-complexity settings, making it difficult to draw connections between the results of our meta-analysis and reproductive dynamics in natural populations. Furthermore, even the ‘high complexity’ category in our meta-analysis (small rocks and/or natural or artificial plants) did not reflect the true complexity of natural habitats and was heavily underrepresented (k = 27 effect sizes), which could explain the lack of a clear statistical effect in our meta-regression. Visual field observations revealed that male chases of females in western mosquitofish mostly came to a halt when the chased female dashed into dense vegetation in shallow water (Martin, 1975). Thus, it is likely that females use vegetation to escape from, and selectively not escape from, males, and this aspect of Gambusia mating behaviour was largely overlooked. In addition, most trials were conducted at 28°C with a 14-hr light period, which does not reflect natural variation, since Gambusia can occupy icy lakes and ponds as well as hot springs and thermally elevated lakes reaching 42–44°C (Meffe & Snelson Jr., 1989). Importantly, eastern mosquitofish males have been observed to reproduce across the entire test temperature range of 14–38°C in the laboratory (Wilson, 2005). Since temperature and photoperiod are generally regarded as the two most vital environmental factors in the fish reproductive cycle, how photoperiod and temperature interact to control Gambusia reproduction requires further investigation. Specifically, attention should be paid to seasonal and daily fluctuations, which might have greater influence than the test temperature and photoperiod.

Female reproductive status is another important factor to consider when studying Gambusia mating behaviour. Although females try to thwart male copulatory attempts at all stages of their reproductive cycle (Bisazza & Marin, 1995), mosquitofish females have been suggested to be more likely to associate with males when virgin, postpartum or male-deprived (Bisazza et al., 2001; Hughes, 1985; Pilastro et al., 2003). Thus, we hypothesized larger effect sizes for virgin or postpartum females than for gravid females. Unfortunately, there were insufficient effect sizes to calculate an estimate for postpartum females because many studies excluded postpartum females due to heightened male interest (Constantz, 1989), which was deemed a confounding variable for some research questions. If female receptivity and male interest are at their peak 1–2 days after parturition, future sexual selection studies may benefit from focusing more on postpartum females, not less, which would help avoid a systematic design issue that underestimates the role of female behaviour and mate choice.

Our systematic review and meta-analysis also underscored evidence of publication bias in the published literature. First, our analyses showed some evidence of small-study bias, suggesting that some low-precision studies might still remain unavailable despite our efforts to include both published and unpublished effect sizes. Evidence for small-study bias is often found in meta-analyses in ecology and evolution and needs to be considered when interpreting meta-analytic results (e.g. Parker, 2013; Sánchez-Tójar, Lagisz, et al., 2020; Sánchez-Tójar, Moran, et al., 2020; Wang et al., 2018). The existence of publication bias was further demonstrated since published effect sizes tended to be larger than unpublished effect sizes, and studies reporting data incompletely also tended to show larger effect sizes than studies reporting data in full. Similar patterns have been shown in recent meta-analyses in the field (Moran et al., 2020; Sánchez-Tójar et al., 2018), and we expect these patterns to be more and more commonly uncovered since meta-analysts have started to make use of open primary data (Culina et al., 2018). Despite the evidence of publication bias in the published literature, our approach of combining both published and unpublished effect sizes largely mitigated its effect (Figure S7.1). However, some caution should still be taken when interpreting the results of our meta-analysis.

In sum, our meta-analysis found evidence of positive sexual selection on male body size in Gambusia that was seemingly robust across contexts. We found gaps and limitations in experimental designs used to study Gambusia mating behaviour, which should help guide the necessary future research on this topic, particularly since our meta-analysis revealed a large proportion of unexplained heterogeneity across effect sizes. Our study also identified the need to rethink the role and form of female choice in this genus and how it is measured in the laboratory. Female choice may play a subtle and underestimated part, and association preference for large males for protection could be a means through which females may exert some amount of choice in an ostensibly coercive mating system.


We thank Andrea S. Aspbury, Michael G. Bertram, Isobel Booksmythe, Thea M. Edwards, Megan L. Head, Andrew T. Kahn, Jake M. Martin, Rose E. O'Dea and Regina Vega-Trejo for kindly sharing their raw data. We are also grateful to Michael G. Bertram for commenting on our pre-registration, and to Megan L. Head, two reviewers and two editors for constructive criticism. This research received funding from the German Research Foundation (DFG) as part of the SFB TRR 212 (NC³; project numbers 316099922, 396782608) and the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 836937.


B.K., N.P.M. and A.S.-T. were involved in conceptualization, methodology, software, formal analysis and investigation; B.K. performed the data curation and project administration; B.K. and N.P.M. performed the visualization; N.P.M. and A.S.-T. performed the supervision and validation; K.R. was involved in funding acquisition; B.K. and A.S.-T. performed the writing (original draft preparation), and B.K., N.P.M., K.R. and A.S.-T. performed the writing (review and editing).


All data and code are available on the Open Science Framework: https://doi.org/10.17605/OSF.IO/2QXT5 (Kim et al., 2021).
