Count transformation models
Abstract
- The effect of explanatory environmental variables on a species' distribution is often assessed using a count regression model. Poisson generalized linear models or negative binomial models are common, but the traditional approach of modelling the mean after log or square root transformation remains popular and in some cases is even advocated.
- We propose a novel framework of linear models for count data. Similar to the traditional approach, the new models apply a transformation to count responses; however, this transformation is estimated from the data and not defined a priori. In contrast to simple least‐squares fitting and in line with Poisson or negative binomial models, the exact discrete likelihood is optimized for parameter estimation and inference. Simple interpretation of effects in the linear predictors is possible.
- Count transformation models provide a new approach to regressing count data in a distribution‐free yet fully parametric fashion, obviating the need to a priori commit to a specific parametric family of distributions or to a specific transformation. The models are a generalization of discrete Weibull models for counts and are thus able to handle over‐ and underdispersion. We demonstrate empirically that the models are more flexible than Poisson or negative binomial models but still maintain interpretability of multiplicative effects. A re‐analysis of deer–vehicle collisions and the results of artificial simulation experiments provide evidence of the practical applicability of the model framework.
- In ecology studies, uncertainties regarding whether and how to transform count data can be resolved in the framework of count transformation models, which were designed to simultaneously estimate an appropriate transformation and the linear effects of environmental variables by maximizing the exact count log‐likelihood. The application of data‐driven transformations allows over‐ and underdispersion to be addressed in a model‐based approach. Models in this class can be compared to Poisson or negative binomial models using the in‐ or out‐of‐sample log‐likelihood. Extensions to nonlinear additive or interaction effects, correlated observations, hurdle‐type models and other more complex situations are possible. A free software implementation is available in the cotram add‐on package to the R system for statistical computing.
1 INTRODUCTION
Information represented by counts is ubiquitous in ecology. Perhaps the most obvious instance of ecological count data is animal abundances, which are determined either directly, for example by birdwatchers, or indirectly, by the counting of surrogates, for example the number of deer–vehicle collisions as a proxy for roe deer abundance. This information is later converted into models of animal densities or species distributions using statistical models for count data. Distributions of count data are, of course, discrete and right‐skewed, such that tailored statistical models are required for data analysis. Here, we focus on models explaining the impact of explanatory environmental variables x on the distribution of a count response Y ∈ {0, 1, 2, …}. In the commonly used Poisson generalized linear model with log‐link, intercept α and linear predictor x⊤β, both the mean E(Y | X = x) and the variance Var(Y | X = x) of the count response are given by λ(x) = exp(α + x⊤β). Overdispersion, that is, the situation Var(Y | X = x) > E(Y | X = x), is allowed in the more complex negative binomial model with mean E(Y | X = x) = λ(x) and potentially larger variance λ(x) + λ(x)²/ν. For independent observations, the model parameters are obtained by maximizing the discrete log‐likelihood function, in which an observation (y, x) contributes the log‐density log f(y | x) of either the Poisson or the negative binomial distribution.
Before the emergence of these models tailored to the analysis of count data (generalized linear models were introduced by Nelder & Wedderburn, 1972), researchers were restricted to analysing transformations of Y by normal linear regression models. Prominent textbooks at the time (Snedecor & Cochran, 1967; Sokal & Rohlf, 1967) suggested log transformations log(y + 1) or square root transformations √y of observed counts y. The application of least‐squares estimators to the log‐transformed counts then leads to the mean E(log(Y + 1) | X = x) = α + x⊤β. Implicitly, it is assumed that the variance after transformation, Var(log(Y + 1) | X = x) = σ², is constant and that errors are normally distributed. Although it is clear that the normal assumption log(Y + 1) | X = x ∼ N(α + x⊤β, σ²) is incorrect (the count data are still discrete after transformation) and, consequently, that the wrong likelihood is maximized by applying least‐squares to log(y + 1) for parameter estimation and inference, this approach is still broadly used both in practice and in theory (e.g. De Felipe, Sáez‐Gómez, & Camacho, 2019; Dean, Voss, & Draguljić, 2017; Gotelli & Ellison, 2013; Ives, 2015; Mooney et al., 2016). Moreover, other deficits of this approach have been discussed in numerous papers (e.g. O'Hara & Kotze, 2010; St‐Pierre, Shikon, & Schneider, 2018; Warton, 2018; Warton, Lyons, Stoklosa, & Ives, 2016). As a compromise between the two extremes of using rather strict count distribution models (such as the Poisson or negative binomial) and the analysis of transformed counts by normal linear regression models, we suggest a novel class of transformation models for count data that combine the strengths of both approaches. Briefly stated, in the newly proposed method, appropriate transformations of counts Y are estimated simultaneously with regression coefficients β from the data by maximizing the correct discrete form of the likelihood in models that ensure the interpretability of a linear predictor x⊤β. We describe the theoretical foundations of these novel count regression models in Section 2. Practical aspects of the methodology are demonstrated in Section 3 in a re‐analysis of roe deer activity patterns based on deer–vehicle collision data, followed by an artificial simulation experiment contrasting the performance of Poisson, negative binomial and count transformation models under certain conditions.
2 MATERIALS AND METHODS
The core idea of our count transformation model for describing the impact of explanatory environmental variables x on counts Y ∈ {0, 1, 2, …} is the simultaneous estimation of a fully parameterized smooth transformation α(Y) of the discrete response and the regression coefficients in a linear predictor x⊤β. The aim of the approach is to model the discrete conditional distribution function P(Y ≤ y | X = x) directly.
To motivate the model, consider first the binary indicator 1(Y ≤ k) defined by some cut‐off point k. Assuming a Bernoulli distribution with success parameter π(x) = P(Y ≤ k | X = x), a binary GLM with link function g is given as

g(P(Y ≤ k | X = x)) = αk − x⊤β.

The intercept αk is the transformed probability of observing a count of at most k for a baseline configuration x⊤β = 0 and, in a logistic regression model with g = logit, the regression coefficients β have an interpretation as log‐odds ratios. When the counts are bounded, that is, Y ∈ {0, 1, …, K}, the response can be treated as an ordered categorical variable. For this scenario, the binary GLM can be extended to a cumulative model of the form

g(P(Y ≤ k | X = x)) = αk − x⊤β, k = 0, …, K − 1,

with effects β independent of k. For count data, there is usually no such limit K to max(Y) and thus the number of intercept thresholds αk may become quite large. The main aspect of our count transformation models is a smooth and parsimonious parameterization of the intercept thresholds. To simplify the notation, we note that the mean E(1(Y ≤ k) | X = x) = P(Y ≤ k | X = x) has an interpretation as a distribution function giving the probability of observing a count y smaller than or equal to k given the configuration x. Furthermore, each link function g = F⁻¹ corresponds to the quantile function of a specific continuous distribution function F (g = logit and F = g⁻¹ = expit for logistic regression, g = Φ⁻¹ and F = Φ for probit regression, etc.). Last, using a negative sign for the linear predictor x⊤β ensures that large values of x⊤β correspond to large means E(Y | X = x), however, in a nonlinear way. For arbitrary cut‐offs y, we introduce the count transformation model as a model for the conditional distribution function FY|X=x(y) = P(Y ≤ y | X = x) of a count response Y given explanatory variables x, as
FY|X=x(y) = F(α(⌊y⌋) − x⊤β), (1)

applied to the greatest integer ⌊y⌋ less than or equal to the cut‐off point y. Hothorn, Möst, and Bühlmann (2018) suggested the parameterization of α in terms of basis functions a and the corresponding parameters ϑ as

α(y) = a(y)⊤ϑ.

There is no deeper meaning behind the parameters ϑ and many basis functions a are possible; we comment on a specific choice later on. The only modification required for count data is to consider this transformation function as a step function with jumps at the integers 0, 1, 2, … only. This is achieved in model (1) by the floor function ⌊y⌋. The very same approach was suggested by Padellini and Rue (2019) to model quantile functions of count data instead of the distribution functions we consider here. Figure 1 shows a distribution function FY and the corresponding transformation function α, both as discrete step functions (flooring the argument first) and continuously (without doing so). The two versions are identical for integer‐valued arguments. Thus, the transformation function α, and consequently the transformation model (1), are parameterized continuously but evaluated and interpreted discretely. A computationally attractive, low‐dimensional representation of a smooth function in terms of a few basis functions a and corresponding parameters ϑ is therefore the core ingredient of our novel model framework. In addition to the baseline transformation and distribution functions (i.e. for a configuration with x⊤β = 0 in model (1)), the conditional transformation and distribution function for some configuration with x⊤β ≠ 0 is also depicted. The impact of x⊤β on the transformation function is given by a vertical shift but is nonlinear on the scale of the distribution function.
FIGURE 1 Distribution function (left) and transformation function (right) of a count variable (⌊y⌋, red) and a corresponding continuous variable (y, blue), both functions coinciding for counts 0, 1, 2, …. The curves are shown both for the baseline configuration x⊤β = 0 and a configuration x⊤β ≠ 0 governing a vertical shift on the scale of the transformation function α (right panel) and a corresponding change on the scale of the distribution function (left panel)
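The mechanics of model (1) can be sketched numerically. The Python snippet below uses a made‐up monotone transformation α and the cloglog choice of F (both stand‐ins for the Bernstein‐parameterized, data‐driven versions estimated by the method) to evaluate the discrete conditional distribution function and the cell probabilities it implies.

```python
import numpy as np

def F(z):
    """Inverse cloglog link: F(z) = 1 - exp(-exp(z)), a minimum extreme
    value distribution function."""
    return 1.0 - np.exp(-np.exp(z))

def alpha(y):
    """Made-up monotone transformation function; the method estimates a
    Bernstein-parameterized version of this from data."""
    return -1.0 + 1.5 * np.log1p(y)

def cdf(y, xbeta):
    """Model (1): P(Y <= y | x) = F(alpha(floor(y)) - x'beta)."""
    return F(alpha(np.floor(y)) - xbeta)

# the conditional distribution function is a step function in y
assert cdf(2.0, 0.0) == cdf(2.9, 0.0)

# a positive linear predictor shifts probability mass to larger counts
assert cdf(3, 1.0) < cdf(3, 0.0)

# cell probabilities P(Y = y | x) = F_{Y|x}(y) - F_{Y|x}(y - 1)
ys = np.arange(0, 200)
p = np.diff(cdf(ys, 0.5), prepend=0.0)
assert np.all(p >= 0) and abs(p.sum() - cdf(199, 0.5)) < 1e-12
```

Because α is monotone and F is a distribution function, the implied cell probabilities are automatically nonnegative and accumulate to one as y grows.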
On a more technical level, the basis a is specified in terms of aBs,P−1, the P‐dimensional basis functions of a Bernstein polynomial (see Farouki, 2012, for an extensive overview) of order P − 1. Specifically, the basis a(y) can be chosen as aBs,P−1(y) or aBs,P−1(y + 1), or as a Bernstein polynomial on the log‐scale: aBs,P−1(log(y)) or aBs,P−1(log(y + 1)). The choice a(y) = aBs,P−1(log(y + 1)) is particularly well suited for modelling relatively small counts. We chose Bernstein polynomials purely for computational convenience, as it is easy to differentiate and integrate α(y) = a(y)⊤ϑ with respect to y. Furthermore, for order P − 1 = 1, the defined basis is equivalent to a linear function of either y, log(y) or log(y + 1), and monotonicity of the transformation function α can be obtained under the constraint ϑ1 ≤ ϑ2 ≤ ⋯ ≤ ϑP on the parameters ϑ (Hothorn et al., 2018).
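To make the Bernstein parameterization concrete, the following sketch constructs the basis aBs,P−1(log(y + 1)) from binomial probability mass functions (a standard identity for the Bernstein basis); the support limits and the coefficients ϑ below are made up for illustration only.

```python
import numpy as np
from scipy.stats import binom

def bernstein_basis(z, order, lo, hi):
    """Bernstein basis of given order on [lo, hi]:
    b_k(t) = C(order, k) * t**k * (1 - t)**(order - k), t = (z - lo)/(hi - lo),
    which equals the binomial(order, t) probability mass at k."""
    t = (np.asarray(z, dtype=float) - lo) / (hi - lo)
    return binom.pmf(np.arange(order + 1), order, t[..., None])

def alpha(y, theta):
    """alpha(y) = a_Bs,P-1(log(y + 1))' theta with P = len(theta)."""
    lo, hi = 0.0, np.log(50.0 + 1.0)   # support chosen here for counts 0..50
    return bernstein_basis(np.log1p(y), len(theta) - 1, lo, hi) @ theta

theta = np.array([-2.0, -1.0, -0.5, 0.5, 1.5, 2.5, 4.0])  # P = 7, nondecreasing
a = alpha(np.arange(0, 51), theta)
assert np.all(np.diff(a) > 0)        # nondecreasing theta => monotone alpha

row = bernstein_basis(np.array([1.3]), 6, 0.0, np.log(51.0))
assert abs(row.sum() - 1.0) < 1e-12  # Bernstein basis is a partition of unity
```

Nondecreasing coefficients guarantee a monotone transformation, which is exactly the kind of linear constraint imposed during maximum likelihood estimation.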
Similar to binary GLMs or cumulative models, specific model types arise from the different a priori choices of the inverse link function F = g⁻¹. This choice also governs the interpretation of the linear predictor x⊤β. The conditional distribution function FY|X=x for different choices of the link function g = F⁻¹ and any configuration x is given in Table 1, with FY denoting the distribution of the baseline configuration x⊤β = 0. Note that, with a sufficiently flexible parameterization of the transformation function α, every distribution can be written in this way such that the model is distribution‐free (Hothorn et al., 2018). Bernstein polynomials of sufficiently large order P − 1 can approximate any function on intervals and this theoretical property is practically achieved for relatively small orders P − 1 (see figure 5 in Hothorn, 2020a, for a head‐to‐head comparison with a nonparametric method).
TABLE 1 The conditional distribution function FY|X=x(y) = F(α(⌊y⌋) − x⊤β) under different link functions g = F⁻¹

| Link g = F⁻¹ | Interpretation of FY∣X=x(y) |
|---|---|
| probit | Φ(Φ⁻¹(FY(y)) − x⊤β) |
| logit | expit(logit(FY(y)) − x⊤β) |
| cloglog | 1 − (1 − FY(y))^exp(−x⊤β) |
| loglog | FY(y)^exp(x⊤β) |
The parameters β describe a deviation from the baseline distribution FY in terms of the linear predictor x⊤β. For a probit link, the linear predictor is the conditional mean of the transformed counts α(Y). This interpretation, except for the fact that the intercept is now understood as being part of the transformation function α, is the same as in the traditional approach of first transforming the counts and only then estimating the mean using least‐squares. However, the transformation α is not heuristically chosen or defined a priori but estimated from the data through the parameters ϑ, as explained below. For a logit link, exp(x⊤β) is the odds ratio comparing the baseline odds FY/(1 − FY) with the conditional odds FY|X=x/(1 − FY|X=x). The complementary log‐log (cloglog) link leads to a discrete version of the Cox proportional hazards model, such that exp(x⊤β) is the hazard ratio comparing the baseline cumulative hazard function −log(1 − FY) with the conditional cumulative hazard function −log(1 − FY|X=x). The log‐log link leads to the reverse time hazard ratio with multiplicative changes in log(FY). All models in Table 1 are parameterized to relate positive values of x⊤β to larger means, independent of the specified link g = F⁻¹.
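That these multiplicative interpretations do not depend on the cut‐off y can be verified numerically; the α values below are arbitrary placeholders for an estimated transformation evaluated at the counts 0, …, 4.

```python
import numpy as np
from scipy.special import expit

alpha_vals = np.array([-2.0, -0.5, 0.5, 1.5, 3.0])  # hypothetical alpha(0..4)
xbeta = 0.8

# logit link: baseline odds / conditional odds = exp(x'beta) for every y
F0, Fx = expit(alpha_vals), expit(alpha_vals - xbeta)
assert np.allclose((F0 / (1 - F0)) / (Fx / (1 - Fx)), np.exp(xbeta))

# cloglog link: cumulative hazards -log(1 - F) scale by exp(-x'beta),
# i.e. the baseline-to-conditional hazard ratio is exp(x'beta)
F0 = 1 - np.exp(-np.exp(alpha_vals))
Fx = 1 - np.exp(-np.exp(alpha_vals - xbeta))
assert np.allclose(np.log(1 - Fx) / np.log(1 - F0), np.exp(-xbeta))
```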
For the cloglog link, model (1) reads

FY|X=x(y) = 1 − exp(−exp(α(⌊y⌋) − x⊤β)). (2)

In this model, the discrete hazard function h(y | x) = P(Y = y | Y ≥ y, X = x) is the probability that y counts will be observed given that at least y counts were already observed. The model is equivalent to

log(1 − h(y | x)) = exp(−x⊤β) log(1 − h(y)),

where h(y) denotes the baseline discrete hazard function, such that exp(−x⊤β) gives the multiplicative change in the transformed discrete hazards log(1 − h(y)).
The Cox proportional hazards model with a simplified transformation function α(y) = ϑ1 + ϑ2 log(y + 1) specifies a discrete form of a Weibull model (introduced by Nakagawa & Osaki, 1975) that Peluso, Vinciotti, and Yu (2019) recently discussed as an extension to other count regression models and that serves as a more flexible approach for both over‐ and underdispersed data. The discrete Weibull model is a special form of our Cox count transformation model (2), as the former features a linear basis function a with P = 2 parameters defined by a Bernstein polynomial of order one. Thus, model (2) can be understood as a generalization moving away from the low‐parametric discrete Weibull distribution while maintaining both the interpretability of the effects as log‐hazard ratios and the ability to handle over‐ and underdispersion.
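The discrete Weibull connection is easy to check numerically: with α(y) = ϑ1 + ϑ2 log(y + 1) in model (2), the conditional survivor function takes the discrete Weibull form q^((y + 1)^ϑ2). A sketch with made‐up parameter values:

```python
import numpy as np

t1, t2, xbeta = -1.0, 1.3, 0.5  # hypothetical parameters

def surv(y, xb):
    """P(Y > y | x) = exp(-exp(t1 + t2*log(y + 1) - xb)) under model (2)
    with the simplified transformation alpha(y) = t1 + t2*log(y + 1)."""
    return np.exp(-np.exp(t1 + t2 * np.log1p(y) - xb))

ys = np.arange(0, 30)
q = np.exp(-np.exp(t1 - xbeta))  # discrete Weibull 'q' parameter
assert np.allclose(surv(ys, xbeta), q ** ((ys + 1.0) ** t2))

# discrete hazards: log(1 - h(y|x)) = exp(-x'beta) * log(1 - h(y))
ys1 = np.arange(1, 30)
one_minus_h = lambda xb: surv(ys1, xb) / surv(ys1 - 1, xb)
ratio = np.log(one_minus_h(xbeta)) / np.log(one_minus_h(0.0))
assert np.allclose(ratio, np.exp(-xbeta))
```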
For independent observations, the parameters ϑ and β are estimated simultaneously by maximizing the exact discrete log‐likelihood, in which an observation (y, x) contributes the log‐probability

log P(Y = y | X = x) = log(F(α(⌊y⌋) − x⊤β) − F(α(⌊y − 1⌋) − x⊤β)),

with the convention F(α(⌊−1⌋) − x⊤β) = 0 for y = 0. Standard maximum likelihood inference applies when confidence intervals for, or the significance of, the effects β shall be assessed (see section 6 of Hothorn, 2020a). The likelihood highlights an important connection to a recently proposed approach to multivariate models (Clark, Nemergut, Seyednasrollah, Turner, & Zhang, 2017), where the main challenge is to make multiple response variables measured at different scales comparable. Latent continuous variables are used to model discrete responses by means of appropriate censoring. For the univariate case considered here, our likelihood is equivalent to censoring a latent continuous variable at the integers 0, 1, 2, …. Different choices of the link function g define the latent variable's distribution; for example, for a probit model with F = Φ, a latent normal distribution is assumed.
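As a sketch of the exact discrete likelihood, the following snippet evaluates the contribution of single observations under a probit‐type model with a hypothetical fixed transformation function; in the actual method, α is Bernstein‐parameterized and maximized jointly with β.

```python
import numpy as np
from scipy.stats import norm

def alpha(y):
    """Hypothetical monotone transformation function."""
    return -1.0 + 1.4 * np.log1p(y)

def log_prob(y, xbeta):
    """Exact discrete log-likelihood contribution of an observation (y, x):
    log P(Y = y | x) = log[Phi(alpha(y) - x'b) - Phi(alpha(y - 1) - x'b)],
    with the lower term set to 0 for y = 0; this is interval-censoring of
    a latent standard normal variable at the transformed integer grid."""
    upper = norm.cdf(alpha(y) - xbeta)
    lower = norm.cdf(alpha(y - 1) - xbeta) if y > 0 else 0.0
    return np.log(upper - lower)

# the cell probabilities define a proper discrete distribution
probs = np.exp([log_prob(y, 0.3) for y in range(500)])
assert np.all(probs > 0) and abs(probs.sum() - 1.0) < 1e-3

# total log-likelihood of an i.i.d. sample is the sum of contributions
ll = sum(log_prob(y, 0.3) for y in [0, 2, 5, 1])
```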
3 RESULTS
In our empirical evaluation of the proposed count transformation models, we demonstrate practical aspects of the model framework in Section 3.1, by re‐analysing data on deer–vehicle collisions, and examine their properties in the context of conventional count regression models, assuming either a conditional Poisson or a negative binomial distribution. In Section 3.2, we use simulated count data to evaluate the robustness of count transformation models under model misspecification.
3.1 Analysis of deer–vehicle collision data
In the following, we re‐analyse a time series of 341,655 deer–vehicle collisions (DVCs) involving roe deer Capreolus capreolus that were documented over a period of 10 years in Bavaria, Germany. The data were originally analysed by Hothorn, Müller, Held, Möst, and Mysterud (2015) with the aim of describing temporal patterns in roe deer activity. The observational units are based on 48 × 3,652 half‐hour intervals between 2002‐01‐01 00:00 UTC + 1 and 2011‐12‐31 24:00 UTC + 1. For each of these 175,296 intervals, we computed the number of DVCs based on the information in the police records reported in Bavaria, south‐eastern Germany (48°46′N 11°25′E, 70,500 km2). The raw data and a detailed description of their analysis are available in the original study.
We modelled the number of DVCs per half‐hour interval with a linear predictor including the following effects:
- annual effect: the difference between each year and the first year, 2002;
- weekly effect: the difference between each day of the week (Sundays and bank holidays are treated the same) and Mondays;
- diurnal effect: the difference between time of day in the intervals
- ‘Night (am)’: [00:00, sunrise – 2 hr),
- ‘Pre‐sunrise’: [sunrise – 2 hr, sunrise),
- ‘Post‐sunrise’: [sunrise, sunrise + 2 hr),
- ‘Day (am)’: [sunrise + 2 hr, 12:00),
- ‘Day (pm)’: [12:00, sunset – 2 hr),
- ‘Pre‐sunset’: [sunset – 2 hr, sunset),
- ‘Post‐sunset’: [sunset, sunset + 2 hr) and
- ‘Night (pm)’: [sunset + 2 hr, 24:00)
- and the baseline category ‘Day (am)’;
- weekly/diurnal effect: an interaction of the weekly and diurnal effects;
- seasonal effect: an interaction of the diurnal effect with a smooth seasonal component s(d).
The time of day‐specific seasonal components s(d) were modelled as a superposition of sinusoidal waves of different frequencies based on Held and Paul (2012). We applied a Poisson generalized linear model with a log link, a negative binomial model with a log link and a discrete Cox count transformation model (2) with P = 7 parameters ϑ of a Bernstein polynomial. The latter two models allow for possible overdispersion. The three models were fitted to the data of the first 8 years (2002–2009) and evaluated based on the data from the remaining two years, 2010 and 2011.
For each model we computed the estimated multiplicative seasonal changes in risk depending on the time of day relative to baseline on January 1, including 95% simultaneous confidence bands. We interpreted ‘risk’ as a multiplicative change to baseline with respect to either the conditional mean (‘expectation ratio’; Poisson and negative binomial models) or the conditional discrete hazard function (‘hazard ratio’) for the Cox count transformation model (2).
The results in Figure 2 show a rather strong agreement between the three models with respect to the estimated risk (expectation ratio or hazard ratio). However, the uncertainty, assessed by the 95% confidence bands, was smaller in the Poisson model. The negative binomial and the Cox count transformation model (2) agree on the effects and the associated variability, with the possible exception of the risk at daylight (Day, am).
To assess the performance of the three count regression models, we computed the out‐of‐sample log‐likelihood of each model based on the data of the validation sample (years 2010 and 2011). The out‐of‐sample log‐likelihood of the Cox count transformation model (2), with a value of −58,164.47, was the largest across the three count regression models. The Poisson model, with an out‐of‐sample log‐likelihood of −67,192.75, was the least consistent with the data. Allowing for possible overdispersion by the negative binomial model increased the out‐of‐sample log‐likelihood to −58,234.72, which was closer to but did not match that of model (2). Practically, the count transformation model performed at least as well as the negative binomial model; however, the necessity to choose a specific parametric distribution was present in the latter model only, owing to the distribution‐free nature of the former.
We further compared the three different models in terms of their conditional distribution functions for four selected time intervals of the year 2009. The discrete conditional distribution functions of the models, evaluated for all integers between 0 and 38, are given in Figure 3. The conditional medians obtained from all three models are rather close, but the variability assessed by the Poisson model is much smaller than that associated with the negative binomial and count transformation models, thus indicating overdispersion.
In addition to confidence statements about regression coefficients β or functions thereof as in Figure 2, it is also possible to obtain confidence bands for transformation functions α or for the conditional distribution function itself. Figure 4 presents the estimated conditional distribution for a specific half‐hour interval, along with a confidence band calculated from the joint asymptotic normal distribution of the estimated ϑ parameters.
3.2 Artificial count‐data‐generating processes
We investigated the performance of the different regression models in a simulation experiment based on count data from various underlying data‐generating processes (DGPs). Count responses Y were generated conditionally on a numeric explanatory environmental variable x, following a Poisson or negative binomial distribution or one of the discrete distributions underlying the four count transformation models corresponding to the four link functions from Table 1. For the Poisson model, the mean and variance were both given by the conditional mean function λ(x). The negative binomial data were chosen to be moderately overdispersed, with mean λ(x) and variance λ(x) + λ(x)²/ν exceeding the mean. The four data‐generating processes arising from the count transformation models were specified by the different link functions in Table 1, a Bernstein polynomial aBs,6(log(y + 1)) and a regression coefficient β1 = 0.8. The conditional distribution functions defined by each of the six data‐generating processes are visualized in Figure 5 and reveal different functional forms.
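Data from such a transformation‐model DGP can be generated by inverting the discrete distribution function; the sketch below does this for a logit link with an illustrative transformation function (not the exact specification used in the experiment).

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(2020)

def alpha(y):
    """Illustrative transformation for a simulated logit DGP."""
    return -2.0 + 1.6 * np.log1p(y)

def rcounts(x, beta1=0.8, y_max=500):
    """Draw Y | x from F_{Y|x}(y) = expit(alpha(y) - x*beta1) by returning
    the smallest integer y with F_{Y|x}(y) >= U, U ~ Uniform(0, 1);
    draws are truncated at y_max + 1."""
    u = rng.uniform(size=len(x))
    cdf = expit(alpha(np.arange(y_max + 1))[None, :]
                - np.asarray(x)[:, None] * beta1)
    return (cdf < u[:, None]).sum(axis=1)

y = rcounts(np.zeros(2000))
# the empirical distribution function should match the model CDF, e.g. at y = 3
assert abs(np.mean(y <= 3) - expit(alpha(3))) < 0.04
```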
We repeated the simulation experiment for each data‐generating process 100 times, with learning and validation sample sizes of 250 and 750 observations, respectively. The centred out‐of‐sample log‐likelihoods, contrasting the model fit, were computed by the differences between the out‐of‐sample log‐likelihoods of the models and the out‐of‐sample log‐likelihoods of the true data‐generating processes.
The results as given in Figure 6 follow a clear pattern. When misspecified, the model fit of the Poisson model is inferior to that of all other models. As expected, the negative binomial model fits both the data arising from the Poisson distribution (limiting case of the negative binomial distribution with ν → ∞) and the moderately overdispersed data well. However, it lacks robustness for more complex data‐generating processes, such as the underlying mechanisms specified by a count transformation model. The fit of the count transformation models is satisfactory across all DGPs, albeit with some differences within the model framework.
4 DISCUSSION
Motivated by the challenges posed by the statistical analysis of ecological count data, such as discreteness, skewness, overdispersion, restrictive distributional assumptions, appropriate choice of a priori transformations, or interpretability, we present a novel framework of count transformation models that provides a unified approach tailored to the analysis of count responses. The models, as outlined in Section 2, offer a diverse set of parameter interpretations and can be specified, estimated and evaluated in a simple but flexible maximum likelihood framework. The direct modelling of the conditional discrete distribution, while preserving the interpretability of the linear predictor x⊤β, is a key feature of our count transformation models. Furthermore, it eliminates the need to impose restrictive distributional assumptions (such as the Poisson), to choose transformations (such as a log or square root) in a data‐free manner or to rely on rough continuous approximations of the exact discrete likelihood, for example when applying a normal linear model to log‐transformed counts. The models are flexible enough to handle different dispersion levels adaptively, without being restricted to either over‐ or underdispersion. Our results from the re‐analysis of deer–vehicle collision data, presented in Section 3.1, demonstrate the distribution‐free nature of count transformation models in practice. They are especially compelling for the analysis of count responses arising from more complex data‐generating processes. Moreover, conditional quantiles can be easily extracted from the fitted model by numerical inversion of the smooth conditional distribution function F(α(y) − x⊤β). An additional advantage of count transformation models is that the model framework allows researchers to flexibly choose the scale for the interpretation of the linear predictor x⊤β by specifying a link function g = F⁻¹ from Table 1.
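Conditional quantiles follow from the same inversion idea; a minimal sketch with a hypothetical logit‐type model:

```python
import numpy as np
from scipy.special import expit

def cdf(y, xbeta):
    """Illustrative logit count transformation model."""
    return expit(-2.0 + 1.6 * np.log1p(np.floor(y)) - xbeta)

def quantile(p, xbeta, y_max=10000):
    """Smallest count y with F_{Y|x}(y) >= p (numerical inversion)."""
    ys = np.arange(y_max + 1)
    return int(ys[np.argmax(cdf(ys, xbeta) >= p)])

median = quantile(0.5, xbeta=0.8)
assert cdf(median, 0.8) >= 0.5          # the quantile covers probability p
assert cdf(median - 1, 0.8) < 0.5       # ... and is the smallest such count
```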
The models can be easily tailored to the experimental design using stratum‐ or site‐specific transformation functions αs(y) or response‐varying effects β(y). Correlated observations arising from clustered data require either the inclusion of random effects, with subsequent application of a Laplace approximation to the likelihood, or a marginal approach (Hothorn, 2019). Accounting for varying observation times or batch sizes is straightforward by the inclusion of an offset in the model specification. Random censoring is easy to incorporate in the likelihood (Hothorn et al., 2018), which can then appropriately handle uncertain recordings (for example, the observation 'more than three roe deer–vehicle collisions in half an hour' corresponds to right‐censoring at three). The same applies to truncation. By contrast, hurdle‐like transformation models require modifications of the basis functions as well as interactions between the response and explanatory variables (see section 4.5 in Hothorn et al., 2018).
Extensions to the proposed simple shift count transformation model can be made by boosting algorithms (Hothorn, 2020c) that allow the estimation of conditional transformation models (Hothorn, Kneib, & Bühlmann, 2014) featuring complex, nonlinear, spatial, spatio‐temporal, additive or completely unstructured tree‐based conditional parameter functions ϑ(x). Similarly, count transformation models can be partitioned by transformation trees (Hothorn & Zeileis, 2017), which in turn lead to transformation forests, a statistical learning approach for computing predictive distributions. The transformation approach also seems promising for the development of multivariate species distribution models, because different marginal transformation models can be combined into a multivariate model on the same scale (the idea was developed for continuous responses by Klein, Hothorn, & Kneib, 2019, and recent research focuses on discrete or count variables).
The greatest challenge in applying count transformation models is their interpretability. The effects of the explanatory environmental variables are not directly interpretable as multiplicative changes in the conditional mean of the count response, as is the case in Poisson or negative binomial models with a log link. For the logit, cloglog and log‐log link functions, the effects are still multiplicative, but on the scales of the discrete odds ratio, hazard ratio or reverse time hazard ratio, which might be difficult to communicate to practitioners. If the probit link is used, the effects are interpretable as changes in the conditional mean of the transformed counts. This interpretation is the same as that obtained from running a normal linear regression model on, for example, log‐transformed counts, with the important difference that (i) the transformation was estimated from the data by optimizing (ii) the exact discrete likelihood. Nonetheless, it is possible to plot the estimated transformation function α̂(y) against log(y + 1) ex post to assess the appropriateness of applying a log‐transformation.
4.1 Computational details
All computations were performed using R version 3.6.2 (R Core Team, 2019). A reference implementation of transformation models is available in the mlt R add‐on package (Hothorn, 2020a, 2020b). A simple user interface to linear count transformation models is available in the cotram R add‐on package (Siegfried & Hothorn, 2019). The package includes an introductory vignette and reproducibility material for the empirical results presented in Section 3.
The following example demonstrates the functionality of the cotram package in terms of a Cox count transformation model (2) with a cloglog link explaining how the number of tree pipits Anthus trivialis varies across different percentages of canopy overstorey cover (coverstorey).
```r
### package cotram available from CRAN.R-project.org
### install.packages(c("cotram", "coin"))
library("cotram")

### tree pipit data; doi: 10.1007/s10342-004-0035-5
data("treepipit", package = "coin")

### fit discrete Cox model to tree pipit counts
m <- cotram(counts ~ coverstorey, ### log-hazard ratio of coverstorey
            data = treepipit,     ### data frame
            method = "cloglog",   ### cloglog link
            order = 5,            ### order of Bernstein polynomial
            prob = 1)             ### support is 0…5

logLik(m)                         ### log-likelihood
## 'log Lik.' -38.27244 (df=7)

exp(coef(m))                      ### hazard ratio
## coverstorey
##   0.9805453

exp(confint(m))                   ### 95% confidence interval
##                 2.5 %    97.5 %
## coverstorey 0.9697581 0.9914526

### more illustrations
# vignette("cotram", package = "cotram")
```
The data are shown in Figure 7 overlayed with the smoothed version of the estimated conditional distribution functions for varying values of coverstorey.
ACKNOWLEDGEMENTS
The authors would like to thank Wendy Ran for improving the language. Constructive feedback by the handling editor, Prof. O'Hara, and two anonymous referees is highly appreciated.
AUTHORS' CONTRIBUTIONS
S.S. implemented the cotram package for count transformation models on top of the tram package for general linear transformation models written by T.H. Empirical results were obtained using R code by S.S., who also wrote the cotram package vignette. Both authors designed the simulation experiments, and drafted and revised the manuscript.
Open Research
DATA AVAILABILITY STATEMENT
The code necessary to reproduce the empirical results presented in Section 3 is available from within the cotram R add‐on package http://CRAN.R‐project.org/package=cotram (Siegfried & Hothorn, 2019). The deer–vehicle collisions data re‐analysed in Section 3.1 were retrieved from Hothorn (2015; https://doi.org/10.5281/zenodo.17179).