BiMat: a MATLAB package to facilitate the analysis of bipartite networks

Bipartite networks are ubiquitous in community ecology, including examples of facilitative interactions, such as plant‐pollinator networks, and antagonistic interactions, such as virus–host infection networks. Statistical network analysis is increasingly used to identify emergent, nonrandom patterns of interaction and the effect of interaction patterns on ecological and evolutionary dynamics. Two recurring patterns are that of modularity and nestedness. Modularity is a feature of networks in which there are densely interacting subgroups. Nestedness is a feature of networks in which the interactions form ordered subsets. Here we describe BiMat, an open‐source MATLAB package for the study of the structure of bipartite ecological networks. Unlike alternative tools, BiMat enables both multiscale analysis of the structure of a bipartite ecological network – spanning global (i.e. entire network) to local (i.e. module‐level) scales – and meta‐analyses of many bipartite networks simultaneously. In common with other tools, BiMat calculates the degree to which a bipartite network is modular and/or nested, the statistical significance of these patterns, and enables visualization of latent structures in the network. BiMat is available as an open‐source MATLAB package, with a quick‐start guide and worked examples at: http://bimat.github.io/.


Introduction
Interactions amongst distinct sets of individuals in an ecological community can be represented as bipartite networks. Prominent examples of bipartite networks include plant-pollinator networks (Bascompte et al. 2003; Dunne 2006; Olesen et al. 2007; Bastolla et al. 2009) and phage-bacteria infection networks (Flores et al. 2011; Poisot et al. 2011; Poisot, Lounnas & Hochberg 2013; Weitz et al. 2013). The nodes in the network are members of two distinct sets: plants and pollinators in the first example and phage and bacteria in the second example. An edge connecting two nodes denotes that an interaction takes place and the absence of an edge denotes that an interaction does not take place. In a bipartite network, interactions such as pollination or infection are considered only between nodes of distinct sets.
Analysis of plant-pollinator and host-parasite systems has identified emergent patterns of modularity and nestedness. A modular network is one in which subsets of nodes preferentially connect to each other, rather than to other nodes (Olesen et al. 2007). For bipartite networks, we assume that the nodes within a module must necessarily connect to nodes from the other set (Barber 2007) [but see (Guimerà, Sales-Pardo & Nunes Amaral 2007; Larremore, Clauset & Jacobs 2014)]. A nested network is one in which the interactions between nodes form (partially) ordered subsets of each other (Bascompte et al. 2003; Flores et al. 2011). For bipartite networks, this ordering applies to both sets (e.g. plants and pollinators or phage and bacteria). Understanding the structure of ecological networks can provide insights into the generation and maintenance of diversity (Bascompte et al. 2003; Bastolla et al. 2009; Joppa et al. 2009; Stouffer & Bascompte 2011). Similarly, characterizing an interaction network may also help predict which species are most likely to go extinct (Harfoot et al. 2014).
A number of tools have been developed to analyse the structure of complex networks. These tools enable researchers to interactively query and analyse a network [e.g. Gephi (Bastian, Heymann & Jacomy 2009), Cytoscape (Shannon et al. 2003)] and to embed network analysis into workflows via a library framework [e.g. NetworkX (Hagberg, Swart & Chult 2008), iGraph (Csardi & Nepusz 2006)]. Available tools focus largely on the analysis of unipartite networks, that is, bipartite networks are represented as a special case of unipartite networks. In contrast, bipartite is a general package for the analysis of bipartite networks (Dormann et al. 2009). bipartite supports the analysis of nestedness and modularity within an R language framework (Dormann, Gruber & Fründ 2008; Dormann & Strauss 2014). Additional tools focusing on bipartite networks have been developed specifically for the analysis of one kind of network pattern, for example, nestedness estimation as in ANINHADO (Guimarães & Guimarães 2006), WINE (Galeano, Pastor & Iriondo 2009), and recently FALCON (Beckett, Boulton & Williams 2014), or modularity estimation as in MODULAR (Marquitti et al. 2014).
Here we present BiMat, a MATLAB library for the analysis and visualization of bipartite networks, that introduces new features not available in other bipartite network analysis tools and replicates some of the features of the bipartite R package in a different language and framework. Consistent with standards developed in established packages, BiMat enables the analysis of bipartite network structure, including modularity and nestedness, and the significance of identified patterns. In addition, BiMat includes meta-analysis tools for the analysis of many related networks simultaneously. Crucially, the structure of a network may be different at different scales, as was recently demonstrated in the case of phage-bacteria infection networks that were found to be modular at the entire network scale, yet the identified modules had internal nested structure. In response, we have included a suite of methods to analyse the multiscale structure of bipartite networks. The integration of all of these features into a single package is not available in other tools. Finally, BiMat provides a range of visualization tools for exploring bipartite network structure in either matrix or graph layouts.
We recognize that there is not a single language used and preferred by researchers interested in bipartite networks. The use of MATLAB may represent a feature for some and an obstacle for others. In order to facilitate the use of the tool and the replication of its methods in other languages, we have released the BiMat package as an open-source library, with a quick-start guide and worked examples at http://bimat.github.io/. We have also released a port of BiMat for use in the open-access GNU Octave language, on the same repository.

Terminology
A bipartite network, B = (R,C,E), denotes two nonoverlapping sets of nodes u ∈ R (or row nodes) and v ∈ C (or column nodes). In a phage-bacteria infection network, one set of nodes corresponds to host bacteria and the other to viruses. In the case of phages and hosts, a link e = (u,v) ∈ E denotes infection. In the pollinator example, there will be a link between an insect/bird and a plant if the former can pollinate the latter. Links between nodes of the same kind are forbidden.
The adjacency matrix B of size m × n is an alternative definition of a bipartite network, where m = |R| and n = |C| are the sizes of the sets R and C, respectively. Network links and entries of the adjacency matrix only indicate the presence (1) or absence (0) of pairwise interactions, that is, we do not consider weighted bipartite networks. The total number of links in the

NODF
The NODF nestedness metric is based on the extent to which a network exhibits decreasing fill and paired overlap. NODF measures nestedness across rows by assigning a value $M^{\mathrm{row}}_{ij}$ to each pair i, j of rows in the interaction matrix (Almeida-Neto et al. 2008):

\[ M^{\mathrm{row}}_{ij} = \begin{cases} 0 & \text{if } k_i = k_j \\ n_{ij} / \min(k_i, k_j) & \text{otherwise} \end{cases} \tag{1} \]

where $k_i$ is the number of ones in row i, $k_j$ is the number of ones in row j, and $n_{ij}$ is the number of shared interactions between rows i and j (so-called paired overlap). Formally, $n_{ij}$ is equal to the number of columns in which both B(i,:) and B(j,:) equal 1. Notice that positive contributions to NODF require pairs of rows (or, for the column term, pairs of columns) to exhibit decreasing fill. The term 'fill' denotes the degree of a node, and hence, decreasing fill is satisfied when $k_i > k_j$, such that the degree of node j is less than that of node i. A similar term $M^{\mathrm{col}}_{ij}$ is used to compute column contributions. The total nestedness is the sum of column and row contributions:

\[ N_{\mathrm{NODF}} = \frac{\sum_{i<j} M^{\mathrm{row}}_{ij} + \sum_{i<j} M^{\mathrm{col}}_{ij}}{m(m-1)/2 + n(n-1)/2} \]

The NODF metric normalizes for matrix size and thus allows matrices of different sizes to be compared. The nestedness range is $0 \le N_{\mathrm{NODF}} \le 1$, where 0 indicates the lack of any decreasing fill and paired overlap (e.g. a block-like matrix or checkerboard matrix) and 1 corresponds to a perfectly nested structure.
In practice, MATLAB vectorization capabilities allow for the efficient calculation of eqn (1) and its column version:

\[ M^{\mathrm{row}}_{ij} = \frac{\mathbf{r}_i \cdot \mathbf{r}_j}{\min(k_i, k_j)} \left( 1 - \delta(k_i, k_j) \right) \tag{2} \]

where $\mathbf{r}_i$ is a vector that represents row i of the adjacency matrix, and $\delta(k_i, k_j) = 1$ if and only if $k_i = k_j$. Eqn (2) can be expressed in terms of matrix multiplications (while normalizing for double-counting) that exploit the implicit parallelism provided by MATLAB (see code for details).
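The row and column contributions described above can be illustrated with a short sketch. It is written in plain Python rather than MATLAB, uses explicit loops instead of the vectorized form, and is not BiMat's implementation; the function name `nodf` and the convention that empty rows or columns contribute zero are assumptions made for this illustration.

```python
def nodf(B):
    """NODF nestedness (between 0 and 1) of a binary matrix given as a list of lists."""

    def pair_sum(vectors):
        # Sum of paired-overlap terms over all unordered pairs of vectors.
        degrees = [sum(v) for v in vectors]
        total = 0.0
        for i in range(len(vectors)):
            for j in range(i + 1, len(vectors)):
                ki, kj = degrees[i], degrees[j]
                if ki == kj or min(ki, kj) == 0:
                    # Equal degrees mean no decreasing fill; empty vectors
                    # are skipped by assumption. Both contribute zero.
                    continue
                shared = sum(a * b for a, b in zip(vectors[i], vectors[j]))
                total += shared / min(ki, kj)
        return total

    m, n = len(B), len(B[0])
    columns = [list(c) for c in zip(*B)]
    n_pairs = m * (m - 1) / 2 + n * (n - 1) / 2
    return (pair_sum(B) + pair_sum(columns)) / n_pairs


perfectly_nested = [[1, 1, 1],
                    [1, 1, 0],
                    [1, 0, 0]]
identity = [[1, 0],
            [0, 1]]
print(nodf(perfectly_nested), nodf(identity))
```

In the first matrix every pair of rows and every pair of columns has full paired overlap with decreasing fill, so the value reaches the upper bound; in the second, all degrees are equal and no pair contributes.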

Temperature
The temperature metric for nestedness is estimated by sorting rows and columns such that the largest quantity of interactions falls above the isocline. An isocline is a curve that divides the interaction from the noninteraction zone of a perfectly nested matrix of the same size and connectance. In doing so, the value of T quantifies the extent to which interactions only take place in the upper left ($T \approx 0$) or are equally distributed between the upper left and the lower right ($T \approx 100$). The interactions within perfectly nested interaction matrices can be sorted to lie exclusively in the upper left portion and hence have a temperature of 0. The value of temperature depends on the size, connectance and structure of the network. We define the nestedness, $N_{\mathrm{NTC}} = (100 - T)/100$, such that $N_{\mathrm{NTC}} = 1$ when T = 0 (corresponding to perfect nestedness) and $N_{\mathrm{NTC}} = 0$ when T = 100 (corresponding to 'checkerboard' matrices). We use the phrase 'Nestedness Temperature Calculator' (NTC) to denote the value of nestedness calculated using this approach (Atmar & Patterson 1993; Rodríguez-Gironés & Santamaría 2006).

Modularity
Modularity indicates the presence of dense clusters of nodes with many overlapping interactions embedded within the network (Fortunato 2010). Clusters are considered dense when they have high internal edge density relative to the expected edge density in the null model. These dense clusters are termed modules. Identifying modules and estimating the associated modularity require a partitioning of the network. The modularity of a bipartite network is defined as:

\[ Q = \frac{1}{|E|} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( B_{ij} - \frac{k_i d_j}{|E|} \right) \delta(g_i, h_j) \tag{3} \]

where $g_i$ and $h_j$ are the module indices of nodes i (that belongs to set R) and j (that belongs to set C), $k_i$ is the degree of node i, $d_j$ is the degree of node j, |E| is the number of links in the network, and $\delta(g_i, h_j) = 1$ if $g_i = h_j$ and 0 otherwise. The expression for calculating modularity can be written in matrix notation, thus allowing an efficient MATLAB implementation:

\[ Q = \frac{1}{|E|} \operatorname{Tr}\left( R^{T} \tilde{B} T \right) \tag{4} \]

where $\tilde{B}_{ij} = B_{ij} - k_i d_j / |E|$ is the so-called modularity matrix and c is the number of modules. In this case, the g and h vectors are replaced by the m × c index matrix $R = [r_1 | r_2 | \dots | r_m]^{T}$ and the n × c index matrix $T = [t_1 | t_2 | \dots | t_n]^{T}$.
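As an illustration of the definition above, the following Python sketch evaluates the bipartite modularity of a given partition by direct summation over row-column pairs. It is not BiMat code; the function name `bipartite_modularity` and the representation of module assignments as integer lists are assumptions chosen for this example, and the network is assumed to contain at least one link.

```python
def bipartite_modularity(B, row_modules, col_modules):
    """Modularity of a partition of a binary bipartite matrix.

    B is a list of lists of 0/1 values; row_modules[i] and col_modules[j]
    are the module indices of row node i and column node j.
    """
    m, n = len(B), len(B[0])
    k = [sum(row) for row in B]                             # row degrees
    d = [sum(B[i][j] for i in range(m)) for j in range(n)]  # column degrees
    n_links = sum(k)                                        # |E|, assumed > 0
    q = 0.0
    for i in range(m):
        for j in range(n):
            if row_modules[i] == col_modules[j]:
                # Observed link minus the expectation given node degrees.
                q += B[i][j] - k[i] * d[j] / n_links
    return q / n_links


two_blocks = [[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]]
print(bipartite_modularity(two_blocks, [0, 0, 1, 1], [0, 0, 1, 1]))
print(bipartite_modularity(two_blocks, [0, 0, 0, 0], [0, 0, 0, 0]))
```

Placing each block in its own module gives a positive value, whereas lumping all nodes into a single module gives zero, because observed and expected link counts then cancel.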
The modularity algorithms within BiMat attempt to maximize eqn (4) by partitioning the network. The standard maximization approach is the Bipartite Recursively Induced Modules (BRIM) algorithm (Barber 2007). The BRIM algorithm seeks to maximize modularity by assigning nodes to modules successively, so as to maximize the per-node contribution to modularity given prior assignments. In that way, each set of nodes (e.g. matrix T) recursively induces the other set of nodes (e.g. matrix R). BRIM assigns nodes of each type to modules until a local maximum is reached. Maximizing eqn (4) is NP-hard (Miyauchi & Sukegawa 2014), and so BiMat implements two different heuristic approaches, adaptive BRIM as well as LP & BRIM:
• AdaptiveBRIM: The algorithm uses the BRIM algorithm with a heuristic for identifying the optimal network partition. The heuristic involves doubling the number of modules until Q decreases and then using a bisection method to identify the maximal value of Q between the current and previous partition.
• LP&BRIM: This method combines label propagation (LP) with the BRIM algorithm to partition the community (Liu & Murata 2010). The algorithm consists of two stages. First, during the LP phase, neighbouring nodes (i.e. those which share links) exchange their labels representing the community they belong to, with each node receiving the most common label amongst its neighbours. We iterate this process until densely connected groups of nodes reach a consensus on the most representative label, as indicated by the fact that the modularity is not increased by additional exchanges. Secondly, the BRIM algorithm refines the partitions found with label propagation.
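The alternating assignment at the core of BRIM can be sketched as follows. This is a simplified Python illustration under stated assumptions, not BiMat's implementation: the number of modules is fixed in advance, the initial column assignment is supplied by the caller, ties are broken in favour of the lowest module index, and the network has at least one link. The function name `brim` and its arguments are hypothetical. The adaptive search over the number of modules and the label propagation stage are not shown.

```python
def brim(B, col_modules, n_modules, max_iter=100):
    """Alternate row and column assignments until modularity stops increasing.

    Returns (row_modules, col_modules, modularity) for the best partition found.
    """
    m, n = len(B), len(B[0])
    k = [sum(row) for row in B]
    d = [sum(B[i][j] for i in range(m)) for j in range(n)]
    n_links = sum(k)
    # Modularity matrix: observed links minus degree-based expectation.
    Bt = [[B[i][j] - k[i] * d[j] / n_links for j in range(n)] for i in range(m)]

    def induce(own_size, other_modules, entry):
        # Assign each node to the module where its summed contribution is largest.
        assigned = []
        for a in range(own_size):
            scores = [0.0] * n_modules
            for b, module in enumerate(other_modules):
                scores[module] += entry(a, b)
            assigned.append(scores.index(max(scores)))
        return assigned

    def modularity(rows, cols):
        total = sum(Bt[i][j] for i in range(m) for j in range(n)
                    if rows[i] == cols[j])
        return total / n_links

    best = (None, list(col_modules), float("-inf"))
    cols = list(col_modules)
    for _ in range(max_iter):
        rows = induce(m, cols, lambda i, j: Bt[i][j])   # columns induce rows
        cols = induce(n, rows, lambda j, i: Bt[i][j])   # rows induce columns
        q = modularity(rows, cols)
        if q <= best[2] + 1e-12:
            break                                       # local maximum reached
        best = (rows, cols, q)
    return best


two_blocks = [[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]]
print(brim(two_blocks, col_modules=[0, 0, 0, 1], n_modules=2))
```

Because the procedure only climbs to a local maximum, the result depends on the starting assignment; this is the motivation for combining it with heuristics such as those listed above.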

Statistical significance
The values of modularity and nestedness depend on network size and the density of links (Dormann et al. 2009). Null models can be used to correct for size and density effects. We calculate the statistical significance p as the likelihood that the measured value of nestedness or modularity of the original network is greater than or equal to that measured in an ensemble of suitably chosen random networks. The choice of the random network depends on the system under observation. BiMat provides several null models used to generate ensembles of random networks that make different assumptions about the expected probability of interaction $P_{ij}$ between two species. The null models include 'equiprobable', 'average', 'columns', 'rows' and 'fixed'. Each class of null models preserves some feature of the marginal distribution of the original network, either exactly or on average (see http://bimat.github.io/ for visual illustrations of each of these null models):
• Equiprobable: Random networks have, on average, the same overall connectance as the original network.
• Average: Random networks have, on average, the same overall connectance as the original network, and each node has the same expected number of interactions as in the original.
• Columns: Random networks have, on average, the same overall connectance as the original network, and the expected number of interactions of row nodes is preserved, that is, preserving the sum across columns.
• Rows: Random networks have, on average, the same overall connectance as the original network, and the expected number of interactions of column nodes is preserved, that is, preserving the sum across rows.
• Fixed: The exact marginal distributions are preserved (i.e. row and column sums). A swapping algorithm is utilized to generate random networks.
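To make the null-model procedure concrete, the Python sketch below draws an ensemble from the equiprobable null model and compares a network statistic against it. It is an illustration only, not BiMat's code. The function names, the default ensemble size and the toy statistic are assumptions, as is the one-sided convention used here, in which the reported value is the fraction of random networks whose statistic is at least as large as the observed one.

```python
import random


def equiprobable_null(B, rng):
    """Random binary matrix of the same size; each cell is filled with
    probability equal to the connectance of B."""
    m, n = len(B), len(B[0])
    connectance = sum(map(sum, B)) / (m * n)
    return [[1 if rng.random() < connectance else 0 for _ in range(n)]
            for _ in range(m)]


def empirical_p_value(B, statistic, n_random=200, seed=0):
    """Return (observed value, fraction of null networks with statistic >= observed)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    observed = statistic(B)
    null_values = [statistic(equiprobable_null(B, rng)) for _ in range(n_random)]
    n_at_least = sum(value >= observed for value in null_values)
    return observed, n_at_least / n_random


def max_row_degree(B):
    # Toy statistic for demonstration; a nestedness or modularity
    # function would be passed in its place.
    return max(sum(row) for row in B)


network = [[1, 1, 1, 1],
           [1, 1, 0, 0],
           [1, 0, 0, 0]]
print(empirical_p_value(network, max_row_degree))
```

The other null models differ only in how the cell probabilities are set (or, for the fixed model, in using swaps that preserve row and column sums); the comparison step is the same.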

Implementation
BiMat is developed in MATLAB using an Object-Oriented Programming (OOP) paradigm. The use of OOP is meant to facilitate maintainability and extensibility of the codebase; prior experience with OOP is not required for use of BiMat. Access to BiMat functions is granted (with the exception of some static classes) using instances of the class that implements the functions. The main package class is the bipartite class, which works as a common interface to the available statistical, algorithmic, plotting, input and output classes. Because of this OOP design pattern, most of the functionality is accessible using the following syntax:
bip.class_instance_in_bip.method_name(arguments)
where bip is a bipartite instance created by the user, and class_instance_in_bip is a property of the bipartite class that holds an instance of the class providing the method method_name. The method that is called will frequently have direct read and write access to other properties inside bip. Documentation on the use of the OOP framework for estimation of network structure is available in the quick-start guide released in the BiMat package. In addition to conventional estimation of network patterns, BiMat also enables researchers to perform a meta-analysis of many networks and a multiscale analysis of a single network. The meta-analysis feature is described on the project home page and in the quick-start guide, and an example of a multiscale analysis is described next.

Multiscale analysis
The multiscale analysis within BiMat consists of the following steps:
• Module detection: The modular structure of the entire network is identified.
• Submodule structural analysis: Network analysis is performed at the intramodule scale. BiMat evaluates the extent to which modules exhibit internal structure, including modularity and/or nestedness.
• Submodule label analysis: Additional node information may be available, for example, the size of individuals of a node or the location of collection. This information is termed a node 'label'. As an optional step, BiMat can evaluate whether or not the partitioning of nodes within modules is related to properties of the node labels.
As an example, consider the phage-bacteria infection network of Moebus and Nattkemper (Moebus & Nattkemper 1981). The individual phage and bacteria in this study were isolated from different locations across the Atlantic Ocean. In a previous study, we developed a multiscale analysis of network structure in this data set, including 286 phage and 215 bacteria types. BiMat can be used to first identify the global-scale structure of a bipartite network. Assuming that the original interaction matrix is stored in the variable moebus.weight_matrix, then the left panel of Fig. 1 shows a visual representation of this data in matrix layout after the following sequence of commands: The modularity significance detection suite confirms that the network is modular. Yet, the modules seem to have nested structure. For example, there is a triangular pattern of interactions with most of the links above the temperature isocline. To measure nestedness within modules, BiMat makes use of the Internalstatistics class by treating each of the modules as an independent network: The results of the last two plot commands can be seen in the right panels of Fig. 1. The meta_statistics property is an instance of the class MetaStatistics. As such, it can use any of the methods inside MetaStatistics, including its property plotter, on the internal modules. This feature is a consequence of the use of OOP in developing BiMat.
Finally, BiMat can evaluate whether there is a relationship between node labels and module distribution. This feature is of particular use when the node information is available, for example, with respect to their study origin or other categorical (i.e. nonmetric) feature. If there is a strong relationship between label and module, then every node inside the same module will share the same (or similar) label. BiMat makes use of both Shannon's and Simpson's indices to analyse the label variation inside and between modules (Schloss & Handelsman 2005). The heterogeneity of label indices is measured within each module. Then, node labels are randomly swapped, generating an ensemble from which to compare the measured relationship.
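The label analysis described above can be sketched in Python as follows. This is an illustration under several assumptions rather than BiMat's implementation: Simpson's index is taken in its concentration form (the sum of squared label proportions), label heterogeneity is summarized as the mean Shannon index across modules, and the reported value is the fraction of label shufflings whose mean within-module heterogeneity is no larger than the observed one. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def shannon_index(labels):
    """Shannon diversity of a list of categorical labels (natural logarithm)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def simpson_index(labels):
    """Simpson concentration: probability that two draws share a label."""
    n = len(labels)
    return sum((c / n) ** 2 for c in Counter(labels).values())


def label_module_p_value(modules, labels, n_random=500, seed=0):
    """Compare mean within-module Shannon diversity against shuffled labels."""
    rng = random.Random(seed)

    def mean_within(current_labels):
        groups = {}
        for module, label in zip(modules, current_labels):
            groups.setdefault(module, []).append(label)
        return sum(shannon_index(g) for g in groups.values()) / len(groups)

    observed = mean_within(labels)
    shuffled = list(labels)
    n_as_low = 0
    for _ in range(n_random):
        rng.shuffle(shuffled)  # break any label-module association
        if mean_within(shuffled) <= observed + 1e-12:
            n_as_low += 1
    return observed, n_as_low / n_random


modules = [0, 0, 0, 1, 1, 1]
labels = ["north", "north", "north", "south", "south", "south"]
print(label_module_p_value(modules, labels))
```

When every module contains a single label, as in this toy input, the observed heterogeneity is zero, and only shufflings that happen to reproduce a perfect segregation match it.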

Conclusion and future work
We have developed the BiMat package to enable structural analysis of bipartite ecological networks within a MATLAB environment. We have also ported BiMat for use in the GNU Octave environment. BiMat enables the identification of key patterns, including modularity and nestedness, and the evaluation of their statistical significance. In that sense, BiMat is similar to existing tools, including bipartite (Dormann et al. 2009), a software library written in R. However, unlike other tools, BiMat also includes capabilities to perform a meta-analysis of the structure of multiple networks and to analyse the multiscale structure of networks. In future releases, BiMat will incorporate methods to analyse the structure of bipartite networks where interactions are weighted, rather than strictly Boolean.