2013 Nat Comm SI

Here we propose a new method to compare the modular structure of a pair of node-aligned networks. The majority of current methods, such as normalized mutual information, compare two node partitions derived from a community detection algorithm yet ignore the respective underlying network topologies. Addressing this gap, our method deploys a community detection quality function to assess the fit of each node partition with respect to the other network’s connectivity structure. Specifically, for two networks A and B, we project the node partition of B onto the connectivity structure of A. By evaluating the fit of B’s partition relative to A’s own partition on network A (using a standard quality function), we quantify how well network A describes the modular structure of B. Repeating this in the other direction, we obtain a two-dimensional distance measure, the bi-directional (BiDir) distance. The advantages of our methodology are three-fold. First, it is adaptable to a wide class of co...

Supplementary Information

Network modularity reveals critical scales for connectivity in ecology and evolution

Robert J. Fletcher, Jr.1, Andre Revell1, Brian E. Reichert1, Wiley M. Kitchens2, Jeremy D. Dixon3, and James D. Austin1

1 Department of Wildlife Ecology and Conservation, PO Box 110430, 110 Newins-Ziegler Hall, University of Florida, Gainesville, FL 32611-0430.
2 U.S. Geological Survey, Florida Cooperative Fish and Wildlife Research Unit, University of Florida, Gainesville, FL 32611-0430
3 Crocodile Lake National Wildlife Refuge, 10750 County Rd 905, Key Largo, FL 33037

Supplementary Figures

[Figure S1, image not reproduced. Panel titles: "14 patches, 2 modules" and "28 patches, 4 modules". Y-axes: Modularity, Q (a, b) and z score (c, d). X-axis: Proportion of wi within module. Legend: links, weights, modules, observed (a, b); links, weights, modules, glm (c, d).]

Supplementary Figure S1 | Ability of a variety of significance tests to identify known modules of different strengths. (a, b) Modularity (+ SD) of the observed simulated network and modularity of randomized networks based on randomizing links, weights, or module labels as a function of module strength, described as the proportion of movements within modules. (c, d) Z scores (+ SD) for link randomization, weight randomization, module (membership) randomization, and a Poisson GLM. Dashed line indicates a z-score with P = 0.05. Arrow denotes random networks for each network size, where E(Aij) is equal among all links.
[Figure S2, image not reproduced. Four panels (a-d). Y-axis: Metapopulation capacity. X-axes: within-module strength of patch removed; participation coefficient of patch removed. Panel annotations in extraction order: rk = -0.23, rk = -0.50, rk = -0.26, rk = -0.71.]

Supplementary Figure S2 | Metapopulation capacity of cactus bugs as a function of the within-module strength and participation coefficient of removed patches. (a, b) Module identification does not account for distance effects (Qng, see Fig. 1a). (c, d) Module identification accounts for distance effects (Qspa, see Fig. 1c). Dashed line shows metapopulation capacity in the absence of patch removal; rk = Kendall's tau rank correlation. A model based on modularity metrics from Qspa fit the data better than one based on metrics from Qng (Akaike's information criterion, 151.1 versus 165.5). For both models, participation coefficient (Qng: β = -6.5 ± 2.1 SE, P = 0.004; Qspa: β = -9.9 ± 1.7, P < 0.001) better predicted metapopulation capacity than did within-module strength (Qng: β = -2.0.99 ± 0.7, P = 0.006; Qspa: β = -2.7 ± 0.6, P = 0.653).

[Figure S3, image not reproduced. Two panels (a, b). Y-axes: Modularity, Qspa and Normalized mutual information, NMI. X-axis: Bin width (km).]

Supplementary Figure S3 | Sensitivity of modularity to bin size. Shown are changes in modularity values, Qspa, with increasing bin size distances for (a) bullfrog and (b) black bear, as well as the normalized mutual information (NMI) between each bin size and all other bin sizes considered (the relative similarity of modules identified). Dotted line highlights the bin size with maximum Qspa, which was used for assessments shown in the main text.
[Figure S4, image not reproduced. Four columns of panels (a-d) against four y-axes: Strength, Betweenness, Metapopulation, Habitat area. X-axes: Within-module strength (a, c) and Participation coefficient (b, d). Panel annotations in extraction order, four per line:
rk = 0.56, 0.40, 0.60, 0.69
rk = 0.40, 0.40, 0.30, 0.47
rk = 0.10, 0.28, 0.08, 0.39
rk = 0.05, 0.24, 0.03, 0.32]

Supplementary Figure S4 | Comparison of within-module strength and participation coefficient with four common connectivity measures in cactus bugs. (a, b) Module identification does not account for distance effects (Qng, see Fig. 1a). (c, d) Module identification accounts for distance effects (Qspa, see Fig. 1c). Patch strength, betweenness, metapopulation connectivity, and habitat area were centered and scaled (mean = 0, var = 1). rk = Kendall's tau rank correlation.

[Figure S5, image not reproduced. Four columns of panels (a-d) against four y-axes: Strength, Betweenness, Metapopulation, Habitat area. X-axes: Within-module strength (a, c) and Participation coefficient (b, d). Panel annotations in extraction order, four per line:
rk = 0.78, 0.43, 0.83, 0.24
rk = 0.15, 0.32, 0.23, -0.14
rk = -0.01, -0.15, -0.04, -0.60
rk = -0.17, -0.45, -0.25, -0.39]

Supplementary Figure S5 | Comparison of within-module strength and participation coefficient with four common connectivity measures in snail kites. (a, b) Module identification does not account for distance effects (Qng, see Fig. 1a). (c, d) Module identification accounts for distance effects (Qspa, see Fig. 1c). Patch strength, betweenness, metapopulation connectivity, and habitat area were centered and scaled (mean = 0, var = 1). rk = Kendall's tau rank correlation.
[Figure S6, image not reproduced. Four columns of panels (a-d) against four y-axes: Strength, Dst, h, ChD. X-axes: Within-module strength (a, c) and Participation coefficient (b, d). Panel annotations in extraction order, four per line:
rk = 0.17, -0.03, 0.30, -0.17
rk = -0.11, 0.05, 0.00, -0.14
rk = 0.16, 0.01, 0.18, 0.06
rk = -0.10, 0.08, 0.05, -0.12]

Supplementary Figure S6 | Comparison of within-module strength and participation coefficient with four common genetic connectivity measures in bullfrogs. (a, b) Module identification does not account for distance effects (Qng, see Fig. 1a). (c, d) Module identification accounts for distance effects (Qspa, see Fig. 1c). Patch strength was centered and scaled (mean = 0, var = 1). rk = Kendall's tau rank correlation.

[Figure S7, image not reproduced. Four columns of panels (a-d) against four y-axes: Strength, Dst, h, ChD. X-axes: Within-module strength (a, c) and Participation coefficient (b, d). Panel annotations in extraction order, four per line:
rk = 0.61, -0.09, 0.43, 0.39
rk = 0.59, 0.17, 0.62, 0.31
rk = 0.16, -0.37, -0.07, 0.16
rk = 0.61, 0.03, 0.43, 0.28]

Supplementary Figure S7 | Comparison of within-module strength and participation coefficient with four common genetic connectivity measures in black bears. (a, b) Module identification does not account for distance effects (Qng, see Fig. 1a). (c, d) Module identification accounts for distance effects (Qspa, see Fig. 1c). Patch strength was centered and scaled (mean = 0, var = 1). rk = Kendall's tau rank correlation.

Supplementary Tables

Supplementary Table S1. Summary of generalized linear models testing for differences in movement or genetic covariance within versus between identified modules.
Modularity  Network     Z-value/t-value*  P-value   wwithin/wbetween
Qng         Cactus bug       3.88          0.0001        2.89
Qng         Snail kite       8.07         <0.0001        3.91
Qng         Bullfrog         2.97          0.0045        2.33
Qng         Black bear       1.12          0.2814        1.94
Qspa        Cactus bug       2.54          0.019         1.50
Qspa        Snail kite      -3.50          0.0004        0.61
Qspa        Bullfrog         2.62          0.012         1.88
Qspa        Black bear       0.66          0.517         1.19

*Z-value for Poisson GLMs, t-value for zero-adjusted gamma GLMs. Both types of models were fit with the gamlss package in R 2.15.

Supplementary Methods

Modularity optimization. Several modularity optimization techniques have been proposed. It is frequently argued that optimization based on simulated annealing65 is preferred for small networks (e.g., <200 patches), spectral methods66 are preferred for moderate-sized networks (~200-1000 patches), and so-called 'greedy' techniques are preferred for very large networks (>1000 patches)67. Simulated annealing has been consistently shown to be a powerful approach for identifying modules under a variety of scenarios65,68. We used a simulated annealing algorithm for optimizing the modularity function based on the general approach of Guimera and Amaral65. Note that our code is slightly different from (and more general than) the NETCARTO program of Guimera and Amaral65, because it can consider a variety of null models of relevance to ecology and evolution (see main text), which facilitates comparisons among networks and null models. Our simulated annealing algorithm was written in R 2.15 (Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government).

Simulated annealing is useful for optimizing the modularity function because of the large number of possible module combinations. Simulated annealing uses an iterative process relying on a unique feature T ("temperature") that decreases by a factor of c ("cooling factor") after each iteration.
This feature allows the algorithm to explore areas of high modularity without getting stuck in local maxima: at an initial high T, the algorithm can explore multiple local maxima in order to eventually find the absolute maximum. Each iteration of the algorithm is divided into two main sections, which are themselves iterated fN^2 times and fN times, respectively, where N is the number of nodes (patches) in the network and we set f = 1 (ref. 65). In the first section, a random patch is chosen and moved to another module. The algorithm either accepts or rejects this update based on the new modularity value and T (ref. 65). In the second section, the algorithm randomly chooses to merge two modules or to split one module into two modules. Within each iteration of the second section, only one update is performed, on one randomly selected module (in the case of splitting) or two randomly selected modules (in the case of merging). For all analyses, we started the algorithm with T = 1 and c = 0.95, and ran the algorithm until T = 0.00003.

We initially tested the performance of the algorithm on random networks with known modules. These random networks spanned a range of sizes similar to that considered in our paper, and the algorithm reliably found the known modules (see below). Because simulated annealing is a stochastic optimization procedure, we ran the optimization 100 times and used the maximum Q from these runs. The modules identified were consistent among these runs.

To compare among module partitions, we used a variant of normalized mutual information68, NMI, calculated with the clue package in R. Normalized mutual information is an index based on entropy and ranges from 0 to 1. Comparing two identical module assignments (partitions) will result in NMI = 1, whereas two module assignments that show no overlap of mutual assignments will result in NMI = 0.

Significance tests.
While the modularity, Q, is zero for situations of no identified modularity (i.e., one module), and is bounded between -1 and 1, random networks can often generate non-zero Q values69. However, there has been limited consideration of appropriate ways to assess the significance of observed modularity. Here, we describe four different ways to understand if Q and the modules identified are statistically meaningful. We then use simulations to determine the ability of such approaches to identify significant modularity when modularity is known.

Randomization tests are often used to assess significance of network patterns in biology70, and are the most common approach for assessing the significance of observed modularity71. We consider three potential randomization tests: randomizing links of the observed network71, randomizing the weights of the observed network72, and randomizing the module assignments70. The general question that these randomization tests address is: what is the range of Q values that would be expected from a random network of the same dimension and movement? Each test constrains the randomization process in a different way. Randomizing links is most commonly done by constraining the degree distribution of the randomized networks to be the same as the observed distribution71. This has the effect that randomized networks have patches with the same frequency of movements among patches. For randomizing links, we used a local rewiring algorithm73. For weighted networks, like those considered here, one could alternatively randomize the observed weights while leaving the overall topology intact72. Finally, another approach to randomization tests is to shuffle observed module assignments, leaving the network intact. This approach is similar in concept to the use of Analysis of Similarities (ANOSIM) in community ecology.
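As a rough illustration of two of the randomization tests described above, the following Python sketch computes a weighted Newman-Girvan modularity for a fixed partition, shuffles module memberships to build a null distribution of Q, and reports a z score. It is a minimal sketch under stated assumptions, not the authors' R code: the function names, the toy 8-patch network, and the use of the Newman-Girvan null model are illustrative choices, and a full weight-randomization test would re-optimize the partition on every randomized network rather than only permuting weights.

```python
import random
import statistics


def modularity(A, labels):
    """Weighted Newman-Girvan Q for a symmetric matrix A and a fixed partition."""
    n = len(A)
    strength = [sum(row) for row in A]
    total = sum(strength)  # equals 2m for a symmetric matrix
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - strength[i] * strength[j] / total
    return q / total


def shuffle_weights(A, rng):
    """Permute observed weights among existing links, leaving topology intact."""
    n = len(A)
    links = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i][j] > 0]
    weights = [A[i][j] for i, j in links]
    rng.shuffle(weights)
    B = [[0.0] * n for _ in range(n)]
    for (i, j), w in zip(links, weights):
        B[i][j] = B[j][i] = w
    return B


def membership_null(A, labels, n_rand, rng):
    """Q values obtained by shuffling module labels on an unchanged network."""
    null_q = []
    for _ in range(n_rand):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null_q.append(modularity(A, shuffled))
    return null_q


# Hypothetical toy network: two 4-patch modules (weight 5 within each module)
# joined by a single between-module link of weight 1.
n = 8
labels = [0, 0, 0, 0, 1, 1, 1, 1]
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i != j and labels[i] == labels[j]:
            A[i][j] = 5.0
A[3][4] = A[4][3] = 1.0

rng = random.Random(1)
q_obs = modularity(A, labels)
null_q = membership_null(A, labels, 100, rng)
z_membership = (q_obs - statistics.mean(null_q)) / statistics.stdev(null_q)
print(round(q_obs, 4), round(z_membership, 2))
```

Because the membership shuffle keeps the network fixed, it needs no re-optimization, which is why it is the test run end to end here.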
As an alternative to randomization tests, observed modules can be tested for significance by comparing the amount of movement within modules to the amount of movement between modules74. This approach addresses the question: is there significant variation in movement within versus among modules? While the simulated annealing algorithm finds the best partition, it does not guarantee that this partition strongly separates within-module from between-module movements. As such, this test allows for understanding if the observed modules are biologically meaningful, in terms of partitioning variation in movements across networks.

Simulations assessing significance tests. We assessed the ability of these approaches to identify significant modularity using simulations based on networks with properties similar to those considered here. We simulated networks under two scenarios. In the first scenario, we considered a network containing 14 patches and 2 modules (7 patches/module). In the second scenario, we considered a network with 28 patches and 4 modules (7 patches/module). Based on the observed movement in snail kites, where average patch strength, wi, was 7.2 movements, we simulated a gradient of modularity strength in which we varied the relative amount (based on a Poisson distribution) of within versus between module movements, ranging from 90% of movements being within modules to 50% of movements being within modules30,65. For each scenario and modularity strength, we simulated 10 replicate networks. For each network, we assessed statistical significance of modularity using link randomization, weight randomization, membership randomization, and a test of within versus between module movement using a generalized linear model (GLM) with a log link function and a Poisson error distribution. For randomization tests, we used 100 replicate randomizations. We report z scores for each test as a metric of statistical significance32.
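The simulation design above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' procedure: how expected movements were allocated among links is not stated in the text, so the sketch assumes each patch's expected strength (7.2) is split evenly among its within-module partners (a fraction `prop_within`) and its between-module partners (the remainder). The function names are hypothetical. For a Poisson GLM with a log link and a single within/between indicator, the Wald z for that indicator has the closed form log(mean_within / mean_between) / sqrt(1/Y_within + 1/Y_between), where Y denotes the summed counts in each group, so no model-fitting library is needed here.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson variate with mean lam (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_network(n_patches, n_modules, mean_strength, prop_within, rng):
    """Symmetric movement matrix with equal-sized planted modules (assumed design)."""
    size = n_patches // n_modules
    labels = [i // size for i in range(n_patches)]
    lam_within = prop_within * mean_strength / (size - 1)
    lam_between = (1.0 - prop_within) * mean_strength / (n_patches - size)
    A = [[0] * n_patches for _ in range(n_patches)]
    for i in range(n_patches):
        for j in range(i + 1, n_patches):
            lam = lam_within if labels[i] == labels[j] else lam_between
            A[i][j] = A[j][i] = rpois(lam, rng)
    return A, labels


def poisson_glm_z(within, between):
    """Wald z for the within-vs-between effect in a two-group Poisson GLM."""
    y1, y0 = sum(within), sum(between)
    if y1 == 0 or y0 == 0:
        raise ValueError("z is undefined when a group has no movements")
    log_ratio = math.log((y1 / len(within)) / (y0 / len(between)))
    return log_ratio / math.sqrt(1.0 / y1 + 1.0 / y0)


def split_links(A, labels):
    """Collect link weights (upper triangle) by within/between module status."""
    within, between = [], []
    for i in range(len(A)):
        for j in range(i + 1, len(A)):
            (within if labels[i] == labels[j] else between).append(A[i][j])
    return within, between


rng = random.Random(42)
A, labels = simulate_network(14, 2, 7.2, 0.8, rng)
within, between = split_links(A, labels)
z_glm = poisson_glm_z(within, between)
print(round(z_glm, 2))
```

Repeating the last four lines over a grid of `prop_within` values (0.9 down to 0.5) and 10 replicates per value would mirror the gradient of modularity strength described in the text.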
Overall, these statistical methods varied in their ability to identify significant modularity (Supplementary Fig. S1). We found that weight randomization and membership randomization were poor tests for assessing modularity significance: weight randomization consistently underestimated the significance of modularity, whereas membership randomization tests concluded modularity was always significant, even for entirely random networks (where E(Aij) was the same within and between modules). For small networks, link rewiring tests only identified significant modularity in the most extreme situations, where 90% of movement was contained within modules, whereas for larger networks this test concluded significant modularity when > 60% of movement was within modules. Note that results from randomizing links and weights also illustrate that random networks can have values of Q > 0. Using a simple GLM to test for within versus between module movement was the most powerful test: it concluded no significant modularity on random networks, but significant modularity for small networks with > 60% of movement within modules and for large networks with > 50% of movement within modules. Based on these simulations, we focus on the use of GLMs to assess significance of modularity in our empirical datasets.

Modularity metrics relative to common connectivity metrics. Several connectivity metrics have been proposed in ecology and evolution75-77. In the main text, we compare connectivity metrics for patch importance based on within-module strength and the 'participation coefficient' (equations 5-6 of main text). To do so, we contrasted these module-derived metrics with patch strength ($w_i = \sum_j A_{ij}$). We used patch strength, wi, because it is the metric most directly comparable to within-module importance and participation coefficient.
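To make the comparison concrete, the sketch below computes patch strength exactly as defined above (the row sum of A) alongside a participation coefficient. Equations 5-6 of the main text are not reproduced in this supplement, so the participation coefficient here is an assumption: it uses the commonly cited form from Guimera and Amaral, 1 minus the sum over modules of the squared fraction of a patch's strength that falls in each module. The function names and the 4-patch example are hypothetical.

```python
def patch_strength(A):
    """w_i = sum_j A_ij for each patch i."""
    return [sum(row) for row in A]


def participation_coefficient(A, labels):
    """Assumed form: P_i = 1 - sum_s (w_is / w_i)^2, with w_is the strength of i into module s."""
    modules = sorted(set(labels))
    result = []
    for i, row in enumerate(A):
        w_i = sum(row)
        if w_i == 0:
            result.append(0.0)  # isolated patch: no links to distribute among modules
            continue
        squares = 0.0
        for s in modules:
            w_is = sum(w for j, w in enumerate(row) if labels[j] == s)
            squares += (w_is / w_i) ** 2
        result.append(1.0 - squares)
    return result


# Hypothetical 4-patch network with two modules.
A = [
    [0, 3, 1, 0],
    [3, 0, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 2, 0],
]
labels = [0, 0, 1, 1]

strength = patch_strength(A)
participation = participation_coefficient(A, labels)
print(strength)
print([round(p, 3) for p in participation])
```

In this toy case, patches 1 and 3 (zero-based) link only within their own module and so receive a participation coefficient of 0, while patches 0 and 2 carry the single between-module link and score higher.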
Here we also contrast within-module importance and participation coefficient with other patch-based metrics for connectivity in ecology and evolution. For mark-recapture data of individual movements, we focus on three metrics: betweenness centrality77, habitat area in the surrounding landscape78,79, and a common metapopulation metric for connectivity79,80. We focus on betweenness centrality (i.e., the number of shortest paths going through a focal patch) because it is thought to be a highly relevant metric for identifying stepping stones or key bottlenecks in landscapes51, such that we expected betweenness centrality to be highly correlated with between-module connectivity (participation coefficients). Indeed, the initial approach to optimizing the modularity algorithm used a heuristic based on betweenness centrality30. Habitat area (where the buffer radius, r, equals 1/α; ref. 79) is a common metric for landscape investigations that does not require information on movement. Finally, because metapopulation theory has made major advances in our knowledge of connectivity and its effects on metapopulations81, we consider the commonly used metric:

$$mp_i = \sum_j \exp(-\alpha d_{ij})\, S_j I_j \qquad \text{(S1)}$$

where Ij is an indicator of species presence in patch j.

For genetic data, we focused on comparisons with genetic metrics that assess patch diversity, including the relative differentiation of individual patches (DST) and gene diversity (h)82. We decompose the contribution of each patch to the total h into components relating to the contribution due to divergence (ChD) following Petit et al.83. These measures quantify the effect of a specific patch (k) relative to all others (n-1), excluding k.

For each of these metrics, we contrast patch prioritization for connectivity with within-module strength and participation coefficients calculated from Qng and Qspa. In the main text, we illustrate patch connectivity roles for within-module and between-module connectivity (Fig. 2a-c) and changes in rank importance (Fig.
2d-f) in comparison to patch strength. Here, we also show rank correlations of within-module strength and participation coefficient with these other commonly used metrics for connectivity.

For mark-recapture data, within-module strength and participation coefficient were generally most correlated with patch strength, which was expected given that this metric shares the most similarity with the module-based connectivity metrics, and least correlated with metapopulation connectivity, mpi, and habitat area (Supplementary Figures S4-S5). Beyond patch strength correlations, all other correlations were < 0.47, with some correlations being strongly negative for snail kites. For example, some of the most connected patches according to the metapopulation connectivity metric exhibited the lowest participation coefficients in the snail kite network. For genetic data, patch diversity correlations were generally weaker than those for the connectivity metrics used with the mark-recapture data. Correlations were weaker for the bullfrog than for the black bear (Supplementary Figures S6-S7), and were generally weaker in reference to the participation coefficient than within-module strength.

Supplementary References

65 Guimera, R. & Amaral, L. A. N. Cartography of complex networks: modules and universal roles. Journal of Statistical Mechanics-Theory and Experiment (2005).
66 Newman, M. E. J. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103, 8577-8582 (2006).
67 Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics-Theory and Experiment (2008). doi:10.1088/1742-5468/2008/10/P10008
68 Didham, R. K. & Ewers, R. M. Predicting the impacts of edge effects in fragmented habitats: Laurance and Yensen's core area model revisited. Biol. Conserv. 155, 104-110 (2012).
69 Guimera, R., Sales-Pardo, M., & Amaral, L. A. N. Modularity from fluctuations in random graphs and complex networks. Physical Review E 70(2), 1-4 (2004).
70 Croft, D. P., Madden, J. R., Franks, D. W., & James, R. Hypothesis testing in animal social networks. Trends Ecol. Evol. 26, 502-507 (2011).
71 Olesen, J. M., Bascompte, J., Dupont, Y. L., & Jordano, P. The modularity of pollination networks. Proc. Natl. Acad. Sci. USA 104, 19891-19896 (2007).
72 Barrat, A., Barthelemy, M., Pastor-Satorras, R., & Vespignani, A. The architecture of complex weighted networks. Proc. Natl. Acad. Sci. USA 101, 3747-3752 (2004).
73 Kimura, M. & Weiss, G. H. Stepping stone model of population structure and decrease of genetic correlation with distance. Genetics 49, 561-576 (1964).
74 Wang, Z. & Zhang, J. Z. In search of the biological significance of modular structures in protein networks. PLoS Computational Biology 3, 1011-1021 (2007).
75 Calabrese, J. M. & Fagan, W. F. A comparison-shopper's guide to connectivity metrics. Front Ecol Environ 2, 529-536 (2004).
76 Minor, E. S. & Urban, D. L. A graph-theory framework for evaluating landscape connectivity and conservation planning. Conserv. Biol. 22, 297-307 (2008).
77 Estrada, E. & Bodin, O. Using network centrality measures to manage landscape connectivity. Ecol. Appl. 18, 1810-1825 (2008).
78 Fahrig, L. Effects of habitat fragmentation on biodiversity. Ann Rev Ecol Evol Syst 34, 487-515 (2003).
79 Moilanen, A. & Nieminen, M. Simple connectivity measures in spatial ecology. Ecology 83, 1131-1145 (2002).
80 Hanski, I. A practical model of metapopulation dynamics. J. Anim. Ecol. 63, 151-162 (1994).
81 Hanski, I. Metapopulation dynamics. Nature 396, 41-49 (1998).
82 Gahl, M. K., Calhoun, A. J. K., & Graves, R. Facultative use of seasonal pools by American bullfrogs (Rana catesbeiana). Wetlands 29, 697-703 (2009).
83 Dixon, J. D. et al. Effectiveness of a regional corridor in connecting two Florida black bear populations. Conserv. Biol. 20, 155-162 (2006).
