“Diagnosing Multicollinearity in Exponential Random Graph Models”
Forthcoming, Sociological Methods and Research.
Code to implement measure
Scott W Duxbury
Doctoral Candidate
The Ohio State University
Department of Sociology
ABSTRACT: Exponential random graph models (ERGM) have been widely applied in the social
sciences in the past ten years. However, diagnostics for ERGM have lagged behind their use.
Collinearity-type problems can emerge without detection when fitting ERGM, yielding
inconsistent model estimates and problematizing inference from parameters. This article
provides a method to detect multicollinearity in ERGM. It outlines the problem and provides a
method to calculate the variance inflation factor from ERGM parameters. It then evaluates the
method with a Monte Carlo simulation, fitting 216,000 ERGMs and calculating the variance
inflation factors for each model. The distribution of variance inflation factors is analyzed using
multilevel regression to determine what network characteristics lend themselves to collinearitytype problems. The relationship between variance inflation factors and unstable standard errors
(a standard sign of collinearity) is also examined. The method is shown to effectively detect
multicollinearity and guidelines for interpretation are discussed.
1
Advances in the statistical modeling of social networks have provided a much-needed
analytic tool for social scientists. Exponential random graph models (ERGM) have been widely
applied to test theories of racial boundaries (Wimmer and Lewis 2010), group processes
(Goodreau, Kitts, and Morris 2009), gang violence (Papachristos 2009; Papachristos, Hureau,
and Braga 2013), and social structure in total institutions (Kreager et al., forthcoming), among
other topics. The main benefit of ERGM is the ability to simultaneously model and compare
(exogenous) actor level attributes against contextual (endogenous) network effects (Robins et al.
2007; Lusher, Robins and Koskinen 2013). The basics of using exponential models for
understanding social networks has been around for some time (e.g. Holland and Leinhardt 1981;
Frank and Strauss 1986); however, initial statistical models based on pseudo-likelihood
estimation were shown to be highly problematic (see Handcock et al. 2003a) and have only
recently become widespread through the application of Markov Chain Monte Carlo (MCMC)
maximum likelihood (MLE) methods (Geyer and Thompson 1992). The larger social science
community has embraced these methods due to novel software made available in the R
environment (Handcock et al. 2003b).
Despite widespread use of ERGM, the variety of diagnostic tools commonly used to
evaluate issues with correlation among the explanatory variables and model assessment in more
conventional statistical models have yet to be adapted for ERGM. Particularly, the issue of
multicollinearity— when at least one explanatory variables varies as a linear function of other
explanatory variables —often goes unchecked and, by extension, undetected. The lack of
multicollinearity diagnostics for statistical network models is not surprising. The standard
approach to diagnosing multicollinearity in a regression model—the variance inflation factor
(VIF)—implies independent observations. Consequently, translating this tool to network data is
2
an unclear task. Moreover, extremely high multicollinearity in an ERGM may lead to model
degeneracy and non-convergence (Lusher et al. 2013; O’Malley and Onnela 2014), and so highly
collinear models may not yield problematic estimates because they fail to yield estimates at all.
These practical constraints do not subvert the issue that multicollinearity in ERGM can
affect parameter and variance estimates, inhibiting inference from coefficients. Moreover, unique
collinearity-type problems arise when using ERGM. First, highly collinear models may have
multiple solutions to the likelihood function. In such cases, a single ERGM specification may
result in multiple models from which the best fitted model cannot be determined (Hunter et al.
2008; Salter-Townshend and Murphy 2015). Second, variance estimates calculated through
Markov Chain Monte Carlo (MCMC) maximum likelihood estimation (MLE) may be unstable
when collinearity exists. Consequently, the same model can be rerun and yield different
parameter estimates (Chandrasekhar and Jackson 2014). Third, since endogenous network
parameters all often vary in tandem with the networks’ density, an increasing number of
structural terms may increase the likelihood of collinearity-type problems (Snijders, Robins, and
Handcock 2006). This is particularly problematic since the inclusion of endogenous parameters
is often the primary appeal of using ERGM, and dyad-dependency—where global network
structures influence the probability of tie formation—may be especially prevalent in dense
networks. Finally, since endogenous covariates are often highly correlated, a collinear network
variable may affect the parameter and variance estimates of other network covariates.
How can the VIF be translated to ERGM? What VIF value would indicate problematic
levels of multicollinearity? What ERGM specifications are more or less vulnerable to
collinearity-type problems? This article addresses these questions by developing a method to
calculate the VIF in ERGM. It introduces a technique to calculate shared variance between
3
parameters using a distribution of networks simulated from the parameter estimates of an
ERGM. The method is evaluated using a Monte Carlo simulation experiment. ERGM
specifications which correlate with large VIFs are identified using multilevel regression and the
relationship between VIFs and standard error instability is investigated.
PROBLEMS DIAGNOSING MULTICOLLINEARITY IN EXPONENTIAL RANDOM
GRAPH MODELS
Formally, ERGM is defined as:
(1)
1
Pr(𝐘 = 𝒚) = ( ) exp{∑ 𝜂𝐴 𝑔𝐴 (𝐲)}
𝑘
Where A is a type of network configuration, 𝜂𝐴 is the parameter term, 𝑔𝐴 (𝐲) is the network
statistic, and k is a normalizing constant that ensures interpretable results and that the probability
model sums to 1. The model assumes that the observed network is the result of a stochastic
process and is only one realization from a population of possible networks. For (endogenous)
change statistics, coefficients represent the log-odds of an edge connecting two vertices; for
node-level covariates, coefficients indicate whether actors with a specific attribute are more or
less likely to be connected to alters in the network (Lusher, Koskinen, and Robins 2013).
The general approach to fitting an ERGM is to assume (1) that the likelihood principle is
true, (2) that the data (e.g, the observed network) is good evidence of the generative (stochastic)
distribution, and (3) that the likelihood formulation of the problem is reasonable. The researcher
then provides the structural parameters of interest and finds the parameter weights which
maximize the likelihood of the observed network. Computationally, this is done using MCMC
MLE methods (see Handcock et al. 2008) because of the intractability of k in large networks.
4
Oftentimes, pseudo-likelihood estimates are used as starting values for the MCMC algorithm,1
though this is not necessary for the model to succeed.
The MCMC procedure simulates a distribution of sufficient network statistics by making
incrementally small changes to the network in sequential steps, where each step toggles whether
a pair of actors are connected by an edge. If the new network has a higher probability than the
preceding network, or if the ratio of the probability of the old network to the probability of the
new network is relatively low, the new network is used as the starting network in the next step of
the sequence. If this is not the case, then the old network is used again (see Koskinen and
Snijders 2013 for a technical introduction). As simulated networks converge on the
parameterized sufficient statistics, formal tests (such as a Gelman-Rubin test) are used to
evaluate whether the Markov Chain has reached equilibrium. The result of the procedure is a
joint probability distribution of sufficient network statistics. Random draws are made from the
distribution to derive parameter estimates. The log-likelihood of the ERGM is evaluated
iteratively until it reaches convergence. The MCMC procedure is often repeated many times to
ensure that the solution to the log-likelihood is a global maximum rather than a local one
(Snijders 2002). The benefit of this approach is that contextual network effects, such as triadic
closure or the influence of shared partnerships, can be accurately estimated.
Some techniques exist to evaluate multicollinearity, but these are rudimentary and often
imprecise. First, the posterior bivariate correlation matrix can be used to gauge when correlations
are high. However, bivariate correlations are often very large when working with network data
(Ripley et al. 2017:59), and so determining when a correlation should warrant concern is a
difficult task. Further, multicollinearity often results from a linear combination of correlations—
a condition which is undetectable through the bivariate correlation matrix alone. This problem
5
may be particularly acute when a researcher includes multiple endogenous terms because higher
order substructures often depend on lower-order ones and most endogenous effects are somewhat
influenced by network density (Faust 2007).
Second, multicollinearity may cause problems in calculating the covariance matrix
between the parameter estimates. The Hessian matrix is often unsolvable when ERGM
parameters are highly correlated. Since the covariance matrix is retrieved by solving the Hessian
matrix, a problematic Hessian matrix may indicate multicollinearity. However, the Hessian
matrix may be unsolvable for any variety of reasons that causes the likelihood function to fail;
for example, insufficient Markov Chain length. Moreover, the Hessian matrix is not always
unsolvable when multicollinearity is present. Thus, while an unsolvable Hessian matrix or large
bivariate correlations may suggest multicollinearity, they cannot identify the problem directly;
nor can these methods pinpoint when combinations of explanatory variables are highly
correlated. These shortcomings highlight the need for a diagnostic tool for multicollinearity.
Evaluating correlations
One large problem in detecting multicollinearity in ERGM relates to translating the VIF
to the ERGM context. VIFs in GLM are based on the assumption of accurate and otherwise
independent correlations between predictor variables (Fox and Monelli 1992; Fox 2008). The
general approach is to regress the term of interest on all other explanatory variables in the model
using OLS. The resulting 𝑅 2 is used to calculate a VIF for that variable. In this approach, 𝑅 2 is
calculated as:
(2)
𝑅𝑦2
∑(𝑦𝑖 − ŷ𝑖 )2
= 1 −
∑(𝑦𝑖 − 𝑦̅)2
6
Where y is the outcome variable and ŷ is the predicted value of y. VIF is calculated as:
(3)
𝑉𝐼𝐹𝑦 =
1
1 − 𝑅𝑦2
This approach works well in GLMs because observations are independent and errors are
assumed to be uncorrelated. When predictor variables are highly correlated, variance estimates
inflate and can be subsequently detected.
However, variance estimates resulting from an OLS regression are inaccurate when
working with network data because observations are not independent and identically distributed
(iid). Utilizing this method on network data may lead to 𝑅 2 values that do not reflect shared
variance among network parameters, and therefore VIFs that do not accurately estimate
multidimensional correlations between explanatory variables. This problem is likely to increase
as contextual network effects are included in an ERGM.
The proposed alternative for ERGMs is to simply calculate 𝑅 2 from the fitted parameters
of an existing model. This approach is particularly appealing for statistical network models,
which explicitly seek to evaluate independent correlations between variables when using non-iid
data. By calculating 𝑅 2 directly from the correlation matrix between parameters of a converged
ERGM, resulting VIFs will capture shared variance between explanatory variables.
This method assumes that an ERGM has already yielded some set of parameter estimates.
After yielding estimates, a distribution of networks can be simulated from the parameters of an
ERGM using MCMC techniques (Geyer and Thompson 1992; Hunter et al. 2008). An accurate
bivariate correlation matrix R can be calculated from the sufficient statistics of these networks. R
can then be broken into a vector of correlations 𝒓𝒙𝒚 , where subscript y represents the explanatory
variable of interest and x represents all other explanatory variables in the ERGM, and a square
7
correlation matrix 𝑹𝒙𝒙 consisting of all explanatory variables in the model other than y.
Correlations with the edges term (the number of the edges in the network) should be excluded
from both 𝒓𝒙𝒚 and 𝑹𝒙𝒙 . Thus, in the fitted ERGM, 𝑅 2 for a focal predictor variable y as
explained by other predictor variables 𝑥𝑖 can be calculated as (e.g. Rencher 2002):
(4)2
𝑅𝑦2 = 𝒓𝑻𝒙𝒚 𝑹−𝟏
𝒙𝒙 𝒓𝒙𝒚
This 𝑅 2 can then be used in Eq. (3) to calculate the VIF for y.
It may seem curious to simulate a distribution of networks rather than to calculate R from
the ERGM directly. MCMC MLE explores unlikely network configurations when approaching
convergence. Consequently, the correlation matrix of a fit ERGM may reflect some error implicit
in the simulation procedure. It is therefore important that the researcher simulate a distribution of
networks from a converged ERGM and use this posterior distribution of networks to calculate R.
To summarize, the proposed approach is to:
1) Fit an ERGM.
2) Simulate a large distribution of networks from the ERGM parameters.3
3) Calculate R from the distribution of networks.
4) Calculate 𝑅𝑦2 from R (Eq. (4)), treating the explanatory variable of interest as y.
5) Calculate the VIF for y using Eq. (3).
Proof of concept
Since the properties of VIF in OLS regression are well understood, a simple way to
evaluate this method is to compare the VIFs in ERGM to the VIFs retrieved from OLS
regression. A Monte Carlo experiment compared estimates from a dyadic independent ERGM to
8
those of an OLS regression. The ERGM was binary, predicting an undirected 75-vertex network
with three node-level attributes. The same predictors for the OLS model were used as node-level
main-effects in the ERGM. The outcome (network) was a random network, and the attributes
were simulated random continuous variables. Both models were fit to the same simulated data
1,000 times to calculate three VIFs for OLS regressions and ERGMs. The ERGM correlation
matrix was calculated from a posterior distribution of 1,000 simulated networks. MCMC MLE
was used instead of pseudo-likelihood to estimate the ERGM. In the OLS regression, VIFs were
calculated using Eq. (2) and Eq. (3); the ERGM used the approach described above.
The distribution of VIFs for the OLS regression yielded a mean of 1.027, a standard
deviation of 0.028, and a range of 1.000 to 1.278. The distribution of VIFs for ERGM yielded a
mean of 1.030, a standard deviation of 0.032, and a range of 1.00 to 1.284. The two distributions
were positively correlated with a correlation coefficient of 0.876 (p < 0.001).
In a second condition, the variance-mean ratio for the randomized independent variables
was set to 0.2 to introduce relatively high levels of collinearity among the predictor variables.
The distribution of VIFs for the OLS regression in this condition yielded a mean of 3.343, a
standard deviation of 0.528 and a range of 2.069 to 5.737. The distribution of VIFs for ERGM
yielded a mean of 3.919, a standard deviation of 0.685, and a range of 2.174 to 7.648. The two
distributions were positively correlated with a correlation coefficient of 0.893 (p < 0.001). These
results indicate that the proposed method for estimating VIFs yields comparable estimates to
OLS VIFs when calculated on network outcomes.
Evaluating ERGM VIFs
9
It is worth discussing here the meaning of VIFs in the ERGM context. There currently is
not a clear estimate of variance explained for ERGMs, including the 𝑅 2 in Eq. (4). ‘Spectral’
goodness of fit measures for social networks have only been developed recently (Shore and
Lubin 2015) and they do not easily translate to estimating multicollinearity. Instead, the
calculation for VIFs proposed in Eq. (4) estimates shared variance, where increasing values
indicate higher multivariate correlations between explanatory variables. Thus, the method detects
correlations as a linear combination of variables, but it is not clear how VIFs should be
interpreted on their own or how multidimensional correlations are related to parameter or
variance estimates. The remainder of this paper is dedicated to 1) defining when VIF values are
problematic and 2) identifying ERGM specifications that are particularly vulnerable to
multicollinearity.
AN EXPERIMENTAL APPROACH
The main ambiguity in evaluating VIFs in ERGM is that MCMC estimation is especially
tolerant to highly correlated predictor variables. Consequently, the standard rule of thumb for
evaluating VIFs—where a VIF larger than 4 indicates concerning levels of collinearity, and a
VIF greater than 10 indicates problematic levels of collinearity (Chatterjee and Price 1991;
O’Brien 2007)—does not translate well to the ERGM context because network data is often
highly correlated (e.g. Snijders 2002; Ripley et al. 2017), and so VIFs will most likely be much
larger than in GLM. Moreover, the VIF in ERGM does not have a straightforward interpretation
as in OLS, where √𝑉𝐼𝐹 is the factor by which variance estimates are inflated. How should
researchers interpret VIFs in ERGM? What ERGM specifications may increase the risk of
multicollinearity?
10
To answer these questions, I conducted a Monte Carlo experiment to investigate the
properties (or distribution) of VIFs under a variety of ERGM specifications. Monte Carlo
simulation generates random samples of data from fixed specifications (Mooney 1997). The
underlying logic of this strategy is to create a distribution of VIFs that reflect a wide array of
ERGM specifications and network configurations. The distribution of VIFs resulting from these
experiments are then analyzed using multilevel regression to evaluate which ERGM
specifications may result in multicollinearity. While an uncommon approach, regression on
simulated data is useful to evaluate multidimensional correlations that may not be apparent
through descriptive statistics alone (Kim and Bearman 1997; Hanaki et al. 2007).
The simulation entailed generating the data, estimating an ERGM on the data, and then
saving the VIF for each parameter in the model. The simulated networks are all undirected and
each have 75 vertices. I focus on the simplest case of unweighted networks, where all cells in the
adjacency matrix have a value of either 0 or 1. An ERGM is fit to the network per the
specifications of that factorial condition. A posterior distribution of 1,000 networks is then
simulated from the fitted ERGM. A correlation matrix is retrieved from this distribution, and the
VIF for each explanatory variable is calculated using Eq. (4) and Eq. (3).
Manipulations
The factorial design of the experiment varies network density, network topology, ERGM
parameters, and variance-mean ratios of node-level attributes to create a distribution of ERGM
VIFs that reflect variation in realistic ERGM specifications. Each network is predicted with three
random node-level attributes specified as main-effects. The variance-mean ratio of the node-level
attributes is varied between 0.5, 1, and 5 to introduce or reduce correlation. Network density
11
varies within realistic values (0.1, 0.2, 0.3) (Steglich, Snijders, and Pearson 2010) to gain insight
to the range of potential VIFs in networks with different densities. The simulated networks’
topological features are either random, scale-free, or small-world—global network structures that
often emerge in real-world social networks (Watts and Strogatz 1998; Barabasi and Albert 1999;
Newman, Strogatz, and Watts 2001).
[Table 1 here]
The final factor in the experiment is the number and type of endogenous parameters.
When these are included, there are three different endogenous terms: degree correlation, gwesp,
and degree popularity. Degree correlation measures the (Pearson’s) correlation of degree scores
between vertices in the network (Newman 2002); degree popularity is an actors’ degree score
multiplied by its square root (Snijders et al 2010); gwesp measures localized clustering through
edgewise shared partnership—a transitivity measure of shared edges between three or more
vertices.4 Formally, the terms can be defined as:
𝐷𝑒𝑔𝑟𝑒𝑒𝑝𝑜𝑝𝑢𝑙𝑎𝑟𝑖𝑡𝑦 = ∑ 𝑐𝑖 √𝑐𝑖
where c is the degree score for actor i,
𝐷𝑒𝑔𝑟𝑒𝑒𝑐𝑜𝑟𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 =
∑𝑦𝑐 𝑦𝑐(𝑤𝑦𝑐 − 𝑞𝑦 𝑞𝑐 )
𝜎2𝑞
where y is a tie, w is the joint probability distribution of the degree scores of two actors
connected by y, and q is the normalized distribution of 𝑤𝑦𝑐 , and
𝛼
𝑛−2
𝑔𝑤𝑒𝑠𝑝 = e ∑{1 − (1 − e−𝑎 )𝑖 }𝑝𝑖
𝑖=1
where p is the number of edgewise partnerships for i and α is a decay parameter (Hunter 2007).
Although not necessary, α is typically provided by the researcher to reduce computational
12
burden. These terms were chosen for their theoretical relevance and common usage (e.g. the
effects of global and local status hierarchies) and to mirror the data generating process for certain
network topologies, namely localized clustering and preferential attachment (Snijders et al.
2010). Either none, one, two or all three terms were included, depending on the condition. One
level of this factor corresponds to no endogenous terms, three levels correspond to one
endogenous factor (gwesp, degree correlation, or degree popularity), three levels correspond to
two endogenous factors (gwesp and degree correlation, gwesp and degree popularity, or degree
correlation and degree popularity), and the final level corresponds to all three endogenous
variables.5 These combinations yield eight categories for the condition. Thus, a single ERGM
could yield three (no endogenous variables), four (one endogenous variable), five (two
endogenous variables), or six VIFs (three endogenous variables), depending on the level of the
factor.
To summarize, the Monte Carlo experiment varied the network density (3 categories),
network topology (3 categories), the variance-mean ratio of node-level variables (3 categories),
and specifications for the right-hand side of the ERGM equation (8 categories). The entire
experiment included 216 conditions (3 x 3 x 3 x 8 = 216). With 1,000 replications per condition,
the entire sample space is the estimates from 216,000 ERGMs.6 Below are summary statistics of
the estimates; I also model the differences between models as a function of the manipulated
factors in the Monte Carlo experiment. With an understanding of the properties of the VIF, the
method’s association with instability in parameter variance estimates is evaluated. Supplemental
analyses are also available for large networks with 750 vertices (online supplement). An
extended examination of the parameter space of the ERGMs is provided in the Appendix.
13
RESULTS
The distribution of ERGM VIFs
The first concern is to characterize the distribution of VIFs retrieved from the Monte
Carlo experiment to understand how the proposed method behaves across a range of ERGM
specifications. Unlike VIFs in GLM, ERGM estimation is uniquely tolerant to high correlations,
and so models with multicollinearity problems may yield absurdly high VIFs (especially those
models which have a problematic Hessian matrix). The range of VIFs in the experiment stretches
from 1.00 to 1.28x1012 . The mean VIF is 6.07x106 with a standard deviation of 2.04 x 109 .
The median VIF is 1.64.
VIFs vary by network topology. Particularly, ERGMs fit to scale-free networks tend to
generate large VIFs. The range of VIF in scale-free networks stretches between 1 and
1.28x1012 . To compare, the mean VIF for scale-free networks is 1.83 x107 with a standard
deviation of 3.54 x109 , while the VIF for random networks is 456 with a standard deviation of
9.57 x105 and an upper limit of 2.75 x107 . The mean VIF for small-world networks is 2,007
with a standard deviation of 2.75 x105 and upper limit of 1.31x108 . This suggests that
networks exhibiting preferential attachment may be at high risk for multicollinearity.
What ERGM specifications are likely to produce collinearity-type problems?
The next step is to identify ERGM specifications which may increase the likelihood of
multicollinearity. Prior research suggests that an increasing amount of endogenous terms may
increase the likelihood of encountering collinearity-type problems (Lusher et al. 2013;
Chandrasekhar and Jackson 2014). However, it is unclear whether a simple increase in the count
of endogenous parameters increases the risk of multicollinearity by complicating the MCMC
14
procedure or if certain endogenous variables simply tend to be highly correlated with other
variables at a greater rate than node-level characteristics. Multicollinearity may also decrease as
network density increases because the specific attributes or endogenous effects which correlate
with selection patterns may be easier to identify as actors connect with one another more
frequently. There may also be an interactive effect, where endogenous parameters specified on
high density networks generate multicollinearity because endogenous parameters tend to be
highly correlated with the edges of the network.
These potential trends are investigated empirically using multilevel regression on the
distribution of VIFs (VIFs nested in ERGMs). Since VIFs are heavily skewed, 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 is
taken to trend the data linearly. The mean 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 is 0.929 with a standard deviation of
0.165, and a range of 0.228 to 1.395. Figure 1 demonstrates that this transformation
approximates a normal distribution. 𝐿𝑜𝑔(𝑉𝐼𝐹)0.1 is predicted with network topology, the
variance-mean ratio of node-level attributes, network density, and the right-hand specifications
of the ERGM equation. The right-hand specification of the ERGM equation is operationalized as
seven dummy variables, one for each combination of endogenous effects. Dyad-independent
ERGMs are omitted as the reference category. The model includes a randomly varying intercept
at the second level to account for potential nesting in ERGMs.
[Figure 1 here]
Model 1 includes network density, variance mean ratios, network topology, and the
effects of endogenous parameters (Table 2). Increases in density are correlated with growing
VIFs. VIFs decrease as the variance-mean ratio of exogenous variables increases. Scale-free
networks are associated with higher VIFs than random networks, while small-world networks are
associated with lower VIFs. When only one endogenous effect is specified in the ERGM, VIFs
15
tend to be lower than when only node-level variables are specified. However, when more than
one endogenous parameter is specified in the model, VIFs increase. These trends are
demonstrated in Figure 2, which shows that VIFs tend to cluster around higher values as the
number of endogenous variables in an ERGM increases. Notably, when an ERGM is fit with
two-degree related parameters (degree popularity and degree correlation), the effect size is larger
than in other two-endogenous variable combinations. These results indicate that researchers
should use caution when specifying multiple endogenous variables in ERGM, especially multiple
degree-related parameters.
[Table 2 here]
[Figure 2 here]
Model 2 includes interactions between ERGM specifications and network density. For
the most part, specifying one endogenous variable correlates with lower VIFs. However, as more
endogenous variables are specified and network density grows, VIFs tend to increase. Figure 3
shows this graphically. VIFs tend to be low in models specifying a single endogenous parameter,
net of network density. As density increases, VIFs decrease when gwesp is the only endogenous
parameter in the model. These results suggest that specifying a single endogenous effect may
help offset multicollinearity in otherwise dyad-independent models of dense networks.
[Figure 3 here]
Still, endogenous parameters generate larger VIFs as more parameters are included in an
ERGM (Figure 3). The most consistent effect is when degree popularity and degree correlation
are simultaneously specified. This variable combination shows little fluctuation across levels of
network density. When gwesp is specified alongside either degree correlation or degree
popularity, the effect size is approximately linear across levels of network density. Importantly,
16
the effect of specifying three endogenous parameters is greater at even low levels of network
density than specifying any other combination of gwesp, degree popularity, and degree
correlation at high levels of network density. The predicted value of VIFs in an ERGM fit to a
0.1 density network with three endogenous variables is 7.17 (𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 = 1.07); at high levels
of network density (0.3) it is 22.42 (𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 = 1.12). To compare, the predicted value for an
ERGM specifying degree popularity and degree correlation on a dense network is 5.10
(𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 = 1.05). Thus, researchers should be aware of the risk of multicollinearity when
specifying endogenous effects in ERGMS of dense networks.
Since the range of VIFs varies between networks with different topologies, it is useful to
examine how the VIF behaves in networks with diverse global structures. As in the full model,
the dependent variable is transformed to 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 to trend the data linearly. For random
networks, the mean 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 is 0.925, with a standard deviation of 0.158 and a range of
0.305 to 1.329. For scale-free networks, the mean is 0.945, with a standard deviation of 0.173
and a range of 0.228 to 1.395. For small-world networks, the mean to 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 is 0.917, with
a standard deviation of 0.164 and a range of 0.299 to 1.340. This transformation approximates a
normal distribution in all topologies.
Table 3 shows the results for multilevel regression of networks with different topologies.
There is little variation between models. However, the main effect of high network density (0.3)
is only statistically significant for scale-free networks, indicating that dense scale-free networks
are at greater risk for multicollinearity than either random or small-world networks. Similarly,
the effect of gwesp on VIFs is negative for scale-free networks, but positive for random and
small-world networks. Specifying three endogenous variables in scale-free networks also has a
noticeably larger effect on VIFs compared to either random or small-world networks. Thus,
17
researchers should use caution when specifying numerous endogenous effects in networks that
exhibit preferential attachment, regardless of network density. Researchers should also be aware
of the increased risk of multicollinearity in ERGMs fit to dense scale-free networks.
[Table 3 here]
Figure 4 explores interaction effects across network topologies. When three variables are
in the model, VIFs increase alongside network density. The effect size is greatest, as well as
most consistent, for scale-free networks. Even at low levels of network density (0.1), the
predicted VIF value for ERGMs fit to scale-free networks with all three endogenous parameters
is 22.42 (𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 = 1.12). ERGMs with three endogenous variables fit to random networks
exhibit a marginal decrease in VIF size as network density increases. Effects are mostly
consistent across network topologies when there are two variables in the ERGM. The
combination of degree parameters does not fluctuate across levels of network density.
Alternatively, the interactive effect of gwesp and degree correlation is approximately linear as
network density rises. One notable difference is that the combination of degree popularity and
gwesp exhibits a steep marginal increase between network densities of 0.2 and 0.3 in scale-free
networks. Turning to models predicted with a single endogenous variable, two trends stand out.
First, at high levels of network density, node-level variables show a marginal increase in effect
size for scale-free networks. This is not surprising, since most edges are connected to only a few
actors in scale-free networks. Second, a model specifying only gwesp in small-world networks is
associated with decreasing VIFs as the network becomes denser. This is also intuitive, since
gwesp parameterizes local clustering—a defining trait of small-world networks (Watts and
Strogatz 1998).
[Figure 4 here]
18
Figure 5 explores how the effects of specific variables vary across network topologies.
Across the plots, violins for the three topologies are similarly dispersed with comparable
medians. This indicates that variable specifications tend to have similar effects on multivariate
correlation across network topologies. However, there are a few notable differences. Random
networks tend to have the widest violins across types of variables, indicating that VIFs in random
networks concentrate more densely around similar values compared to either small-world or
scale-free networks. The densest concentrations of values for gwesp in small-world networks sit
near the median, indicating that the effects of gwesp tend to be most consistent in small-world
networks compared to either scale-free or random networks. Scale-free networks have the most
variability in the concentration of VIF values, with larger concentrations of values for structural
characteristics of 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 > 1.25 than either random or scale-free networks. These values
reflect the trends revealed in Table 3: structural specifications may elicit collinearity problems in
networks formed through preferential attachment, while the gwesp statistic tends to be a
consistent estimator in networks exhibiting localized clustering.
[Figure 5 here]
A Monte Carlo simulation of ERGM VIFs gives insights to their properties. VIFs are
highly skewed in ERGM and frequently generate values far above what would be acceptable in
GLM. Multilevel regression identifies conditions which may increase or reduce VIFs in ERGM.
Scale-free networks tend to exhibit higher VIFs than random networks, while small-world
networks tend to exhibit lower VIFs. Similarly, increasing network density is associated with
larger VIFs in scale-free networks. For the most part, specifying a single endogenous variable is
associated with lower VIFs, but VIFs tend to increase as more endogenous variables are
19
specified. This is especially true when multiple degree-related parameters are specified or when
network density rises.
Interpreting ERGM VIFs
A final step is to identify when a VIF is large enough to warrant concern. This a problem
which requires creative thinking. Since VIFs can be used to detect the degree of inaccuracy in a
standard error, the association between a VIF and the size of a standard error will only be
moderate at best. One way to detect the relationship is to measure the association between VIFs
and fluctuations in an ERGM’s variance estimates. When ERGMs exhibit multicollinearity, the
same model can be rerun and yield different parameter estimates (Chandrasekar and Jackson
2014). Similarly, as an ERGM loses accuracy due to multicollinearity, parameter estimates
approach randomness, increasing the range of potential fluctuations for variance estimates
between model reruns. Thus, when an ERGM is rerun, the size of the change in the variance of
the diagonal of the variance/covariance matrix for the parameter estimates between models
should be positively correlated with large VIFs.7
To test this, I ran a four condition Monte Carlo simulation experiment. Conditions were
fixed to yield an increasing rate of multivariate correlations. As above, each condition generated
a 75-vertex random network and assigned three random continuous node-level attributes to the
vertices. The variance-mean ratio for the variables was set at 0.5; each ERGM was estimated
with three endogenous variables (gwesp, degree popularity, and degree correlation). Since the
interaction between dense networks and endogenous variables has been shown to yield higher
VIFs, network density is toggled between values of 0.1, 0.2, and 0.3 (3 categories). The final
condition fit an ERGM to a 0.1 density network with three random node-level variables and no
20
endogenous variables using MCMC MLE. The purpose of this condition was to generate
ERGMs with little-to-no multicollinearity. The simulation draws a random network, fits an
ERGM to the network, records the VIFs, parameter variance estimates (diagonal of the
covariance matrix), and whether the Hessian Matrix is unsolvable or near unsolvable.8 It then
reruns the exact same ERGM on the same network and records the same statistics. This enables
comparisons of VIFs with estimates of parameter variance between models. With 500
replications per condition, the sample space for the experiment is the summary statistics of 2,000
ERGM pairs.9
Since VIFs are highly skewed, it is necessary to transform the data to gauge the
correlation between variance estimates and VIFs. Table 4 reports the bivariate correlations
between average VIFs across both models and absolute differences in the parameter variance
estimates in log-log space.10 The tenth root of VIFs is taken to trend the data linearly.11 The
mean difference for VIFs is -0.11. Since multicollinearity in an endogenous effect can skew the
parameter estimates of other network processes (e.g. Faust 2007), correlations are also reported
for the maximum VIF in a model and maximum variance fluctuations in a model.
VIFs effectively predict instability in parameter estimates caused by multicollinearity.
VIFs are highly correlated with fluctuations in the focal variable’s variance (0.715) (p<0.001).
As VIFs increase, the size of parameter fluctuations for a focal variable also increase. Increasing
VIFs are also positively correlated with the largest variance (0.873) fluctuations in a model
(p<0.001). This reflects that, in many cases, the variable most affected by multicollinearity is not
the one causing it. Indeed, a highly collinear variable may skew variance estimates in the entire
model. Higher VIFs are also associated with higher average variance (0.853) fluctuations in a
model (p<0.001).
21
[Table 4 here]
Figure 6a examines (smoothed) trends in the distribution of VIFs. The solid line tracks
the average absolute difference in parameter variance estimates for a focal variable’s VIF. The
dashed line tracks the average size of the greatest absolute difference in parameter variance
estimates for any variable in a model as a function of the model’s highest VIF. Figure 6a shows
that differences in parameter variance estimates tend to increase at VIFs of around 20
(untransformed). An unsmoothed version of the plot (Figure 6b) shows considerable overlap
between the model and variable-level absolute differences in parameter variance estimates when
40 < VIF < 150. This suggests that the offending variable tends not to be the variable most
affected by multicollinearity in models with VIFs lower than 40 and in models with VIFs greater
than 150. The size of absolute differences in parameter variance estimates also plateau and trend
slightly downward beyond VIFs of 20. This is explored further below.
[Figure 6a here]
[Figure 6b here]
Table 5 reports the average size of absolute difference in parameter variance estimates.
When a variable’s VIF is between 20 and 150, the standard error for the variable fluctuates by an
average of 0.23. When a model has at least one VIF between 20 and 150, the average maximum
change in standard errors is 0.51. This means that, in an average model rerun where at least one
VIF is between 20 and 150, the least stable standard error in the model fluctuates by 0.51. In the
same range, the average difference in standard errors across all variables in the model is 0.09.
When VIFs are above 150, the average change in standard error for a focal variable is 0.10. For a
model, VIFs above 150 are associated with a mean maximum standard error change of 0.38 and
a mean change of 0.06 across all variables in the model. Variance estimates tend not to fluctuate
22
at any level of observation for VIFs below 20. Despite large VIFs and unstable standard errors,
the Hessian matrix was solvable in every ERGM.
[Table 5 here]
The results of Table 5 and Figures 6a and 6b imply that variance instability tends to be
equally problematic for any VIF above 20, and slightly less problematic when VIFs pass 150.
However, this is not the case. As VIFs increase, multicollinearity problematizes the MCMC
procedure. Consequently, the accuracy of parameter estimates plummets and standard errors
grow increasingly random. The result of an unstable Markov Chain and increasingly random
parameter estimates is that the absolute difference in parameter variance estimates will not
always be large in models exhibiting severe multicollinearity. They may be quite small by pure
chance. Thus, the decrease in average absolute difference in parameter variance estimates when
VIFs > 150 reflects heteroskedastic fluctuations in a sparse sample space (only 7.6% of VIFs
were between 150 and 1,686, the maximum VIF).
Figure 7 exhibits this trend for the VIF of a focal variable and the absolute difference in
parameter variance estimates for that focal variable across model reruns. Figure 8 shows the
same trend for the maximum absolute difference in parameter variance estimates in a given
model. As VIFs increase, the size of the absolute difference in parameter variance estimates
between model reruns grows inconsistent, indicating that the ERGM has difficulty reaching the
target distribution. Absolute differences in parameter variance estimates are especially
inconsistent when VIFs pass 150. This shows that, in some severe cases, standard errors in
ERGMs with a VIF above 150 are essentially random. Thus, although the mean size of absolute
differences in parameter variance estimates is largest when 20<VIF<150 (Table 5), variance
estimates are particularly unreliable when VIFs pass 150 (Figure 7, Figure 8).
23
[Figure 7 here]
[Figure 8 here]
Results indicate that VIFs are strongly and positively correlated with the size of the
absolute difference in parameter variance estimates. Figures 6a and 6b identified critical regions
for parameter instability. VIFs greater than 20 are associated with unstable variance estimates.
Similarly, if there is a VIF over 150 in a model, then it drives instability in other covariates in the
model. Figures 7 and 8 demonstrate that when VIFs exceed 150, standard errors are often
especially unreliable, and in some cases essentially random. With these results in mind, it is
recommended that researchers recognize the risk of multicollinearity when VIFs exceed 20, and
should be especially concerned if any VIF is above 150.
Empirical applications on five networks
To build intuition regarding the application of this measure, I provide empirical examples
on five networks. Four of the networks have been used in prior research or online ERGM
tutorials. First, I provide an in-depth empirical example that estimates all models twice. The
benefit of this approach is to illustrate parameter and variance instability when multicollinearity
exists in an ERGM. Next, I fit four additional ERGMs to four empirical networks and discuss
how the VIFs for these ERGMs can be interpreted. Finally, I plot the mean VIFs from all models
and discuss what a researcher can infer about the model using the VIF.
The Faux Mesa high school network is a simulated network designed to reflect real-world
social networks. The network is based on the characteristics of the friendship network in one
rural high school in the AddHealth dataset (see Hunter et al. 2008). Below, I replicate the ERGM
specifications used by Shore and Lubin (2015) to model the Faux Mesa network, predicting
24
homophily attributes for sex, grade, and race, and the geometrically weighted degree distribution
(gwdegree) for the network (see Hunter and Handcock 2006).12 The final model adds a gwesp
term. Each ERGM forces MCMC MLE estimation instead of pseudo-likelihood. VIFs are
evaluated across models. Each model is rerun under the same model specifications to evaluate
the stability of the standard errors.
Model 1 predicts the Faux Mesa network using homophily attributes. Each of these
parameters remains relatively uncorrelated, with VIFs staying well below a value of 3. Model 2
adds the gwdegree statistic for the network. VIF values increase noticeably, above levels that
most researchers would be comfortable with in GLM, but which are not problematic in ERGM.
[Table 6 here]
Model 3 includes the gwesp parameter. The VIFs for all variables increase, except for
gwdegree, which diminishes. Particularly, VIFs for homophily (Grade) and gwesp jump
significantly to values over 80. This suggests that these two terms are highly correlated. This is
reflected in the correlation matrix of the posterior distribution of the ERGM: for gwesp and grade
homophily, r = 0.989 (Table 7). Notably, standard errors are less stable in the third model
compared to Models 1 and 2. The high correlation between gwesp and grade homophily affects
the stability of standard errors for other covariates in the model. In the case of racial homophily
and gwdegree, the level of significance changes from p<0.05 to p<0.01. Similarly, the standard
errors for sex homophily and racial homophily—both of which are highly correlated with gwesp
and grade homophily (Table 7)—fluctuate by roughly 40%. This points to multicollinearity in
the third model and indicates that the cause of multicollinearity is the association between gwesp
and grade homophily. In this case, where the source of the problem is a bivariate correlation,
either of the offending terms can be removed from the model to reduce collinearity.
25
[Table 7 here]
To provide further intuition on how the method can be applied in substantive research, I
present VIFs for ERGMs fit to four networks that are often used as example cases in ERGM
tutorials: the Faux Magnolia network, Sampson’s monk network, the Florentine Marriage
network, and the Local Health Department Services network (LHDS). Like the Faux Mesa
network, the Faux Magnolia network is a simulated network designed to mirror real-world high
school friendship networks. It is frequently used by the statnet team—developers of the leading
ERGM statistical software—in online and conference ERGM tutorials (Goodreau et al. 2008).
The variable specifications available for this network are identical to the Faux Mesa network.
The Sampson Monks network is also a classic network study used in a popular online ERGM
tutorial (Levy 2016). The Sampson network is a directed network of affiliations in a monastery.
Monks are classified in the network by their group membership, which indicates whether or not
they belong to a certain faction within the monastery. The LHDS network is a network compiled
by Harris (2014) in her introductory book on ERGM. The LHDS network is a network of
affiliations between leaders of health department offices. Actors in this network are classified by
how long they have been in their leadership position and by the population in millions of the
jurisdiction over which they preside. The final network, the Florentine marriage (Medici)
network, is the network studied by Padgett and Ansell (1993) in their classic research. Ties in the
network are marital associations between powerful families in 15th century Florence. Members in
this network are classified by their wealth and whether they have some pre-existing business
relationship.
Model building for these networks proceeds as follows. In Model 1, I fit an ERGM to
each network with three explanatory variables. This is the minimum number of variables
26
necessary for the VIF to yield insight beyond a simple Pearson’s correlation. In Model 2, ERGM
specifications are identical to the respective tutorials mentioned above, with the exception of the
Florentine marriage network. In Model 3, I add additional network statistics to all models to
force multicollinearity in some models and not in others. The goal of this step is to help delineate
when a researcher may or may not be concerned with multicollinearity. Variables in these
models were chosen to represent a wide range of commonly used ERGM parameters.
Table 8 presents VIFs for the four ERGM examples. In the Faux Magnolia network, the
largest VIF is 3.72 for the grade homophily statistic, 3.53 for race homophily, and 2.88 for sex
homophily. These are all extremely low correlations by ERGM standards. From these results, a
researcher can infer that multicollinearity is not a problem. The same conclusion can be reached
for the Sampson network, where the largest VIF is 3.8 for the reciprocity parameter. As a dyadindependent ERGM, the LHDS network also yields extremely low VIFs. Here, the largest VIF is
1.1, indicating that correlations between explanatory variables are almost not existent. This is not
surprising, since the variable categories here are a priori mutually exclusive. Model 1 for the
Florentine marriage network also exhibits extremely low VIFs. Here the largest VIF is 3.34 for
the node-level wealth covariate, and the lowest VIF is for the gwdegree statistic.
[Table 8 here]
In Model 2, where ERGM specifications in the first three models are identical to existing
tutorials, VIFs increase noticeably across the networks. The largest VIF in the Faux Magnolia
network is 8.66 for the grade homophily parameter, while race homophily VIF is 6.93, and the
sex homophily VIF is 4.78. The new term, gwesp, has a VIF of 4.19. Although VIFs double in
size in this model, they are still well below what ERGMs can tolerate. Thus, a researcher can
27
conclude that multicollinearity is not a problem in this ERGM, even though VIFs of this size
should raise concern in GLM.
The trends in Model 2 for the Faux Magnolia network are comparable to the Sampson
Monk ERGM. Here, the group homophily term yields the largest VIF of 8.67, followed by
gwesp, reciprocity, and finally gwdegree. As above, collinearity does not appear to be affecting
model estimates. VIFs in the LHDS network remain extremely low, with no VIF exceeding 1.15.
As a dyad-independent ERGM, this is not surprising. Finally, of the four empirical networks,
Model 2 for the Florentine marriage network yields the largest overall VIF at 9.09 for the 2-star
parameter, followed by wealth, gwesp, and then gwdegree. Since the VIFs for wealth and 2-stars
are much larger than the VIFs for either of the gw terms, a researcher may conclude that the 2star and wealth parameters are correlated. However, as above, the levels of correlation are well
below what an ERGM can be reasonably expected to tolerate.
In Model 3, I add terms beyond what is specified in current tutorials to force collinearity
in some models. The Faux Magnolia network maintains low VIFs by ERGM standards. The
grade homophily and sex homophily terms show small gains in the VIFs, indicating that the two
terms are likely correlated with the gwdegree statistic added in Model 3. Turning to the Sampson
Monk network, VIFs remain small and unconcerning even after adding gwodegree. This is also
true of the Florentine Marriage network, where the largest VIF is 8.60 for the 2-star parameter
after controlling for whether a Florentine marriage is correlated with a business relationship.
These trends do not hold, however, for the LHDS network. Adding gwesp to the ERGM
increases VIFs across all variables in the model. The newly added gwesp term is the largest VIF
at a value of 327.9, followed by the jurisdiction population at 144.5. A VIF of this size clearly
indicates multicollinearity in Model 3. Further evidence for multicollinearity in this model is the
28
jurisdiction parameter, where the coefficient changes direction (from positive to negative)
between Model 2 and Model 3, as well as noticeable change in the strength and size of
coefficients and standard errors for the leadership position parameters in the model. Since
correlations were extremely low in prior models and no other VIF is of comparable size to the
gwesp parameter, a researcher can conclude that gwesp is the source of multicollinearity. A
researcher examining this network should remove the leading VIF—gwesp—to reduce
multicollinearity.
To summarize these trends graphically, Figure 9 plots the mean VIF across all three
models for the five empirical networks: Faux Mesa, Faux Magnolia, Sampson, LHDS, and the
Florentine Marriage (Medici) network. The dashed lines plot the three networks where
multicollinearity did not affect ERGM estimates. In these models, the mean VIF stays below 8
and the largest VIF stays below 10. In the Faux Mesa network—a network with a moderate
collinearity problem—the largest VIF reaches 80 and the mean VIF reaches 29.7. In this case,
researchers taking a cautious approach should omit the offending variable (gwesp) to reduce
collinearity. This is the preferred approach. However, in some cases, researchers may be
especially interested in the offending variable on theoretical grounds. In cases of light to
moderate collinearity, it may be possible to extend the length of the Markov Chain or the size of
the MCMC sample and to rerun the model multiple times to evaluate the extent of collinearity in
the ERGM. If p-values, standard errors, and parameter estimates stay relatively stable with these
adjustments, a researcher may be able to interpret the parameters anyways. However, this
approach is not recommended for a VIF above 50, and should be treated with scrutiny in general.
[Figure 9 here]
29
Turning to the LHDS network Model 3, the largest VIF is over 300 and the mean model
VIF is 134. This is well beyond the threshold for severe multicollinearity (discussed above) and
should be raise concern. The severity of the problem is reflected in the unstable parameter and
variance estimates for Model 3 in Table 8, where one coefficient changes direction and three
others dramatically change in strength. In this case, a researcher cannot interpret any parameter
estimates reliably; gwesp should be removed from the ERGM.
In summary, these results demonstrate the utility of the method. VIFs pinpoint the
sources of multicollinearity by capturing multivariate correlations. In cases where the VIF is
relatively small to moderate in size (e.g. 20-50), the problem may be subverted by increasing the
length of the Markov Chain or the size of the MCMC sample. However, in moderate or severe
cases, the offending variable will usually need to be omitted from the model entirely. Theoretical
reasoning should be used to inform which variable to remove in cases where multiple variables
exhibit high VIFs.
DISCUSSION
This article proposed a method to diagnose multicollinearity in ERGM. The method for
calculating VIFs in ERGM is robust to both network data as well as contextual network effects.
Thus, it resolves the problems outlined in the introduction by allowing researchers to pinpoint
which predictors exhibit high multicollinearity and to gauge the severity of multicollinearity in
an ERGM. The method has been shown to perform comparably to well-known methods for
GLM, yielding similar results and predicting unstable variance estimates with high reliability. As
such, it outperforms other approaches to detecting multicollinearity in ERGM, such as relying on
bivariate correlations or the Hessian matrix. Moreover, the method requires little computational
30
power or specialization in statistical network analysis to execute (outlined under Eq. (4)). Thus,
the approach is relatively parsimonious and should be easily applied by researchers who may not
be familiar with network analysis.
Since the properties of VIFs in ERGM are different from VIFs observed in GLM, a 216
condition Monte Carlo simulation experiment was conducted to evaluate ERGM VIFs. The VIFs
showed significant kurtosis and extremely high values. Results from multilevel regression give
insight to ERGM conditions which increase the risk of multicollinearity. First, VIFs tend to
increase as more endogenous variables are included in the model. Thus, researchers should be
cautious when specifying endogenous effects to improve model fit in ERGM, as the
overspecification of endogenous effects may induce collinearity-type problems. Second, ERGMs
specifying numerous endogenous effects on dense networks tend to exhibit high VIFs. Thus,
researchers modelling dense networks should be particularly aware of how endogenous effects
may introduce collinearity problems. Third, networks with certain global properties are at greater
risk for multicollinearity. Scale-free networks tend to generate larger VIFs than either random or
small-world networks. Researchers should be aware of the increased risk of multicollinearity
when modelling networks that exhibit preferential attachment or as the degree distribution of the
network approaches a power-law (e.g. Barabasi and Albert 1999), especially in dense networks.
When interpreting VIFs, it is recommended that researchers consider multicollinearity a
possibility when VIFs rise above 20. VIFs above 150 should be considered especially
concerning. Moreover, researchers using ERGM should note that as VIFs rise, there is a greater
likelihood that parameter instability will not be located in the focal variable. In GLM, if two
variables are not correlated, multicollinearity may not warrant concern. However, network data
31
is usually highly correlated and high collinearity may affect MCMC estimation. Thus, as VIFs
rise, researchers should grow skeptical of the standard errors for all covariates in the model.
If a researcher is wed to a variable exhibiting a high VIF, multicollinearity can be
investigated more closely by observing fluctuations in standard errors with model reruns.
However, a single model rerun may not capture parameter instability because the margin of error
for variance estimates grows as multicollinearity becomes increasingly problematic. Thus, if a
researcher resorts to rerunning models, they should rerun the model numerous times. Of course,
reruns are not always feasible when fitting ERGMs to large networks or when specifying
complicated models. If an offending variable is not specified on theoretical grounds, it should
simply be removed.
This advice comes with a caveat. Although multicollinearity can often be detected
through unstable standard errors, it can still affect relatively stable ERGM estimates. Large VIFs
should raise concern even when standard errors do not change across model reruns. For this
reason, researchers should note that rerunning the same model multiple times may not
approximate ‘true’ standard errors when VIFs are high, even if standard errors only exhibit mild
fluctuations.
One benefit of the proposed approach to calculating VIFs in ERGM is that the general
method may be translatable to dynamic network models with little modification (Hanneke, Fu,
and Xing 2010; Snijders et al. 2010; Krivitsky and Handcock 2014). Namely, the stochastic
actor-oriented model (SAOM) and temporal exponential random graph model (TERGM) both
utilize a similar MCMC estimation procedure as ERGM and face comparable collinearity issues.
The proposed method can potentially be adapted to evaluate SAOM or TERGM, though what the
properties of VIFs will be in the longitudinal context is unclear.
32
Future research may also consider the effects of other endogenous specifications. Due to
constraints on computational power and processing time, the number of endogenous terms
included in the ERGMs for these simulations has been limited. The terms provided in this paper
have been selected due to their widespread popularity and common usage (Hunter 2007; Robins
et al. 2007; Snijders et al. 2010). A promising direction for future research is to investigate how
variance estimates change with the inclusion of less common endogenous effects, such as the arc
tangent or k-cores. These effects are sometimes applied as specific measures in research (e.g.
Hipp et al. 2013), but may have less clear behaviors when included with other network terms.
Similarly, selective-sorting parameters such as homophily and heterophily are widely used by
networks scholars, but may have distinct behaviors.
APPENDIX
Table A1 presents additional information on how VIFs vary by ERGM specification and
variable type. As demonstrated in the multilevel regression, ERGMs with three endogenous
variables tend to generate the largest VIFs, followed by ERGMs specifying degree popularity
and degree correlation, ERGMs specifying gwesp and degree popularity, and ERGMs specifying
gwesp and degree correlation. Moving down the “Overall” column, results indicate that dyadindependent ERGMs tend to generate low VIFs. Notably, the average VIF tends to be lower in
models specifying a single endogenous parameter compared to dyad-independent ERGMs.
However, beyond a single endogenous variable, adding additional endogenous effects increases
the mean VIF in an ERGM.
[Table A1 here]
This is consistent across variable types as well. The lowest observed mean VIFs are for
endogenous effects specified in ERGMs with only a single endogenous variable. Inversely, the
33
largest mean VIFs are for endogenous effects in complex (three endogenous variable) ERGMs.
Even in ERGMs with only two endogenous variables, the mean VIF for these variables is
sometimes quite large (e.g. exp(1.16^10)=82.39). Moving down the column for actor-level
attributes, there is relatively little variation in the mean VIF for actor-level variables as more
endogenous effects are added to the model. Consistent with the endogenous effects, the largest
mean VIF for actor-level attributes is among ERGMs specified with three endogenous
covariates.
Figure A1 provides violin plots examining how VIFs vary by variable type and the
variance-mean ratio of exogenous variables. The top panel shows VIFs in ERGMs with highly
correlated endogenous variables. Unsurprisingly, the actor-level covariates in these ERGMs have
the largest modes and medians of the three panels. Interestingly, the density of the endogenous
covariates is also widely dispersed in this condition. The panel for the equidispersed (variancemean ratio = 1) actor-level variables shows a similar pattern as the under-dispersed panel. One
notable difference is that the violin for the equidispersed actor-level covariates has a lower
median than the under-dispersed violin for actor-level covariates. This is also distinct from the
over-dispersed panel, where actor-level covariates have the lowest median and the lowest mode
of any panel.
[Figure A1]
Table A2 explores how VIFs vary by variable type, ERGM specification, and network
density. For the most part, results in this table are consistent with Table A1 in low density (0.1)
networks. The lowest VIFs in this model are for endogenous effects in ERGMs with only one
endogenous covariate, while the largest VIFs are for endogenous variables in ERGMs specified
with three endogenous covariates. Notably, in low density networks, specifying both degree
34
popularity and degree correlation in a model yields a mean VIF for each of those effects larger
than the mean VIF for gwesp in an ERGM specified with three endogenous effects.
[Table A2]
Turning to ERGMs specified on networks with moderate density (0.2), results for less
complicated models—dyad-independent models and those with a single endogenous covariate—
are consistent with low density networks and the aggregate results in Table A1. Here, however,
the mean VIF for the gwesp statistic in ERGMs specified with two or three covariates is
noticeably larger. Moreover, the size of the VIF for endogenous covariates specified alongside
gwesp in ERGMs with only two structural effects are noticeably larger in moderate density
networks compared to low density networks. The mean VIFs for degree effects specified in the
“Deg. Pop. + Deg. Cor.” models are only slightly larger than in low density networks. This is
also true of the three-endogenous-effect ERGMs, where the gwesp term is noticeably larger, but
the degree statistics only increase slightly.
These trends continue into the high density (0.3) networks. Again, actor-level covariates
remain relatively unaffected by increasing network density; neither do single endogenous
variables in less complex models. In more complex models, however, the gwesp statistic again
shows noticeable increases. The degree statistics also show gains, but they are smaller. Here, the
largest VIF is for the degree popularity statistic. In the simulated models with three endogenous
covariates, the mean degree popularity score is 1487.72 (exp(1.22^10)).
Figure A2 breaks down variation in specific variables by network density. The top panel
plots low density networks. Variable effects in this network have the strongest central
tendencies, with medians and modes being closest in value in these models. In the moderate
density network, values tend to be more dispersed, with a higher concentration of VIFs at higher
35
values and weaker central tendency. This trend comes to a head in the third plot, where medians
are noticeably below modes in high density networks. Another notable trend is the steady
increase in the gwesp VIF as network density increases. The degree statistics also show a similar
pattern, but it is less dramatic. These results indicate that gwesp—and possibly other triangle
terms—may generate high correlations when specified with other endogenous effects in dense
networks. Moreover, parameterizing multiple degree effects may increase the risk of
multicollinearity regardless of network density.
[Figure A2]
36
1
Originally, this was accomplished using a Gibbs sampler, but more recent approaches rely on variants of the
Metropolis Hastings algorithm.
2
This can be reduced to Eq. (2) for a more parsimonious equation in the regression context. However, this reduction
changes the input values on the right-hand side of the equation from correlations (parameter space) to observations
and is therefore inappropriate for ERGM.
3
A distribution of 1,000 networks is sufficient for exploratory purposes, but the number of networks should be
increased in larger networks or ones with potentially problematic coefficients.
4
The gwesp term is a geometrically weighted term for edgewise shared partnership. It falls under the ‘curved’
family of ERGM terms and is generally a more robust estimate than linear triadic closure. See Hunter (2007) for an
introduction and discussion.
5
All dyad-independent models are estimated with MCMC MLE instead of pseudo-likelihood.
6
Since the purpose of this simulation was to force varying degrees of collinearity, many of the more complex
ERGMs did not converge (presumably due to multicollinearity). Since these ERGMs did not yield estimates, results
from these simulations were not stored or recorded. Results are restricted to only those ERGMs which yielded
estimates, as per the assumptions of the method outline in Eq. (4).
7
Of course, this correlation will not be perfect since models will also be inconsistent in their degree of instability. In
other words, the rate of change in variance estimates between model reruns will approach randomness as variance
estimates grow increasingly unreliable. Therefore, differences in parameter variance estimates will not necessarily
be larger as when they grow less reliable.
8
Since VIFs detect inflation in the variance estimates of the model and only evaluate standard errors by extension, I
focus on instability in variance estimates.
9
To ensure that parameter instability was not an artifact of insufficient MCMC sampling, a smaller experiment was
run with the same conditions as outlined in this section, but with a longer Markov Chain (MCMC interval = 5,000)
and larger MCMC sample size (5,000). Each condition fit 100 ERGM pairs and yielded almost identical results.
10
Absolute differences in parameter variance estimates are total values between models. Rate of change is not
reported because small fluctuations may yield large percent change. For example, a variance estimate shifting
between 0.001 and 0.002 would be a 100% increase but would not be cause for concern.
11
Absolute values are used so that negative fluctuations trend in the same direction as VIFs, which have a lower
bound of 1. All analyses were also performed using VIFs in the first ERGM and VIFs in the second ERGM. Results
were almost identical.
12
Like the gwesp term, the gwdegree decay parameter is fixed at 0.7.
37
References
An, Weihua. 2016. “Fitting ERGMs on big networks.” Social Science Research 59: 107-119.
Barabasi, Albert-Laszlo and Reka Albert. 1999. “Emergence of Scaling in Random Networks.”
Science 286: 509-512.
Chandrasekhar, Arun G., and Matthew O. Jackson. 2014. “Tractable and Consistent Random
Graph Models.” Physics and Society. DOI: arXiv:1210.7375v4
Chatterjee, Samprit and Bertram Price. 1991. Regression Analysis by Example. New York:
Wiley.
Faust, Katherine. 2007. “Very Local Structure in Social Networks.” Sociological Methodology
37(1): 209-256.
Fox, John, and Georges Monette. 1992. “Generalized Collinearity Diagnostics.” Journal of the
American Statistical Association 87(417): 178-183.
Fox, John. 2008. Applied Regression Analysis and Generalized Linear Models. Thousand Oaks,
CA: Sage Inc.
Frank, Ove, and David Strauss. 1986. “Markov graphs.” Journal of the American Statistical
Association 81: 832-842.
Geyer, Charles J., and Elizabeth A. Thompson. 1992. “Constrained Monte Carlo Maximum
Likelihood for Dependent Data.” Journal of the Royal Statistical Society Series B 54(3):
657-699.
Goodreau, Steven M., Mark S. Handcock, David R. Hunter, Carter T. Butts, and Martina
Morris. 2008. “A statnet Tutorial.” Journal of Statistical Software 24(9): 1-26.
38
Goodreau, Steven M., James A. Kitts, and Martina Morris. 2009. “Birds of a Feather, or Friend
of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social
Networks.” Demography 46(1): 103-125.
Hanaki, Nobuyuki, Alexander Peterhansl, Peter S. Dodds, Duncan J. Watts. 2007. “Cooperation
in Evolving Social Networks.” Management Science 53(7):1036-1050.
Handcock, Mark S. 2003. “Statistical models for social networks: degeneracy and inference.” Pp.
229-240 in Dynamic Social Network Modeling and Analysis, ed. Ronald Breiger and
Phillipa Pattison. National Academies Press.
Handcock, Mark S., Garry Robins, Tom Snijders, Jim Moody, and Julian Besag. 2003a.
“Assessing Degeneracy in Statistical Models of Social Networks.” Journal of the
American Statistical Association 76:33-50.
Handcock, Mark S., David R. Hunter, Carter T. Butts, Steven M. Goodreau, and Martina Morris.
2003b. statnet: Software tools for the Statistical Modeling of Network Data. Retrieved
May 22nd, 2017 (http://statnetproject.org).
Hanneke, Steve, Wenjie Fu, and Eric P. Xing. 2010. “Discrete temporal models of social
networks.” Electronic Journal of Statistics 4:585-605.
Harrell, Franke E. 2001. Regression Modeling Strategies. Springer Series in Statistics.
Harris, Jenine K. 2014. An Introduction to Exponential Random Graph Modeling. Thousand
Oaks: Sage.
Hipp, John R., Carter T. Butts, Ryan Acton, Nicholas N. Nagle, and Adam Boessen. 2013.
“Extrapolative simulation of neighborhood networks based on population spatial
distribution: Do they predict crime?” Social Networks 35:614-625,
39
Holland, Paul W., and Samuel Leinhardt. 1981. “An Exponential Family of Probability
Distributions for Directed Graphs.” Journal of the American Statistical Association
76(737): 33-50.
Hunter, David R. 2007. “Curved exponential family models for social networks.” Social
Networks 29(2): 216-230.
Hunter, David R., Steven M. Goodreau, and Mark S. Handcock. 2008. “Goodness of Fit of
Social Network Models.” Journal of the American Statistical Association 103(481): 248258.
Hyojoung, Kim, and Peter S. Bearman. 1997. “The Structure and Dynamics of Movement
Participation.” American Sociological Review 62(1): 70-93.
Koskinen, Johan, and Tom A. B. Snijders. 2013. “Simulation, estimation, and goodness of
fit.” In Ed. Dean Lusher, Johan Koskinen, and Garry Robins. Exponential Random
Graph Models for Social Networks: Theory Methods and Applications. Cambridge
University Press: Cambridge.
Kreager, Derek A., Jacob T. N. Young, Dana L. Haynie, Martin Bouchard, David R. Schaefer,
and Gary Zajac. Forthcoming. “Where ‘Old Heads’ Prevail: Inmate Hierarchy in a Men’s
Prison Unit.” American Sociological Review.
Krivitsky, Pavel N., and Mark S. Handcock. 2014. “A separable model for dynamic networks.”
Journal of the Royal Statistical Society B 76(1): 29-46.
Levy, Michael. 2016. “ERGM Tutorial.” Retrieved October 11th, 2017
http://michaellevy.name/blog/ERGM-tutorial/
Lusher, Dean, Johan Koskinen, and Garry Robins. 2013. Exponential Random Graph Models for
Social Networks: Theory, Methods, and Applications. Cambridge, UK: Cambridge Press.
40
Mooney, Christopher Z. 1997. Monte Carlo Simulation. Thousand Oaks, CA: Sage.
Newman, Mark E. J., Steven H. Strogatz, and Duncan J. Watts. 2001. “Random graphs with
arbitrary degree distributions.” Phyiscal Reviews E 64: 026118.
Newman, Mark E. J. 2002. “Assortative mixing in networks.” Physical Review Letters 89:
208701.
O’Brien, Robert M. 2007. “A Caution Regarding Rules of Thumb for Variance Inflation
Factors.” Quality and Quantity 41: 673-690.
O’Malley, James A, and Jukka-Pekka Onnela. 2014. “Topics in Social Network Analysis and
Network Science.” Physics and Society. DOI: arXiv:1404.0067v1.
Padgett, John F., and Christopher K. Ansell. 1993. “Robust Action and the Rise of the
Medici, 1400-1434. American Journal of Sociology 98(6): 1259-1319.
Papachristos, Andrew V. 2009. “Murder by Structure: Dominance Relations and the Social
Structure of Gang Homicide.” American Journal of Sociology 115(1): 74-128.
Papachristos, Andrew V., David Hureau, and Anthony Braga. 2013. “The Corner and the Crew:
The Influence of Geography and Social Networks on Gang Violence.” American
Sociological Review 78(3): 417-447.
Rencher, Alvin C. 2002. Methods of Multivariate Analysis. Second edition. John Wiley & Sons
Inc.
Ripley, Ruth M., Tom A. B. Snijders, Zsofia Boda, Andras Voros, and Paulina Preciado. 2017.
Manual for RSiena. Retrieved May 22nd, 2017
(https://www.stats.ox.ac.uk/~snijders/siena/RSiena_Manual.pdf).
41
Robins, Garry, Pip Pattison, Yuval Kalish, and Dean Lusher. 2007. “An Introduction to
Exponential Random Graph (p*) Models for Social Networks.” Social Networks 29(2):
173-191.
Salter-Townshend, Michael, and Thomas Brendan Murphy. 2015. “Role Analysis in Networks
using Mixtures of Exponential Random Graph Models.” Journal of Computational
Graph Statistics 24(2): 520-538.
Shore, Jesse, and Benjamin Lubin. 2015. “Spectral goodness of fit for network models.” Social
Networks 43:16-27.
Snijders, Tom A. B. 2002. “Markov Chain Monte Carlo estimation of exponential random graph
models.” Journal of Social Structure 3.
Snijders, Tom A. B., Phillipa E. Pattison, Garry L. Robins, Mark S. Handcock. 2006. “New
specifications for exponential random graph models.” Sociological Methodology 36(1):
99-153.
Snijders, Tom A. B., and Roel J. Bosker. [1999] 2008. Multilevel Analysis: An introduction to
basic and advanced multilevel modeling. Third edition. London: Sage.
Snijders, Tom A. B., Gerhard G. van de Bunt, and Christian E.G. Steglich. 2010. “Introduction
to actor-based models for network dynamics.” Social Networks 32:44-60.
Steglich, Christian, Tom A. B. Snijders, and Michael Pearson. 2010. “Dynamic Networks and
Behavior: Separating Selection from Influence.” Sociological Methodology 40(1): 329393.
Thiemichen, Stephanie, and Goeran Kauermann. 2017. “Stable exponential random graph
models with non-parametric components for large dense networks.” Social Networks 49
(1): 67-80.
42
Watts, Duncan J. and Steven H. Strogatz. 1998. “Collective dynamics of ‘small-world’
networks.” Nature 393: 440-442.
Wimmer, Andreas, and Kevin Lewis. 2010. “Beyond and Below Racial Homophily: ERG
Models of a Friendship Network Documented on Facebook.” American Journal
Sociology 116(2): 583-642.
43
Table 1: Conditions of the experiment
Conditions
Values
Network density
.1
.2
.3
Variance mean ratios of exogenous variables
.5
1
5
Topology
Random
Scale-free*
Small-world
Endogenous parameters
gwesp
(0.7)**
Degree
popularity
Degree
correlation
Network size
75
*All scale-free networks were simulated by rewiring a random network with a scaling parameter
of 2.
** Decay (α) parameter of the gwesp term is fixed at 0.7. See Hunter (2007) for an introduction.
44
Table 2: Multilevel regression analysis of 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1
Independent Variables
Model 1
Model 2
Network density
-
-
0.2
0.013 (0.000)
-0.000* (0.002)
0.3
0.022 (0.000)
0.009 (0.002)
-
-
1
-0.075 (0.000)
-0.074 (0.000)
5
-0.174 (0.000)
-0.174 (0.000)
-
-
Scale-free
0.020 (0.000)
0.020 (0.000)
Small-world
-0.007 (0.000)
-0.007 (0.000)
-
-
gwesp
-0.001 (0.001)
0.010 (0.001)
Deg. Pop.
-0.024 (0.001)
-0.024 (0.001)
Deg. Cor.
-0.021 (0.001)
-0.016 (0.001)
gwesp + Deg. Pop.
0.105 (0.001)
0.077 (0.001)
gwesp + Deg. Cor.
0.101 (0.001)
0.073 (0.001)
Deg. Pop. + Deg. Cor.
0.143 (0.001)
0.140 (0.001)
gwesp + Deg. Pop. + Deg. Cor.
0.200 (0.001)
0.187 (0.001)
-
-
0.2*gwesp
-
-0.016 (0.001)
0.2*Deg. Pop.
-
0.001 (0.001)
0.2*Deg. Cor.
-
-0.002* (0.001)
0.2*gwesp + Deg. Cor.
-
0.035 (0.001)
0.2*gwesp + Deg. Pop.
-
0.038 (0.001)
0.2*Deg. Cor. + Deg. Pop.
-
0.007 (0.001)
0.2*Deg. Cor. + Deg. Pop. + gwesp
-
0.016 (0.001)
0.3*gwesp
-
-0.030 (0.001)
0.3*Deg. Pop.
-
0.002* (0.001)
0.3*Deg. Cor.
-
-0.011 (0.001)
0.3*gwesp + Deg. Cor.
-
0.049 (0.001)
0.3*gwesp + Deg. Pop.
-
0.048 (0.001)
0.3*Deg. Cor. + Deg. Pop.
-
0.002* (0.001)
Variance mean ratios
Topology
Endogenous parameters
Endogenous parameters*density
45
0.3*Deg. Cor. + Deg. Pop. + gwesp
Intercept
-
0.024 (0.001)
0.917 (0.001)
0.927 (0.001)
0.000* (0.000)
0.000* (0.000)
-1.29 x106
-1.30 x106
Variance components (std. dev.)***
Level 2 intercept
AIC
𝑅 2 ****
0.44
0.44
*Not statistically significant at the 0.1 level; N for count of VIFS is 972,000, N for models is
216,000.
**Reference category for network density is 0.1; reference category for variance mean ratios is
0.5 reference category for endogenous parameters is whether the VIF observation was an
exogenous variable; reference category for topology is random networks; reference category for
interaction is density (0.1)*exogenous variable.
***In the null model, the variance component for the model level intercept is equal to 0.104 with
a standard deviation of 0.102. Thus, all regressions are constrained with a random second level
intercept. A likelihood ratio test confirms that this is the correct modelling choice compared to a
standard linear model (𝜒 2 = 199.01).
****𝑅 2 is Snijder and Bosker’s ([1999]2008) total variance method.
46
Table 3: Multilevel regression for three different network topologies
Independent Variables
Random
Scale-free
Small-world
Network density
-
-
-
0.2
-0.002* (0.002)
0.002* (0.002)
-0.001* (0.002)
0.3
0.001* (0.002)
0.023 (0.002)
0.002* (0.002)
-
-
1
-0.078 (0.001)
-0.070 (0.001)
-0.077 (0.001)
5
-0.176 (0.001)
-0.164 (0.001)
-0.184 (0.001)
-
-
-
gwesp
0.013 (0.002)
-0.011 (0.002)
0.027 (0.002)
Deg. Pop.
-0.040 (0.002)
-0.007 (0.002)
-0.025 (0.002)
Deg. Cor.
-0.006 (0.002)
-0.007 (0.001)
-0.034 (0.002)
gwesp + Deg. Pop.
0.084 (0.002)
0.084 (0.002)
0.061 (0.002)
gwesp + Deg. Cor.
0.084 (0.002)
0.084 (0.002)
0.059 (0.002)
Deg. Pop. + Deg. Cor.
0.136 (0.002)
0.136 (0.002)
0.149 (0.002)
gwesp + Deg. Pop. + Deg. Cor.
0.140 (0.002)
0.278 (0.002)
0.143 (0.002)
-
-
-
0.2*gwesp
-0.022 (0.002)
-0.000* (0.002) -0.026 (0.000)
0.2*Deg. Pop.
0.032 (0.002)
-0.004* (0.002) 0.001* (0.002)
0.2*Deg. Cor.
-0.003* (0.002)
-0.006* (0.002) 0.002* (0.002)
0.2*gwesp + Deg. Cor.
0.033 (0.002)
0.031 (0.002)
0.041 (0.002)
0.2*gwesp + Deg. Pop.
0.038 (0.002)
0.034 (0.002)
0.038 (0.002)
0.2*Deg. Cor. + Deg. Pop.
0.008 (0.002)
0.006* (0.002)
0.006* (0.002)
0.2*Deg. Cor. + Deg. Pop. + gwesp
0.027 (0.002)
0.001* (0.002)
0.020 (0.002)
0.3*gwesp
-0.020 (0.002)
-0.021 (0.002)
-0.049 (0.002)
0.3*Deg. Pop.
0.030 (0.002)
-0.026 (0.002)
-0.000* (0.002)
0.3*Deg. Cor.
-0.005* (0.002)
-0.027 (0.002)
0.015 (0.002)
0.3*gwesp + Deg. Cor.
0.050 (0.002)
0.030 (0.002)
0.067 (0.002)
0.3*gwesp + Deg. Pop.
0.056 (0.002)
0.034 (0.002)
0.064 (0.002)
0.3*Deg. Cor. + Deg. Pop.
0.009 (0.002)
-0.016* (0.002) 0.008 (0.002)
0.3*Deg. Cor. + Deg. Pop. + gwesp
0.041 (0.002)
-0.007 (0.002)
0.037 (0.002)
0.933 (0.001)
-0.927 (0.001)
0.936 (0.001)
Variance mean ratios
Endogenous parameters
Endogenous parameters*density
Intercept
47
Variance component
Level 2 intercept
𝑅 2 ***
0.000* (0.000)
0.000* (0.000)
0.000* (0.000)
0.41
0.52
0.43
*Not statistically significant at the 0.1 level; N for count of VIFS is 324,000, N for models is
72,000.
**Reference category for network density is 0.1; reference category for variance mean ratios is
0.5 reference category for endogenous parameters is whether the VIF observation was an
exogenous variable; reference category for topology is random networks; reference category for
interaction is density (0.1)*exogenous variable.
***𝑅 2 is Snijder and Bosker’s ([1999]2008) total variance method.
48
Table 4. Correlations between 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 and absolute difference of log(variance) estimates
VIF
Variance
Focal variable
0.715
Total model (maximum) 0.873
Total model (mean)
0.853
49
Table 5. Mean absolute difference in parameter variance estimates and standard errors for ranges
of VIFs
VIF value range
Variance
Std. Error
VIF < 20
0.00
0.00
20 < VIF
0.55
0.20
20 < VIF < 150
0.65
0.23
150 < VIF
0.20
0.10
VIF < 20
0.00
0.00
20 < VIF
1.29
0.47
20 < VIF < 150
1.51
0.51
150 < VIF
0.83
0.38
VIF < 20
0.00
0.00
20 < VIF
0.22
0.08
20 < VIF < 150
0.25
0.09
150 < VIF
0.14
0.06
Focal variable
Total model (maximum)
Total model (mean)
50
Table 6: Variance inflation factors in the Faux Mesa ERGM
Independent
Variables
Model 1
Model 1
Rerun
Model 2
Model 2
Rerun
Model 3
Model 3
Rerun
Edges
-6.602***
(0.201)
-6.601***
(0.197)
-5.695***
(0.237)
-5.686***
(0.234)
-7.135***
(0.305)
-7.235***
(0.269)
Homophily (Race)
0.431**
(0.149)
0.437**
(0.141)
0.356**
(0.135)
0.355*
(0.141)
0.298*
(0.139)
0.285**
(0.101)
Homophily (Sex)
0.647***
(0.154)
0.643***
(0.149)
0.652***
(0.152)
0.644***
(0.146)
0.504***
(0.139)
0.512***
(0.096)
Homophily (Grade)
2.835***
(0.177)
2.830***
(0.175)
2.632***
(0.178)
2.624***
(0.179)
2.021***
(0.177)
2.048***
(0.174)
gwdegree
-
-
-1.043***
(0.154)
-1.041***
(0.152)
0.526*
(0.208)
0.615**
(0.196)
gwesp
-
-
-
-
1.340***
(0.122)
1.334***
(0.114)
VIFs
-
-
-
-
-
-
Homophily (Race)
1.711
1.714
2.929
2.756
7.840
11.150
Homophily (Sex)
2.113
2.119
4.701
4.213
9.337
14.902
Homophily (Grade)
2.419
2.381
5.191
4.686
83.869
64.468
gwdegree
-
-
3.603
3.366
2.837
3.920
gwesp
-
-
-
-
80.198
62.724
1919
1919
1885
1885
1729
1730
BIC
1951
1951
***p<0.001; **p<0.01; *p<0.05
1925
1925
1777
1778
AIC
51
Table 7. Posterior correlation matrix for Model 3 in Faux Mesa ERGM
Homophily
(Race)
Homophily
(Sex)
Homophily
(Grade)
gwdegree
Homophily (Race)
1.00
Homophily (Sex)
0.882
1.00
Homophily (Grade)
0.930
0.938
1.00
gwdegree
0.602
0.648
0.617
1.00
gwesp
0.926
0.929
0.989
0.545
52
gwesp
1.00
Table 8. VIFs in ERGMs fit to four networks.
Independent Variables
Model 1
Model 2
Model 3
θ (SE)
VIF
θ (SE)
VIF
θ (SE)
VIF
Edges
-10.01***
(.115)
-
-9.78***
(.105)
-
-9.90***
(.134)
-
Homophily (Race)
1.20***
(.081)
3.53
.906***
(.073)
6.93
.939***
(.079)
6.41
Homophily (Grade)
3.23***
(.088)
3.72
2.74***
(.086)
8.66
2.75***
(.087)
9.50
Homophily (Sex)
.884***
(.071)
2.88
.778***
(.064)
4.78
.765***
(.062)
4.49
gwesp (.25)
-
-
1.80***
(.051)
4.19
1.85***
(.061)
5.13
gwdegree (.25)
-
-
-
-
.120
(.083)
2.95
Edges
-1.96***
(.257)
-
-1.13*
(.448)
-
-3.44**
(1.06)
-
Homophily (Group)
2.09***
(.318)
3.73
2.65***
(.459)
8.67
2.51***
(.498)
4.65
gwidegree (1)
-1.83*
(.750)
2.25
-3.07**
(.961)
2.32
-3.31**
(1.02)
2.36
Reciprocity
1.42**
(.475)
3.80
1.40**
(.469)
4.69
1.56**
(.523)
2.39
gwesp (.5)
-
-
-.445*
(.172)
6.72
-.303
(.224)
5.82
gwodegree (1)
-
-
-
-
1.48*
(.666)
3.85
Edges
-6.13***
(.063)
-
-6.23***
(.064)
-
-8.39***
(.075)
-
Leader in position
3 – 5 years
.136**
(.045)
1.07
.144**
(.045)
1.11
.076***
(.015)
65.70
Leader in position
6 – 10 years
.261***
(.042)
1.09
.279***
(.042)
1.08
.076***
(.010)
53.74
Faux Magnolia
Sampson
LHDS
53
Leader in position
11+ years
.322***
(.040)
1.10
.337***
(.040)
1.11
.107***
(.013)
80.59
Jurisdiction population -
-
.197***
(.014)
1.15
-.176***
(.024)
144.53
gwesp (.7)
-
-
-
-
2.80***
(.042)
327.86
Edges
-3.24*
(1.35)
-
-.439
(3.67)
-
-1.60
(3.67)
-
Wealth
.012*
(.006)
3.34
.016*
(.008)
6.49
.015
(.008)
5.99
gwesp (.7)
.073
(.320)
2.36
.124
(.315)
2.24
.097
(.312)
2.25
gwdegree (.7)
.751
(.137)
1.83
-1.12
(2.57)
2.00
-.514
(2.56)
1.83
kstar (2)
-
-
-.448
(.557)
9.09
-.369
(.528)
8.60
Business network
-
-
-
-
Florentine Marriage
2.22***
1.24
(.641)
***p<0.001; **p<0.01; *p<0.05. The gwidegree and gwodegree statistics are the directed
equivalent of gwdegree score for indegree and outdegree, respectively
54
Table A1. Means (standard deviations) of VIFs across ERGM specifications. VIFs are calculated
as 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1. All ERGMs include at least three actor-level covariates specified as maineffects; dyad-independent ERGMs are those that only have three actor-level effects.
ERGM
specifications
Actor-level
attributes
gwesp
Deg. Pop.
Deg. Cor.
Overall
Dyadindependent
.85
(.14)
-
-
-
.85
(.14)
gwesp
.88
(.12)
.75
(.08)
-
-
.85
(.12)
Deg. Pop.
.87
(.12)
-
.73
(.07)
-
.83
(.13)
Deg. Cor.
.87
(.12)
-
-
.72
(.08)
.83
(.13)
gwesp +
Deg. Cor
.87
(.12)
1.07
(.06)
-
1.07
(.06)
.95
(.14)
gwesp +
Deg. Pop.
.87
(.12)
1.08
(.07)
1.08
(.06)
-
.96
(.14)
Deg. Pop. +
Deg. Cor.
.88
(.11)
-
1.16
(.02)
1.16
(.02)
.99
(.16)
gwesp +
Deg. Pop. +
Deg. Cor.
.93
(.13)
1.12
(.10)
1.20
(.05)
1.20
(.05)
1.05
(.16)
Overall
.88
(.12)
1.01
(.17)
1.04
(.20)
1.04
(.20)
.93
(.17)
55
Table A2. Means (standard deviations) of VIFs across ERGM specifications by network density.
VIFs are calculated as 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 . All ERGMs include at least three actor-level covariates
specified as main-effects; dyad-independent ERGMs are those that only have three actor-level
effects.
Actor-level
attributes
gwesp
Deg. Pop.
Deg. Cor.
Overall
Dyadindependent
.85
(.14)
-
-
-
.85
(.14)
gwesp
.89
(.11)
.77
(.09)
-
-
.86
(.11)
Deg. Pop.
.87
(.12)
-
.73
(.07)
-
.84
(.13)
Deg. Cor.
.87
(.12)
-
-
.72
(.08)
.83
(.13)
gwesp +
Deg. Cor
.87
(.12)
1.00
(.04)
-
1.00
(.04)
.93
(.12)
gwesp +
Deg. Pop.
.87
(.12)
1.00
(.03)
1.01
(.03)
-
.93
(.11)
Deg. Pop. +
Deg. Cor.
.88
(.11)
-
1.15
(.02)
1.15
(.02)
.99
(.16)
gwesp +
Deg. Pop. +
Deg. Cor.
.93
(.13)
1.06
(.13)
1.19
(.06)
1.19
(.06)
1.04
(.15)
Total
.88
(.12)
.96
(.14)
1.02
(.19)
1.02
(.19)
.92
(.15)
Dyadindependent
.85
(.14)
-
-
-
.85
(.14)
gwesp
.87
(.12)
.75
(.08)
-
-
.84
(.12)
Deg. Pop.
.87
(.12)
-
.73
(.07)
-
.83
(.13)
ERGM
specifications
Density = .1
Density = .2
56
Deg. Cor.
.87
(.12)
-
-
.72
(.08)
.83
(.13)
gwesp +
Deg. Cor
.87
(.12)
1.09
(.03)
-
1.09
(.03)
.96
(.13)
gwesp +
Deg. Pop.
.85
(.12)
1.10
(.02)
1.10
(.02)
-
.95
(.15)
Deg. Pop. +
Deg. Cor.
.88
(.11)
-
1.17
(.02)
1.17
(.02)
1.00
(.16)
gwesp +
Deg. Pop. +
Deg. Cor.
.93
(.13)
1.12
(.09)
1.20
(.05)
1.20
(.05)
1.05
(.16)
Total
.87
(.13)
1.01
(.17)
1.05
(.20)
1.04
(.20)
.93
(.17)
Dyadindependent
.86
(.13)
-
-
-
.86
(.13)
gwesp
.87
(.12)
.73
(.07)
-
-
.84
(.13)
Deg. Pop.
.87
(.12)
-
.73
(.07)
-
.83
(.13)
Deg. Cor.
.87
(.12)
-
-
.72
(.08)
.83
(.13)
gwesp +
Deg. Cor
.88
(.11)
1.13
(.02)
-
1.13
(.01)
.98
(.15)
gwesp +
Deg. Pop.
.89
(.11)
1.15
(.02)
1.15
(.02)
-
1.00
(.15)
Deg. Pop. +
Deg. Cor.
.88
(.11)
-
1.17
(.02)
1.17
(.02)
1.00
(.17)
gwesp +
Deg. Pop. +
Deg. Cor.
.93
(.13)
1.17
(.07)
1.22
(.05)
1.21
(.05)
1.07
(.17)
Total
.88
(.12)
1.05
(.19)
1.07
(.20)
1.06
(.20)
.94
(.17)
Density = .3
57
Figure 1: Distribution of 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1
58
Figure 2: Frequency of 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 by count of endogenous variables in an ERGM. y-axis is
frequency of 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 values; x-axis is 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 values. Lighter shades correspond with
denser concentration of values.
59
Figure 3: Interactions for variable type*density. y-axis is mean 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 for the interaction;
x-axis is the density of the network. Black dashed lines are actor-level attributes, red solid lines
are the endogenous parameter specifications for the model.
60
Figure 4: Interactions for variable type*density by network topology. The y-axis is the mean
𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 value for the interaction; x-axis is the network density.
*Note: The graphs for a single endogenous variable are plotted over a smaller range of
𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 than the other graphs in the figure.
61
Figure 5. Violin plots of VIFs for different variables by network topology. Y-axis is
𝑙𝑜𝑔(𝑉𝐼𝐹)0.1.
62
Figure 6a. Log(variance) fluctuations for 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 of a focal variable (solid line) and
maximum log(variance) fluctuation of a variable in a model for the maximum 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 of a
variable in a model (dashed line). Smoothed Lowess lines.
*Note: the x-axis was plotted to scale of 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 , but untransformed VIF values are marked
to make interpretation easier. Similarly, the y-axis was plotted to scale log(variance) fluctuations,
but untransformed values are marked.
**Note: the linear increase when 10 < VIF < 20 connects the control group of dyad-independent
ERGMs to ERGMs with three endogenous effects. The largest VIF in dyad-independent ERGMs
was < 10, and the smallest VIF for ERGMs with endogenous effects was >20.
63
Figure 6b. Log(variance) fluctuations for 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 of a focal variable (solid line) and
maximum log(variance) fluctuation of a variable in a model for the maximum 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 of a
variable in a model (dashed line). Unsmoothed Lowess lines.
*Note: the x-axis was plotted to scale of 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 , but untransformed VIF values are marked
to make interpretation easier. Similarly, the y-axis was plotted to scale log(variance) fluctuations,
but untransformed values are marked.
**Note: the linear increase when 10 < VIF < 20 connects the control group of dyad-independent
ERGMs to ERGMs with three endogenous effects. The largest VIF in dyad-independent ERGMs
was < 10, and the smallest VIF for ERGMs with endogenous effects was >20.
64
Figure 7. Heteroskedasticity plot of absolute difference in parameter variance estimates as
𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 increases for a focal variable.
*Note: the x-axis was plotted to scale of 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 , but untransformed VIF values are marked
to make interpretation easier. y-axis is untransformed.
65
Figure 8. Heteroskedasticity plot of maximum absolute difference in parameter variance
estimates in a model as the highest 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 in the model increases.
*Note: the x-axis was plotted to scale of 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 , but untransformed VIF values are marked
to make interpretation easier. y-axis is untransformed.
66
Figure 9. Mean VIFs for ERGMs of five empirical networks. In Model 3, the Faux Mesa
network had a mean VIF of 29.8 and the LHDS network had a mean VIF of 134.8.
67
Figure A1. Violin plots of VIFs for different variables by variance mean ratio of exogenous
actor-level variables. Y-axis is 𝑙𝑜𝑔(𝑉𝐼𝐹)0.1 .
68
Figure A2. Violin plots of VIFs for different variables by network density. Y-axis is
𝑙𝑜𝑔(𝑉𝐼𝐹)0.1.
69