Journal of Social Structure

JoSS Article: Volume 12

Detecting Change in Longitudinal Social Networks

Ian McCulloh
Network Science Center, U.S. Military Academy, West Point, NY
ian.mcculloh@usma.edu

Kathleen M. Carley
Center for Computational Analysis of Social and Organizational Systems, School of Computer Science, Carnegie Mellon University
kathleen.carley@cs.cmu.edu

Abstract

Changes in observed social networks may signal an underlying change within an organization, and may even predict significant events or behaviors. The breakdown of a team’s effectiveness, the emergence of informal leaders, or the preparation of an attack by a clandestine network may all be associated with changes in the patterns of interactions between group members. The ability to systematically, statistically, effectively and efficiently detect these changes has the potential to enable the anticipation, early warning, and faster response to both positive and negative organizational activities. By applying statistical process control techniques to social networks we can rapidly detect changes in these networks. Herein we describe this methodology and then illustrate it using four data sets, of which the first is the Newcomb fraternity data, the second set of data is collected on a group of mid-career U.S. Army officers in a week long training exercise, the third is the perceived connections among members of al Qaeda based on open source, and the fourth data set is simulated using multi-agent simulation. The results indicate that this approach is able to detect change even with the high levels of uncertainty inherent in these data.
Keywords

Statistical models for social networks, longitudinal social network analysis, Statistical Process Control, CUSUM, change detection

Acknowledgements

This research is part of the ARO Change Detection project with the USMA Network Science Center and the Dynamics Networks project in CASOS (Center for Computational Analysis of Social and Organizational Systems, http://www.casos.cs.cmu.edu) at Carnegie Mellon University. This work was supported in part by:

The Army Research Organization, MIPR No. 9FDATXR048.
The Office of Naval Research (ONR), United States Navy Grant No. N00014-06-1-0104.
The Army Research Labs DAAD19-01-2-0009.
The Army Research Institute ARI - W91WAW07C0063.

Additional support on measures was provided by the National Science Foundation IGERT 9972762 in CASOS and the Department of Defense. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ARO, ONR, ARL, NSF, DOD or the U.S. government.

Introduction

Social network change detection (SNCD) represents an exciting new area of research. It combines the areas of statistical process control and social network analysis. The combination of these two disciplines is likely to produce significant insight into organizational behavior and social dynamics. Immediate applications to counter terrorism and organizational behavior are possible due to the sheer volume of available electronic communications network data (McCulloh et al., 2008; Ring, Henderson & McCulloh, 2008).

Much research has been focused in the area of longitudinal social networks (Sampson, 1969; Newcomb, 1961; Romney et al., 1989; Banks & Carley, 1996; Sanil, Banks & Carley, 1995; Snijders, 1990, 2007; Frank, 1991; Huisman & Snijders, 2003; Johnson et al., 2003; McCulloh et al., 2007a, 2007b). Wasserman et al.
(2007) state that, “The analysis of social networks over time has long been recognized as something of a Holy Grail for network researchers.” Doreian & Stokman (1997) produced a seminal text on the evolution of social networks. In their book they identified, at a minimum, 47 articles published in Social Networks that included some use of time, as of 1994. They also noted several articles that used over-time data but discarded the temporal component, presumably because the authors lacked the methods to properly analyze such data. An excellent example of this is the Newcomb (1961) fraternity data, which has been widely used throughout the social network literature. More recently, this data has been analyzed with its temporal component (Doreian & Stokman, 1997; Krackhardt, 1998; Baller et al., 2008).

Methods for the analysis of over-time network data have actually been present in the social sciences literature for quite some time (Katz & Proctor, 1959; Holland & Leinhardt, 1977; Wasserman, 1977; Wasserman & Iacobucci, 1988; Frank, 1991). Continuous time Markov chains for modeling longitudinal networks were proposed as early as 1977 by Holland & Leinhardt and by Wasserman. Their early work has been significantly improved upon (Wasserman, 1979, 1980; Leenders, 1995; Snijders & van Duijn, 1997; Snijders, 2001; Robins & Pattison, 2001), and Markovian methods of longitudinal analysis have even been automated in a popular social network analysis software package, SIENA. A related body of research focuses on the evolution of social networks (Doreian, 1983; Carley, 1990, 1991, 1995, 1999; Doreian & Stokman, 1997), including three special issues in the Journal of Mathematical Sociology (JMS 21, 1-2; JMS 25, 1; JMS 27, 1). Others have focused on statistical models of network change (Feld, 1997; Sanil, Banks, & Carley, 1995; Snijders, 1990, 1996; Van de Bunt et al., 1999; Snijders & Van Duijn, 1997).
Robins & Pattison (2001, 2007) have used dependence graphs to account for dependence in over-time network evolution. We can clearly see that the development of longitudinal network analysis methods is a well-established problem in the field of social networks.

We nominate four types of dynamic network behaviors for investigation in this paper. These behaviors are not comprehensive; however, it is necessary to define a set of behaviors to focus our investigation of network change. The four behaviors we focus our attention on are: network stability; endogenous change; exogenous change; and initiated change.

Stability occurs when the underlying relationship between agents in a network remains the same. It is possible that observed networks may contain error (Killworth & Bernard, 1976; Bernard & Killworth, 1977). If the network is stable, then changes in the network over time are due to observation error alone. An example of stability occurs in work environments where the underlying relationships remain unchanged; however, fluctuations exist as a result of stochastic noise, variations in daily work requirements, and sampling error.

Endogenous change occurs when the goals and motives of individuals, among other factors, drive the network to evolve. For example, a military platoon consisting of 20 to 30 soldiers can experience endogenous change as individuals interact and share beliefs and experiences. This is the focus of actor-oriented models (Snijders, 2007), which attempt to estimate statistically significant behaviors, both structural and compositional, that drive network evolution. In a similar fashion, multi-agent simulation approaches attempt to investigate endogenous change by specifying agent-level behavior in order to infer network evolution.

Exogenous change occurs when a change is introduced separate from the agent interaction. With this type of change future events are independent from previous events.
This implies that no inference can be drawn from the present model about the future network dynamics. An example of exogenous change might occur in the form of an enemy attack on a military platoon consisting of 20 to 30 soldiers. During the attack there is something fundamentally different about the relationships among the soldiers. There is nothing about the individual interactions that could predict this change caused by an exogenous source. In other situations, exogenous change can occur for many reasons. A shortage of economic resources could lead to job lay-offs that will significantly affect the social network, regardless of endogenous effects. These are of course drastic changes, presented here to illustrate abrupt forms of network change. It is also possible to have smaller change, such as when a new person joins a social group, a company finds new access to less expensive resources, or a group member finds a better way of accomplishing required tasks.

The final longitudinal network behavior we discuss is initiated change. We define this behavior as occurring when an exogenous change initiates a sequence of endogenous change. In our military example, it is possible that the heroic or cowardly actions of individuals in the platoon may affect the way other platoon members see them, thereby affecting the interaction among agents in the network and initiating endogenous network evolution.

It is important to delineate the difference between stability, endogenous, exogenous and initiated change if we are to understand network dynamics and any underlying processes governing network behavior. Again these changes are not comprehensive as one might imagine periodic change, event driven change, and other forms of change found in the dynamics literature.

A first step toward the problem of longitudinal network analysis is to statistically determine that an organization has changed over time. For example, Johnson et al.
(2003) studied people wintering over at the South Pole. There were three similar groups corresponding to three different years. A whole-network survey design was used to collect social network data once per month for eight months for each of the three groups. Johnson studied longitudinal change on the social networks of the three groups. Theoretically, these similar groups should exhibit similar evolutionary behavior. In one of the groups, there was an exogenous change that involved the “disappearance” of an expressive leader “due in part to harassment by a marginalized crew member.” This exogenous change significantly affected the evolutionary behavior of the network. This behavior was only apparent as a result of the similarity between the three groups and the large magnitude of the difference in network behavior, which enabled Johnson to determine the significant cause of this difference. In practice, this type of similarity among groups may be rare. SNCD offers a method to identify statistically significant abrupt change in network behavior in real-time, and to identify a likely change point of when the change occurred. This change point will allow a social scientist to identify potential causes of change, such as the disappearance of the crew member, and isolate that exogenous abrupt change from typical longitudinal behavior.

Our approach for detecting changes in longitudinal networks rapidly detects an abrupt change in some network measure over time. We are not predicting a future change, but rather rapidly identifying that a change has occurred, and then providing a statistically sound indication of when that change was likely to have occurred.

Rapid detection and identification of change is important for three key reasons. First, it allows an analyst monitoring a network in real time to respond quickly to organizational change, facilitating the change if it is positive, and mitigating the effects of negative change on the organization.
For example, ideas and policies are discussed and communicated within a network of people, long before organizational implementation. Sometimes, individual politics (network evolution) can prevent the implementation of good ideas (Rogers, 2003). Rapid detection of organizational change may cause a manager to investigate the presence of good initiatives and see them through to implementation. On the other hand, terrorist organizations will begin planning their attacks, long before they are actually carried out. Rapid change detection could alert military intelligence analysts to the shift in planning activities prior to the attack occurring. The proposed approach may also be useful to social scientists investigating organizational change. This approach provides another tool for the exploration of longitudinal networks. Common problems with existing methods such as exponential random graphs and actor-oriented models include degeneracy and non-convergence (Handcock, 2003). SNCD can identify changes in longitudinal networks to help identify abrupt changes induced by some exogenous factor, such as the removal of the agent in the Johnson wintering over data (Johnson et al., 2003). With SNCD, the social scientist can identify shorter periods within the longitudinal network data where other methods may provide useful insight without convergence and degeneracy issues. The third key reason that rapid change detection is important is that it limits the scope of explanation for network change. A sound statistical estimate of when a network change occurred can help a social scientist identify potential abrupt exogenous changes and thereby isolate periods of the network for more in-depth investigation. Determining the likely time of change in a network helps us understand where to look for fundamental conditions that cause groups to transform themselves. 
If we as social scientists could monitor networks on a daily or weekly basis, we could open a new line of research within longitudinal network analysis.

SNCD is essentially a statistical approach for detecting abrupt persistent changes in organizational behavior over time. Organizations are not static, and over time their structure, composition, and patterns of communication may change. These changes may occur quickly, such as when a corporation restructures, but they often happen gradually, as the organization responds to environmental pressures, or individual roles expand or contract. Often, these gradual changes reflect a fundamental qualitative shift in an organization, and may precede other indicators of change. It is important to note, however, that a certain degree of change is expected in the normal course of an unchanging organization, reflecting normal day-to-day variability. The challenge of Social Network Change Detection is whether metrics can be developed to detect signals of meaningful change in social networks in a background of normal variability.

This paper will introduce an application of statistical process control to detect change in longitudinal network data. A brief background is provided on statistical process control, which is used extensively in manufacturing. Statistical process control is extended to social networks, with important limitations and distributional assumptions being addressed. The newly proposed method is demonstrated on three longitudinal data sets. The performance of the method is then explored using multi-agent simulation.

Background

Longitudinal social network data is becoming increasingly common. Longitudinal network data can be readily obtained in a semi-autonomous fashion from the internet, blogs, and email.
Longitudinal network analysis is becoming increasingly relevant for the analysis of online citation networks, internet movie data, massive multi-player on-line games (MMPOG), patent databases, phone networks, email-based networks, social media networks and more.

Current methods of change detection in social networks, however, are limited. Hamming distance (Hamming, 1950) is often used in binary networks to measure the distance between two networks. Euclidean distance is similarly used for weighted networks (Wasserman & Faust, 1994). While these methods may be effective at quantifying a difference in static networks, they lack an underlying statistical distribution. This prevents an analyst from identifying a statistically significant change, as opposed to normal and spurious fluctuations in the network.

Jaccard indices are used by SIENA (Snijders et al., 2007) users to assess the amount of turnover from one observation of network panel data to the next. The amount of turnover may indicate a number of important features of the data, including whether an actor-oriented model is likely to have convergence issues. This index is not ideal for detecting network change for similar reasons as the Hamming distance.

The quadratic assignment procedure (QAP) and its multiple regression counterpart MRQAP (Krackhardt, 1987, 1992) have been used to detect structural similarity and compare networks in terms of their correlation. This is not the same as detecting a statistically significant change in the network over time. The procedure could probably be adapted for such purpose, but this is not a trivial task and certainly beyond the scope of this paper.

Markovian approaches to longitudinal network analysis such as SIENA are good methods for modeling evolutionary change and determining structural factors that affect network change; however, these models may have convergence issues in the presence of sufficiently large abrupt endogenous or exogenous changes.
These models also assume an underlying statistical process within the network that drives change, and models exogenous change with time dummies that requires some a priori knowledge of the change. SNCD is a process of monitoring networks to determine when significant changes to their network structure occur so that analysts and researchers can more efficiently search for potential causes of change. We propose that techniques from social network analysis, combined with those from statistical process control can be used to detect when significant changes occur in longitudinal network data. In application, it requires the use of statistical process control charts to detect changes in observable network measures. By taking longitudinal measures of a network, a control chart can be used to signal when significant changes occur in the network. For those unfamiliar with statistical process control, it should be noted that the word “control” can be very misleading. In fact, nothing is controlled at all. Statistical process control is a collection of algorithms that monitor a stochastic process over time and rapidly detect statistically significant departures from typical behavior. Control charts refer to the individual algorithms used to monitor a process. The word “control” is derived from their application in quality control. Quality engineers attempt to control production lines by monitoring them and investigating any statistical anomalies. Through investigation, they attempt to mitigate negative process behavior and continue any newly discovered process improvements. In our application of SNCD, we use statistical process control to monitor longitudinal social networks and detect any statistically significant departures from typical behavior that may correspond to a change in the network. While the quality engineer uses this technique to “control” a manufacturing process, we envision that the social scientist will use it to gain insight in network dynamics. 
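For comparison with the control chart approach, the snapshot-to-snapshot distance measures reviewed earlier in this section (Hamming distance for binary networks and the Jaccard index of tie turnover) can be sketched as follows. This is a minimal illustration only: the function names, the adjacency-matrix representation, and the two toy snapshots are assumptions made for the example and are not taken from SIENA or any other package.

```python
def hamming_distance(a, b):
    """Count ordered pairs (i, j), i != j, whose tie status differs."""
    n = len(a)
    return sum(
        a[i][j] != b[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    )


def jaccard_index(a, b):
    """Ties present in both snapshots divided by ties present in either."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    both = sum(1 for i, j in pairs if a[i][j] and b[i][j])
    either = sum(1 for i, j in pairs if a[i][j] or b[i][j])
    # Convention assumed here: two empty networks count as identical.
    return both / either if either else 1.0


# Two hypothetical directed, binary snapshots of a three-node network.
snapshot_1 = [[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]]
snapshot_2 = [[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]]

print(hamming_distance(snapshot_1, snapshot_2))
print(jaccard_index(snapshot_1, snapshot_2))
```

Both functions return a single number per pair of snapshots. As noted above, neither comes with a reference distribution, so on their own they cannot say whether a given value is a significant change or ordinary fluctuation.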
There are many network measures that can be calculated from a given network. These include graph level measures, e.g., density, and node level measures, e.g., degree centrality. The SNCD technique is applicable to any measure of the network regardless of whether it is a graph level or a node level measure. In this paper for exposition purposes we focus on graph level measures rather than node level measures in order to investigate changes in the network as a whole as opposed to changes in the level of influence of a particular agent. For example, for each time period, we use the average of the betweenness (Freeman, 1977) over all nodes in the graph rather than the betweenness of a single node. The average betweenness may provide insight into group cohesion and the distribution of informal power throughout the organization. We also illustrate SNCD using density (Coleman & Moré, 1983), average closeness (Freeman, 1979), and average eigenvector centrality (Bonacich, 1972). Again, these measures provide slightly different insight into group cohesion. These four measures are chosen because they are commonly used in the literature and represent many potential measures available for change detection. Additional measures such as the maximum, minimum, and the standard deviation of the above node level measures are considered in a virtual experiment to explore limitations of the proposed method. A complete exploration of all social network measures and all possible types of changes to a network is certainly beyond the scope of this initial paper on the subject, however, we hope to have sufficiently illustrated the promise of this approach.

Another concern with these measures is their scale invariance. In order to compare measures across different time periods, they must be standardized.
For a steady sized group this should not be an issue, but in the case of an expanding or contracting group, issues arise as to whether results can be used across the different scales of group size. In other words, the network measures may change in different ways with respect to the current group size and thus provide inconsistent information about the group even absent any stochastic changes within the group. For more detailed information on the standardization of network measures, see Bonacich, Oliver & Snijders (1998). For this research, *ORA, developed by Kathleen Carley at the Center for Computational Analysis of Social and Organizational Systems at Carnegie Mellon University, is used to compute the average network measures from all group information (Carley et al., 2009).

Statistical Process Control

SPC is a technique used by quality engineers to monitor industrial processes. They use control charts to detect changes in an industrial process by taking periodic samples from the process, calculating a statistic based on some process metric, and comparing the statistic against a decision interval. If the statistic exceeds the decision interval, the “control chart” is said to “signal” that a change may have occurred in the process. Once a potential change has been “signaled,” quality engineers investigate the process to determine if an actual change occurred, what the most likely time the change occurred was, and whether the process needs to be reset or improved to avoid financial loss for the company. Control charts are usually optimized for their processes to increase their sensitivity for detecting changes, while minimizing the number of “false positives” (signals when no change has actually occurred in the process).
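The monitoring loop just described (sample the process, compute a statistic, compare it against a decision interval) might look like the following minimal sketch when the process metric is a graph-level measure such as density. The five-observation baseline, the three-standard-deviation decision interval, the function names, and the numbers in the example series are all illustrative assumptions, not values taken from the paper or from *ORA.

```python
from statistics import mean, stdev


def density(adj):
    """Fraction of possible directed ties (excluding self-ties) that are present."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def standardize(series, n_baseline):
    """Standardize a measure using the mean and sample standard deviation
    of the first n_baseline observations (assumed to be typical behavior)."""
    baseline = series[:n_baseline]
    mu, sigma = mean(baseline), stdev(baseline)
    return [(x - mu) / sigma for x in series]


def signals(z, limit=3.0):
    """Indices of time periods whose standardized value leaves the
    decision interval [-limit, +limit]."""
    return [t for t, z_t in enumerate(z) if abs(z_t) > limit]


# Density of one hypothetical three-node snapshot.
snapshot = [[0, 1, 1],
            [0, 0, 1],
            [1, 0, 0]]
print(density(snapshot))

# Hypothetical density series over seven time periods; the first five
# are treated as the in-control baseline.
measure = [0.30, 0.32, 0.28, 0.31, 0.29, 0.30, 0.45]
print(signals(standardize(measure, n_baseline=5)))
```

This simple interval check looks at each time period in isolation. A signal only indicates that the observed measure departed from the baseline, so it should prompt investigation and is not proof that the group changed.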
Three control chart schemes are investigated in this paper: the cumulative sum (CUSUM) (Page, 1961); the Exponentially Weighted Moving Average (Roberts, 1959); and the Scan Statistic (Fisher & MacKenzie, 1922; Naus, 1965; Priebe et al., 2005). The CUSUM will be the primary method considered and recommended for longitudinal network analysis. This procedure provides an estimate of when the change actually occurred (change point detection) as opposed to simply signaling that a change occurred (change detection). The other two methods are applied to simulated networks in a virtual experiment to explore the performance of SNCD.

CUSUM

The CUSUM control chart (Page, 1961) was proposed as an improvement over the traditional Shewhart (1927) x-bar chart. The strength of the CUSUM was its use of sequential probability ratio testing which used information of previous observations to determine change in a stochastic process. Moustakides (2004) showed that the CUSUM procedure was a uniformly most powerful test for normally distributed processes with a specified size step change in the mean of the process. Unfortunately, in most applications the investigator does not know a priori the size and type of the change. Furthermore, the underlying process may not be normally distributed. The quality engineering literature contains much exploration of the performance of the CUSUM under conditions of different magnitudes of change, types of change, and distributional assumptions.

The CUSUM control chart sequentially compares the statistic $C_t$ against a decision interval $h$ until $C_t > h$. Since one is not interested in concluding that the network process is unchanged, the cumulative statistic is

$$C_t^+ = \max\{0,\ Z_t - k + C_{t-1}^+\}.$$

If this rule was not implemented the control chart would require more observations of the network to signal if $C_t < 0$ at the time of abrupt change. The statistic $C_t^+$ is compared to a constant, $h^+$. If $C_t^+ > h^+$, then the control chart signals that an increase in the network measure might have occurred. In a similar fashion,

$$C_t^- = \max\{0,\ -Z_t - k + C_{t-1}^-\}$$

and is compared to a constant, $h^-$. If $C_t^- > h^-$, then the control chart signals that a decrease in the network measure may have occurred.

To monitor for both directions of network change, two one-sided control charts are employed. One chart is used for monitoring increases in the monitored network property and the other is used for detecting decreases in the property. If the process remains in-control then $C_t$ will fluctuate around zero. When $C_t^+ > h^+$ or $C_t^- > h^-$, the two one-sided CUSUM control chart scheme signals that the network may have changed.

Exponentially Weighted Moving Average Control Chart

The Exponentially Weighted Moving Average (EWMA) control chart was introduced by Roberts (1959) for monitoring changes in the mean of a process. The EWMA associated with subgroup $t$ is

$$w_t = \lambda \bar{x}_t + (1 - \lambda)\, w_{t-1},$$

where $0 < \lambda \le 1$ is the weight assigned to the current subgroup average and $w_0 = \mu_0$. Common values of $\lambda$ are $0.1 \le \lambda \le 0.3$. Having observed a total of $T$ subgroups, the statistic $w_T$ is plotted against the decision interval

$$\mu_0 \pm L\,\sigma_{\bar{x}} \left[ \frac{\lambda}{2 - \lambda} \left( 1 - (1 - \lambda)^{2T} \right) \right]^{1/2},$$

where $L$ is a constant that scales the width of the decision interval. Lucas & Saccucci (1987) (see also Saccucci & Lucas, 1990) investigated the impact of different combinations of $L$ and $\lambda$ on the average number of observations before the EWMA signals a change. The combinations that were investigated were chosen such that the false positive rate for each chart was the same. They found that EWMA charts with small values of $\lambda$ perform well at detecting small changes in a process mean. Conversely, EWMA charts with large values of $\lambda$ perform well at detecting large changes in a process mean.
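The CUSUM and EWMA recursions above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is assumed to be a network measure already standardized against typical behavior, the values k = 0.5, h = 4, lambda = 0.2 and L = 3 are common textbook choices rather than values taken from this paper, the same h is used for both one-sided charts, and the function names and the example series are hypothetical.

```python
from math import sqrt


def cusum(z, k=0.5, h=4.0):
    """Two one-sided CUSUM charts on a standardized series z.

    Returns (index of first signal or None, list of (C_plus, C_minus)).
    k and h are assumed defaults, with h used for both h+ and h-.
    """
    c_plus = c_minus = 0.0
    path = []
    for t, z_t in enumerate(z):
        c_plus = max(0.0, z_t - k + c_plus)     # chart for increases
        c_minus = max(0.0, -z_t - k + c_minus)  # chart for decreases
        path.append((c_plus, c_minus))
        if c_plus > h or c_minus > h:
            return t, path
    return None, path


def ewma(x, mu0, sigma, lam=0.2, L=3.0):
    """EWMA chart; returns the index of the first signal or None.

    mu0 and sigma are the in-control mean and standard deviation of the
    plotted values; lam and L are assumed defaults.
    """
    w = mu0  # w_0 = mu_0
    for T, x_t in enumerate(x, start=1):
        w = lam * x_t + (1 - lam) * w
        half_width = L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * T)))
        if abs(w - mu0) > half_width:
            return T - 1
    return None


# Hypothetical standardized measure: five typical periods, then an upward shift.
z = [0.2, -0.4, 0.1, 0.3, -0.2, 1.8, 2.1, 1.9, 2.2]

signal_at, path = cusum(z)
print("CUSUM signal index:", signal_at)
print("EWMA signal index:", ewma(z, mu0=0.0, sigma=1.0))
```

Both functions stop at the first signal, mirroring the sequential use of a control chart. The sketch deliberately omits any estimate of the change point, since the estimator is not specified in the text above.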
Hunter (1986) and Montgomery (1996) investigated the performance of the EWMA chart and concluded that it is similar to the performance of the CUSUM chart. In addition, the EWMA is a time series approach for SPC. Therefore, the EWMA seems a good candidate for comparison to the CUSUM.

Scan Statistic

Scan statistics (Fisher & Mackenzie, 1922; Naus, 1965; Priebe et al., 2005), also known as moving window analysis, investigate a random field for the presence of a local signal. A small window of observations is used to calculate a local statistic. In this paper a window size of 7 observations preceding the current time period is used, and the window mean is used for the local statistic. Increasing the window size reduces the likelihood of false alarm, but makes detection of a change less likely. Decreasing the window size makes the procedure more sensitive to change, but increases the probability of false signal. The decision to use a window size of 7 was chosen to be consistent with previous applications of the scan statistic for detecting longitudinal network changes (Priebe et al., 2005). If the statistic exceeds a decision interval, then inference can be made that a change in the network may have occurred.

Distributional Limitations

The performance and false alarm probability of the SPC procedures used in this approach assume that the stochastic process being monitored is independent and normally distributed. The assumptions are clearly violated in network applications. The degree to which these assumptions are violated and the impact on type I error varies based on the topology of the network.
When a network requires a meaningful investment of resources to establish a link, the degree a node can obtain is limited and the network tends to take on an Erdos-Renyi random topology (Erdos & Renyi, 1959; Alderson, 2009). In other networks, such as scale-free networks common for modeling the internet and certain biological networks, the distribution of many network measures is skewed and the false alarm rate may be adversely affected. Figure 1 shows the variance of data collected from a normal and a right skewed distribution versus the number of observations sampled. The increased variance from the right skewed data will inflate the decision interval calculated on a few initial observations, making it more difficult to detect change, or more susceptible to false alarm.

Figure 1. Bias Induced in Right Skewed Data

Some social scientists do not believe that groups can be adequately captured by quantitative analysis and statistical distributions (Brown & Morrow, 1994). We do not attempt to tackle this argument. Clearly, the work of this paper contributes to quantitative methods in social science. We also do not claim that a detected change is definitive proof that the organization has in fact changed. This approach will only detect a statistically significant change in the observed network measure of an organization. This could be a false alarm or an expected event affecting the organization, among other causes. Change detection simply alerts an analyst or social scientist that a change may have occurred. It is incumbent on the analyst or social scientist to investigate the group using many different methods in the social sciences to determine if change has in fact occurred, the nature of that change, and the cause of change.
The approach laid out in this work will narrow the scope of this task by quickly identifying potential change and estimating when the change may have occurred.

Data

CUSUM is a method for assessing longitudinal change, and we use real-world data to demonstrate the practical application of the approach and simulated data to assess the accuracy of the approach. Altogether we use four data sets to demonstrate the efficacy of the social network change detection approach. We initially illustrate the CUSUM control chart on the Newcomb Fraternity data, a social network data set recorded of college transfer students; the Leavenworth data, a social network data set recorded of mid-career U.S. Army officers in a training exercise; and an al Qaeda data set. It is impossible to identify the “real” change in real-world data. For these data sets, we suggest compelling reasons for the change identified using SNCD; however, we acknowledge a different “story” might be constructed if different change points were identified. Thus, we also use simulated data generated by a multi-agent simulation so that we can decisively know the point of “real” change. Applying the CUSUM control chart to this data enables us to determine whether or not the proposed method can indeed identify the point of change. The performance comparison of the CUSUM to the EWMA, the Scan Statistic, and across various network level measures is explored using multi-agent simulation. The four data sets are explained in more detail.

Newcomb Fraternity Network

The first data set was collected by Theodore Newcomb (1961) at the University of Michigan. The participants included 17 incoming transfer students, with no prior acquaintance, who were housed together in fraternity housing. The participants were asked to rank their preference of individuals in the house from 1 to 16, where 1 is their first choice.
Data was collected each week for 15 weeks, except for week number 9. David Krackhardt (1998) dichotomized the network data by assigning a link to preference ratings of 1-8 and having no link for ratings of 9 to 16. A visualization of the Newcomb Fraternity network for time period 8 is shown in Figure 2. The mean and standard deviation of the average betweenness and average closeness were estimated from the first five networks to determine typical behavior. The CUSUM statistic was then calculated for all time periods. Note that the dichotomization scheme proposed by Krackhardt results in a constant density across all time periods, thus no change can occur in this measure.

Figure 2. Dichotomized Newcomb Fraternity Network for Time Period 8.

Leavenworth Data

The second data set was collected from an Army war fighting simulation at Fort Leavenworth, Kansas, in April 2007, by Craig Schreiber. The participants were mid-career U.S. Army officers taking part in a brigade level staff training exercise. There were 68 participants in this data set, who served as staff members in the headquarters of the brigade conducting a simulated training exercise. Relational data was collected through self reported communications surveys over a period of four days, twice per day. Thus, there were 8 time periods. A directed relationship is recorded if an officer reports interacting with another one of the 68 officers during the preceding time period. Halfway through the second day (after time period 3), the brigade commander was displeased at the lack of coordination between the officers in the exercise. He brought all 68 participants together and chastised them for their performance and told them that they were expected to perform better.
Therefore, SNCD might be able to indicate a significant change in the network corresponding to the brigade commander’s interaction with the participants. This data set is unique in that it contains a known change point in time that can be used to validate the proposed method. Figure 3 shows the social network for time period 4 from the Leavenworth data set. The mean and standard deviation of the density, average betweenness, and average closeness were estimated from the first three networks to determine typical behavior. The CUSUM statistic was then calculated for all time periods. Three time periods were used because that represents about 30 percent of the time periods and is comparable to the number used with the Newcomb Fraternity data. Ideally, more networks will allow a more accurate estimate of typical behavior. The reader is reminded that these examples are used to illustrate the proposed methodology, while the performance of the method is evaluated using a simulated data set.

Figure 3. Leavenworth Network for Time Period 4

Al Qaeda Communications Network

The Center for Computational Analysis of Social and Organizational Systems (CASOS) at Carnegie Mellon University created snapshots of the annual communication between members of the al Qaeda organization from its founding in 1988 until 2004 from open source data (Carley, 2006). The data is limited in that we do not know the type, frequency, or substance of the communication and all links are non-directional, meaning we do not know who initiated communication with whom. Finally, the completeness of the data is uncertain since it only contains information available from open sources.
The data is unique in that it provides a network picture of a robust network over standard time-periods of one year. This data also provides a challenge for the proposed method due to the poor data quality. Bernard & Killworth (1979) state that “attempts at detecting change are useless unless data quality are high.” The fact that the proposed method succeeds at detecting change under these conditions speaks to its usefulness in practical applications.

Using the network snapshots for each year time-period, the average social network measures were calculated and plotted for betweenness, closeness, and density. Each of these measures increased from 1988 until 1994, and then leveled off. There are many possible reasons for this burn-in period, such as the quality of our intelligence gathering on al Qaeda and the rapid development and reorganization of a fast growing organization. In al Qaeda’s early years, access to the infant organization may have been limited, as well as the resources devoted to tracking a small, new, and relatively unaccomplished terrorist network. The organization itself may have also been changing drastically during its first years by actively recruiting new members, and shifting its structure to accommodate new resources and infrastructure. A required condition for SNCD to be applied is a period of network stability. For this reason, the averages for each measure and standard deviation were calculated over the five years that follow the burn-in period that ended in 1994. The CUSUM control chart was then used to monitor the network from 1994 to 2004. Figure 4 is a snapshot of the al Qaeda social network.

Figure 4.
Monitored al Qaeda Communication Network for Year 2001

Simulated Data

Simulated data is used in order to inject an organizational change at a defined point in time. SNCD approaches can then be evaluated on their ability to identify that change. In real-world data, there are often many changes facing an organization and identifying one specific cause of change can be subjective or questionable. With simulated data, SNCD can be explored in a more controlled series of virtual experiments.

For this initial investigation, we use a multi-agent simulation of a 100 node network, using the Construct2 simulation model (Carley, 1990; Schreiber & Carley, 2004; Carley, Martin & Hirshman, 2009) set in the context of a U.S. infantry military organization (Headquarters, Department of the Army, 1992). Construct is a dynamic-network multi-agent simulation grounded in constructuralist theory (Carley, 1991; McCulloh et al., 2008). Agents are heterogeneous in their socio-demographic characteristics, information that they “know,” and their beliefs. Each time step agents may choose to interact with one or more others, communicate, and learn. The propensity of agents to interact is a function of knowledge, belief and task homophily; proximity of the agents; socio-demographic similarity; intent to learn new information; and intent to coordinate. Agent interaction leads to shared knowledge and thus greater knowledge-based homophily; however, heterophilous agents are less likely to interact. Construct has been validated in a number of settings and has been widely used to look at the co-evolution of social structure and culture, the diffusion of information and beliefs, and the impact of marketing campaigns and media on social behavior. Initial Construct populations, social and knowledge networks, can be hypothetical or real (Carley, Martin & Hirshman, 2009).
Three key features that make Construct ideally suited to our needs are: 1) the social network evolves over time; 2) the user can specify “interventions” at specific times, thus guaranteeing a known state change in the system; and 3) the model can be instantiated with data on an actual group and so enables “what-if” reasoning about actual groups. The basic military structure that was simulated was an infantry training model. This is the most basic U.S. military unit and is used for training soldiers and officers across the U.S. Army Training and Doctrine Command (Headquarters, Department of the Army, 1992). Within this model, soldiers are organized into four-man teams. Two teams and a squad leader form a 9-man squad. Three squads and a three-person headquarters form a 30-man platoon. Three platoons and a 10-person command post form a company. Each soldier is trained in various skills that are distributed throughout the organization. Each team, for example, will have an automatic gunner, a grenadier and two riflemen. One member on a team will also be trained as a medic, another in demolitions, and two will be able to search enemy prisoners of war. Each soldier possesses individual skill in stealth, situational awareness, physical fitness, intelligence, military rank, and motivation. In the military context of this multi-agent simulation, the proximity was determined by the organizational proximity. Members of the same squad are closer to each other than other members in the platoon, who are closer than other members of the company. The socio-demographics of the agents do not change throughout the simulation and are coded as the agent’s military occupational specialty and military rank. The knowledge homophily was randomly seeded for each agent across 500 bits of knowledge data resulting in 3.27 * 1023 different agent knowledge combinations. This factor was allowed to change as agents share information when they interact, thus becoming more similar. 
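The unit sizes implied by the organizational description above can be checked with a short sketch. The function names `unit_sizes` and `headquarters_share` are hypothetical helpers introduced only for illustration; they are not part of Construct, and the only inputs are the team, squad, platoon, and company compositions stated in the text.

```python
def unit_sizes(team_size=4):
    """Node counts implied by the infantry structure described in the text."""
    squad = 2 * team_size + 1     # two four-man teams plus a squad leader
    platoon = 3 * squad + 3       # three squads plus a three-person headquarters
    company = 3 * platoon + 10    # three platoons plus a 10-person command post
    return {"squad": squad, "platoon": platoon, "company": company}


def headquarters_share(team_size=4):
    """Fraction of each unit that belongs to its headquarters element."""
    sizes = unit_sizes(team_size)
    hq = {"squad": 1, "platoon": 3, "company": 10}  # leader / HQ / command post
    return {unit: hq[unit] / sizes[unit] for unit in sizes}


if __name__ == "__main__":
    print(unit_sizes())
    print(headquarters_share())
```

Under these assumptions the three simulated networks have 9, 30, and 100 nodes, and the headquarters element makes up roughly a tenth of each unit.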
The simulation was verified by adjusting the relative weights applied to homophily, proximity, and sociodemographics. The model was validated, in 2008, by four military subject matter experts who confirmed that the simulated networks represent their experience of soldier relationships in military units.

The simulation was run with all agents present for the first 30 time periods. At time period 30, some type of change was imposed on the network, isolating some of the agents, thereby simulating radio failure or enemy attack. Figures 5 and 6 show example snapshots of the simulated network before and after the change.

Figure 5. Simulation before Change

Figure 6. Simulation after Change

The simulation was replicated 1,000 times to obtain estimates of the average time to detect change as well as the variance.

Method

Social network change detection algorithms are implemented in much the same way a control chart is implemented in a manufacturing process. Three different graph measures are used for change detection for the sake of illustrating the proposed method. SNCD can be applied to any node or graph measure over time. The graph measures for density, average closeness, and average betweenness centrality are calculated for several consecutive time-periods of the social network. The mean and variance for the measures of the network are calculated by taking a sample average and sample variance from networks that are assumed to be “typical.” At least two networks are required to estimate these values; however, more networks will allow a more accurate estimate of the mean and variance of the “typical” network measure. The subsequent, successive social network measures are then used to calculate the CUSUM’s C+ and C- statistics as well as the appropriate statistics for the EWMA and Scan Statistic. These are then
Th hese are then compared d to a decision n interval to determine d wheen or if the co ontrol chart siignals a chang ge in the meaan of the monittored network k measure. Up pon receiving a signal, the change pointt is calculated d by tracing th he signaling C+ or C- statisstic in the CU USUM procedu ure back to th he last time peeriod it was zeero. In order to continue running r the control c chart after a a signal, the mean an nd variance arre recalculated d after the nettwork measures have stabilizeed following the t change. Recall tha at SNCD only indicates tha at a change ma ay have occurrred. The deteermination th hat the networrk has in factt changed and d the subsequ uent determin nation that thee network hass stabilized fo ollowing the change sh hould be based d on an investigation of oth her aspects off the network k and the dataa surrounding g the change po oint. Otherwisse, the risk off misspecifyin ng the change point can biaas current and d future findin ngs of change. This CUSU UM methodo ology is demon nstrated on th hree real-worlld data sets aand explored iin more detaill through simulation. Th he real-world data sets are used to illusttrate practicall application o of the approaach. The decisiion threshold d for the threee real-world data d sets was eestablished att 3.0. If the neetwork measu ure Page 15 off 37 were norm mally distribu uted, this wou uld correspond ded to an estiimated risk off false alarm ((type I error) of 0.01 (Galb breath, 2008)). As noted ea arlier, as the distribution d off the network k measure is in ncreasingly riight skewed, bias b is introdu uced that can increase i the likelihood l of ffalse alarm. H However, the n network meassures observed during the sta abilized in-co ontrol period of o the three d data sets do no ot violate normality assumptio ons, as shown n in the norma al probabilityy plots in Figu ure 7. Figu re 7. 
Normal Probability Plots of the In-Control Measures of Real-World Data

Virtual Experiment

A virtual experiment is conducted using the Construct Infantry Model to provide a realistic data set for evaluating SNCD methods. Three different size infantry units (squad, platoon, and company) are simulated for 500 time periods. In these units, four changes are introduced. This creates 9 independent data sets that can be used to evaluate SNCD performance. Three of the changes are not feasible for the squad size element. The four network changes correspond to common military communication problems that might affect an infantry unit.

The first type of network change is the isolation of the Headquarters section. For a squad, this is simply the squad leader. For a platoon, this consists of the platoon leader, platoon sergeant, and the radio telephone operator (RTO). For a company, this includes the 10-person command post, also known as the headquarters element. A military headquarters is most often isolated from the rest of the unit as a result of radio failure or a deliberate attack from enemy forces. This is perhaps one of the most significant changes that commonly happen in a military situation, as it requires a rapid and efficient transfer of command and control, as the formal hierarchy is significantly adjusted. In the simulation, this is modeled by isolating the headquarters section beginning at time period 20. These individuals remain isolated for the remainder of the simulation. Network measures are calculated on the organization for all time periods.

Another significant change in a military organization is the loss of a subordinate element. A subordinate element might be lost as a result of a task organization change, radio failure, or enemy attack.
This change is not modeled for the infantry squad, since this would mean losing half of the organization. For the platoon, this change is modeled by isolating a squad at time period 20 for the remainder of the simulation. For the company, this is also modeled by isolating a squad at time period 20 for the remainder of the simulation. While it is conceivable to isolate any number of individuals in the simulation, these changes are used to demonstrate the performance of the SNCD methods. Perhaps SNCD methods that have similar performance could be evaluated under greater conditions of change in a future paper. For now, it is beyond the scope of this paper to exhaustively address all conceivable types of network change.

A similar change is the addition of a new subordinate element. This is usually a result of a task organization change. This is modeled by adding a squad in both the company and platoon level models. It is not modeled for a squad, because squad organizations are not usually capable of managing an additional subordinate element. Again, this simple change is used to evaluate SNCD and is not meant to be an exhaustive comparison of different types of organizational change.

The final type of change simulated is sporadic communication. Sporadic communication can be either deliberate or unplanned. An example of deliberate sporadic communication is a reconnaissance operation, where radio power must be conserved and noise discipline is important. An example of unplanned sporadic communication is radio failure. This is modeled in the simulation by introducing a squad from time period 30 to time period 40. Network measures will be recorded throughout the simulation. This change is only modeled for the platoon and company level simulations. Table 1 illustrates the combinations of the virtual experiment.
The outputs of the simulation are the graph level measures recorded for each simulated time step. Different SNCD methods are then used to identify possible changes in the network over time.

Table 1. Virtual Experiment

Variable                                   Number / Nature of Values   Values
Network Size                               3                           9, 30, 100
Type of Change in Network
  Isolation of leadership                  2                           Isolated headquarters after 30 time periods
  Sporadic communication (reconnaissance)  2                           Initially absent, present for 10 time periods, then absent for remainder of simulation (omitted for squad)
  Loss of subordinate unit                 2                           Removal of the immediate subordinate unit after 30 time periods (omitted for squad)
  Gain an attached unit                    2                           Addition of a squad after 30 time periods (omitted for squad)
Cells                                      18                          3 Network sizes x 4 Changes x 2 Levels – Squad omissions
Replications                               25
Independent Runs                           450

The social network measures listed in Table 2 are measured for every simulated network.

Table 2. Social Network Measures

Average Betweenness                  Standard Deviation of Closeness
Maximum Betweenness                  Average Eigenvector Centrality
Standard Deviation of Betweenness    Maximum Eigenvector Centrality
Average Closeness                    Minimum Eigenvector Centrality
Maximum Closeness                    Standard Deviation of Eigenvector

Results

The approach proposed in this paper was found to be successful at detecting significant events in all data sets. Figure 8 displays a plot of the C statistics for Average Betweenness over time for the Newcomb Fraternity data. Recall that the CUSUM will detect either increases or decreases in a measure, but not both. Therefore, two control charts must be run for each social network measure monitored. In the figure, the two lines correspond to the chart for detecting increases in the measure and the chart for detecting decreases in the measure over time.
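The pair of one-sided charts described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard standardized (tabular) CUSUM recursions with reference value k and decision interval h; the function name `cusum_change_detection` and the toy series are hypothetical, and the defaults k = 0.5 and h = 3.0 simply mirror the values quoted in this paper rather than any tuned setting.

```python
from statistics import mean, stdev


def cusum_change_detection(measure, n_baseline, k=0.5, h=3.0):
    """Run two one-sided CUSUM charts on a single network measure.

    measure    : value of the network measure per time period (period 1 first)
    n_baseline : number of initial "typical" networks used for mean and std. dev.
    k, h       : reference value and decision interval (illustrative defaults)
    Returns (signal_period, direction, change_point) or None if no signal.
    """
    baseline = measure[:n_baseline]
    # At least two networks are needed; a constant baseline would give sigma = 0.
    mu, sigma = mean(baseline), stdev(baseline)

    c_plus = c_minus = 0.0
    zero_plus = zero_minus = 0  # last period at which each statistic was zero
    for t, x in enumerate(measure, start=1):
        z = (x - mu) / sigma
        c_plus = max(0.0, z - k + c_plus)     # chart for increases
        c_minus = max(0.0, -z - k + c_minus)  # chart for decreases
        if c_plus == 0.0:
            zero_plus = t
        if c_minus == 0.0:
            zero_minus = t
        if c_plus > h:
            return t, "increase", zero_plus
        if c_minus > h:
            return t, "decrease", zero_minus
    return None


if __name__ == "__main__":
    # Toy series: five "typical" periods, then an upward shift.
    series = [10, 11, 9, 10, 10, 12, 12, 12]
    print(cusum_change_detection(series, n_baseline=5))
```

On the toy series the increase chart signals two periods after the shift begins, and tracing the statistic back to its last zero gives the period immediately before the shift as the estimated change point. After a signal, an analyst would re-estimate the mean and standard deviation from the stabilized network before restarting the chart.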
The trends in the data for the betweenness measure are similar to the closeness measure. The density measure is not effective for change detection since the network is fixed-choice and the density remains 0.5 for every network.

Figure 8. Plot of the CUSUM Statistic Over Time for the Newcomb Fraternity Data

According to Figure 8, the control chart for average betweenness signals at time period 10 that a change may have occurred in the social network of the fraternity members. The most likely time that the change actually occurred is the last time period that the C statistic was equal to 0. This change point corresponds to time period 8 in the Newcomb Fraternity data, which was the week before a mid-semester break. It is not unreasonable that social relationships may have changed over a break, as participants possibly vacationed together. Unfortunately, the exact activities and dynamics of the group are not completely known. However, this data does provide evidence of the importance of the proposed method in analyzing network dynamics.

The Leavenworth data perhaps provides more compelling support for SNCD. Figure 9 illustrates the C statistics for average betweenness over time. The chart in Figure 9 signals at time period 5 that a change in the network may have occurred. The likely time the change actually took place is time period 3, which coincides with the brigade commander chastising the members of the group.

Figure 9. Plot of the CUSUM Statistic Over Time for the Leavenworth Data

The al Qaeda data set offered data with more nodes that were aggregated over a much larger time period. At the same time, we were able to identify at least one major event in al Qaeda’s history.
The question was asked, “Can we identify September 11 from the social network?” Perhaps more importantly, “Can we identify the point in time when the organization changed and began to plan the attacks?” Figure 10 shows the CUSUM statistic for the average betweenness of the al Qaeda network.

Figure 10. Plot of Betweenness CUSUM Statistic of al Qaeda

It can be seen in Figure 10 that the CUSUM statistic exceeds the decision interval and signals that there might be a significant change in the al Qaeda network, detected in the year 2000. Therefore, an analyst monitoring al Qaeda would be alerted to a critical, yet subtle change in the network prior to the September 11 terrorist attacks.

The CUSUM’s built in feature for determining the most likely time that the change occurred estimates the change point as 1997. For the density and closeness measures, this point in time is also 1997. To understand the cause of the change in the al Qaeda network, an analyst should look at the events occurring in al Qaeda’s internal organization and external operating environment in 1997.

Several very interesting events related to al Qaeda and Islamic extremism occurred in 1997. Six Islamic militants massacred 58 foreign tourists and at least four Egyptians in Luxor, Egypt (Jehl, 1997). United States and coalition forces deployed to Egypt in 1997 for a bi-annual training exercise were repeatedly attacked by Islamic militants. The coalition suffered numerous casualties and shortened their deployment. In early 1998, Zawahiri and Bin Laden were publicly reunited, although based on press release timing, they must have been working throughout 1997 planning future terrorist operations.
In February 1998, an Arab newspaper introduced the “International Islamic Front for Combating Crusaders and Jews.” This organization, established in 1997, was founded by Bin Laden, Zawahiri, leaders of the Egyptian Islamic Group, the Jamiat-ul-Ulema-e-Pakistan, and the Jihad Movement in Bangladesh, among others. The Front condemned the sins of American foreign policy and called on every Muslim to comply with God’s order to kill the Americans and plunder their money (Marquand, 2001). Six months later the US embassies in Tanzania and Kenya were bombed by al Qaeda. Thus, 1997 was possibly the most critical year in uniting Islamic militants and organizing al Qaeda for offensive terrorist attacks against the United States. It is interesting that the proposed SNCD method identifies and accurately determines when change occurred.

Virtual Experiment Results

Using the social simulation program, Construct (Carley, 1990; Carley, 1995; Schreiber & Carley, 2004), the performance of SNCD was explored through simulation. A variety of changes are introduced to the network at a known point. The Cumulative Sum (CUSUM), Exponentially Weighted Moving Average (EWMA), and Scan Statistic statistical process control charts are applied to several social network graph level measures taken on the network at each time step. The number of time steps between the actual change and the time that an SNCD method “signals” a change will be recorded as the Detection Length. The Average Detection Length (ADL) over multiple independently seeded runs is then a measure of the SNCD method’s performance. The ADL will be compared for different changes and different SNCD parameters.

Isolation of Headquarters

Investigating the isolation of the headquarters element in three different organizations will provide insight into how the network size affects the performance of change detection measures.
In each organization (30-man platoon, 100-man company, and 9-man squad), about 10 percent of the network was removed. In a sense, the magnitude of change is the same; however, the network size is different.

The isolation of the platoon headquarters is modeled by removing the three headquarters members at time period 30 for the duration of the simulation. Social network measures are recorded for all time periods. Table 3 displays the ADL performance of the SNCD methods. It can be seen that the average of the betweenness is a better measure to use for SNCD than either the maximum or the standard deviation of betweenness. This is generally true for all magnitudes of change and sizes of organization investigated.

For the closeness measure, both the maximum closeness and average closeness generally outperform the standard deviation of closeness. However, for an EWMA with r = 0.3, the maximum closeness measure has relatively poor performance. This might suggest that the average closeness measure is a more robust measure of change detection. In a single variant, non-network application of the EWMA, the parameter, r, makes the control chart more or less sensitive to a particular magnitude of change (Lucas & Saccucci, 1990; McCulloh, 2004). It is reasonable to consider that for the isolation of a platoon headquarters, the maximum closeness EWMA with r ≤ 0.2 is sensitive to detecting the change, yet the maximum closeness EWMA with r ≥ 0.3 is less sensitive. This will be explored with other magnitudes and types of changes throughout the paper.

For eigenvector centrality, the maximum eigenvector centrality and the standard deviation of eigenvector centrality appear to be more sensitive measures of change detection than the average or minimum of the eigenvector centrality. It also appears that the eigenvector centrality measures dominate all other measures for performance in this case.

Table 3.
ADL Performance of SNCD on Isolation of Platoon Headquarters

Measure                  CUSUM k = 0.5  EWMA r = 0.1  EWMA r = 0.2  EWMA r = 0.3  Scan Statistic
Average Betweenness      9.32           8.24          10.16         11.52         6.76
Maximum Betweenness      14.36          14.72         15.72         17.08         13.24
Std. Dev. Betweenness    16.44          16.24         16.92         18.52         15.24
Average Closeness        10.68          9.08          13.60         17.52         10.48
Maximum Closeness        8.76           6.00          10.60         37.96         8.64
Std. Dev. Closeness      34.48          34.72         34.52         35.68         27.08
Average Eigenvector      31.28          31.28         31.28         31.28         24.00
Minimum Eigenvector      14.36          14.36         14.28         15.56         14.88
Maximum Eigenvector      5.24           5.40          5.80          7.52          4.00
Std. Dev. Eigenvector    5.92           4.88          6.40          6.96          3.64

Statistical process control is a powerful statistical method for detecting change. Figure 11 shows four measures plotted for the same simulated longitudinal networks. The top two plots are the network measure of betweenness over time. The bottom two plots are the CUSUM statistic C calculated on the same betweenness measure over time. The two plots on the left show the measures plotted when there is no change present in the network over time. These plots show stochastic fluctuations induced by the simulation. The two plots on the right show the measures plotted when a change is imposed at time period 20. The change is identified much more clearly using the CUSUM, especially when the reader directs their attention to the scale of the y-axis in the four plots.

[Figure 11 panels: Baseline Avg. Betweenness (left) and Isolation of HQ Avg. Betweenness (right); y-axes: Betweenness Score (top) and CUSUM Statistic Value (bottom); x-axis: Simulation Time Period]

Figure 11. Plots of the Average Betweenness Centrality (top) Compared to Plots of the CUSUM Statistic, C (bottom) for Situations with No Change (left) and with Change (right)

Visual identification of other types of change imposed on the network, and with other SNCD schemes, yields similar success. The CUSUM is simply used to illustrate the power of the general change detection approach. Other magnitudes and types of change will be compared by simply reporting the ADL from when a change occurs until the SNCD scheme signals.

The isolation of the company headquarters was modeled by removing the 10 soldier headquarters section at time 30 for the remainder of the simulation. This is very similar to the platoon example, in that 10 percent of the organization is removed. Social network measures are again recorded for all time periods. Table 4 displays the ADL performance of each of the SNCD methods applied to the 100 node network. Again, it can be seen that the average of the betweenness is a more effective measure of change detection than the maximum or the standard deviation of betweenness. The closeness measures behave as they did in the case of platoon headquarters isolation. In this case, the maximum eigenvector centrality does not appear to be as effective a measure for detecting change as other measures. However, the standard deviation of eigenvector centrality still dominates all other measures for change detection performance.

Table 4.
ADL Performance of SNCD on Isolation of Company Headquarters

Measure                  CUSUM k = 0.5  EWMA r = 0.1  EWMA r = 0.2  EWMA r = 0.3  Scan Statistic
Average Betweenness      11.16          11.08         10.20         13.48         6.96
Maximum Betweenness      17.32          17.76         18.20         20.12         13.72
Std. Dev. Betweenness    18.08          19.40         20.88         22.52         17.36
Average Closeness        11.16          9.44          12.52         15.64         9.40
Maximum Closeness        10.44          9.72          12.64         51.76         9.60
Std. Dev. Closeness      41.88          39.48         42.20         43.44         40.76
Average Eigenvector      35.84          36.72         34.84         34.84         29.24
Minimum Eigenvector      16.00          17.96         17.88         16.76         13.60
Maximum Eigenvector      26.40          30.76         29.64         29.24         25.44
Std. Dev. Eigenvector    10.40          10.72         9.36          9.48          6.44

The isolation of squad leadership was modeled by removing the squad leader at time 20 for the remainder of the simulation. This is also similar in that 11 percent of the organization is isolated. Table 5 shows the SNCD performance at the squad level, 9 node network. It is not clear that certain measures perform better than others for change detection in the 9 node network. It appears that the measures of average betweenness, average closeness, and the standard deviation of eigenvector centrality become better measures of network change as the size of the network increases. However, they do not necessarily perform worse on a small network. While an extensive study of the sensitivity of each measure to the network size is beyond the scope of this paper, it holds the promise of fruitful future research.

Table 5. ADL Performance of SNCD on Isolation of Squad Leader

Measure                  CUSUM k = 0.5  EWMA r = 0.1  EWMA r = 0.2  EWMA r = 0.3  Scan Statistic
Average Betweenness      16.12          15.76         16.32         17.92         12.32
Maximum Betweenness      16.64          17.40         19.52         18.56         11.56
Std. Dev. Betweenness    17.68          17.76         18.20         18.72         12.08
Average Closeness        15.16          15.84         16.48         15.60         11.72
Maximum Closeness        18.72          19.60         18.68         23.80         14.32
Std. Dev. Closeness      16.20          16.08         15.52         16.24         12.88
Average Eigenvector      24.12          24.12         24.12         24.12         15.12
Minimum Eigenvector      17.84          18.48         17.04         18.08         12.36
Maximum Eigenvector      19.36          21.56         20.56         20.56         13.84
Std. Dev. Eigenvector    17.08          18.72         18.36         17.44         12.36

Loss of Subordinate Element

The loss of a subordinate element provides insight into how the magnitude of change affects change detection performance. For the 30 man platoon and the 100 man company, a nine man squad is isolated. This represents 30 percent of the platoon and 9 percent of the company. This change is obviously not feasible for the nine man squad, since it would involve removal of the entire organization.

The infantry platoon had one squad removed from the simulation at time period 20, for the remainder of the simulation. Social network measures were recorded for each time period. The ADL for each measure is reported in Table 6. Again, it can be seen that the average of the betweenness outperforms other betweenness measures. The closeness measures perform as in previously investigated cases. The minimum eigenvector centrality outperforms the maximum eigenvector centrality for most of the SNCD schemes for this particular type and magnitude of change. The standard deviation of eigenvector centrality still outperforms other eigenvector centrality measures; however, it no longer dominates all other measures.

Table 6. ADL Performance for Loss of Subordinate Element in a Platoon

Measure                  CUSUM k = 0.5  EWMA r = 0.1  EWMA r = 0.2  EWMA r = 0.3  Scan Statistic
Average Betweenness      6.96           6.00          8.68          12.16         8.12
Maximum Betweenness      9.52           7.44          11.12         13.24         7.80
Std. Dev. Betweenness    9.16           7.40          9.48          12.72         6.84
Average Closeness        9.64           8.36          12.72         19.28         11.40
Maximum Closeness        9.32           9.16          12.36         31.56         9.52
Std. Dev. Closeness      18.96          16.44         19.40         26.24         17.04
Average Eigenvector      29.36          29.36         29.36         29.36         20.60
Minimum Eigenvector      10.08          9.64          12.24         12.60         10.28
Maximum Eigenvector      11.72          12.04         11.88         20.60         10.84
Std. Dev. Eigenvector    8.48           6.28          9.80          10.44         6.88

The infantry company also had one squad removed at time 20 for the remainder of the simulation. The results for the company network are shown in Table 7. It generally takes longer to detect the changes in the company network. This was also observed in the isolation of the headquarters. This implies that the size of the network could impact the speed of change detection. The average betweenness, average closeness, and standard deviation of eigenvector centrality appear to outperform other measures for change detection performance. The maximum closeness measure dominates other measures in all cases except for the EWMA with r = 0.3.

Table 7. ADL Performance for Loss of Subordinate Element in a Company

Measure                  CUSUM k = 0.5  EWMA r = 0.1  EWMA r = 0.2  EWMA r = 0.3  Scan Statistic
Average Betweenness      13.64          11.72         13.80         20.60         12.68
Maximum Betweenness      23.80          19.64         23.80         30.72         25.44
Std. Dev. Betweenness    24.84          18.12         24.96         25.52         22.04
Average Closeness        9.72           7.40          13.44         14.96         9.80
Maximum Closeness        6.92           4.92          7.48          53.16         6.32
Std. Dev. Closeness      45.44          47.92         47.96         50.88         43.68
Average Eigenvector      34.72          36.60         34.72         34.72         30.64
Minimum Eigenvector      18.68          19.96         19.64         23.88         18.32
Maximum Eigenvector      18.28          25.80         25.00         27.20         25.88
Std. Dev. Eigenvector    9.52           9.92          11.88         15.32         8.72

Addition of New Subordinate Element

Another type of change is the addition of a new subordinate element. A squad is added to both the 30-man platoon and the 100-man company.

The infantry platoon had one squad that was not present initially and was added at time period 20. Social network measures were calculated for each time period. SNCD methods were applied to the data. Results are shown in Table 8. Although the speed of change detection is much faster for this type of change, the same performance trends are seen as before. For betweenness measures, the average outperforms the maximum or the standard deviation.
The average closeness and maximum closeness measure perform well, however, the maximum closeness does not perform well with an EWMA r = 0.3 scheme. The standard deviation of eigenvector centrality almost completely dominates other measures. Table 8. ADL Perform ance for Addit ion of Subordinat e Elem ent in a Plat oon CUSUM k = 0.5 EWMA r = 0.1 EWMA r = 0.2 EWMA r = 0.3 Scan Statistic Average Betweenness 1.60 1.52 1.68 1.72 1.00 Maximum Betweenness 2.32 2.16 2.20 2.00 1.00 Std. Dev. Betweenness 2.36 2.36 2.40 2.24 1.00 Average Closeness 1.48 1.52 1.56 1.52 1.00 Maximum Closeness 1.24 1.28 1.20 5.00 1.00 Std. Deviation Closeness 3.44 4.60 4.20 3.48 2.64 Average Eigenvector 31.76 31.76 31.76 31.76 25.56 Minimum Eigenvector 6.24 5.6 6.16 6.80 4.20 Maximum Eigenvector 4.52 4.88 4.80 4.80 3.56 Std. Dev. Eigenvector 1.16 1.60 1.24 1.24 1.00 Page 28 of 37 The company model had a squad added at time period 20 for the remainder of the simulation. Again the platoon level performance is better than the company level performance, shown in Table 9. The average betweenness, average closeness, and maximum closeness all perform well at detecting the change. Surprisingly, the standard deviation of eigenvector centrality is not an effective measure for this type and magnitude of change. Table 9. ADL Perform ance for Addit ion of Subordinat e Elem ent in a Com pany CUSUM k = 0.5 EWMA r = 0.1 EWMA r = 0.2 EWMA r = 0.3 Scan Statistic Average Betweenness 9.64 9.52 9.84 10.28 5.04 Maximum Betweenness 14.52 16.96 15.80 17.44 12.16 Std. Dev. Betweenness 12.88 13.16 13.32 14.56 8.92 Average Closeness 5.32 5.8 5.36 5.24 1.44 Maximum Closeness 4.24 5.12 4.48 6.04 1.04 Std. Deviation Closeness 10.40 18.52 12.96 12.32 10.00 Average Eigenvector 35.56 37.04 38.64 37.60 30.24 Minimum Eigenvector 38.16 39.32 38.04 40.84 36.40 Maximum Eigenvector 30.20 33.48 34.44 29.52 30.92 Std. Dev. 
Eigenvector 33.88 33.72 37.80 44.48 33.96 Page 29 of 37 Spor a dic Com m u n ica t ion Sporadic communication was modeled with a squad communicating from time period 30 to time period 40 only. It can be seen in Table 10 that the performance of different measures is much more similar than in previous types of change. It is also interesting that all of the ADL values are greater than 10, which means that the change was detected after the organization returned to its original state. This might be a result of the SNCD statistic being moved closer to the decision interval from time period 30 to time period 40. When the organization returned to its original state, the statistic is much closer to the decision interval than it was before the change occurred. Therefore, the statistic is much more likely to signal a false positive after the sporadic change than it is to detect an actual change. This increased sensitivity can therefore provide an alert that a sporadic change may have occurred. Table 10. ADL Perform ance for Sporadic Com m unicat ion CUSUM k = 0.5 EWMA r = 0.1 EWMA r = 0.2 EWMA r = 0.3 Scan Statistic Average Betweenness 15.08 14.20 16.12 17.56 17.76 Maximum Betweenness 15.24 16.52 16.88 18.24 17.84 Std Dev. Betweenness 14.28 14.80 16.04 17.40 17.48 Average Closeness 13.72 13.68 16.84 16.80 17.52 Maximum Closeness 12.44 12.16 15.32 18.32 17.20 Std Deviation Closeness 23.16 19.96 21.76 21.36 17.24 Average Eigenvector 24.32 24.32 24.32 24.32 18.84 Minimum Eigenvector 12.76 14.32 11.92 12.80 14.56 Maximum Eigenvector 12.96 12.68 14.36 14.36 18.84 Std. Dev Eigenvector 12.88 14.20 16.80 16.48 21.28 All methods of SNCD were ineffective for detecting sporadic changes in the company network. The sporadic change did not persist long enough to signal a possible change in most of the runs. The squad level network was not investigated for this type of change, due to a lack of context. 
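The explanation offered above for the sporadic-communication results can be illustrated with a small, noise-free sketch. It assumes the standard one-sided CUSUM recursion on a standardized measure, C_t = max(0, z_t - k + C_{t-1}), with k = 0.5. The input series and the size of the temporary shift (one standard deviation) are hypothetical and are not taken from the simulation data.

```python
def cusum_path(z_scores, k=0.5):
    """Upper one-sided CUSUM: C_t = max(0, z_t - k + C_{t-1})."""
    c, path = 0.0, []
    for z in z_scores:
        c = max(0.0, z - k + c)
        path.append(c)
    return path

# Hypothetical, noise-free standardized measure: in control except for a
# temporary shift of +1 standard deviation during time periods 30 to 40.
z = [1.0 if 30 <= t <= 40 else 0.0 for t in range(1, 61)]
path = cusum_path(z, k=0.5)

c_before = path[28]  # time period 29, just before the sporadic change
c_end = path[39]     # time period 40, last period of the sporadic change
c_after = path[44]   # time period 45, five periods after the change ended
print(c_before, c_end, c_after)  # 0.0 5.5 3.0
```

In this sketch the statistic is still well above zero several periods after the network has returned to its original state, so ordinary noise has a shorter distance to travel before crossing the decision interval. That is one way a signal could arrive more than 10 periods after a change that lasted only 10 periods.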
Conclusion

Statistical process control is a critical quality-engineering tool that provides rapid detection of change in stochastic processes (Montgomery, 1991; Ryan, 2000). The three real-world examples and the virtual experiments presented in this paper demonstrate that SNCD could enable analysts and researchers to detect important changes in longitudinal network data. Furthermore, the most likely time that the change occurred can also be determined. This allows one to allocate minimal resources to tracking the general patterns of a network and then shift to full resources when changes are detected.³ SNCD is therefore an important analysis method for studying network dynamics.

It is critical to be able to detect change in networks over time and to determine when observed fluctuations are not simply stochastic noise. This paper describes a method for change detection based on statistical process control, and then demonstrates its ability to detect changes in networks. Within this method, three specific control chart schemes for detecting change were considered: CUSUM, Exponentially Weighted Moving Average (EWMA), and a Scan Statistic. No doubt other change detection methods and control chart schemes will emerge. We found the CUSUM technique to be robust and to be of value in applied settings.

The strengths of the proposed method are its statistical approach, its utility with a wide range of social network metrics, its ability to identify change points in organizational behavior, and its flexibility for various magnitudes of change. The proposed method requires the assumption of a period of stability, which is necessary to estimate the mean and standard deviation of social network measures for “typical” network observations. In addition, the proposed method requires a reasonable number of time periods in which to detect change; i.e., greater than four.
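The ingredients summarized above (a stable baseline period, a standardized network measure, and a control chart statistic) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the results in this paper: the input series is hypothetical, the decision interval h = 4.0 is an arbitrary illustrative value, and the change point is estimated with the common CUSUM convention of taking the first observation after the statistic was last equal to zero.

```python
from statistics import mean, stdev

def standardize(series, n_baseline):
    """Standardize with the mean and standard deviation of the first
    n_baseline observations, assumed to come from a stable period."""
    baseline = series[:n_baseline]
    mu, sigma = mean(baseline), stdev(baseline)
    return [(x - mu) / sigma for x in series]

def cusum_detect(z, k=0.5, h=4.0):
    """Upper one-sided CUSUM on standardized values.
    Returns (signal_index, estimated_change_index), or None if no signal.
    h is a hypothetical decision interval chosen for illustration."""
    c, last_zero = 0.0, -1
    for t, z_t in enumerate(z):
        c = max(0.0, z_t - k + c)
        if c == 0.0:
            last_zero = t          # statistic reset: no evidence of change
        elif c > h:
            return t, last_zero + 1
    return None

def ewma(z, r=0.2):
    """EWMA statistic: w_t = r * z_t + (1 - r) * w_{t-1}, with w_0 = 0."""
    w, out = 0.0, []
    for z_t in z:
        w = r * z_t + (1 - r) * w
        out.append(w)
    return out

# Hypothetical average-betweenness series: five baseline networks, two more
# in-control networks, then a sustained upward shift from index 7 onward.
series = [0.050, 0.052, 0.048, 0.051, 0.049,
          0.050, 0.050, 0.053, 0.053, 0.053, 0.053]
z = standardize(series, n_baseline=5)
result = cusum_detect(z)
print(result)  # (9, 7)
```

Here the chart signals at index 9 and points back to index 7 as the likely change point, which is where the hypothetical shift was introduced. The `ewma` function is included only to show the form of the second scheme; its control limits are omitted.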
The empirical results described in this paper, such as the detection of change in the al Qaeda network, should be viewed with caution. We present them here purely to illustrate the methodology. Limitations on the data make it difficult to determine the validity of the results; thus, we should simply view these results as showing the promise of this methodology. The Leavenworth data spans only four days and used self-reported survey data; it is therefore unlikely to have captured all communication and interaction among officers. The fact that even in this data set we were able to systematically detect a key change suggests the value of the proposed approach. The al Qaeda data was based on open source information. As such, it is an incomplete representation of interaction in that terror network. We cannot be sure that we have the entire communication network, or even a true picture of the observed communication network. However, the fact that our technique detects a change corresponding with the 9/11 attacks is intriguing. This work suggests that our approach may provide some ability to detect change even when there is incomplete information.

That being said, it is important that future work examine the errors associated with this technique, both the false positives and the false negatives. Future work should also consider the sensitivity of this approach to missing information, and to the reason why the information is missing. For example, data sets collected post hoc that focus on activity around an event, such as the al Qaeda data, are prone to missing nodes and, as a result, missing links prior to the event. In addition, open-source data tends to over-focus on nodes whose centrality is assumed, often resulting in “popular” actors being over-connected and less popular actors being under-connected. By contrast, data sets collected based on opportunity, such as the Leavenworth data, are prone to missing links among the nodes.
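One way to begin examining false positives is by simulation. The sketch below estimates how often an upper one-sided CUSUM signals within a fixed horizon when no change has occurred. It assumes independent, standard normal standardized measures, which is a simplification that real network measures may not satisfy. The function name, horizon, number of replications, and candidate decision intervals are all hypothetical choices for illustration.

```python
import random

def false_alarm_rate(h, k=0.5, horizon=50, replications=2000, seed=0):
    """Estimate the probability that an upper one-sided CUSUM signals at
    least once within `horizon` periods when there is no change.
    Assumes i.i.d. standard normal inputs (an illustrative assumption)."""
    alarms = 0
    for rep in range(replications):
        # One generator per replication, so every value of h is
        # evaluated against exactly the same simulated noise.
        rng = random.Random(seed + rep)
        c = 0.0
        for _ in range(horizon):
            c = max(0.0, rng.gauss(0.0, 1.0) - k + c)
            if c > h:
                alarms += 1
                break
    return alarms / replications

for h in (2.0, 3.0, 4.0, 5.0):
    print(h, false_alarm_rate(h))
```

Because each replication reuses the same noise for every h, the estimated rate cannot increase as h grows, which makes the trade-off between false alarms and detection speed easy to read off. A matching study of false negatives would inject a known change and record how often it goes undetected.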
To rectify the above shortcomings, future research should focus on improved methods for node and link inference, or on near-complete datasets with high resolution. Higher resolution involves taking many snapshots of the network. This may simply mean an increase in frequency, e.g., changes by month, or it may mean a longer time horizon, e.g., more years. The right choice will depend on the problem for which we want to detect network change. More data points will provide more opportunities to detect changes while they are still small, instead of allowing them to incubate and grow as was the case for the al Qaeda data. At a minimum, two observed networks are required to estimate the “typical” behavior of a social group being monitored for change. In practice, five or more networks are preferred to reduce the variance in estimating the statistical process control parameters. Larger datasets will also provide near-continuous network measures, permitting the use of control charts for continuous data.

Near-complete data means that the data should cover the communication network, with little or no missing information, for a large contiguous period. Here one might consider simply tracking a group in general, as opposed to tracking relative to a specific event. Data that is regularly output, such as that on the U.S. Congress or Supreme Court, might provide a good source.

Another limitation of this approach is that it ignores dependence among observations over time. This is common in statistical process control. English et al. (2001) point out that “the independence assumption is dramatically violated in processes subjected to process control.” Many manufacturing processes include feedback control systems, which create autocorrelation among factors affecting the process. This is similar to problems of dyadic dependence and ergodicity issues with networks.
In practice, however, statistical process control still provides a great deal of insight by identifying when a process changes. This is no different in a network application. Networks may even have fewer dependence issues than manufacturing processes. Most manufacturing processes are engineered with feedback and control in an attempt to optimize the process. This is not necessarily true of social networks. Robins and Pattison (2007) lay out several statistical tests involving dependence graphs that can be used to determine whether dependence is a statistically significant problem in a network. As with the issue of normality, dyadic dependence in the network can be checked in a manner similar to residual analysis in regression. If dependence is an issue in the network, SNCD can still be used to determine that a change occurred; however, there may be bias and an increase in the probability of a false positive. Future research should investigate both the impact of dependence on ADL performance and methods to better handle the problem statistically.

Social networks may also exhibit periodicity over time. Intuitively, people’s communication patterns may change in cycles. People tend to communicate with different people during the week, while at work, than on the weekends. People may communicate more frequently at certain times of the day. Even seasonal trends may affect observed social networks. The application of wavelet theory, and Fourier analysis in particular, may provide insight into the periodic behavior of network dynamics. Methods should be developed to test for and filter periodicity from network measures over time. This will allow SNCD to be more accurate in determining the time a change actually occurred and may reduce the ADL for certain changes.

Future research should also look at the sensitivity of the optimality constant, k, and the control limit values of the CUSUM control chart for network measure change detection.
As stated earlier, these values are generally chosen arbitrarily and then optimized for the process. Using further Monte Carlo simulations, a researcher could determine which parameter values are best for detecting certain types of changes, such as sudden large changes or slow creeping shifts. The use of control charts for comparing models and observations should also be studied to see what specific conclusions can be drawn.

Multi-agent simulations provide valuable insight into the performance of control charts for social network change detection applications. Simulations allow an investigator to introduce various changes into a simulated organization and evaluate the time to detect for different algorithms. Simulations provide an efficient means of evaluating change detection on social networks. More important, however, is the ability to create more controlled experiments by fixing certain variables, exploring others, and using many replications to estimate error. Simulation studies will continue to be extremely useful in exploring extensions of this methodology.

Social network change detection is important for identifying significant shifts in organizational behavior. This provides insight into policy decisions that drive the underlying change. It also shows the promise of enabling predictive analysis for social networks and providing early warning of potential problems. In the same way that manufacturing firms save millions of dollars each year by quickly responding to changes in their manufacturing process, social network change detection can allow senior leaders and military analysts to quickly respond to changes in the organizational behavior of the socially connected groups they observe. The combination of statistical process control and social network analysis is likely to produce significant insight into organizational behavior and social dynamics.
As a scientific community we can hope to see more research in this area as network statistics continue to improve.

References

Alderson, D. (2009). “Catching the ‘Network Science’ Bug: Insight and Opportunities for the Operations Researchers,” Operations Research 56, 5: 1047-1065.
Baller, D., J. Lospinoso & A.N. Johnson (2008). “An Empirical Method for the Evaluation of Dynamic Network Simulation Methods.” In Proceedings of The 2008 World Congress in Computer Science Computer Engineering and Applied Computing, Las Vegas, NV.
Banks, D.L., & K.M. Carley (1996). “Models for Network Evolution.” Journal of Mathematical Sociology 21: 173-196.
Bernard, H.R. & P.D. Killworth (1977). “Informant Accuracy in Social Network Data II.” Human Communications Research 4: 3-18.
Bonacich, P. (1972). “Factoring and Weighting Approaches to Clique Identification.” Journal of Mathematical Sociology 2: 113-120.
Bonacich, P., A. Oliver & T.A.B. Snijders (1998). “Controlling for Size in Centrality Scores.” Social Networks 20, 2: 135-141.
Brown, R.A. & D.D. Morrow (1994). Critical Theory and Methodology. Thousand Oaks, CA: Sage.
Carley, K.M. (1990). “Group Stability: A Socio-Cognitive Approach.” Advances in Group Processes 7: 1-44.
Carley, K.M. (1991). “A Theory of Group Stability.” American Sociology Review 56, 3: 331-354.
Carley, K.M. (1995). “Communication Technologies and Their Effect on Cultural Homogeneity, Consensus, and the Diffusion of New Ideas.” Sociological Perspectives 38, 4: 547-571.
Carley, K.M. (1999). “On the Evolution of Social and Organizational Networks.” Research in the Sociology of Organizations 16: 3-30.
Carley, K.M. (2006). “A Dynamic Network Approach to the Assessment of Terrorist Groups and the Impact of Alternative Courses of Action.” In Visualising Network Information Meeting Proceedings RTO-MP-IST-063. Neuilly-sur-Seine, France: RTO. Available: http://www.vistg.net/documents/IST063_PreProceedings.pdf [January 7, 2011].
Carley, K.M., J. Reminga, J. Storrick, & M. De Reno (2009). *ORA User’s Guide 2009. Carnegie Mellon University, School of Computer Science, Institute for Software Research, Technical Report CMU-ISR-09-115. Available: http://www.casos.cs.cmu.edu/publications/papers/CMU-ISR-09-115.pdf [January 7, 2011].
Carley, K.M., M.K. Martin & B. Hirshman (2009). “The Etiology of Social Change,” Topics in Cognitive Science 1, 4.
Coleman, T.F. & J.J. Moré (1983). “Estimation of Sparse Jacobian Matrices and Graph Coloring Problems.” SIAM Journal on Numerical Analysis 20, 1: 187-209.
Doreian, P. (1983). “On the Evolution of Group and Network Structures II: Structures within Structure.” Social Networks 8: 33-64.
Doreian, P. & F.N. Stokman (Eds.) (1997). Evolution of Social Networks. Amsterdam: Gordon and Breach.
English, J.R., T. Martin, E. Yaz & E. Elsayed (2001). “Change Point Detection and Control Using Statistical Process Control and Automatic Process Control.” Presentation at the IIE Annual Conference, 2001, Dallas, TX.
Erdős, P. & A. Rényi (1959). “On Random Graphs I.” Publicationes Mathematicae 6: 290-297.
Feld, S. (1997). “Structural Embeddedness and Stability of Interpersonal Relations.” Social Networks 19: 91-95.
Fisher, R.A., H. Thornton & W. Mackenzie (1922). “The Accuracy of the Plating Method of Estimating the Density of Bacterial Populations, with Particular Reference to the Use of Thornton’s Agar Medium with Soil Samples.” Annals of Applied Biology 9: 325-359.
Frank, O. (1991). “Statistical Analysis of Change in Networks.” Statistica Neerlandica 45: 283-293.
Freeman, L. (1977). “A Set of Measures of Centrality Based on Betweenness.” Sociometry 40: 35-41.
Freeman, L. (1979). “Centrality in Social Networks I: Conceptual Clarification.” Social Networks 1: 215-239.
Hamming, R.W. (1950). “Error Detecting and Error Correcting Codes.” Bell System Technical Journal 26, 2: 147-160.
Handcock, M.S. (2003). “Assessing Degeneracy in Statistical Models of Social Networks.” Working Paper No. 39. Center for Statistics and the Social Sciences, University of Washington. Available: http://www.csss.washington.edu/Papers/wp39.pdf [January 7, 2011].
Headquarters, Department of the Army (1992). Field Manual 7-8, Infantry Rifle Platoon and Squad. U.S. Army Infantry School, Ft. Benning, GA.
Holland, P. & S. Leinhardt (1977). “A Dynamic Model for Social Networks.” Journal of Mathematical Sociology 5: 5-20.
Huisman, M., & T.A.B. Snijders (2003). “Statistical Analysis of Longitudinal Network Data with Changing Composition.” Sociological Methods and Research 32: 253-287.
Hunter, J.S. (1986). “The Exponentially Weighted Moving Average.” Journal of Quality and Technology 18: 203-210.
Jehl, D. (1997). “Islamic Militants Attack Tourists in Egypt.” The New York Times, November 23, 1997. p. WK2.
Johnson, J.C., J.S. Boster & L.A. Palinkas (2003). “Social Roles and the Evolution of Networks in Extreme and Isolated Environments.” Journal of Mathematical Sociology 27: 89-121.
Katz, L. & C.H. Proctor (1959). “The Configuration of Interpersonal Relations in a Group as a Time-Dependent Stochastic Process.” Psychometrika 24: 317-327.
Killworth, P.D. & H.R. Bernard (1976). “Informant Accuracy in Social Network Data.” Human Organization 35: 269-286.
Krackhardt, D. (1987). “QAP Partialling as a Test of Spuriousness.” Social Networks 9: 171-186.
Krackhardt, D. (1992). “A Caveat on the Use of the Quadratic Assignment Procedure.” Journal of Quantitative Anthropology 3: 279-296.
Krackhardt, D. (1998). “Simmelian Tie: Super Strong and Sticky.” In R. Kramer & M. Neale (Eds.), Power and Influence in Organizations. Thousand Oaks, CA: Sage, 21-38.
Leenders, R. (1995). “Models for Network Dynamics: A Markovian Framework.” Journal of Mathematical Sociology 20: 1-21.
Lucas, J.M. & M.S. Saccucci (1990). “Exponentially Weighted Moving Average Control Schemes: Properties and Enhancements.” Technometrics 32: 1-12.
Marquand, R. (2001). “The Tenets of Terror.” Christian Science Monitor, October 18, 2001.
McCulloh, I., G. Garcia, K. Tardieu, J. MacGibbon, H. Dye, K. Moores, J.M. Graham & D.B. Horn (2007). IkeNet: Social Network Analysis of Email Traffic in the Eisenhower Leadership Development Program. (Technical Report, No. 1218). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
McCulloh, I., J. Lospinoso & K.M. Carley (2007). “Social Network Probability Mechanics.” In Proceedings of the World Scientific Engineering Academy and Society 12th International Conference on Applied Mathematics, Cairo, Egypt, December 29-31, 2007.
McCulloh, I., B. Ring, T. Frantz, & K.M. Carley (2008). “Unobtrusive Social Network Data from Email.” In Proceedings, 26th Army Science Conference. Orlando, FL, December 1-4, 2008.
McCulloh, I. (2004). Generalized Cumulative Sum Control Charts. Master’s Thesis, The Florida State University.
Montgomery, D.C. (1991). Introduction to Statistical Quality Control, 2nd edition. New York: John Wiley and Sons.
Moustakides, G.V. (2004). “Optimality of the CUSUM Procedure in Continuous Time.” Annals of Statistics 32, 1: 302-315.
Naus, J. (1965). “Clustering of Random Points in Two Dimensions.” Biometrika 52: 263-267.
Newcomb, T.N. (1961). The Acquaintance Process. New York: Holt, Rinehart and Winston.
Page, E.S. (1961). “Cumulative Sum Control Charts.” Technometrics 3: 1-9.
Priebe, C.E., J.M. Conroy, D.J. Marchette & P. Youngser (2005). “Scan Statistics on Enron Graphs.” Computational and Mathematical Organization Theory 11: 229-247.
Ring, B., S. Henderson & I. McCulloh (2008). “Gathering and Studying Email Traffic to Understand Social Networks.” In H.R. Arabnia & R.R. Hashemi (Eds.), Proceedings of the 2008 International Conference on Information and Knowledge Engineering, IKE 2008, July 14-17, 2008. Las Vegas, NV: CSREA Press, 338-343.
Roberts, S.V. (1959). “Control Chart Tests Based on Geometric Moving Averages.” Technometrics 1: 239-250.
Robins, G. & P. Pattison (2001). “Random Graph Models for Temporal Processes in Social Networks.” Journal of Mathematical Sociology 25: 5-41.
Robins, G. & P. Pattison (2007). “Interdependencies and Social Processes: Dependence Graphs and Generalized Dependence Structures.” In P. Carrington, J. Scott & S. Wasserman (Eds.), Models and Methods in Social Network Analysis. New York: Cambridge University Press, 192-214.
Rogers, E.M. (2003). Diffusion of Innovations, 5th edition. New York, NY: Free Press.
Romney, A.K. (1989). “Quantitative Models, Science and Cumulative Knowledge.” Journal of Quantitative Anthropology 1: 153-223.
Ryan, T.P. (2000). Statistical Methods for Quality Improvement, 2nd edition. Wiley.
Saccucci, M.S. & J.M. Lucas (1990). “Average Run Lengths for Exponentially Weighted Moving Average Control Schemes Using the Markov Chain Approach.” Journal of Quality Technology 22: 154-159.
Sampson, S.F. (1969). Crisis in a Cloister. Ph.D. Thesis, Ithaca, NY: Cornell University.
Sanil, A., D. Banks & K.M. Carley (1995). “Models for Evolving Fixed Node Networks: Model Fitting and Model Testing.” Social Networks 17, 1: 65-81.
Schreiber, C. & K.M. Carley (2004). Construct; A Multi-agent Network Model for the Co-Evolution of Agents and Socio-Cultural Environments. Carnegie Mellon University, School of Computer Science, Institute for Software Research International, Technical Report, CMU-ISRI-04-109. Available: http://reports-archive.adm.cs.cmu.edu/anon/isri2004/CMU-ISRI-04-109.pdf [January 7, 2011].
Shewhart, W.A. (1927). “Quality Control.” Bell Systems Technical Journal 6, 4 (October 1927): 722-735.
Snijders, T.A.B., & M.A.J. Van Duijn (1997). “Simulation for Statistical Inference in Dynamic Network Models.” In R. Conte, R. Hegselmann & P. Tera (Eds.), Simulating Social Phenomena. Berlin: Springer, 493-512.
Snijders, T.A.B. (1990). “Testing for Change in a Digraph at Two Time Points.” Social Networks 12: 539-573.
Snijders, T.A.B. (1996). “Stochastic Actor-Oriented Models for Network Change.” Journal of Mathematical Sociology 21: 149-172.
Snijders, T.A.B. (2001). “The Statistical Evaluation of Social Network Dynamics.” In Sobel, M.E. & M.P. Becker (Eds.), Sociological Methodology. Boston: Basil Blackwell, 361-395.
Snijders, T.A.B. (2007). “Models for Longitudinal Network Data.” In P. Carrington, J. Scott & S. Wasserman (Eds.), Models and Methods in Social Network Analysis. New York: Cambridge University Press, 148-161.
Snijders, T.A.B., C.E.G. Steglich, M. Schweinberger & M. Huisman (2007). Manual for SIENA version 3.1. University of Groningen: ICS/Department of Sociology; University of Oxford: Department of Statistics. Available: http://stat.gamma.rug.nl/sie_man31.pdf [January 7, 2011].
Van de Bunt, G.G., M.A.J. Van Duijin & T.A.B. Snijders (1999). “Friendship Networks through Time: An Actor-Oriented Statistical Network Model.” Computational and Mathematical Organization Theory 5: 167-192.
Wasserman, S. (1977). Stochastic Models for Directed Graphs. Ph.D. dissertation, Harvard University, Department of Statistics, Cambridge, MA.
Wasserman, S. (1979). “A Stochastic Model for Directed Graphs with Transition Rates Determined by Reciprocity.” In K.F. Schuessler (Ed.), Sociological Methodology. San Francisco: Jossey-Bass, 392-412.
Wasserman, S. (1980). “Analyzing Social Networks as Stochastic Processes.” Journal of American Statistical Association 75: 280-294.
Wasserman, S. (2007). “Introduction.” In P.J. Carrington, J. Scott, & S. Wasserman (Eds.), Models and Methods in Social Network Analysis. New York: Cambridge University Press.
Wasserman, S. & D. Iacobucci (1988). “Sequential Social Network Data.” Psychometrika 53, 2: 261-282.
Wasserman, S., & K. Faust (1994). Social Network Analysis: Methods and Applications. New York: Cambridge University Press.

1 *ORA can be downloaded from http://www.casos.cs.cmu.edu/projects/ora/ [January 7, 2011].
2 Construct is available at http://www.casos.cs.cmu.edu/projects/construct [January 7, 2011].
3 Three social network change detection algorithms (Shewhart X-Bar, Cumulative Sum, and Exponentially Weighted Moving Average) are available in the “Statistical Network Monitoring Report” in the software tool, Organizational Risk Analyzer (ORA), available through the Center for Computational Analysis of Social and Organizational Systems (CASOS), http://www.casos.cs.cmu.edu [January 7, 2011].