Ian McCulloh
  • Laurel, Maryland, United States
  • Ian McCulloh is the chief data scientist for Accenture Federal Services. His current work focuses on the application...
To systematically understand the effects of vulnerabilities introduced by AI/ML-enabled Army Multi-Domain Operations (MDO), we provide an overview and characterization of ML attacks, with an emphasis on black-box vs. white-box attacks. We then study a system and attack model for Army MDO applications and services, and introduce the roles of stakeholders in this system. We show, in various attack scenarios and under different levels of knowledge of the deployed system, how peer adversaries can employ deceptive techniques to defeat algorithms, and how the system should be designed to minimize such attacks. We demonstrate the feasibility of our approach in a cyber threat intelligence use case. We conclude with a path forward for design and policy recommendations for robust and secure deployment of AI/ML applications in Army MDO environments.
To facilitate the widespread acceptance of AI systems guiding decision-making in real-world applications, it is key that solutions comprise trustworthy, integrated human-AI systems. It is crucial for predictive models to be uncertainty-aware and yield trustworthy predictions, not only in safety-critical applications such as autonomous driving or medicine, but also in dynamic open-world systems in industry and government. Another key requirement for deployment of AI at enterprise scale is integrating human-centered design into AI systems, so that humans are able to use systems effectively, understand results and output, and explain findings to oversight committees. While the focus of this symposium was on AI systems that improve data quality and technical robustness and safety, we welcomed submissions from broadly defined areas that also discuss approaches addressing requirements such as explainable models, human trust, and the ethical aspects of AI.
Research in network monitoring spans a large and growing number of disciplines, including mathematics, physics, computer science, and statistics. Here, the panelists discuss the advantages and disadvantages of the interdisciplinary nature of the area. It is largely agreed that integrating expertise from many disciplines drives innovation in network monitoring development, but several notable barriers are discussed that limit the area’s full potential.
In this article, the panelists broadly discuss the definition of network monitoring, and how it may be similar to or different from network surveillance and network change-point detection. The discussion uncovers ambiguity and contradictions associated with these terms, and we argue that this lack of clarity is detrimental to the field. The panelists also describe existing and emerging applications of network monitoring, which serves to illustrate the wide applicability of the tools and research associated with the field.
One of the most asked questions about ISIS during its occupation of large swathes of Iraq is this: What was it like to live under the governance of the group? Using data collected from ordinary Iraqis, the chapter attempts to give a picture of everyday life in ISIS-occupied Iraq. Most Sunni Iraqis who experienced the arrival of ISIS, particularly in Mosul, say the group was largely accepted at first as an alternative to what was viewed as a corrupt, abusive, and sectarian Iraqi state. In retrospect, however, many of the people interviewed about ISIS’s governance thought that although ISIS was superior in some aspects of governance to the Iraqi state, the group largely wore out its welcome through its brutal imposition of an interpretation of sharia that was far more extreme than even relatively conservative Sunni Iraqis were willing to accept.
Social neuroscience research has demonstrated that those who are like-minded are also ‘like-brained.’ Studies have shown that people who share similar viewpoints have greater neural synchrony with one another, and less synchrony with people who ‘see things differently.’ Although these effects have been demonstrated at the ‘group level,’ little work has been done to predict the viewpoints of specific ‘individuals’ using neural synchrony measures. Furthermore, the studies that have made predictions using synchrony-based classification at the individual level used expensive and immobile neuroimaging equipment (e.g. functional magnetic resonance imaging) in highly controlled laboratory settings, which may not generalize to real-world contexts. Thus, this study uses a simple synchrony-based classification method, which we refer to as the ‘neural reference groups’ approach, to predict individuals’ dispositional attitudes from data collected in a mobile ‘pop-up neuroscience’ lab. Using fun...
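A minimal sketch of the ‘neural reference groups’ idea described above, assuming each participant is summarized by a single neural response time series (for example, an averaged EEG component): an individual is assigned to whichever known-attitude group’s mean time series they are more synchronized with. The two-group setup, function names, and toy data are illustrative assumptions, not the study’s pipeline.

```python
# Minimal sketch of a synchrony-based "neural reference groups" classifier,
# assuming each participant is represented by a 1-D neural response time
# series. Variable names and the setup are illustrative only.
import numpy as np

def classify_by_reference_group(target: np.ndarray,
                                group_a: np.ndarray,
                                group_b: np.ndarray) -> str:
    """Assign `target` to the reference group whose mean time series it
    tracks more closely (higher Pearson correlation = greater synchrony)."""
    ref_a = group_a.mean(axis=0)          # mean response of known group A
    ref_b = group_b.mean(axis=0)          # mean response of known group B
    r_a = np.corrcoef(target, ref_a)[0, 1]
    r_b = np.corrcoef(target, ref_b)[0, 1]
    return "A" if r_a > r_b else "B"

# Toy example: 20 participants per group, 500 time points each.
rng = np.random.default_rng(0)
signal = rng.standard_normal(500)
group_a = signal + 0.5 * rng.standard_normal((20, 500))   # synchronized with `signal`
group_b = rng.standard_normal((20, 500))                   # unsynchronized
print(classify_by_reference_group(group_a[0], group_a[1:], group_b))   # -> "A"
```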
As a result of the COVID-19 pandemic, many organizations and schools have switched to a virtual environment. Recently, as vaccines have become more readily available, organizations and educational institutions have started shifting from virtual environments to physical office spaces and schools. For the highest level of safety and caution with respect to the containment of COVID-19, the shift to in-person interaction requires a thoughtful approach. With the help of an Integer Programming (IP) optimization model, it is possible to formulate the objective function and constraints to determine a safe way of returning to the office through cohort development. In addition to our IP formulation, we developed a heuristic approximation method. Starting with an initial contact matrix, these methods aim to reduce additional contacts introduced by subgraphs representing the cohorts. These formulations can be generalized to other applications that benefit from constrained community detection.
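The cohort-formation idea lends itself to a simple illustration. The sketch below is a hedged, greedy stand-in for the paper’s IP formulation and heuristic: given an initial contact matrix, each person is placed in the open cohort that introduces the fewest new contacts, subject to an illustrative cohort-size cap. Names and parameters are hypothetical.

```python
# Hedged sketch of a greedy heuristic for cohort formation (not the paper's
# exact IP or heuristic): assign each person to the open cohort that adds
# the fewest *new* contacts, i.e., cohort-mates they did not already meet
# according to the initial contact matrix. The size cap is illustrative.
import numpy as np

def greedy_cohorts(contact: np.ndarray, n_cohorts: int, cap: int) -> list[list[int]]:
    """Assumes n_cohorts * cap >= number of people."""
    cohorts: list[list[int]] = [[] for _ in range(n_cohorts)]
    # Place highly connected people first so their existing contacts can co-locate.
    order = np.argsort(-contact.sum(axis=1))
    for person in order:
        best, best_new = None, None
        for c in cohorts:
            if len(c) >= cap:
                continue
            # New contacts introduced = cohort-mates with no prior contact.
            new_contacts = sum(1 for other in c if contact[person, other] == 0)
            if best_new is None or new_contacts < best_new:
                best, best_new = c, new_contacts
        best.append(int(person))
    return cohorts

# Toy example: 6 people, two pre-existing contact pairs, three cohorts of 2.
contact = np.zeros((6, 6), dtype=int)
contact[0, 1] = contact[1, 0] = 1
contact[2, 3] = contact[3, 2] = 1
print(greedy_cohorts(contact, n_cohorts=3, cap=2))
```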
This paper examines quantity and quality superposter value creation within Coursera Massive Open Online Courses (MOOC) forums using a social network analysis (SNA) approach. The value of quantity superposters (i.e. students who post significantly more often than the majority of students) and quality superposters (i.e. students who receive significantly more upvotes than the majority of students) is assessed using Stochastic Actor-Oriented Modeling (SAOM) and network centrality calculations. Overall, quantity and quality superposting was found to have a significant effect on tie formation within the discussion networks. In addition, quantity and quality superposters were found to have higher-than-average information brokerage capital within their networks.
Changes in observed social networks may signal an underlying change within an organization, and may even predict significant events or behaviors. The breakdown of a team’s effectiveness, the emergence of informal leaders, or the preparation of an attack by a clandestine network may all be associated with changes in the patterns of interactions between group members. The ability to systematically, statistically, effectively, and efficiently detect these changes has the potential to enable the anticipation, early warning, and faster response to both positive and negative organizational activities. By applying statistical process control techniques to social networks we can rapidly detect changes in these networks. Herein we describe this methodology and then illustrate it using four data sets, of which the first is the Newcomb fraternity data, the second is collected on a group of mid-career U.S. Army officers in a week-long training exercise, the third is the perceived con...
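As an illustration of the general methodology (not the chapter’s exact procedure), the sketch below computes a single network measure, graph density, for each network snapshot and monitors it with a one-sided CUSUM chart. The baseline window and the CUSUM parameters k and h are assumed defaults, not values from the chapter.

```python
# Minimal sketch of statistical process control on a network measure (here,
# graph density) over a sequence of snapshot graphs. The CUSUM parameters
# k and h are illustrative defaults.
import networkx as nx
import numpy as np

def cusum_alarm(graphs, n_baseline=10, k=0.5, h=4.0):
    """Return the first time index where the standardized CUSUM of graph
    density exceeds h, or None if no change is signaled."""
    density = np.array([nx.density(g) for g in graphs])
    mu, sigma = density[:n_baseline].mean(), density[:n_baseline].std(ddof=1)
    z = (density - mu) / sigma               # standardize against baseline window
    c_plus = 0.0
    for t, zt in enumerate(z):
        c_plus = max(0.0, c_plus + zt - k)   # one-sided (upward) CUSUM
        if c_plus > h:
            return t
    return None

# Toy example: 20 sparse random graphs, then 10 denser ones (the "change").
rng = np.random.default_rng(1)
graphs = [nx.gnp_random_graph(30, 0.10, seed=int(rng.integers(10**6))) for _ in range(20)]
graphs += [nx.gnp_random_graph(30, 0.25, seed=int(rng.integers(10**6))) for _ in range(10)]
print(cusum_alarm(graphs))   # typically signals shortly after t = 20
```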
After more than a year of non-pharmaceutical interventions such as lockdowns and masks, questions remain on how effective these interventions were and could have been. The vast differences in the enforcement of and adherence to policies add complexity to a problem already surrounded by significant uncertainty. This necessitates a model of disease transmission that can account for these spatial differences in interventions and compliance. In order to measure and predict the spread of disease under various intervention scenarios, we propose a Microscopic Markov Chain Approach (MMCA) in which spatial units each follow their own Markov process for the state of disease but are also connected through an underlying mobility matrix. Cuebiq, an offline intelligence and measurement company, provides aggregated, anonymized cell-phone mobility data which reveal how population behaviors have evolved over the course of the pandemic. These data are leveraged to infer mobility patterns across regions and contact patterns within those regions. The data enable the estimation of a baseline for how the pandemic spread under the true ground conditions, so that we can analyze how different shifts in mobility affect the spread of the disease. We demonstrate the efficacy of the model through a case study of spring break and its impact on how the infection spread in Florida during the spring of 2020, at the onset of the pandemic.
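A hedged sketch of the MMCA idea under simplifying SIR-style assumptions: each region carries its own probabilities of being susceptible, infected, or recovered, and a row-stochastic mobility matrix mixes regional prevalence into the infection pressure each region experiences. The parameters and the three-region mobility matrix are illustrative, not values inferred from the Cuebiq data.

```python
# Hedged sketch of a Microscopic Markov Chain Approach (MMCA): each region
# keeps probabilities (S, I, R), and an effective infection pressure is
# formed by mixing regional prevalence through a row-stochastic mobility
# matrix M. Parameters beta, gamma, and M are illustrative, not fitted values.
import numpy as np

def mmca_step(S, I, R, M, beta=0.3, gamma=0.1):
    # M[i, j] = fraction of region i's contacts that happen in region j,
    # so M @ I is the prevalence a resident of region i is exposed to.
    exposure = M @ I
    p_inf = beta * exposure               # per-step infection probability
    new_I = S * p_inf
    new_R = I * gamma
    return S - new_I, I + new_I - new_R, R + new_R

# Toy example: 3 regions, infection seeded in region 0, light cross-mobility.
M = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
S, I, R = np.array([0.99, 1.0, 1.0]), np.array([0.01, 0.0, 0.0]), np.zeros(3)
for _ in range(60):
    S, I, R = mmca_step(S, I, R, M)
print(np.round(R, 3))   # cumulative infection probability per region
```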
ISIS and similar extremist communities are increasingly using forums on the darknet to connect with each other and spread news and propaganda. In this paper, we attempt to understand their network in an online forum by using descriptive statistics, an exponential random graph model (ERGM), and topic modeling. Our analysis shows how the cohesion between active members forms and grows over time and under certain thread topics. We find that the forum’s most active participants have high centrality measures and other attributes of influencers.
The US Army Research Laboratory (ARL) currently conducts tests on anti-ballistic armor for military uses. This research is concerned with determining the limit velocity (vL) of different target-penetrator combinations. The limit velocity is the highest velocity a penetrator can have without penetrating the target. Unfortunately, penetration processes are highly complex and an effective first-principles derivation of vL has not been discovered. Estimation of vL is therefore done empirically. Furthermore, ballistics tests can be very expensive, resulting in a small sample size with which to perform statistical data analysis. There are two ballistics testing methods commonly used to estimate vL. The Jonas-Lambert method involves measuring the residual velocity of the projectile after perforation. The bisection method, or V50, simply evaluates the perforation without residual velocity. The second method is significantly less expensive. Simulation is used to model both of the common ...
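The bisection (V50) method can be illustrated with a toy stand-in for the ballistic test: bracket the limit velocity between a non-perforating and a perforating shot and repeatedly test the midpoint. The deterministic `shot` oracle below is an assumption for illustration; real tests are stochastic and each shot is expensive.

```python
# Simplified sketch of the bisection (V50) approach to estimating the limit
# velocity: bracket vL between a velocity that fails to perforate and one
# that perforates, then repeatedly test the midpoint. The "shot" here is a
# deterministic simulation stand-in, not a real ballistic test.
def estimate_limit_velocity(shot, v_low, v_high, tol=1.0):
    """`shot(v)` returns True if the penetrator perforates the target at
    striking velocity v. Assumes perforation at v_high and none at v_low."""
    while v_high - v_low > tol:
        v_mid = 0.5 * (v_low + v_high)
        if shot(v_mid):
            v_high = v_mid    # perforated: limit velocity is below v_mid
        else:
            v_low = v_mid     # stopped: limit velocity is above v_mid
    return 0.5 * (v_low + v_high)

# Toy oracle with a "true" limit velocity of 823 m/s.
print(estimate_limit_velocity(lambda v: v > 823.0, v_low=600.0, v_high=1000.0))
```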
Bots are often identified on social media due to their behavior. How easily are they identified, however, when they are dormant and exhibit no measurable behavior at all, except for their silence? We identified “dormant bot networks” positioned to influence social media discourse surrounding the 2018 U.S. Senate election. A dormant bot is a social media persona that does not post content yet has large follower and friend relationships with other users. These relationships may be used to manipulate online narratives and elevate or suppress certain discussions in the social media feeds of users. Using a simple structure-based approach, we identify a large number of dormant bots created in 2017 that began following the social media accounts of numerous US government politicians running for re-election in 2018. Findings from this research were used by the U.S. Government to suspend dormant bots prior to the elections to prevent any malign influence campaign. Application of this approach ...
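A hedged sketch of what a structure-based dormant-bot filter might look like: flag accounts that have never posted, were created recently, and follow many of the monitored political accounts. The field names, thresholds, and creation-year cutoff are illustrative assumptions, not the criteria used in the study.

```python
# Hedged sketch of a structure-based filter for "dormant bot" candidates:
# accounts that post nothing yet follow many of the monitored political
# accounts and hold large friend counts. Field names and thresholds are
# illustrative, not the study's actual criteria.
def dormant_bot_candidates(accounts, monitored_ids,
                           min_monitored_followed=5,
                           min_friends=500,
                           created_after=2017):
    flagged = []
    for acct in accounts:
        followed_monitored = len(set(acct["friend_ids"]) & monitored_ids)
        if (acct["statuses_count"] == 0            # silent: no posts at all
                and acct["created_year"] >= created_after
                and acct["friends_count"] >= min_friends
                and followed_monitored >= min_monitored_followed):
            flagged.append(acct["id"])
    return flagged

# Toy example with one dormant candidate and one ordinary silent account.
monitored = {101, 102, 103, 104, 105, 106}
accounts = [
    {"id": 1, "statuses_count": 0, "created_year": 2017,
     "friends_count": 900, "friend_ids": [101, 102, 103, 104, 105]},
    {"id": 2, "statuses_count": 0, "created_year": 2012,
     "friends_count": 40, "friend_ids": [101]},
]
print(dormant_bot_candidates(accounts, monitored))   # -> [1]
```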
Traditional statistical process monitoring (SPM) provides a useful starting point for framing and solving network monitoring problems. In this paper, the panelists discuss similarities and differences between the two fields, and they describe many challenges and open problems in contemporary network monitoring research. The panelists also discuss potential outlets and avenues for disseminating such research.
Basketball is an inherently social sport, which implies that social dynamics within a team may influence the team's performance on the court. As NBA players use social media, it may be possible to study the social structure of a team by examining the relationships that form within social media networks. This paper investigates the relationship between publicly available online social networks and quantitative performance data. It is hypothesized that network centrality measures for an NBA team's network will correlate with measurable performance metrics such as win percentage, point differential, and assists per play. The hypothesis is tested using exponential random graph models (ERGMs) and by investigating correlation between network and performance variables. The results show that there are league-wide trends correlating certain network measures with game performance, and they also quantify the effects of various player attributes on network formation.
The United States is becoming increasingly politically divided. In addition to polarization between the two major political parties, there is also divisiveness in intra-party dynamics. In this paper, we attempt to understand these intra-party divisions by using an exponential random graph model (ERGM) to compute a political cohesion metric that quantifies the strength within the party at a given point in time. The analysis is applied to the 105th through 113th congressional sessions of the House of Representatives. We find that the Republican Party not only generally exhibits stronger intra-party cohesion, but when voting patterns are broken out by topic, the party has a higher and more consistent cohesion factor compared to the Democratic Party.
The novel coronavirus, SARS-CoV-2, commonly known as COVID-19, became a global pandemic in early 2020. The world has mounted a global social distancing intervention on a scale thought unimaginable prior to this outbreak; however, the economic impact and sustainability limits of this policy create significant challenges for government leaders around the world. Understanding the future spread and growth of COVID-19 is further complicated by data quality issues due to high numbers of asymptomatic patients who may transmit the disease yet show no symptoms; lack of testing resources; failure of recovered patients to be counted; delays in reporting hospitalizations and deaths; and the co-morbidity of other life-threatening illnesses. We propose a Monte Carlo method for inferring true case counts from observed deaths using clinical estimates of Infection Fatality Ratios and Time to Death. Findings indicate that current COVID-19 confirmed positive counts represent a small fraction of actual...
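A minimal sketch of the back-calculation idea: each observed death implies roughly 1/IFR infections that occurred about one time-to-death lag earlier, and Monte Carlo draws over the IFR and lag propagate the clinical uncertainty. The distributions and the toy death series below are placeholders, not the clinical estimates used in the paper.

```python
# Minimal sketch of inferring true case counts from observed deaths by Monte
# Carlo: each death implies roughly 1/IFR infections occurring about one
# time-to-death lag earlier. The IFR and lag distributions are illustrative
# placeholders, not the cited clinical estimates.
import numpy as np

def infer_infections(daily_deaths, n_draws=5000, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    horizon = len(daily_deaths)
    draws = np.zeros((n_draws, horizon))
    for i in range(n_draws):
        ifr = rng.uniform(0.005, 0.015)              # sampled infection fatality ratio
        lag = max(0, int(rng.normal(18, 4)))         # sampled days from infection to death
        for t, deaths in enumerate(daily_deaths):
            t0 = max(0, t - lag)                     # back-date the implied infections
            draws[i, t0] += deaths / ifr
    # Each row is one simulated daily-infection trajectory; summarize by percentiles.
    return np.percentile(draws, [5, 50, 95], axis=0)

lo, med, hi = infer_infections([0, 0, 1, 2, 3, 5, 8, 12, 15, 20,
                                24, 30, 33, 35, 40, 38, 42, 45, 47, 50])
print(med.sum())   # median estimate of total infections implied by these deaths
```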
This study introduces a new method of evaluating human comprehension in the context of machine translation using a language translation program known as the FALCon (Forward Area Language Converter) ...
Novel diseases such as COVID-19 present challenges for identifying and assessing the impact of public health interventions due to incomplete and inaccurate data. Many infected persons may be asymptomatic, pre-symptomatic, or may choose not to seek medical treatment. Insufficient testing and reporting standards coupled with reporting delays may also affect the accuracy of case counts, recovery rates, fatalities, and other key metrics used to model the disease. High error in these metrics is propagated to all aspects of public health response, including estimates of daily transmission rates. We propose a method that integrates Monte Carlo simulation based on clinical studies, linear noise approximation (LNA), and Hidden Markov Models (HMMs) to estimate the daily reproductive number. Results are validated against known state population behavior, such as social distancing and stay-at-home orders. The proposed approach provides improved model initial conditions resulting in reduced error and su...
Information operations on social media have recently attracted the attention of media outlets, research organizations and governments, given the proliferation of high-profile cases such as the alleged foreign interference in the 2016 US presidential election. Nation-states and multilateral organizations continue to face challenges while attempting to counter false narratives, due to lack of familiarity and experience with online environments, limited knowledge and theory of human interaction with and within these spaces, and the limitations imposed by those who own and maintain social media platforms. In particular, these attributes present unique difficulties for the identification and attribution of campaigns, tracing information flows at scale, and identifying spheres of influence. Complications include the anonymity and competing motivations of online actors, poorly understood platform dynamics, and the sparsity of information regarding message transferal across communication pl...
Based on a comprehensive study of 20 established data sets, we recommend training set sizes for any classification data set. We obtain our recommendations by systematically withholding training data and developing models through five different classification methods for each resulting training set. Based on these results, we construct accuracy confidence intervals for each training set size and fit the lower bounds to inverse power law learning curves. We also estimate a sufficient training set size (STSS) for each data set based on established convergence criteria. We compare STSS to the data sets' characteristics; based on identified trends, we recommend training set sizes between 3,000 and 30,000 data points, according to a data set's number of classes and number of features. Because obtaining and preparing training data has non-negligible costs that are proportional to data set size, these results afford the potential opportunity for substantial savings for predictive mode...
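A sketch of the learning-curve step under the stated approach: fit an inverse power law, acc(n) = a - b*n^(-c), to accuracy measured at several training-set sizes, then read off the size at which the predicted accuracy is within a chosen tolerance of the fitted asymptote. The accuracy values and the half-percentage-point tolerance below are fabricated for illustration and are not the paper's convergence criteria.

```python
# Sketch of fitting an inverse power law learning curve, acc(n) = a - b*n^(-c),
# to accuracy estimates obtained at several training-set sizes. Data points
# and the convergence tolerance are fabricated for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def inverse_power_law(n, a, b, c):
    return a - b * np.power(n, -c)

sizes = np.array([100, 300, 1000, 3000, 10000, 30000], dtype=float)
accuracy = np.array([0.820, 0.874, 0.900, 0.911, 0.916, 0.918])

params, _ = curve_fit(inverse_power_law, sizes, accuracy, p0=[0.95, 1.0, 0.5])
a, b, c = params
print(f"fitted curve: acc(n) = {a:.3f} - {b:.3f} * n^(-{c:.3f})")

# A simple convergence rule: the size n* at which predicted accuracy is within
# `tol` of the fitted asymptote, i.e., b * n^(-c) <= tol.
tol = 0.005
n_star = (b / tol) ** (1.0 / c)
print("sufficient training set size ~", int(round(n_star)))
```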
Network science has been applied in the hard and soft sciences for several decades. Founded in graph theory, network science is now an expansive approach to the analyses of complex networks of many types of objects (events, people, locations, etc.). Researchers are finding that techniques and tools used in social network analysis have relevant application in projects that span more than just relationships between people. This paper discusses the application of network analysis in a special-topics postgraduate course on information security and risk in organisational settings.
Network data provides valuable insight into understanding complex organizations by modeling relational dependence between network agents. Detecting subtle changes in organizational behavior can alert analysts before the change significantly impacts the larger group. Statistical process control is applied to dynamic network measures of longitudinal data to quickly detect organizational change. The performance of 10 network measures and three algorithms is evaluated on simulated data. One of the algorithms and one of the network measures are used to demonstrate change detection on the Al-Qaeda terrorist network. There is no statistically significant difference in the performance of the investigated algorithms; however, the cumulative sum control chart has a built-in estimate of the actual time a change may have occurred.
With the rise in popularity of social media, these platforms present a new opportunity to reach potential job candidates for employment opportunities. The current literature lacks sufficient research on methods and best practices to design and assess the efficacy of recruit-and-hire campaigns delivered on social media. We present a case study of a government e-recruiting effort discovered on Twitter. We collected almost 20,000 tweets using the hashtag #FBIJobs, including both tweets and retweets. Applications of descriptive statistics, topic modeling, sentiment analysis, and graph analytics identify where the campaign may miss potentially interested job candidates. We also find evidence of “popularity transfer,” where co-mentions appear to increase the visibility of an account’s content in public feeds, without transferring the sentiment surrounding the more popular account. The research and findings were based on a publicly available e-recruiting campaign found online, witho...
Current supervised deep learning frameworks rely on annotated data for modeling the underlying data distribution of a given task. In particular, for computer vision algorithms powered by deep learning, the quality of annotated data is the most critical factor in achieving the desired algorithm performance. Data annotation is typically a manual process where the annotator follows guidelines and operates in a best-guess manner. Differences in labeling criteria among annotators can produce discrepancies in labeling results, which may impact the algorithm's inference performance. Given the popularity and widespread use of deep learning in computer vision, more and more custom datasets are needed to train neural networks to tackle different kinds of tasks. Unfortunately, there is no full understanding of the factors that affect annotated data quality, and how it translates into algorithm performance. In this paper we studied this problem for object detection and recognition. We conducted several data anno...
A new model for a random graph is proposed that can be constructed from empirical data and has some desirable properties compared to scale-free graphs [1, 2, 3] for certain applications. The newly proposed random graph maintains the same "small-world" properties [3, 4, 5] of the scale-free graph, while allowing mathematical modeling of the relationships that make up the random graph. E-mail communication data was collected on a group of 24 mid-career Army officers in a one-year graduate program [6] to validate necessary assumptions for this new class of random graphs. Statistical distributions on graph level measures are then approximated using Monte Carlo simulation and used to detect change in a graph over time.
Humans are autonomous, intelligent, and adaptive agents. By adopting social network analysis techniques, we submit a framework for the study of dynamic networks and demonstrate the use of actor-oriented specifications in longitudinal networks. Through the use of a unique command and control dataset from experiments run at the US Military Academy, we illustrate the power of testing hypotheses on actor utility profiles. We frame static, covariate factors onto communication networks, and find that statistical hypothesis testing indicates edge networks truly motivate soldiers to seek information, collaborate, and modify the social network around them into more comfortable configurations of triad closure and edge reciprocity, when compared to hierarchical networks: a finding with profound implications for the study of complex, adaptive social systems.
Extracting (social) network data and conducting effective searches of large document collections requires large corpora of labelled, annotated training data from which to build and validate classifiers. As the importance and value of data grows, industry and government organizations are investing in large teams of individuals who annotate data at unprecedented scale. While much is understood about machine learning, little attention has been paid to methods and considerations for managing and leading annotation efforts. This paper presents several metrics to measure and monitor performance and quality in large annotation teams. Recommendations for leadership best practices are proposed and evaluated within the context of an annotation effort led by the authors in support of U.S. government intelligence analysis. Findings demonstrate significant improvement in annotator utilization, inter-annotator agreement, and rate of annotation through prudent management best practices.
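One of the team-quality metrics mentioned, inter-annotator agreement, can be sketched with Cohen's kappa between two annotators labeling the same items; the labels, and the simple throughput metric that follows, are toy illustrations rather than the paper's measurements.

```python
# Sketch of one quality metric discussed above: inter-annotator agreement,
# computed as Cohen's kappa between two annotators who labeled the same
# items. Labels below are toy data, not the paper's annotation effort.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["PER", "ORG", "ORG", "LOC", "PER", "ORG", "LOC", "LOC"]
annotator_2 = ["PER", "ORG", "LOC", "LOC", "PER", "ORG", "LOC", "PER"]

kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Cohen's kappa = {kappa:.2f}")   # chance-corrected agreement

# A simple throughput metric: annotations completed per annotator-hour.
items_completed, hours_worked = 1240, 16.5
print(f"annotation rate = {items_completed / hours_worked:.1f} items/hour")
```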
