
    Ross Sparks

    Abstract Practical applications always require an assessment of the aggregation process that is best for early detection of any outbreak of events, including sales, warranty claims or disease outbreaks. This article provides a means of deciding on the level of temporal aggregation that best suits the needs of whoever aims to monitor a process or a service. It aims to identify a practical aggregation level for monitoring events when the in-control time between events (TBE) in the targeted process or service follows a non-homogeneous Weibull distribution. We analyze the impact of various aggregation levels on early detection of outbreaks of different magnitudes using several monitoring schemes, including an adaptive exponentially weighted moving average (EWMA) plan and several simultaneous EWMA plans with differing amounts of temporal memory for TBE data. We also consider monitoring the related counting processes to address the problem of deciding on the temporal aggregation level. To the best of our knowledge, the effect of various levels of temporal aggregation on detecting outbreaks in TBEs, when they are Weibull distributed, has not been studied thoroughly.
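To make the aggregation choice concrete, the sketch below shows how a single stream of event times can be monitored either as a TBE series or as counts in non-overlapping windows of a chosen width. It is a minimal illustration that assumes event times are measured from 0 in a common time unit; the function names and the example widths are hypothetical, and it does not reproduce the article's Weibull model or its EWMA plans.

```python
def times_between_events(event_times):
    """Return the gaps between consecutive event times (the TBE series)."""
    ts = sorted(event_times)
    return [b - a for a, b in zip(ts, ts[1:])]


def aggregate_counts(event_times, width, horizon):
    """Count events in consecutive windows [k*width, (k+1)*width) up to horizon.

    Larger widths mean coarser temporal aggregation: fewer, larger counts,
    but information only becomes available at the end of each window.
    """
    n_bins = int(horizon // width)
    counts = [0] * n_bins
    for t in event_times:
        k = int(t // width)          # index of the window containing t
        if 0 <= k < n_bins:          # ignore events outside [0, horizon)
            counts[k] += 1
    return counts


if __name__ == "__main__":
    events = [0.5, 1.2, 1.9, 3.1, 3.3, 3.4, 7.8]   # illustrative event times
    print(aggregate_counts(events, 1, 8))   # -> [1, 2, 0, 3, 0, 0, 0, 1]
    print(aggregate_counts(events, 4, 8))   # -> [6, 1]
```

The TBE series updates at every event, whereas the counts only update when a window closes, which is why the width of the window trades off timeliness against the stability of each count.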
    This chapter focuses on evaluating practical approaches to monitoring the dispersion for a wide range of positively distributed and correlated bivariate data. It provides good practical advice regarding monitoring the dispersion of variables with skewed distributions.
    To describe the first wave of hospitalisations of patients testing positive for COVID‐19 in South Australia.
    Research in network monitoring spans a large and growing number of disciplines, including mathematics, physics, computer science, and statistics. Here, the panelists discuss the advantages and disadvantages of the interdisciplinary nature of the area. It is largely agreed that integrating expertise from many disciplines drives innovation in network monitoring development, but several notable barriers are discussed that limit the area’s full potential.
    BACKGROUND Face masks have been recommended or mandated at different points in time during the ongoing COVID-19 pandemic. The effectiveness of masks has been understood to be linked to their adoption. The scale of adoption of masks in the community remains to be understood. OBJECTIVE Given the popularity of social media, we aim to use tweets (social media posts on Twitter) to analyse discussions mentioning masks and, specifically, tweets that report mask usage, as an indication of mask adoption. METHODS We used a repository of tweets from Australia and New Zealand to create a dataset of mask-related tweets posted from 2017 to 2020, specifically tweets containing five mask-related words: face mask, surgical mask, cloth mask, N95 mask and P2 mask. From this dataset, we created a manually annotated dataset of 3016 tweets labeled with mask usage. We first used topic modeling separately on each mask type to understand the context in which these mask types are mentioned. We then proposed ...
    Organisations are monitoring their Social License to Operate (SLO) with increasing regularity. SLO, the level of support organisations gain from the public, is typically assessed through surveys or focus groups, which require expensive manual effort and yield results that quickly become outdated. In this paper, we present SIRTA (Social Insight via Real-Time Text Analytics), a novel real-time text analytics system for assessing and monitoring organisations’ SLO levels by analysing the public discourse in social posts. To assess SLO levels, our insight is to extract people’s stances towards an organisation and transform them into SLO levels. SIRTA achieves this by performing a chain of three text classification tasks, in which it identifies task-relevant social posts, discovers key SLO risks discussed in the posts, and infers stances specific to the SLO risks. We leverage recent language understanding techniques (e.g., BERT) for building our classifiers. To monitor SLO levels over time, SIRTA employs ...
    We address the issue of having a limited number of annotations for stance classification in a new domain, by adapting out-of-domain classifiers with domain adaptation. Existing approaches often align different domains in a single, global feature space (or view), which may fail to fully capture the richness of the languages used for expressing stances, leading to reduced adaptability on stance data. In this paper, we identify two major types of stance expressions that are linguistically distinct, and we propose a tailored dual-view adaptation network (DAN) to adapt these expressions across domains. The proposed model first learns a separate view for domain transfer in each expression channel and then selects the best adapted parts of both views for optimal transfer. We find that the learned view features can be more easily aligned and more stance-discriminative in either or both views, leading to more transferable overall features after combining the views. Results from extensive exp...
    Early warning of disease outbreaks is paramount for health jurisdictions. The objective of the present study was to develop syndromic surveillance monitoring plans from routinely collected ED data with application to detecting disease outbreaks.
    Abstract In this article, the panelists broadly discuss the definition of network monitoring, and how it may be similar to or different from network surveillance and network change-point detection. The discussion uncovers ambiguity and contradictions associated with these terms and we argue that this lack of clarity is detrimental to the field. The panelists also describe existing and emerging applications of network monitoring, which serves to illustrate the wide applicability of the tools and research associated with the field.
    ABSTRACT Tweets offer early information on the initial stages of diseases, since people often tweet about early symptoms of feeling unwell before presenting to an emergency department if their symptoms become more severe. Even when people do present at an emergency department, it generally takes over 24 hours for their information to be collected, diagnosed and transferred for analysis at a centralized location. The advantage of utilizing tweets is that they offer information on syndromes in real time. This paper investigates the value of carrying out multivariate syndromic surveillance using daily counts of keywords. The dynamic biplot is used to detect unexpected changes in the daily counts. These methods can easily be generalized to hourly tweet syndromic counts. By following Twitter users who suffer certain symptoms over time, we can better understand the burden of these health issues and better understand emerging health issues. Monitoring people who present with symptoms but are not sick enough to go to emergency departments provides additional information not gathered by emergency departments.
    The vital signs of chronically ill patients are monitored daily. The record flags when a specific vital sign is stable or when it trends into dangerous territory. Patients also self-assess their current state of well-being, i.e. whether they are feeling worse than usual, neither unwell nor very well compared to usual, or are feeling better than usual. This paper examines whether past vital sign data can be used to forecast how well a patient is going to feel the next day. Reliable forecasting of a chronically sick patient’s likely state of health would be useful in regulating the care provided by a community nurse, scheduling care when the patient needs it most. The hypothesis is that the vital signs indicate a trend before a person feels unwell and, therefore, are lead indicators of a patient going to feel unwell. Time series and classification or regression tree methods are used to simplify the process of observing multiple measurements such as body temperature, heart rate, etc., ...
    Abstract Monitoring to detect changes in the communication levels within a social network is a new area of research. There is little available on this topic in the current literature. This paper proposes that existing spatio-temporal surveillance technology could be used as a starting point for monitoring these changes. A number of challenges were encountered. The first involved the ordering of individuals into ‘neighbours’ so that the spatio-temporal surveillance technology and approaches could be applied. The second difficulty encountered was the computational effort involved in monitoring large numbers of individuals. In order to address this computational issue, cell communication level aggregation based on the order statistics of standardized cell communication count departures from expected was attempted. Simulations were used to compare two computationally feasible options.
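A minimal sketch of aggregating cell communication levels through order statistics, assuming Poisson-like counts so that each cell's departure from expected is standardized as (observed - expected) / sqrt(expected), and summarizing many cells by the sum of the m largest standardized departures. The choice of m and the function names are hypothetical; the paper's exact aggregation may differ.

```python
import math


def standardized_departures(observed, expected):
    """Standardized departures (O - E) / sqrt(E), assuming Poisson-like counts with E > 0."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


def top_m_statistic(departures, m):
    """Sum of the m largest standardized departures (the upper order statistics).

    Monitoring this single number instead of every cell reduces computation,
    while still reacting when a few cells show unusually high communication.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    return sum(sorted(departures, reverse=True)[:m])


if __name__ == "__main__":
    z = standardized_departures([4, 9, 1, 16], [4, 4, 1, 4])
    print(z)                      # -> [0.0, 2.5, 0.0, 6.0]
    print(top_m_statistic(z, 2))  # -> 8.5
```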
    Owing to its popularity, the scan statistic sets the benchmark for spatio-temporal surveillance methods. In its simplest form it scans the target area and time period to find regions with disease counts higher than expected. If the shape and size of the disease outbreaks are known, the scan statistic can design its search area to be efficient for that shape and size, so that outbreaks are detected sufficiently early. A plan that is efficient at detecting a range of disease outbreak shapes and sizes is important because these vary from one outbreak to the next and are generally not known in advance. This paper offers a forward selection scan statistic that reduces the computational effort of the usual single-window scan plan, while still offering greater flexibility in signalling outbreaks of varying shapes. The approach starts by dividing the target geographical region into a lattice. Second, it smooths the time series of lattice cell counts using multivariate exponentially weighted moving averages. Third...
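The smoothing step can be illustrated with an elementwise EWMA over the lattice, Z_t = lam * X_t + (1 - lam) * Z_{t-1}, which is the smoothing recursion of a multivariate EWMA with a common smoothing constant. This is a sketch only: the grid layout, the constant lam and the starting values are assumptions, and the forward selection scan itself is not shown.

```python
def ewma_lattice(count_grids, lam, start=None):
    """Smooth a time series of lattice count grids with an elementwise EWMA.

    count_grids: list of 2-D grids (lists of lists), one grid per time step.
    lam: smoothing constant in (0, 1]; larger values give shorter memory.
    start: initial grid (e.g. in-control expected counts); defaults to the first grid.
    Returns one smoothed grid per time step.
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must be in (0, 1]")
    init = start if start is not None else count_grids[0]
    prev = [row[:] for row in init]          # copy so the input is not modified
    smoothed = []
    for grid in count_grids:
        # Z_t = lam * X_t + (1 - lam) * Z_{t-1}, applied cell by cell
        prev = [[lam * x + (1 - lam) * s for x, s in zip(row, srow)]
                for row, srow in zip(grid, prev)]
        smoothed.append(prev)
    return smoothed


if __name__ == "__main__":
    grids = [[[2, 0], [0, 0]], [[0, 0], [0, 4]]]
    for g in ewma_lattice(grids, 0.5, start=[[0, 0], [0, 0]]):
        print(g)
```

Starting from in-control expected counts rather than the first observed grid avoids treating an outbreak already present at time 1 as the baseline.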
    Low detection limits are common in measured environmental variables. Building models from data containing low or high detection limits without adjusting for the censoring produces biased models. This paper offers approaches to estimating an inverse Gaussian distribution when some of the data are censored because of low or high detection limits. Adjustments for the censoring can be made, using either the EM algorithm or MCMC, if between 2% and 20% of the data are censored. This paper compares these approaches.
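Both EM and MCMC work with the observed-data likelihood, in which a detected value contributes its density and a value below (above) a detection limit contributes the probability of falling below (above) that limit. The sketch below writes that log-likelihood for an inverse Gaussian with mean mu and shape lam. It is an illustration under that assumed parameterization, with hypothetical function names; it does not implement the paper's EM or MCMC procedures.

```python
import math


def _phi(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ig_logpdf(x, mu, lam):
    """Log density of the inverse Gaussian with mean mu and shape lam (x > 0)."""
    return 0.5 * math.log(lam / (2.0 * math.pi * x ** 3)) - lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x)


def ig_cdf(x, mu, lam):
    """Inverse Gaussian CDF.

    Note: exp(2 * lam / mu) overflows when 2 * lam / mu exceeds about 709;
    a log-scale evaluation would be needed in that regime.
    """
    r = math.sqrt(lam / x)
    return _phi(r * (x / mu - 1.0)) + math.exp(2.0 * lam / mu) * _phi(-r * (x / mu + 1.0))


def censored_ig_loglik(detected, n_below, lod, mu, lam, n_above=0, uod=None):
    """Observed-data log-likelihood with n_below values under the low detection
    limit lod and n_above values over the high detection limit uod."""
    ll = sum(ig_logpdf(x, mu, lam) for x in detected)
    if n_below:
        ll += n_below * math.log(ig_cdf(lod, mu, lam))
    if n_above:
        ll += n_above * math.log(1.0 - ig_cdf(uod, mu, lam))
    return ll
```

Maximizing this function over (mu, lam), or using it inside a posterior for MCMC, accounts for the censored values instead of discarding or substituting them.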
    Spatio-temporal surveillance methods for detecting outbreaks are common, with the SCAN statistic setting the benchmark. If the shape and size of the outbreaks are known, the SCAN statistic can be trained to detect them efficiently; however, this is seldom the case. Devising a plan that is efficient at detecting a range of outbreaks that vary in size and shape is therefore important in practical applications. This paper introduces a method called EWMA Surveillance Trees that uses a binary recursive partitioning approach to locate and detect outbreaks. The approach is explained, and its performance is then compared to that of the SCAN statistic in a series of simulation studies. While the SCAN statistic is shown to remain the most effective at detecting outbreaks of a known shape and size, the EWMA Surveillance Trees are shown to be more robust. The method is also applied to an example of actual data from motor vehicle crashes in an area of Sydney, Australia, from 2000 to 2004 in...
    This article examines the performance of the adaptive cumulative sum (CUSUM) plans and three simultaneous CUSUM statistics with different levels of temporal memory for monitoring changes in dispers...
    In many manufacturing processes the operators have observed the process closely for decades and therefore understand what interventions drive the process back on target. Researchers in manufacturing companies have carried out experiments to establish robust operating environments, and therefore the interventions that bring the process back on target when process quality starts deteriorating are mostly routine. Manufacturers also operate the process in conditions that minimize the variation in their manufactured outputs, thus limiting the need for interventions. In other words, manufacturing companies operate in a well-controlled and well-understood environment, where the consequences of interventions are fairly certain. In this case the three-zone approach outlined in Woodall and Faltin (2019) is appropriate. However, this is very much a manufacturing view of monitoring. For applications concerning the monitoring of social media communications, we do not have a capability measure, and the process is not well controlled. It is unclear if, how and when to respond to out-of-control events or adverse changes in the nature of the messages in social media. There may be three zones for interventions, but this becomes less important than other considerations because the consequences of any intervention are less clear in social media. The challenges of intervening in social media applications are considerable. The examples presented below are what we are currently working on at CSIRO. They are useful for outlining the challenges, but they may not cover all issues that are important in social media monitoring:
    This article (Zwetsloot and Woodall 2019) is a welcome thorough investigation of sampling and aggregation strategies for the basic control charts. This article has raised the profile of sampling an...
    This paper proposes a simple distribution-free control chart for monitoring shifts in location when the process distribution is continuous but unknown. In particular, we are concerned with big data applications where there are sufficient in-control data that can be used to specify certain quantiles of interest which, in turn, are used to assess whether the new, incoming data to be monitored are in control. The distribution-free chart is shown to lose very little power against the Shewhart charts designed for normally distributed data. The proposed charts offer a practical and robust alternative to the classical Shewhart charts which assume normality, particularly when monitoring quantiles and the data distribution is skewed. The effect of the size of the reference sample is examined on the assumption that the quantiles are known. Conclusions and recommendations are offered.
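One way to see why a quantile-based chart is distribution-free: if the in-control median is known, the number of observations in a subgroup of size n that exceed it is Binomial(n, 0.5) whatever the continuous distribution. The sketch below is a generic sign-type chart built on that fact, using the reference-sample median and a critical count chosen from the binomial tail. It is an assumption-laden illustration with hypothetical names and defaults, not necessarily the chart proposed in the paper; with an estimated median the false-alarm rate is only approximate, which is why the size of the reference sample matters.

```python
import math
import statistics


def binomial_upper_tail(n, c, q):
    """P(X >= c) for X ~ Binomial(n, q)."""
    return sum(math.comb(n, j) * q ** j * (1 - q) ** (n - j) for j in range(c, n + 1))


def critical_count(n, alpha, q=0.5):
    """Smallest c with P(Binomial(n, q) >= c) <= alpha; n + 1 means 'never signal'."""
    for c in range(n + 1):
        if binomial_upper_tail(n, c, q) <= alpha:
            return c
    return n + 1


def monitor(reference, subgroups, alpha=0.005):
    """Return the indices of subgroups that signal an upward location shift.

    The in-control median is estimated from the reference sample; a subgroup
    signals when the count of observations above it reaches the critical count.
    """
    med = statistics.median(reference)
    signals = []
    for i, g in enumerate(subgroups):
        above = sum(1 for x in g if x > med)
        if above >= critical_count(len(g), alpha):
            signals.append(i)
    return signals


if __name__ == "__main__":
    reference = list(range(1, 102))                           # median is 51
    print(monitor(reference, [list(range(1, 11)), list(range(60, 70))]))  # -> [1]
```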
    Abstract This article focuses on monitoring plans aimed at the early detection of an increase in the frequency of events. The literature recommends either monitoring the time between events (TBE) if events are rare, or otherwise counting the number of events in non-overlapping unit time intervals. Some authors advocate using the Bernoulli model for rare events, recording the presence or absence of events within non-overlapping and exhaustive time intervals. This Bernoulli model does improve the real-time monitoring assessment of these events compared to counting events over a larger interval to make them less rare. However, this approach becomes inefficient if more than one event starts occurring within the intervals. Monitoring TBE is the real-time option for outbreak detection, because outbreak information is accumulated whenever an event occurs, which is preferable to waiting for the end of a period to count events. If the TBE decreases significantly, then the incidence of these events has increased significantly. This article explores the TBE option relative to monitoring counts when the TBEs are Exponentially, Gamma or Weibull distributed, for moderately low count scenarios. It compares an Exponentially Weighted Moving Average (EWMA) statistic of the TBEs with the EWMA of counts. Several robust options are considered for when the future change in event frequency is unknown. Our goal is a robust monitoring plan that efficiently detects many different sizes of shift. These robust plans are compared to the more traditional event monitoring plans for both small and large changes in the event frequency.
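A minimal sketch of a lower-sided EWMA chart on TBEs: the statistic starts at the in-control mean TBE theta0 and signals when it drops below a lower limit, since shorter gaps mean more frequent events. The smoothing constant, the limit and the function names are hypothetical; in practice the limit would be calibrated, for example by simulation as below, to give an acceptable in-control average run length (counted here in events, not calendar time).

```python
import random


def ewma_tbe_signal(tbes, theta0, lam, lower_limit):
    """Return the 1-based index of the first event at which the EWMA of TBEs
    falls below lower_limit, or None if it never does."""
    z = theta0                               # start at the in-control mean TBE
    for i, x in enumerate(tbes, start=1):
        z = lam * x + (1 - lam) * z
        if z < lower_limit:
            return i
    return None


def estimate_arl(sampler, theta0, lam, lower_limit, reps=200, max_events=100_000, seed=1):
    """Monte Carlo average run length (in events); runs are truncated at max_events.

    sampler(rng) draws one TBE, e.g. lambda r: r.expovariate(1.0) for Exponential
    TBEs with mean 1, or lambda r: r.weibullvariate(scale, shape) for Weibull TBEs.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        z = theta0
        run_length = max_events
        for i in range(1, max_events + 1):
            z = lam * sampler(rng) + (1 - lam) * z
            if z < lower_limit:
                run_length = i
                break
        total += run_length
    return total / reps


if __name__ == "__main__":
    ic = estimate_arl(lambda r: r.expovariate(1.0), 1.0, 0.1, 0.5)  # in control, mean TBE 1
    oc = estimate_arl(lambda r: r.expovariate(2.0), 1.0, 0.1, 0.5)  # event rate doubled
    print(ic, oc)
```

Because the EWMA is updated at every event, the chart uses information as soon as it arrives, which is the real-time advantage of TBE monitoring described above.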
    Telemonitoring is becoming increasingly important for the management of patients with chronic conditions, especially in countries with large distances such as Australia. However, despite large national investments in health information technology, little policy work has been undertaken in Australia in deploying telehealth in the home as a solution to the increasing demands and costs of managing chronic disease. The objective of this trial was to evaluate the impact of introducing at-home telemonitoring to patients living with chronic conditions on health care expenditure, number of admissions to hospital, and length of stay (LOS). A before and after control intervention analysis model was adopted whereby at each location patients were selected from a list of eligible patients living with a range of chronic conditions. Each test patient was case matched with at least one control patient. Test patients were supplied with a telehealth vital signs monitor and were remotely managed by a ...
    Social networks are increasingly attracting the attention of academic and industry researchers. Monitoring communications within clusters of suspicious individuals is important in flagging potential planning activities for terrorism events or crime. Governments are interested in methodology that can forewarn them of future terrorist attacks or social uprisings in disenchanted groups of their populations. This paper examines a range of approaches that could be used to monitor communication levels between suspicious individuals. The methodology could be scaled up to understand changes in social structure for larger groups of people, to help manage crises such as bushfires in densely populated areas, or to support early detection of disease outbreaks using surveillance methods. The methodology could be extended into these other application domains, which are less invasive of individuals’ privacy.
    The conventional cumulative sum (CUSUM) with k = 0.5 is often used as the default CUSUM statistic when future shifts are unknown. In this paper, CUSUM procedures are designed to be efficient at signalling a range of future expected but unknown location shifts. Two approaches are advocated. The first uses three simultaneous conventional CUSUM statistics with different resetting boundaries. This results in a procedure that has, on average, several levels of memory, and thus signals a broader range of location shifts more efficiently than the conventional CUSUM with k = 0.5. The second uses an adaptive CUSUM statistic that continually adjusts its form to be efficient for signalling a one-step-ahead forecast in deviation from its target value. Average run length (ARL) is used to compare the relative performance of procedures. Several applications are used to illustrate procedures.
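The first approach can be sketched as several one-sided upper CUSUMs, C_t = max(0, C_{t-1} + z_t - k), run side by side on standardized observations, with a signal as soon as any of them exceeds its own limit h. This sketch assumes the statistics differ in their reference value k (small k gives long memory for small shifts, large k short memory for large shifts); the k and h values are hypothetical and would need calibration to a target in-control ARL. The adaptive CUSUM is not shown.

```python
def simultaneous_cusums(z, ks=(0.25, 0.5, 1.0), hs=(8.0, 5.0, 3.5)):
    """Run one-sided upper CUSUMs with reference values ks and limits hs in parallel.

    z: standardized observations (in-control mean 0, standard deviation 1).
    Returns (t, k) for the first signal, where t is the 1-based time index and
    k the reference value of the CUSUM that signalled, or None if none signals.
    """
    c = [0.0] * len(ks)
    for t, x in enumerate(z, start=1):
        for j, (k, h) in enumerate(zip(ks, hs)):
            c[j] = max(0.0, c[j] + x - k)    # reset to 0 when evidence is negative
            if c[j] > h:
                return t, k
    return None


if __name__ == "__main__":
    print(simultaneous_cusums([0, 0, 3, 3]))   # large shift: caught by the k = 1.0 CUSUM
    print(simultaneous_cusums([0.6] * 30))     # small shift: caught by the k = 0.25 CUSUM
```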
    This paper discusses the opportunities big data offers decision makers from a statistical perspective. It calls for a multidisciplinary approach by computer scientists, statisticians and domain experts to providing useful big data solutions. Big data calls for us to think in new ways and communicate effectively within such teams. We make a plea for linking data-driven and model-driven analytics, and stress the role of cause-effect models for knowledge enhancement in big data analytics. We recall Kant’s statement that theory without data is blind, but facts without theories are meaningless. A case is made for each discipline to define the contribution it offers to big data solutions so that effective teams can be formed to improve inductions. Although new approaches are needed, much of the past learning related to small data is valuable in providing big data solutions. Here we have in mind the long-term academic training and field experience of statisticians concerning reduction of dataset volumes, sampling in a more general setting, data depreciation and quality, model design and validation, visualisation, etc. We expect that combining the present approaches will increase the chances of “real big solutions”.
    The monitoring of vital signs for the management of chronic conditions at home has been demonstrated in numerous trials to have a positive impact on patients' healthcare outcomes, as well as to reduce hospitalization and improve quality of life. The CSIRO has just completed a two-year clinical trial designed to evaluate a large number of qualitative and quantitative outcomes of at-home telemonitoring. As preliminary data demonstrate that the before-and-after data are not stationary, a model-based BACI (Before-After-Control-Impact) design, frequently used in environmental and agricultural yield studies but rarely in clinical trials, has been developed to model the effects of the intervention on healthcare outcomes over time, as well as possible secondary effects associated with environmental and seasonal conditions.