
    Russel Pears

    Time-series prediction has been a well-researched topic in recent studies. Popular approaches to this problem include traditional statistical methods, e.g. multiple linear regression and moving averages, and neural networks such as the Multi-Layer Perceptron, which has shown its superiority in time-series prediction. In this study, we use a different approach based on an evolving clustering algorithm with polynomial regressions to find repeating local patterns in time-series data. To illustrate chaotic time-series data, we use stock price data from the Indonesian stock exchange and currency exchange rate data. In addition, we conducted a benchmark test using the Mackey-Glass data set. Results show that the algorithm offers considerably high accuracy in time-series prediction and can also reveal repeating patterns of movement from the past.
    ABSTRACT Understanding how the underlying network structure and interconnectivity affect the robustness of interdependent networks is a major challenge in complex network studies. Several existing metrics can be used to measure network robustness. However, different metrics, such as the average node degree, capture different characteristics of the network's topological structure, and few metrics have been identified that effectively evaluate cascade performance in interdependent networks. In this paper, we propose to use a combined Laplacian matrix to model the interdependent networks and their interconnectivity, and then use its algebraic connectivity as a measure of cascading behavior. Moreover, we have conducted extensive comparative studies of different metrics, such as the average node degree, against the proposed algebraic connectivity. We have found that the algebraic connectivity metric describes the topological structure of interdependent networks more accurately and in finer detail than other metrics widely adopted in existing research for evaluating cascading performance in interdependent networks.
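The abstract above does not say how the combined Laplacian is assembled. As a minimal sketch, assuming it is the ordinary graph Laplacian of the union of both networks and their interlinks, the algebraic connectivity (the second-smallest Laplacian eigenvalue) could be computed as follows. The function names and the two toy triangle networks are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def combined_laplacian(adj_a, adj_b, inter):
    """Laplacian of networks A and B joined by interlinks.

    inter[i, j] = 1 when node i of A is linked to node j of B.
    Assumption: the combined matrix is the Laplacian of the union graph.
    """
    adj_a = np.asarray(adj_a, dtype=float)
    adj_b = np.asarray(adj_b, dtype=float)
    inter = np.asarray(inter, dtype=float)
    full = np.block([[adj_a, inter], [inter.T, adj_b]])
    return laplacian(full)

def algebraic_connectivity(lap):
    """Second-smallest eigenvalue of a symmetric Laplacian."""
    return float(np.linalg.eigvalsh(lap)[1])  # eigvalsh sorts ascending

# Toy example: two 3-node complete networks.
triangle = np.ones((3, 3)) - np.eye(3)
one_to_one = np.eye(3)             # every node has one interlink
single_link = np.zeros((3, 3))
single_link[0, 0] = 1.0            # only one interlink in total
no_links = np.zeros((3, 3))        # networks not coupled at all

dense = algebraic_connectivity(combined_laplacian(triangle, triangle, one_to_one))
sparse = algebraic_connectivity(combined_laplacian(triangle, triangle, single_link))
uncoupled = algebraic_connectivity(combined_laplacian(triangle, triangle, no_links))
```

In this toy case the value is zero when the networks are uncoupled and grows as interlinks are added, which is the kind of sensitivity to interconnectivity that the average node degree alone does not express.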
    The major bottleneck in searching genomic databases is the sheer size of the databases involved. A number of different solutions to the problem of aligning query sequences to genomic databases have been proposed, including the widely used BLAST and FASTA systems. While such systems are effective for traditional applications such as query alignment, they do not scale well to applications such as whole genome shotgun sequencing and all-versus-all comparisons of one organism against another. The latter application has quadratic time complexity in the size of the databases involved and requires a different approach from BLAST-type search engines, which rely on a linear scan of the database. Our approach relies on a two-stage filter to prune a significant fraction of the database prior to alignment. The filter uses the MRS index [8] as the first stage, followed by a novel indexing scheme that we propose in this paper. The MRS index screens sequences that map to the same frequency vector and has been shown to produce speedups of up to 12 times over systems that do not employ such an index. However, the MRS index is inadequate against sequences that are inherently different while still mapping to the same frequency vector. Our filter, based on the prime factor indexing scheme, is successful in eliminating a large fraction of such false positives that survive the MRS index. Our experiments show that at least 75% of the false positives are eliminated, resulting in speedups of up to 5 times over the MRS indexing scheme.
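The false-positive problem described above can be illustrated with a small sketch. It assumes a frequency vector is simply the count of each nucleotide in a sequence; the prime factor indexing scheme itself is not described in the abstract and is not reproduced here. The helper names and the two example sequences are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"

def frequency_vector(seq):
    """Count occurrences of each nucleotide, in ALPHABET order."""
    counts = Counter(seq)
    return tuple(counts[base] for base in ALPHABET)

def mismatches(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

query = "ACGTACGT"
candidate = "AACCGGTT"

# Both sequences have two of each base, so a filter that compares only
# frequency vectors cannot tell them apart, although they differ at
# most positions.
same_vector = frequency_vector(query) == frequency_vector(candidate)
differing_positions = mismatches(query, candidate)
```

A second-stage filter is useful exactly for pairs like this one, which pass the frequency-vector screen but would still fail alignment.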
    The behaviour of multiple stock markets can be described within the framework of complex dynamic systems (CDS). Using a global model with the Kalman Filter we are able to extract the dynamic interaction network (DIN) of these markets. The model was shown to successfully capture interactions between stock markets in the long term. In this study we investigate the effectiveness
    ABSTRACT Typically, association rule mining considers only positive frequent itemsets in rule generation, so that only rules involving the presence of items are generated. In this paper we consider the complementary problem of negative association rule mining, which generates rules describing the absence of itemsets from transactions. We describe a new approach called MINR (Mining Interesting Negative Rules) to efficiently find all interesting negative association rules. In our approach, we only consider the presence or absence of itemsets that are strongly associated. Our approach does not require a user-defined support threshold, and is based on pruning coincidental itemsets. For every individual itemset we calculate two custom thresholds based on its support: the positive and negative chance thresholds. Itemsets whose support falls above their positive chance threshold are considered positively associated, whereas itemsets whose support falls below their negative chance threshold are considered negatively associated. We compared our implementation against Pearson phi correlation and found that MINR was able to generate sets of rules that are more interesting.
    Research in Weighted Association Rule Mining (WARM) has largely concentrated on mining traditional static transactional datasets. Whilst there have been a few attempts at researching WARM in a data stream environment, none have addressed the problem of assigning and adapting weights in the presence of concept drift, which often occurs in a data stream environment. In this research we experiment with two methods of adapting weights: firstly, a simplistic method that recomputes the entire set of weights at fixed intervals, and secondly, a method that relies on a distance function to assess the extent of change in the stream and updates only those items whose patterns of interaction have changed significantly. We show that the latter method is able to maintain good accuracy whilst being several times faster than the former.
    Association Rule Mining is an important data mining technique that has been widely used as an automatic rule generation method. While it has had outstanding success in many different application domains, it also has the potential to generate a vast number of rules, many of which are of little interest to the user. Weighted Association Rule Mining (WARM) overcomes this problem by assigning weights to items, thus enabling interesting rules to be ranked ahead of less interesting ones and making it easier for the user to determine which rules are the most useful. Past research on WARM assumes that users have the necessary knowledge to supply item weights. In this research we relax this assumption by deriving item weights based on interactions between items. Our experimentation shows that our scheme produces more compact rule bases with a higher information content than standard rule generation methods.
    ABSTRACT Weighted Association Rule Mining (WARM) is a technique that is commonly used to overcome the well-known limitations of the classical Association Rule Mining approach. The assignment of high weights to important items enables rules that express relationships between high-weight items to be ranked ahead of rules that only feature less important items. Most previous research on weight assignment has used subjective measures to assign weights and is reliant on domain-specific information. Whilst there have been a few approaches that automatically deduce weights from patterns of interaction between items, none of them take advantage of the situation where the weights of only a subset of items are known in advance. We propose a model, WeightTransmitter, that interpolates the unknown weights from a known subset of weights.
    ABSTRACT Association rule mining discovers relationships among items in a transactional database. Most approaches assume that all items within a dataset have a uniform distribution with respect to support. However, this is not always the case, and weighted association rule mining (WARM) was introduced to assign importance to individual items. Previous approaches to the weighted association rule mining problem require users to assign weights to items. In certain cases, it is difficult to provide weights for all items within a dataset. In this paper, the authors propose a method based on a novel Valency model that automatically infers item weights from interactions between items. The authors' experiments show that the weighting scheme results in rules that better capture the natural variation that occurs in a dataset when compared with a miner that does not employ a weighting scheme. The authors applied the model in a real-world application to mine text from a given collection of documents. The use of item weighting enabled the authors to attach more importance to terms that are distinctive. The results demonstrate that keyword discrimination via item weighting leads to informative rules.
    ABSTRACT Although frequent pattern mining techniques have been extensively studied, extending their application to data streams has been challenging. Because data streams are continuous and unbounded, an efficient algorithm that avoids multiple scans of the data is needed. In this paper we propose Kernel-Tree (KerTree), a single-pass, tree-structured technique that mines frequent patterns in a data stream by forecasting the support of current items in the future state. Unlike previous techniques that build a tree based on the support of items in the previous block, KerTree estimates item support in the next block and builds the tree based on that estimate. By building the tree on an estimated future state, KerTree reduces the need to restructure the tree for every block. This results in better performance, and KerTree mines the complete set of frequent patterns from the stream while maintaining a compact structure.
    ABSTRACT Intricate webs of interlinked critical infrastructures such as electrical grids, telecommunications, and transportation are essential for the minimal functioning of contemporary societies and economies. Advances in Information and Communication Technology (ICT) underpin the increasing interconnectivity of these systems, which has created new vulnerabilities that can be triggered by hardware failures, link cuts, human error, natural disasters, physical attacks and cyber-attacks. Failures of a fraction of nodes in one network may lead to failures of the dependent nodes in another network. Therefore, the main objective of this paper is to investigate the cascade phenomena caused by load shedding between two interconnected networks using Bak-Tang-Wiesenfeld sandpile modeling. We have found that the largest avalanches occur when node degree and/or interconnectivity links become dense. In addition, coupled random-regular networks tend to be more robust than coupled Erdős-Rényi networks. However, coupled random-regular networks are vulnerable to random attack, and coupled Erdős-Rényi networks are vulnerable to targeted attack, due to their degree distributions.
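The load-shedding cascade described above can be sketched with a simple sandpile on a graph. This is a minimal illustration under stated assumptions, not the paper's simulation: each node is assumed to topple when its load reaches its degree, shedding one grain to every neighbour, and grains are assumed to leave the system through designated sink nodes. The function name, the sink mechanism and the tiny coupled network are all hypothetical.

```python
from collections import deque

def add_grain(adj, load, start, sinks):
    """Drop one grain on `start`, relax the network, return the avalanche size.

    adj   : dict mapping node -> list of neighbours (undirected graph)
    load  : dict mapping node -> current number of grains (mutated in place)
    sinks : set of nodes that absorb grains (assumed dissipation mechanism)
    """
    load[start] += 1
    queue = deque([start])
    topplings = 0
    while queue:
        node = queue.popleft()
        if node in sinks:
            load[node] = 0          # grains reaching a sink leave the system
            continue
        degree = len(adj[node])
        if degree == 0:
            continue                # isolated node: nothing to shed to
        while load[node] >= degree: # assumed threshold: capacity = degree
            load[node] -= degree
            topplings += 1
            for neighbour in adj[node]:
                load[neighbour] += 1
                queue.append(neighbour)
    return topplings

# Network A = {a0, a1}, network B = {b0, b1}, one interlink a1-b0,
# and a sink attached to b1.
adj = {
    "a0": ["a1"],
    "a1": ["a0", "b0"],
    "b0": ["a1", "b1"],
    "b1": ["b0", "sink"],
    "sink": ["b1"],
}
load = {node: 0 for node in adj}
sizes = [add_grain(adj, load, "a0", {"sink"}) for _ in range(3)]
```

In this toy run the avalanche started in network A grows with each added grain and, by the third grain, crosses the interlink and topples a node in network B.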
    Transaction clustering has received a great deal of attention in the past few years. Its functionality extends well beyond traditional clustering algorithms which basically perform a near-neighbourhood search for locating groups of similar instances. The basic concept underlying transaction clustering stems from the concept of large items as defined by association rule mining algorithms. Clusters formed on the basis of
    ... School of Computing and Mathematical Science, Auckland University of Technology, Auckland, New Zealand {Muhammad.Usman,Russel.Pears}@aut ... Occupation Group1 Group2 Group3 Group-Others Craft-repair Machine-op-inspct Protective-serv Farming-fishing Transport ...
    Recent research in data mining has focussed on developing new algorithms for mining high-speed data streams. Most real-world data streams have in common that the underlying data generation mechanism changes over time, introducing so-called concept drift into the data. Many current algorithms incorporate a time-based window to be able to cope with drift in order to keep their model up-to-date
    Probably the single most important concern of the biotechnology industry is to improve existing biotechnology applications and tools due to the exponential increase in the size of the datasets. Improving application specific performance, pertaining to sudden and dynamic changes in the execution environment has been a widely researched problem. This research was undertaken to find application specific performance problems in
    ... Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland, New Zealand {wpj6371,russel.pears,nkasabov}@aut.ac.nz ... patient by looking for similar conditions (by comparing features of the patient) from past patient data (Chan 2006 ...
    Time-series prediction has been very well researched by both the Statistical and Data Mining communities. However, the multiple time-series problem of predicting the simultaneous movement of a collection of related, time-sensitive variables has received much less attention. Strong relationships between variables suggest that trajectories of given variables that are involved in the relationships can be
