
Wee Hong Ong

Open-Set Recognition (OSR) emphasizes the capability to reject unknown classes while simultaneously maintaining closed-set performance. The primary objective of OSR is to minimize the risk of unknown classes being predicted as one of the known classes. OSR assumes that unknown classes are present during testing and identifies only one distribution of unknowns. However, distinguishing unknowns within the domain of interest from those outside it would benefit future learning endeavors: rejected unknown samples within the domain of interest could be leveraged for the further development of deep learning models. We introduce and formalize the open domain-specific space risk to address the recognition of two unknown distributions. As an initial baseline, we propose quad-channel self-attention reciprocal point learning to mitigate open-space risk and an autoencoder to mitigate open domain-specific space risk. We utilize the knowled...
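A reconstruction-error gate is one common way to separate in-domain samples from out-of-domain unknowns. As a hedged illustration of the autoencoder half of such a baseline (the reciprocal-point half is not sketched), the following uses a PCA-style linear autoencoder as a stand-in; the architecture, data, and 95th-percentile threshold rule are assumptions for the sketch, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "known class" training data: points near a low-dimensional subspace.
known = rng.normal(0.0, 0.1, size=(200, 5))
known[:, 0] += 3.0  # shift along one axis

# Fit a linear "autoencoder" (top-k principal components) on known data.
mean = known.mean(axis=0)
_, _, vt = np.linalg.svd(known - mean, full_matrices=False)
components = vt[:2]  # encoder/decoder weights (k = 2)

def reconstruction_error(x):
    """Project onto the learned subspace and measure the residual."""
    z = (x - mean) @ components.T          # encode
    x_hat = z @ components + mean          # decode
    return np.linalg.norm(x - x_hat, axis=-1)

# Threshold chosen from the known data (95th percentile of training errors).
threshold = np.percentile(reconstruction_error(known), 95)

def is_known(x):
    """Accept a sample only if the autoencoder reconstructs it well."""
    return reconstruction_error(x) <= threshold

# A sample far from the known distribution is rejected as unknown.
unknown_sample = np.full((1, 5), 10.0)
print(is_known(mean[None]), is_known(unknown_sample))
```

Samples the autoencoder cannot reconstruct are rejected; a second model would then decide whether a rejected sample is in-domain (worth keeping for future learning) or out-of-domain.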
Most deep clustering methods, despite providing complex networks to learn better representations from data, use a shallow clustering method. These methods have difficulty finding good clusters because they lack a mechanism to balance local search and global search and thereby prevent premature convergence. In other words, they do not consider different aspects of the search, which causes them to get stuck in local optima. In addition, the majority of existing deep clustering approaches perform clustering with prior knowledge of the number of clusters, which is not practical in most real scenarios where such information is unavailable. To address these problems, this paper presents a novel automatic deep sparse clustering approach based on an evolutionary algorithm called Multi-Trial Vector-based Differential Evolution (MTDE). A sparse auto-encoder is first applied to extract embedded features. Manifold learning is then adopted to obtain a representation and extract the spatial structure of the features. A...
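MTDE builds on differential evolution; a minimal sketch of the underlying idea — classic DE/rand/1 with candidate solutions encoded as flattened centroid sets and sum-of-squared-errors as fitness — is shown below. The multi-trial strategies, inferior-solution archive, and sparse auto-encoder are omitted; all data and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-D data with two visible groups.
data = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(3, 0.2, (30, 2))])
k, dim = 2, 2

def sse(candidate):
    """Fitness: sum of squared distances to each point's nearest centroid."""
    centroids = candidate.reshape(k, dim)
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).sum()

# Population of candidate centroid sets, encoded as flat vectors.
pop = rng.uniform(data.min(), data.max(), size=(10, k * dim))
F, CR = 0.8, 0.9  # mutation factor and crossover rate

for _ in range(100):
    for i in range(len(pop)):
        others = [j for j in range(len(pop)) if j != i]
        a, b, c = pop[rng.choice(others, size=3, replace=False)]
        mutant = a + F * (b - c)                 # DE/rand/1 mutation
        mask = rng.random(k * dim) < CR          # binomial crossover
        trial = np.where(mask, mutant, pop[i])
        if sse(trial) < sse(pop[i]):             # greedy selection
            pop[i] = trial

best = pop[np.argmin([sse(p) for p in pop])].reshape(k, dim)
print(best)
```

The greedy selection keeps the population improving monotonically; the paper's contribution is precisely in replacing the single mutation strategy with several cooperating ones.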
Most research in human activity recognition is supervised, while the non-supervised approaches are not completely unsupervised. Moreover, these methods cannot be used in real-time applications due to their high computational cost. In this paper, we provide a novel flexible multi-objective particle swarm optimization clustering method based on game theory (FMOPG) to discover human activities in a fully unsupervised manner. Unlike conventional clustering methods that estimate the number of clusters and are time-consuming and inaccurate, an incremental technique is introduced that makes the proposed method flexible in dealing with the number of clusters and improves the speed of clustering. By adopting this technique, clusters with better connectedness and good separation from other clusters are gradually selected. The updating of particles' velocity is modified by adopting the concept of the mean-shift vector to improve the convergence speed of PSO in reaching the best solution and dealing with non-spherical sh...
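For reference, the canonical PSO velocity update that FMOPG modifies is sketched below on a toy single-objective function. The mean-shift modification, multi-objective machinery, and incremental cluster selection are not reproduced; swarm size, coefficients, and the objective are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def sphere(x):
    """Toy objective to minimize: minimum at the origin."""
    return (x ** 2).sum(axis=-1)

n, dim = 20, 3
pos = rng.uniform(-5, 5, (n, dim))
vel = np.zeros((n, dim))
pbest = pos.copy()                         # per-particle best positions
gbest = pos[np.argmin(sphere(pos))]        # global best position

w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients
for _ in range(200):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    # Canonical update: inertia + cognitive pull + social pull.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    better = sphere(pos) < sphere(pbest)
    pbest[better] = pos[better]
    gbest = pbest[np.argmin(sphere(pbest))]

print(sphere(gbest))
```

FMOPG replaces the social term's behavior with a mean-shift-style vector so particles climb the local density gradient, which is what helps with non-spherical cluster shapes.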
Despite many advances in human activity recognition, most existing works are conducted with supervision. Supervised methods rely on labeled training data; however, obtaining labeled data is difficult, costly, and time-consuming. In this paper, we introduce an automatic multi-objective particle swarm optimization clustering method based on Gaussian mutation and game theory (MOPGMGT) to tackle the problem of fully unsupervised human activity discovery, mapping the multi-objective clustering problem to game theory to obtain the best solution. The proposed algorithm does not require any prior knowledge of the number of activities to be discovered and can find the optimal number. Multi-objective optimization problems typically do not have a single optimal solution. To address this, Nash Equilibrium (NE) is applied to the Pareto front to choose the best solution. NE does not just look for the best individual solution, but optimizes the final solution by considering the effect of choosing ea...
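A sketch of the two-stage idea — extract the Pareto front of non-dominated clusterings, then pick one member of the front — is shown below. The distance-to-ideal pick here is only a stand-in for the paper's Nash-equilibrium selection, and the objective values are invented for illustration.

```python
import numpy as np

# Candidate clusterings scored on two objectives to MINIMIZE
# (e.g. compactness and connectivity); values are illustrative.
scores = np.array([
    [1.0, 5.0],
    [2.0, 3.0],
    [3.0, 2.5],
    [4.0, 1.0],
    [3.5, 3.5],   # dominated by [2.0, 3.0]
    [5.0, 4.0],   # dominated by several others
])

def pareto_front(points):
    """Keep points not dominated by any other (both objectives minimized)."""
    keep = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            keep.append(i)
    return np.array(keep)

front = pareto_front(scores)
# Stand-in for the Nash-equilibrium pick: the front member closest to the
# ideal point (the per-objective minima over the front).
ideal = scores[front].min(axis=0)
best = front[np.argmin(np.linalg.norm(scores[front] - ideal, axis=1))]
print(front, best)
```

No front member is better on both objectives at once, which is why a selection rule such as NE is needed to commit to a single clustering.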
Many algorithms have been proposed to solve the clustering problem. However, most of them lack a proper strategy to maintain a good balance between exploration and exploitation and thereby prevent premature convergence. Multi-Trial Vector-based Differential Evolution (MTDE) is an improved differential evolution (DE) algorithm that combines three strategies and distributes the population among them to avoid getting stuck at a local optimum. In addition, it records inferior solutions to share information about visited regions with solutions of the next generations. In this paper, an improved version of the Multi-Trial Vector-based Differential Evolution (IMTDE) algorithm is proposed and adapted for clustering data. The purpose is to enhance the balance between the exploration and exploitation mechanisms in MTDE by employing Gaussian crossover and modifying the sub-population distribution between the strategies. To evaluate the performance of the proposed cluster...
Human activity discovery aims to distinguish the activities performed by humans without any prior information about what defines each activity. Most methods presented for human activity recognition are supervised, requiring labeled inputs to train the system. In reality, it is difficult to label data because of its huge volume and the variety of activities performed by humans. In this paper, a novel unsupervised approach is proposed to perform human activity discovery in 3D skeleton sequences. First, important frames are selected based on kinetic energy. Next, joint displacement, statistical, angle, and orientation features are extracted to represent the activity information. Since not all extracted features carry useful information, their dimension is reduced using PCA. Most proposed human activity discovery methods are not fully unsupervised: they use pre-segmented videos before categorizing activities. To deal with this, we used a fragmented sliding time w...
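The kinetic-energy frame selection step can be sketched as follows: approximate each frame's energy by the squared joint displacement from the previous frame and keep the high-energy frames. The skeleton data, threshold rule, and dimensions here are invented for the sketch, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy skeleton sequence: 50 frames, 15 joints, 3-D coordinates, with a
# burst of fast motion injected around frames 20-24.
frames = np.cumsum(rng.normal(0, 0.01, (50, 15, 3)), axis=0)
frames[20:25] += rng.normal(0, 0.5, (5, 15, 3))

# Per-frame kinetic-energy proxy: squared joint displacement between
# consecutive frames, summed over joints (mass and the 1/2 factor dropped).
velocity = np.diff(frames, axis=0)
energy = (velocity ** 2).sum(axis=(1, 2))

# Keep frames whose energy exceeds a data-driven threshold.
threshold = energy.mean() + energy.std()
key_frames = np.where(energy > threshold)[0] + 1  # +1: diff shifts indices
print(key_frames)
```

Only the selected frames are passed on to feature extraction, which discards near-static frames that carry little information about the activity.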
As the internet has become more accessible, the use of Internet of Things (IoT) systems is increasing. An IoT system can be accessed either by directly connecting to a network configured with external access and appropriate port forwarding, or through a cloud server. Direct access through the network may not be possible for networks that restrict external access, and it requires technical know-how to configure the port forwarding. The use of a cloud server is convenient in that users simply sign up for an account and use a user-friendly application to set up the system. This approach, however, requires the manufacturer to maintain the cloud server, raises the concern of users' data being accessible to the owner of the cloud server, and does not allow interoperability of devices and systems across different manufacturers. In this paper, we propose the use of publicly available cloud storage services to provide accessibility to IoT devices without relying on a specific cloud server, and without requiring the manufacturer to maintain their own cloud server. The proposed approach allows interoperability between devices or systems from different manufacturers. This paper describes the concept, implementation, and evaluation of the proposed cloud-storage-based IoT system. The proposed IoT system has been implemented to work with Google Drive, Dropbox, and Microsoft OneDrive. Its performance has been evaluated against dedicated IoT cloud servers including Microsoft Azure IoT, Google IoT, CloudMQTT (CMQTT), and Eclipse IoT. The results show that the proposed system can be effectively used in systems that are not time critical.
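The core exchange pattern — devices publish state as files in cloud storage and clients poll for them — can be sketched with a local directory standing in for a folder synchronized by a cloud-storage client (as Drive/Dropbox/OneDrive sync clients do). File layout, message schema, and device IDs are hypothetical, not the paper's protocol.

```python
import json
import tempfile
import time
from pathlib import Path

# Stand-in for a folder synchronized by a cloud-storage client.
shared = Path(tempfile.mkdtemp())

def publish(device_id, payload):
    """Device side: write the latest state as a JSON file in the folder."""
    msg = {"device": device_id, "ts": time.time(), "data": payload}
    (shared / f"{device_id}.json").write_text(json.dumps(msg))

def poll(device_id):
    """Client side: poll the folder for the device's latest state."""
    path = shared / f"{device_id}.json"
    return json.loads(path.read_text()) if path.exists() else None

publish("lamp-01", {"state": "on", "brightness": 80})
msg = poll("lamp-01")
print(msg["data"])
```

Polling a synchronized folder introduces latency on the order of the sync interval, which matches the paper's conclusion that the approach suits systems that are not time critical.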
Human activity detection is usually achieved through supervised learning, where a model is learned from given samples of each activity. With this approach, it is difficult for intelligent systems to automatically discover and learn new activities. We attempt to distinguish human activities using unsupervised learning. The approach relies on the fact that, given appropriate features, the distinction between different activities can be observed from simple distance measurements. In this paper, we present our investigation into extracting features from the coordinates of human joint positions based on the human range of movement, and we report the results of tests performed to check their effectiveness in distinguishing sixteen (16) example activities. A simple unsupervised learning method, K-means clustering, was used to evaluate the effectiveness of the features. The results indicate that the features based on range of movement significantly improved clustering performance.
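The evaluation step can be sketched as plain K-means over range-of-movement features. The feature values and activity names below are invented for illustration; the point is that well-separated features let a simple distance-based clusterer recover the activities.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy "range of movement" features for three activities: per-joint extent
# of motion over a time window (illustrative values, not the paper's data).
features = np.vstack([
    rng.normal([0.1, 0.1], 0.02, (20, 2)),  # sitting: little movement
    rng.normal([0.8, 0.2], 0.02, (20, 2)),  # waving: one arm moves a lot
    rng.normal([0.6, 0.7], 0.02, (20, 2)),  # walking: several joints move
])

def kmeans(X, init, iters=50):
    """Plain Lloyd's algorithm; `init` gives starting centroid indices
    (k-means++ or random restarts would be used in practice)."""
    centroids = X[init].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == j].mean(axis=0)
                              for j in range(len(centroids))])
    return labels, centroids

labels, centroids = kmeans(features, init=[0, 20, 40])
print(np.bincount(labels))
```

When the range-of-movement features separate activities this cleanly, each cluster maps one-to-one onto an activity without any labels.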
Face detection and tracking algorithms mainly suffer from low accuracy, slow processing speed, and poor robustness in real-time setups. The problem becomes crucial in real-time situations such as human-robot interaction (HRI) or video analysis. A margin-based region of interest (ROI) hybrid approach that combines Haar cascade and template matching for face detection and tracking is proposed in this paper to improve detection accuracy and processing speed. To speed up processing, regions of interest (ROIs) with fixed and dynamic margin concepts are used. A dataset comprising ten fifteen-second RGB video streams has been created from real-life videos containing a person in a lecture-delivering environment. Each video contains the person's movement, face turning, and camera movement. An accuracy of 97.96% with a processing time of 10.76 ms per frame has been achieved. The proposed algorithm can detect and track faces in sideway orientations apart from the frontal face, and can process the video streams at speeds above 90 frames per second (FPS). Compared to conventional full-frame scanning techniques, the proposed approach reduces processing time by ten times with a boost to accuracy.
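The margin-based ROI idea can be sketched as a small pure function: expand the previous detection box by a margin and clamp it to the frame, so the detector only scans that window on the next frame. The function name and numbers are illustrative.

```python
def roi_with_margin(box, margin, frame_w, frame_h):
    """Expand the last detection box by a margin and clamp to the frame.

    `box` is (x, y, w, h) in pixels; `margin` is a fixed margin — the
    dynamic variant would scale it with recent face movement.
    """
    x, y, w, h = box
    x0 = max(0, x - margin)
    y0 = max(0, y - margin)
    x1 = min(frame_w, x + w + margin)
    y1 = min(frame_h, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)

# On the next frame the Haar cascade scans only this ROI; if it finds no
# face there, template matching with the last face patch takes over.
roi = roi_with_margin((300, 200, 100, 100), margin=40, frame_w=640, frame_h=480)
print(roi)  # (260, 160, 180, 180)
```

Scanning a 180x180 window instead of a 640x480 frame is where the order-of-magnitude speedup comes from.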
Real-time face detection and tracking systems suffer from low accuracy and slow processing speed, which lead to poor robustness. This problem is vital in real-time setups including human-robot interaction (HRI) and video analysis systems. This paper presents a margin-based region of interest (MROI) approach to speed up processing, together with a hybrid approach that combines Multi-task Convolutional Neural Networks (MTCNN) with template matching to improve face detection accuracy. The MROI approach, which is responsible for speeding up processing, is presented in two variants with fixed and dynamic margin concepts. The dataset used in this work comprises twenty RGB video files, each fifteen seconds long and created from real-life videos containing a person in a lecture-delivering environment. In each video the person moves, turns their head, and the camera also moves. The highest face detection and tracking accuracy achieved in this paper is 99.31%, with a processing time of 14.93 milliseconds per frame. The proposed hybrid algorithm significantly improves the ability to detect and track faces in sideway orientations apart from the frontal face, and can process above 65 frames per second (FPS). The presented system increases FPS processing ability by more than 400% and boosts accuracy compared to conventional MTCNN full-frame scanning techniques.
This paper presents an implementation of autonomous navigation functionality based on the Robot Operating System (ROS) on a wheeled differential-drive mobile platform called the Eddie robot. ROS is a framework that contains many reusable software stacks as well as visualization and debugging tools, providing an ideal environment for any robotic project development. The main contribution of this paper is the description of the customized hardware and software setup of the Eddie robot to work with the autonomous navigation system in ROS, called the Navigation Stack, and the implementation of one application use case for autonomous navigation. Photo taking is chosen to demonstrate a use case of the mobile robot.
Herbaria contain treasures of millions of specimens, preserved over many years for scientific study. To speed up scientific discovery, digitization of these specimens is currently ongoing to facilitate easy access and sharing of their data with the wider scientific community. Online digital repositories such as iDigBio and GBIF have already accumulated millions of specimen images yet to be explored. This presents a perfect time to automate and speed up novel discoveries using machine learning and computer vision. In this study, a thorough analysis and comparison of more than 50 peer-reviewed studies focusing on the application of computer vision and machine learning techniques to digitized herbarium specimens has been carried out. The study categorizes the different techniques and applications that have been commonly used, and it also highlights existing challenges together with their possible solutions. It is our hope that the outcome of this study will serve ...
The identification of plant species is fundamental for the effective study and management of biodiversity. For automated plant species classification, a combination of leaf features such as shape, texture, and color is commonly used. However, in herbaria, the samples collected for each species are often limited, and during the preservation step some of the feature details disappear, making automated classification a challenging task. In this study, we applied machine learning techniques to automate herbarium species identification from leaf traits extracted from images of the families Annonaceae, Euphorbiaceae, and Dipterocarpaceae. Furthermore, we investigated the application of the Synthetic Minority Over-sampling Technique (SMOTE) in improving classifier performance on the imbalanced datasets. Three machine learning techniques, namely Linear Discriminant Analysis (LDA), Random Forest (RF), and Support Vector Machine (SVM), were applied with and without SMOTE. For Annonaceae species, the best accuracy was 56%, by LDA after applying SMOTE. For Euphorbiaceae, the best accuracy was 79%, by SVM without SMOTE. For inter-species classification between Annonaceae and Euphorbiaceae, the best accuracy of 63% was achieved by LDA without SMOTE. An accuracy of 85% was achieved by LDA for Dipterocarpaceae species, while 91% accuracy was obtained by both RF and SVM for inter-family classification between the two balanced datasets of Annonaceae and Euphorbiaceae. The results of this study show the feasibility of using extracted traits to build accurate species identification models for the families Dipterocarpaceae and Euphorbiaceae; however, the features used did not yield good results for the Annonaceae family. Furthermore, there was no significant improvement when the SMOTE technique was applied.
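The SMOTE step can be sketched directly: each synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority-class neighbours. This is a from-scratch sketch of the technique (libraries such as imbalanced-learn provide a production implementation); the toy leaf-trait data are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

def smote(X_minority, n_new, k=5):
    """Generate synthetic minority samples by interpolating each chosen
    sample with one of its k nearest minority-class neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        d = np.linalg.norm(X_minority - X_minority[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_minority[i] + gap * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)

# Imbalanced toy leaf-trait data: 6 minority samples, oversampled to 20.
minority = rng.normal([2.0, 3.0], 0.1, (6, 2))
new_samples = smote(minority, n_new=14)
print(new_samples.shape)
```

Because the new points lie on segments between existing minority samples, SMOTE densifies the minority region rather than merely duplicating points — which is why it can help a classifier, but also why it cannot add genuinely new information when the minority class is unrepresentative.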
Automated identification of herbarium species is of great interest, as quite a number of these collections are still unidentified while others need to be updated following recent taxonomic knowledge. One challenging task in the automated identification of these species is the presence of visual noise such as plant information labels, color codes, and other scientific annotations, which are mostly placed at different locations on the herbarium mounting sheet. This kind of noise needs to be removed before applying species identification models, as it can significantly affect their performance. In this work we propose the use of deep learning semantic segmentation models as a method for removing the background noise from herbarium images. Two different semantic segmentation models, namely DeepLab version three plus (DeepLabv3+) and the Full-Resolution Residual Networks (FRNN-A), were applied and evaluated in this study. The results indicate that FRNN-A performed slightly better, with a mean Intersection over Union (IoU) of 99.2% compared to the 98.1% mean IoU attained by the DeepLabv3+ model on the test set. The pixel-wise accuracy for the two classes (herbarium specimen and background) was 99.5% and 99.7%, respectively, using the FRNN-A model, while DeepLabv3+ segmented the herbarium specimen and the rest of the background with pixel-wise accuracies of 98.4% and 99.6%, respectively. This work evidently suggests that deep learning semantic segmentation can be successfully applied as a pre-processing step to remove visual noise from herbarium images before applying classification models.
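The two metrics reported above — mean Intersection over Union and per-class pixel-wise accuracy — can be computed as follows for the two-class (specimen vs. background) setting; the tiny masks are illustrative.

```python
import numpy as np

def mean_iou(pred, truth, classes=(0, 1)):
    """Mean Intersection over Union across classes
    (0 = background, 1 = specimen in the two-class herbarium setting)."""
    ious = []
    for c in classes:
        inter = np.logical_and(pred == c, truth == c).sum()
        union = np.logical_or(pred == c, truth == c).sum()
        ious.append(inter / union)
    return np.mean(ious), ious

def pixel_accuracy(pred, truth, c):
    """Fraction of ground-truth class-c pixels predicted as class c."""
    return np.logical_and(pred == c, truth == c).sum() / (truth == c).sum()

truth = np.zeros((4, 4), dtype=int)
truth[1:3, 1:3] = 1                 # a 2x2 specimen region
pred = truth.copy()
pred[1, 1] = 0                      # one specimen pixel missed

miou, per_class = mean_iou(pred, truth)
print(miou, pixel_accuracy(pred, truth, 1))
```

IoU penalizes both missed and spurious pixels, which is why it is the stricter of the two metrics when the specimen occupies only part of the sheet.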
Human activity recognition is an important ability in any system that supports humans in performing their daily activities. However, current supervised approaches to human activity recognition are difficult to deploy in natural human living environments, where labeled observations are scarce. In this paper, we demonstrate the use of K-means clustering and simple template models to achieve human activity detection and recognition in an unsupervised manner. The features used are extracted from skeleton data obtained from an inexpensive RGBD (RGB-Depth) sensor. Our results show an average detection performance of 80.4% precision and 83.8% recall. The availability of an unsupervised approach to human activity recognition will make possible its wide adoption in natural human living environments.
This paper describes the proposed methodology, the data used, and the results of our participation in Challenge Track 2 (Expr Challenge Track) of the Affective Behavior Analysis in-the-wild (ABAW) Competition 2020. In this competition, we used a proposed deep convolutional neural network (CNN) model to perform automatic facial expression recognition (AFER) on the given dataset. Our proposed model achieved an accuracy of 50.77% and an F1 score of 29.16% on the validation set.
The ability to understand what humans are doing is crucial for any intelligent system that autonomously supports human daily activities. Technologies to enable such ability, however, are still underdeveloped due to the many challenges in human activity analysis. Among them are the difficulties in extracting human poses and motions from raw sensor data, recorded from either visual or wearable sensors, and the need to recognize activities not seen before using unsupervised learning. Furthermore, human activity analysis usually requires expensive sensors or sensing environments. With the availability of low-cost RGBD (RGB-depth) sensors, this new form of data can provide human posture data with a high degree of confidence. In this paper, we present our approach to extracting features directly from such data (joint positions) based on the human range of movement, and we report the results of tests performed to check their effectiveness in distinguishing sixteen (16) example activities. Simple unsuper...
With the increase in digitization efforts of herbarium collections worldwide, dataset repositories such as iDigBio and GBIF now have hundreds of thousands of herbarium sheet images ready for exploration. Although this serves as a new source of plant leaf data, herbarium datasets pose an inherent challenge: the sheets contain other non-plant objects such as color charts, barcodes, and labels. Even for the plant part itself, a combination of overlapping, damaged, and intact individual leaves exists together with other plant organs such as stems and fruits, which increases the complexity of leaf trait extraction and analysis. Focusing on segmentation and trait extraction for individual intact herbarium leaves, this study proposes a pipeline consisting of a deep learning semantic segmentation model (DeepLabv3+), connected component analysis, and a single-leaf classifier trained on binary images to automate the extraction of an intact individual leaf with pheno...
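The connected component analysis step of the pipeline can be sketched as a 4-connected flood-fill labelling of the binary segmentation mask; each labelled blob is then a candidate leaf for the single-leaf classifier. The tiny mask below is invented; production code would use a library routine (e.g. OpenCV's or SciPy's connected-components functions).

```python
from collections import deque

import numpy as np

def label_components(mask):
    """4-connected component labelling of a binary mask via BFS flood fill."""
    labels = np.zeros_like(mask, dtype=int)
    current = 0
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not labels[sy, sx]:
                current += 1                       # start a new component
                labels[sy, sx] = current
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current

# Toy segmentation mask: two separate "leaf" blobs.
mask = np.zeros((6, 8), dtype=bool)
mask[1:3, 1:3] = True
mask[3:5, 5:7] = True
labels, n = label_components(mask)
# Each component can then be cropped and passed to the single-leaf
# classifier so only intact individual leaves are kept.
print(n)
```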
Automated facial expression recognition (AFER) has become an important research area with several computer vision (CV) applications. A robust AFER system requires sufficient good-quality training and testing data for the development and evaluation of a robust AFER model. A number of AFER datasets exist, and there is an increasing number of research works in AFER. However, research in AFER has not matured to the stage where openly available platforms or toolsets implement the full pipeline of AFER system development. Newcomers to the field face various challenges: 1) images in the datasets are messy or of low resolution; 2) the data are not organized into separate training and testing sets for fair evaluation; 3) the majority of the datasets are very small, making them insufficient for training a model; 4) some datasets do not provide important facial features; 5) it is unclear which dataset to start with; and 6) there are no development frameworks or methodologies to syste...
This paper investigates face detection and tracking systems, addressing the problem of low accuracy and slow processing speed in real-time systems. A margin-based region of interest (MROI) approach is presented to speed up processing; the proposed MROI algorithm is presented with fixed and dynamic margin concepts. A hybrid system is proposed to boost accuracy and overcome the deficiency of the main detection algorithm. This approach consists of two routines (i.e., a main routine and an escape routine). Three algorithms are used as main routines (i.e., Haar cascade, Joint cascade, and Multi-task Convolutional Neural Networks (MTCNN)). To evaluate the effectiveness of the proposed hybrid approach and to improve the different algorithms, a template matching (TM) algorithm is used as the escape routine. The dataset used includes twenty RGB videos containing a person in a lecture-delivering environment. To test the system against non-frontal face orientations, it is ensured that the person in each video moves and turns their head, and that the camera also moves. It is observed that the complex nature of MTCNN detects and tracks faces in non-frontal orientations with better accuracy. The improved processing speed also enhanced the frames-per-second (FPS) processing ability by four times compared to conventional full-frame scanning techniques. After applying different combinations and thorough analysis, the dynamic-margin-based model combining MTCNN and TM was found to give the highest average accuracy with low average processing time.
Face detection and tracking algorithms mainly suffer from low accuracy, slow processing speed, and poor robustness when meet with real-time setup. The problem becomes crucial in real-time situations such as in human robot interactions... more
Face detection and tracking algorithms mainly suffer from low accuracy, slow processing speed, and poor robustness when meet with real-time setup. The problem becomes crucial in real-time situations such as in human robot interactions (HRI) or video analysis. A margin-based region of interest (ROI) hybrid approach that combines Haar cascade and template matching for face detection and tracking is proposed in this paper to improve the detection accuracy and processing speed. To speed up the processing time, region of interests (ROIs) with fixed and dynamic margin concepts are used. A dataset comprising of ten RGB video streams of fifteen seconds have been created from real-life videos containing a person in lecture delivering environment.  In each video, there exists person’s movement, face turning and camera movements. An accuracy of 97.96% with processing time of 10.76 milliseconds per frame has been achieved. The proposed algorithm can detect and track faces in sideway orientation apart from frontal face. The proposed approach can process the video streams at the speed above 90 frames per second (FPS). The proposed approach reduces processing time by ten times and with a boost to accuracy in comparison to the conventional full frame scanning techniques.
As the internet has become more accessible, the use of Internet of Things (IoT) systems is increasing. An IoT system can be accessed either by directly connecting to a network configured with external access and appropriate port forwarding, or through a cloud server. Direct access through the network may not be possible for networks that restrict external access, and it requires technical know-how to configure port forwarding. Using a cloud server is convenient in that users simply sign up for an account and use a user-friendly application to set up the system. This approach, however, requires the manufacturer to maintain the cloud server, raises the concern of users' data being accessible to the owner of the cloud server, and does not allow interoperability of devices and systems across different manufacturers. In this paper, we propose the use of publicly available cloud storage services to provide accessibility to IoT devices without relying on a specific cloud server and without requiring the manufacturer to maintain its own cloud server. The proposed approach allows interoperability between devices or systems from different manufacturers. This paper describes the concept, implementation, and evaluation of the proposed cloud-storage-based IoT system. The proposed IoT system has been implemented to work with Google Drive, Dropbox, and Microsoft OneDrive. Its performance has been evaluated against dedicated IoT cloud servers including Microsoft Azure IoT, Google IoT, CloudMQTT (CMQTT), and Eclipse IoT. The results show that the proposed approach can be effectively used in systems that are not time-critical.
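The core idea, using a synced storage folder as the transport instead of a dedicated IoT broker, can be sketched with plain file I/O standing in for a Google Drive/Dropbox/OneDrive sync directory. The file names and JSON message shape here are illustrative assumptions, not the paper's protocol.

```python
import json
import pathlib
import tempfile

def publish(sync_dir, device_id, reading):
    """Device side: write the latest reading to a file that the cloud
    storage client is assumed to sync for us."""
    path = pathlib.Path(sync_dir) / f"{device_id}.json"
    path.write_text(json.dumps(reading))

def poll(sync_dir, device_id):
    """Controller side: read the synced file; returns None if the
    device has not published yet."""
    path = pathlib.Path(sync_dir) / f"{device_id}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())

# Demo: a temp directory stands in for the synced cloud folder.
with tempfile.TemporaryDirectory() as sync_dir:
    publish(sync_dir, "sensor-01", {"temp_c": 27.5})
    print(poll(sync_dir, "sensor-01"))  # → {'temp_c': 27.5}
```

Because delivery depends on the storage provider's sync latency rather than a push-based broker, this design suits the non-time-critical systems the evaluation identifies.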
Real-time face detection and tracking systems suffer from low accuracy and slow processing speed, which lead to poor robustness. This problem is vital in real-time setups including human-robot interaction (HRI) and video analysis systems. This paper presents a margin-based region of interest (MROI) approach to speed up the processing time. Further, a hybrid approach is presented that combines Multi-task Convolutional Neural Networks (MTCNN) with template matching to improve face detection accuracy. The MROI approach, which is responsible for speeding up processing, is presented in two variants with fixed and dynamic margin concepts. The dataset used in this work comprises twenty RGB video files. Each video file is fifteen seconds long and was created from real-life videos containing a person in a lecture-delivering environment. In each video, the person moves and turns their head, and the camera also moves. The highest face detection and tracking accuracy achieved in this paper is 99.31%, with a processing time of 14.93 milliseconds per frame. The proposed hybrid algorithm significantly improves the ability to detect and track faces in sideways orientations in addition to frontal faces. The proposed algorithm can process above 65 frames per second (FPS). The system presented increases the FPS processing rate by more than 400% and also boosts accuracy compared to conventional MTCNN full-frame scanning techniques.
The ability to understand what humans are doing is crucial for any intelligent system to autonomously support human daily activities. Technologies to enable such an ability, however, are still undeveloped due to the many challenges in human activity analysis. Among them are the difficulties in extracting human poses and motions from raw sensor data, whether recorded from visual or wearable sensors, and the need to recognize activities not seen before using unsupervised learning.
At the current stage, the majority of human activity recognition (HAR) technologies are based on supervised learning, where labeled data are available to train an expert system. In this paper, we propose a framework based on unsupervised learning to autonomously discover, learn, and recognize atomic activities, i.e., the actions. The input to the HAR framework is a sample pool of unlabeled observations of an unknown number of actions. An incremental action discovery algorithm based on K-means is used to discover new actions. For each new action discovered, a learning algorithm models it through an automated training and cross-validation cycle. The algorithm uses a Mixture of Gaussians Hidden Markov Model (HMM) to model the actions, and it autonomously determines the appropriate number of Gaussian components and states. The framework deals with the dynamic and noisy nature of the data. We evaluated the proposed framework on a third-party dataset of daily activities, and the results show that its performance is on par with that achieved using a supervised learning algorithm to recognize the activities from the same dataset.
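Automatically choosing the number of Gaussian components, as the framework above does, can be illustrated with a standard information-criterion sweep. The sketch below uses scikit-learn's GaussianMixture and BIC as a simplified stand-in for the paper's automated training and cross-validation cycle; the paper models actions with Gaussian-mixture HMMs, whereas this example shows only the component-selection idea on synthetic data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic "action features": two well-separated Gaussian blobs.
data = np.vstack([
    rng.normal(0.0, 0.5, size=(100, 2)),
    rng.normal(10.0, 0.5, size=(100, 2)),
])

# Fit candidate models and keep the component count with the lowest BIC,
# which penalizes extra components that do not improve the fit.
bics = {}
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(data)
    bics[k] = gmm.bic(data)
best_k = min(bics, key=bics.get)
print(best_k)  # the two-blob data should favour k = 2
```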
One of the challenges in human activity recognition is the ability of an intelligent system to discover the activity models by itself. In this paper, we propose an incremental approach to discover human activities from unlabeled data using K-means. The approach does not require prior specification of the number of clusters, or k-value, and has the ability to reject random movements or noise. A simple algorithm is used, making the approach easy to implement without requiring any prior knowledge of the data. We evaluated the effectiveness of the approach, and the results show more than 30% improvement in precision and 19% improvement in recall compared to the results obtained using a non-incremental approach with a cluster validity index. This achievement in human activity discovery will enable the wide adoption of human activity recognition technologies in natural human living environments where labeled data are not available.
Human activity recognition is an important ability in any system that supports humans in performing their daily activities. However, the current supervised approach to human activity recognition is difficult to deploy in natural human living environments where labeled observations are scarce. In this paper, we demonstrate the use of K-means clustering and simple template models to achieve human activity detection and recognition in an unsupervised manner. The features used are extracted from skeleton data obtained from an inexpensive RGBD (RGB-Depth) sensor. Our results show an average detection performance of 80.4% precision and 83.8% recall. The availability of an unsupervised approach to human activity recognition will make possible its wide adoption in natural human living environments.
An approach for unsupervised human activity discovery is proposed in this paper. The approach automatically discovers unknown activities from unlabeled data and has the ability to reject random activities. This ability will enable intelligent systems to discover and learn new activities autonomously. K-means is used to cluster a pool of unlabeled activity observations into groups of different activities. The system requires no prior knowledge of how many activities are to be discovered. It uses cluster validity indices to automatically estimate the required number of clusters, and it further evaluates cluster homogeneity to accept clusters containing a homogeneous activity and reject clusters containing random activities. Experimental results showed the potential of the approach and identified suitable validity indices for achieving unsupervised human activity discovery.
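Estimating the number of clusters with a validity index, as described above, can be sketched with the silhouette index: sweep candidate k values and keep the one that scores best. The synthetic data and the choice of index are illustrative; the paper compares several validity indices rather than committing to this one.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Unlabeled "activity observations": three well-separated groups.
pool = np.vstack([rng.normal(c, 0.3, size=(50, 4)) for c in (0.0, 5.0, 10.0)])

# Cluster for each candidate k and score the partition with the
# silhouette index (higher = tighter, better-separated clusters).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pool)
    scores[k] = silhouette_score(pool, labels)
best_k = max(scores, key=scores.get)
print(best_k)  # the three-group pool should yield k = 3
```

A homogeneity check on each resulting cluster, as the abstract describes, would then accept tight clusters and reject diffuse ones as random activity.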
Human activity recognition is an important functionality in any intelligent system designed to support human daily activities. While the majority of human activity recognition systems use supervised learning, these systems lack the ability to detect new activities by themselves. In this paper, we report the results of our investigation of unsupervised human activity detection with features extracted from skeleton data obtained from an RGBD sensor. Unlike activity recognition, activity detection does not provide the label but attempts to distinguish one activity from another. This paper demonstrates a suitable set of features to be used with K-means clustering to distinguish different activities from a pool of unlabeled observations. The results show that a 100% F0.5-score was achieved for six out of nine activities for one of the subjects at a low frame rate, while an F0.5-score of 71.9% was achieved on average across all activities by four subjects.
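For reference, the F0.5-score quoted above is the general Fβ measure with β = 0.5, which weights precision more heavily than recall. A minimal implementation of the formula:

```python
def fbeta(precision, recall, beta=0.5):
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R).
    beta < 1 favours precision; beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(fbeta(1.0, 1.0))             # → 1.0
print(round(fbeta(0.9, 0.6), 3))   # → 0.818
```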
Human activity detection is usually achieved through supervised learning, where a model is learned from given samples of each activity. With this approach, it is difficult for intelligent systems to automatically discover and learn new activities. We attempt to distinguish human activities using unsupervised learning. The approach relies on the fact that, given appropriate features, the distinction between different activities can be observed from simple distance measurements. In this paper, we present our investigation into extracting features from the coordinates of human joint positions based on the human range of movement, and we report the results of tests performed to check their effectiveness in distinguishing sixteen (16) example activities. A simple unsupervised learning method, K-means clustering, was used to evaluate the effectiveness of the features. The results indicate that the features based on range of movement significantly improved clustering performance.
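One plausible reading of "features based on range of movement" is the per-joint extent of motion over an observation window. The sketch below computes, for each joint coordinate, the range (max minus min) across frames; the exact features in the paper may differ, and the array layout is an assumption.

```python
import numpy as np

def range_of_movement(joints):
    """joints: array of shape (frames, n_joints, 3) of joint positions.
    Returns one feature per joint coordinate: max - min over the window,
    flattened into a single feature vector for clustering."""
    return (joints.max(axis=0) - joints.min(axis=0)).ravel()

# Toy example: one joint moving 2 units along x, another staying still.
frames = np.array([
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[2.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
])
print(range_of_movement(frames))  # → [2. 0. 0. 0. 0. 0.]
```

Feature vectors of this form can be fed directly to K-means, since activities that exercise different joints produce well-separated ranges.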
The ability to understand what humans are doing is crucial for any intelligent system to autonomously support human daily activities. Technologies to enable such an ability, however, are still undeveloped due to the many challenges in human activity analysis. Among them are the difficulties in extracting human poses and motions from raw sensor data, whether recorded from visual or wearable sensors, and the need to recognize activities not seen before using unsupervised learning. Furthermore, human activity analysis usually requires expensive sensors or sensing environments. With the availability of low-cost RGBD (RGB-depth) sensors, this new form of data can provide human posture data with a high degree of confidence. In this paper, we present our approach to extracting features directly from such data (joint positions) based on the human range of movement, and we report the results of tests performed to check their effectiveness in distinguishing sixteen (16) example activities. A simple unsupervised learning method, K-means clustering, was used to evaluate the effectiveness of the features. The results indicate that the features based on range of movement significantly improved clustering performance.
The functionality of a robot greatly depends on its sensory capability. Adding sensors to a robot is one major strategy to extend the robot's function and intelligence. However, this strategy can only be effective if the performance of the sensors meets the requirements of the robot application. In particular, in many systems it is crucial for the sensors to provide reliable updates of their values to the robot system. Accessing the sensors represents an overhead to the system and should be taken into consideration when designing such a robotic system. For a remote-controlled robot, one significant overhead of accessing a sensor is the communication delay. This paper investigates the impact of sensors in a remote-controlled robot built from a LEGO Mindstorms NXT kit. Communication activities and overhead when using LEGO sensors are reported. The results can serve as a reference when designing a remote-controlled robot using the LEGO Mindstorms NXT kit.