2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), 2018
In order to achieve high processing efficiencies, next-generation computer architecture designs need an effective Artificial Intelligence (AI) framework to learn large-scale processor interactions. In this short paper, we present Deep Temporal Models (DTMs) that offer effective and scalable time-series representations to address key challenges for learning processor data: high data rate, cyclic patterns, and high dimensionality. We present our approach using DTMs to learn and predict processor events, and we show comparisons among these learning models with promising initial simulation results.
We present a novel optimization strategy for training neural networks which we call "BitNet". The parameters of neural networks are usually unconstrained and have a dynamic range dispersed over all real values. Our key idea is to limit the expressive power of the network by dynamically controlling the range and set of values that the parameters can take. We formulate this idea using a novel end-to-end approach that circumvents the discrete parameter space by optimizing a relaxed, continuous, and differentiable upper bound of the typical classification loss function. The approach can be interpreted as a regularization inspired by the Minimum Description Length (MDL) principle. For each layer of the network, our approach optimizes real-valued translation and scaling factors and arbitrary-precision integer-valued parameters (weights). We empirically compare BitNet to an equivalent unregularized model on the MNIST and CIFAR-10 datasets. We show that BitNet converges faster to a ...
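The per-layer representation described above (real-valued translation and scaling factors plus integer weights) can be illustrated with a minimal sketch. This is our own illustration, not the paper's implementation: BitNet learns these factors end-to-end, whereas here they are simply derived from the weight range.

```python
import numpy as np

def quantize_layer(w, num_bits):
    """Represent weights as w ≈ t + s * q, with integer q in [0, 2**num_bits - 1]."""
    t = w.min()                                    # real-valued translation factor
    s = (w.max() - w.min()) / (2 ** num_bits - 1)  # real-valued scaling factor
    q = np.round((w - t) / s).astype(int)          # integer-valued parameters
    return t, s, q

def dequantize(t, s, q):
    return t + s * q

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
t, s, q = quantize_layer(w, num_bits=4)
max_err = np.abs(dequantize(t, s, q) - w).max()    # bounded by s / 2
```

With a uniform grid like this, the worst-case reconstruction error per weight is half the grid step, which is why controlling the range and bit width trades accuracy for compression.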
Our research is focused on understanding and applying biological memory transfers to new AI systems that can fundamentally improve their performance throughout their fielded lifetime experience. We leverage current understanding of biological memory transfer to arrive at AI algorithms for memory consolidation and replay. In this paper, we propose the use of generative memory that can be recalled in batch samples to train a multi-task agent in a pseudo-rehearsal manner. We show results motivating the need for task-agnostic separation of latent space for the generative memory to address issues of catastrophic forgetting in lifelong learning.
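The pseudo-rehearsal training loop can be sketched as follows. This is a toy illustration under our own naming, with a stand-in generator in place of the paper's generative memory: each training batch interleaves fresh task data with samples recalled from memory, so old tasks keep contributing gradient signal.

```python
import random

def pseudo_rehearsal_batch(new_task_data, generate_sample, batch_size=8, replay_ratio=0.5):
    """Build a training batch mixing fresh task samples with samples
    recalled ('replayed') from a generative memory."""
    n_replay = int(batch_size * replay_ratio)
    replayed = [generate_sample() for _ in range(n_replay)]
    fresh = random.sample(new_task_data, batch_size - n_replay)
    batch = replayed + fresh
    random.shuffle(batch)
    return batch

random.seed(0)
old_memory = lambda: ("old_task", random.gauss(0.0, 1.0))   # stand-in generator
new_data = [("new_task", float(i)) for i in range(100)]
batch = pseudo_rehearsal_batch(new_data, old_memory)
```

Because the agent never needs the original old-task data, only the generator, storage stays constant as tasks accumulate; the open issue the abstract points at is keeping the generator's latent space from collapsing across tasks.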
We propose a Quantization Guided Training (QGT) method to guide DNN training towards optimized low-bit-precision targets and reach extreme compression levels below 8-bit precision. Unlike standard quantization-aware training (QAT) approaches, QGT uses customized regularization to encourage weight values towards a distribution that maximizes accuracy while reducing quantization errors. One of the main benefits of this approach is the ability to identify compression bottlenecks. We validate QGT using state-of-the-art model architectures on vision datasets. We also demonstrate the effectiveness of QGT with an 81KB tiny model for person detection down to 2-bit precision (representing a 17.7x size reduction), while maintaining an accuracy drop of only 3% compared to a floating-point baseline.
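The kind of regularization described above can be sketched with a simple pull-toward-grid penalty. This is our own minimal example, not QGT's actual regularizer: it measures how far each weight sits from its nearest level on a uniform low-bit grid, and would be added to the task loss with a weighting coefficient.

```python
import numpy as np

def quantization_penalty(w, num_bits):
    """Mean squared distance from each weight to its nearest level on a
    uniform grid spanning [w.min(), w.max()]."""
    step = (w.max() - w.min()) / (2 ** num_bits - 1)
    nearest = w.min() + np.round((w - w.min()) / step) * step
    return float(np.mean((w - nearest) ** 2))

grid = np.array([0.0, 1.0, 2.0, 3.0])      # already on a 2-bit grid: zero penalty
off_grid = np.array([0.0, 1.4, 2.6, 3.0])  # mid-grid weights incur a penalty
# hypothetical usage: total = task_loss + lam * quantization_penalty(weights, num_bits=2)
```

A penalty of this shape also exposes compression bottlenecks: layers whose weights stay far from the grid despite the penalty are the ones that resist low-bit quantization.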
The use of deep neural networks in edge computing devices hinges on the balance between accuracy and complexity of computations. Ternary Connect (TC) \cite{lin2015neural} addresses this issue by restricting the parameters to three levels $-1, 0$, and $+1$, thus eliminating multiplications in the forward pass of the network during prediction. We propose Generalized Ternary Connect (GTC), which allows an arbitrary number of levels while at the same time eliminating multiplications by restricting the parameters to integer powers of two. The primary contribution is that GTC learns the number of levels and their values for each layer, jointly with the weights of the network in an end-to-end fashion. Experiments on MNIST and CIFAR-10 show that GTC naturally converges to an `almost binary' network for deep classification networks (e.g. VGG-16) and deep variational auto-encoders, with negligible loss of classification accuracy and comparable visual quality of generated samples, respectively.
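Why powers of two eliminate multiplications: multiplying by $\pm 2^e$ reduces to a sign flip plus a bit shift. The sketch below is our own illustration of the representation, not GTC itself; GTC additionally learns the number of levels and their values per layer, whereas here nonzero weights are simply snapped to the nearest signed power of two.

```python
import numpy as np

def snap_to_power_of_two(w):
    """Quantize nonzero weights to the nearest signed power of two, so a
    multiply by a weight becomes a sign flip plus a bit shift."""
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.round(np.log2(np.where(mag > 0, mag, 1.0))).astype(int)
    q = sign * np.power(2.0, exp)
    return np.where(mag > 0, q, 0.0), exp

w = np.array([0.3, -1.1, 0.0, 0.06])
q, exp = snap_to_power_of_two(w)   # -> [0.25, -1.0, 0.0, 0.0625]
```

For an integer activation `x` and non-negative exponent `e`, `x * 2**e` is exactly `x << e`, which is the hardware saving the abstract refers to.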
Quantization for deep neural networks (DNNs) has enabled developers to deploy models with less memory and more efficient low-power inference. However, not all DNN designs are friendly to quantization. For example, the popular MobileNet architecture has been tuned to reduce parameter size and computational latency with separable depth-wise convolutions, but not all quantization algorithms work well on it, and accuracy can suffer relative to the floating-point version. In this paper, we analyze several root causes of quantization loss and propose alternatives that do not rely on per-channel or training-aware approaches. We evaluate the image classification task on the ImageNet dataset, and our post-training quantized 8-bit inference top-1 accuracy is within 0.7% of the floating-point version.
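For context, a common post-training scheme of the kind the abstract constrains itself to (per-tensor, no retraining) looks like the sketch below. This is a generic illustration, not the paper's method: one scale is shared across the whole tensor, which is exactly where depth-wise layers with very different per-channel ranges lose accuracy.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor 8-bit quantization: one shared scale,
    integers clipped to [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
x = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(x)
max_err = np.abs(q.astype(np.float32) * scale - x).max()  # at most ~scale / 2
```

Per-channel quantization would instead compute one scale per output channel; the paper's contribution is recovering accuracy without resorting to that or to quantization-aware training.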
Attacks against the control processor of a power-grid system, especially zero-day attacks, can be catastrophic. Early detection of these attacks can prevent further damage. However, detecting zero-day attacks is challenging because they have no known code and unknown behavior. To address the zero-day attack problem, we propose a data-driven defense: we train a temporal deep learning model, using only normal data from legitimate processes that run daily in these power-grid systems, to model the normal behavior of the power-grid controller. We can then quickly detect malicious code running on the processor by estimating deviations from the normal behavior with a statistical test. Experimental results on a real power-grid controller show that we can detect anomalous behavior with over 99.9% accuracy and nearly zero false positives.
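The deviation-based decision step can be sketched as below. This is our own simplified stand-in for the paper's statistical test: the temporal model's prediction errors on known-normal data define a baseline distribution, and test-time errors far outside it are flagged as anomalous.

```python
import numpy as np

def flag_anomalies(pred_errors, normal_errors, z_thresh=4.0):
    """Flag time steps whose model prediction error deviates far from the
    error distribution observed on normal (attack-free) data."""
    mu, sigma = normal_errors.mean(), normal_errors.std()
    z = (pred_errors - mu) / sigma
    return z > z_thresh

rng = np.random.default_rng(2)
normal_errors = np.abs(rng.normal(0.0, 0.1, size=500))   # baseline from normal runs
test_errors = np.array([0.05, 0.12, 1.5, 0.08])          # third step deviates sharply
flags = flag_anomalies(test_errors, normal_errors)
```

Because only normal data is needed to fit the baseline, the approach requires no attack signatures, which is what makes it applicable to zero-day attacks.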
Complex image processing and computer vision systems often consist of a “pipeline” of “black boxes” that each solve part of the problem. We intend to replace parts or all of a target pipeline with deep neural networks to achieve benefits such as increased accuracy or reduced computational requirements. To acquire the large amount of labeled data necessary to train a deep neural network, we propose a workflow that leverages the target pipeline to create a significantly larger labeled training set automatically, without prior domain knowledge of the target pipeline. We show experimentally that, despite the noise introduced by automated labeling and despite using only a very small initially labeled data set, the trained deep neural networks can achieve similar or even better performance than the components they replace, while in some cases also reducing computational requirements.
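The core of the workflow, using the existing pipeline as a labeling oracle, can be sketched in a few lines. The pipeline stage here is a hypothetical stand-in of our own; the real target pipeline is a black box whose outputs become (noisy) training labels.

```python
def build_training_set(unlabeled_inputs, target_pipeline):
    """Run the existing black-box pipeline over unlabeled inputs; its outputs
    become (noisy) training labels for the replacement network."""
    return [(x, target_pipeline(x)) for x in unlabeled_inputs]

# hypothetical stand-in stage: threshold a scalar image statistic
pipeline_stage = lambda x: 1 if x > 0.5 else 0
dataset = build_training_set([0.1, 0.7, 0.4, 0.9], pipeline_stage)
```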
2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), 2019
2016 IEEE Winter Applications of Computer Vision Workshops (WACVW), 2016
Scientists today face an onerous task: manually annotating vast amounts of underwater video data for fish stock assessment. In this paper, we propose a robust and unsupervised deep learning algorithm to automatically detect fish and thereby ease the burden of manual annotation. The algorithm automates fish sampling in the training stage by fusing optical flow segments and objective proposals. We auto-generate large amounts of fish samples from the detection of flow motion, and based on the flow-objectiveness overlap probability we annotate the true-false samples. We also adopt a training weight biased towards negative samples to reduce noise. In detection, in addition to fused regions, we use a Modified Non-Maximum Suppression (MNMS) algorithm to reduce the false classifications on parts of fishes that the aggressive NMS approach produces. We exhaustively tested our algorithms using NOAA-provided, luminance-only underwater fish videos. Our tests show that the Average Precision (AP) of detection improved by about 10% compared to the non-fusion approach, and by about another 10% when using MNMS.
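For reference, the standard greedy NMS that MNMS modifies works as sketched below. This is the textbook algorithm, not the paper's MNMS: MNMS alters the suppression step so that detections covering only part of a fish are not discarded as aggressively; the details are in the paper.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop boxes overlapping any kept box beyond iou_thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)   # box 1 overlaps box 0 heavily and is suppressed
```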
Proceedings International Conference on Computer Design VLSI in Computers and Processors
Power Constrained Design of Multiprocessor Interconnection Networks. Chirag S. Patel, Sek M. Chai, Sudhakar Yalamanchili, David E. Schimmel. School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250.