2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), 2018
In order to achieve high processing efficiencies, next-generation computer architecture designs need an effective Artificial Intelligence (AI) framework to learn large-scale processor interactions. In this short paper, we present Deep Temporal Models (DTMs) that offer effective and scalable time-series representations to address key challenges for learning processor data: high data rate, cyclic patterns, and high dimensionality. We present our approach using DTMs to learn and predict processor events, and we show comparisons among these learning models with promising initial simulation results.
We present a novel optimization strategy for training neural networks which we call "BitNet". The parameters of neural networks are usually unconstrained and have a dynamic range dispersed over all real values. Our key idea is to limit the expressive power of the network by dynamically controlling the range and set of values that the parameters can take. We formulate this idea using a novel end-to-end approach that circumvents the discrete parameter space by optimizing a relaxed, continuous, and differentiable upper bound of the typical classification loss function. The approach can be interpreted as a regularization inspired by the Minimum Description Length (MDL) principle. For each layer of the network, our approach optimizes real-valued translation and scaling factors and arbitrary-precision integer-valued parameters (weights). We empirically compare BitNet to an equivalent unregularized model on the MNIST and CIFAR-10 datasets. We show that BitNet converges faster to a ...
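The per-layer representation described above (real-valued translation and scaling factors plus integer weights) can be illustrated with a minimal sketch. This is our own illustration, not the paper's implementation: BitNet learns these factors end-to-end, whereas here they are simply derived from the weight range.

```python
import numpy as np

def quantize_layer(w, num_bits):
    """Represent weights as w ≈ t + s * q, with integer q in [0, 2**num_bits - 1]."""
    t = w.min()                                    # real-valued translation factor
    s = (w.max() - w.min()) / (2 ** num_bits - 1)  # real-valued scaling factor
    q = np.round((w - t) / s).astype(int)          # integer-valued parameters
    return t, s, q

def dequantize(t, s, q):
    return t + s * q

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
t, s, q = quantize_layer(w, num_bits=4)
max_err = np.abs(dequantize(t, s, q) - w).max()    # bounded by s / 2
```

With a uniform grid like this, the worst-case reconstruction error per weight is half the grid step, which is why controlling the range and bit width trades accuracy for compression.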
Our research is focused on understanding and applying biological memory transfers to new AI systems that can fundamentally improve their performance throughout their fielded lifetime experience. We leverage current understanding of biological memory transfer to arrive at AI algorithms for memory consolidation and replay. In this paper, we propose the use of generative memory that can be recalled in batch samples to train a multi-task agent in a pseudo-rehearsal manner. We show results motivating the need for task-agnostic separation of latent space for the generative memory to address issues of catastrophic forgetting in lifelong learning.
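The pseudo-rehearsal training loop can be sketched as follows. This is a toy illustration under our own naming, with a stand-in generator in place of the paper's generative memory: each training batch interleaves fresh task data with samples recalled from memory, so old tasks keep contributing gradient signal.

```python
import random

def pseudo_rehearsal_batch(new_task_data, generate_sample, batch_size=8, replay_ratio=0.5):
    """Build a training batch mixing fresh task samples with samples
    recalled ('replayed') from a generative memory."""
    n_replay = int(batch_size * replay_ratio)
    replayed = [generate_sample() for _ in range(n_replay)]
    fresh = random.sample(new_task_data, batch_size - n_replay)
    batch = replayed + fresh
    random.shuffle(batch)
    return batch

random.seed(0)
old_memory = lambda: ("old_task", random.gauss(0.0, 1.0))   # stand-in generator
new_data = [("new_task", float(i)) for i in range(100)]
batch = pseudo_rehearsal_batch(new_data, old_memory)
```

Because the agent never needs the original old-task data, only the generator, storage stays constant as tasks accumulate; the open issue the abstract points at is keeping the generator's latent space from collapsing across tasks.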
We propose a Quantization Guided Training (QGT) method to guide DNN training towards optimized low-bit-precision targets and reach extreme compression levels below 8-bit precision. Unlike standard quantization-aware training (QAT) approaches, QGT uses customized regularization to encourage weight values towards a distribution that maximizes accuracy while reducing quantization errors. One of the main benefits of this approach is the ability to identify compression bottlenecks. We validate QGT using state-of-the-art model architectures on vision datasets. We also demonstrate the effectiveness of QGT with an 81KB tiny model for person detection down to 2-bit precision (representing a 17.7x size reduction), while maintaining an accuracy drop of only 3% compared to a floating-point baseline.
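The kind of regularization described above can be sketched with a simple pull-toward-grid penalty. This is our own minimal example, not QGT's actual regularizer: it measures how far each weight sits from its nearest level on a uniform low-bit grid, and would be added to the task loss with a weighting coefficient.

```python
import numpy as np

def quantization_penalty(w, num_bits):
    """Mean squared distance from each weight to its nearest level on a
    uniform grid spanning [w.min(), w.max()]."""
    step = (w.max() - w.min()) / (2 ** num_bits - 1)
    nearest = w.min() + np.round((w - w.min()) / step) * step
    return float(np.mean((w - nearest) ** 2))

grid = np.array([0.0, 1.0, 2.0, 3.0])      # already on a 2-bit grid: zero penalty
off_grid = np.array([0.0, 1.4, 2.6, 3.0])  # mid-grid weights incur a penalty
# hypothetical usage: total = task_loss + lam * quantization_penalty(weights, num_bits=2)
```

A penalty of this shape also exposes compression bottlenecks: layers whose weights stay far from the grid despite the penalty are the ones that resist low-bit quantization.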
The use of deep neural networks in edge computing devices hinges on the balance between accuracy and complexity of computations. Ternary Connect (TC) \cite{lin2015neural} addresses this issue by restricting the parameters to three levels $-1, 0$, and $+1$, thus eliminating multiplications in the forward pass of the network during prediction. We propose Generalized Ternary Connect (GTC), which allows an arbitrary number of levels while at the same time eliminating multiplications by restricting the parameters to integer powers of two. The primary contribution is that GTC learns the number of levels and their values for each layer, jointly with the weights of the network in an end-to-end fashion. Experiments on MNIST and CIFAR-10 show that GTC naturally converges to an `almost binary' network for deep classification networks (e.g. VGG-16) and deep variational auto-encoders, with negligible loss of classification accuracy and comparable visual quality of generated samples, respectively.
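Why powers of two eliminate multiplications: multiplying by $\pm 2^e$ reduces to a sign flip plus a bit shift. The sketch below is our own illustration of the representation, not GTC itself; GTC additionally learns the number of levels and their values per layer, whereas here nonzero weights are simply snapped to the nearest signed power of two.

```python
import numpy as np

def snap_to_power_of_two(w):
    """Quantize nonzero weights to the nearest signed power of two, so a
    multiply by a weight becomes a sign flip plus a bit shift."""
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.round(np.log2(np.where(mag > 0, mag, 1.0))).astype(int)
    q = sign * np.power(2.0, exp)
    return np.where(mag > 0, q, 0.0), exp

w = np.array([0.3, -1.1, 0.0, 0.06])
q, exp = snap_to_power_of_two(w)   # -> [0.25, -1.0, 0.0, 0.0625]
```

For an integer activation `x` and non-negative exponent `e`, `x * 2**e` is exactly `x << e`, which is the hardware saving the abstract refers to.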
Quantization for deep neural networks (DNNs) has enabled developers to deploy models with less memory and more efficient low-power inference. However, not all DNN designs are friendly to quantization. For example, the popular MobileNet architecture has been tuned to reduce parameter size and computational latency with separable depth-wise convolutions, but not all quantization algorithms work well on it, and accuracy can suffer relative to the floating-point version. In this paper, we analyze several root causes of quantization loss and propose alternatives that do not rely on per-channel or training-aware approaches. We evaluate the image classification task on the ImageNet dataset, and our post-training quantized 8-bit inference top-1 accuracy is within 0.7% of the floating-point version.
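For context, a common post-training scheme of the kind the abstract constrains itself to (per-tensor, no retraining) looks like the sketch below. This is a generic illustration, not the paper's method: one scale is shared across the whole tensor, which is exactly where depth-wise layers with very different per-channel ranges lose accuracy.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor 8-bit quantization: one shared scale,
    integers clipped to [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
x = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(x)
max_err = np.abs(q.astype(np.float32) * scale - x).max()  # at most ~scale / 2
```

Per-channel quantization would instead compute one scale per output channel; the paper's contribution is recovering accuracy without resorting to that or to quantization-aware training.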
Attacks against the control processor of a power-grid system, especially zero-day attacks, can be catastrophic. Early detection of these attacks can prevent further damage. However, detecting zero-day attacks is challenging because they have no known code and unknown behavior. To address the zero-day attack problem, we propose a data-driven defense: we train a temporal deep learning model, using only normal data from legitimate processes that run daily in these power-grid systems, to model the normal behavior of the power-grid controller. We can then quickly detect malicious code running on the processor by estimating deviations from the normal behavior with a statistical test. Experimental results on a real power-grid controller show that we can detect anomalous behavior with over 99.9% accuracy and nearly zero false positives.
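The deviation-based decision step can be sketched as below. This is our own simplified stand-in for the paper's statistical test: the temporal model's prediction errors on known-normal data define a baseline distribution, and test-time errors far outside it are flagged as anomalous.

```python
import numpy as np

def flag_anomalies(pred_errors, normal_errors, z_thresh=4.0):
    """Flag time steps whose model prediction error deviates far from the
    error distribution observed on normal (attack-free) data."""
    mu, sigma = normal_errors.mean(), normal_errors.std()
    z = (pred_errors - mu) / sigma
    return z > z_thresh

rng = np.random.default_rng(2)
normal_errors = np.abs(rng.normal(0.0, 0.1, size=500))   # baseline from normal runs
test_errors = np.array([0.05, 0.12, 1.5, 0.08])          # third step deviates sharply
flags = flag_anomalies(test_errors, normal_errors)
```

Because only normal data is needed to fit the baseline, the approach requires no attack signatures, which is what makes it applicable to zero-day attacks.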
Complex image processing and computer vision systems often consist of a “pipeline” of “black boxes” that each solve part of the problem. We intend to replace parts or all of a target pipeline with deep neural networks to achieve benefits such as increased accuracy or reduced computational requirements. To acquire the large amount of labeled data necessary to train a deep neural network, we propose a workflow that leverages the target pipeline to create a significantly larger labeled training set automatically, without prior domain knowledge of the target pipeline. We show experimentally that, despite the noise introduced by automated labeling and despite using only a very small initially labeled data set, the trained deep neural networks can achieve similar or even better performance than the components they replace, while in some cases also reducing computational requirements.
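The core of the workflow, using the existing pipeline as a labeling oracle, can be sketched in a few lines. The pipeline stage here is a hypothetical stand-in of our own; the real target pipeline is a black box whose outputs become (noisy) training labels.

```python
def build_training_set(unlabeled_inputs, target_pipeline):
    """Run the existing black-box pipeline over unlabeled inputs; its outputs
    become (noisy) training labels for the replacement network."""
    return [(x, target_pipeline(x)) for x in unlabeled_inputs]

# hypothetical stand-in stage: threshold a scalar image statistic
pipeline_stage = lambda x: 1 if x > 0.5 else 0
dataset = build_training_set([0.1, 0.7, 0.4, 0.9], pipeline_stage)
```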
2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), 2019
2016 IEEE Winter Applications of Computer Vision Workshops (WACVW), 2016
Scientists today face an onerous task: manually annotating vast amounts of underwater video data for fish stock assessment. In this paper, we propose a robust and unsupervised deep learning algorithm to automatically detect fish and thereby ease the burden of manual annotation. The algorithm automates fish sampling in the training stage by fusing optical flow segments and objective proposals. We auto-generate large amounts of fish samples from the detection of flow motion, and based on the flow-objectiveness overlap probability we annotate the true-false samples. We also adopt a training weight biased towards negative samples to reduce noise. In detection, in addition to fused regions, we use a Modified Non-Maximum Suppression (MNMS) algorithm to reduce the false classifications on parts of fishes that the aggressive NMS approach produces. We exhaustively tested our algorithms using NOAA-provided, luminance-only underwater fish videos. Our tests show that the Average Precision (AP) of detection improved by about 10% compared to the non-fusion approach, and by about another 10% when using MNMS.
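For reference, the standard greedy NMS that MNMS modifies works as sketched below. This is the textbook algorithm, not the paper's MNMS: MNMS alters the suppression step so that detections covering only part of a fish are not discarded as aggressively; the details are in the paper.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop boxes overlapping any kept box beyond iou_thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)   # box 1 overlaps box 0 heavily and is suppressed
```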
Proceedings International Conference on Computer Design VLSI in Computers and Processors
Power Constrained Design of Multiprocessor Interconnection Networks. Chirag S. Patel, Sek M. Chai, Sudhakar Yalamanchili, David E. Schimmel. School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250.