Deep Learning Networks Guide
S. S. Iyengar
Azad M. Madni
Deep Learning
Networks
Design, Development and Deployment
Deep Learning Networks
Jayakumar Singaram • S. S. Iyengar •
Azad M. Madni
Azad M. Madni
Astronautical Engineering Department,
RRB 201 3100
University of Southern California
Los Angeles, CA, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
This book, written by experts in AI and machine learning, is unique. Unlike current
books on this subject that either cover the theory and mathematical underpinnings
of deep learning, or focus exclusively on programming-centric concepts, tools and
languages, this book addresses and bridges both aspects. It seamlessly connects
theoretical methods with pertinent technologies and toolsets in a manner that makes
the material suitable for students, educators, and practitioners.
Its value proposition lies in its multifaceted treatment of the subject. It conveys
complex Deep Learning concepts in simple terms, making the material understandable
to a wide audience. In addition, it elucidates the intricate landscape of the different
technologies and toolsets currently available, thereby offering readers the
much-needed clarity to make informed decisions for their respective applications
and problem domains.
By bridging theory and practice, this book empowers readers to not only grasp
fundamental concepts but to also confidently navigate the practical applications of
Deep Learning. Ultimately, this book will serve as a comprehensive guide for Deep
Learning enthusiasts, practitioners, educators, and researchers alike. Its focus on
holistic understanding and actionable insights makes it an invaluable “must read,”
and an essential resource for anyone interested in delving into the exciting realm of
Deep Learning.
Preface
This book presents multiple facets of the design, development, and deployment of
deep learning networks. More specifically, it is an introduction to the tool set
and its associated deep learning techniques. The book also presents the design and
technical aspects of programming and provides pragmatic tools for understanding
the interplay of programming and technology across several applications. It is
organized as a tutorial that provides the wide-ranging conceptual and programming
tools that underlie deep learning applications.
Furthermore, the book charts a clear path forward that engages and challenges
students in undergraduate courses in the art of science and engineering
programming.
Acknowledgements
This research was sponsored by the Army Research Office and was accomplished
under Grant Number W911NF-21-1-0264. The views and conclusions contained in
this document are those of the authors and should not be interpreted as representing
the official policies, either expressed or implied, of the Army Research Office or
the US Government. The US Government is authorized to reproduce and distribute
reprints for Government purposes notwithstanding any copyright notation herein.
Purpose of This Book
Contents
1 Introduction
1.1 Artificial Intelligence (AI)
1.2 Machine Learning (ML)
1.3 AI vs. ML
1.4 Deep Learning (DL)
1.5 DL vs. ML
1.6 Deep Learning and Deep Programming
1.7 Deep Learning Networks
1.8 Deep Learning Network Deployment
2 Low-Code and Deep Learning Applications
2.1 Role of Tool Set in Applications
2.2 Schematic Representation of Deep Learning Architecture
2.3 Deep Learning Applications
2.3.1 Data Set
2.3.2 Model Design for Deep Learning Network
2.3.3 Train Model
2.3.4 Test Model
2.3.5 Save Model
2.3.6 Load Model
2.3.7 Deployment
2.4 Custom Framework: DLtrain for AI
2.5 Sample AI Application Deployment
2.5.1 Quick Look: IBM Watson
2.5.2 IBM Watson Service and Monitor Tomato Farm
2.5.3 Real-Time Audit of IP Networks
3 Introduction to Software Tool Set
3.1 Virtual Environment for Required Tool Set
3.2 TensorFlow: An AI Platform
3.2.1 Keras in TensorFlow
3.2.2 TensorFlow Image in Docker
3.3 JupyterLab
3.3.1 Jupyter Notebook
3.4 JupyterLab: Latex
3.5 Setting Up Edge AI Computer (Jetson Nano)
3.6 IBM Watson Machine Learning: Community Edition
3.7 Tool Set to Build DLtrain
3.7.1 Target Machine Is X86 with Ubuntu
3.7.2 Use Docker: Target Machine Is X86 with Ubuntu
3.7.3 Target Machine Is Power 9 with Ubuntu
3.7.4 Target Machine Is Jetson Nano with Ubuntu
3.7.5 Target Machine Is X86 Windows 10
3.8 Docker Image of DLtrain Application to Train CNN
3.9 Deploy DL Networks in Near Edge
3.9.1 Deploy DL Networks by Using TensorFlow RT
4 Hardware for DL Networks
4.1 Open Source for Edge Native Hardware
4.2 POWER9 with RTX 2070 GPU
4.2.1 OpenPOWER CPU with ASPEED VGA Controller
4.2.2 CUDA Installation and PCI Driver for RTX 2070
4.2.3 Build Application Using nvcc
4.2.4 Edge Native AI Hardware
4.2.5 On-Prem Requirement
4.2.6 DGX Station A100 for DL Networks
4.2.7 Deployment of AI in X86 Machine
4.2.8 Deployment of AI in Android Phone
4.2.9 Deployment of AI in Rich Edge
5 Data Set Design and Data Labeling
5.1 Insight
5.2 Description
5.3 Source of Data: Human and Machine
5.4 Data Set Creation and Statistical Methods
5.5 Statistical Methods
5.5.1 Bernoulli: Binary Classification of Data
5.5.2 Binomial: Binary Classification of Data
5.5.3 Poisson: Binary Classification of Data
5.6 Image Signal Processing
5.6.1 Image Data and Maxwell-Boltzmann Statistics
5.6.2 Working with Image Files
5.6.3 Pixel Normalization
5.6.4 Global Centering
5.6.5 Global Standardization
5.7 Data Set: Read and Store
5.7.1 Data Set with Label Data
5.7.2 Working with CSV Files
Glossary
A Training Restricted Boltzmann Machine
A.1 Gradient Descent Is Used to Minimize Cost Function
A.2 Score and Loss Functions
A.3 Data Flow in Computation of W
A.4 Use of GPU to Compute W
References
Index
About the Authors
Contribution Award from 2019 IEEE Congress on Cybermatics, the Times Network
NRI (Non-Resident Indian) of the Year Award for 2017, most distinguished
Ramamoorthy Award at the Society for Design and Process Science (SDPS 2017),
the National Academy of Inventors Fellow Award in 2013, and the NRI Mahatma
Gandhi Pravasi Medal at the House of Lords in London in 2013, among others. He
was awarded Satish Dhawan Chaired Professorship at IISc, then Roy Paul Daniel
Professorship at LSU. He has received the Distinguished Alumnus Award of the
Indian Institute of Science. In 1998, he was awarded the IEEE Computer Society’s
Technical Achievement Award and is an IEEE Golden Core Member. Professor
Iyengar is an IEEE Distinguished Visitor, SIAM Distinguished Lecturer, and ACM
National Lecturer. In 2006, his paper, entitled A Fast-Parallel Thinning Algorithm
for the Binary Image Skeletonization, was the most frequently read article in the
month of January in the International Journal of High-Performance Computing
Applications. His innovative work called the Brooks–Iyengar algorithm along with
Professor Richard Brooks from Clemson University is applied in industries to
solve real-world applications. Dr. Iyengar’s work had a big impact; in 1988, he
and his colleagues discovered "NC algorithms for Recognizing Chordal Graphs
and K-trees" [IEEE Trans. on Computers 1988]. This breakthrough result led to
the extension of designing fast parallel algorithms by researchers like J. Naor
(Stanford), M. Naor (Berkeley), and A. A. Schaffer (AT&T Bell Labs). Professor
Iyengar earned his undergraduate and graduate degrees at UVCE-Bangalore and the
Indian Institute of Science, Bangalore, and a doctoral degree from Mississippi State
University.
His research has been funded by the National Science Foundation (NSF), Defense
Advanced Research Projects Agency (DARPA), Multi-University Research Initiative
(MURI Program), Office of Naval Research (ONR), Department of Energy/Oak
Ridge National Laboratory (DOE/ORNL), Naval Research Laboratory (NRL),
National Aeronautics and Space Administration (NASA), US Army Research Office
(ARO), and various state agencies and companies. He has served on US National
Science Foundation and National Institutes of Health panels to review proposals in
various aspects of Computational Science and has been involved as an external
evaluator (ABET-accreditation) for several Computer Science and Engineering
Departments across the country and the world. Dr. Iyengar has also served as
a research proposal evaluator for the National Academy. Dr. Iyengar has been a
Visiting Professor or Scientist at Oak Ridge National Laboratory, Jet Propulsion
Laboratory, and Naval Research Laboratory and has been awarded the Satish
Dhawan Visiting Chaired Professorship at the Indian Institute of Science, the Homi
Bhabha Visiting Chaired Professor (IGCAR), and a professorship at the University
of Paris-Sorbonne.
Dr. Azad M. Madni is a researcher, educator, entrepreneur, author, and phi-
lanthropist. He is a member of the National Academy of Engineering and is a
University Professor (highest academic designation) at the University of Southern
California. He is the holder of the Northrop Grumman Foundation Fred O’Green
Chair in Engineering and is the Executive Director of USC’s Systems Architecting
Chapter 1
Introduction
This chapter presents a comprehensive view of data, deep learning, and the
design, training, testing, saving, and loading of the various network models
associated with machine learning. It does so using the open-source tools PyTorch
and TensorFlow together with DLtrain, with suitable examples and simple
command-driven instructions. Suitable hardware configuration, setup, testing,
and other installation-related infrastructure are also presented within a simple
programming framework. This chapter employs the simplest and most widely
used deep learning training models, which frequently take first place in competitions.
The book also aims to give hands-on experience to all kinds of machine learning
users by supporting foundational concepts with strong practical demonstrations.
It is recommended that the reader have a laptop or desktop handy while reading,
so that the material can be practiced as it is learned and retained more clearly.
The book is a state-of-the-art treatment of deep learning environments. It caters
to both basic users and experienced data scientists. Standard and specific tools
in demand for deep learning design, development, and deployment are covered.
Furthermore, illustrative screenshots are provided for every topic to help users
acquire hands-on knowledge in deep learning.
“One of the most interesting features of machine learning is that it lies at the
intersection of multiple academic disciplines, principally computer science, statis-
tics, mathematics, and engineering.” Machine learning is usually studied as part
of artificial intelligence, which puts it firmly into the computer science discipline.
However, understanding why these algorithms work requires a certain level of
statistical and mathematical sophistication that is often missing in computer science
undergraduate courses. Question: Did convolutional neural networks (CNNs) find a
way around statistical or mathematical methods, or did they come up with a new theory
of modeling physical processes?
set and only carry out the one activity for which they were created. They are not
aware, sentient, or motivated by emotions or ethics (Fig. 1.3).
2. Artificial general intelligence (AGI): Artificial general intelligence, often
known as strong AI, refers to machines that demonstrate human-level intellect.
Such machines can learn, comprehend, and behave in a way that is
indistinguishable from a person in a given situation.
While general AI does not yet exist, it has been featured in several science
fiction films in which humans interact with sentient, feeling-driven, and
self-aware robots. Strong AI would enable us to create computers that can reason,
plan, and carry out a variety of activities in a variety of unpredictable environments.
While making decisions, they could use their existing knowledge to provide
original, creative, and out-of-the-box answers.
3. Artificial superintelligence (ASI): The concept of artificial superintelligence
(ASI) envisions a future in which robots will be able to demonstrate intellect that
is greater than that of the smartest humans. In this sort of AI, robots will not only
have the multidimensional intellect of people, but they will also be significantly
more capable of making decisions and solving problems than people. It is the
kind of AI that will have a significant influence on people and might eventually
wipe out the human species entirely.
its prior actions and experiences. The agents in reinforcement learning receive
rewards for carrying out the right actions and penalties for carrying out the wrong ones.
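The reward-and-penalty idea can be sketched in a few lines of Python. This is only an illustrative toy, not a method taken from this book: the two action names, the reward values of +1 and -1, and the epsilon-greedy rule are all assumptions chosen to keep the example small.

```python
import random

def run_agent(steps=200, epsilon=0.1, seed=0):
    """Learn action values from rewards (+1) and penalties (-1)."""
    rng = random.Random(seed)
    actions = ["right", "wrong"]           # hypothetical action names
    reward = {"right": 1.0, "wrong": -1.0}  # assumed reward and penalty
    value = {a: 0.0 for a in actions}      # running estimate of each action's payoff
    count = {a: 0 for a in actions}

    for step in range(steps):
        if step < len(actions):
            action = actions[step]         # try every action once first
        elif rng.random() < epsilon:
            action = rng.choice(actions)   # explore occasionally
        else:
            action = max(actions, key=value.get)  # exploit the best known action
        count[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        value[action] += (reward[action] - value[action]) / count[action]
    return value, count

value, count = run_agent()
print(value, count)
```

After a short run the agent's estimate for the rewarded action is higher than for the penalized one, so it chooses the rewarded action most of the time.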
1.3 AI vs. ML
The steps below show how AI and ML fit into one big picture, with ML as
a part of AI:
Step 1 An AI system is built using machine learning and other models.
Step 2 Machine learning models are created by studying patterns in the data.
Step 3 Data scientists optimize the machine learning models based on patterns in
the data.
which comes before the output is produced, combines the weighted inputs
and produces the outcome. Deep learning requires sophisticated mathematical
computations and data processing. As a result, the system hardware must be very
powerful. Yet, even with extremely powerful hardware, training neural networks
can take weeks.
1.5 DL vs. ML
Because the terms deep learning and machine learning are frequently used
interchangeably, it is important to understand their differences. Machine learning,
neural networks, and deep learning are all subfields of artificial intelligence. Deep
learning is a subfield of neural networks, which are in turn a subfield of machine
learning. Deep learning and machine learning diverge in the way each algorithm learns.
“Deep” machine learning can use labeled data sets, an approach known as
supervised learning, to guide its algorithm, but labeled data is not a requirement. Deep
learning can ingest unstructured material in its raw form (such as text or photos)
and automatically identify the set of features that separate several categories of
data from one another. This reduces the need for some human intervention
and makes it possible to handle bigger data sets. Deep learning can be thought of as
“scalable machine learning.” Traditional, or “non-deep,” machine learning is more
reliant on human input. Human specialists choose the set of features used to
distinguish between different data inputs, which typically requires more
structured data to learn from.
Artificial neural networks (ANNs), often known simply as neural networks, are
built from layers of nodes: an input layer, one or more hidden layers, and
an output layer. Each node, or artificial neuron, is connected to others and has an
associated weight and threshold. Any node whose output exceeds the
defined threshold value is activated and begins sending data to the network’s
next layer. Otherwise, that node does not transmit any data to the
next layer. The “deep” in “deep learning” simply refers to the number of layers in a
neural network. A neural network with more than three layers, inclusive of the
input and output layers, can be considered a deep learning algorithm or deep
neural network. A neural network with three layers or fewer is just a basic neural
network. Deep learning and neural networks are credited with accelerating progress
in fields including speech recognition, computer vision, and natural language
processing.
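The threshold rule and the layer-count rule described above can be written down directly. The sketch below is a simplified illustration under stated assumptions: the function names `neuron_output` and `is_deep` are hypothetical, the weights are hand-picked, and a silent node is represented by `None`.

```python
def neuron_output(inputs, weights, threshold):
    """Return the weighted sum if it exceeds the threshold, else None (no signal)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > threshold else None

def is_deep(num_layers):
    """Layer count includes the input and output layers, as in the text."""
    return num_layers > 3

# Weighted sum 0.6 + 0.2 exceeds the threshold, so the node passes data on.
fired = neuron_output([1.0, 0.5], [0.6, 0.4], threshold=0.5)
# Weighted sum 0.2 + 0.1 stays below the threshold, so the node stays silent.
silent = neuron_output([1.0, 0.5], [0.2, 0.2], threshold=0.5)
print(fired, silent, is_deep(3), is_deep(5))
```

Practical networks usually replace the hard threshold with a smooth activation function so that gradients can be computed during training.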
Deep learning is a subset of machine learning that uses artificial neural networks
with multiple layers to model and solve complex problems. It involves training
the neural network on large data sets to learn patterns and make predictions or
1.7 Deep Learning Networks
Deep learning networks are artificial neural networks with multiple layers of
interconnected nodes, also known as artificial neurons. These networks are typically
composed of an input layer, one or more hidden layers, and an output layer. Each
layer consists of many nodes that perform a specific computation and communicate
with nodes in the adjacent layers.
The input layer receives data from the outside world and passes it to the hidden
layers, where the data is transformed through a series of nonlinear transformations.
The output layer produces the final output of the network, which is a prediction or
classification based on the input data.
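The flow from input layer through hidden layers to output layer can be sketched as a forward pass. This is a minimal illustration, not the implementation used by any particular tool set: the weights are hypothetical and hand-picked, tanh is assumed as the nonlinear transformation, and softmax is assumed for the output layer.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer followed by a tanh nonlinearity."""
    return [
        math.tanh(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def softmax(scores):
    """Turn raw output scores into class probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    """Pass x through every hidden layer, then normalize the final scores."""
    *hidden, (w_out, b_out) = layers
    for w, b in hidden:
        x = dense(x, w, b)
    scores = [sum(xi * wi for xi, wi in zip(x, row)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(scores)

# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 output classes.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5], [-0.5, 1.0, 0.2]], [0.0, 0.0]),
]
probs = forward([1.0, 2.0], layers)
prediction = probs.index(max(probs))   # index of the most probable class
print(probs, prediction)
```

The index of the largest probability is the network's classification for the given input.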
Deep learning networks can be divided into two main types: feed forward neural
networks and recurrent neural networks. Feed forward neural networks are the
most common type of deep learning network and are used in tasks such as image
recognition and speech recognition. Recurrent neural networks, on the other hand,
are used in tasks such as natural language processing and speech recognition, where
the input data is a sequence of values, such as a sentence or a sound waveform.
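What makes a recurrent network suited to sequences is the hidden state it carries from one step to the next. The sketch below assumes a single recurrent unit with hypothetical weights `w_in` and `w_rec`, and contrasts it with an order-insensitive baseline (`orderless_output`, also hypothetical) that treats each value independently.

```python
import math

def rnn_final_state(sequence, w_in=0.5, w_rec=0.9, h0=0.0):
    """Scan a sequence with one recurrent unit: h_t = tanh(w_in*x_t + w_rec*h_prev)."""
    h = h0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)   # new state depends on the old state
    return h

def orderless_output(sequence, w_in=0.5):
    """Baseline with no state: each value is transformed on its own, then summed."""
    return sum(math.tanh(w_in * x) for x in sequence)

a = [1.0, 0.0, -1.0]
b = [-1.0, 0.0, 1.0]   # same values, reversed order
print(rnn_final_state(a), rnn_final_state(b))
print(orderless_output(a), orderless_output(b))
```

The recurrent unit ends in a different state for the two orderings, while the stateless baseline cannot tell them apart.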
The power of deep learning networks comes from their ability to automatically
learn complex features from the input data without human intervention. This makes
them well suited for tasks where the data is high-dimensional and complex, such
as image and speech recognition. With the help of large data sets and powerful
Image recognition: Deep learning has been used to develop highly accurate image
recognition systems, such as Google Photos, which can accurately identify and
categorize images based on their content.
Natural language processing (NLP): Deep learning has been applied to NLP tasks
such as sentiment analysis, language translation, speech recognition, and text
generation. For example, the Google Assistant uses deep learning to understand
natural language queries and respond with relevant information.
Autonomous vehicles: Deep learning is a key technology in the development of
autonomous vehicles. Self-driving cars use deep learning to analyze sensor data,
such as camera images, LIDAR data, and radar data, to recognize and respond to
different driving scenarios.
Healthcare: Deep learning is being used to improve medical diagnoses and
treatments. For example, deep learning algorithms can analyze medical images
to detect diseases such as cancer or to predict patient outcomes based on medical
records.
Robotics: Deep learning has been applied to robotics, enabling robots to perform
complex tasks such as grasping and manipulation of objects. This has numerous
applications in manufacturing, agriculture, and other industries.
Chapter 2
Low-Code and Deep Learning
Applications
intelligent systems in these core engineering verticals apart from consumer industry
requirements.
But in embedded systems (called the IoT edge or IoT node in modern Industry 4.0),
deploying AI, ML, and DL applications as edge-native or node-native
applications is still a challenge.
In the past, there have been two major routes for businesses to take on their way
to application development: buy ready-made apps from an external vendor, or build
and customize them from scratch using skilled developers and coders. Market
trends now show the rise and growing sophistication of low-code and no-code
development alternatives that bring the power of application development to
users across the business.
An experiment was performed with ChatGPT to obtain code for “matrix multiplication
in cores.” The code generated by ChatGPT is clean and as good as code written
by hand. It appears, however, that code generation in general is much more involved
and requires information on the silicon architecture as well as algorithms suited to
that silicon. Code generation is not new in industry; for example, MATLAB has been
used to generate C code from a given Simulink diagram. DLtrain is a new-generation
tool set that goes a step further and comes very close to the no-code route. Tool sets
play a major role in industry. Most of them are open-source tools that are increasingly
complex to use and to offer as a commercial inference service. DLtrain is developed
to serve as a single tool set that handles both training and embedded deployment.
The following discussion is from the Software Engineering Institute at Carnegie
Mellon University [13].
The need for an engineering discipline to guide the development and deployment of AI
capabilities is urgent. For example, while an autonomous vehicle functions well cruising
down an empty race track on a sunny day, how can it be designed to function just
as effectively during a hail storm in New York City? AI engineering aims to provide
a framework and tools to proactively design AI systems to function in environments
characterized by high degrees of complexity, ambiguity, and dynamism. The discipline of
AI engineering aims to equip practitioners to develop systems across the enterprise-to-edge
spectrum, to anticipate requirements in changing operational environments and conditions,
and to ensure human needs are translated into understandable, ethical, and thus trustworthy
AI.
Data collection is presented in the bottom layer of Fig. 2.2. Engineering domain
knowledge is key to handling data collection for training, testing, and deployment
of deep learning networks.
The top layer handles deployment of deep learning networks, where deployment
might happen on a CPU, GPU, FPGA, DSP, or a combination of these computing
devices. The in-between layer includes data preprocessing, training, and testing of
deep learning networks. The model of the deep learning network is also provided
above the data preprocessing layer. Generic AI applications include the following
steps to design, develop, and deploy deep learning networks.
In the deep learning framework, the network models include NN, CNN, RNN,
LSTM, GAN, VAE, etc. Many more variations exist, but most are based
on the restricted Boltzmann machine (RBM).
A separate chapter, Chap. 6, covers the mathematical theory used in
designing deep learning networks. Readers are expected to refer to books or research
papers on probability distributions, the Boltzmann distribution, the restricted
Boltzmann distribution, and neural networks.
Training a deep learning network model uses the available hardware and is one of
the most time-consuming steps.
PyTorch and TensorFlow are two major open-source platforms that are generally
used in training a given deep learning network model through the available data set.
The TensorFlow tool set is used in this book to illustrate examples and associated
events in training NN and CNN.
IBM Cloud offers both of these open-source platforms along with IBM
Watson Studio.
DLtrain is designed and developed with no dependency on open-source AI
software packages.
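To make the training step concrete without depending on any of the platforms named above, the following is a minimal sketch in plain Python. It is not the TensorFlow, PyTorch, or DLtrain API: it assumes a single sigmoid neuron, a toy data set (logical OR), and per-sample gradient descent on log loss, which is the same basic loop those platforms run at far larger scale.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=2000, lr=0.5):
    """Fit one sigmoid neuron with per-sample gradient descent on log loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = pred - target   # gradient of log loss w.r.t. the pre-activation
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Classify as 1 when the neuron's output is at least 0.5."""
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0

# Toy data set (logical OR), chosen only because it is linearly separable.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train(data)
predictions = [predict(w, b, x) for x, _ in data]
print(predictions)
```

Even this toy loop makes thousands of weight updates for four samples, which hints at why training real networks on large data sets is so time-consuming.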
A trained model must be tested using the portion of the data set that was set
aside for testing. Testing also requires platforms such as PyTorch, DLtrain, or
TensorFlow. The workload of testing can be compared with that of training; it is
typically lighter, because testing only runs the model forward and does not
update weights.
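The testing step can be sketched as a split of the data followed by an accuracy measurement on the held-out part. Everything here is an assumption made for illustration: the 25% test fraction, the helper names `split` and `accuracy`, and the stand-in `model` function that plays the role of a trained network.

```python
import random

def split(samples, test_fraction=0.25, seed=0):
    """Shuffle once, then segment the data into training and test portions."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, test_set):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(1 for x, label in test_set if model(x) == label)
    return correct / len(test_set)

# Hypothetical stand-in for a trained network: label is 1 when x exceeds 0.5.
def model(x):
    return 1 if x > 0.5 else 0

samples = [(i / 20, 1 if i / 20 > 0.5 else 0) for i in range(20)]
train_set, test_set = split(samples)
test_accuracy = accuracy(model, test_set)
print(len(train_set), len(test_set), test_accuracy)
```

Note that the test loop only calls the model; no weights change, which is why testing is normally much cheaper than training.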
A trained and tested model must be saved to storage media by using the save
model methods defined in PyTorch, TensorFlow, DLtrain, etc. As of now,
there is no IEEE standard file format for storing a deep learning network model.
A trained and tested model is used to perform inference. Load model methods are
available in the tool sets of PyTorch, TensorFlow, DLtrain, etc. The loaded model
can be deployed onto different types of embedded devices.
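The save and load round trip can be illustrated with a plain JSON file. This is a sketch under stated assumptions, not the file format of PyTorch, TensorFlow, or DLtrain: the JSON layout, the helper names, and the "trained" parameter values are all hypothetical, and each real framework provides its own save and load methods.

```python
import json
import os
import tempfile

def save_model(model, path):
    """Write layer sizes and weights to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)

def load_model(path):
    """Read the model back for inference."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def infer(model, x):
    """Single linear unit: weighted sum plus bias, thresholded at zero."""
    z = sum(xi * wi for xi, wi in zip(x, model["weights"])) + model["bias"]
    return 1 if z > 0 else 0

# Hypothetical trained parameters; the JSON layout is an illustration, not a standard.
trained = {"layers": [2, 1], "weights": [0.75, -0.25], "bias": -0.125}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_model(trained, path)
    restored = load_model(path)

print(restored == trained, infer(restored, [1.0, 1.0]))
```

Because the stored file holds only the structure and the numbers, the same file could in principle be read by a small inference routine on an embedded device, which is the motivation for keeping save and load separate from training.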
2.3.7 Deployment
Business owners for enterprises of all sizes are struggling to find the next generation
of solutions that will unlock the hidden patterns and value from their data. Many
organizations are turning to artificial intelligence (AI), machine learning (ML),
and deep learning (DL) to provide higher levels of value and increased accuracy
from a broader range of data than ever before. They are looking to AI to provide
the basis for the next generation of transformative business applications that span
hundreds of use cases across a variety of industry verticals. AI, ML, and DL have
become hot topics with global IT clients. They are driven by the confluence of next-
generation ML and DL algorithms, new accelerated hardware, and more efficient
tools to store, process, and extract value from vast and diverse data sources that
ensure high levels of AI accuracy. However, AI client initiatives are complex and
often require specialized skills, ability, hardware, and software that are often not
readily available. AI-enabled application deployment includes both the software and
the hardware infrastructure that are deeply optimized for a complete production AI
system.
2.4 Custom Framework: DLtrain for AI
The engineering workforce in industry is highly enthusiastic about adopting
new development tools and the accompanying environments. The same holds for engineering college
teaching staff who are interested in setting up a “Cognitive Computing Lab” in their
college after going through the proposed workshop, and for self-motivated students with an
interest in learning DL-based application development and deployment at the IoT edge.
The abovementioned problems and associated tool sets have their own difficulties
at many levels. DLTRAIN is designed to remove most of these issues and provides a
good solution to train, test, and deploy given NN and CNN models. Deployment
can be on the IoT edge as well.
A custom AI framework provides consistency across AI in IoT edges. For
example, real-time inference is emerging as a critical need of the food and medical
service delivery industry to process and extract value from vast and diverse data
sources that ensure high levels of accuracy in delivered service. However, AI-
enabled enterprise service initiatives are complex and often require specialized
skills, ability, hardware, and software that are often not readily available. An AI-enabled
application deployment must be deeply optimized and production-ready.
The host OS is provided by NVIDIA and the same is used by the team in
NVIDIA. NVIDIA provides a driver to handle A100 hardware from CPU. Most
importantly, “container runtime” provides remote access to deploy containers for
AI model training. Enterprise business customers have the option to use their
application containers for DL/ML. Fresh AI model scripts or pre-trained models
can be used as an input to build AI applications.
Democratize deep learning: Pushing the limit on deep learning’s accuracy
remains an exciting area of research, but as the saying goes, “perfect is the enemy
of good.” Existing models are already accurate enough to be deployed in a wide
range of applications. Nearly every industry and scientific domain can benefit from
deep learning tools. If many people in many sectors are working on the technology,
we will be more likely to see surprising innovations in performance and energy
efficiency.
The DLTRAIN platform provides options to train NN and CNN models by using
image data sets. DLTRAIN is designed to make DL easy to deploy on edge
computing devices and to handle the issues that arise in porting trained
DL models to edge computers that have CPUs and GPUs. A silicon vendor can
take advantage of the above infrastructure and move their GPU silicon into the IoT edge
device market. Porting PyTorch and TensorFlow models onto an embedded device is
a challenging problem, and DLTRAIN addresses it. DLTRAIN
provides C and C++ code along with a license for the customer team to quickly
bring DL-enabled devices to market.
have the inclination to take up work on a farm on a daily wage basis. Added to this,
there is a need to have the capital to train these workforces and deploy them in the
field. All these added up to the level in which farmland owners get nervous to go in
for short-term crops such as potato, tomato, wheat, etc.
IBM Watson Studio-based visual recognition service is used to build an appli-
cation that can be a digital assistant to the agriculture workforce in the agriculture
industry. If the service is expected to work locally, then a local deployment of
infrastructure (visual insights) is required for visual intelligence service, leveraging
automation to enhance agriculture workers’ productivity, identifying crop disease,
and acting on insights faster with machine learning optimization. This will lead to
sustained output from harvest and also provide relief to agriculture farm owners
to manage cash flow well. AI-based applications can be deployed on farms on a
large scale by using on-premise inference in the form of mobile applications
or web applications. In this direction, IBM cognitive computing (visual insights)
infrastructure appears to be the best fit to deliver high-performance computing
requirements.
Deployment companies can customize inference applications for smartphones.
Tomato packing line workers use visual intelligence micro web service to become
part of the workflow to monitor and deliver good-quality tomatoes. During moni-
toring, workers can be efficient by using a “customized visual insight application
service” as a digital assistant to check the quality of tomatoes.
Cost per diagnosis is a critical parameter and the complexity of workflow to
perform diagnosis is another parameter. “Customized visual insight application
service” addresses both these parameters by using the IBM Watson IoT platform
to reduce complexity in workflow and the visual recognition platform to reduce
cost per diagnosis. Innovation in creating optimal yet robust models by using deep
learning convolutional neural networks has led to low-cost “customized visual
insight application service.”
For example, agriworkers can start a diagnosis and get results within 2–3 minutes
by using a smartphone app with fewer than five clicks. The cost per diagnosis
is Rs. 5. Workflow complexity for diagnosis is thus reduced to
a few clicks in a smartphone application.
Tomato crop monitoring requires sensor deployment in the tomato field. These
sensors (IoT node) are used to record data (for example, humidity, wind speed, rain
level, sunlight intensity, soil moisture, etc.) and send recorded data to IoT edge.
The application of artificial intelligence at the IoT edge is aimed at comprehending
incoming sensor data and transmitting classification or prediction outcomes to the
Watson IoT platform.
The application deployed in edge works as an MQTT client device and provides
the following two services: it sends notifications to those who are in the
subscription list and receives notifications from those IoT devices that are in the
publish list. The Pub-Sub model-based “application deployed in edge” provides
the latest information on crop health to agriculture workers and also to those in
the subscription list. IBM Watson provides the MQTT broker platform to manage
MQTT clients that are deployed in IoT nodes, IoT edge, applications in IBM
Cloud, and user access devices such as smartphones and desktop PCs. For example,
applications in IoT sensors and IoT edge work in asynchronous mode. In this
case, a broker is needed to handle data collection from IoT nodes, from
IoT edge, or from both. “Application deployed in edge” is designed to work with IoT
nodes or IoT edges that are connected via 4G, 5G, or a Starlink satellite modem.
“Application deployed in edge” works as a microservice that manages a topic-based
Pub-Sub message handling service. IoT nodes and IoT edges are not required to
have global IP addresses to use the abovementioned service. IoT
node devices in the field may not have the hardware and software infrastructure to
run clients that are based on REST API, XMPP, etc. “Application deployed in edge”
supports text string, number string, and JPG data.
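The publish/subscribe pattern described above can be sketched in a few lines of plain Python. The class below is an in-memory stand-in for a broker: it is not an MQTT implementation (no network, no QoS levels, no wildcard topics), and the class name `TinyBroker` and the topic strings are hypothetical. It only shows how publishers and subscribers stay decoupled through named topics.

```python
from collections import defaultdict


class TinyBroker:
    """In-memory topic-based publish/subscribe broker (illustration only)."""

    def __init__(self):
        # Maps each topic name to the callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback(topic, payload) for messages on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver payload to every subscriber of topic; return how many got it."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)


if __name__ == "__main__":
    broker = TinyBroker()
    inbox = []
    # A worker's phone subscribes to crop-health notifications.
    broker.subscribe("farm/tomato/health", lambda t, p: inbox.append((t, p)))
    # The edge application publishes a classification result.
    broker.publish("farm/tomato/health", "leaf blight suspected")
    # Nobody subscribed to this topic, so the message reaches no one.
    broker.publish("farm/tomato/humidity", "71")
    print(inbox)
```

Because the publisher only names a topic and never a recipient, sensor nodes and edge devices do not need to know each other's addresses, which is consistent with the remark that global IP addresses are not required.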
A sensor network is deployed in the tomato field. Optionally, sensor nodes can
be connected directly to the Watson IoT platform by using the MQTT client in the
sensor. But this is not recommended because sensor nodes need to have a good
amount of hardware infrastructure to make the above happen. It appears that the
optimal way is to deploy IoT edges in the field and connect with IoT nodes (sensors).
In this process, the amount of investment required for an IoT node (sensor) network
will be optimal. IoT edge will have MQTT clients and edge will be connected
with the Watson IoT platform as well by using 4G network or by using an Internet
connectivity infrastructure in a given tomato field.
“Application deployed in edge” is a limited capability MQTT broker and it
is used to include all those nodes and edges that are part of a 2G/3G network.
Mostly, it provides customized service to each node, each edge, agriworkers, tomato
traders, tomato buyers, and farm owners. Web service and smartphone app services
are deployed in IBM Cloud. The IBM Watson AI component is used to provide
machine learning and deep learning capability to web application and mobile
application service. The mentioned service is deployed by using containers (for
example, Docker). For long-term benefit to farm owners, it is recommended to have
an on-premise mini cloud platform such that monthly expenditure is cut down in
communication, and also, farm owners can derive advantage by having near real-
time service for the abovementioned personas.
“Create a real-time stream of sensor data and receive control data for the
Agriculture fields with MQTT and Kubernetes.” This is meant to build the required
product technical prototype and will show how to turn open data into an open
event stream with MQTT and microservices on Kubernetes. MQTT is a lightweight
messaging protocol which is useful in situations of low network bandwidth.
Featured technologies:
1. Kubernetes: container orchestration
2. MQTT: lightweight publish/subscribe protocol
3. Application deployed in edge (limited and yet AI-driven MQTT broker)
Workflow with major tool chain:
1. Create Kubernetes cluster.
2. Set up OpenShift.
2.5 Sample AI Application Deployment 21
The right tools are the bridge between ideas and results in the
world of deep learning.
This section of the book presents tool sets for deep learning applications and
focuses on step-by-step instructions for configuring the environment:
data, operating system, application, hardware, and other auxiliary
services. It describes practical configuration
techniques for setting up virtual environments with the TensorFlow and PyTorch
open-source tools, together with IBM Watson and Keras support. The book
further presents techniques to install, configure, and run
machine learning coding editors such as Jupyter Notebook in various environments.
Hands-on practice with these tools
is highly recommended for students, professionals, domain experts, and data scientists
to gain the best learning experience.
Also, this chapter includes a console diagram for each tool and
application configuration, which helps domain experts understand data science and
the associated workflows that are possible in the Watson AI platform. On the other
hand, it is practically infeasible to train data scientists on domain knowledge.
The intent is to ensure that readers of this book have working knowledge of the tools and
their environment before implementing deep learning techniques. Open-source
software provides a major boost for deep learning applications. It is known that
open-source software might undergo rapid change or might vanish altogether. Thus,
building applications based on open-source software modules requires its own list of
tool sets to be maintained for building and deploying those applications.
to have its own tool chain for the development of an application and also for
deployment of an application. On a given machine, many applications might be
deployed, and each application might require its own list of libraries or packages.
For example, in Fig. 3.1, library A has two versions, A1 and A2. Suppose
application J1 uses A1 and J2 uses A2. In this case, both versions
must be kept on the machine, and keeping two versions of a library in one place
eventually creates problems. It is not
safe to keep A1 and A2 together in Project 1 and expect application J1 to work properly.
For example, /tmp/AI/ is a folder used as the project root directory, and “WorkDL”
is used as the name of the virtual environment. Creating the environment places a copy of Python
in a folder named WorkDL inside /tmp/AI/, the folder in which the user runs
the command.
Use link [14] for source code and workflow documentation.
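The idea of an isolated environment folder can be sketched with Python's standard-library `venv` module. Note the substitution: the text uses the separate `virtualenv` tool, while this sketch uses `venv`, a related but different tool that ships with Python 3. The helper name `create_environment` and the temporary folder that stands in for /tmp/AI/ are assumptions made for illustration.

```python
import os
import tempfile
import venv


def create_environment(project_root, name="WorkDL"):
    """Create a virtual environment folder inside project_root and return its path."""
    env_dir = os.path.join(project_root, name)
    # with_pip=False keeps the sketch fast; pass True to bootstrap pip
    # into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    root = tempfile.mkdtemp(prefix="AI_")  # stands in for /tmp/AI/
    env_dir = create_environment(root)
    # pyvenv.cfg marks the folder as a virtual environment.
    print(os.path.exists(os.path.join(env_dir, "pyvenv.cfg")))
```

Each project can get its own folder of this kind, so application J1 can keep library version A1 while J2 keeps A2 without either seeing the other's packages.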
After completing installation of “virtualenv,” users can use pip or pip3 to install
packages of their choice. Importantly, packages installed via pip or pip3 are
placed in the WorkDL folder, isolated from the global Python installation, and are not available
globally on the PC or device. The following is used to check the version
of pip in this environment:
virtualenv version 1.7 or later is recommended.
3.2 TensorFlow: An AI Platform 25
From version 1.7 onwards, virtualenv does not include globally installed packages
by default.
The packages installed in WorkDL can be listed by using pip or pip3; in this
case both pip and pip3 provide the same result. It is also useful to know the location of pip,
that is, the installation folder. Use freeze to keep
the user environment consistent. File [Link] has a list of packages that are
installed or require installation. Its content is equivalent to “pip list,” where “pip list”
displays installed packages.
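A rough equivalent of the freeze listing can be produced from inside Python with the standard-library `importlib.metadata` module (Python 3.8 or later). This is a sketch under that assumption; it is not how pip itself is implemented, and the function name `freeze` is chosen here only for readability.

```python
from importlib import metadata


def freeze():
    """Return sorted 'name==version' lines for the current environment."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Printing one package per line gives a requirements-style listing.
    for line in freeze():
        print(line)
```

Run inside the activated WorkDL environment, the listing reflects only what is installed there, which is what makes the saved file useful for recreating the same environment on another machine.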
If the user machine does not have a GPU and the user wants to utilize the CPU
as much as possible, then the user should build TensorFlow from source,
optimized for the user CPU with AVX, AVX2, and FMA enabled. (That is,
build TensorFlow for the given CPU instead of installing the prebuilt TensorFlow package that is directly
available as an installation executable.)
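Before deciding on a source build, it helps to check which of these instruction-set extensions the CPU actually reports. The sketch below parses the `flags` line of `/proc/cpuinfo`, which assumes an x86 Linux machine (other platforms expose this information differently). The helper names are illustrative.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(flags, wanted=("avx", "avx2", "fma")):
    """List the wanted features that the CPU does not report."""
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()  # not Linux, or the file is unreadable
    print("missing:", missing_features(flags))
```

If a feature is missing from the CPU, enabling it in the build would produce a binary that fails on that machine, so the check is worth doing first.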
Refer to link [15] for the source code and sample documentation.
Check the TF installed in the user virtual environment. For example, the above
is an old virtual environment and in that TF is installed. In a given new virtual
environment WorkDL, install by using URL [16]
Install TensorFlow and Keras in WorkDL. Use link [16] for more information on
Keras Installation
File [Link] is created in the /tmp/jetson folder by using the command
“pip freeze [Link].” The created file is empty and has added items for
Keras and TF installation. Use link [17] to obtain a workflow which is used in Keras
to train the TensorFlow model.
Step 1. Set up your environment.
Step 2. Install Keras.
Step 3. Import libraries and modules.
Step 4. Load image data from MNIST.
Step 5. Preprocess input data for Keras.
Step 6. Preprocess class labels for Keras.
Step 7. Define the model architecture.
Step 8. Compile model.
Step 9. Fit model on training data.
Step 10. Evaluate the model on test data.
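Steps 5 and 6 are the ones most easily shown without any framework. MNIST images are 28 x 28 grayscale arrays with pixel values from 0 to 255, and labels are digits 0 to 9. The plain-Python sketch below scales pixels to the range [0, 1] and converts a label to a one-hot vector. The function names are illustrative; in Keras the label step is usually done with its own `to_categorical` utility rather than hand-written code.

```python
def scale_pixels(image, max_value=255):
    """Step 5: map integer pixel intensities to floats in [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def one_hot(label, num_classes=10):
    """Step 6: turn a class index into a one-hot vector."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


if __name__ == "__main__":
    tiny_image = [[0, 255], [51, 102]]  # a 2 x 2 stand-in for a 28 x 28 image
    print(scale_pixels(tiny_image))
    print(one_hot(3))
```

Scaling keeps input values in a small, uniform range, and one-hot labels match the shape of a ten-way classification output, which is why both steps come before the model is defined and compiled.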
3.3 JupyterLab
JupyterLab is a popular web-based user interface for Project Jupyter. Execution can
be done cell by cell and the same is very useful for design engineers to trace issues
with ease.
How is Python 3-enabled JupyterLab brought up?
Install JupyterLab on Python 3.5 or above. The above question is addressed
in URL [14].
The Jupyter Notebook is the early web application for creating and sharing com-
putational documents. It offers a simple, streamlined, document-centric experience.
Jupyter supports over 40 programming languages, including Python.
Install Jupyter Notebook for Python 3.5 or above. Details on the use of Jupyter
Notebook are given in [14].
Remote access to a Jupyter Notebook is very useful while working
with a near edge machine. Use URL [18] to access information on “Remote Jupyter
Notebook.”
LaTeX is very useful for creating scientific and research-level documents. JupyterLab
provides extensions to create LaTeX versions of the content that is present in its
cells.
Install the LaTeX extension with JupyterLab on Python 3.5 or above. Use URL [14]
for more information.
To convert to PDF, nbconvert uses the TeX document preparation ecosystem. It
produces an intermediate .tex file which is compiled by the XeTeX engine with the
LaTeX2e format to produce a PDF output.
Users can use an Overleaf account [19] to compile and generate PDF files from
the files that are output from JupyterLab.
IoT edge devices use GPU for real-time inference. Jetson Nano is one of the
emerging IoT edge devices and Nvidia has released a development kit for Nvidia
Jetson Nano.
28 3 Introduction to Software Tool Set
The URL [20] offers a comprehensive example with detailed instructions for
configuring an AI computer. It furnishes 15 essential steps for the installation of the
required software to conduct inference on a Jetson Nano device.
IBM Watson Machine Learning Accelerator for Enterprise AI: Watson Machine
Learning Accelerator, a new piece of Watson Machine Learning, makes deep learn-
ing and machine learning more accessible to your staff and brings the benefits of
AI into your business. It combines popular open-source deep learning frameworks,
efficient AI development tools, and accelerated IBM® Power Systems™ servers.
Now your organization can deploy a fully optimized and supported AI platform that
delivers blazing performance, proven dependability, and resilience. Watson Machine
Learning Accelerator is a complete environment for data science as a service,
enabling your organization to bring AI applications into production. It enables rapid
deployment.
It includes the most popular deep learning frameworks currently available, including
TensorFlow, Caffe, and PyTorch, in Power-optimized versions with all required
dependencies and files, precompiled and ready to deploy. The entire AI suite
has been validated and optimized to run reliably on accelerated Power servers.
Watson Machine Learning Accelerator runs on IBM Power-accelerated
server HPC, a platform that runs not only your deep learning but also a wide
variety of HPC and high-performance data analytic workloads. It leverages unique
capabilities of accelerated Power servers, delivering performance unattainable on
commodity servers. It provides hyperparameter search and optimization,
elastic training to allocate the resources needed to optimize performance, and
distributed deep learning for rapid insights at massive scale. Large model
support facilitates the use of system memory with little to no performance impact,
yielding significantly larger and more accurate deep learning models.
The IBM Watson Machine Learning Community Edition is available as a no-
charge orderable part number from IBM.
Install PowerAI in Conda Environment to use GPU
Get “Conda” for the Power 9 machine by using URL [21].
The WML CE packages are installed into a Conda environment, so after
installation is complete, the frameworks are ready for use. Each framework provides
a test script to verify some of its functions. These test scripts include tests and
examples that are sourced from the various communities. Note that some of the
included tests rely on data sets (for example, MNIST) that are available in the
community and are downloaded at run time.
3.7 Tool Set to Build DLtrain 29
DLtrain is an embedded AI-ready tool set. Details are given on the
GitHub link with working source code. An Ubuntu 18.04 machine is used to build
DLtrain, and it is built for X86, ARM, and ppc64le.
The DLtrain platform in Fig. 3.2 uses multiple resources (for example, CPU and
GPU) to train deep learning networks. The DLtrain platform is available for multiple
CPUs such as X86, ARM, and ppc64le.
Users having access to Jetson series hardware can use DLtrain to run training
workload and also inference workload.
The DLtrain platform is available for Windows machines as well.
The GitHub page of DLinIoTedge provides the necessary source code and informa-
tion to build the DLtrain.
Reference link [22] provides the source code of the deep learning network
training application and also the source code for
inference by using a trained deep learning network model.
This deep learning network model training and inference platform
is named DLtrain.
Use URL [23] to build DLtrain for inference. [Link] for x86 gcc tool set
is given in the above URL.
1. Use cmake to generate makefile.
2. Use make to build executable of DLtrain.
3. Use DLtrain to train deep learning networks.
4. Use DLtrain to perform inference by using deep learning networks.
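The first two of these steps (generate the makefile, then build) can be scripted. The sketch below is a generic out-of-source CMake build driven from Python; it assumes that `cmake` and `make` are installed and that the source folder contains the project's CMake file. The function names and folder arguments are hypothetical, and the commands for steps three and four are not shown because they depend on DLtrain's own command line, which is documented at the referenced URL.

```python
import os
import shutil
import subprocess


def build_steps(source_dir, build_dir):
    """Return (command, working_directory) pairs for configure and build."""
    source_dir = os.path.abspath(source_dir)
    build_dir = os.path.abspath(build_dir)
    return [
        (["cmake", source_dir], build_dir),  # step 1: generate the makefile
        (["make"], build_dir),               # step 2: build the executable
    ]


def run_build(source_dir, build_dir):
    """Run both steps in order, stopping at the first failure."""
    for tool in ("cmake", "make"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed")
    os.makedirs(build_dir, exist_ok=True)
    for command, cwd in build_steps(source_dir, build_dir):
        subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the planned commands without running them.
    for command, cwd in build_steps("DLtrain", "DLtrain/build"):
        print(cwd, ":", " ".join(command))
```

Running cmake from inside a separate build folder keeps generated files out of the source tree, so the same sources can be built for more than one target.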
Use URL [24] to build DLtrain to train deep learning networks. The above-
mentioned four steps are used in the above URL to build DLtrain for the training
workload. The [Link] for x86 gcc tool set is given in the above URL.
The Docker image of DLtrain is created by using a source code in the following link
[25].
DLtrain for Power 9 machine is created by using source code in the following link
[26].
An Ubuntu 18.04-based g++/gcc tool set is used (cmake is also used to create the makefile)
to create the DL application that runs on Power 9.
The objective is to build DLtrain for Power 9 Ubuntu machines.
Run the training workload on Power 9 and store the result as a model named
jjnet, which is the output of training. This
model alone is enough to perform inference at the IoT edge: use the jjnet model
to perform inference on edge devices with DLtrain for edges.
Use URL [26] to build DLtrain for training workload and also for inference
workload. The [Link] for Power 9 gcc tool set is given in the above URL.
1. Use cmake to generate makefile.
2. Use make to build executable of DLtrain.
3. Use DLtrain to train deep learning networks.
4. Use DLtrain to perform inference by using deep learning networks.
3.8 Docker Image of DLtrain Application to Train CNN 31
DLtrain for Jetson Nano machine is created by using the source code in the
following link [27].
The objective is to build DLtrain for Jetson Nano machines.
Use URL [27] to build DLtrain for the training workload and also for inference
workload. [Link] for Power 9 gcc tool set is given in the above URL.
1. Use cmake to generate makefile.
2. Use make to build executable of DLtrain.
3. Use DLtrain to train deep learning networks.
4. Use DLtrain to perform inference in Jetson Nano by using deep learning
networks.
The key focus is running the inference workload on ARM and GPU. The above
URL handles both ARM and GPU when creating executables. The GPU side of
the source code requires further development.
DLtrain for Windows 10 machine is created by using the source code in the
following link [28].
Use the above URL to build DLtrain for the training workload and also for the
inference workload.
to be built for each specific target platform, for example, a specific operating system
and hardware CPU architecture. A Docker image of DLtrain must therefore be created
for each specific target platform. Docker images are read-only templates that serve as the
building blocks of Docker containers. A Docker container is the running instance
of a Docker image.
URL [29] provides details on “using Docker image of DLtrain.”
DLtrain is a tool set which can be used to train NN and CNN models of deep
learning networks. DLtrain is used in the deployment of “deep learning networks
in near edge.” DLtrain is the best option for the embedded application development
team and also for the deployment team. Algorithms and edge silicon startups
have attracted huge investment, but tool developers are still catching up. Tool
development is compensating for lingering skills gaps by moving to higher levels of
abstraction.
Open-source tool sets are used in the training of NN or CNN. But during
deployment, there is a need to use a tool set from a particular silicon vendor.
Inference engine clients are expected to work from near edge and receive input data
in real time from IoT nodes, for example, home gateway machine receiving image
or video from doorbell camera for real-time inference.
Add the following port along with the IP address of
The following might help users to get connected with near edge and IoT nodes
via the TCP/IP network.
If a user runs a web server listening on [Link], as opposed to [Link] or a
user-specific IP, note that [Link] is the local loopback device and is accessible only from the
device on which the server is running. [Link] is used to make an application listen on all
network interfaces. Users can provide IP addresses to edge. For example, an address
can be
http://[Link]:8000/
or similar in the local network with a local IP address. Users can use the above URL
from other networked IoT devices to run client applications.
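The difference between the two kinds of listening address can be tried with Python's standard-library HTTP server. The sketch assumes the conventional values: 127.0.0.1 for the loopback address and 0.0.0.0 for "all interfaces". The handler class, the helper `start_server`, and the response text are illustrative names, and port 0 asks the operating system for any free port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"hello from the edge\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server(host, port=0):
    """Start the server in a background thread; port 0 lets the OS pick one."""
    server = ThreadingHTTPServer((host, port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    # Bound to loopback: reachable only from this machine.
    # Bind to "0.0.0.0" instead to accept clients from the local network.
    server = start_server("127.0.0.1")
    port = server.server_address[1]
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/")
    print(conn.getresponse().read().decode())
    conn.close()
    server.shutdown()
```

A server bound to loopback is a safe default while developing on the edge machine itself; binding to all interfaces is what allows other IoT devices on the network to reach it.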
3.9 Deploy DL Networks in Near Edge
URL [30] has the necessary workflow for the following steps in near edge:
Step 1 Virtual environment: Activate the virtual environment; in this case dlBox is
the virtual environment in the near edge machine.
Step 2 Use Jupyter Notebook: Run a Jupyter Notebook in
server mode with no browser, so that the remote
machine can use its own browser to work with the Jupyter Notebook that runs
in the near edge (Power 9) processor.
Step 3 Near edge is live: The above is running in the near edge (Power 9) machine
(or in an ssh terminal to the near edge, which is a Power 9 machine). Users
should leave this open and in running mode.
Step 4 Local server to handle near edge: Open another terminal in the user machine (if
it is an Ubuntu machine) and use the ssh command to connect to
the near edge (Power 9) machine so that the Jupyter Notebook
can be used in the user machine via a web browser. The notebook is served at
localhost:8889 on work@171.61.123.76.
Use the following command from
the local PC to forward local port 8886 to the near edge machine.
ssh -N -f -L localhost:8886:localhost:8889 work@171.61.123.76
Opening localhost:8886 in the browser of the user machine then shows a web page that asks for a token;
supply the token from Step 2. The same token is displayed in the running
window of the Jupyter Notebook.
Successful deployment of the abovementioned workflow gives users
good control of near edge machines, where near edge machines
are expected to run real-time inference service for applications that have
subscribed to the inference service of a given near edge machine.
TensorFlow Lite is the official (from Google) solution for running machine learning
models on mobile and embedded devices. It enables on-device machine learning
inference with low latency and a small binary size on Android, iOS, etc. TensorFlow
Lite uses many techniques for this, such as quantized kernels that allow smaller
and faster (fixed-point math) models. Although quantized deep learning networks run faster, this
comes at the cost of lower accuracy.
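The trade-off between size and accuracy can be seen in a small, generic sketch of affine quantization: floating-point values are mapped to 8-bit integers with a scale and a zero point, then mapped back. This is a textbook-style illustration written for this section, not TensorFlow Lite's implementation, and the function names are assumptions.

```python
def quantize(values, num_bits=8):
    """Affine-quantize floats to unsigned integers; return (ints, scale, zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The represented range must include 0 so that zero maps to an integer.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = int(round(qmin - lo / scale))
    ints = [min(qmax, max(qmin, int(round(v / scale)) + zero_point)) for v in values]
    return ints, scale, zero_point


def dequantize(ints, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(q - zero_point) * scale for q in ints]


if __name__ == "__main__":
    weights = [-1.0, -0.3, 0.0, 0.42, 1.0]
    q, scale, zero_point = quantize(weights)
    restored = dequantize(q, scale, zero_point)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

Each value now needs one byte instead of four, and integer arithmetic is cheaper on many embedded processors, but the restored values differ slightly from the originals. That small error is the source of the accuracy loss mentioned above.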
Use URL [31] to install and use TensorFlow Lite.
Chapter 4
Hardware for DL Networks
This chapter builds on the previous one and covers
the hardware systems that deliver the performance needed to train
deep learning networks. Hardware environment settings and configurations
are presented for various processing platforms, including
AMD, POWER9, ARM, ARM + GPU, and X86 systems. These processing environments
are showcased with installation, setup, and configuration on edge servers.
Deep learning needs high-performance computing systems with customized configurations for
various applications and tools. Hence, the book is not limited to deep learning
tools and applications; it also shows users and professionals
what kinds of hardware and performance settings should be configured to achieve
the best deep learning results.
The book also provides URL reference links so that coders and readers can
quickly download the relevant tools, applications, and hardware configuration techniques.
Further, advanced installations such as the NVIDIA CUDA
compiler, GPU hardware, GeForce multiprocessor, thread processing, IBM Watson
CE, and large-scale AI business enterprise suite configuration are demonstrated in
simple steps. Finally, deployment of AI on X86 and Android phones is also presented.
Increase in hardware performance is necessary to train deep learning networks.
The market is witnessing a proliferation of specialized hardware that not only
offers better performance on deep learning tasks, but also increased efficiency
(performance per watt). Figure 4.1 provides CPU and CPU + GPU combinations
which are used at enterprise level and also in research labs in academic institutes.
DGX Station A100 is very popular for enterprise-level performance, and
multiple stations can form an on-prem cluster to provide the computing required
for training deep learning networks. AC922 and V100 GPUs are in the
high-performance segment. OpenPOWER CPU and PCI card (RTX 2080 or 2070)
provide options for the cost-sensitive enterprise market.
The Jetson series Xavier and Orin are providing entry-level performance to train
deep learning networks.
CPU and FPGA are providing more options in the inference segment.
The AI community’s demand for GPUs led to Google’s development of TPUs and
pushed the entire chip market towards more specialized products.
In the next few years we will see NVIDIA, Intel, SambaNova, Mythic, Graph-
core, Cerebras, and other companies bring more focus to hardware for AI workloads.
The Silicon Vendor team can take advantage of emerging requirements for
accelerated computing and move their GPU silicon into the IoT edge market. For
example, TI has its own inference engine (TIDL), Qualcomm has its own
(SNPE), and so do ST Micro and many other silicon vendors.
install. Talos™ II drives the state of the art of secure computing forward. Talos™ II gives
you — and only you — full control of your machine’s security. Rest assured knowing
that only your authorized software and firmware are running via POWER9’s secure boot
features. Don’t trust us? Look at the secure boot sources yourself — and modify them as
you wish. That’s the power of Talos™ II.
During installation of the CUDA SDK and its requirements, it is a routine check
to verify the VGA controller and its resource allocation. In some cases,
VGA controllers also go through PCI, so it is necessary to remove resource conflicts
with other devices on PCI. Making a CPU support GPU devices via PCI
requires the above study of the resources used by VGA controllers.
The ASPEED controller in Fig. 4.3 is a baseboard management controller,
or BMC [35], which is a small computer that sits on virtually every server
motherboard. Other components such as higher-end switches, JBODs, JBOFs, and
other devices now include BMCs as well. The largest vendor for BMCs today is
ASPEED whose AST2400 BMC is pictured below.
BMC support: Discrete GPU (VGA-compatible controller: GeForce RTX 2070).
Is it true that GeForce RTX 2070 is a discrete GPU? Most modern discrete GPUs
require firmware. As Talos™ II is aimed at a security-conscious audience, it does
not currently include GPU firmware in the production firmware images.
4.2 POWER9 with RTX 2070 GPU 39
[Figure: block diagram with the labels Power 9; USB 2.0; x1 PCIe Gen10.4GB/s; LPC 33 MHz; (Optional) TPM; Analog Video; DDR3 128 MB]
Does this mean GeForce RTX 2070 does not have firmware support in Talos™
II?
If so, how can Talos™ II support firmware for GeForce RTX 2070? The
following questions about the boot sequence are useful to resolve issues during boot.
Boot:
1. Does Talos™ II support Trusted Boot or Secure Boot?
2. If Talos™ II has Secure Boot on, then how do you disable it?
Trusted Boot is the measurement (hashing) of system firmware boot components
and the creation of secure cryptographic artifacts that unambiguously demonstrate
that particular firmware has been executed by the system. Trusted Boot artifacts can
be used to remotely verify system integrity or to seal secrets so that they are only
available after certain firmware has been executed.
Secure Boot is the cryptographic signing and verification of firmware boot
components, failure of which is flagged for system administrator investigation and
action, including logging an error and halting the system boot. Secure Boot prevents
the system from executing either accidentally or maliciously modified firmware.
VGA ports in PCIe (is it gen 3 or 4?) bus of Talos™ II
VGA-compatible controller, NVIDIA Corporation Device ([Link].0), and
VGA-compatible controller, ASPEED Technology, Inc. ([Link].0), are
placed in PCIe slots of Talos™ II.
40 4 Hardware for DL Networks
Workaround 1: Disable the onboard VGA output via the VGA disable jumper,
J10109. See the user’s guide for additional information.
Workaround 2: Select desired GPU at run time (yes, this option is put in use).
More information “about configuring ASPEED controllers” is discussed in [34]
and the particular file name is [Link].
by extending rays into a scene and bouncing them off surfaces and towards sources
of light to approximate the color value of pixels.
Ray tracing is capable of simulating a variety of optical effects such as reflection,
refraction, soft shadows, scattering, depth of field, motion blur, caustics, ambient
occlusion, and dispersion phenomena (such as chromatic aberration). It can also be
applied to track the trajectory of sound waves, much like it does with light waves.
This feature makes it a suitable choice for enhancing the immersive sound design
in video games by generating lifelike reverberations and echoes.
Additionally, it is important to note that there are 64 CUDA cores in this
context. CUDA kernels also have access to a special variable that gives the
number of threads within a block: blockDim.x. Using this variable, in
conjunction with blockIdx.x and threadIdx.x, each thread can compute a unique
index within the grid.
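The role of blockDim.x, blockIdx.x, and threadIdx.x can be illustrated without a GPU. The following is a minimal sketch in plain Python that emulates a one-dimensional launch; the function name and the launch sizes are chosen for illustration only, and a real kernel would evaluate the same expression once per thread on the device.

```python
# Pure-Python emulation of how a 1-D CUDA launch gives each thread a unique index.
# In a real kernel the same expression uses the built-in variables
# blockIdx.x, blockDim.x, and threadIdx.x.

def global_thread_indices(grid_dim_x, block_dim_x):
    """Return the global index computed by every thread of a 1-D launch."""
    indices = []
    for block_idx_x in range(grid_dim_x):           # one iteration per block
        for thread_idx_x in range(block_dim_x):     # one iteration per thread in the block
            indices.append(block_idx_x * block_dim_x + thread_idx_x)
    return indices

# Example launch: 4 blocks of 64 threads (sizes are illustrative assumptions).
indices = global_thread_indices(grid_dim_x=4, block_dim_x=64)
print(len(indices), indices[0], indices[-1])
```

Every index from 0 to 255 appears exactly once, which is why this expression is commonly used to map one thread to one array element.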
The warp scheduler as shown in Fig. 4.5 looks at all warps assigned to it, to
determine which have instructions that are ready to issue. The warp scheduler then
chooses 1 or 2 instructions that are ready to execute and issues those instructions.
The process of issuing an instruction involves assigning functional units within an
SM to that execution (scheduling) of that instruction, warp-wide. A warp is always
32 threads; therefore, 32 functional units in one clock cycle, or a smaller number
distributed across multiple clock cycles, must be scheduled (and therefore must be
“available”) to issue the instruction.
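The relation between warp size and available functional units can be written as a one-line calculation. The sketch below is a simplified model, assuming that all units of the required type are free and that the instruction is issued in equal slices; the function name is hypothetical.

```python
import math

WARP_SIZE = 32  # a warp is always 32 threads (from the text)

def cycles_to_issue(functional_units):
    """Clock cycles needed to issue one warp-wide instruction when
    `functional_units` units of the required type are available per cycle."""
    if functional_units <= 0:
        raise ValueError("at least one functional unit is required")
    return math.ceil(WARP_SIZE / functional_units)

for units in (32, 16, 8):
    print(units, "units ->", cycles_to_issue(units), "cycle(s)")
```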
Let a “queue of blocks” be associated with each kernel launch. As resources on an
SM become available, the block scheduler deposits a block from the “queue”
onto that SM. The block scheduler does not deposit blocks warp-by-warp. It is
an all-or-nothing proposition, on a block-by-block basis. Let us consider a block
that is already deposited on an SM. A warp is “eligible” when it has one or more
instructions that are ready to be executed.
single clock cycle, or, e.g., 16 over 2 clock cycles, or 8 over 4 clock cycles, etc.
The following questions provide a hint to understand more on SM and its efficient
usage:
1. What are all the different types of functional units in an SM?
2. How many of functional unit X are in SM architecture Y?
3. What is the pipeline depth of functional unit X?
4. What is the exact algorithm by which a warp scheduler chooses instructions to
issue?
Get the source code of the vectorAdd example from the Samples folder (./Simulations/).
Build the vectorAdd application using make. Section 3 in URL [36] has example
code for vector addition on the GPU.
Use URL [34] to get more details on running body examples in RTX 2070 GPU.
PTX file creation
Following command line instructions, make output “user2” from “[Link]” and
run it as well. Produce the PTX for the CUDA kernel. Section 4 in URL [36] is
handling PTX file creation and its use.
Use Python to use the GPU at run time
Python code can perform computation on the CUDA cores. Matrix multiplication
is done on GPU0 by using Python code. Matrix addition is done on GPU1 by using
Python code.
Python code can also run TensorFlow on multiple GPUs. The same code
provides the option to construct a defined model in a multitower fashion, where
each tower is assigned to a different GPU. Use the following URL to get the code
and the associated workflow.
Python code is also used to test GPU availability for computation in Talos™
II. Use link [36] to get the [Link], [Link], and [Link] files. Use the same
files to test GPU availability on the OpenPOWER CPU.
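As a quick check before running the referenced files, GPU availability can also be probed from Python using only the standard library. The following is a minimal sketch, not the code from [36]; it assumes the NVIDIA driver utility nvidia-smi is installed and on the PATH, and the function name is hypothetical.

```python
import shutil
import subprocess

def list_nvidia_gpus():
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none."""
    exe = shutil.which("nvidia-smi")
    if exe is None:                # driver tools not installed or not on PATH
        return []
    try:
        result = subprocess.run([exe, "-L"], capture_output=True,
                                text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return []
    if result.returncode != 0:     # tool present but no usable device
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = list_nvidia_gpus()
print("GPU available" if gpus else "no NVIDIA GPU detected", gpus)
```

An empty list only means that the driver tool did not report a device; a framework-level check (TensorFlow or PyTorch) is still needed to confirm that the framework itself can use the GPU.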
The training platform is different from the deployment platform. This difference
is an obstacle to deploying a trained AI model (CNN, RNN networks, etc.) on a
deployment edge with limited capability. In most cases, the AI model size must be
cut down or the weights of the AI model must be optimized. Such optimization of
the model size or weights may not be needed if deployment on the edge side uses
DGX Station A100. Enterprise business owners of all sizes are struggling to
find next-generation AI solutions that will unlock the hidden patterns and value
in their huge volumes of data.
Emerging AI-enabled microservices in a given enterprise are driven by the
confluence of ML/DL algorithms. Figure 4.6 provides details on the layers in DGX
Station A100. Enterprise on-prem requirements appear to match the specifications
of DGX Station A100, which provides high levels of accuracy in business
solutions. However, AI-enabled enterprise service initiatives are complex and often
require specialized skills, hardware, and software that are not readily
available.
1. Training data set creation (on-prem or in IBM Cloud or in Colab or any other)
2. Building AI model by using TensorFlow or PyTorch. Building an AI model using
the custom framework DLtrain (for NN, limited version of CNN)
3. Training AI model by using DGX Station A100
4. Deploying AI model in IoT edge for inference service in real time
Enterprise customers have the option to train large models using a fully GPU-
optimized software stack and up to 320 gigabytes (GB) of GPU memory. With DGX
Station A100, enterprise can provide multiple users with a centralized AI resource
for all workload training, inference, and data analytics. DGX Station A100 brings
AI out of the data center with a server-class system that can plug in anywhere to
perform real-time inference. DGX Station A100 uses the NVIDIA DGX ™ software
stack and it is an ideal platform for teams from all enterprises, large and small.
By effortlessly providing multiple simultaneous users with a centralized AI
resource, DGX Station A100 is the workgroup appliance for data science teams in
the age of AI. It is capable of running training, inference, and analytics
workloads in parallel, and it can provide up to 28 separate GPU devices to
individual data science teams.
DGX Station A100 is an AI workgroup server delivering 2.5 petaFLOPS. Organizations
around the world can use it to provide multiple users with a centralized AI
resource for all workloads. It offers an immediate on-ramp to NVIDIA DGX™-based
infrastructure and works alongside other NVIDIA-certified systems. A DGX Station
A100 rental is a new-generation enterprise offering with multi-instance GPU (MIG)
support, including four NVIDIA A100 Tensor Core GPUs, a top-of-the-line
server-grade CPU, superfast NVMe storage, and leading-edge PCIe Gen4 buses. DGX
Station A100 includes remote management, so enterprise customers can manage it
like a server. With no complicated installation processes or significant IT
infrastructure required, the DGX Station A100 can be placed anywhere an enterprise
customer's data science team requires complex computations. Simply plug the
station into any standard wall outlet to get it up and running in minutes, and
work from anywhere.
This supercomputer was truly designed for today’s agile data science teams that
work in corporate offices, labs, research facilities, or even from home as the DGX
Station A100 can run simultaneous processes from multiple users without affecting
performance.
NVIDIA DGX Station A100 is providing an opportunity to use the world’s only
office-friendly system with four fully interconnected and MIG-capable NVIDIA
A100 GPUs, leveraging NVIDIA® NVLink® for running parallel jobs and multiple
users without impacting system performance.
DGX Station A100 is a server-grade AI system that does not require data center
power and cooling. It includes four NVIDIA A100 Tensor Core GPUs, a top-of-
the-line, server-grade CPU, superfast NVMe storage, and leading-edge PCIe Gen4
buses, along with remote management so you can manage it like a server. It is
suitable for use in a standard office environment without specialized power and
cooling.
Deployment of a trained CNN model on an X86 machine has the following items as
part of its deployment:
(a) Ubuntu 18.04 OS is used in the deployment machine, which is X86.
(b) Python is not required in the deployment machine, which is X86.
(c) TensorFlow is not required in the deployment machine, which is X86.
(d) FPGA (via PCI add-on) is not required in the deployment machine, which is X86.
(e) GPU (via PCI add-on) is not required in the deployment machine, which is X86.
(f) Item (b) and item (c) are always true.
(g) Item (e) may be true sometimes.
(h) Others.
Problem 4.2.1 Deployment of DLtrain application to train NN or CNN model
has a well-defined workflow. What are the items required in the following list
to successfully complete deployment of DLtrain for training a deep learning
network?
(a) User machine needs to install Docker (Ubuntu 18.04 X86 machine).
(b) User pulls “dltrain:1.0.0” docker image from Docker hub.
(c) User uses “dltrain:1.0.0” to train CNN model.
(d) CNN model definition is given in a txt file which is located in the user machine
(current working folder).
(e) Images folder is also required to be in the current working folder.
(f) Images and Network._prop.txt file are downloaded from Google Drive link
which is provided during demonstration.
(g) All items are true.
(h) Item (f) may not be true always.
Problem 4.2.2 Deployment of “trained CNN model in Android Phone” has the
following items as part of its workflow for successful completion of deployment:
(a) NDK is (in Android Studio) used to build inference engine which is developed
in C and C++.
(b) Inference engine is not using GPU in phone.
(c) Inference engine is not using DSP in phone.
(d) Inference engine is capable of using updated trained CNN model from Ubuntu
machine (X86) via WiFi.
(e) Inference engine is used to collect data from the user (in this case the LKG
student).
(f) Inference engine is showing inference output in display.
Rich edge is expected to have Python with installed virtual environment. If the CNN
model or NN model is small enough to fit within the constraints of rich edge, then
deployment will go through successfully. In case there is an issue in memory size
available to load the trained CNN, then there are expected issues that might require
pruning of the trained CNN model such that it can fit in rich edge.
Rich edge may not have FP32 support, or if FP32 computation is costly, then
there is a need to move towards INT32 computation during inference.
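The move from FP32 to integer computation can be sketched as a symmetric linear quantization of the weights. This is a minimal illustration under stated assumptions, not the method of any particular framework: the function names, the sample weights, and the choice of one scale per weight list are hypothetical, and the bit width is a parameter (the text mentions INT32; 8 bits is used in the demonstration only to keep the numbers readable).

```python
def quantize(weights, num_bits=32):
    """Symmetric linear quantization of float weights to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1                 # largest representable magnitude
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer values back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.5, -1.25, 0.0, 0.75]                  # hypothetical trained weights
q8, scale8 = quantize(weights, num_bits=8)
restored = dequantize(q8, scale8)
print(q8)
```

Inference then runs on the integer values, and the scale is applied once at the output; the rounding error per weight is at most half of the scale.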
Problem 4.2.3 Deployment of TensorFlow version of the NN or CNN model in IoT
edge is required to quantize the trained NN or CNN model. This happens because
(a) The CNN model in Python is not ready to be deployed in IoT edge.
(b) The CNN model in TensorFlow is not ready to be deployed in IoT edge.
(c) The trained CNN model might have floating point weights.
(d) The trained CNN model might have too many neurons.
(e) IoT edge technology is very different from deep learning-based inference
technology.
(f) All the above are true.
(g) Others.
Set up the Jetson Nano AI computer; more information on setting up the Jetson
Nano is given in [38].
AI Edge Computer: Run a CUDA program on Jetson series devices.
Chapter 5
Data Set Design and Data Labeling
5.1 Insight
This section presents the most “tricky” and advanced data processing techniques
in the easiest way for readers. More importantly, the chapter shows how
to read data from audio, speech, image, and text sources in different modes,
along with techniques for data sanitization and scaled data processing systems.
It also explains statistical methods for interpreting and analyzing data for
different deep learning models; the Maxwell-Boltzmann statistics technique is
described in depth, specifically with image signal processing, which demonstrates
its main use in OpenCV libraries. MNIST data handling is presented with training,
test, and deployment mechanisms. A novel technique, pixel normalization for image
processing, is also presented, including the global standards that apply to
prediction, classification, sequence generation, and sequence classification
[39, 40] (see Fig. 5.1).
5.2 Description
A data set is an essential part in training deep learning networks. In the following
paragraphs, various sources are given and these sources form mostly a basis for
modern-generation data sets to train deep learning networks and also machine
learning networks. The leftmost side in Fig. 5.2 indicates a low volume of data and
the rightmost side indicates huge volume of data.
Human beings generate a great deal of data, and in addition every digital device
(IoT) creates about 100 times more data than human beings do. Imagine that the
2000-plus electricity transformers in a given city (for example, Chennai in Tamil
Nadu) each produce a large amount of data every hour. It is almost impossible
for an engineer at a power distribution substation or at a feeder to make a
decision by looking at the huge amount of data from each transformer. AI can
reduce the size of this data and provide good inference-level data that engineers
can use to decide on load dispatch.
5.4 Data Set Creation and Statistical Methods 51
Healthcare is also emerging in this data-driven business segment, and a few
companies have appeared there as well.
Generated text data has been used to perform intelligence gathering for well-
defined objectives. Image data-based intelligence report generation and
notification services are becoming very popular at the enterprise level and are
also entering consumer industries in the form of doorbells with computer vision,
advanced driver assistance systems (ADAS) with computer vision, and many more
applications in the healthcare segment. A proposed workshop provides an
introduction to cognitive computing in multimedia applications.
As shown in Fig. 5.2, collecting data might require different kinds of experiments
with the help of domain experts. These experiments are mostly statistical in
nature, and popular statistical experiments are discussed in the following.
Data set creation requires a good amount of knowledge of the given
domain. For example, the following diagram illustrates the kinds of signals and
associated networks that are used in deep learning models.
Data set size growth: early-stage deep learning networks used text data as
input, but major success came after image data sets were used in training models.
Speech and audio are also becoming part of natural language processing and
many more associated applications. Advanced driver assistance systems (ADASs)
appear to be integrating real-time sensor data as well.
Posterior probability distribution is the probability distribution of an unknown
quantity, treated as a random variable, conditional on the evidence obtained from an
experiment or survey.
In Fig. 5.3,
$k = 1$ for a Bernoulli trial,
$k > 1$ for a binomial trial,
$k = \infty$ for a Poisson trial.
$$p = 1 - q, \qquad q = 1 - p, \qquad p + q = 1 \tag{5.1}$$
$$S = \{x_1, x_2, \ldots, x_r\}$$
$$X(x_i) = 1$$
Figure 5.4 is one element in a data set which will be used in training NN or
CNN. Each picture is associated with a label, where the label has two values such
as “normal” and “abnormal.” For normal values, xi is in the range of 60 to 80.
$$P(X(x_i) = 1) = p, \qquad P(X(x_i) = 0) = 1 - p = q \tag{5.2}$$
$P(X(x_i) = 1)$ is the probability mass function. After r recordings, one picture
is obtained and used as an input in the training data set for the CNN.
Inference result = normal breathing rate (1) or abnormal breathing rate (0). The
same provides a notification for a person’s breathing condition such as normal or
abnormal.
One sample recorded for each trial. Number of samples per trial = 1.
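The labeling rule behind this Bernoulli data set can be sketched in a few lines. The example below is illustrative only: it uses the normal range of 60 to 80 quoted above, the recordings are synthetic values drawn from a uniform distribution (an assumption, not measured data), and the function names are hypothetical.

```python
import random

NORMAL_RANGE = (60, 80)   # normal range for x_i quoted in the text

def label(x):
    """X(x_i): 1 for a normal breathing rate, 0 for an abnormal one."""
    low, high = NORMAL_RANGE
    return 1 if low <= x <= high else 0

def estimate_p(recordings):
    """Fraction of recordings labeled normal, used as an estimate of p."""
    labels = [label(x) for x in recordings]
    return sum(labels) / len(labels)

rng = random.Random(0)                                     # fixed seed for repeatability
recordings = [rng.uniform(40, 100) for _ in range(1000)]   # synthetic values
p = estimate_p(recordings)
q = 1 - p                                                  # so that p + q = 1
print(round(p, 3), round(q, 3))
```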
In the above example, E is {0, 1}, a discrete case in which the experiment
provides a binary outcome. In general, however, E includes values in an interval,
for example $E \in [a, b]$, where a can be 0 and b can be 1.
$$X : S \times S \times S \times \cdots \times S \to Z \tag{5.8}$$
Perform n trials, each with r recordings. After completing the n trials, create a
histogram of the recorded data, as shown in Fig. 5.5.
A breath rate of 80 to 100 is normal and the rest is abnormal. In the Bernoulli
experiment $n = 1$, and in the binomial experiment $n > 1$. The rest of the
workflow is the same as for data set creation in the Bernoulli trial.
5.5 Statistical Methods 55
$P(X(x_i) = 1)$ is the probability mass function. After r recordings, one picture
is obtained and used as an input in the training data set for the CNN.
Inference result = normal breathing rate (1) or abnormal breathing rate (0).
The same provides a notification for a person’s breathing condition such as
normal or abnormal.
r samples are recorded for each trial. Number of samples per trial: $r > 1$.
where p is the normal breathing rate (assigned value is 1) (Table 5.1 provides a
detailed comparison)
where q is the abnormal breathing rate (assigned value is 0)
where $p_1$ is the normal breathing rate (assigned value is 1) during the first trial
where $q_1$ is the abnormal breathing rate (assigned value is 0) during the first trial
where $p_n$ is the normal breathing rate (assigned value is 1) during the nth trial
where $q_n$ is the abnormal breathing rate (assigned value is 0) during the nth trial
where $p_1$ is not equal to $p_2$ and so on
where $q_1$ is not equal to $q_2$ and so on
$$P(X(x_i) = 1) = p_i, \qquad P(X(x_i) = 0) = 1 - p_i = q_i \tag{5.9}$$
$$P(k) = \frac{\lambda^{k} e^{-\lambda}}{k!} \tag{5.10}$$
5.6 Image Signal Processing 57
$$\frac{\lambda_2}{\lambda_1} = \frac{t_2}{t_1} \implies \lambda_2 = \lambda_1 \frac{t_2}{t_1} \tag{5.11}$$
The above steps are useful to create a data set for many new lambda values from
a given set of lambda values.
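Equations (5.10) and (5.11) translate directly into code. The following minimal sketch evaluates the Poisson probability and rescales a known rate to a new observation interval; the function names and the numeric values are assumptions chosen for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(k) = lam**k * exp(-lam) / k!  (Eq. 5.10)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def rescale_rate(lam1, t1, t2):
    """lambda_2 = lambda_1 * t2 / t1  (Eq. 5.11)."""
    return lam1 * t2 / t1

lam1 = 3.0                                   # hypothetical: 3 events in an interval t1
lam2 = rescale_rate(lam1, t1=1.0, t2=2.0)    # rate for an interval twice as long
print(lam2, poisson_pmf(0, lam2))
```

Repeating the rescaling for several values of t2 yields a set of new lambda values from one measured lambda, which is the data set expansion described above.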
Gaussian blur, also known as Gaussian smoothing, is the result of blurring
an image by a Gaussian function. It is used to reduce image noise and reduce
detail. The visual effect of this blurring technique is similar to looking at an
image through a translucent screen. It is sometimes used in computer vision for
58 5 Data Set Design and Data Labeling
[Figure: image panels numbered 0 to 5; axes show pixel coordinates (0 to 500 horizontally, 0 to 400 vertically)]
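The Gaussian blur described before the figure can be sketched without any image library. The example below is a minimal pure-Python illustration, assuming a grayscale image stored as a list of rows, a small kernel radius, and border pixels repeated at the edges; in practice a library routine (for example from OpenCV) would be used, and all names here are hypothetical.

```python
import math

def gaussian_kernel(radius, sigma):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
                 for j in range(-radius, radius + 1))
             for x in range(width)]
            for row in image]

def gaussian_blur(image, radius=1, sigma=1.0):
    """Separable blur: filter rows, transpose, filter again, transpose back."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(image, kernel)
    transposed = [list(col) for col in zip(*horizontal)]
    vertical = blur_rows(transposed, kernel)
    return [list(col) for col in zip(*vertical)]

image = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]   # one bright pixel on a dark background
blurred = gaussian_blur(image)
```

The bright pixel is spread over its neighbors, which is the noise and detail reduction that the text describes.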
1. OpenCV library
2. PIL library
3. URLLIB library
4. matplotlib library
5. pickle module
6. skimage library
The following URL provides code to read the Fashion-MNIST data set. The
data set is read both from the URL and from the local PC [42].
The data is an order 5 tensor with dimensions BatchSize × Depth × Height ×
Width × Channels.
The MNIST data set contains 60,000 28 × 28 grayscale images of the 10 digits,
along with a test set of 10,000 images. The original black and white images of
MNIST were converted to grayscale images of 28 × 28 pixels in width and height,
making a total of 784 pixels. Pixel values range from 0 to 255, where higher
numbers indicate darker pixels and lower numbers indicate lighter pixels.
Reference [43] provides details on the MNIST data set file format and the
code needed to read an MNIST file.
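As an illustration of the file format, the sketch below parses the IDX layout used by the MNIST files: a big-endian header (magic number, item count, and for images the row and column counts) followed by one unsigned byte per pixel or label. This is not the code from [43]; the function names are hypothetical, and tiny synthetic files are built in memory so that the example runs without downloading the data set.

```python
import struct

def parse_idx_images(data):
    """Parse IDX3 image bytes into (rows, cols, list of flat pixel lists)."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:                       # 0x00000803 marks an image file
        raise ValueError("not an IDX image file")
    size = rows * cols
    images = [list(data[16 + i * size: 16 + (i + 1) * size])
              for i in range(count)]
    return rows, cols, images

def parse_idx_labels(data):
    """Parse IDX1 label bytes into a list of integer labels."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:                       # 0x00000801 marks a label file
        raise ValueError("not an IDX label file")
    return list(data[8:8 + count])

# Synthetic files: two 2x2 images and their two labels.
image_bytes = struct.pack(">IIII", 2051, 2, 2, 2) + bytes([0, 255, 128, 64, 1, 2, 3, 4])
label_bytes = struct.pack(">II", 2049, 2) + bytes([7, 3])
rows, cols, images = parse_idx_images(image_bytes)
labels = parse_idx_labels(label_bytes)
print(rows, cols, images, labels)
```

For the real files, the bytes would first be read from disk (and decompressed if the files are gzip archives) before being passed to these functions.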
After adding and testing one file, use all the other files from your own
data set. Pixel values are often unsigned integers in the range between 0 and 255.
Although these pixel values can be presented directly to neural network models in
their raw format, this can result in challenges during modeling, such as slower than
expected training of the model. Instead, there can be a great benefit in preparing
the image pixel values prior to modeling, ranging from simply scaling pixel values
to the range 0–1 to centering and even standardizing the values. This is called
normalization and can be performed directly on a loaded image. The example below
uses the PIL library (the standard image handling library in Python) to load an image
and normalize its pixel values.
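The scaling step itself is a single division. The following minimal sketch assumes the image has already been loaded (for example with PIL) and converted to a nested list of 8-bit values; the loading step is omitted so that the example has no dependencies, and the sample pixel values are hypothetical.

```python
def normalize(pixels):
    """Scale 8-bit pixel values (0 to 255) to floats in the range 0.0 to 1.0."""
    return [[value / 255.0 for value in row] for row in pixels]

pixels = [[0, 64], [128, 255]]        # hypothetical 2x2 grayscale image
scaled = normalize(pixels)
print(scaled)
```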
Problem 5.6.1 The MNIST data set is used in DLtrain to train the NN or CNN model.
The MNIST data set has a well-defined format, which DLtrain relies on during
training. From the following items, list those that are valid for
the MNIST data set defined above:
(a) The MNIST data set includes 70,000 images.
(b) 28 × 28 is the image size used in MNIST.
How to normalize pixel values to a range between zero and one: use [44] to access
the source code and perform normalization on a given image file.
How to center pixel values both globally across channels and locally per channel:
use [44] to get the source code for global centering.
How to standardize pixel values and how to shift standardized pixel values to the
positive domain: use [44] to get the source code for global standardization.
The given example calculates the mean and standard deviation across all color
channels in the loaded image and then uses these values to standardize the pixel
values.
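Global centering and standardization can be sketched with the standard library alone. This is not the source code from [44]; it assumes the pixel values of all channels have been flattened into one list, and the shift to the positive domain shown here (clip to [-1, 1], then rescale to [0, 1]) is one common convention that is assumed for illustration.

```python
import statistics

def global_center(values):
    """Subtract the mean computed over all values (all channels together)."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]

def global_standardize(values):
    """Zero mean and unit standard deviation over all values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:                           # constant image: nothing to scale
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def shift_to_positive(standardized):
    """Clip to [-1, 1], then map to [0, 1] (assumed convention)."""
    return [(max(-1.0, min(1.0, v)) + 1.0) / 2.0 for v in standardized]

values = [0.0, 64.0, 128.0, 255.0]         # hypothetical flattened pixel values
centered = global_center(values)
standardized = global_standardize(values)
positive = shift_to_positive(standardized)
print(centered, standardized, positive)
```

Local (per-channel) centering follows the same pattern, with the mean computed separately for each channel instead of once for the whole image.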
Most ML-based networks use labeled data during the training and testing phases.
DL-based image classification networks also use labeled data for training and
testing. Labeling data is mostly manual, and it drives a very vibrant
industry; there are companies that provide data labeling as a service. With
computer vision, a mix of manual and partly automatic labeling is also entering
the data labeling workflow.
The volume of a data set determines the storage options. For example, a local
machine is the best option for storing a small data set, but a high-volume data
set requires an on-prem data center or a cloud data center. Moreover, different
file systems are emerging to handle the distributed nature of data storage. This
is a vibrant segment, with new inventions appearing in every financial
quarter of the business cycle.
Stored data must then be used in training DL/ML networks, and many methods and
tool sets are emerging for this. PyTorch, TensorFlow, etc. provide methods
for reading data sets, but for reading large volumes of data, vendor-specific
tool sets and services are emerging.
5.8 Audio Signal Processing 61
Load a file directly using the NumPy function loadtxt(). There are eight input
variables and one output variable (the last column). Once loaded we can split the
data set into input variables (X) and the output class variable (Y). Use the following
URL to get a source code which is useful to read CSV file. Refer to [45].
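The split into eight input variables and one output variable can be sketched as follows. This is not the code from [45]: the text uses NumPy's loadtxt(), whereas this dependency-free sketch uses the standard csv module as a stand-in, with made-up rows held in memory. With NumPy, the corresponding steps would be numpy.loadtxt(path, delimiter=",") followed by column slicing.

```python
import csv
import io

def load_csv(stream, n_inputs=8):
    """Read numeric rows and split them into inputs X and outputs Y."""
    X, Y = [], []
    for row in csv.reader(stream):
        if not row:                       # skip blank lines
            continue
        values = [float(field) for field in row]
        X.append(values[:n_inputs])       # first eight columns: input variables
        Y.append(values[n_inputs])        # last column: output class variable
    return X, Y

# Made-up rows standing in for a real file opened with open(path, newline="").
sample = io.StringIO("1,2,3,4,5,6,7,8,1\n8,7,6,5,4,3,2,1,0\n")
X, Y = load_csv(sample)
print(X, Y)
```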
Text to speech synthesis (TTS) uses deep learning networks to synthesize
high-quality speech for a given text:
1. Text is input.
2. Normalization.
3. Text preprocessing.
4. Phoneme (database of phoneme for a given word and also given language).
5. Acoustic model for given phoneme.
6. Speech waveform is output.
In the above, step 5 uses deep learning networks to synthesize speech. But other
steps from 1 to 4 provide processed data for step 5.
Speech data set creation requires Mel spectrogram computation for a given
phoneme.
Concatenation synthesis (words, syllables, diphones, or even individual phones),
statistical parametric synthesis (HMM), speech synthesis evaluation (MOS), and
speech synthesis with deep learning.
Problem 5.8.1 Let Y be input text sequence; target speech X can be derived by
where theta is the model parameter. Create a data set by using speech signal and
train deep learning network model such that trained deep learning network model is
used to estimate above X.
A neural vocoder achieves the encoding/decoding using a neural network. Examples
include GAN-based TTS and EATS (end-to-end adversarial text-to-speech by
DeepMind). EATS operates on pure text or raw phoneme sequences and produces a
raw waveform as output [46–48].
IP stream data is stored in a PCAP file. Overall, combining IP stream analysis
with deep learning can lead to more accurate and effective tools for network
analysis and security.
PCAP file-based data must be transformed into a tensor for TensorFlow and
also into a tensor for PyTorch.
Refer to file [Link] in [49] for information on the flow capture tool set
tcpdump.
Figure 5.8 shows the workflow for capturing data from the IP network.
The code in file [Link] in [50] converts a PCAP file into a CSV
file, and the CSV file is used as a data set to train the deep learning network
model.
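The PCAP to CSV step can be illustrated with a small parser for the classic PCAP layout: a 24-byte global header followed by one 16-byte record header per packet. This is a minimal sketch, not the script from [50]; it extracts only the timestamp and lengths of each packet (no protocol decoding), the function and column names are hypothetical, and a synthetic one-packet capture is built in memory so that the example runs without a capture file.

```python
import csv
import io
import struct

def pcap_to_rows(data):
    """Return (timestamp_seconds, captured_length, original_length) per packet."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"                      # little-endian capture
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"                      # big-endian capture
    else:
        raise ValueError("not a classic PCAP file")
    rows, offset = [], 24                 # the global header is 24 bytes
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        rows.append((ts_sec + ts_usec / 1e6, incl_len, orig_len))
        offset += 16 + incl_len           # skip the record header and packet bytes
    return rows

def rows_to_csv(rows):
    """Serialize the packet rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["timestamp", "captured_len", "original_len"])
    writer.writerows(rows)
    return out.getvalue()

# Synthetic capture with one 4-byte packet.
header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
packet = struct.pack("<IIII", 10, 500000, 4, 4) + b"\x00\x01\x02\x03"
rows = pcap_to_rows(header + packet)
print(rows_to_csv(rows))
```

Each CSV row can then be converted into a numeric feature vector and stacked into a tensor for TensorFlow or PyTorch.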
6.1 Insight
This section of the book primarily addresses the design and development of deep
learning models. The deep learning network is emerging as another tool set for
modeling a given physical process [51–53]. Observed data from the physical process
is used in the design and development of deep learning networks. A probability
distribution for a given data set is associated with the deep learning network
that represents that data set. A neural network is used to model a Boltzmann
machine, but training a Boltzmann machine is still an open problem. Thus, the
restricted Boltzmann machine [54] is trending as a way ahead, and it is used to
model neural networks, convolutional neural networks, etc. The restricted
Boltzmann machine uses a Bayes network, and data collection is required to
support the model parameters. The innovation of the CNN has provided a tool set
for modeling observed data. The Brooks–Iyengar algorithm [55, 56] provides methods
and apparatus to solve a special class of Boltzmann machine that is in line with
the multilayer perceptron (MLP). The design of a deep learning network uses NN,
CNN, RNN, etc. to model a network. The development of deep learning networks [57]
requires training the NN, CNN, RNN, etc. by using a data set. Finding a
probability distribution for given data is defined as a computability problem in
the sense of Kolmogorov computability [58]. Back propagation is one class of
algorithms that leads to suboptimal deep learning networks. A pre-trained deep
learning network becomes a starting point for training a network with an
additional data set. Compression (quantization of biases and weights, pruning) of
a trained deep learning network also appears to be critical for successful
deployment of the trained network in a given IoT native device or cloud native
system. The abovementioned items are discussed in this chapter, but there is
still scope to add much more detail on Kolmogorov complexity and on the use of
Pontryagin duality [59–61] to handle Kolmogorov complexity.
The main concern appears to be gathering existing data and utilizing deep learning networks
to learn a new capability. For example, the following list includes a few problems that have
good attraction in the research segment of a deep learning network model design:
Predicting the next value or kth value from the present value for a given input
sequence.
Weather forecasting: Given a sequence of observations about weather over time,
predict the expected weather tomorrow.
Stock market prediction: Given a sequence of movements of a security over time,
predict the next movement of the security.
Product recommendation: Given a sequence of past purchases of a customer,
predict the next purchase of a customer.
Predicting class label for a given input sequence. The input sequence may be
comprised of real values or discrete values.
DNA sequence classification: Given a DNA sequence of ACGT values, predict
whether the sequence codes for a coding or noncoding region.
Anomaly detection: Given a sequence of observations, predict whether the
sequence is anomalous or not.
6.3 Data and Probability Model 65
Generating a new output sequence that has the same general characteristics as other
sequences in the corpus.
Text generation: Given a corpus of text, such as the works of Shakespeare,
generate new sentences or paragraphs of text that read like Shakespeare.
Handwriting prediction: Given a corpus of handwriting examples, generate
handwriting for new phrases that have the properties of handwriting in the corpus.
Music generation: Given a corpus of examples of music, generate new musical
pieces that have the properties of the corpus.
Image caption generation: Given an image as input, generate a sequence of words
that describe the image.
Stochastic models predict the output of an event by providing different choices (of
values of a random variable) and the probability of those choices.
If a distribution has unknown (not yet inferred) parameters, then it leads to a
family of distributions. Each value of the parameter gives a different
distribution. This family is called a statistical model with parametrization. For
example, the Bernoulli, binomial, and exponential families are classes of
statistical models.
The term “probability model” (probabilistic model) is usually an alias for a
stochastic model [51]. Figure 6.1 provides a link between observed data and the
associated model.
1. Providing different choices (of values of a random variable)
2. Probability of those choices
Probability mass function is a function that gives the probability that a discrete
random variable is exactly equal to some value. A probability mass function differs
from a probability density function (PDF) in that the latter is associated with
continuous rather than discrete random variables.
The probability that the random variable Y takes the value y, given that the
random variable X took the value x, is the ratio of the probability that both
events occur (Y takes the value y and X takes the value x) to the probability
that X takes the value x:
$$P(Y = y \mid X = x) = \frac{P(Y = y, X = x)}{P(X = x)} \tag{6.2}$$
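The ratio in Eq. (6.2) can be checked numerically on a small joint distribution. The table below is hypothetical, and exact fractions are used so that the arithmetic can be followed by hand.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X = x, Y = y) over two binary variables,
# keyed by (x, y).
joint = {
    (0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

def marginal_x(x):
    """P(X = x), obtained by summing the joint over all values of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(y, x):
    """P(Y = y | X = x) = P(Y = y, X = x) / P(X = x)."""
    return joint[(x, y)] / marginal_x(x)

print(conditional_y_given_x(1, 1))   # 4/10 divided by 6/10
```

For a fixed x, the conditional probabilities over all values of y sum to one, which is a quick sanity check on any such table.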
$$S = \{x_1, x_2, \ldots, x_r\}, \qquad x_i \in [0, 1] \tag{6.3}$$
Posterior, in this context, means after taking into account the relevant evidence
related to the particular case being examined. The posterior probability distribution
is the probability distribution of an unknown quantity, treated as a random variable,
conditional on the evidence obtained from an experiment or survey.
Figure 6.2 provides details on increasing complexity in probability models on the
right side and on the left side it provides details on models for inference. “k” trials
are required to obtain the first success in geometric distribution.
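The statement about the geometric distribution corresponds to the probability $(1 - p)^{k-1} p$ that the first success occurs on trial k, when each trial succeeds independently with probability p. A minimal sketch, with a hypothetical value of p:

```python
def geometric_pmf(k, p):
    """P(K = k) = (1 - p)**(k - 1) * p: first success on trial k."""
    if k < 1:
        raise ValueError("k counts trials and starts at 1")
    return (1 - p) ** (k - 1) * p

p = 0.25                                   # hypothetical per-trial success probability
probabilities = [geometric_pmf(k, p) for k in range(1, 6)]
print(probabilities)
```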
[Fig. 6.2 labels: Bernoulli trial; binomial experiment; geometric distribution; Poisson trials; Bayesian network (also known as a Bayes network, belief network, or decision network); restricted Boltzmann machine; Boltzmann machine]
Bayesian networks are directed acyclic graphs whose nodes represent variables in
the Bayesian sense; they may be observable quantities, latent variables, unknown
parameters, or hypotheses [52]. Edges represent conditional dependencies; nodes
that are not connected (no path connects one node to another) represent variables
that are conditionally independent of each other. Each node is associated with a
probability function that takes, as input, a particular set of values for the node’s
parent variables and gives (as output) a probability distribution.
A Bayesian network is thus a probabilistic graphical model that represents a set of
variables and their conditional dependencies via a directed acyclic graph. The
Boltzmann distribution is a probability distribution that gives the probability of a
state as a function of the state’s energy and the temperature of the system [53].
Gibbs sampling is applicable when the joint distribution is not known explicitly
or is difficult to sample from directly, but the conditional distribution of each
variable is known. The Gibbs sampling algorithm [62] generates an instance from
the distribution of each variable in turn, conditional on the current values of the
other variables. Gibbs sampling is particularly well adapted to sampling the posterior
distribution of a Bayesian network, since Bayesian networks are typically specified
as a collection of conditional distributions.
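The alternating draws described above can be illustrated on a small case where both
conditionals are known in closed form. The following is a minimal sketch, assuming a
standard bivariate normal target with correlation rho; the target, the function names,
and the burn-in length are illustrative assumptions and do not come from the book.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is a one-dimensional normal:
        x | y ~ N(rho * y, 1 - rho**2)
        y | x ~ N(rho * x, 1 - rho**2)
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # standard deviation of each conditional
    x, y = 0.0, 0.0                  # arbitrary starting point
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)   # draw x given the current y
        y = rng.gauss(rho * x, sd)   # draw y given the new x
        if step >= burn_in:
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Empirical correlation coefficient of a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cov = sum((x - mx) * (y - my) for x, y in samples) / n
    vx = sum((x - mx) ** 2 for x, _ in samples) / n
    vy = sum((y - my) ** 2 for _, y in samples) / n
    return cov / math.sqrt(vx * vy)


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(round(sample_correlation(samples), 2))
```

The sampler never evaluates the joint density; it only draws from the two conditionals,
and the empirical correlation of the draws should approach the chosen rho.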
Maxwell–Boltzmann Statistics
The original derivation in 1860 by James Clerk Maxwell was an argument based on
molecular collisions of the kinetic theory of gases as well as certain symmetries in
the speed distribution function.
Maxwell also gave an early argument that these molecular collisions entail a
tendency towards equilibrium. After Maxwell, Ludwig Boltzmann in 1872 also
derived the distribution on mechanical grounds and argued that gases should over
time tend towards this distribution, due to collisions.
Boltzmann later (1877) derived the distribution again under the framework of
statistical thermodynamics. The starting point is the result known as
Maxwell–Boltzmann statistics, which gives the average number of particles found in
a given single-particle microstate. Under certain assumptions, the negative logarithm
of the fraction of particles in a given microstate is proportional to the ratio of the
energy of that state to the temperature of the system:
\[ -\log\left(\frac{N_i}{N}\right) \propto \frac{E_i}{T} \tag{6.4} \]
The assumptions in this equation are that the particles do not interact and that
they are classical.
Each particle’s state can be considered independently from the other particles’
states. Additionally, the particles are assumed to be in thermal equilibrium. This
6.4 Boltzmann Distribution 69
where
1. $N_i$ is the expected number of particles in the single-particle microstate i
2. N is the total number of particles in the system
3. $E_i$ is the energy of microstate i
[Figure: unsupervised DL models: Boltzmann machine, self-organising maps (SOMs),
autoencoders]
\[ p_i = \frac{e^{-E_i / kT}}{\sum_{j=1}^{N} e^{-E_j / kT}} \tag{6.6} \]
In a sensor network, the term T in Eq. (6.6) is associated with the noise generation
term in a measurement process. If T is 0, then the measurement is clean (which may
not be true in the real world of sensing).
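The effect of T in Eq. (6.6) can be checked numerically. The sketch below is an
illustration only: the three energy values, the choice k = 1, and the function name
are assumptions made for the example.

```python
import math


def boltzmann_probabilities(energies, temperature, k=1.0):
    """Return p_i = exp(-E_i / kT) / sum_j exp(-E_j / kT)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shift by the minimum energy for numerical stability; the shift
    # cancels between numerator and denominator.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / (k * temperature)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


energies = [0.0, 1.0, 2.0]
for T in (10.0, 1.0, 0.1):
    probs = boltzmann_probabilities(energies, T)
    print(T, [round(p, 3) for p in probs])
```

At high T the probabilities are close to uniform (a noisy measurement in the sensor
reading of T), while as T approaches 0 almost all probability moves to the lowest
energy state.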
A probabilistic graphical model represents a set of variables and their conditional
dependencies via a directed acyclic graph (DAG). Bayesian networks are directed
acyclic graphs (DAGs) whose nodes represent variables in the Bayesian sense: they
may be observable quantities, latent variables, unknown parameters, or hypotheses.
Edges represent conditional dependencies; nodes that are not connected (no path
connects one node to another) represent variables that are conditionally independent
of each other. Each node is associated with a probability function that takes, as
input, a particular set of values for the node’s parent variables and gives (as output)
the probability (or probability distribution, if applicable) of the variable represented
by the node. Gibbs sampling is applicable when the joint distribution is not known
explicitly or is difficult to sample from directly, but the conditional distribution of
each variable is known and is easy (or at least, easier) to sample from. The Gibbs
sampling algorithm generates an instance from the distribution of each variable in
turn, conditional on the current values of the other variables.
Gibbs sampling (Fig. 6.5) is particularly well adapted to sampling the posterior
distribution of a Bayesian network, since Bayesian networks are typically specified
as a collection of conditional distributions. Given an input vector v, we use
p(h|v) to predict the hidden values h. Knowing the hidden values, we use p(v|h) to
predict new input values v. This process is repeated k times. After k iterations we
obtain another input vector $v_k$, which was recreated from the original input values
$v_0$ by sampling from the specified multivariate probability distribution. In
Eq. (6.6), $p_i$ is the probability of state i of the system, $E_i$ is the energy of
that state, and N is the number of sensors in a given sensor network.
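The alternation between p(h|v) and p(v|h) can be sketched for a small restricted
Boltzmann machine. The text does not give the form of these conditionals, so the sketch
assumes the common choice of binary units with sigmoid conditionals; the weights, biases,
and function names are arbitrary illustrative values, not a trained model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, c, rng):
    """Sample h ~ p(h | v); W[i][j] couples visible unit i and hidden unit j."""
    h = []
    for j in range(len(c)):
        activation = c[j] + sum(v[i] * W[i][j] for i in range(len(v)))
        h.append(1 if rng.random() < sigmoid(activation) else 0)
    return h


def sample_visible(h, W, b, rng):
    """Sample v ~ p(v | h)."""
    v = []
    for i in range(len(b)):
        activation = b[i] + sum(W[i][j] * h[j] for j in range(len(h)))
        v.append(1 if rng.random() < sigmoid(activation) else 0)
    return v


def gibbs_reconstruct(v0, W, b, c, k, seed=0):
    """Run k alternating Gibbs steps v -> h -> v and return v_k."""
    rng = random.Random(seed)
    v = list(v0)
    for _ in range(k):
        h = sample_hidden(v, W, c, rng)
        v = sample_visible(h, W, b, rng)
    return v


# Four visible and three hidden units; the weights are arbitrary (assumption).
W = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5],
     [-0.6, 0.1, 0.4],
     [0.2, -0.3, 0.7]]
b = [0.0, 0.0, 0.0, 0.0]   # visible biases
c = [0.0, 0.0, 0.0]        # hidden biases
v0 = [1, 0, 1, 0]
print(gibbs_reconstruct(v0, W, b, c, k=3))
```

Because each layer is conditionally independent given the other, a whole layer can be
sampled in one pass, which is what makes this block form of Gibbs sampling convenient
for restricted Boltzmann machines.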
Particles governed by Maxwell–Boltzmann statistics have to be distinguishable from
each other, and one energy state can be occupied by two or more particles.
Reconstruction is different from regression or classification: it estimates the
probability distribution of the original input instead of associating a continuous
or discrete value with an input example.
Fig. 6.6 Training Boltzmann machine (visible nodes v1 to v4, hidden nodes h1 to h3;
input time, output time, Δt)
Fig. 6.7 Inference by using Boltzmann machine (nodes v1 to v4 with energies E1 to E4;
output energy E = E1 + E2 + E3 + E4)
Step 2 (output from Boltzmann machine): Measure each state that supports
measurable conditions. The maximum possible energy of the above system
makes it possible to quantize the energy levels.
The output energy E in Fig. 6.7 is the sum of the four energies from the nodes v.
Each energy level corresponds to a possible state of each node. The hidden nodes
make no direct contribution to E, but they make an indirect contribution, which
needs to be estimated by using a model of a dynamical system [51].
The input vector v is used to find p(h|v) and so predict the hidden values h. Knowing
the hidden values, p(v|h) is used to predict new input values v. This process
is repeated k times. After k iterations, the input vector $v_k$ is recreated from the
original input value $v_0$. Target states include all possible states of the sensor.
A distribution is associated with a model, as shown in Fig. 6.8. For a given
input data set, finding a distribution is equivalent to finding a model.
74 6 Model of Deep Learning Networks
The Boltzmann machine reduces to the Hopfield model. Figure 6.9 provides details on
this relationship between the Boltzmann machine and the Hopfield network. The
energy level of the Boltzmann network is a function of temperature: if the temperature
is high, the energy of the network is also high. If T = 0, the Boltzmann network
reaches an energy level that is in equilibrium (the energy level need not be zero).
In a sense, at T = 0 the Boltzmann network becomes a deterministic network. In
particular, it becomes a Hopfield network, because the Hopfield network has a
Lyapunov function, which can be considered a constraint (as it comes from energy).
In the case of an MLP, there is no Lyapunov function and thus no such constraint.
The BI algorithm is closer to multilayered perceptrons (MLPs), because the BI
algorithm is deterministic and does not have a Lyapunov function.
To train an MLP network, the back propagation algorithm is used.
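The deterministic limit at T = 0 described above can be seen in the update rule of a
single stochastic unit. The book does not state this rule; the sketch assumes the
commonly used logistic form, in which a unit turns on with probability
1 / (1 + exp(-gap / T)) for a given energy gap. The function name and the sample
values are illustrative.

```python
import math


def prob_unit_on(energy_gap, temperature):
    """Probability that a unit turns on, given its energy gap and temperature T.

    As T approaches 0 the logistic curve becomes a hard threshold,
    which is the deterministic Hopfield-style update.
    """
    if temperature == 0:
        # Deterministic limit: on if the gap is positive, off if negative.
        if energy_gap > 0:
            return 1.0
        if energy_gap < 0:
            return 0.0
        return 0.5
    z = energy_gap / temperature
    # Evaluate the logistic function without overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


for T in (5.0, 1.0, 0.1, 0.0):
    print(T, round(prob_unit_on(0.5, T), 4), round(prob_unit_on(-0.5, T), 4))
```

At high T the unit behaves almost like a coin flip regardless of the gap; as T falls,
the same gap decides the state with near certainty.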
The BI algorithm is similar to back propagation in how it arrives at convergence in
node values. The Brooks–Iyengar algorithm performs the following: every node is
given its own input and also the average values from other nodes (averaged over T).
The nodes jointly provide a deterministic output. From the above, it is clear that no
Lyapunov function or temperature is used in BI, and thus, BI is a special case of
Hopfield network where
6.7 Kolmogorov Complexity for a Given Data 75
Kolmogorov complexity has its roots in probability theory [58], information theory,
and philosophical notions of randomness. The idea is intimately related to problems
in both probability theory and information theory. The Kolmogorov complexity of an
object is the length of the shortest binary program from which the object can be
effectively reconstructed.
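Kolmogorov complexity itself cannot be computed in general, but the length of a
compressed encoding gives a computable upper-bound proxy that shows the idea. The
sketch below uses Python's standard zlib module for this purpose; treating compressed
length as a stand-in for complexity is an assumption of the example, not the book's
method.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude upper-bound proxy)."""
    return len(zlib.compress(data, 9))


regular = b"ab" * 500                                   # highly regular, 1000 bytes
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(1000))  # pseudo-random, 1000 bytes

print(compressed_length(regular), compressed_length(noisy))
```

The regular sequence compresses to a small fraction of its length, while the noisy one
does not compress at all. Note the limit of the proxy: the noisy bytes come from a
short seeded program, so their true Kolmogorov complexity is low even though a
general-purpose compressor cannot detect that regularity.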
Combining concepts of computability and statistics, we can express the complex-
ity of a finite object in terms of Kolmogorov complexity. Kolmogorov complexity
represents the length of the shortest computer program (algorithm) that can produce
the object as output. This complexity measure takes into account both the computa-
tional aspects (computability) and the statistical aspects (probability) of describing
the object. In essence, it quantifies the minimum amount of information needed to
generate the object using a universal Turing machine or a similar computational
model. It may be called the algorithmic information content of a given object: What
is the shortest binary representation of a program from which a parameter can be
reconstructed by using N − r sensors, where N is the number of sensors used in a
sensor network and r is the number of faulty sensors? Output s is observed from
Turing machine T, where p is a program in T that outputs s, and $K_T(s)$ is used
to detect regularities in given sensor data in order to find new information from
a given sensor. For example, a function $f : \mathbb{N}^k \to \mathbb{R}$ is computable
if and only if there is an effective procedure that, given any k-tuple x of natural
numbers, will produce the value f(x). In agreement with this definition, computable
functions take finitely many natural numbers as arguments and produce a value which
is a single natural number.
Construction of the dual space of a given sensor network G is the first step in getting
the shortest binary representation of a program. In Sect. 6.2, there are illustrations
that provide steps to construct the dual space. The definition of the function f is key
in the construction of the dual space, but this definition needs to have physical
relevance to the measurement process, which uses N − r good sensors. To measure this,
first the dual space of G is constructed, and then the Kolmogorov complexity of G is
estimated. The method mentioned above for measuring the entropy of a given G indirectly
relies on the Kolmogorov complexity of G. It employs a well-defined result from
Pontryagin duality and utilizes the Kolmogorov complexity outcome as part of its
approach.
Problem 6.8.1 The NN or CNN model is used in deep learning networks. Optimal
model design requires many items to be considered in arriving at parameter values.
In the following list, locate the items that are used in deep learning model design:
(a) Kernel size option is given to the user.
(b) Number-of-layers option is given to the user.
(c) For each layer, the user can provide the number of neurons.
(d) The user can provide a label on each image file.
(e) The user can drop some of the connections between layers.
(f) All of the above are true except items (d) and (e).
(g) All of the above are true except item (e).
This is the point where restricted Boltzmann machines meet physics for the
second time. The joint distribution is known in physics as the Boltzmann distribution,
which gives the probability that a particle can be observed in the state with the
energy E. As in physics, we assign a probability to observing a state of v and h, which
depends on the overall energy of the model. Unfortunately, it is very difficult to
calculate the joint probability due to the huge number of possible combinations
of v and h in the partition function Z. Much easier is the calculation A.3 of the
conditional probabilities of state h given the state v and of state v given the
state h. The essential idea here is energy-based probability.
Global energy
\[ E = \sum_{i<j} w_{ij} s_i s_j + \sum_i \theta_i s_i \tag{6.7} \]
with $s_1 = v_1$, $s_2 = v_2$, $s_3 = v_3$, $s_4 = v_4$, $s_5 = h_1$, $s_6 = h_2$,
$s_7 = h_3$.
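The global energy of Eq. (6.7) is a sum over unit pairs plus a bias term, which is
short to write out. The sketch below follows the signs as printed in Eq. (6.7) (many
references place a leading minus sign in front of both sums); the three-unit state,
weights, and biases are made-up values chosen so the result can be checked by hand.

```python
def global_energy(s, w, theta):
    """E = sum_{i<j} w[i][j] * s[i] * s[j] + sum_i theta[i] * s[i]."""
    n = len(s)
    pair_term = sum(w[i][j] * s[i] * s[j]
                    for i in range(n) for j in range(i + 1, n))
    bias_term = sum(theta[i] * s[i] for i in range(n))
    return pair_term + bias_term


# Three units only, to keep the arithmetic easy to check by hand.
s = [1, 0, 1]
w = [[0.0, 0.5, -1.0],
     [0.5, 0.0, 2.0],
     [-1.0, 2.0, 0.0]]
theta = [0.1, 0.2, 0.3]
print(global_energy(s, w, theta))
```

Only pairs in which both units are on contribute to the first sum, so here the single
active pair (units 1 and 3) and the two active biases determine the energy.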
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton created a “large, deep
convolutional neural network” (CNN) that was used to win the 2012 ILSVRC
(ImageNet Large-Scale Visual Recognition Challenge).
DL network developers focus on designing models with a reduced number of
parameters in the CNN model, thus reducing memory and execution latency while
aiming to preserve high accuracy.
“One of the most interesting features of machine learning is that it is on the
boundary of several different academic disciplines, principally computer science,
statistics, mathematics, and engineering. Machine learning is usually studied as part
of artificial intelligence, which puts it firmly into computer science. Understanding
why these algorithms work requires a certain amount of statistical and mathematical
sophistication that is often missing from computer science undergraduates.” It
appears that the convolutional neural network is a new yet proven tool for modeling
a given physical process, as long as that process can be captured in the form of
images or video.
Problem 6.8.2 The error of deep learning network model-based image classification
is lower than that of a human being or of ML-based image classification methods.
List the items in the following that are useful in the mentioned reduction in error:
(a) Deep learning network model training methods use CNN.
(b) Deep learning network model training methods use NN.
(c) Deep learning network model training methods use CPU + GPU for
training.
(d) Deep learning network model training methods do not require a feature vector.
(e) Deep learning network model training methods use too many kernel filters to
learn the feature vector.
The Brooks–Iyengar algorithm [55] is very similar to an MLP [56, 64]. This is shown
as a flowchart in Fig. 6.11.
Each sensor [65] has an energy level in a given time period. The energy levels of
the other sensors are expected to lie in a similar range; a large difference in
energy level from sensor to sensor is not expected.
Sensor fusion using the BI algorithm (Fig. 6.11) uses a processing element (PE) to
compute the accuracy range and the estimate of the measured value. Let sensor j
record k samples over a duration of t sec, and let sensor j also receive measured
values from the other sensors in a given network.
The PE of a given sensor j uses:
1. The k samples recorded (0 to t sec) in sensor j
2. Measured values from the other sensors, from 1 to N but not sensor j
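The fusion step performed by each PE can be sketched as follows. This is a minimal
illustration of the interval-fusion rule usually given for the Brooks–Iyengar
algorithm, assuming each sensor reading has already been turned into an interval
(low, high) and that at most tau sensors are faulty. How a PE derives its interval
from the k recorded samples is not shown, and the flowchart of Fig. 6.11 may differ in
detail; the function name and readings are hypothetical.

```python
def brooks_iyengar(intervals, tau):
    """Fuse sensor intervals, tolerating up to tau faulty sensors.

    intervals: list of (low, high) readings, one per sensor.
    Returns (estimate, (bound_low, bound_high)).
    """
    n = len(intervals)
    # Split the line at every interval endpoint.
    points = sorted({p for lo, hi in intervals for p in (lo, hi)})
    kept = []  # (left, right, number of intervals covering the region)
    for left, right in zip(points, points[1:]):
        count = sum(1 for lo, hi in intervals if lo <= left and hi >= right)
        if count >= n - tau:
            kept.append((left, right, count))
    if not kept:
        raise ValueError("no region is covered by at least n - tau intervals")
    # Weighted average of region midpoints, weighted by coverage count.
    total_weight = sum(count for _, _, count in kept)
    estimate = sum(count * (left + right) / 2.0
                   for left, right, count in kept) / total_weight
    return estimate, (kept[0][0], kept[-1][1])


readings = [(2.7, 6.7), (0.0, 3.2), (1.5, 4.5), (0.8, 2.8), (1.4, 4.6)]
estimate, bounds = brooks_iyengar(readings, tau=1)
print(round(estimate, 3), bounds)
```

Regions supported by fewer than N − tau sensors are discarded, so a single faulty
interval cannot pull the estimate outside the range agreed on by the remaining sensors.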
6.9 Brooks–Iyengar Algorithm for Binary Classification 79
But there is increasing interest in whether the biological brain follows back
propagation or, as Hinton asks, whether it has some other way (instead of back
propagation) of getting the gradients needed to adjust the weights on its connections.
A pre-trained model is a deep learning model that has already been trained on a
large data set and saved. The saved model can be used as a starting point for training
new models, or it can be used directly for making predictions on new data [66–71].
Pre-trained models have become popular in deep learning due to their ability
to save time and computational resources. Instead of training a new model from
scratch, developers can use a pre-trained model as a starting point and fine-tune it
on their own data set. This can be especially useful when the data set is small or
when computational resources are limited.
Pre-trained models are often trained on large and diverse data sets, such as
ImageNet for image classification or large text corpora for natural language
processing models such as BERT. These models are usually trained using deep learning
architectures such as convolutional neural networks (CNNs), recurrent neural networks
(RNNs), or transformers.
Using pre-trained models has many benefits, such as the following:
Reduced training time: Using a pre-trained model can significantly reduce the
time required to train a new model from scratch.
Improved accuracy: Pre-trained models are often trained on large and diverse
data sets, which can improve the accuracy of the model.
Transfer learning: Pre-trained models can be used for transfer learning, where
the model is fine-tuned on a smaller data set for a specific task, such as object
recognition.
Overall, pre-trained models have become an important tool in the deep learning
toolbox, allowing developers to leverage existing models and data sets to solve new
problems more efficiently.
Post-training quantization reduces computing power demand and energy con-
sumption at the expense of a slight loss in accuracy.
With sophisticated pre-training objectives and a huge number of model parameters,
large-scale pre-trained models (PTMs) can effectively capture knowledge from massive
labeled and unlabeled data. This knowledge is stored in the parameters, and
fine-tuning on specific tasks then makes precise inference possible. The rich
knowledge implicitly encoded in the parameters can benefit a variety of tasks in
industries such as agriculture, healthcare, transport, food, and education. In the
recent past, this has been extensively demonstrated via experimental verification and
empirical analysis.
Pre-trained models and scripts translate effort into results sooner than a
“do it yourself” approach. Large-scale pre-trained models (PTMs) such as BERT and
GPT are used in cloud native deployments, but these models are still not very
popular in embedded devices. Recently, the use
6.11 Compression of DL Networks 81
of pre-trained models has achieved great success and become a milestone in the
field of artificial intelligence (AI) for enterprise business owners.
It is now the consensus of the deep learning community to adopt PTMs as the
backbone for well-defined tasks rather than to develop learning models from scratch.
The deployment of pre-trained models is discussed, with examples, in the tutorial
sessions.
Model compression allows the user to run the model on tiny devices and there are
two main ways to reduce the network:
1. Lower precision (fewer bits per weight). By default, the model weights are
float32-type variables, which lead to two problems: Firstly, the model is very
large because 4 bytes are associated with each weight, with a considerable
memory requirement; secondly, the execution is remarkably slow compared to
uint8-type variables. It is possible to considerably reduce the weights from 32
bits to 8 bits, obtaining a 4x reduction in the size of the NN. TensorFlow and
Keras give the possibility to apply quantization.
2. Fewer weights. Besides pruning, which is discussed below, this can be achieved by
creating a smaller DNN that imitates the behavior of a larger DL network (an
approach often called knowledge distillation). The smaller network is trained on
the output predictions produced by the larger one, so that it approximates the
function learned by the larger one.
Note that post-training quantization is a technique that is carried out after
training the model, but quantization could be done even before training. DLtrain can
be used efficiently to model the number of bits required for weights in a given CNN
or NN.
As stated above, the model size can be reduced not only by quantization but also by
pruning techniques, which eliminate connections that are not useful to the NN. This
decreases the computational demand and the program memory required. Quantization
and pruning approaches have been considered individually as well as jointly.
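Both reductions can be shown on a short list of weights. The sketch below is a
plain-Python illustration, assuming a simple min/max affine mapping to 8-bit codes and
magnitude-based pruning; it is not the TensorFlow, Keras, or DLtrain implementation,
and all function names and weight values are hypothetical.

```python
def quantize_uint8(weights):
    """Map float weights to integers in 0..255; return (codes, scale, offset)."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # all weights equal; any positive scale works
    codes = [min(255, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, offset):
    """Recover approximate float weights from 8-bit codes."""
    return [offset + q * scale for q in codes]


def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [-0.82, -0.05, 0.01, 0.33, 0.47, -0.29, 0.90, 0.02]
codes, scale, offset = quantize_uint8(weights)
restored = dequantize(codes, scale, offset)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, round(max_error, 4))
print(prune_by_magnitude(weights, 0.5))
```

Each code fits in one byte instead of the four bytes of a float32 weight, which is the
4x size reduction mentioned above, and the rounding error per weight stays within half
a quantization step. Pruning and quantization are independent, so the pruned list
could be quantized in the same way.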
Chapter 7
Training of Deep Learning Networks
Insight
This section of the book details training deep learning networks.
PyTorch is one of the popular AI frameworks to model deep learning networks and
also train, test, and deploy deep learning networks.
TensorFlow is another AI framework that has similar functionalities to PyTorch.
These two AI frameworks provide low-code options for developers. However, major
limitations arise from their dependency on many other open-source Python packages.
DLtrain overcomes these issues and provides a clean AI framework that can be
classified in the no-code category. DLtrain is developed in C++, and it is easy to
port to many platforms across silicon vendors. Moreover, DLtrain is GPU-friendly
and can be revised for large-scale CUDA Core machines such as the DGX Station A100
or higher versions. Further, it also demonstrates how to create, build, and
configure Docker images of DLtrain for large-scale CNN models, including the
support services [72].
experts to learn quickly and take good control of training of a deep learning model
with training data set.
An image classification problem is solved by using deep learning networks and in
particular by using convolutional neural networks. The industrial segment appears
to be using CNN for image classification with enterprise quality in inference.
Many early-stage tool sets have fallen out of regular use, and only a few, for
example PyTorch and TensorFlow, have remained. Notably, these two rely on many
open-source packages whose use is version-specific. The industry segment looks for
a tool set which can be customized and which is free from dependencies on
open-source packages.
DLtrain is a platform designed to work to train NN and CNN models by using
image signals as a training data set. DLtrain is created by using the “nvcc” tool set
from CUDA 10.2 (NVIDIA) and DLtrain is tested in IBM Power AC922 processor
with GeForce RTX 2070 GPU hardware. DLtrain works well for a given training
image data set and performs high-speed classification of a given image during
inference.
DLtrain, as shown in Fig. 7.1, is designed to remove most of the issues caused by the
many open-source package dependencies needed to support PyTorch and TensorFlow.
DLtrain provides a good solution to developers to perform training of deep learning
networks, test, and deploy given NN and CNN models. Deployment can be on the
cloud native side and also in edge native devices such as IoT nodes and IoT edges.
In Fig. 7.1, the left side shows a dependency list for the WML (Watson Machine
Learning) tool set to train deep learning networks.
The right side of the above picture shows the clean nature of DLtrain on its zero
dependency from open-source tool sets. DLtrain is capable of training CNN and NN
models of deep learning networks. In the case of WML, it appears that there are 100
plus dependencies to have successful installation of WML. But on the right side,
DLtrain does not have any dependency.
CUDA SDK is required for DLtrain to use NVIDIA CUDA Core and Tensor
Cores in GPUs such as V100, A100, etc. The enterprise application development
team can use CUDA Core and Tensor Core computing as part of their customized
tool set to train deep learning networks.
Watson Machine Learning Accelerator gives access to power-optimized versions
of all of the most popular deep learning frameworks currently available, including
TensorFlow, Caffe, and PyTorch. Watson Machine Learning Accelerator runs on
IBM Power accelerated HPC servers, a platform that runs not only deep learning
training workloads but also a wide variety of HPC and high-performance data
analytic workloads. It leverages unique capabilities of accelerated Power
servers, delivering performance unattainable on commodity servers. For example,
large model support facilitates the use of system memory without a performance
impact on the POWER9 CPU, allowing significantly larger and more accurate deep
learning models. The Watson Machine Learning Community Edition (WML CE) is
delivered as a set of software packages that can deploy a functioning deep learning
environment, potentially within a few hours, by using a few simple commands.
The DLtrain framework is ported on to POWER9 CPU with Ubuntu 22.04 OS.
DLtrain enables enterprise and academic researchers with ease of training their deep
learning network models such as NN and CNN. Most importantly, they can follow
the no coding path while using DLtrain to train deep learning network models.
Moreover, DLtrain does not use any third-party library, and thus, it is fully secured
and safe for enterprise and academic researchers to use DLtrain to run their AI
workloads.
DLtrain provides an inference engine which can be deployed in IoT edges.
Currently, we are witnessing a proliferation of specialized hardware that not only
offers better performance on deep learning tasks, but also increased efficiency
(performance per watt). The AI community’s demand for GPUs led to Google’s
development of TPUs and pushed the entire chip market towards more specialized
products. In the next few years, there will be a vendor list that includes NVIDIA,
Intel, SambaNova, Mythic, Graphcore, Cerebras, and other companies that bring
more focus to hardware for deep learning network-based training and inference
workloads.
“Bring Your Data on to Your Table” to perform training of the deep learning
model and also to deploy for your enterprise. In this process, the data set stays
within the customer premise and also it provides high security to the customer data
set.
DLtrain is used to train deep learning models such as NN and CNN by using a
computing infrastructure that is available on your table. DLtrain provides a quick
solution for the abovementioned by using OpenPOWER/IBM Power Systems that
form a basis for “computing infra on your table.”
As previously mentioned, the computing infrastructure setup will be completed
in just a few hours. Subsequently, the development team can seamlessly deploy the
deep learning network model training workload onto this infrastructure. This entire
process is carried out adhering to the highest engineering standards, ensuring that
there are no external dependencies on obscure or untraceable software components
from open-source origins. DLtrain is developed by using C and C++ such that it
can run best on the CPUs of various silicon vendors.
Most importantly, effort is given to make DLtrain very useful to subject matter
experts (domain knowledge holders) to bring their best via their own custom model
without doing a single line of coding.
DLtrain also makes it possible to move the model trained above to an Android phone,
so that large-scale deployment is feasible. After the trained model has been moved
to an Android phone, the application uses the phone camera or local files to get the
input image and performs inference locally on the phone. There is no need to connect
the camera to the cloud for inference.
DLtrain is ported onto various silicons and the following provide more details.
DLtrain
DLtrain is ported onto many CPU and GPU combinations. For example:
1. Ported DLtrain to work in X86 with Ubuntu and also Windows 10 OS.
Tested DLtrain with training of CNN model by using the MNIST data set.
2. Ported DLtrain to work in OpenPOWER Raptor system (POWER9 CPU).
Tested DLtrain with training of a CNN model by using the MNIST data
set.
3. Ported DLtrain to work in the OpenPOWER Raptor system (POWER9
CPU) and RTX 2070 GPU. Tested DLtrain with training of the CNN model
by using the MNIST data set.
4. Ported DLtrain to work in X86 with Windows OS. Tested DLtrain with
inference workload by using a trained CNN model and an input image
from the local machine.
5. Ported DLtrain to work in Jetson Series SOMs (for example, Nano). Tested
DLtrain with inference workload by using a trained CNN model and an
input image from the local machine.
6. Deployed a trained CNN model in an Android phone and successfully
performed inference on a given local image.
Data set preprocessing is one of the most important tasks. In Fig. 7.2, the prepro-
cessing flow is provided for DLtrain, TensorFlow, and PyTorch.
Input to TensorFlow and PyTorch is Tensors; most importantly, input to Tensor-
Flow and PyTorch is not Numpy Arrays. Added to that, input Tensor to TensorFlow
is very different from input Tensor to PyTorch.
A considerable amount of data copying and conversion effort is required to convert
data from a given file into input for TensorFlow and PyTorch. In fact, this is
highly challenging for huge data sets.
DLtrain takes the input file name of the data set. It reads the data from the file
(for example, an image) and copies it to an input array, which is used directly by
the next module to train the deep learning network model.
DLtrain is highly efficient in reducing the movement of data from memory to
memory. DLtrain is good for large-scale models and also for huge data sets.
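Reading a data file straight into an input array, with no intermediate tensor
conversion, can be illustrated with the IDX image format used by the MNIST files.
The sketch below is written in Python for illustration and is an assumption about how
such a reader could look; it is not the DLtrain reader, whose C++ source is the one
referenced in [77]. The function name and the tiny in-memory file are hypothetical.

```python
import struct


def read_idx_images(raw: bytes):
    """Parse an IDX3 image file (the MNIST format) into flat float lists in [0, 1]."""
    # Header: four big-endian 32-bit integers (magic, count, rows, cols).
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError("not an IDX3 unsigned-byte image file")
    size = rows * cols
    if len(raw) < 16 + count * size:
        raise ValueError("file is shorter than its header claims")
    images = []
    for n in range(count):
        start = 16 + n * size
        images.append([pixel / 255.0 for pixel in raw[start:start + size]])
    return images, rows, cols


# Build a tiny in-memory file with two 2 x 2 images instead of reading from disk.
raw = struct.pack(">IIII", 2051, 2, 2, 2) + bytes([0, 255, 51, 102, 255, 255, 0, 0])
images, rows, cols = read_idx_images(raw)
print(rows, cols, images)
```

For a real data set, the bytes would come from opening the file in binary mode; the
point of the sketch is that the pixel values go directly from the file into the array
that training consumes.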
Developers are required to design their own custom model in the form of a CNN or
in the form of an NN.
For example, the NN model requires:
1. The number of neurons in the input layer
2. The number of hidden layers
3. The number of neurons in each hidden layer
4. The number of neurons in the output layer
5. The kernel size
6. The number of kernels
The abovementioned parameters can be stored in a named txt file. Designing a CNN
model or NN model requires no coding. The above given reference provides sample
values for the listed parameters.
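The layer sizes listed above fix the number of trainable parameters of a fully
connected NN, which in turn determines its memory footprint. The sketch below works
this out for a hypothetical set of layer sizes; the sizes are illustrative and do not
come from a DLtrain configuration file.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network.

    layer_sizes lists the neurons per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per neuron
    return total


# Hypothetical model: 784 inputs (28 x 28 image), two hidden layers, 10 outputs.
layer_sizes = [784, 128, 64, 10]
n_params = count_parameters(layer_sizes)
print(n_params, "parameters")
print(n_params * 4, "bytes as float32;", n_params, "bytes as uint8")
```

Most of the parameters sit between the input layer and the first hidden layer, so the
size of that first hidden layer is the choice with the largest effect on model size.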
For example,
-c network_config.txt
is a file name and the same file has information about CNN.
1. The reference code is provided in [74] and the same is used to train the CNN and
NN network.
2. The reference code is provided in [75] and the same is used to train the CNN
network.
The reference code is provided in [76] and the same is used to save the trained CNN
or NN networks in a file. Section 2 of [76] handles saving for the DLtrain framework
and Section 1 of [76] handles saving for the TensorFlow framework.
The source code for DLtrain is provided in [77]. [Link] is also given in
[77].
Developers are required to use the “cmake” tool set to build a Makefile. After the
successful creation of the Makefile, developers are required to use “make” to create
an executable version of DLtrain for POWER9 servers. These two steps, cmake and
make, are shown in [77].
The MNIST data set is used. The version of DLtrain given in [77] handles the MNIST
data set efficiently and uses it in training the CNN.
Hyperparameters are available for developers to choose the optimal value for a
given parameter. For example, the following parameters are available for developers:
7.4 Docker Image of DLtrain for X86 with Ubuntu 91
    ./DLtrain -m train -s NewNetwork.dat -c networkProf.txt -n 2000 -e 30 -d /home/jk/Images/
    ./DLtrain -m infer -s NewNetwork.dat -c networkProp.txt -f img.raw
where
1. -c is input: the name of the file that holds the parameters of the model.
2. -d is input: the data set folder path.
3. -m is input: the mode, given as a string (train or infer).
5. -f is the name of the input file which is used for inference.
Developers are required to refer to [77] for more information on inference work.
Docker is an open-source container engine and a set of tools to compose, build, ship,
and run distributed applications.
The reference code is provided for the following [78].
A drawback with this multi-platform support is that one Docker image has to be
built for each specific target platform:
1. A specific operating system
2. Hardware architecture (x86, ppc64el, arm, CUDA Core, Tensor Core, DSP, etc.)
Developers are required to create two Docker images, one for Linux and one
for Windows. Developers are required to create each Docker image using a Docker
engine running on the specific target platform.
A few commands (to manage Docker) are provided in the above reference. These
commands are useful for managing the workflow to create the DLtrain Docker image
and to use that image to train the CNN network and to perform inference on a given
input by using the trained CNN.
DLtrain is built for Windows machines and is available for use in the following
GitHub reference.
The reference code is provided in [79].
There is an issue with the runtime library on Windows machines. Steps are given in
the above reference to obtain the missing library so that DLtrain runs successfully
on a Windows machine. The LibGCC library causes the issue, which is resolved by
downloading the two missing files and keeping them in the path or project folder.
DLtrain executable (for Windows OS with X86) is used to train NN or CNN
models.
The data set is placed in the path or project folder.
Developers are required to model in a file, for example, “Network._prop.txt” is a
file which can be used as input.
Output is stored as [Link] and the same file includes parameters of .W
and b.
DLtrain executable (for Windows OS with X86) is used to train the NN or CNN
model by using a data set in the path or project folder. Figure 7.4 provides the
necessary workflow to use DLtrain in Windows machine.
Fully Connected
Random network model to small-world model.
Step 1. Assume that the given CNN is a random network before training
starts (an NN can be used in place of the CNN).
Step 2. After training, there is a high probability that the CNN tends towards
a small-world network.
Steps 1 and 2 indicate that a random network becomes a small-world network
through training. Running step 1 to step 2 with parallel computing is
challenging.
Suppose instead that you start with small-world networks (many small networks
that together make up a given big network) and provide all input to each small
network during training.
Fully Connected
Random network model to small-world model.
R-Step 1. Train small-world networks with all or most of the given input.
Perform for all small-world networks.
R-Step 2. Combine all small-world models to obtain a big model which can
handle all given inputs and provide inference for defined labels.
The revised steps appear to be well suited to parallel computing and still
represent a large model. This can be verified or worked out independently.
However, the key challenge of training large models by using distributed deep
learning networks remains open.
Small-world networks map directly to an influence matrix. Issues with the
influence matrix are not known a priori. A coarse computation of the influence
matrix may be a useful starting point for a small-world network and its training
on a parallel computing infrastructure.
7.7 Train NN and CNN Models in TensorFlow
Using TensorFlow requires a particular tool chain on a given computer and
compatible versions of open-source software. Detailed instructions are provided
on the GitHub page, and developers can use them (or vary them if necessary) to set
up a working tool chain for TensorFlow.
The reference code is provided in [80].
A virtual environment is recommended for each project. In some sense it is a
lightweight counterpart of a container, the environment in which a Docker image
runs. Because each project has its own virtual environment, different projects can
use different versions of the same packages.
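As a minimal sketch of the per-project virtual environment described above, the
standard-library venv module can create one programmatically. The folder name .venv
and the helper create_project_env are assumptions made for illustration; they are
not part of the referenced GitHub instructions.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated virtual environment under project_dir/.venv."""
    env_dir = project_dir / ".venv"  # hypothetical folder name
    # with_pip=False keeps this sketch fast and offline. Use with_pip=True
    # when packages such as TensorFlow will be installed into the environment.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


# Example: build an environment inside a throwaway project folder.
with tempfile.TemporaryDirectory() as tmp:
    env = create_project_env(Path(tmp))
    created = (env / "pyvenv.cfg").exists()
```

Each project folder then carries its own interpreter configuration, so two projects
can pin different TensorFlow versions without interfering with each other.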
TensorFlow 2.0 or above is recommended for new developers. If developers have to
support an old model (from a version before TF 2.0), it is good practice to
convert the old TF model into a TF 2.0 model. Keras is tightly integrated with
TensorFlow from TF 2.0 onwards. Keras is the layer above TensorFlow, and it makes
it easy for developers to train an NN or CNN model.
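The following is a minimal sketch of training a small CNN on MNIST through the Keras
API bundled with TensorFlow 2.x. The layer sizes, the single epoch, and the function
name build_and_train are assumptions chosen for brevity; they are not taken from the
reference code in [80] or [81]. TensorFlow is imported inside the function so that
the module loads even where TensorFlow is not installed, and the first call downloads
the MNIST data set.

```python
def build_and_train(epochs: int = 1):
    """Train a small CNN on MNIST with Keras; return (model, test_accuracy)."""
    # Imported lazily so this module can be loaded without TensorFlow.
    import tensorflow as tf

    # MNIST ships with Keras: 60,000 training and 10,000 test images (28x28).
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

    # Scale pixels to [0, 1] and add the channel axis expected by Conv2D.
    x_train = (x_train / 255.0)[..., None]
    x_test = (x_test / 255.0)[..., None]

    # Illustrative architecture (an assumption, not the book's model).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Hold out 10% of the training images for validation during training.
    model.fit(x_train, y_train, epochs=epochs, validation_split=0.1)
    _, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
    return model, test_accuracy
```

Calling build_and_train(epochs=1) in a TensorFlow 2.x environment returns the trained
model together with its accuracy on the held-out test images.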
Jupyter Notebook, along with JupyterLab, is recommended and helps developers when
debugging during the application development process. All of these tools work well
with Python 3.6 or above.
The MNIST data set has handwritten images of the digits 0 to 9, with a large
number of images for each digit. It has 60,000 training images and 10,000 test
images.
The reference code is provided in [81].
Developers can use the MNIST data set during the early stage of a project
and then move on to the custom data set of the project. However, the size of the
custom data set has to be decided in terms of the number of images per label, the
number of pixels per image, the image width, and the image height. These parameters
require critical review because they contribute to the quality of inference in a
given application.
The above example uses the MNIST data set locally, but it can be downloaded
from multiple URL locations. Details are shared in the above reference link.
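When MNIST is used from local files, it is commonly distributed in the IDX binary
format. The sketch below parses IDX image and label bytes with the standard library
only. The function names are hypothetical, and the tiny in-memory sample stands in
for the real files so that the example runs without a download.

```python
import struct


def parse_idx_images(raw: bytes):
    """Return (count, rows, cols, images) from IDX3 unsigned-byte image data."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:  # 0x00000803: unsigned bytes, three dimensions
        raise ValueError(f"not an IDX3 image file (magic={magic})")
    size = rows * cols
    images = [raw[16 + i * size: 16 + (i + 1) * size] for i in range(count)]
    return count, rows, cols, images


def parse_idx_labels(raw: bytes):
    """Return the list of integer labels from IDX1 unsigned-byte label data."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:  # 0x00000801: unsigned bytes, one dimension
        raise ValueError(f"not an IDX1 label file (magic={magic})")
    return list(raw[8:8 + count])


# Synthetic sample: two 2x2 "images" and their labels (not real MNIST data).
fake_images = struct.pack(">IIII", 2051, 2, 2, 2) + bytes([0, 255, 10, 20, 1, 2, 3, 4])
fake_labels = struct.pack(">II", 2049, 2) + bytes([7, 3])

count, rows, cols, images = parse_idx_images(fake_images)
labels = parse_idx_labels(fake_labels)
```

For the real data set, rows and cols are both 28, so each image is a 784-byte record
that can be reshaped into the Tensor used for training.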
Developers are required to partition the data set into two parts. The first part is
used to train the NN or CNN model. The second part is used to test the trained NN or
CNN model.
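The partition into a training part and a test part can be sketched as follows. The
20% test fraction, the fixed seed, and the file names are assumptions for
illustration; a project should choose its own split.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle samples reproducibly and return (train_part, test_part)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (file name, label) pairs standing in for a real data set.
samples = [(f"img_{i}.raw", i % 10) for i in range(100)]
train_set, test_set = split_dataset(samples, test_fraction=0.2)
```

Shuffling before splitting keeps every label represented in both parts when the
source files are ordered by label.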
For inference, developers can use a real-time image or an image from a file with
the trained NN or CNN model.
Problem 7.7.1 Training an NN or CNN model with the MNIST data set follows
a well-defined process. For example, the data set is imported from a file and
transformed into a Tensor, in particular a Tensor which can be used as an input to
train the model via TensorFlow. List the items from the following that apply to the
successful use of the MNIST data set to train an NN or CNN model with TensorFlow:
(a) CPU is used in training workload.
(b) GPU is used in training workload.
(c) The number of epochs is set to the number which is above 100.
(d) The number of layers is taken from the “model configuration file” and used in
constructing the CNN model for training.
(e) Item (a) is always true.
(f) Item (b) may be true sometimes.
(g) Item (c) may be correct. If so, what number of epochs should be set?
Colab provides an option to load the data set from various sources. Cloud
Annotations focuses on the data set creation aspect of the model development life
cycle and leaves the training part to other tool sets. For example, TensorFlow in
Colab can be used to train a CNN.
There are many ways to train NN and CNN models, each with their own use
cases and trade-offs. Developers can train from scratch using a framework like
TensorFlow or PyTorch.
Use references [82] and [83] to get the source code and sample examples for using
Colab to train deep learning networks.
The source code for DLtrain is provided in [84]. [Link] is also given in
[84].
Developers are required to use the “cmake” tool set to generate a Makefile. After
the Makefile is created successfully, developers are required to use “make” to build
an executable version of DLtrain for POWER9 servers. These two steps, cmake and
make, are shown in Section 3 of [84].
7.9.2 Jetson Nano Series SOM
The MNIST data set is used. The version of DLtrain given in [84] handles the MNIST
data set efficiently and makes use of it in training the CNN. Developers are
required to refer to Section 4 of [84] to get more details on the “use of DLtrain to
train CNN by using Jetson Nano.”
Hyperparameters are available for developers to choose the optimal value for a
given parameter. For example, the following parameters are available for developers:
    ./DLtrain -m train -s NewNetwork.dat -c networkProf.txt -n 2000 -e 30 -d /home/jk/Images/
The inference workload is run in Jetson Nano SOM, as given in the following.
Developers are required to refer to [84] for more information on inference work.
    ./DLtrain -m infer -s NewNetwork.dat -c networkProp.txt -f img.raw
Where
1. -c is input. It is the name of the file which has the parameters of the model.
2. -d is input. It is the path of the data set folder.
3. -m is input. It selects the mode (this can have train or infer as a string).
4. -s is output. It is the file name in which the trained model is saved.
5. -f is the name of the input file which is used for inference.
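The two invocations above can also be assembled from a script. The sketch below only
builds the argument lists and does not launch DLtrain. The flags -m, -s, -c, -d and
-f follow the list above; -n and -e appear in the training command but are not
described in the text, so they are passed through unchanged as extra arguments. The
helper names are hypothetical.

```python
import shlex


def dltrain_train_command(model_out, config, data_dir, extra_args=()):
    """Build the argument list for a DLtrain training run."""
    return ["./DLtrain", "-m", "train", "-s", model_out, "-c", config,
            *extra_args, "-d", data_dir]


def dltrain_infer_command(model_file, config, input_file):
    """Build the argument list for a DLtrain inference run."""
    return ["./DLtrain", "-m", "infer", "-s", model_file, "-c", config,
            "-f", input_file]


train_cmd = dltrain_train_command(
    "NewNetwork.dat", "networkProf.txt", "/home/jk/Images/",
    extra_args=("-n", "2000", "-e", "30"),  # copied verbatim from the text
)
infer_cmd = dltrain_infer_command("NewNetwork.dat", "networkProp.txt", "img.raw")

# subprocess.run(train_cmd, check=True) would start training where the
# DLtrain executable is present in the current folder.
train_line = shlex.join(train_cmd)
infer_line = shlex.join(infer_cmd)
```

Keeping the arguments in a list avoids shell quoting problems when the data set path
contains spaces.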
Chapter 8
Deployment of Deep Learning Networks
8.1 Insight
In recent years, embedded systems have been gaining popularity in the AI field. As
the AI and deep learning revolution moves from software to hardware, embedded
systems are now equipped with plug-in SOMs (System-on-Modules) that incorporate
essential components such as processors, memory, power supply, and external
interfaces. Since an embedded system is dedicated to specific tasks, design
engineers can optimize it for a given workflow, reduce the size and cost of the
product, and enhance reliability and performance. Embedded systems are commonly
found in consumer, cooking, industrial, automotive, medical, commercial, and
military applications.
The surge in Internet use and data has led to advanced deep learning systems, and
hence the book also presents techniques for the Internet of Things (IoT) in
association with deep learning networks. This chapter discusses the computing
infrastructure that sits at the edge of a network. More importantly, it covers the
deployment of deep learning networks on IoT edge devices and the benefits of doing
so. The core areas addressed here are how to reduce latency, enhance security, and
communicate with less bandwidth by deploying deep learning networks at the edge.
Further, the chapter details how to set up, install, compile, run, test, and deploy
on different IoT edge devices. Through this chapter readers also learn about event
data collection, flow data collection, vulnerability assessment, network analysis,
packet inspection, Android deployment diagnosis, and neural data communication with
Android services.
At a higher level, the chapter presents how to set up and run the IBM Watson Visual
Recognition service on an Android device, along with the associated visual
recognition application services. Deep learning network model pruning and
optimization, the joint probability weight quantizer, and edge compilers are also
discussed. The chapter includes case studies on agriculture connected to IoT and
deep learning networks.
8.2 Description
Deep learning networks and the Internet of Things (IoT) edge are interconnected in
various ways [85–88]. IoT edge refers to the computing infrastructure that sits at the
edge of a network, close to the devices that generate data. These devices can include
sensors, cameras, and other data sources that produce massive amounts of data.
Deep learning networks can be deployed on IoT edge devices to process and
analyze this data in real time, allowing for faster decision-making and more efficient
use of resources. This is especially important in applications such as smart cities,
autonomous vehicles, and industrial automation, where real-time processing and
decision-making are critical.
The deployment of deep learning networks on IoT edge devices has several
advantages, including the following:
1. Reduced latency: By processing data at the edge, deep learning networks can
reduce the latency associated with sending data to a remote data center or cloud.
This is important in applications where real-time processing is critical, such as
in autonomous vehicles.
2. Improved security: By processing data at the edge, deep learning networks can
help reduce the risk of data breaches and ensure that sensitive data is kept secure.
3. Reduced bandwidth: By processing data at the edge, deep learning networks can
help reduce the amount of data that needs to be transmitted to the cloud, reducing
bandwidth requirements and associated costs.
Overall, the connection between deep learning networks and IoT edge is
crucial in enabling real-time decision-making and efficient use of resources in IoT
applications.
Intelligence in the IoT edge plays a critical role in services that require
real-time inferencing. Historically, there have been systems with a high degree of
engineering complexity in both deployment and operation. For example, SCADA is
one such system that has been working in the power generation industry, the oil and
gas industry, cement factories, etc. In fact, SCADA keeps humans in the loop, which
is what makes it supervisory control and data acquisition.
With the advent of deep learning and its success in the modern digital domain,
there has been a huge amount of interest among researchers in carrying deep learning
models to the abovementioned industrial verticals to bring about intelligent control
and data acquisition. It appears that an intelligent IoT edge is emerging to perform
the tasks that human supervisors handle today. Thus, there is immense interest in
making the IoT edge an intelligent system in these core engineering verticals, apart
from consumer industry requirements.
CNN is one particular class of deep learning networks. After a CNN is trained, it
has to be deployed on a machine so that inference can be performed on a given set
of input data. Inference work can be image classification, object detection, or
sequence-to-sequence translation. Edge devices might have any one of the following
combinations to perform computation:
1. CPU
2. CPU+GPU
3. CPU+FPGA
4. CPU+GPU+FPGA
Variables in IoT edge are shown in Fig. 8.1. Trained DL networks might be
modified to fit in a given computing capability of IoT edge.
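One common way to fit a trained network into a smaller computing budget is to store
its floating point weights as 8-bit integers. The sketch below shows symmetric linear
quantization with a single scale factor. It is an illustrative assumption and not the
joint probability weight quantizer discussed later in the chapter; the weight values
are made up.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] plus one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


# Hypothetical weights of one layer.
weights = [0.50, -1.27, 0.0, 0.013, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now needs one byte instead of four, at the cost of a bounded rounding
error that should be checked against inference accuracy before deployment.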
An emerging trend shows that “embedded devices” also have a GPU along with a
multicore CPU. Some OEM devices appear to include an FPGA as well. Thus, the
challenge is to port trained deep learning networks onto embedded devices and
run inference service applications.
Deployment of “trained CNN model in X86 machine” requires many items for
successful completion. The list given in the following has items that might be
essential to complete CNN deployment in X86 (CPU) machines.
Problem 8.2.1 Identify the necessary items in the following list for successfully
porting a CNN model onto an X86 CPU processor. Also provide the reason for selecting
each item as an essential part of deploying the CNN on an X86 CPU.
(a) Ubuntu 22.04 OS-based devices.
(b) Python is not required in deployment devices.
(c) TensorFlow is not required in deployment devices.
(a) The CNN or NN model is in Python and it is not ready to be deployed in IoT
edge.
(b) The CNN or NN model is in TensorFlow and it is not ready to deploy in IoT
edge.
(c) The trained CNN model might have floating point weights and bias coefficients.
(d) The trained CNN or NN models might have too many neurons and embedded
devices might not have resources for all neurons.
(e) IoT edge technology is very different from deep learning-based inference
technology.
(f) All the above are true.
(g) Other.
DLtrain is designed to support custom models by using NN and CNN. Figure 8.2
provides a detailed workflow to deploy an NN in the IoT edge. DLtrain is used to
train a model (CNN or NN) with training data and to validate the trained model
before it is deployed in the IoT edge.
In the case of deployment, there is a huge interest in using smartphones as the IoT
edge so that the same device can be used without much investment during the
learning time of each learner. However, industrial deployment is expected to happen
on devices such as Jetson Series GPUs, Zynq UltraScale+ FPGAs, mmWave radar, etc.
The DLtrain inference source code is written in C and C++ and is open to developers
for further value addition.
Problem 8.2.4 Provide your thought process to find a method and apparatus to port
DL networks in an embedded device by using DLtrain.
A REST API is one method by which a client can communicate with the inference
engine that performs inference on given input data. The input data can come from
client applications written in different languages.
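A minimal sketch of such a REST interface, using only the Python standard library, is
given below. The /infer path, the JSON field names, and the placeholder infer function
(which simply returns the index of the largest input value) are assumptions for
illustration; a real deployment would call the trained NN or CNN instead.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def infer(values):
    """Placeholder for the inference engine: index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/infer":  # hypothetical endpoint name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"label": infer(payload["input"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def call_inference(port, values):
    """Client side: POST the input as JSON and return the decoded reply."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("POST", "/infer", body=json.dumps({"input": values}),
                 headers={"Content-Type": "application/json"})
    reply = json.loads(conn.getresponse().read())
    conn.close()
    return reply


server = HTTPServer(("127.0.0.1", 0), InferenceHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = call_inference(server.server_address[1], [0.1, 0.7, 0.2])
server.shutdown()
server.server_close()
```

Because the exchange is plain HTTP with JSON, a client written in Java, C++, or any
other language can use the same endpoint.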
Intelligence in the IoT edge plays a critical role in services that require
real-time inferencing. Historically, such systems have had a high degree of
engineering complexity in both deployment and operation. For example, SCADA is one
such system that has been working in power generation, the oil and gas industry,
cement factories, etc. In fact, SCADA keeps humans in the loop, which is what makes
it supervisory control and data acquisition.
Figure 8.3 compares healthcare support workers with the use of Watson in healthcare.
During a pandemic, each bed has to be managed and supporting healthcare staff are
needed. The following appears to be a valid concern among the healthcare team.
At the forefront of the battle are healthcare workers who have been struggling to cope both
physically and mentally.
The advent of deep learning and its success in the modern digital domain have
ignited a huge amount of interest among researchers in carrying deep learning models
to the abovementioned industrial verticals to bring about intelligent control and
data acquisition.
It appears that intelligent IoT edges are emerging to perform the tasks that human
supervisors handle today.
Thus, there is immense interest in making the IoT edge an intelligent system in
these core engineering verticals, apart from consumer industry requirements. The
Kalman filter has been in use for more than 50 years. Moreover, it provides instant
prediction from local measurement data. Figure A.4 provides the workflow in the IoT
edge to perform inference by using a CPU along with a GPU.
1. Create an NN model.
2. Training data (mostly use MNIST).
3. Train an NN model.
4. Validate trained NN model.
5. Go for deployment.
Deep learning is good and it will outperform results that are obtained by
using Kalman filter
In the case of deployment, there is a huge interest in making a smartphone as IoT
edge such that the same device can be used without much investment during pilot
deployment time. However, industrial deployment is expected to happen in devices
like Jetson Nano, Ultra96-V2, etc.
Problem 8.2.5 Deployment of a trained CNN model in X86 machine has a well-
defined workflow for successful deployment. List items in the following such that
they are used in successful deployment:
(a) Ubuntu 18.04 OS is used in a deployment machine which is X86.
(b) Python is not required in a deployment machine which is X86.
(c) TensorFlow is not required in a deployment machine which is X86.
(d) FPGA (via PCI add-on) is not required in a deployment machine which is X86.
(e) GPU (via PCI add-on) is not required in a deployment machine which is X86.
(f) Item (b) and item (c) are always true.
(g) Item (e) may be true sometimes.
items required to form the following for successful completion of deployment of the
abovementioned model in IoT edge?
(a) The CNN model in Python is not ready to be deployed in IoT edge.
(b) The CNN model in TensorFlow is not ready to be deployed in IoT edge.
(c) The trained CNN model might have floating point weights.
(d) The trained CNN model might have too many neurons.
(e) IoT edge technology is very different from deep learning-based inference
technology.
(f) All the above are true.
[Figure labels recovered: IP packets; assistant design (intents, entities capture, dialog); flow pipeline (flow sources, flow aggregation, flow direction, application identification, deduplication, superflows); view flow data on the Network Activity tab; tuning false positive events from creating offenses; VLAN fields; configuring a flow collector.]
Event data collection requires a packet sniffing tool set to collect IP packets in real
time. Events are generated by log sources such as firewalls, routers, servers, and
intrusion detection systems (IDS) or intrusion prevention systems (IPS).
Flows provide information about network traffic and can be sent to the IoT edge in
various formats, including the following:
1. Flow log files
2. NetFlow
3. JFlow
4. sFlow
5. Packeteer
Flow data provides the flow arrival time, the common destination port, and the
RFC 1700 well-known ports 0-1023.
TAP devices provide a way to access the data flowing across a computer network,
typically for the benefit of network security and performance monitoring tools. The
monitored traffic is referred to as the “pass through” traffic and the ports used for
monitoring are called “monitor ports.”
8.4 Deploying DL Networks in Kanshi
For greater visibility into the network, a TAP can be placed between the router
and the switch. Port mirroring, also known as SPAN or roving analysis, is a method
of monitoring network traffic that forwards a copy of each incoming and/or outgoing
packet from one or more ports (or VLANs) of a switch to another port where the
network traffic analyzer is connected. SPAN is often used on simpler systems to
monitor multiple stations at once. Captured traffic can be filtered by using the
following protocol qualifiers:
arp, ether, fddi, icmp, ip, ip6, link, ppp, radio, rarp, slip, tcp, tr, udp, wlan
In network communication, a packet typically consists of two main parts: the
header and the payload. Here’s a breakdown of the information typically found in
each:
Packet Header:
Source and Destination Addresses: The header contains information about the
source and destination IP addresses or MAC addresses, depending on the layer of
the network protocol stack (e.g., IP addresses in the network layer, MAC addresses
in the data link layer).
Packet Length: The total length of the packet, including both the header and the
payload.
Packet Sequence Number: In some cases, there may be sequence numbers to
ensure packets arrive in the correct order.
Error Checking Information: Checksums or CRC (Cyclic Redundancy Check)
values are included in the header to verify the integrity of the packet.
Protocol Information: Indicates the type of data carried in the payload (e.g., TCP,
UDP, ICMP).
Time-to-Live (TTL) or Hop Limit: Prevents packets from circulating indefinitely
by decrementing on each hop in the network.
Packet Payload:
Data: The actual content of the packet, which can include application data,
messages, or any other information being transmitted.
To investigate the information in the header and payload, various technologies and
tools can be used:
Packet Sniffers/Analyzers: Tools like Wireshark, tcpdump, or Microsoft Network
Monitor can capture and analyze network packets, providing detailed insights into
both headers and payloads.
Protocol Analyzers: These specialized tools focus on specific network protocols,
making it easier to dissect and understand the information contained in the header
and payload of those protocols.
Deep Packet Inspection (DPI) Systems: DPI systems go beyond basic packet
analysis to inspect the content of packets for security, quality of service, or traffic
shaping purposes. They can analyze and classify payloads based on
application-layer content.
Network Monitoring and Intrusion Detection Systems (IDS/IPS): These systems
can inspect packet headers and payloads for patterns that may indicate network
intrusions or malicious activities.
Custom Software: Depending on your needs, you can develop custom software
to parse and analyze packet headers and payloads, especially when working with
proprietary or custom protocols.
Remember that investigating the payload content may require additional knowledge
and tools specific to the application or protocol being used in the communication.
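As an instance of the custom software option, the header fields listed above can be
read from an IPv4 packet with the standard struct module. This is a minimal sketch
for headers without special handling of options; the sample packet and the helper
names are made up for illustration, and the checksum follows the one's-complement
method of RFC 1071.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(packet: bytes) -> dict:
    """Extract addresses, length, protocol, TTL and checksum status."""
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4
    return {
        "version": ver_ihl >> 4,
        "header_length": header_len,
        "total_length": total_len,
        "ttl": ttl,
        "protocol": {1: "ICMP", 6: "TCP", 17: "UDP"}.get(proto, str(proto)),
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        # A valid header sums to zero when its own checksum is included.
        "checksum_ok": internet_checksum(packet[:header_len]) == 0,
        "payload": packet[header_len:total_len],
    }


def build_sample_packet(payload: bytes = b"hello") -> bytes:
    """Build a made-up IPv4 packet with a correct header checksum."""
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 1, 0,
                         64, 6, 0, socket.inet_aton("192.168.1.10"),
                         socket.inet_aton("10.0.0.5"))
    checksum = internet_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:] + payload


info = parse_ipv4_header(build_sample_packet())
```

Here the payload is everything after the IP header; for a real TCP or UDP packet it
would begin with the transport-layer header rather than application data.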
Inspection at the flow level might require network packet appliances that capture
up to 10 Gbps. Deployment engineers should know which information is available in
the packet header and in the payload, and which technologies to use to investigate
each of them.
IoT edge analyzes TCP/IP traffic flow data for applications, flow direction, and
superflows. Deployment engineers also learn how to build an IoT edge flow rule and
how to perform flow searches in IoT edge.
SSH on a nonstandard port might indicate an issue. The header carries no extra
information about the issue, but the payload might.
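A small sketch of this idea follows. An SSH endpoint opens the connection with an
identification string that starts with "SSH-" (RFC 4253), so the payload reveals SSH
even when the port number does not. The observations and the function name are
hypothetical.

```python
def is_ssh_on_nonstandard_port(port: int, payload: bytes,
                               standard_port: int = 22) -> bool:
    """Flag traffic whose payload starts with an SSH banner off port 22."""
    return port != standard_port and payload.startswith(b"SSH-")


# Hypothetical (server port, first payload bytes) observations.
observations = [
    (22, b"SSH-2.0-OpenSSH_8.9\r\n"),    # SSH on its standard port
    (2222, b"SSH-2.0-OpenSSH_8.9\r\n"),  # SSH on a nonstandard port
    (8080, b"GET / HTTP/1.1\r\n"),       # not SSH at all
]
flagged = [port for port, payload in observations
           if is_ssh_on_nonstandard_port(port, payload)]
```

A header-only rule keyed on port 22 would miss the second observation, which is why
payload inspection is needed here.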
IoT edge collects network activity information, referred to as “flow records.”
Flows represent network activity by normalizing IP addresses, ports, byte and packet
counts, and other details into “flows,” each of which effectively represents a
session between two hosts. QRadar can collect different types of flows, which differ
greatly in the collected details. The following list provides IoT edge products
available in the market to handle flow collection.
1. Cisco NetFlow
2. QRadar QFlow
3. QRadar Network Insights (QNI)
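The normalization of packets into flow records can be sketched as below. Packets are
grouped by protocol and by the unordered pair of endpoints, so both directions of a
session land in one flow. This is an illustrative assumption about flow keys and is
not the record format of NetFlow or QRadar; the packet tuples are made up.

```python
def aggregate_flows(packets):
    """Group packets into bidirectional flows with packet and byte counts."""
    flows = {}
    for ts, src, sport, dst, dport, proto, length in packets:
        # Sort the two endpoints so A->B and B->A share one key.
        end_a, end_b = sorted([(src, sport), (dst, dport)])
        key = (proto, end_a, end_b)
        record = flows.setdefault(
            key, {"packets": 0, "bytes": 0, "first_seen": ts, "last_seen": ts})
        record["packets"] += 1
        record["bytes"] += length
        record["first_seen"] = min(record["first_seen"], ts)
        record["last_seen"] = max(record["last_seen"], ts)
    return flows


# Hypothetical packets: (time, src, src port, dst, dst port, protocol, bytes).
packets = [
    (0.00, "10.0.0.1", 50000, "10.0.0.2", 22, "tcp", 60),
    (0.01, "10.0.0.2", 22, "10.0.0.1", 50000, "tcp", 80),
    (0.50, "10.0.0.3", 53000, "10.0.0.2", 53, "udp", 70),
]
flows = aggregate_flows(packets)
ssh_flow = flows[("tcp", ("10.0.0.1", 50000), ("10.0.0.2", 22))]
```

The resulting records carry far less data than the packets themselves, which is what
makes flow collection practical at high link speeds.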
Packet analysis uses deep learning networks to perform real-time inference and
obtain vulnerability assessment (VA) information. An IBM Cloud account provides
access to Watson Assistant, which is used in the development of Kanshi to perform
security audits in IP networks. IoT edge can also import VA information from various
third-party scanners.
IoT edge network insight appliances connect to network TAPs, SPAN, or mirror
ports to access full packet data for real-time analysis. Mostly, IoT edge network
insight appliances provide a detailed analysis of network flows to extend the threat
detection capabilities of network insight appliances.
IoT edge network insight appliance provides a detailed analysis of network flows
to extend the threat detection capabilities of IBM QRadar. CPU and GPU are used
in IoT edge to obtain real-time performance. More secure operating systems run on
Red Hat Enterprise Linux® version 7.9.
Berkeley Packet Filters (BPFs) provide a powerful tool for intrusion detection
analysis. Use BPF filtering to quickly reduce large packet captures to a reduced set
of results by filtering based on a specific type of traffic. Both admin and non-admin
users can create BPF filters. Build complex filter expressions by using modifiers and
operators to combine protocols with primitive BPF filters.
Cyber Physical Systems Increasingly Under Threat from “n00bs”
Throughout 2021, we observed low sophistication threat actors learn that they could
create big impacts in the operational technology (OT) space—perhaps even bigger than
they intended. Actors will continue to explore the OT space in 2022 and increasingly
use ransomware in their attacks. This targeting will occur because of the need to keep
OT environments fully operational, especially when the systems are part of critical
infrastructure. Attacks against critical OT environments can cause serious disruption and
even threaten human lives, thereby increasing the pressure for organizations to pay a
ransom. To compound the issue, many of these OT devices are not built with security at
the forefront of the design, and we’re currently seeing a massive uptick in the number of
vulnerabilities being identified in OT environments. Reference: Mandiant report,
14 Cyber Security Predictions for 2022 and Beyond.
IP stream analysis and deep learning are two fields that can be combined to create
powerful tools for analyzing network traffic and detecting anomalous behavior.
IP stream analysis involves capturing and analyzing network traffic to identify
patterns and trends, detect security threats, and troubleshoot network issues. This
can involve analyzing data at the packet level, looking at protocol headers, or
examining flow records.
Deep learning, on the other hand, is a subset of machine learning that uses arti-
ficial neural networks with multiple layers to model and solve complex problems.
It involves training the neural network on large data sets to learn patterns and make
predictions or classifications.
By combining IP stream analysis with deep learning, it is possible to create
sophisticated tools for detecting anomalies and security threats in network traffic.
For example, deep learning models can be trained on large data sets of normal
network traffic to learn patterns of behavior. These models can then be used to detect
deviations from normal behavior, which could indicate a security threat.
One example of this is using deep learning to detect distributed denial-of-service
(DDoS) attacks. By analyzing network traffic and training a deep learning model
to identify patterns of normal traffic, the model can be used to detect when traffic
patterns deviate from the norm. This can help to detect and mitigate DDoS attacks
in real time.
Overall, combining IP stream analysis with deep learning can lead to more
accurate and effective tools for network analysis and security.
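The principle of learning normal traffic and flagging deviations can be illustrated
without a neural network. The sketch below swaps in a simple statistical baseline
(mean and standard deviation of packets per second) in place of a trained deep
learning model, purely to show the detect-deviation step; the traffic rates and the
threshold of three standard deviations are made-up assumptions.

```python
import statistics


def fit_baseline(normal_rates):
    """Learn mean and standard deviation from rates seen in normal traffic."""
    return statistics.mean(normal_rates), statistics.stdev(normal_rates)


def is_anomalous(rate, baseline, threshold=3.0):
    """Flag a rate that lies more than threshold deviations from the mean."""
    mean, std = baseline
    if std == 0:
        return rate != mean
    return abs(rate - mean) > threshold * std


# Hypothetical packets-per-second samples from normal operation.
normal = [980, 1010, 995, 1005, 1020, 990, 1000, 1015]
baseline = fit_baseline(normal)

# New observations; the middle one imitates a traffic flood.
alerts = [rate for rate in [1003, 12000, 998] if is_anomalous(rate, baseline)]
```

A deep learning model plays the same role as fit_baseline here, but it can capture
patterns across many features at once instead of a single rate.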
Scapy is a powerful Python-based interactive packet manipulation program and
library that allows a user to manipulate packets on networks.
Figure 8.9 provides the workflow to obtain a data set from a PCAP file.
Write a program that uses malicious pcap files as data sets and predicts whether
other pcap files have malicious packets in them.
1. Download two pcap files and concatenate them to extract packet_timestamp and
packet_data.
2. Preprocess packet_data, add labels to it, and create a training data set.
3. Create a testing data set; if it is in a file, then zip them to pcap files.
4. A data set of (feature, label) pairs is all that is needed from the above steps.
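The steps above can be sketched with the standard library alone. The reader below
handles the classic pcap layout (a 24-byte global header followed by 16-byte record
headers) and is a stand-in for Scapy, not a use of its API; pcapng files are not
handled. The 32-byte feature width, the labels, and the in-memory sample captures are
assumptions for illustration.

```python
import struct


def read_pcap(raw: bytes):
    """Return a list of (timestamp_seconds, packet_bytes) from pcap bytes."""
    magic = struct.unpack("<I", raw[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"  # file written in little-endian byte order
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file written in big-endian byte order
    else:
        raise ValueError("not a classic pcap file")
    records, offset = [], 24  # skip the global header
    while offset + 16 <= len(raw):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", raw[offset:offset + 16])
        offset += 16
        records.append((ts_sec + ts_usec / 1e6, raw[offset:offset + incl_len]))
        offset += incl_len
    return records


def to_example(packet: bytes, label: int, width: int = 32):
    """Turn packet bytes into a fixed-length feature vector plus a label."""
    padded = packet[:width].ljust(width, b"\x00")
    return [b / 255.0 for b in padded], label


def build_pcap(packets):
    """Build pcap bytes in memory so the sketch runs without capture files."""
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for ts, data in packets:
        sec = int(ts)
        usec = int(round((ts - sec) * 1e6))
        out += struct.pack("<IIII", sec, usec, len(data), len(data)) + data
    return out


# Made-up captures: label 1 for malicious packets, label 0 for benign ones.
malicious = build_pcap([(1.5, b"\x90" * 40), (2.0, b"\x41\x42")])
benign = build_pcap([(3.0, b"hello")])

dataset = ([to_example(data, 1) for _, data in read_pcap(malicious)]
           + [to_example(data, 0) for _, data in read_pcap(benign)])
```

The resulting list of (feature, label) pairs can be handed to TensorFlow or DLtrain
after conversion to the input format that the chosen framework expects.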
Researchers working on computer network or cyber security often need to
analyze network traffic. In that case, they use a Wireshark Packet Analyzer or
any other similar traffic analysis tools to capture and analyze packets. However,
if you want to perform data analysis, cleaning, modeling, or feature analysis and
classification for the network traffic, you might want to convert the PCAP files into
a CSV file.
1. Wireshark is open-source, cross-platform software.
2. tcpdump is a command-line utility for Linux and other Unix-like systems.
3. Firesheep is a Firefox extension.
4. Packet sniffers can store captured packets in PCAP (packet capture) files.
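Once packets have been extracted from a PCAP file, writing them to CSV takes only the
standard csv module. The column names and the hexadecimal encoding of the packet
bytes are assumptions made for this sketch; the two records are made up.

```python
import csv
import io


def packets_to_csv(records, out):
    """Write (timestamp, packet_bytes) records as CSV rows to a text stream."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "length", "data_hex"])
    for timestamp, data in records:
        writer.writerow([f"{timestamp:.6f}", len(data), data.hex()])


# Hypothetical records, as produced by a pcap reader.
buffer = io.StringIO()
packets_to_csv([(1.5, b"\x45\x00"), (2.0, b"\xff")], buffer)
csv_text = buffer.getvalue()
```

Replacing the StringIO buffer with a file opened with open(path, "w", newline="")
writes the same rows to disk for later cleaning and feature analysis.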
[Figure labels recovered: pcap files (port scan; TCP, UDP, and ICMP network scans; DoS) feed a data set, which feeds a CNN model in TensorFlow / DLtrain.]
Android Studio is used to build inference engines as given in the following
workflow. Use file transfer functionality to copy the created APK to the Android
phone. The workflow for the same is provided in Fig. 8.12.
Windows 10 or Ubuntu 18.04 with Android Studio is used in the J7 app project,
where the latest stable version of Android Studio is [Link].
NDK 20.1.5948944 (ndk;20.1.5948944) is used to build the JNI library for the
inference engine. The full source code of the inference engine is given in [95].
Update the inference engine with the revised model. The source code of the model
update application is given in [95].
The following diagram provides information on the workflow to create a J722
application in the form of an APK.
    jk:~/Jan28$ java -jar toPhone/SndModel.jar
    Open J722 and Load
    Enter file path: j2xxxx
Use the GPU of the Android device to perform the computation in inference for a
given image as input. Many Android phones may not have a GPU; when the phone does
have one, the question is how the GPU can be used to perform the computation that
is part of inference.
Problem 8.5.1 Develop CUDA Core code for a given C, C++ source code of a J7
app inference, where a C, C++ code is working well in the CPU of an Android
phone. The inference engine source code of NN/CNN is given in [95].
Objective Port J7 app inference engine C, C++ code into CUDA programming
and use CUDA cores of GPU of a given Android phone for real-time inference.
Figure 8.15 provides details on workflow for the abovementioned application.
8.5 Deploying DL in Android Phone
Objective Share the trained deep learning network model in the host PC with the J7
app in the Android device. Assume the host PC and the Android device are connected
via a local Wi-Fi access point. Figure 8.16 provides details on the “workflow” for
sharing a CNN with the J7 application.
Design and develop a server application and run it on the host PC.
The J7 app acts as the client application on the Android phone.
Host machines (Windows or Ubuntu) use DLtrain to train an NN or CNN. Assume
that the MNIST data set is available on the host machine.
A sample application is made for the above functional requirements, and its source
code is shared in [95]. Focus on the host processor side and revise the given source
code to find better ways to transfer the trained deep learning model from the host
PC to the Android device.
Problem 8.5.2 Develop an application on the host (improving on the above work in
Mini Project 2) such that users can transfer trained deep learning network models,
such as an NN or CNN, from the host computer to the Android device. Assume that
both are connected via a TCP/IP network and are on the same subnet.
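As a minimal sketch of one way to approach this transfer, the following Python code sends a model file over a plain TCP socket using an 8-byte length prefix. The function names, the framing, and the loopback demo are illustrative assumptions and are not taken from the source code in [95]; a real host application would bind to the host's subnet address and the Android client would implement the matching receive logic.

```python
import socket
import struct
import threading


def send_model(conn: socket.socket, model_bytes: bytes) -> None:
    """Send the model as an 8-byte big-endian length followed by the payload."""
    conn.sendall(struct.pack(">Q", len(model_bytes)))
    conn.sendall(model_bytes)


def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = conn.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_model(conn: socket.socket) -> bytes:
    """Client side: read the length prefix, then the model payload."""
    (length,) = struct.unpack(">Q", recv_exact(conn, 8))
    return recv_exact(conn, length)


def serve_once(server: socket.socket, model_bytes: bytes) -> None:
    """Host side: accept one client and send it the model."""
    conn, _addr = server.accept()
    with conn:
        send_model(conn, model_bytes)


def transfer_over_loopback(model_bytes: bytes) -> bytes:
    """Demo: run host and client on 127.0.0.1 and return what the client received."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server, model_bytes))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            received = recv_model(client)
        worker.join()
    return received
```

The length prefix lets the receiver know when the whole model has arrived, which matters because a single TCP read may return only part of a large file.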
Objective Pull the trained deep learning network model into the Android device
via Wi-Fi, using an application on the Android device together with a server
application running on the host. Figure 8.17 provides a detailed workflow to pull
the trained model from the host PC.
The J7 app is a client application designed to receive a trained
NN or CNN from the host machine over Wi-Fi (local network).
Problem 8.5.3 Develop an application on the Android device that automatically
synchronizes with, and pulls a revised deep learning network from, a
GitHub server or any other server.
The source code of the J7 app is given in [95].
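One possible synchronization rule, sketched here under stated assumptions, is to compare a SHA-256 digest of the local model with a digest published by the server and to download only when they differ. The callables `fetch_remote_digest` and `fetch_remote_model` are hypothetical placeholders for whatever network calls the application uses; they are not part of the J7 app source in [95].

```python
import hashlib
from typing import Callable, Optional, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sync_model(
    local_model: Optional[bytes],
    fetch_remote_digest: Callable[[], str],
    fetch_remote_model: Callable[[], bytes],
) -> Tuple[bytes, bool]:
    """Return (model, updated). Download only when the remote digest differs."""
    remote_digest = fetch_remote_digest()
    if local_model is not None and sha256_hex(local_model) == remote_digest:
        return local_model, False          # already up to date, no download
    candidate = fetch_remote_model()
    if sha256_hex(candidate) != remote_digest:
        # Guard against a corrupted or partially downloaded model.
        raise ValueError("downloaded model does not match the published digest")
    return candidate, True
```

Checking the digest first keeps the periodic sync cheap, since the full model is transferred only when the server copy has changed.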
The IBM Watson Visual Recognition (VR) service is deployed in IBM Cloud. A client
application on the Android device collects image data with the device
camera and sends it to the IBM Cloud-based VR service for inference. The
status of this visual recognition service in IBM Cloud is shown in
Fig. 8.18.
IBM Watson leverages the capabilities of accelerated Power servers, delivering
performance unattainable on commodity servers, and provides hyperparameter
search and optimization as well as elastic training that allocates the resources
needed to optimize performance. Distributed deep learning provides rapid insights at
massive scale. Large model support facilitates the use of system memory with little
to no performance impact, yielding significantly larger and more accurate deep
learning models.
The IBM Watson Visual Recognition service uses deep learning algorithms to analyze
images of scenes, objects, and other content. The response includes keywords that
provide information about the content. The Watson Machine Learning Accelerator, a
new component of Watson Machine Learning, makes deep learning and machine learning
more accessible to the customer's teams and brings the benefits of AI into the
customer's business. It combines popular open-source deep learning frameworks,
efficient AI development tools, and accelerated IBM® Power Systems™ servers.
Small and medium organizations can now deploy a fully optimized and supported
AI platform that delivers high performance, proven dependability, and resilience.
The Watson Machine Learning Accelerator is a complete environment for data
science as a service, enabling small and medium organizations to bring AI
applications into production.
It enables rapid deployment at customer locations. The deployment package
includes the most popular deep learning frameworks, with all required
dependencies and files, precompiled and ready to deploy. The entire AI suite has been
validated and optimized to run reliably on accelerated Power servers.
It incorporates the most popular deep learning frameworks. The Watson Machine
Learning Accelerator gives access to Power-optimized versions of the
most popular deep learning frameworks currently available, including TensorFlow,
Caffe, and PyTorch. The Watson Machine Learning Accelerator runs on IBM Power
accelerated HPC servers, a platform that runs not only customer deep learning
networks but also a wide variety of high-performance computing workloads.
3. Image capturing (color, gray): provide the names of two types of camera.
(a) USB camera
(b) CSI camera
4. Image synthesis (drawing by using software): provide the name of one
software package that can edit and save pictures.
(a) Paint brush
(b) GNU Image Manipulation Program
5. What unit is used to measure the size of the image?
(a) Number of pixels in the horizontal axis
(b) Number of pixels in the vertical axis
6. How many bytes are required to store one pixel?
(a) Color [Link] bits
(b) White and black: 8 bits
(c) Binary: 1 bit
7. How do you create one file by using many picture files?
(a) Use zip or gz
(b) Use a compression tool to perform the above
Key Items
Data set, NN model, CNN model, training of NN/CNN model, deep learning model,
testing DL model, deployment of trained model, inferencing on given hypothesis
Challenges in rolling out a deep learning enabled service for enterprise
requirements:
1. Inferencing requires a well-trained deep learning model.
2. Deploying a trained DL model in a camera is not easy.
3. The cost of the camera will be high if the camera performs inference on each
click.
4. Training a CNN model requires a huge data set.
5. Training a large CNN model requires the IBM Watson Visual Recognition service.
Infrastructure The following list provides items that are required before starting
a project:
1. IBM Cloud account (free or paid version)
2. PC (Windows or Ubuntu machine) with Internet connection
3. One or a few Android smartphones
4. Android Studio (Windows machine or Ubuntu machine)
5. Watson Studio project
6. Watson Visual Recognition service
7. Cloud Object Storage service in IBM Cloud
Fig. 8.20 AI client in Android phone and use IBM Watson VR service
Problem 8.5.6 Build a custom model to test tomato quality. Reject a tomato if it
has a yellow patch on it.
Problem 8.5.8 Deploy a custom model for image classification. The IBM Watson
Visual Recognition service is used to create custom models, and IBM
Watson Studio is used to train, test, and deploy custom models in IBM Cloud. A user
application on an Android phone can perform the following:
(a) Take a picture with the camera and send it to IBM Watson for inferencing.
(b) Receive the inferencing result from IBM Watson and display the result locally.
(c) An advanced driver assistance system (ADAS) in a car can use items (a) and (b)
to help the driver.
(d) Non-real-time applications can use (a) and (b) where the image classification
result is useful to them.
(e) Batch processing of given images can be handled by using (a) and (b).
New-generation IoT edges for AI-driven applications use FPGA devices to perform
real-time inference. Creating applications on an FPGA requires VHDL or Verilog,
which makes it challenging to run deep learning models that are trained in TensorFlow
or PyTorch. There is therefore a need to use C and C++ to deploy deep
learning networks on FPGAs.
A very-early-stage tool set is provided to deploy a deep learning network that
is trained by using TensorFlow. This revised tool set may reduce the effort
required to deploy deep learning networks on FPGAs, so that developers can create
high-quality IoT edges with inference ability. Input to the inference engine can
come from a camera in the embedded device. Figure 8.21 provides details on the
tool set used in porting DL networks onto a Xilinx FPGA.
The custom board Ultra96-V2 uses the “Zynq UltraScale+ MPSoC ZU3EG A484.” The
DLtrain version of the deep learning tool set does not use:
1. “AI model pruning and optimization”
2. “AI model quantizer”
These two steps are instead folded into model creation, so that the
model used in training is ready for deployment without going through the
post-training truncation steps listed above.
Xilinx provides:
1. Edge compiler (DNNC is used)
2. Edge run time
The above tool set provides easy options to deploy a custom model on an FPGA.
OEMs can obtain a trained model from vendors and deploy it in an embedded device
that has an FPGA. The DLtrain AI framework has a provision to use the custom
model. DLtrain is developed in C and C++, which makes it feasible to work
with embedded devices that use FPGAs.
A neural network is designed and coded to work with POWER9 and also with the
NVIDIA RTX 2070 GPU. Customers can focus fully on training their model instead
of worrying about the 400+ dependency packages for Python 3.6 and TensorFlow 2.0.
Moreover, training of the DL model in DLtrain is distributed across POWER9 and
the GPU (via CUDA 10.1). This shortens training time and makes it easier to
fine-tune hyperparameters.
FP32 denotes a 32-bit floating point number. DLtrain uses the following
data format:
1. $w_{i,j}$ is a 32-bit floating point number (FP32).
2. $b_j$ is a 32-bit floating point number (FP32).
3. $a_i$ is a 32-bit floating point number (FP32).
The items listed above, which appear in Eq. 8.1, use FP32 for $w_{i,j}$, $b_j$, and $a_i$.
Performing FP32 multiplication and addition in an FPGA can consume a large amount
of resources.
Problem 8.5.9 Let
1. $w_{i,j}$ be a 32-bit integer (INT32).
2. $b_j$ be a 32-bit integer (INT32).
3. $a_i$ be a 32-bit integer (INT32).
Use the above INT32 values in Eq. 8.1, which connects $w_{i,j}$, $b_j$, and $a_i$. Provide
a method to match the quality of inference when using INT32 computations instead
of FP32 computations.
Problem 8.5.10 Let
1. $w_{i,j}$ be a 32-bit integer (INT32).
2. $b_j$ be a 32-bit integer (INT32).
3. $a_i$ be a 16-bit integer (INT16).
Use INT32 and INT16 to represent the above parameters in Eq. 8.1, which
connects $w_{i,j}$, $b_j$, and $a_i$. Provide a method to match the quality of inference
when using INT32 (for $w_{i,j}$, $b_j$) and INT16 (for $a_i$) computations instead of
FP32 computations in that equation.
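As a starting point for these two problems, the following sketch compares a floating-point weighted sum with an integer-only fixed-point version. It assumes that Eq. 8.1 has the usual form of a weighted sum plus bias, $\sum_i w_{i,j} a_i + b_j$, taken before the activation; the Q-format with 16 fractional bits and all function names are illustrative assumptions, not DLtrain code.

```python
from typing import Sequence

FRACTION_BITS = 16                      # assumed format: value = integer / 2**16
SCALE = 1 << FRACTION_BITS
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def to_fixed(x: float) -> int:
    """Quantize a float to a signed 32-bit fixed-point integer (saturating)."""
    q = int(round(x * SCALE))
    return max(INT32_MIN, min(INT32_MAX, q))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float for comparison."""
    return q / SCALE


def weighted_sum_fp(w: Sequence[float], a: Sequence[float], b: float) -> float:
    """Floating-point reference: sum_i w_i * a_i + b."""
    return sum(wi * ai for wi, ai in zip(w, a)) + b


def weighted_sum_fixed(w_q: Sequence[int], a_q: Sequence[int], b_q: int) -> int:
    """Integer-only version of the same sum.

    Each product carries 2 * FRACTION_BITS fractional bits, so the bias is
    shifted up before accumulation and the total is shifted back down.
    """
    acc = b_q << FRACTION_BITS          # align the bias with the products
    for wi, ai in zip(w_q, a_q):
        acc += wi * ai                  # in C or HDL this needs a 64-bit accumulator
    return acc >> FRACTION_BITS         # back to FRACTION_BITS fractional bits
```

For the mixed INT32/INT16 case, the same idea applies with a smaller number of fractional bits for the activations; the final shift then equals the number of fractional bits used for $a_i$ so that the result stays in the weight format.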
Chapter 9
Tutorial: Deploying Deep Learning
Networks
The tutorial covers the workflow from data set creation through model design,
training, and testing of deep learning networks, to deployment of the trained
model in Internet of Things (IoT) edges and in cloud native applications. It also
lists the challenges involved in deploying trained deep learning networks in
IoT edges. In particular, if the application is a real-time service, then a
microservice is introduced into the IoT edge. Figure 9.1 shows the steps in the tutorial.
1. Train and validate a neural network (NN) or convolutional neural network (CNN)
model with a user-defined data set.
2. Deploy the NN or CNN model of the deep learning network in the IoT
edge.
For example, subsystems collect real-time sensor data from their
respective sources and perform inference in the IoT edge to provide a microservice
to other applications.
Loading a trained deep learning network model onto embedded systems is a
challenging task, and many silicon vendors appear to provide custom-made
solutions that fit their own silicon devices.
The tutorial provides the necessary documents in Google Drive and source code
in GitHub. Most importantly, the tutorial connects these assets via
a web page designed to help the user navigate the tutorial session
autonomously and make optimal use of the resources. The tutorial provides a quick
start guide: a learner can refer to a resource document online and make quick
progress in learning how to deploy “deep learning networks in edges.” The URLs of
the necessary resources are provided via a QR code or a reference link.
Data set processing is presented in item 1. Domain knowledge of a
particular data set helps in creating an effective data set to train the deep learning
network model.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 135
J. Singaram et al., Deep Learning Networks,
[Link]
Training the deep learning network model is presented in item 2, which has many
sub-items. A few of those items require CPU+GPU hardware so that accelerated
computing can be used to train deep learning networks. The training tasks are also
illustrated by examples that use cloud native servers in Colab or on-premises
Power 9 server clusters.
Deployment of the trained model onto an edge requires considerably more care and
work, because the training platform differs from the deployment platform. In most
cases the trained model needs to be pruned, or the weights of each node need to be
represented in INT8, INT16, etc., instead of FP32.
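The two post-training steps mentioned here can be illustrated with a minimal sketch. It assumes magnitude-based pruning and symmetric INT8 quantization, which are common choices but are not stated in the text; the function names are illustrative and a flat list of weights stands in for a real layer.

```python
from typing import List, Tuple


def prune_by_magnitude(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude weights.

    `sparsity` is the fraction (0.0 to 1.0) of weights set to zero.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric INT8 quantization: each weight is approximated by scale * q,
    with q an integer in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from INT8 values."""
    return [v * scale for v in q]
```

Comparing `dequantize(*quantize_int8(w))` with the original weights gives a quick estimate of the precision lost before the model is moved to the edge device.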
9.1 Prerequisites
The following steps are used in the deployment of deep learning networks in
cloud native and also in edge native applications. The tutorial is designed so that
the given steps can be completed within 30 h. Most of the workflow requires careful
reading of the document for a specific workflow and implementing the
recommended workflow, so that accelerated learning is possible in a short period
of time. Issues that come up while trying out a new workflow are discussed
as they arise.
The following steps are recommended for deploying deep learning networks in
cloud native systems. The tutorial is designed so that the required workflow can be
completed within 6 h. Cloud application engineers will find it useful for learning
the steps involved in deploying deep learning networks on cloud-based
servers.
The tutorial workflow uses documents from Google Drive so that a learner can refer
to a resource document online and make quick progress in learning how to deploy
“DL networks in an edge.” The URL of a given resource is associated with a QR
code. The following is the QR code for URLs that are used in the tutorial.
Item 7c provides information on the digital twin and the associated physical process.
The URL [99] offers a brief introduction to the concept of a digital twin within the
context of a deep learning network.
Item 1b handles the workflow to store a data set on a local machine and use the locally
stored data set for training a CNN or NN model. There is no link associated with 1b
because handling image data from a local machine is straightforward.
Item 7b handles the workflow to “add custom image data along with MNIST data
set.”
The MNIST data set is used to train an NN or CNN model with TensorFlow. The
MNIST data set is well defined and consists of images of handwritten digits
0, 1, 2, ..., 9. A related problem is included in 5.6.1.
Item 2a handles the workflow for “Training Deep Learning Networks” in Colab.
Colab is used to train the TensorFlow model. The link [82] provides a detailed
workflow, and learners can use their Google Colab account.
Item 2b handles the workflow for “Training Deep Learning Networks” on an x86
Ubuntu machine.
The URL [24] provides more information on this task.
Item 2d handles the workflow for “Training Deep Learning Networks” by using
Power 9 servers along with an RTX 2070 GPU.
Learners who have access to such a system can use the following link to
perform the given task on a Power 9 CPU.
The URL [77] has a CPU version.
9.8 Deploying Deep Learning Networks in an IoT Device 141
Item 3 handles the workflow used in “Saving Deep Learning Networks” with
the TensorFlow tool set.
The saved model is stored in local storage or in cloud storage. The URL
[100] provides a workflow for understanding the tasks involved in storing deep learning
networks.
Item 4 handles the defined workflow for “Loading Deep Learning Networks.”
Loading a model from local storage or from cloud storage is handled at the URL
[101].
Item 6a handles the workflow that is required to deploy a deep learning network
model by using the Flask microservice. The URL [102] can be used to access
detailed workflow documentation with examples.
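The shape of such an inference microservice can be sketched as follows. To keep the example runnable without extra packages, it uses Python's standard-library `http.server` instead of Flask; in Flask the same `handle_predict` logic would sit behind a route. The `/predict` path, the JSON layout, and the tiny linear "model" are hypothetical stand-ins and are not taken from the workflow in [102].

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple

# Hypothetical stand-in for a trained model: one linear unit with a threshold.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features: List[float]) -> int:
    """Return class 1 if the weighted sum is positive, else class 0."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1 if score > 0 else 0


def handle_predict(body: bytes) -> Tuple[int, bytes]:
    """Pure request handler: JSON bytes in, (HTTP status, JSON bytes) out."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        if len(features) != len(WEIGHTS):
            raise ValueError("wrong number of features")
    except (ValueError, KeyError, TypeError) as exc:
        return 400, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps({"label": predict(features)}).encode()


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def run(port: int = 8080) -> None:
    """Start the service; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Keeping the request logic in a pure function makes it easy to test without opening a network port, and the same function can be reused when the service is moved to another web framework.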
Item 6b handles the workflow that is required to deploy a deep learning network
model using JavaScript. The URL [103] can be used to access detailed workflow
documentation with examples.
Item 6d handles the workflow that is required to deploy a deep learning network
model by using TensorFlow Serving. The URL [104] can be used to access detailed
workflow documentation with examples.
Glossary
DLtrain Deep learning model training platform. It also performs inference
by using deep learning networks in a given IoT edge.
Kanshi Name of network security audit software that uses deep learning
networks.
MQTT Message Queuing Telemetry Transport, a messaging protocol for the
Internet of Things (IoT).
XMPP Extensible Messaging and Presence Protocol.
Appendix A
Training Restricted Boltzmann Machine
For a neural network, the technique of gradient descent is used to minimize the cost
function C. Here is an overview of how it works.
Cost Function
This function takes two vectors: the input to the neural network and
the predetermined correct output we want from the neural network. It runs the
input through the entire network and then measures how much the final layer of n
neurons differs from the provided correct output. In short, minimizing this function
is the goal of our optimization problem.
Neural Network
where $W_i$ is the weight matrix of the edges connecting the neurons of layer $i$ to
those of layer $i + 1$, and $\vec{b}_i$ is the vector of biases.
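As a sketch of the notation that this description relies on, and assuming a standard fully connected feedforward network with $r$ layers and a componentwise activation $\sigma$, the network can be written as below. The exact form is an assumption based on how $N_i$, $W_i$, $\vec{b}_i$, and $M_i$ are used later in this appendix.

```latex
\begin{aligned}
M_i(\vec{a}) &= W_i \vec{a} + \vec{b}_i, \\
N_i(\vec{a}) &= \sigma\big(M_i(\vec{a})\big), \\
N(\vec{x})   &= (N_r \circ N_{r-1} \circ \cdots \circ N_1)(\vec{x}).
\end{aligned}
```

Here $\vec{a}$ denotes the input to layer $i$, so $N(\vec{x})$ is the output of the final layer for the network input $\vec{x}$.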
The Derivative
Let’s take a brief diversion to define the concept of derivatives for functions in
multidimensional spaces.
If $f : \mathbb{R}^m \to \mathbb{R}^n$ is a function, then the derivative is a function
$$Df : \mathbb{R}^m \to L(\mathbb{R}^m, \mathbb{R}^n),$$
where $L(\mathbb{R}^m, \mathbb{R}^n)$ stands for the space of all linear maps from $\mathbb{R}^m$ to $\mathbb{R}^n$, in other
words the space of all $n \times m$ matrices over the real numbers.
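The function $Df$ assigns to every point $\vec{x} \in \mathbb{R}^m$ a linear map $Df(\vec{x})$, which is the
best linear approximation of $f$ at $\vec{x}$.
In other words, it assigns to every point a matrix that is the best
linear approximation of $f$ at that point. Concretely, if
$$f(\vec{x}) = (f_1(\vec{x}), f_2(\vec{x}), \ldots, f_n(\vec{x})) \quad \text{and} \quad \vec{x} = (x_1, x_2, \ldots, x_m),$$
then $Df(\vec{x})$ is the $n \times m$ matrix whose $(i, j)$ entry is the partial derivative
$\partial f_i(\vec{x}) / \partial x_j$, and the partial derivatives are defined like
$$\frac{\partial f_i(\vec{x})}{\partial x_j} = \lim_{t \to 0} \frac{f_i(\vec{x} + t\,\vec{e}_j) - f_i(\vec{x})}{t},$$
where $\vec{e}_j$ is the $j$th standard basis vector of $\mathbb{R}^m$.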
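The limit definition of the partial derivative can be checked numerically by replacing the limit with a small step $t$. The following sketch builds an approximate derivative matrix this way; the function names and the example function are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def numerical_jacobian(
    f: Callable[[Sequence[float]], Vector], x: Sequence[float], t: float = 1e-6
) -> List[List[float]]:
    """Approximate Df(x): entry [i][j] is (f_i(x + t*e_j) - f_i(x)) / t."""
    fx = f(x)
    jac = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += t                      # x + t * e_j
        f_shifted = f(shifted)
        for i in range(len(fx)):
            jac[i][j] = (f_shifted[i] - fx[i]) / t
    return jac


def example(x: Sequence[float]) -> Vector:
    """f(x1, x2) = (x1 * x2, x1 + 3 * x2); exact derivative is [[x2, x1], [1, 3]]."""
    return [x[0] * x[1], x[0] + 3.0 * x[1]]
```

Calling `numerical_jacobian(example, [2.0, 5.0])` gives a matrix close to `[[5, 2], [1, 3]]`, with one row per output component and one column per input coordinate.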
This derivative also follows the beloved chain rule which we will now exploit.
We define the Hadamard product of two matrices of the same dimension as
$$A \otimes B = [a_{ij} \cdot b_{ij}],$$
and we define
$$f_i : [0, 1] \to [0, 1], \qquad C(\vec{x}, \vec{o}) = d(N(\vec{x}), \vec{o}),$$
where $d$ is some function with range $[0, 1]$ that tells how far apart two vectors are.
Now the derivative with respect to the weights in the $i$th layer can be calculated as
follows. We apply the $D$ operator but differentiate with respect to $W_i$, that is, we
treat everything else as a constant, so the repeated chain rule gives:
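$$d'(N(\vec{x}), \vec{o})\, DN_r\big(N_{r-1} \circ \cdots \circ N_i(W_i \vec{a} + \vec{b}_i)\big)\, DN_{r-1}\big(N_{r-2} \circ \cdots \circ N_i(W_i \vec{a} + \vec{b}_i)\big)$$
We can compute this for a value of $i$ to see what it looks like; let $i = r$; then this
map is as follows:
$$W \mapsto d'(N(\vec{x}), \vec{o}) \Big( (\sigma')^{k_r}(W_r \vec{a} + \vec{b}_r) \otimes W \vec{a} \Big)$$
$$\frac{\partial C}{\partial W_i} = \frac{\partial C}{\partial M_i} \frac{\partial M_i}{\partial W_i} = \frac{\partial C}{\partial M_i}\, \vec{a}$$
$$\frac{\partial C}{\partial b_i} = \frac{\partial C}{\partial M_i} \frac{\partial M_i}{\partial b_i} = \frac{\partial C}{\partial M_i}$$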
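These two identities can be checked on a single layer. The sketch below assumes $M = W\vec{a} + \vec{b}$, a sigmoid activation, and half the squared error as the distance $d$; these choices and all function names are assumptions made for illustration. It computes the gradients analytically and provides a finite-difference estimate for comparison.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(W: Matrix, b: Vector, a: Vector) -> Tuple[Vector, Vector]:
    """Return (M, y) with M = W a + b and y = sigmoid(M) componentwise."""
    M = [sum(w * x for w, x in zip(row, a)) + bk for row, bk in zip(W, b)]
    return M, [sigmoid(m) for m in M]


def cost(W: Matrix, b: Vector, a: Vector, o: Vector) -> float:
    """Assumed distance d: half the squared error between output and target."""
    _, y = forward(W, b, a)
    return 0.5 * sum((yk - ok) ** 2 for yk, ok in zip(y, o))


def gradients(W: Matrix, b: Vector, a: Vector, o: Vector) -> Tuple[Matrix, Vector]:
    """Analytic gradients: dC/dW = (dC/dM) a^T and dC/db = dC/dM."""
    _, y = forward(W, b, a)
    # Chain rule through the squared error and the sigmoid.
    dC_dM = [(yk - ok) * yk * (1.0 - yk) for yk, ok in zip(y, o)]
    dC_dW = [[dk * aj for aj in a] for dk in dC_dM]   # outer product with a
    return dC_dW, dC_dM


def numeric_dW(W: Matrix, b: Vector, a: Vector, o: Vector,
               k: int, j: int, t: float = 1e-6) -> float:
    """Finite-difference estimate of dC/dW[k][j] for comparison."""
    Wp = [row[:] for row in W]
    Wp[k][j] += t
    return (cost(Wp, b, a, o) - cost(W, b, a, o)) / t
```

A gradient descent step then subtracts a small multiple of these gradients from `W` and `b`, which is how the cost function is reduced during training.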
(Fig. A.2)
(Fig. A.3)
(Fig. A.4)
1. C. Wang, S.S. Iyengar, K. Sun, AI Embedded Assurance for Cyber System, 1st edn. (Springer
Nature, Berlin, 2023)
2. I.S. Sitharama, A. Sabharwal, F.G. Pin, C.R. Weisbin, Asynchronous production system
for control of an autonomous mobile robot in real-time environment. Applied Artificial
Intelligence an International Journal 6(4), 485–509 (1992)
3. P. Santosh, R. Buyya, K.R. Venugopal, S.S. Iyengar, L.M. Patnaik, Searching for the iot
resources: fundamentals, requirements, comprehensive review and future directions. IEEE
Commun. Surv. Tutorials 20(3), 2101–2132 (2018)
4. S. Pouyanfar, S. Sadiq, Y. Yan, H. Tian, Y. Tao, M.P. Reyes, M.-L. Shyu, S.-C. Chen, S.S.
Iyengar, A survey on deep learning: algorithms, techniques, and applications. ACM Comput.
Surv. (CSUR) 51(5), 92 (2018)
5. H. Tian, S. Pouyanfar, J. Chen, S.-C. Chen, S.S. Iyengar, Automatic convolutional neural
network selection for image classification using genetic algorithms, in 2018 IEEE
International Conference on Information Reuse and Integration (IRI) (IEEE, New York,
2018), pp. 444–451
6. S.K. Ramani, S.S. Iyengar, Evolution of sensors leading to smart objects and security issues
in iot, in International Symposium on Sensor Networks, Systems and Security (Springer,
Cham, 2017), pp. 125–136
7. I. Vasanth, S.S. Iyengar, N. Paramesh, G.R. Murthy, M.B. Srinivas, Machine learning and data
mining algorithms for predicting accidental small forest fires, in The Fifth International
Conference on Sensor Technologies and Applications (2011), pp. 116–121
8. A.U. Rajendra, P.S. Bhat, S.S. Iyengar, A. Rao, S. Dua, Classification of heart rate data using
artificial neural network and fuzzy equivalence relation. Pattern Recogn. 36(1), 61–68 (2003)
9. M.M. Htay, S.S. Iyengar, S.Q. Zheng, t-error correcting/d-error detecting (d > t) and
all unidirectional error detecting codes with neural network. ii, in Proceedings of the
International Conference on in Information Technology: Coding and Computing, 2002 (IEEE,
New York, 2002), pp. 383–389
10. Y. Xia, S.S. Iyengar, N.E. Brener, An event driven integration reasoning scheme for handling
dynamic threats in an unstructured environment. Artif. Intell. 95(1), 169–186 (1997)
11. N. Krishnakumar, S.S. Iyengar, R. Holyer, M. Lybanon, An expert system for interpreting
mesoscale features in oceanographic satellite images. Int. J. Pattern Recognit. Artif. Intell.
4(03), 341–355 (1990)
12. S.I. Newsletter, What is low-code/no-code application development?. [Link]
insights/[Link]
37. J. S, Handling deployment of deep learning networks in edge devices (2018). [Link]
[Link]/dltrain/deploy-dl-networks/edge-native-service/j7-app
38. J. S, Setting up edge native Jetson Nano AI computer (2018). [Link]
tool-set/setup-jetson-nano
39. G.S. Thejas, Y. Hariprasad, S.S. Iyengar, N.R. Sunitha, P. Badrinath, S. Chennupati, An
extension of synthetic minority oversampling technique based on kalman filter for imbalanced
datasets. Mach. Learn. Appl. 8, 100267 (2022)
40. A.S. Nasreen, Dr, S. Iyengar, Deep learning based object recognition in video sequences.
International Journal Of Computing and Digital System, 11(1), (2022)
41. J. S, Pre-processing and 2d filter (2019). [Link]
jk/Data-Set/Pre-processing/2Dfilter
42. J. S, Data set pre-processing (2019). [Link]
Data-Set/fashionMNIST
43. J. S, Data set pre-processing (2019). [Link]
Data-Set/MNIST
44. J. S, Pre-processing and normalization (2019). [Link]
tree/jk/Data-Set/Pre-processing
45. J. S, Pre-processing and normalization (2019). [Link]
tree/jk/Data-Set/CSV
46. M. Binkowski, J. Donahue, S. Dieleman, A. Clark, E. Elsen, N. Casagrande, L.C. Cobo,
K. Simonyan, High fidelity speech synthesis with adversarial networks (2019). [Link]
org/pdf/[Link]
47. S. Karagiannakos, Speech synthesis: a review of the best text to speech architectures with
deep learning (2021). [Link]
48. A. Brown, Text to speech—lifelike speech synthesis demo (part 1) (2021).
[Link]
f991ffe9e41e
49. J. S, tcpdump for flow capture (2018). [Link]
Edge/Kanshi/FlowCapture/[Link]
50. J. S, scapy is used to convert PCAP file to CSV file (2018). [Link]
dltrainBook/blob/jk/Edge/Kanshi/FlowCapture/[Link]
51. G.E. Hinton, How a boltzmann machine models data, deep learning (2017). [Link]
[Link]/watch?v=kytxEr0KK7Q
52. W. Wolf, A thorough introduction to boltzmann machines (2018). [Link]
10/20/thorough-introduction-to-boltzmann-machines/
53. R. Salakhutdinov, G. Hinton, Deep boltzmann machines, in Proceedings of the 12th Interna-
tional Conference on Artificial Intelligence and Statistics (AISTATS) 2009, Clearwater Beach,
Florida, USA, (Department of Computer Science University of Toronto, Toronto, 2009)
54. R. Salakhutdinov, A. Mnih, G.E. Hinton, Restricted boltzmann machines for collaborative
filtering, in Appearing in Proceedings of the 24th International Conference on Machine
Learning, Corvallis (University of Toronto, Canada, 2007)
55. R.R. Brooks, S.S. Iyengar, Robust distributed computing and sensing algorithm (1996)
56. S.S. Iyengar, R.R. Brooks, J. Chen, Automatic correlation and calibration of noisy sensor
readings using elite genetic algorithms. Artif. Intell. 84(1–2), 339–354 (1996)
57. M. I. to Deep Learning 6.S191 and L. 2, Mit 6.s191: Recurrent neural networks, transformers,
and attention (2023). [Link]
58. F. Soler-Toscano, H. Zenil, J.-P. Delahaye, N. Gauvrit, Calculating kolmogorov complexity
from the output frequency distributions of small turing machines (2017). [Link]
org/plosone/article?id=10.1371/[Link].0096223
59. J.S, P.S.S. Iyengar, P.N.K. Chaudhary, Sensor fusion and pontryagin duality, in International
Conference on Information Security, Privacy and Digital Forensics (ICISPD 2022), Goa
(National Forensic Sciences University (NFSU), Goa Campus, 2022)
60. S. Gogioso, W. Zeng, Fourier transforms from strongly complementary observable (2018)
61. D. Su, The fourier transforms for locally compact abelian groups (2016)
87. A.K. Belman, T. Paul, L. Wang, S.S. Iyengar, P. Sniatała, Authentication by mapping
keystrokes to music: the melody of typing, in AISP’20-International Conference on Artificial
Intelligence and Signal Processing (2020)
88. Iyengar, S. Sitharama, S. Gulati, J. Barhen, Smelting networks for real time cooperative
planning in the presence of uncertainties, in Applications of Artificial Intelligence VI, vol.
937 (International Society for Optics and Photonics, New York, 1988), pp. 586–594
89. J. S, Dltrain model based inference service in Jetson Nano (2019). [Link]
home/jkevents/baranovichi/inference/jetsonnano-dltrain
90. J. S, Kanshi for TCP IP network safety (2021). [Link]
edge/kanshi
91. J. S, scapy for flow capture (2018). [Link]
Edge/Kanshi/FlowCapture/[Link]
92. J. S, PCAF file is used to train NN model (2018). [Link]
dltrainBook/blob/jk/Edge/Kanshi/DeepLearning/[Link]
93. J. S, Android ndk 3.1.2 installation issue (2018). [Link]
34353220/how-do-i-select-android-sdk-in-android-studio
94. J. S, Installation of android studio in ubuntu (2018). [Link]
android-studio-on-ubuntu-18-04/
95. J. S, Deploy trained cnn in android phone for real time inference on hand written numbers
(2018). [Link]
96. J. S, Share trained model from host pc to j7app application in android device (2018). https://
[Link]/DLinIoTedge/dltrainBook/tree/jk/Edge/Send2Phone
97. J. S, IBM Watson visual recognition service (2018). [Link]
vendors/ibm-watson-vr
98. J. S, Deploy in xilinx zynq ultrascale+ mpsoc zu3eg a484 (2018). [Link]
DLinIoTedge/dltrainBook/tree/jk/Edge/FPGA
99. J. S, Digital twin models and its association with deep learning networks model (2018).
[Link]
100. J. S, Save deep learning networks (2018). [Link]
save-dl-networks
101. J. S, Load deep learning networks (2018). [Link]
networks/load-dl-networks
102. J. S, Micro service using the flask micro framework (2018). [Link]
deploy-dl-networks/cloud-native-service/flask-micro-service
103. J. S, Javascript to run tensorflow models in browser (2018). [Link]
deploy-dl-networks/cloud-native-service/inference-via-javascript
104. J. S, Deploy deep learning network model by using serving of tensorflow (2018). [Link]
[Link]/dltrain/deploy-dl-networks/cloud-native-service/deploy-in-cloud