Deep Learning Networks Guide

Jayakumar Singaram

S. S. Iyengar
Azad M. Madni

Deep Learning
Networks
Design, Development and Deployment
Jayakumar Singaram
Mistral Solutions Pvt. Ltd.
Bangalore, India

S. S. Iyengar
Florida International University
Miami, FL, USA

Azad M. Madni
Astronautical Engineering Department,
RRB 201 3100
University of Southern California
Los Angeles, CA, USA

ISBN 978-3-031-39243-6 ISBN 978-3-031-39244-3 (eBook)



© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


Foreword

This book, written by experts in AI and machine learning, is unique. Unlike current
books on this subject that either cover the theory and mathematical underpinnings
of deep learning, or focus exclusively on programming-centric concepts, tools and
languages, this book addresses and bridges both aspects. It seamlessly connects
theoretical methods with pertinent technologies and toolsets in a manner that makes
the material suitable for students, educators, and practitioners.
Its value lies in its multifaceted treatment of the subject. It conveys complex
Deep Learning concepts in simple terms, making the material understandable
to a wide audience. In addition, it elucidates the intricate landscape of the different
technologies and toolsets currently available, thereby offering readers the
much-needed clarity to make informed decisions for their respective applications
and problem domains.
By bridging theory and practice, this book empowers readers to not only grasp
fundamental concepts but to also confidently navigate the practical applications of
Deep Learning. Ultimately, this book will serve as a comprehensive guide for Deep
Learning enthusiasts, practitioners, educators, and researchers alike. Its focus on
holistic understanding and actionable insights makes it an invaluable “must read,”
and an essential resource for anyone interested in delving into the exciting realm of
Deep Learning.

Prof. John Hopcroft


Turing Award Laureate
Member of the NAE and NAS
Cornell University

Preface

This book presents the multiple facets involved in the design, development, and
deployment of deep learning networks. More specifically, this book is
an introduction to the toolset and its associated deep learning techniques. The
book also presents design and technical aspects of programming and provides
pragmatic tools for understanding the interplay of programming and technology
for several applications. It offers a tutorial that provides the wide-ranging conceptual
and programming tools that underlie deep learning applications.
Furthermore, the book sets out a clear path forward that engages and challenges
undergraduate students in the art of science and engineering programming.

Bangalore, India Jayakumar Singaram


Miami, FL, USA S. S. Iyengar
Los Angeles, CA, USA Azad M. Madni

Acknowledgements

This research was sponsored by the Army Research Office and was accomplished
under Grant Number W911NF-21-1-0264. The views and conclusions contained in
this document are those of the authors and should not be interpreted as representing
the official policies, either expressed or implied, of the Army Research Office or
the US Government. The US Government is authorized to reproduce and distribute
reprints for Government purposes notwithstanding any copyright notation herein.

Purpose of This Book

The reader is introduced to methods for designing, modeling, developing, building,
training, and deploying deep learning in artificial intelligence applications. These
include IoT, computer vision, natural language processing, and reinforcement
learning. A series of deep learning environment design and building exercises provide
a succinct, project-driven deep learning tutorial. Furthermore, this book offers a
comprehensive, consistent treatment of the current thinking and technology trends
in this critical, rapidly expanding subject area. More importantly, the book delivers
fundamental and deep learning techniques and principles packed with real-world
deep learning applications and examples. The book also provides a forward-thinking
perspective in advanced deep learning infrastructure building and deployment
methods. Implementation issues are discussed in a companion framework of deep
learning networks that takes the reader through a logical sequence of discussions
about core concepts and issues related to deep thinking.
This book is unique in that it offers a comprehensive, end-to-end look at deep
learning principles and frameworks for building implementation algorithms. As
with business organizations and government standards, implementation guidelines
are structured and organized within an overall programming strategy. This structure
is a valuable contribution of this book. This book is intended to serve as a valuable
resource in the artificial intelligence discipline for students, professionals, and data
scientists who want to understand what a successful implementation of deep learning
algorithms and frameworks looks like from a programming perspective.
This book discusses feed-forward techniques and tools for setting up deep
learning applications including CPU and GPU systems. The novelty of this book
stems from the fact that it provides a comprehensive approach and toolset for
developing deep learning applications and building virtual environments for various
AI Platforms. It demonstrates and conveys core insights into various native AI
hardware configurations, including Edge Native AI hardware, and into setting up
Jetson Nano in an IoT Edge environment. It also describes and demonstrates the
hand-holding techniques needed to configure the open-source operating system and
edge devices.
Advanced deep learning deployments are discussed and illustrated with specific
examples that contribute to a detailed understanding of real-time deep learning
applications.
The book introduces the reader to neural nets, convolutional neural nets, word
embeddings, recurrent neural nets, sequence-to-sequence learning, deep reinforcement
learning, unsupervised learning models, and other basic concepts and methods.
Equally important, this book covers artificial intelligence research on the design,
development, training, testing, and deployment of applications and hardware services,
including IoT microservices. TensorFlow, an open-source machine learning platform,
allows students, AI experts, and data science professionals to work through programs
and master the fundamentals of deep learning.
This book claims that building the best deep learning environment is the ultimate
way to study deep learning science, and the book reflects this philosophy. Each
section of the book presents stepwise command-oriented instructional guidelines to
create, configure, and build the deep learning environments with the widely used
TensorFlow and other relevant tools in various hardware and software environments.
Ultimately, this book aims to deliver deep learning insights without requiring prior
familiarity with probability, statistics, multivariate calculus, or linear algebra. It
caters to a wide readership that includes both graduate and undergraduate students,
practitioners, and researchers in academia.
Contents

1 Introduction
1.1 Artificial Intelligence (AI)
1.2 Machine Learning (ML)
1.3 AI vs. ML
1.4 Deep Learning (DL)
1.5 DL vs. ML
1.6 Deep Learning and Deep Programming
1.7 Deep Learning Networks
1.8 Deep Learning Network Deployment
2 Low-Code and Deep Learning Applications
2.1 Role of Tool Set in Applications
2.2 Schematic Representation of Deep Learning Architecture
2.3 Deep Learning Applications
2.3.1 Data Set
2.3.2 Model Design for Deep Learning Network
2.3.3 Train Model
2.3.4 Test Model
2.3.5 Save Model
2.3.6 Load Model
2.3.7 Deployment
2.4 Custom Framework: DLtrain for AI
2.5 Sample AI Application Deployment
2.5.1 Quick Look: IBM Watson
2.5.2 IBM Watson Service and Monitor Tomato Farm
2.5.3 Real-Time Audit of IP Networks
3 Introduction to Software Tool Set
3.1 Virtual Environment for Required Tool Set
3.2 TensorFlow: An AI Platform
3.2.1 Keras in TensorFlow
3.2.2 TensorFlow Image in Docker
3.3 JupyterLab
3.3.1 Jupyter Notebook
3.4 JupyterLab: Latex
3.5 Setting Up Edge AI Computer (Jetson Nano)
3.6 IBM Watson Machine Learning: Community Edition
3.7 Tool Set to Build DLtrain
3.7.1 Target Machine Is X86 with Ubuntu
3.7.2 Use Docker: Target Machine Is X86 with Ubuntu
3.7.3 Target Machine Is Power 9 with Ubuntu
3.7.4 Target Machine Is Jetson Nano with Ubuntu
3.7.5 Target Machine Is X86 Windows 10
3.8 Docker Image of DLtrain Application to Train CNN
3.9 Deploy DL Networks in Near Edge
3.9.1 Deploy DL Networks by Using TensorFlow RT
4 Hardware for DL Networks
4.1 Open Source for Edge Native Hardware
4.2 POWER9 with RTX 2070 GPU
4.2.1 OpenPOWER CPU with ASPEED VGA Controller
4.2.2 CUDA Installation and PCI Driver for RTX 2070
4.2.3 Build Application Using nvcc
4.2.4 Edge Native AI Hardware
4.2.5 On-Prem Requirement
4.2.6 DGX Station A100 for DL Networks
4.2.7 Deployment of AI in X86 Machine
4.2.8 Deployment of AI in Android Phone
4.2.9 Deployment of AI in Rich Edge
5 Data Set Design and Data Labeling
5.1 Insight
5.2 Description
5.3 Source of Data: Human and Machine
5.4 Data Set Creation and Statistical Methods
5.5 Statistical Methods
5.5.1 Bernoulli: Binary Classification of Data
5.5.2 Binomial: Binary Classification of Data
5.5.3 Poisson: Binary Classification of Data
5.6 Image Signal Processing
5.6.1 Image Data and Maxwell-Boltzmann Statistics
5.6.2 Working with Image Files
5.6.3 Pixel Normalization
5.6.4 Global Centering
5.6.5 Global Standardization
5.7 Data Set: Read and Store
5.7.1 Data Set with Label Data
5.7.2 Working with CSV Files
5.8 Audio Signal Processing
5.8.1 Speech Synthesis by Using Deep Learning Networks
5.9 Data Set by Using PCAP File and Stream to Tensor
6 Model of Deep Learning Networks
6.1 Insight
6.2 Data and Model
6.2.1 Sequence Prediction
6.2.2 Sequence Classification
6.2.3 Sequence Generation
6.2.4 Sequence to Sequence Prediction
6.3 Data and Probability Model
6.3.1 Measurement and Probability Distribution
6.4 Boltzmann Distribution
6.5 Multilayer Neural Network
6.6 Reduction of Boltzmann Machine to the Hopfield Model
6.7 Kolmogorov Complexity for a Given Data
6.8 Restricted Boltzmann Machine
6.9 Brooks–Iyengar Algorithm for Binary Classification
6.10 Pre-Trained Model
6.11 Compression of DL Networks
7 Training of Deep Learning Networks
7.1 DLtrain Is a No-Code Deep Learning Framework
7.2 DLtrain: Training of NN and CNN Models
7.2.1 Preprocessing Data Set
7.2.2 Design Deep Learning Network Model
7.2.3 Training Algorithm
7.2.4 Training Deep Learning Network Model
7.2.5 Save Deep Learning Network Model
7.3 DLtrain Tested in POWER9 with GPU
7.3.1 Build DLtrain for POWER9 Servers
7.3.2 DLtrain to Train CNN in POWER9 Servers
7.3.3 DLtrain for Inference in POWER9 Servers
7.4 Docker Image of DLtrain for X86 with Ubuntu
7.5 DLtrain: Train DL Models in Windows 10
7.6 DLtrain: Large Model Support
7.7 Train NN and CNN Models in TensorFlow
7.7.1 Setup Tool Chain for TensorFlow
7.7.2 MNIST Data Set to Train NN or CNN Model
7.7.3 Colab: Train NN and CNN Models
7.8 DLtrain for Jetson Nano Series SOM
7.8.1 Build DLtrain for Jetson Nano Series SOM
7.8.2 DLtrain to Train CNN in Jetson Nano Series SOM
7.8.3 DLtrain for Inference in Jetson Nano Series SOM
8 Deployment of Deep Learning Networks
8.1 Insight
8.2 Description
8.3 Silicon Vendors in IoT Edge Segment
8.4 Deploying DL Networks in Kanshi
8.4.1 Event Data Collection
8.4.2 Flow Data Collection
8.4.3 Vulnerability Assessment
8.4.4 IP Stream Analysis
8.5 Deploying DL in Android Phone
8.5.1 Installing Android Studio
8.5.2 Build Inference Engine
8.5.3 Send CNN or NN Model to Phone
8.5.4 Using the J7 Application in Android Phone
8.5.5 Mini Project 1: Inference Using GPU
8.5.6 Mini Project 2: On Sharing Trained CNN
8.5.7 Mini Project 3: Pull Trained CNN from Host
8.5.8 IBM Watson Visual Recognition Service
8.5.9 Build a Custom Model to Test Tomato Quality
8.5.10 Deploying DL in FPGA (Ultra96-V2)
8.5.11 Port FP32 Inference Code to INT32
9 Tutorial: Deploying Deep Learning Networks
9.1 Prerequisites
9.2 Deploying Deep Learning Networks
9.2.1 Deploying Deep Learning Networks in Cloud and Edge
9.2.2 Deploying Deep Learning Networks in Edge Native
9.2.3 Deploying Deep Learning in Cloud Native
9.3 Deep Learning Networks, Digital Twin, Edge
9.3.1 CNN Model
9.3.2 Digital Twin
9.4 Data Set Used in Training Deep Learning Networks
9.4.1 Data-Set Storage in a Local Machine
9.4.2 Adding Custom Image Data Along with an MNIST Data Set
9.5 Training the Deep Learning Networks Model by Using a CPU and a GPU
9.5.1 Training Deep Learning Networks in Colab
9.5.2 Training in Ubuntu 18.04 X86 CPU
9.5.3 Training in Power 9 CPU + RTX 2070 GPU
9.5.4 Training Deep Learning Networks in a Jetson Nano GPU
9.5.5 Watson VR Service: Deprecated
9.6 Saving Deep Learning Networks
9.7 Loading Deep Learning Networks
9.8 Deploying Deep Learning Networks in an IoT Device
9.9 Inference as a Microservice
9.9.1 Microservice Using the Flask Micro Framework
9.9.2 JavaScript to Run TensorFlow Models in a Browser
9.9.3 Docker Image for a TensorFlow Serving Model

Glossary
A Training Restricted Boltzmann Machine
A.1 Gradient Descent Is Used to Minimize Cost Function
A.2 Score and Loss Functions
A.3 Data Flow in Computation of W
A.4 Use of GPU to Compute W

References

Index
About the Authors

Dr. Jayakumar Singaram, a semiconductor electronics and signal processing
veteran, specializes in deploying AI solutions in IoT Edge environments. He began
his career at Hindustan Aeronautics in Bangalore and was part of the
very early-stage technical teams at Cranes InfoTech and Mistral Solutions. Later, he
ventured into entrepreneurship, associating with brands like Epigon Media
Technologies and Rinanu Semiconductor. With degrees in Aeronautical Engineering,
Mathematics, and a Doctorate in Systems and Control Engineering, he is a seasoned
researcher. Jayakumar led notable projects, including the Karaoke Machine in
collaboration with Analog Devices and Satellite Radio Receivers for WorldSpace
Broadcast Satellites. During his doctoral studies at IIT Bombay, his thesis focused
on “Simultaneous Stabilization of Feedback Systems.” Subsequently, at KU Leuven
University, he was part of a research team working on reconfigurable processors.
Beyond his professional journey, Jayakumar has made significant contributions
as a board member and Dean at Periyar Maniammai Institute of Science and
Technology. Jayakumar acknowledges the invaluable support and
guidance of numerous individuals who have played a significant role in his journey,
including Prof. Bijnan Bandyopadhyay from IIT Bombay; Anees Ahmed and Rajeev
Ramachandra, founders of Mistral Solutions; Mike Haidar from Analog Devices;
Prof. Barry Vercoe of MIT Media Lab; Prof. P. G. Poonacha, Satyanarayana Reddy,
and Radhakishan Rao, founders of Epigon Media Technologies; and
Saritha Shetty, Founder and Managing Director at SUNPLUS Group. He also thanks
his cherished family members, Dr. Suneetha Rao MDS (a Professor at
Vydehi Institute of Medical Sciences and Research Centre), Dr. Vaidhehi MBBS,
and Niranjan Kumar (a Mathematics student at the Chennai Mathematical
Institute), for their unwavering support during the writing of this early version of the
book. Their encouragement and belief have been instrumental in shaping his journey
and accomplishments.
Dr. S. S. Iyengar is currently the Distinguished University Professor, Founding
Director of the Discovery Lab, and Director of the US Army-funded Center of
Excellence in Digital Forensics at Florida International University, Miami. He is
also the Distinguished Chaired Professor at National Forensics Sciences University,
Gandhinagar, India. He has been involved with research and education in
high-performance intelligent systems, Data Science and Machine Learning Algorithms,
Sensor Fusion, Data Mining, and Intelligent Systems. Since receiving his Ph.D.
degree in 1974 from MSU, USA, he has directed over 65 Ph.D. students, many
postdocs, and many undergraduate research students who are now faculty
at major universities worldwide or scientists or engineers at national labs and
industries around the world. He has published more than 600 research papers, has
authored, co-authored, and edited 32 books, and holds various patents. Over its
lifetime, his work on the Brooks–Iyengar Algorithm (also known as the Brooks–Iyengar
hybrid algorithm) has accumulated over 5223 publications on the topic and has
received 138,976 citations. His h-index is 67, and he is identified among the top 2%
of cited scholars and world scientists (from Stanford University and EBMs of JSAN
journal). The book titled Fundamentals of Brooks–Iyengar Distributed Sensing
Algorithm, authored by Prof. Pawel (Poland) and others and published by Springer in
2020, celebrates S. S. Iyengar’s accomplishments that led to his 2019 Institute of
Electrical and Electronics Engineers (IEEE) Cybermatics Congress "Test of Time
Award" for his work on creating the Brooks–Iyengar Algorithm and its impact in
advancing modern computing. His work has been featured on the covers of many
scientific journals, such as IEEE Transactions, and his research group’s work
appeared in the National Science Foundation’s breakthrough technologies
report to the US Congress in both 2014 and 2016.
He has served on many scientific committees and panels worldwide and has
served as the editor/guest editor of various IEEE and ACM journals. His books are
published by MIT Press, John Wiley and Sons, CRC Press, Prentice Hall, Springer
Verlag, IEEE Computer Society Press, etc. One of his books titled Introduction
to Parallel Algorithms has been translated into Chinese. During the last 30 years,
Dr. Iyengar has brought in over 65 million dollars for research and education.
More recently, in Spring 2021, Dr. Iyengar, in collaboration with HBCUs, was
awarded $2.25 M in funding to set up a Digital Forensics Center of Excellence
over a period of 5 years (2021-2026). He received an honorary Doctor of Science
from Poznan University of Technology in Poland in May 2023. He was
awarded the Lifetime Achievement Award for his contribution to the field of Digital
Forensics on November 8, 2022, during the 7th INTERPOL DIGITAL FORENSICS
EXPERT GROUP (DFEG) MEETING at National Forensics Sciences University,
Gandhinagar, Gujarat, India. He has provided the students and faculty with a vision
for active learning and collaboration at Jackson State University, Louisiana State
University, Florida International University, and across the world.
Dr. Iyengar is a member of the European Academy of Sciences, a Life Fellow of
the Institute of Electrical and Electronics Engineers (IEEE), a Fellow of the Association
for Computing Machinery (ACM), a Fellow of the American Association for the
Advancement of Science (AAAS), a Fellow of the Society for Design and Process
Science (SDPS), and a Fellow of the American Institute for Medical and Biological
Engineering (AIMBE). He has received various national and international awards
including the outstanding Test of Time Research (for his seminal work which
has impacted billions of computer and internet users worldwide) and Scholarly
Contribution Award from 2019 IEEE Congress on Cybermatics, the Times Network
NRI (Non-Resident Indian) of the Year Award for 2017, most distinguished
Ramamoorthy Award at the Society for Design and Process Science (SDPS 2017),
the National Academy of Inventors Fellow Award in 2013, and the NRI Mahatma
Gandhi Pradvasi Medal at the House of Lords in London in 2013, among others. He
was awarded Satish Dhawan Chaired Professorship at IISc, then Roy Paul Daniel
Professorship at LSU. He has received the Distinguished Alumnus Award of the
Indian Institute of Science. In 1998, he was awarded the IEEE Computer Society’s
Technical Achievement Award and is an IEEE Golden Core Member. Professor
Iyengar is an IEEE Distinguished Visitor, SIAM Distinguished Lecturer, and ACM
National Lecturer. In 2006, his paper, entitled A Fast-Parallel Thinning Algorithm
for the Binary Image Skeletonization, was the most frequently read article in the
month of January in the International Journal of High-Performance Computing
Applications. His innovative work called the Brooks–Iyengar algorithm along with
Professor Richard Brooks from Clemson University is applied in industries to
solve real-world applications. Dr. Iyengar’s work had a big impact; in 1988, he
and his colleagues discovered "NC algorithms for Recognizing Chordal Graphs
and K-trees" [IEEE Trans. on Computers 1988]. This breakthrough result led to
the extension of designing fast parallel algorithms by researchers like J. Naor
(Stanford), M. Naor (Berkeley), and A. A. Schaffer (AT&T Bell Labs). Professor
Iyengar earned his undergraduate and graduate degrees at UVCE-Bangalore and the
Indian Institute of Science, Bangalore, and a doctoral degree from Mississippi State
University.
His research has been funded by National Science Foundation (NSF), Defense
Advanced Research Projects Agency (DARPA), Multi-University Research Initia-
tive (MURI Program), Office of Naval Research (ONR), Department of Energy/Oak
Ridge National Laboratory (DOE/ORNL), Naval Research Laboratory (NRL),
National Aeronautics and Space Administration (NASA), US Army Research Office
(ARO), and various state agencies and companies. He has served on US National
Science Foundation and National Institutes of Health panels to review proposals in
various aspects of Computational Science and has been involved as an external
evaluator (ABET-accreditation) for several Computer Science and Engineering
Departments across the country and the world. Dr. Iyengar has also served as
a research proposal evaluator for the National Academy. Dr. Iyengar has been a
Visiting Professor or Scientist at Oak Ridge National Laboratory, Jet Propulsion
Laboratory, and Naval Research Laboratory and has been awarded the Satish
Dhawan Visiting Chaired Professorship at the Indian Institute of Science, the Homi
Bhabha Visiting Chaired Professor (IGCAR), and a professorship at the University
of Paris-Sorbonne.
Dr. Azad M. Madni is a researcher, educator, entrepreneur, author, and phi-
lanthropist. He is a member of the National Academy of Engineering and is a
University Professor (highest academic designation) at the University of Southern
California. He is the holder of the Northrop Grumman Foundation Fred O’Green
Chair in Engineering and is the Executive Director of USC’s Systems Architecting
and Engineering Program. He is also the Founding Director of the Distributed
Autonomy and Intelligent Systems Laboratory. He is the Founder and CEO of
Intelligent Systems Technology, Inc., an R&D company specializing in modeling,
simulation, and augmented intelligence technologies for education, training, and
human performance enhancement. His research interests include exploiting the
synergy of computer science, cognitive science, systems science, and entertainment
arts to better understand, frame, and explore solutions to complex sociotechnical
systems problems. He is specifically focused on transdisciplinary systems engi-
neering education, cyber-physical-human systems, and model-based approaches to
sustainability and resilience of complex sociotechnical systems. He is the creator of
TRASEE™, an award-winning transdisciplinary engineering education paradigm
that exploits storytelling and principles from the learning sciences to replace
disconnected knowledge resulting from today’s stove-piped courses with connected
knowledge that students can readily build on. He is a Fellow/Life Fellow of ten
professional science and engineering societies and has received the highest awards
from NAE, IEEE, INCOSE, AIAA, ASEE, ASME, SDPS, SES, and NDIA. In
2023, he received the National Academy of Engineering’s Bernard M. Gordon Prize
for Innovation in Engineering and Technology, the highest honor in engineering
education. He also received the prestigious 2023 IEEE Simon Ramo Medal, the
highest honor given for exceptional achievements in systems engineering and
systems science; the 2023 ASME Honorary Membership; and the 2023 ASME
CIE Lifetime Achievement Award. He is a member of the Advisory Board of
the London Digital Twin Research Centre, a Faculty Affiliate of USC’s Ginsburg
Institute of Biomedical Therapeutics in the Keck School of Medicine, and the
Founding Chair of IEEE SMC Technical Committee on Model Based Systems
Engineering. He is the author of Transdisciplinary Systems Engineering: Exploiting
Convergence in a Hyperconnected World (Springer, 2018) and the co-author of
Tradeoff Decisions in System Design (Springer, 2016). He is the series Co-Editor-in-Chief
of the Conference on Systems Engineering Research (CSER). He is the Editor-in-Chief
of the Handbook of Model Based Systems Engineering with Norm Augustine,
Springer 2023. He has 400+ publications comprising authored books, edited books,
book chapters, journal articles, peer-reviewed conference publications, and research
reports. He has given nearly 100 keynotes and invited talks worldwide. He has an
abiding commitment to mentoring and helping members of under-represented and
under-resourced groups, and most recently he and his wife donated his share of the
$500k Gordon Prize to the NAE to advance transdisciplinary systems engineering
with active involvement of women and other underrepresented groups. He is a
member of the NAE’s Marie Curie Donor Society and the Albert Einstein Donor
Society. He received his B.S., M.S., and Ph.D. degrees in Engineering from UCLA.
He is also a graduate of AEA/Stanford Executive Institute.
Acronyms

CMAKE An open-source, cross-platform family of tools designed to build, test,
and package software.
CNN Convolutional neural network.
CUDA Compute unified device architecture.
FP32 32-bit floating-point computation.
GAN Generative adversarial network.
INT32 32-bit integer arithmetic computation.
IoT Internet of Things.
LSTM Long short-term memory.
MNIST Modified National Institute of Standards and Technology.
NeurIPS Neural information processing systems.
NN Neural network.
NVCC NVIDIA CUDA compiler.
RNN Recurrent neural network.
SOM System on module.
TIDL Texas Instruments Deep Learning.
VAE Variational autoencoder.

Chapter 1
Introduction

Artificial Intelligence is the future, and deep learning is its most
powerful tool.

This chapter presents an overview of data, deep learning, and the design,
training, testing, loading, and saving of the various network models associated
with machine learning. It does so using the open-source tools PyTorch and
TensorFlow together with DLTrain, with suitable examples and simple
command-driven instructions. Hardware configuration, setup, testing,
and other installation-related infrastructure are also organized within a simple
programming framework. This chapter employs simple and widely
used deep learning training models that frequently take first place in competitions.
The book also aims to give hands-on experience to all kinds
of machine learning users, with practical demonstrations that support the
foundational concepts. It is recommended that the reader have a laptop or desktop
handy while reading, so that the material can be practiced as it is learned and
retained more clearly.
The book is a state-of-the-art treatment of deep learning environments. It caters
to both basic users and experienced data scientists. Standard and specific tools
in demand for deep learning design, development, and deployment are covered.
Furthermore, illustrative screenshots are provided for every topic to help users
acquire hands-on knowledge in deep learning.
“One of the most interesting features of machine learning is that it lies at the
intersection of multiple academic disciplines, principally computer science, statis-
tics, mathematics, and engineering.” Machine learning is usually studied as part
of artificial intelligence, which puts it firmly into the computer science discipline.
However, understanding why these algorithms work requires a certain level of
statistical and mathematical sophistication that is often missing in computer science
undergraduate courses. This raises a question: did convolutional neural networks
(CNNs) find a way around statistical or mathematical methods, or did they come up
with a new theory of modeling physical processes?

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 1


J. Singaram et al., Deep Learning Networks,
[Link]

1.1 Artificial Intelligence (AI)

It is worth recalling that AI refers to a computer system’s capacity to imitate human
cognitive processes such as learning and problem-solving (Fig. 1.1). Using AI, a computer
system today can replicate human reasoning, learn from new
knowledge, and make judgments within specific domains.
A subdiscipline of artificial intelligence and computer science known as machine
learning (ML) focuses on using data and algorithms to simulate how people learn
while continuously increasing the accuracy of the model. ML is the practice of
assisting a computer in learning without direct instruction by applying mathematical
models of data. As a result, a computer system may keep picking up new skills and
become better on its own. In general, most of our daily activities involve the use of
AI and ML. Some commonly known examples include the following:
1. Siri, Alexa, and other smart assistants
2. Self-driving cars
3. Robo-advisors
4. Conversational bots
5. Email spam filters
6. Netflix’s recommendations
AI enables the incorporation of specific aspects of human intelligence into
machines (algorithms). The term is made up of the words “artificial” and “intelligence,”
where “artificial” refers to anything developed by humans and “intelligence” refers
to the capacity to comprehend or reason in accordance with the circumstances of a
problem to find a solution. AI teaches computers to imitate
the workings of the human brain. To achieve optimum effectiveness, AI concentrates
on three abilities: learning, reasoning, and self-correction. AI is a type of computer
algorithm that demonstrates intelligence through judgment (Fig. 1.2).

Fig. 1.1 AI learning illustration

Fig. 1.2 Skeptical ANI illustration

Fig. 1.3 Skeptical AGI illustration

1. Artificial narrow intelligence (ANI): Artificial narrow intelligence, commonly
referred to as weak AI, is the only sort of artificial intelligence now in use in
our society. Narrow AI is goal-oriented, is trained to carry out a single job, and
is extremely capable at the activity that it is trained to complete.
Siri, an airplane’s autopilot, chatbots, self-driving cars, etc. are a few instances
of ANI. In contrast to humans, narrow AI systems take data from a certain data
set and only carry out the one activity for which they were created. They are not
aware, sentient, or motivated by emotions or ethics (Fig. 1.3).
2. Artificial general intelligence (AGI): Artificial general intelligence, often
known as strong AI, refers to machines that demonstrate human-level intellect.
Such machines can learn, comprehend, and behave
in a way that is indistinguishable from a person in a given circumstance.
While general AI does not yet exist, it has been featured in several science
fiction films in which humans interact with sentient, feeling-driven, and self-aware
robots. Strong AI would enable us to create computers that can reason, plan,
and carry out a variety of activities in a variety of unpredictable environments.
While making decisions, they may use their existing knowledge to provide
original, creative, and out-of-the-box answers.
3. Artificial superintelligence (ASI): The concept of artificial superintelligence
envisions a future in which machines are able to demonstrate intellect that
is greater than that of the smartest humans. In this sort of AI, machines would not only
have the multidimensional intellect of people but would also be significantly
more capable of making decisions and solving problems than people. It is the
kind of AI that could have a significant influence on people and, in the most
pessimistic scenarios, might eventually wipe out the human species.

1.2 Machine Learning (ML)

A branch of artificial intelligence called machine learning (Fig. 1.4) employs
statistical learning algorithms to create systems that can automatically learn from
their experiences and get better over time without explicit programming.
Most of us utilize machine learning every day when we use services like
recommendation engines on Netflix, YouTube, and Spotify; search engines like
Google and Yahoo; and voice assistants like Google Home and Amazon Alexa.
With machine learning, we train the algorithm by giving it plenty of data and letting
it learn from the information that has been processed. Machine learning, an
algorithmic approach to making predictions and decisions using data, is a subset of
artificial intelligence. ML algorithms can be broadly classified into three main
groups: supervised, unsupervised, and reinforcement learning.
1. Supervised learning: The data set used for supervised learning is labeled,
often by an external supervisor, a subject matter expert (SME), an
algorithm, or a computer program. For model training and validation, the data
set is divided into training and test data sets. The model is then used to make
predictions on unlabeled data that has not been seen before but is of the same
type as the data the model was trained on. Supervised learning can be further separated
into classification and regression. Classification is utilized in applications like
image classification and in methods such as K-nearest neighbors to detect customer
churn. Sales, property prices, and other continuous variables are predicted using
regression methods.
2. Unsupervised learning: Unsupervised learning is the process of discovering
hidden patterns in an unlabeled data set by inferring structure from the data itself.
These algorithms are called unsupervised because, unlike supervised algorithms,
they learn without labeled guidance.
Unsupervised learning can generally be classified into clustering, association,
anomaly detection, and dimensionality reduction.
3. Reinforcement learning: Reinforcement learning is essentially the process of
learning through continual interaction with the environment. It is a
machine learning technique in which an agent learns from an interactive
environment in a trial-and-error manner while continually utilizing feedback from
its prior actions and experiences. The agents in reinforcement learning receive
rewards for carrying out the right actions and penalties for the wrong ones.

Fig. 1.4 Machine learning illustration
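The supervised workflow described in item 1 can be sketched in a few lines of plain Python. The sketch is illustrative only: the two-cluster toy data, the labels "stay" and "churn", and the helper names train_test_split, knn_predict, and make_toy_data are assumptions made for this example and are not taken from DLTrain, PyTorch, or TensorFlow. It divides a labeled data set into training and test sets and then classifies the unseen test points with K-nearest neighbors.

```python
import random
from collections import Counter


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle labeled samples and divide them into training and test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest training samples."""
    by_distance = sorted(
        train,
        key=lambda sample: sum((a - b) ** 2 for a, b in zip(sample[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class=40, seed=1):
    """Two well-separated 2D clusters with hypothetical labels (assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(((rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)), "stay"))
        data.append(((rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)), "churn"))
    return data


data = make_toy_data()
train, test = train_test_split(data)

# The test labels are used only to score predictions, never to make them.
correct = sum(knn_predict(train, features) == label for features, label in test)
accuracy = correct / len(test)
print(f"train size={len(train)}, test size={len(test)}, accuracy={accuracy:.2f}")
```

In practice a library implementation would normally be preferred; the hand-written version is used here only so that every step of the split, train, and predict cycle is visible.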

1.3 AI vs. ML

The steps below show how AI and ML fit into one big picture, with ML as
a part of AI:
Step 1 An AI system is built using machine learning and other models.
Step 2 Machine learning models are created by studying patterns in the data.
Step 3 Data scientists optimize the machine learning models based on patterns in
the data.

1.4 Deep Learning (DL)

Deep learning (Fig. 1.5), a method of machine learning, is essentially a form
of learning from examples and is inspired by the way the human brain filters
information. It helps computer models forecast and categorize information by
passing data through successive layers of filtering. Because it processes information
similarly to how the human brain does, deep learning is mostly employed in
applications that people perform daily. It is the core technology that allows
driverless cars to detect a stop sign and tell a pedestrian from a lamp
post. Since most deep learning
techniques make use of neural network topologies, they are sometimes referred to
as deep neural networks.
Deep learning makes use of neural networks with several layers of nodes. Every
node in each layer is linked to nodes in the adjacent layer. The network becomes deeper as
the number of layers increases. In artificial neural networks, signals move from layer
to layer of nodes, and each connection carries a weight. A connection with a larger
weight has a greater impact on the node it feeds in the following layer. The last layer,
which comes before creating the output, combines its weighted inputs
and produces the outcome. Deep learning requires sophisticated mathematical
computations and data processing. As a result, the system hardware must be highly
capable. Yet, even with extremely powerful hardware, training neural networks
can take weeks.

Fig. 1.5 Deep learning illustration
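The movement of a signal through weighted layers can be illustrated with a minimal sketch in plain Python. The 2-3-1 layer sizes, the weight and bias values, and the choice of a sigmoid activation are assumptions made purely for illustration; they are not taken from any model in this book.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: each node sums its weighted inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Pass a signal through every layer in order and return the final output."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal


# Hypothetical 2-3-1 network; the numbers are arbitrary illustrative values.
layers = [
    ([[0.5, -0.2], [0.8, 0.1], [-0.3, 0.9]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[1.2, -0.7, 0.4]], [0.05]),                                # output layer
]

output = network_forward([1.0, 2.0], layers)
print(output)
```

Adding more (weights, biases) pairs to the layers list makes the network deeper without changing any of the functions.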

1.5 DL vs. ML

Given that deep learning and machine learning are frequently used synonymously,
it is important to understand their differences. Neural networks, deep learning,
and machine learning are all branches of artificial intelligence. Deep learning is
a subfield of neural networks, which are in turn a subfield of machine learning. The
way each algorithm learns is where deep learning and machine learning diverge.
“Deep” machine learning can use labeled data sets, an approach known as supervised
learning, to guide its algorithm, but labeled data is not a requirement. Deep
learning can ingest unstructured material in its raw form (such as text or photos)
and automatically identify the collection of features that separate several
categories of data from one another. This reduces the need for some human interaction
and makes it possible to handle bigger data sets. Deep learning can be equated to
“scalable machine learning.” Traditional, or “non-deep,” machine learning is more
reliant on human input. In order to grasp the distinctions between different data
inputs, human specialists choose a set of features, which typically requires more
structured data to learn.
Artificial neural networks (ANNs), often known simply as neural networks, are built
from layers of nodes: an input layer, one or more hidden layers, and
an output layer. Each node, or artificial neuron, is connected to others and has an
associated weight and threshold. Any node whose output exceeds the
defined threshold value is activated and begins sending data to the network’s
next layer. Otherwise, that node does not transmit any data to the network’s
next layer. The word “deep” in deep learning simply refers to the number of layers in a
neural network. Deep learning algorithms, or deep neural networks, can be defined
as neural networks with more than three layers, inclusive of the input and output.
A neural network with three layers or fewer is just a basic neural network. Deep learning
and neural networks are credited with accelerating progress in fields including speech
recognition, computer vision, and natural language processing.
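The threshold rule and the layer-counting rule of thumb described above can be written down directly. The following is a minimal sketch; the function names threshold_neuron, count_layers, and is_deep, as well as the example weights and threshold, are hypothetical and chosen only for illustration.

```python
def threshold_neuron(inputs, weights, threshold):
    """Return the weighted sum if it exceeds the threshold, otherwise 0.0.

    A return value of 0.0 stands for "no data is passed to the next layer".
    """
    total = sum(w * x for w, x in zip(weights, inputs))
    return total if total > threshold else 0.0


def count_layers(hidden_layers):
    """Total layers = one input layer + the hidden layers + one output layer."""
    return 1 + hidden_layers + 1


def is_deep(hidden_layers):
    """Rule of thumb from the text: more than three layers in total."""
    return count_layers(hidden_layers) > 3


# Both inputs active: the weighted sum exceeds the threshold, so the node fires.
print(threshold_neuron([1.0, 1.0], [0.6, 0.6], threshold=1.0))
# Only one input active: the weighted sum stays below the threshold.
print(threshold_neuron([1.0, 0.0], [0.6, 0.6], threshold=1.0))
# One hidden layer gives three layers in total, which is a basic network.
print(is_deep(1), is_deep(2))
```

Modern networks usually replace the hard threshold with a smooth activation function so that training by gradient-based methods is possible; the hard threshold is kept here because it matches the description in the text.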

1.6 Deep Learning and Deep Programming

Deep learning is a subset of machine learning that uses artificial neural networks
with multiple layers to model and solve complex problems. It involves training
the neural network on large data sets to learn patterns and make predictions or
classifications. Deep learning has been successful in a wide range of applications,
including computer vision, natural language processing, speech recognition, and
robotics.
On the other hand, deep programming refers to the process of creating complex
software systems with many layers of abstraction and complexity. It involves
designing and implementing software systems that are highly modular, scalable,
and maintainable. Deep programming involves a range of programming techniques
and paradigms, such as functional programming, object-oriented programming, and
design patterns.
While both deep learning and deep programming involve complex systems
with many layers of abstraction, they are fundamentally different in terms of their
goals and techniques. Deep learning is concerned with learning from data to make
predictions or classifications, while deep programming is concerned with designing
and implementing complex software systems that are efficient, maintainable, and
scalable.
A DL model uses deep learning networks such as NN, CNN, and RNN to
model a given data set. A data set may be an outcome of the Boltzmann distribution
of a particular physical process. Observed data sets can be regarded as the result of
Gibbs sampling on a given physical process.
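To make the idea of a data set produced by Gibbs sampling from a Boltzmann distribution concrete, the following sketch uses a deliberately tiny physical process. The two-spin system, its energy function E(s1, s2) = -coupling * s1 * s2, and the coupling and temperature values are assumptions chosen for illustration; the text does not specify a particular process. Each step resamples one spin from its conditional distribution given the other, and the collected states play the role of an observed data set.

```python
import math
import random


def gibbs_two_spins(coupling=1.0, temperature=1.0, n_samples=20000, seed=0):
    """Gibbs-sample two coupled spins, each taking the value +1 or -1.

    The Boltzmann distribution assigns each state a probability proportional
    to exp(-E / temperature), with E(s1, s2) = -coupling * s1 * s2.
    """
    rng = random.Random(seed)
    spins = [1, 1]
    samples = []
    for step in range(n_samples):
        i = step % 2                 # alternate between the two spins
        other = spins[1 - i]
        # Conditional probability that spin i is +1 given the other spin.
        p_up = 1.0 / (1.0 + math.exp(-2.0 * coupling * other / temperature))
        spins[i] = 1 if rng.random() < p_up else -1
        samples.append(tuple(spins))
    return samples


samples = gibbs_two_spins()

# Fraction of sampled states in which the two spins agree.
aligned = sum(1 for s1, s2 in samples if s1 == s2) / len(samples)

# Exact Boltzmann probability of agreement for coupling = temperature = 1.
exact = 1.0 / (1.0 + math.exp(-2.0))

print(f"sampled: {aligned:.3f}, exact: {exact:.3f}")
```

The sampled fraction approaches the exact Boltzmann probability as more samples are drawn, which is the sense in which an observed data set reflects the underlying distribution.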
Deep programming is emerging as a new trend in code generation in a given
language for a given silicon architecture.

1.7 Deep Learning Networks

Deep learning networks are artificial neural networks with multiple layers of
interconnected nodes, also known as artificial neurons. These networks are typically
composed of an input layer, one or more hidden layers, and an output layer. Each
layer consists of many nodes that perform a specific computation and communicate
with nodes in the adjacent layers.
The input layer receives data from the outside world and passes it to the hidden
layers, where the data is transformed through a series of nonlinear transformations.
The output layer produces the final output of the network, which is a prediction or
classification based on the input data.
Deep learning networks can be divided into two main types: feed forward neural
networks and recurrent neural networks. Feed forward neural networks are the
most common type of deep learning network and are used in tasks such as image
recognition and speech recognition. Recurrent neural networks, on the other hand,
are used in tasks such as natural language processing and speech recognition, where
the input data is a sequence of values, such as a sentence or a sound waveform.
The power of deep learning networks comes from their ability to automatically
learn complex features from the input data without human intervention. This makes
them well suited for tasks where the data is high-dimensional and complex, such
as image and speech recognition. With the help of large data sets and powerful
hardware, deep learning networks have achieved state-of-the-art performance in
many areas, including computer vision, natural language processing, and speech
recognition.
Deep learning is essentially an imitation of the human brain: a multilayer neural
network design with a large number of parameters and layers. Below are the three main
types of network designs.
Convolutional Neural Network Convolutional neural networks (CNNs) are
artificial neural networks that are most frequently employed in the field of
computer vision for the analysis and classification of pictures. A CNN is a deep learning
technique that takes an input picture and assigns learnable weights and biases to distinct
characteristics or objects so that it can distinguish between them. Convolutional
layers, pooling layers, fully connected layers, and normalization layers are frequently
seen in a CNN’s hidden layers. The arrangement of the visual cortex served as
inspiration for the design of a ConvNet, whose connectivity resembles the connection
network of neurons in the human brain.
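The two most characteristic CNN building blocks, a convolutional layer followed by a pooling layer, can be sketched in plain Python as follows. The 5x5 input image and the 2x2 vertical-edge kernel are hypothetical values chosen for illustration; in a trained CNN the kernel weights are learned from data rather than written by hand.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the operation used in convolutional layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows or columns that do not fit are dropped."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1], [-1, 1]]

features = relu(conv2d(image, kernel))   # 4x4 feature map
pooled = max_pool2d(features)            # 2x2 summary of the feature map
print(features)
print(pooled)
```

The feature map is nonzero only where the kernel straddles the edge, and pooling keeps that response while halving each spatial dimension.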
Recurrent Neural Network Recurrent neural networks (RNNs) are a class of neural
network design that is widely employed in the discipline of natural language
processing and is used in sequence prediction problems. They
are so named because they perform the same task for every
element in a sequence, with the results depending on the previous calculations.
Another way to conceive of RNNs is that they have a “memory” that stores details
about previous calculations.
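The "same task for every element" and the "memory" of an RNN can be seen in a minimal sketch with a scalar input and a scalar hidden state. The weight values and the tanh activation are assumptions made for illustration; real RNN layers use weight matrices and vector-valued states.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: combine the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Apply the same step to every element, carrying the hidden state forward."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# Only the first element is nonzero, yet later states remain nonzero:
# the hidden state "remembers" the earlier input.
states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

Because the same weights are reused at every position, the function works for sequences of any length.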
Recursive Neural Network A recursive neural network applies the same set of
weights repeatedly over a structured input, traversing the given structure in
topological order, to make either a structured prediction across input structures
of varying sizes or a scalar prediction on them.
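A recursive network differs from a recurrent one in that the shared weights are applied over a tree rather than along a sequence. The following is a minimal sketch, assuming a binary tree written as nested tuples with numeric leaves, scalar node representations, and hypothetical weight values; none of these choices come from the text.

```python
import math


def encode(tree, w_left=0.6, w_right=0.4, b=0.0):
    """Return a scalar representation of a binary tree given as nested tuples.

    Children are encoded before their parent, so nodes are visited in a
    topological order, and the same weights are reused at every node.
    """
    if not isinstance(tree, tuple):
        return float(tree)           # leaf: use its value directly
    left, right = tree
    left_repr = encode(left, w_left, w_right, b)
    right_repr = encode(right, w_left, w_right, b)
    return math.tanh(w_left * left_repr + w_right * right_repr + b)


# Trees of different sizes are reduced to a single scalar by the same weights.
small = encode((1.0, 2.0))
large = encode((((1.0, 2.0), 3.0), (0.5, -1.0)))
print(small, large)
```

A scalar prediction corresponds to the value returned at the root; a structured prediction would keep the representation computed at every internal node.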
Deep learning is a category of ML that emphasizes teaching the computer
basic human instincts. Deep learning requires large data sets to learn
from and to train the model. Deep learning is used in many real-world scenarios
such as the following:
1. Vision for driverless cars (Tesla)
2. Services of chatbots (insurance, banking, e-shopping)
3. Pharmaceuticals (customizing medicines based on the genome and diseases)
Deep learning is a type of machine learning and artificial intelligence (AI) that
imitates the way humans gain certain types of knowledge [1]. It is beneficial in
collecting, analyzing, and interpreting large volumes of data, which in turn speeds
up analytics and helps obtain accurate predictions.
A deep neural network (DNN) is a neural network with multiple hidden layers
between the input and output layers. Similar to shallow NNs, DNNs can model
complex nonlinear relationships.
A 1D convolution is a cost-saver; it works in the same way as a 2D convolution but
assumes a one-dimensional array that is multiplied element by element with the kernel.
To visualize it, picture a matrix with a single row or column, i.e., a single dimension.
When we multiply, we get an array of the same shape but of lower or higher values; thus,
it helps in maximizing or minimizing the intensity of values. The link here provides
a simulation of 1D convolution. More discussion on 3D convolution here.
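The effect of a 1D convolution on the intensity of values can be checked with a short sketch in plain Python. The signal and the kernels below are hypothetical values chosen for illustration. Note that the output has the same length as the input only for a single-tap kernel; in this "valid" form a longer kernel shortens the output by the kernel length minus one.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation: slide the kernel and take dot products."""
    n_out = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n_out)
    ]


signal = [1, 2, 3, 4, 5]

amplified = conv1d(signal, [2])         # same length, higher values
attenuated = conv1d(signal, [0.5])      # same length, lower values
smoothed = conv1d(signal, [0.5, 0.5])   # two-tap moving average, one value shorter

print(amplified)
print(attenuated)
print(smoothed)
```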
The img_to_array function returns a 3D array from JPG or GIF file data. The main
difference is that a data format argument can be passed to img_to_array to put the
channels either at the first axis or the last
axis. Further, it ensures that the returned array is a 3D array (for example, if
the given input img is a 2D array, which might represent a grayscale image, then
it adds another axis with dimension 1 to make it a 3D array). In TensorFlow
there are different convolution layers: Conv1D, Conv2D, and Conv3D. The first one
is used for one-dimensional signals like sounds. The second one is used for images,
whether grayscale or RGB; both cases are considered to be two-dimensional
signals. The last one is used for three-dimensional signals like video frames, which
are images (two-dimensional signals) that vary with time. Conv1D is therefore applied
to a one-dimensional signal, and the number of filters can be specified in the arguments
of the layer.
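The channel-axis behavior described above can be mimicked without any framework. The sketch below is not the Keras implementation of img_to_array; to_3d and shape are hypothetical helpers written only to show what "adding an axis of dimension 1" means for a grayscale image under the two data formats, channels_last and channels_first.

```python
def to_3d(image, data_format="channels_last"):
    """Give a 2D grayscale image (a list of rows) an explicit channel axis."""
    if data_format == "channels_last":
        return [[[pixel] for pixel in row] for row in image]   # (H, W, 1)
    if data_format == "channels_first":
        return [[row[:] for row in image]]                     # (1, H, W)
    raise ValueError(f"unknown data_format: {data_format}")


def shape(nested):
    """Shape of a regular nested list, in the style of an array shape."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical 2x3 grayscale image.
gray = [[0, 128, 255], [64, 32, 16]]

print(shape(gray))                              # two axes: height and width
print(shape(to_3d(gray, "channels_last")))      # channel axis added last
print(shape(to_3d(gray, "channels_first")))     # channel axis added first
```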

1.8 Deep Learning Network Deployment

Neural network (NN), convolutional neural network (CNN), recurrent neural
network (RNN), etc. are considered to be deep learning networks. Deep learning
networks are used in real-time application systems in various domains such as health
and industrial technology [2–11].
Deep learning is a subset of machine learning that uses artificial neural networks
with multiple layers to model and solve complex problems. It is inspired by the
structure and function of the human brain, where each neuron processes information
and communicates with other neurons through connections called synapses.
In deep learning, a neural network is typically composed of multiple layers of
interconnected nodes that process input data and gradually transform it into output
data through a process called forward propagation. During training, the network
adjusts its parameters (weights and biases) to minimize the difference between
its predictions and the true labels of the training data, using a technique called
backpropagation.
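The training loop described above, forward propagation followed by a parameter update that reduces the gap between predictions and true labels, can be sketched for the simplest possible case: a single sigmoid neuron. The logical OR data set, the learning rate, and the number of epochs are assumptions chosen for illustration. For one neuron, backpropagation reduces to a single application of the chain rule; in deeper networks the same rule is applied layer by layer from the output back to the input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Forward propagation for a single neuron with two inputs."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def train(data, epochs=2000, lr=0.5):
    """Stochastic gradient descent on the cross-entropy loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = predict(x, w, b)
            # Chain rule: for sigmoid plus cross-entropy, the gradient of the
            # loss with respect to the weighted sum is (pred - y).
            error = pred - y
            w[0] -= lr * error * x[0]
            w[1] -= lr * error * x[1]
            b -= lr * error
    return w, b


# Hypothetical training data: the logical OR function with true labels.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w, b = train(data)
for x, y in data:
    print(x, y, round(predict(x, w, b), 3))
```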
Deep learning has proven to be very effective in solving a wide range of
problems in computer vision, natural language processing, speech recognition, and
other domains, achieving state-of-the-art performance on many benchmarks. It has
enabled many applications such as self-driving cars, image recognition, language
translation, and more.
There are numerous examples of deep learning applications across various
domains. Here are a few examples:

Image recognition: Deep learning has been used to develop highly accurate image
recognition systems, such as Google Photos, which can accurately identify and
categorize images based on their content.
Natural language processing (NLP): Deep learning has been applied to NLP tasks
such as sentiment analysis, language translation, speech recognition, and text
generation. For example, the Google Assistant uses deep learning to understand
natural language queries and respond with relevant information.
Autonomous vehicles: Deep learning is a key technology in the development of
autonomous vehicles. Self-driving cars use deep learning to analyze sensor data,
such as camera images, LIDAR data, and radar data, to recognize and respond to
different driving scenarios.
Healthcare: Deep learning is being used to improve medical diagnoses and
treatments. For example, deep learning algorithms can analyze medical images
to detect diseases such as cancer or to predict patient outcomes based on medical
records.
Robotics: Deep learning has been applied to robotics, enabling robots to perform
complex tasks such as grasping and manipulation of objects. This has numerous
applications in manufacturing, agriculture, and other industries.
Chapter 2
Low-Code and Deep Learning
Applications

Simplicity is the ultimate sophistication in the world of deep
learning applications.

2.1 Role of Tool Set in Applications

Artificial intelligence (AI) training environments are different from deployment
platforms. Using the same programming environment for both becomes an obstacle when
trained networks must be carried onto platforms with limited deployment capabilities.
In developing these models, there is certainly a strong need to minimize the size of
the model, in terms of its weights, for deployment. This optimization process is not
needed if there is a cloud-side deployment policy.
Deployment of trained models on the edge requires a lot more attention
and specification. An Android phone working as an IoT device works well for
small-size models. Deploying a huge model on an Android phone creates
challenges, and the model may not work well even if its size is reduced to
fit in the phone. The intelligent IoT edge plays a critical role in real-time
inferencing.
Historically, there have been systems with a high amount of engineering
complexity in terms of deployment and also in operation. For example, SCADA
is one such system that has been working in the power generation industry, oil and
gas industry, cement factories, etc. In fact, SCADA keeps humans in the loop,
which is what makes it supervisory control and data acquisition.
With the advent of deep learning networks and their success in the modern digital
world, there has been a huge amount of interest among researchers in carrying deep
learning models to the abovementioned industrial verticals and in bringing up
intelligent control and data acquisition. In the place of a supervisor, it appears
that an intelligent IoT edge is coming up to perform those tasks that are handled
by human supervisors. Thus, there is immense interest in turning IoT edges into

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 11


J. Singaram et al., Deep Learning Networks,
[Link]

intelligent systems in these core engineering verticals apart from consumer industry
requirements.

Working with Deep Learning Networks


Deploying DL networks in the cloud and also in the IoT edge raises the following
questions:
1. How can deep learning workloads be scaled up in the IoT edge?
2. How is the time to train and deploy models of deep learning networks
shortened?
3. How is a lack of deep learning skills in a given organization addressed?

In the case of deployment, there is a trend of using smartphones as the IoT edge
so that each learner can use a device already at hand, without much investment,
during the learning period. However, industrial deployment is expected to happen in
devices like Jetson Nano, Ultra96-V2, mmWave Radar IWR 6843, etc.
Figure 2.1 shows the capability of the tool set which is used in the design and
development of AI applications. As shown in Fig. 2.1, a good tool set can
compensate for a less skilled human team. AI being a new element in industry,
there is not much maturity in the tool set or in human skill on given tasks.
This book makes an effort to provide details on useful tool sets and sample
commands, and also to help train human resources for the AI market segment. Tool
set capability has been improving a lot in open source.
Discussion on low code or no code is given in [12]. Mostly, it appears that low
code is the order of the day in the cloud native side of AI, ML, and DL applications.

Fig. 2.1 Tool set in AI applications

But in embedded systems (called IoT edge or IoT node in modern Industry 4.0),
deploying AI, ML, and DL applications as edge native or node native
applications is still a challenge.
In the past, there have been two major routes for businesses to take on their way
to application development: buy ready-made apps from an external vendor, or build
and customize them from scratch using skilled developers and coders. Market
trends show the rise and growing sophistication of low-code and no-code
development alternatives that bring the power of application development to
users across the business.
An experiment was performed with ChatGPT to obtain code for “matrix multiplication
in cores.” The code generated by ChatGPT is clean and as good as code written by
hand. In general, however, code generation is much more involved and requires
information on the silicon architecture and also algorithms suitable for that
silicon. Code generation is not new in industry; for example, MATLAB can generate C
code for a given Simulink diagram. DLTRAIN is a new-generation tool set that
goes a step further and is very close to the no-code route. Tool sets play a major
role in industry. Most of them are open-source tools, and they are increasingly
complex to use and to offer as a commercial inference service. DLTRAIN is developed
to serve as a single tool set that handles both training and embedded deployment.
The following discussion is from the Software Engineering Institute at Carnegie
Mellon University [13].
The need for an engineering discipline to guide the development and deployment of AI
capabilities is urgent. For example, while an autonomous vehicle functions well cruising
down an empty race track on a sunny day, how can it be designed to function just
as effectively during a hail storm in New York City? AI engineering aims to provide
a framework and tools to proactively design AI systems to function in environments
characterized by high degrees of complexity, ambiguity, and dynamism. The discipline of
AI engineering aims to equip practitioners to develop systems across the enterprise-to-edge
spectrum, to anticipate requirements in changing operational environments and conditions,
and to ensure human needs are translated into understandable, ethical, and thus trustworthy
AI.

2.2 Schematic Representation of Deep Learning Architecture

Data collection is presented in the bottom layer of Fig. 2.2. Engineering domain
knowledge is key to handling data collection for training, testing, and deployment
of deep learning networks.
The top layer handles deployment of deep learning networks, where deployment
might happen on a CPU, GPU, FPGA, DSP, or a combination of these computing
devices. The layers in between include data preprocessing, training, and testing of
deep learning networks. The model of the deep learning network is also placed above
the data preprocessing layer. Generic AI applications include the following steps
to design, develop, and deploy deep learning networks.

Fig. 2.2 Layers in AI Application design and deployment

2.3 Deep Learning Applications

2.3.1 Data Set

A separate chapter (Chap. 5) explains how to create a data set which is
used to train and test restricted Boltzmann versions of deep learning networks.

Data set collection requires domain knowledge of a particular physical system.
Thus, readers are requested to consult research papers or books in the respective
domain when creating a data set.
Configuring a “data set with labels” is itself a huge segment of programming.
There are many open-source data sets available for this purpose. In the
tutorial chapter, a few data sets are given for use with the learning workflow. For
example, the MNIST data set and the Potato Leaf data set are the two used in the
tutorial.

2.3.2 Model Design for Deep Learning Network

In the deep learning framework, the network models include NN, CNN, RNN,
LSTM, GAN, VAE, etc. Many more variations exist, but in this book most of them are
treated as based on the restricted Boltzmann machine (RBM).
A separate chapter (Chap. 6) covers the mathematical theory which is used in
designing deep learning networks. Readers are expected to refer to books or research
papers on probability distribution, Boltzmann distribution, restricted Boltzmann
distribution, and neural networks.

2.3.3 Train Model

Training a deep learning network model uses the available hardware and is one
of the most time-consuming steps.
PyTorch and TensorFlow are two major open-source platforms that are generally
used to train a given deep learning network model on the available data set.
The TensorFlow tool set is used in this book to illustrate examples and associated
events in training NN and CNN.
IBM Cloud service offers both of these open-source platforms along with IBM
Watson Studio.
DLtrain is designed and developed with no dependency on open-source AI
software packages.

2.3.4 Test Model

A trained model is required to undergo testing by using the part of the data set
which is set aside for testing. Testing also requires complex platforms like PyTorch,
DLtrain, TensorFlow, etc., but the workload is typically lighter than that of
training deep learning networks.
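The split between training data and test data can be sketched in a few lines of plain Python. This is a minimal illustration, not DLtrain, PyTorch, or TensorFlow code: a single-layer perceptron stands in for a deep network, the toy data set is synthetic, and the function names (split_data, train_perceptron, accuracy) are hypothetical.

```python
import random


def split_data(samples, test_fraction=0.25, seed=0):
    """Shuffle labeled samples and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict(weights, bias, x):
    """Return class 1 when the weighted sum is positive, else class 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(train_set, epochs=20, lr=0.1):
    """Perceptron rule: adjust weights only on misclassified samples."""
    weights, bias = [0.0] * len(train_set[0][0]), 0.0
    for _ in range(epochs):
        for x, y in train_set:
            err = y - predict(weights, bias, x)
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias


def accuracy(weights, bias, data):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(weights, bias, x) == y for x, y in data) / len(data)


# Synthetic toy data (an assumption): label 1 when x0 + x1 > 1.
rng = random.Random(1)
samples = []
for _ in range(200):
    x = (rng.random(), rng.random())
    samples.append((x, 1 if x[0] + x[1] > 1.0 else 0))

train_set, test_set = split_data(samples)
weights, bias = train_perceptron(train_set)
test_accuracy = accuracy(weights, bias, test_set)  # uses held-out data only
print(f"train={len(train_set)} test={len(test_set)} accuracy={test_accuracy:.2f}")
```

Testing makes a single forward pass over the held-out samples, whereas training repeats forward passes and weight updates for every epoch.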

2.3.5 Save Model

A trained and tested model is required to be stored in storage media by using save
model methods that are defined in PyTorch, TensorFlow, DLtrain, etc. As of now
there is no IEEE standard file format to store a deep learning network model.

2.3.6 Load Model

A trained and tested model is used to perform inference. Load model methods are
available in the tool sets of PyTorch, TensorFlow, DLtrain, etc. The loaded model
can be deployed on to different types of embedded devices.
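The save and load steps can be illustrated with a round trip through a file. The sketch below is an assumption made for illustration only: it stores the weights of a toy linear model as JSON under a made-up format tag ("toy-linear-v1"), and it is not the file format of PyTorch, TensorFlow, or DLtrain.

```python
import json
import os
import tempfile

FORMAT_TAG = "toy-linear-v1"  # hypothetical tag, not a standard format


def save_model(path, weights, bias):
    """Write model parameters to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"format": FORMAT_TAG, "weights": list(weights), "bias": bias}, f)


def load_model(path):
    """Read model parameters back, rejecting files with an unknown tag."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if data.get("format") != FORMAT_TAG:
        raise ValueError("unexpected model file format")
    return data["weights"], data["bias"]


def infer(weights, bias, x):
    """Inference with the loaded parameters: sign of the weighted sum."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


weights, bias = [0.5, -0.25], 0.1
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.json")
    save_model(model_path, weights, bias)
    loaded_weights, loaded_bias = load_model(model_path)

sample = (1.0, 1.0)
same_result = infer(weights, bias, sample) == infer(loaded_weights, loaded_bias, sample)
print(loaded_weights, loaded_bias, same_result)
```

Checking a format tag on load is one simple way to guard against reading a file written by a different tool, which matters when no common file format exists.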

2.3.7 Deployment

Microservices appear to be the trend for the inference segment. An example is
given to illustrate the methods and apparatus used in the training of DL networks
by using CPU and CPU+GPU hardware configurations.
DLtrain provides a tool set which can be used in cloud native AI applications and
also in edge or node native AI applications. Thus, the deployment team is not
required to develop a new set of tools or to quantize trained networks to fit in
embedded systems with small computing infrastructure.

2.4 Custom Framework: DLtrain for AI

Business owners for enterprises of all sizes are struggling to find the next generation
of solutions that will unlock the hidden patterns and value from their data. Many
organizations are turning to artificial intelligence (AI), machine learning (ML),
and deep learning (DL) to provide higher levels of value and increased accuracy
from a broader range of data than ever before. They are looking to AI to provide
the basis for the next generation of transformative business applications that span
hundreds of use cases across a variety of industry verticals. AI, ML, and DL have
become hot topics with global IT clients. They are driven by the confluence of next-
generation ML and DL algorithms, new accelerated hardware, and more efficient
tools to store, process, and extract value from vast and diverse data sources that
ensure high levels of AI accuracy. However, AI client initiatives are complex and
often require specialized skills, ability, hardware, and software that are often not
readily available. AI-enabled application deployment includes both the software and

the hardware infrastructure that are deeply optimized for a complete production AI
system.
The engineering workforce in industries is highly enthusiastic about adopting
new development tools and the accompanying environments. The intended audience
also includes engineering college teaching staff with an interest in setting up a
“Cognitive Computing Lab” in their college after going through the proposed
workshop, and self-motivated students with an interest in learning DL-based
application development and deployment in IoT edge.
The abovementioned problems and associated tool sets have their own difficulties
at many levels. DLTRAIN is designed to remove most of these issues and provides a
good solution to train, test, and deploy given NN and CNN models. Deployment
can be on the IoT edge as well.
A custom AI framework provides consistency across AI in IoT edges. For
example, real-time inference is emerging as a critical need of the food and medical
service delivery industry to process and extract value from vast and diverse data
sources that ensure high levels of accuracy in delivered service. However, AI-
enabled enterprise service initiatives are complex and often require specialized
skills, ability, hardware, and software that is often not readily available. AI-enabled
application deployment requires being deeply optimized and also production-ready.
The host OS is provided by NVIDIA and is the same one used by the team in
NVIDIA. NVIDIA also provides a driver to handle A100 hardware from the CPU. Most
importantly, the “container runtime” provides remote access to deploy containers
for AI model training. Enterprise business customers have the option to use their
own application containers for DL/ML. Fresh AI model scripts or pre-trained models
can be used as an input to build AI applications.
Democratize deep learning: Pushing the limit on deep learning’s accuracy
remains an exciting area of research, but as the saying goes, “perfect is the enemy
of good.” Existing models are already accurate enough to be deployed in a wide
range of applications. Nearly every industry and scientific domain can benefit from
deep learning tools. If many people in many sectors are working on the technology,
we will be more likely to see surprising innovations in performance and energy
efficiency.
The DLTRAIN platform provides options to train NN and CNN models by using
an image class of data set. DLTRAIN is designed to make DL easy to deploy in edge
computing devices. DLTRAIN is well suited to handle issues in porting trained
DL models to edge computers that have CPUs and GPUs. A silicon vendor can
take advantage of the above infrastructure and move its GPU silicon into the IoT
edge device market. Porting PyTorch and TensorFlow models on to embedded devices is
one of the challenging problems, and DLTRAIN addresses this issue. DLTRAIN
provides C and C++ code along with a license for the customer team to quickly
deploy DL-enabled devices into the market.

2.5 Sample AI Application Deployment

2.5.1 Quick Look: IBM Watson

AI is used everywhere by everyone, specifically by professionals to transform data;
this book presents innovative business models built with AI. Further, the book
demonstrates how to use data to make recommendations with confidence and how to
design, develop, deploy, and conduct advanced research and discovery through
IBM Watson and open-source tools with innovative learning experiences. The
goal of this book is to introduce AI systems that supplement human intelligence
and to let readers experience them along with IBM Watson Studio.
Watson Studio streamlines the machine learning and deep learning operations
necessary to integrate AI into your company and spur creativity. It offers a set of
tools that enable data scientists, application developers, and subject matter experts
to connect to data, manage it, and utilize it to create, train, and deploy models at
scale. An extremely strong computational infrastructure is necessary for successful
AI initiatives, together with a team, data, and algorithms.

2.5.2 IBM Watson Service and Monitor Tomato Farm

Customization-Ready Visual Recognition Microservice


An application is created to grade tomato quality. Tomatoes must be assessed before
they are sent to distribution outlets. For example, a sandwich-grade tomato
requires “high water content and zero spots or defects on its skin” (as shown in
Fig. 2.3). This is because uncooked tomato is used in sandwiches, so people eat
the tomato raw along with the sandwich. Thus, the safety of human health is
important. The agriculture sector brings up major
challenges in handling workflow to monitor the health of growing crops. Most
workers look at crop growth on a daily basis and make a decision on “to apply
pesticide or not.” If there is a delay in applying pesticides, then the crop will not
yield a good harvest. Labor cost per day has increased, and not many young people

Fig. 2.3 Agriculture sector



have the inclination to take up work on a farm on a daily wage basis. Added to this,
capital is needed to train this workforce and deploy it in the
field. All of this adds up to the point where farmland owners are nervous about
going in for short-term crops such as potato, tomato, wheat, etc.
An IBM Watson Studio-based visual recognition service is used to build an
application that can be a digital assistant to the workforce in the agriculture
industry. If this is expected to work locally, then a local deployment of
infrastructure (visual insights) is required for the visual intelligence service,
which leverages automation to enhance agriculture workers’ productivity, identify
crop disease, and act on insights faster with machine learning optimization. This
will lead to sustained output from the harvest and also help farm owners
to manage cash flow well. AI-based applications can be deployed in agriculture
farms on a large scale by using on-premise inference ability in the form of mobile
applications or web applications. In this direction, IBM cognitive computing
(visual insights) infrastructure appears to be the best fit to deliver
high-performance computing requirements.
Deployment companies can customize inference applications for smartphones.
Tomato packing line workers use a visual intelligence micro web service as
part of the workflow to monitor and deliver good-quality tomatoes. During
monitoring, workers can be efficient by using a “customized visual insight
application service” as a digital assistant to check the quality of tomatoes.
Cost per diagnosis is a critical parameter and the complexity of workflow to
perform diagnosis is another parameter. “Customized visual insight application
service” addresses both these parameters by using the IBM Watson IoT platform
to reduce complexity in workflow and the visual recognition platform to reduce
cost per diagnosis. Innovation in creating optimal yet robust models by using deep
learning convolutional neural networks has led to low-cost “customized visual
insight application service.”
For example, agriworkers start diagnosis work and get results within 2–3 minutes
by using a smartphone app with a few clicks (fewer than five). The cost per
diagnosis is Rs. 5. Workflow complexity for diagnosis is thus brought down to
a few clicks in a smartphone application.
Tomato crop monitoring requires sensor deployment in the tomato field. These
sensors (IoT node) are used to record data (for example, humidity, wind speed, rain
level, sunlight intensity, soil moisture, etc.) and send recorded data to IoT edge.
The application of artificial intelligence at the IoT edge is aimed at comprehending
incoming sensor data and transmitting classification or prediction outcomes to the
Watson IoT platform.
The application deployed in edge works as an MQTT client device and provides
the following two services: it sends notifications to those who are in the
subscription list, and it receives notifications from those IoT devices that are in
the publish list. The Pub-Sub model-based “application deployed in edge” provides
the latest information on crop health to agriculture workers and also to those in
the subscription list. IBM Watson provides the MQTT broker platform to manage
MQTT clients that are deployed in IoT nodes, IoT edge, applications in IBM

Cloud, and user access devices such as smartphones, desktop PCs, etc. For example,
applications in IoT sensors and IoT edge work in asynchronous mode. In this
case, a broker is needed to handle data collection from IoT nodes, from
IoT edges, or from both. “Application deployed in edge” is designed to work with IoT
nodes or IoT edges that are connected via 4G, 5G, or a Starlink satellite modem.
“Application deployed in edge” works as a microservice that manages a topic-based
Pub-Sub message handling service. IoT nodes and IoT edges are not required to
have global IP addresses to use the abovementioned service. IoT node devices in the
field are not expected to have the hardware and software infrastructure needed for
clients that are based on REST API, XMPP, etc. “Application deployed in edge”
supports text strings, number strings, and JPG data.
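The Pub-Sub pattern described above can be sketched as a small in-process broker. This is an illustration of topic-based message routing only; it does not implement the MQTT protocol or any networking, and the class name TopicBroker and the topic strings are hypothetical.

```python
from collections import defaultdict


class TopicBroker:
    """Minimal in-process publish/subscribe broker keyed by topic string."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Deliver payload to all subscribers of topic; return delivery count."""
        delivered = 0
        for callback in list(self._subscribers.get(topic, [])):
            callback(topic, payload)
            delivered += 1
        return delivered


broker = TopicBroker()
inbox = []  # messages received by a hypothetical agriworker app

# The worker subscribes to soil moisture readings of one field.
broker.subscribe("farm/field1/soil_moisture", lambda t, p: inbox.append((t, p)))

# A sensor node publishes text and number strings, as supported by the service.
count_moisture = broker.publish("farm/field1/soil_moisture", "31.5")
count_humidity = broker.publish("farm/field1/humidity", "60")  # no subscriber

print(inbox, count_moisture, count_humidity)
```

Publishers and subscribers never address each other directly, which is why nodes behind a mobile network do not need global IP addresses: each one only has to reach the broker.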
A sensor network is deployed in the tomato field. Optionally, sensor nodes can
be connected directly to the Watson IoT platform by using an MQTT client in the
sensor. But this is not recommended, because sensor nodes would then need a good
amount of hardware infrastructure. It appears that the
optimal way is to deploy IoT edges in the field and connect them with IoT nodes
(sensors). In this way, the investment required for the IoT node (sensor) network
is kept optimal. The IoT edge has MQTT clients and is connected
with the Watson IoT platform by using a 4G network or the Internet
connectivity infrastructure available in a given tomato field.
“Application deployed in edge” is a limited capability MQTT broker and it
is used to include all those nodes and edges that are part of a 2G/3G network.
Mostly, it provides customized service to each node, each edge, agriworkers, tomato
traders, tomato buyers, and farm owners. Web service and smartphone app services
are deployed in IBM Cloud. The IBM Watson AI component is used to provide
machine learning and deep learning capability to web application and mobile
application service. The mentioned service is deployed by using containers (for
example, Docker). For long-term benefit to farm owners, it is recommended to have
an on-premise mini cloud platform such that monthly expenditure is cut down in
communication, and also, farm owners can derive advantage by having near real-
time service for the abovementioned personas.
“Create a real-time stream of sensor data and receive control data for the
Agriculture fields with MQTT and Kubernetes.” This is meant to build the required
product technical prototype and will show how to turn open data into an open
event stream with MQTT and microservices on Kubernetes. MQTT is a lightweight
messaging protocol which is useful in situations of low network bandwidth.
Featured technologies:
1. Kubernetes: container orchestration
2. MQTT: lightweight publish/subscribe protocol
3. Application deployed in edge (limited and yet AI-driven MQTT broker)
Workflow with major tool chain:
1. Create Kubernetes cluster.
2. Set up OpenShift.

3. Configure container registry.
4. Build required images.
5. Create Helm [Link].
6. Install with Helm.
The user (farm owner, agriworker, tomato sales shop, tomato buyer) accesses
the website to learn about the state of the tomato field. The purpose of each
person might be different, though the data is the same for all. Web browsers
directly access the MQTT service.
The edge native application polls the agriculture field every minute, looking for
new data. The data is pushed to the MQTT broker. The user is subscribed to the MQTT
service and sends all new data to the database. On any new data, it computes the
current state of the tomato crop in a given field and publishes both.

2.5.3 Real-Time Audit of IP Networks

Kanshi is an application which uses deep learning networks to perform real-time
audit of IP networks. An early version of Kanshi and its source code are shared on
GitHub, and the Tutorial chapters provide a detailed workflow which is very useful
for learners or experts to try out deep learning applications in a short period of
time. More importantly, the tutorial is based mostly on no code. The workflow, with
screenshots, is given in Google Drive slides and has been used by many in the past
to learn how to deploy DL in IoT edges.
Chapter 3
Introduction to Software Tool Set

The right tools are the bridge between ideas and results in the
world of deep learning.

This section of the book presents tool sets for deep learning applications and
focuses primarily on step-by-step instructions to configure the environment
with data, operating system, application, hardware, and other auxiliary
services. The novelty of this section lies in describing detailed, practical
configuration techniques for setting up virtual environments with the TensorFlow
and PyTorch open-source tools, together with IBM Watson and Keras support. The book
further presents techniques to install, configure, and run machine learning coding
editors such as Jupyter Notebook in various environments. More importantly,
practical training with the tools for deep learning is essential for students,
professionals, domain experts, and data scientists, and it is highly recommended
to gain the best learning experience.
Also, this chapter includes a real-time console diagram for each tool and
application configuration, which helps domain experts understand data science and
the associated workflows that are possible in the Watson AI platform. On the other
hand, it is practically infeasible to train data scientists on domain knowledge.
The intent is to ensure that readers of this book gain insight into the tools and
their working environment before applying deep learning techniques. Open-source
software provides a major boost for deep learning applications. However,
open-source software might undergo rapid change, or it might vanish altogether.
Thus, building applications on open-source software modules requires maintaining
a dedicated list of tool sets for building and deploying those applications.

3.1 Virtual Environment for Required Tool Set

virtualenv is a tool to create isolated Python environments. virtualenv creates a
folder which contains all the necessary executables to use the packages that a
Python project would need. A virtual environment provides an option for each project


Fig. 3.1 Virtual environments for projects

to have its own tool chain for the development of an application and also for
deployment of an application. On a given machine, many applications might be
deployed, and each application might require its own list of libraries or packages.
For example, in Fig. 3.1, library A has two versions, A1 and A2. Suppose
application J1 uses A1 and application J2 uses A2. In this case, both versions of
library A must be kept. Keeping two versions of the same library in one place
creates problems sooner or later: it is not safe to keep both A1 and A2 in
Project 1 and expect application J1 to work properly.
For example, /tmp/AI/ is a folder used as the project root directory, and “WorkDL”
is used as the name of a virtual environment. Creating the environment places a
copy of Python in a folder named WorkDL inside /tmp/AI/, the folder in which the
user runs the commands.
Use link [14] for source code and workflow documentation.
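As a rough sketch of the step described above, the standard-library venv module can create a comparable isolated environment from Python itself. Note the substitution: this uses venv rather than the virtualenv tool discussed in the text, the helper name create_env is hypothetical, and a scratch directory stands in for /tmp/AI/.

```python
import os
import tempfile
import venv


def create_env(project_root, name="WorkDL"):
    """Create an isolated Python environment in project_root/name."""
    env_dir = os.path.join(project_root, name)
    # with_pip=False keeps the sketch fast and offline; pass with_pip=True
    # to get a pip inside the environment for installing packages.
    builder = venv.EnvBuilder(system_site_packages=False, with_pip=False)
    builder.create(env_dir)
    return env_dir


# A scratch directory stands in for the project root (/tmp/AI/ in the text).
with tempfile.TemporaryDirectory() as scratch:
    env_dir = create_env(scratch)
    env_name = os.path.basename(env_dir)
    # pyvenv.cfg marks the folder as a virtual environment.
    has_config = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))

print(env_name, has_config)
```

With system_site_packages=False, packages installed globally are not visible inside the new environment, which is the isolation property the section relies on.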
After completing installation of “virtualenv,” users can use pip or pip3 to install
packages of their choice. Importantly, packages installed via pip or pip3 are
placed in the WorkDL folder, isolated from the global Python installation, and are
not available globally in a given PC or device. The version of pip in this
environment can then be checked, for example with pip --version.
virtualenv 1.7 or later is recommended.

From version 1.7 onwards, virtualenv by default does not include the packages that
are installed globally.
The tool sets installed in WorkDL can be listed by using pip or pip3; in this
case both provide the same result. It is also useful to know the location of pip,
that is, the folder in which it is installed. Use pip freeze to keep
the user environment consistent. File [Link] has a list of packages that are
installed or require installation. Its content is equivalent to the output of
“pip list,” which displays installed packages.
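The same information can be inspected from inside Python. The sketch below is an approximation of what pip freeze reports, written with the standard-library importlib.metadata module; it is not pip itself, and the helper names in_virtual_env and freeze_lines are hypothetical.

```python
import sys
from importlib import metadata


def in_virtual_env():
    """True when the interpreter runs inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def freeze_lines():
    """Return 'name==version' lines for the packages visible to this interpreter."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


print("virtual environment:", in_virtual_env())
print("interpreter location:", sys.executable)
for line in freeze_lines():
    print(line)
```

Run inside WorkDL, the list contains only packages installed in that environment, which makes it a quick check that the environment is isolated from the global installation.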

3.2 TensorFlow: An AI Platform

TensorFlow is an open-source machine learning platform to handle deep learning,
machine learning, and other tasks including statistical and predictive analytics. It
provides the following four key abilities:
1. Efficiently executing low-level tensor operations on CPU, GPU, or TPU
2. Computing the gradient of arbitrary differentiable expressions
3. Scaling computation to many devices, such as clusters of hundreds of GPUs
4. Exporting programs (“graphs”) to external runtimes such as servers, browsers,
mobile, and embedded devices
Keras is a deep learning API written in Python, running on top of the machine
learning platform TensorFlow. The following steps are useful to set up Keras and
use Keras to build deep learning applications. In TensorFlow, data is not stored as
plain integers, floats, or strings. These values are encapsulated in an object
called a tensor, a term for multidimensional arrays. It is good to keep data in
TensorFlow tensors instead of Python lists.
Representing a computation as a dataflow graph gives TensorFlow the following
advantages:
1. Parallelism: By using explicit edges to represent dependencies between
operations, it is easy for the system to identify operations that can execute in
parallel.
2. Distributed execution: By using explicit edges to represent the values that
flow between operations, it is possible for TensorFlow to partition your program
across multiple devices (CPUs, GPUs, and TPUs) attached to different machines.
TensorFlow inserts the necessary communication and coordination between
devices.
3. Compilation: TensorFlow’s XLA compiler can use the information in your
dataflow graph to generate faster code, for example, by fusing together adjacent
operations.
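The parallelism point can be made concrete with a small dependency graph. The sketch below is plain Python and not TensorFlow: it uses the standard-library graphlib module (Python 3.9 or later) to group operations into stages, where all operations in one stage have their inputs ready and could run in parallel. The operation names and the helper parallel_stages are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies):
    """Group graph nodes into stages; nodes within a stage are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Each key is an operation; its set holds the values it depends on (the edges).
graph = {
    "matmul_a": {"x", "w1"},
    "matmul_b": {"x", "w2"},
    "add": {"matmul_a", "matmul_b"},
    "relu": {"add"},
}

stages = parallel_stages(graph)
for number, stage in enumerate(stages):
    print(number, stage)
```

Because the two matrix products share no edge, they land in the same stage; the explicit edges are what allow a runtime to discover this without analyzing the program text.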
Starting with TensorFlow 1.6, binaries use AVX instructions which may not run
on older CPUs. However, it is good to use TensorFlow 2.0 or a later version of
TensorFlow.
AVX (Advanced Vector Extensions) is very useful to perform SIMD computations
on a given CPU. It is useful to test whether a given CPU provides AVX
computing support before installing TensorFlow.
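One way to perform this test on Linux is to look at the CPU flags reported by the kernel. The sketch below assumes an x86 Linux machine where /proc/cpuinfo contains a "flags" line; on ARM boards such as Jetson Nano the corresponding line is named differently and AVX does not apply. The helper names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Extract the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_support(cpuinfo_text, wanted=("avx", "avx2", "fma")):
    """Report which of the wanted instruction-set flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description; empty string if unavailable."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        return ""


# Shortened example text in the /proc/cpuinfo layout (made up for illustration).
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2 fma\n"
sample_report = simd_support(sample)
print("sample:", sample_report)
print("this machine:", simd_support(read_cpuinfo()))
```

If avx is reported as missing, prebuilt TensorFlow binaries from version 1.6 onwards are not expected to run on that CPU.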

If the user machine does not have a GPU and the user wants to utilize the CPU
as much as possible, then the user should build TensorFlow from source,
optimized for the user CPU with AVX, AVX2, and FMA enabled. (That is, TensorFlow
is built for the given CPU instead of being installed from a directly
available installation executable.)
Refer to link [15] for the source code and sample documentation.
Check which TF version is installed in the user virtual environment. For example,
TF may already be installed in an old virtual environment. In a new virtual
environment such as WorkDL, install TF by using URL [16].

3.2.1 Keras in TensorFlow

Install TensorFlow and Keras in WorkDL. Use link [16] for more information on
Keras installation.
File [Link] is created in the /tmp/jetson folder by using the command
“pip freeze [Link].” The created file is initially empty, and items are added to it
for the Keras and TF installation. Use link [17] to obtain a workflow which is used
in Keras to train the TensorFlow model. The workflow has the following steps:
Step 1. Set up your environment.
Step 2. Install Keras.
Step 3. Import libraries and modules.
Step 4. Load image data from MNIST.
Step 5. Preprocess input data for Keras.
Step 6. Preprocess class labels for Keras.
Step 7. Define the model architecture.
Step 8. Compile model.
Step 9. Fit model on training data.
Step 10. Evaluate the model on test data.
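Steps 3 to 10 above can be sketched as a single function. This is a minimal sketch assuming TensorFlow 2.x is installed in the WorkDL environment; the small dense network is an illustrative choice and not necessarily the model used in the workflow of [17]. The function name train_and_evaluate is hypothetical, and TensorFlow is imported inside the function so that the file loads even where TensorFlow is absent.

```python
NUM_CLASSES = 10  # MNIST has ten digit classes, 0 to 9


def train_and_evaluate(epochs=1, batch_size=128):
    """Train a small dense network on MNIST; return (test_loss, test_accuracy)."""
    # Step 3: import libraries and modules.
    import tensorflow as tf

    # Step 4: load image data from MNIST (downloaded on first use).
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

    # Step 5: preprocess input data, scaling pixel values to the range [0, 1].
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0

    # Step 6: preprocess class labels into one-hot vectors.
    y_train = tf.keras.utils.to_categorical(y_train, NUM_CLASSES)
    y_test = tf.keras.utils.to_categorical(y_test, NUM_CLASSES)

    # Step 7: define the model architecture.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

    # Step 8: compile the model.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

    # Step 9: fit the model on training data.
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)

    # Step 10: evaluate the model on test data.
    loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return loss, accuracy
```

Calling train_and_evaluate() inside the environment runs the whole workflow once; raising epochs trades training time for accuracy.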

3.2.2 TensorFlow Image in Docker

What is needed to run a TensorFlow-based application in a container?
Docker is the easiest way to enable TensorFlow GPU support on Linux since only
the Nvidia® GPU driver is required on the host machine (the Nvidia® CUDA®
toolkit does not need to be installed). TensorFlow releases Docker images that are
already configured to run TensorFlow, and these images can be downloaded to the
user machine. Users can use multiple variants of the images at once.
How are Python 3 and TensorFlow brought up in Docker to run a TensorFlow
container?
The questions above are handled in URL [14], where sample code is given with
screenshots.

3.3 JupyterLab

JupyterLab is a popular web-based user interface for Project Jupyter. Execution can
be done cell by cell, which is very useful for design engineers to trace issues
with ease.
How is Python 3-enabled JupyterLab brought up?
Install JupyterLab on Python 3.5 or above. This question is handled in URL [14].

3.3.1 Jupyter Notebook

The Jupyter Notebook is the original web application for creating and sharing
computational documents. It offers a simple, streamlined, document-centric experience.
Jupyter supports over 40 programming languages, including Python.
Install Jupyter Notebook for Python 3.5 or above. Details on the use of Jupyter
Notebook are given in [14].
Remote access to a Jupyter Notebook is very useful while working with a near
edge machine. Use URL [18] to access information on “Remote Jupyter
Notebook.”

3.4 JupyterLab: Latex

LaTeX is very useful to create scientific and research-level documents. JupyterLab
provides extensions to create LaTeX versions of content that is present in cells of
JupyterLab.
Install the LaTeX extension with JupyterLab on Python 3.5 or above. Use URL [14]
for more information.
To convert to PDF, nbconvert uses the TeX document preparation ecosystem. It
produces an intermediate .tex file which is compiled by the XeTeX engine with the
LaTeX2e format to produce a PDF output.
Users can use an Overleaf account [19] to compile and generate PDF files from
a given file that is output from JupyterLab.

3.5 Setting Up Edge AI Computer (Jetson Nano)

IoT edge devices use GPUs for real-time inference. Jetson Nano is one of the
emerging IoT edge devices, and Nvidia has released a development kit for it.

The URL [20] offers a comprehensive example with detailed instructions for
configuring an AI computer. It furnishes 15 essential steps for the installation of the
required software to conduct inference on a Jetson Nano device.

3.6 IBM Watson Machine Learning: Community Edition

IBM Watson Machine Learning Accelerator for Enterprise AI: Watson Machine
Learning Accelerator, a new piece of Watson Machine Learning, makes deep learning
and machine learning more accessible to your staff and brings the benefits of
AI into your business. It combines popular open-source deep learning frameworks,
efficient AI development tools, and accelerated IBM® Power Systems™ servers.
Now your organization can deploy a fully optimized and supported AI platform that
delivers high performance, proven dependability, and resilience. Watson Machine
Learning Accelerator is a complete environment for data science as a service,
enabling your organization to bring AI applications into production rapidly.
It includes power-optimized versions of the most popular deep learning frameworks
currently available, including TensorFlow, Caffe, and PyTorch, with all required
dependencies and files, precompiled and ready to deploy. The entire AI suite
has been validated and optimized to run reliably on accelerated Power servers.
Watson Machine Learning Accelerator runs on IBM Power accelerated servers for
HPC, a platform that runs not only deep learning but also a wide variety of HPC
and high-performance data analytics workloads. It leverages the unique
capabilities of accelerated Power servers to deliver performance unattainable on
commodity servers. It provides hyperparameter search and optimization, elastic
training that allocates the resources needed to optimize performance, and
distributed deep learning for rapid insights at massive scale. Large model
support facilitates the use of system memory with little to no performance impact,
enabling significantly larger and more accurate deep learning models.
The IBM Watson Machine Learning Community Edition is available as a no-
charge orderable part number from IBM.
Install PowerAI in a Conda environment to use the GPU. Get “Conda” for a Power 9
machine by using the URL [21].
The WML CE packages are installed into a Conda environment, so after
installation is complete, the frameworks are ready for use. Each framework provides
a test script to verify some of its functions. These test scripts include tests and
examples that are sourced from the various communities. Note that some of the
included tests rely on data sets (for example, MNIST) that are available in the
community and are downloaded at run time.

3.7 Tool Set to Build DLtrain

DLtrain is an embedded AI-ready tool set. Details are given at the GitHub link
with working source code. An Ubuntu 18.04 machine is used to build DLtrain, and
it is built for multiple CPUs such as X86, ARM, and ppc64le.
The DLtrain platform in Fig. 3.2 uses multiple resources (for example, CPU and
GPU) to train deep learning networks.
Users having access to Jetson series hardware can use DLtrain to run training
workloads and also inference workloads.
The DLtrain platform is available for Windows machines as well.

3.7.1 Target Machine Is X86 with Ubuntu

The GitHub page of DLinIoTedge provides the necessary source code and
information to build DLtrain.
Reference link [22] provides the source code of deep learning networks for the
training application and for inference by using a deep learning network model.
This deep learning network model training and inference platform is named
DLtrain.

Fig. 3.2 DLtrain to train NN and CNN



The objective is to build DLtrain for x86 Ubuntu machines.

Use URL [23] to build DLtrain for inference. The [Link] for the x86 gcc tool set
is given in the above URL.
1. Use cmake to generate the makefile.
2. Use make to build the executable of DLtrain.
3. Use DLtrain to train deep learning networks.
4. Use DLtrain to perform inference by using deep learning networks.
Use URL [24] to build DLtrain to train deep learning networks. The four steps
above are used in that URL to build DLtrain for the training workload. The
[Link] for the x86 gcc tool set is given in the above URL.
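The cmake and make steps can be scripted. The following is a minimal sketch, assuming an out-of-source build in which cmake is run from the build folder and pointed at the source folder; the function name dltrain_build_commands and the folder names are hypothetical and are not taken from the DLtrain repository.

```python
def dltrain_build_commands(source_dir, build_dir):
    """Return (working_directory, argv) pairs for the two build steps."""
    return [
        # Generate the makefile inside the build folder.
        (build_dir, ["cmake", source_dir]),
        # Build the executable from the generated makefile.
        (build_dir, ["make"]),
    ]


# Illustrative paths only.
for cwd, argv in dltrain_build_commands("/tmp/DLtrain", "/tmp/DLtrain/build"):
    print(cwd, " ".join(argv))
```

Each pair can then be executed with subprocess.run(argv, cwd=cwd, check=True) once the build folder exists.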

3.7.2 Use Docker: Target Machine Is X86 with Ubuntu

The Docker image of DLtrain is created by using the source code at link [25].

3.7.3 Target Machine Is Power 9 with Ubuntu

DLtrain for a Power 9 machine is created by using the source code at link [26].
The Ubuntu 18.04-based g++ and gcc tool set is used (with cmake to create the
makefile) to build the DL application that runs on Power 9.
The objective is to build DLtrain for Power 9 Ubuntu machines.

Run the training workload on Power 9 and store the result as a model named
jjnet; jjnet is the model that is output after training. This model alone is
enough to perform inference at the IoT edge: use the jjnet model with edge
devices and DLtrain for edges.
Use URL [26] to build DLtrain for the training workload and also for the
inference workload. The [Link] for the Power 9 gcc tool set is given in the
above URL.
1. Use cmake to generate makefile.
2. Use make to build executable of DLtrain.
3. Use DLtrain to train deep learning networks.
4. Use DLtrain to perform inference by using deep learning networks.

3.7.4 Target Machine Is Jetson Nano with Ubuntu

DLtrain for a Jetson Nano machine is created by using the source code at
link [27].
The objective is to build DLtrain for Jetson Nano (ARM) machines.

Use URL [27] to build DLtrain for the training workload and also for the
inference workload. The [Link] for the Jetson Nano (ARM) gcc tool set is given
in the above URL.
1. Use cmake to generate makefile.
2. Use make to build executable of DLtrain.
3. Use DLtrain to train deep learning networks.
4. Use DLtrain to perform inference in Jetson Nano by using deep learning
networks.
Running the inference workload by using ARM and GPU is the key focus. The above
URL handles both ARM and GPU when creating executables. The GPU side of
the source code requires further development.

3.7.5 Target Machine Is X86 Windows 10

DLtrain for Windows 10 machine is created by using the source code in the
following link [28].

The objective is to build DLtrain for X86 Windows 10 machines.

Use the above URL to build DLtrain for the training workload and also for the
inference workload.

3.8 Docker Image of DLtrain Application to Train CNN

DLtrain is an embedded-compatible deep learning platform that handles issues in
porting trained AI models to edge computers and also performs inference on edge
devices. A drawback of this multiplatform support is that one Docker image has
to be built for each specific target platform, that is, for a specific operating
system and hardware CPU architecture. It is therefore required to create a
Docker image of DLtrain for each specific target platform. Docker images are
read-only templates that serve as the building blocks of a Docker container. A
Docker container is a running instance of a Docker image.
URL [29] provides details on “using Docker image of DLtrain.”
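Because an image is tied to an operating system and CPU architecture, it is useful to know which platform string applies to a given machine. The following is a minimal sketch, assuming Linux-based images and the platform strings linux/amd64, linux/arm64, and linux/ppc64le that Docker uses for its --platform option; the function name docker_platform is hypothetical.

```python
import platform

# Machine names as commonly reported by platform.machine(), mapped to Docker platforms.
DOCKER_PLATFORM = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
    "ppc64le": "linux/ppc64le",
}


def docker_platform(machine=None):
    """Return the Docker platform string for a machine name, or None if unknown."""
    machine = (machine or platform.machine()).lower()
    return DOCKER_PLATFORM.get(machine)


for name in ("x86_64", "aarch64", "ppc64le"):
    print(name, "->", docker_platform(name))
```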

3.9 Deploy DL Networks in Near Edge

DLtrain is a tool set which can be used to train NN and CNN models of deep
learning networks. DLtrain is used in the deployment of “deep learning networks
in near edge.” DLtrain is a good option for the embedded application development
team and also for the deployment team. Algorithms and edge silicon startups
have attracted huge investment, but tool developers are still catching up. Tool
development is compensating for lingering skills gaps by moving to higher levels of
abstraction.
Open-source tool sets are used in the training of NN or CNN. But during
deployment, there is a need to use a tool set from a particular silicon vendor.
Inference engine clients are expected to work from near edge and receive input data
in real time from IoT nodes, for example, a home gateway machine receiving images
or video from a doorbell camera for real-time inference.
Add the following port along with the IP address of

http://192.168.1.3:8765/
localhost.

The following might help users to get connected with near edge and IoT nodes
via the TCP/IP network.
Consider a user who runs a web server listening on [Link] as opposed to [Link]
or a user-specific IP. [Link] is the local loopback device and is only
accessible from the device on which the server is running. [Link] is used to
make an application listen on all network devices. Users can provide IP
addresses to the edge. For example, an address can be

http://[Link]:8000/

or similar in local network with a local IP address. Users can use the above URL
from other networked IoT devices to run client applications.
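The difference between listening addresses can be checked programmatically. The following is a minimal sketch that assumes the usual IPv4 conventions, in which 127.0.0.1 is the loopback address and 0.0.0.0 stands for all network interfaces; the function name describe_bind_address and the sample address 192.168.1.10 are illustrative assumptions.

```python
import ipaddress


def describe_bind_address(address):
    """Classify a listening address by who can reach a server bound to it."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "loopback: reachable only from the same machine"
    if ip.is_unspecified:
        return "all interfaces: reachable from other machines on the network"
    return "specific interface: reachable through that address only"


for addr in ("127.0.0.1", "0.0.0.0", "192.168.1.10"):
    print(addr, "->", describe_bind_address(addr))
```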
URL [30] has the necessary workflow for the following five steps in near edge:
Step 1 Virtual environment: Activate the virtual environment; in this case dlBox
is the virtual environment in the near edge machine.
Step 2 Use Jupyter Notebook: Run a Jupyter Notebook as given below. Keep it in
server mode with no-browser mode on, so that the remote machine can use its own
browser to work with a Jupyter Notebook that runs in a near edge (Power 9)
processor.
Step 3 Near edge is live: The above is running in a near edge (Power 9) machine
(or running in an ssh terminal to the near edge, which is a Power 9 machine).
Users should leave this open and in running mode.
Step 4 Local server to handle near edge: Open another terminal in the user
machine (if it is an Ubuntu machine) and use the ssh command to connect to the
near edge (Power 9) machine and forward its URL, such that the Jupyter Notebook
can be used in the user machine via a web browser. Users can use

localhost:8889 work@171.61.123.76

to reach the edge from the local PC browser. Use the following command from
the local PC to reach the near edge machine.

ssh -N -f -L localhost:8886:localhost:8889 work@171.61.123.76

The following is useful to create an instance of a Jupyter Notebook and
also to kill it if it is not used.
Step 5 Web browser in local server: Open a web browser in the local machine by
using the URL

http://localhost:8886/

The above will open a web page in the user machine and ask for a token;
bring the token from Step 2. The same token is displayed in the running
window of the Jupyter Notebook.
Successful deployment of the abovementioned workflow gives users good control of
near edge machines, where near edge machines are expected to run a real-time
inference service for applications that are subscribed to the inference service
of a given near edge machine.
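The commands of Steps 2 and 4 can be assembled as follows. This is a minimal sketch, assuming the ports 8889 (near edge) and 8886 (local PC) and the address used in the text; the helper names jupyter_server_command and ssh_tunnel_command are hypothetical, and the functions only build the argument lists without running anything.

```python
def jupyter_server_command(port=8889):
    """Step 2: command for a Jupyter Notebook server without a local browser."""
    return ["jupyter", "notebook", "--no-browser", f"--port={port}"]


def ssh_tunnel_command(user_host, local_port=8886, remote_port=8889):
    """Step 4: forward a local port to the notebook port on the near edge machine."""
    forward = f"localhost:{local_port}:localhost:{remote_port}"
    # -N: run no remote command, -f: go to background, -L: local port forwarding.
    return ["ssh", "-N", "-f", "-L", forward, user_host]


print(" ".join(jupyter_server_command()))
print(" ".join(ssh_tunnel_command("work@171.61.123.76")))
```

With the tunnel in place, the notebook is reached from the local browser on the local port.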

3.9.1 Deploy DL Networks by Using TensorFlow RT

TensorFlow Lite is the official (from Google) solution for running machine learning
models on mobile and embedded devices. It enables on-device machine learning
inference with low latency and a small binary size on Android, iOS, etc. TensorFlow
Lite 10.2 uses many techniques for this such as quantized kernels that allow smaller
and faster (fixed-point math) models. Although deep learning networks run faster
this way, the speed comes at the cost of lower accuracy.
Use URL [31] to install and use TensorFlow Lite.
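The trade-off between fixed-point models and accuracy can be seen in a small experiment. The following is a generic sketch of 8-bit affine quantization, not TensorFlow Lite's actual implementation; the function names and the sample weights are illustrative assumptions.

```python
def quantize(values, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1] with an affine scale."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0  # avoid a zero scale for constant input
    quantized = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return quantized, scale, lo


def dequantize(quantized, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [lo + scale * q for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.4, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
# The rounding error per weight is bounded by half a quantization step.
print(max(abs(r - w) for r, w in zip(restored, weights)), scale / 2)
```

Each weight now fits in one byte instead of four, and the price is the small reconstruction error printed above.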
Chapter 4
Hardware for DL Networks

In the realm of deep learning, hardware is the foundation upon which
intelligence is built.

Building on the software tool set of the previous chapter, this chapter covers
the hardware systems that deliver the performance needed to train deep learning
networks. Hardware environment settings and configurations are presented for
several kinds of processing and computing platforms, including AMD, POWER9, ARM,
ARM + GPU, and X86 systems. Installation, setup, and configuration of these
processing environments are shown on edge servers. Deep learning needs
high-performance computing systems with configurations customized for various
applications and tools. Hence, the book is not limited to deep learning tools
and applications; it also gives users and professionals insight into the kinds
of hardware and performance settings that should be configured to achieve the
best deep learning results.
The book also provides URL reference links so that coders and readers can
quickly download the relevant tools, applications, and hardware configuration
techniques when needed. Further, advanced installations such as the NVIDIA CUDA
compiler, GPU hardware, GeForce multiprocessors, thread processing, IBM Watson
CE, and large-scale AI business enterprise suite configuration are demonstrated
in simple steps. Finally, deployment of AI on X86 and on an Android phone is
also presented.
An increase in hardware performance is necessary to train deep learning networks.
The market is witnessing a proliferation of specialized hardware that offers not
only better performance on deep learning tasks but also increased efficiency
(performance per watt). Figure 4.1 shows CPU and CPU + GPU combinations
that are used at the enterprise level and also in research labs at academic
institutes.
DGX Station A100 is very popular for enterprise-level performance, and several
of them can form an on-prem cluster to provide the computing required to train
deep learning networks. AC922 and V100 GPUs are in the high-performance segment.
An OpenPOWER CPU with a PCI card (RTX 2080 or 2070) provides an option for the
cost-sensitive enterprise market.
The Jetson series Xavier and Orin provide entry-level performance to train
deep learning networks.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 35


J. Singaram et al., Deep Learning Networks,
[Link]

Fig. 4.1 Computing devices to train deep learning networks

CPUs and FPGAs provide more options in the inference segment.
The AI community’s demand for GPUs led to Google’s development of TPUs and
pushed the entire chip market towards more specialized products.
In the next few years we will see NVIDIA, Intel, SambaNova, Mythic, Graphcore,
Cerebras, and other companies bring more focus to hardware for AI workloads.
Silicon vendor teams can take advantage of emerging requirements for
accelerated computing and move their GPU silicon into the IoT edge market. For
example, TI has its own inference engine (TIDL), and Qualcomm has its own
(SNPE), as do ST Micro and many other silicon vendors.

4.1 Open Source for Edge Native Hardware

Open source is a boon for digital transformation!

Yes, it is a boon for AI-based application developers. But open-source
opportunities mostly exist in the software segment. There is still not much good
news on open source from the hardware segment, as shown in Fig. 4.2.

Fig. 4.2 Open source is moving to hardware

In addition to open sourcing the POWER ISA, IBM is also contributing a newly
developed softcore to the community. In a very short time, an IBM engineer was
able to develop a softcore on the POWER ISA and get it up and running on
a Xilinx FPGA. This softcore implementation is being demonstrated this week
at OpenPOWER Summit North America. “Through the growing open ecosystem
of the POWER Architecture and its associated technologies, the OpenPOWER
Foundation facilitates its Members to share expertise, investment and intellectual
property to serve the evolving needs of all end users.” At Raptor Computing Systems
our top priority has always been owner-controlled, auditable systems.

4.2 POWER9 with RTX 2070 GPU

The OpenPOWER Foundation [32] is working to make the hardware part open as well,
so that the open-source community can contribute to the development of enhanced
hardware for the deep learning network segment and for high-performance
computing requirements.
POWER9’s large caches and high SMT levels ensure deep learning applications
run smoothly, even with full system utilization. Hardware virtualization extensions
keep VMs running at near native speeds.
Talos™ II is first to market with the brand new, 14-nm POWER9 processor, built
on IBM OpenPOWER technology. Talos™ II, the world’s first computing system
to support the new PCIe 4.0 standard, also boasts substantial DDR4 memory, dual
POWER9 CPUs, and next-generation security. According to Talos™ II datasheet,
Lower your power use. Enable development of next-generation cards. Be ready for
tomorrow’s requirements today with Talos™ II. What goes best with PCIe 4.0 bandwidth?
Lots of DDR4 main memory. Shuffling your data in and out of the CPU isn’t a problem
with Talos™ II’s plentiful DDR4 slots. Vital data integrity is ensured through registered
DIMM interfaces and ECC support. In an industry first, Talos™ II ships with fully open
and auditable BMC firmware, based on the Open BMC project. Gone are the days when
you had to carefully isolate the buggy, insecure BMC port from threats at the firewall level.
With Talos™ II, the BMC is just another Linux system that can be maintained as part
of normal workflow. Find a bug or vulnerability? No problem; just patch, recompile, and
install. Talos™ II drives the state of the art of secure computing forward. Talos™ II gives
you — and only you — full control of your machine’s security. Rest assured knowing
that only your authorized software and firmware are running via POWER9’s secure boot
features. Don’t trust us? Look at the secure boot sources yourself — and modify them as
you wish. That’s the power of Talos™ II.

Talos™ II Entry-Level Developer System TLSDS3 [33] includes the following items:
1. EATX chassis with 500W ATX power supply
2. A single Talos™ II Lite EATX mainboard
3. One 4-core IBM POWER9 CPU
4. 4 cores per package
5. SMT4 capable
6. POWER IOMMU
7. Hardware virtualization extensions
8. 90W TDP
9. One 3U POWER9 heat sink/fan (HSF) assembly
10. 8 GB DDR4 ECC registered memory
11. 2 front panel USB 2.0 ports
12. 128 GB internal NVMe storage
13. Recovery DVD
Built for the world’s biggest AI challenges, POWER9 delivers unprecedented
performance for modern HPC, analytics, and AI. It deploys data-intensive work-
loads, like deep learning frameworks and accelerated databases, with confidence.
CPU information is very useful for planning a given project and for selecting
the associated operating system. The user can get cpuinfo by using the script
given in link [34].

4.2.1 OpenPOWER CPU with ASPEED VGA Controller

During installation of the CUDA SDK and its prerequisites, it is a routine check
to verify the VGA controller and its resource allocation. In some cases,
VGA controllers are also attached through PCI, so it is necessary to remove any
resource conflict with other PCI devices. Making a CPU support GPU devices via
PCI requires the above study of the resources used by VGA controllers.
The ASPEED controller shown in Fig. 4.3 is a baseboard management controller,
or BMC [35], which is a small computer that sits on virtually every server
motherboard. Other components such as higher-end switches, JBODs, JBOFs, and
other devices now include BMCs as well. The largest vendor for BMCs today is
ASPEED whose AST2400 BMC is pictured below.
BMC support: Discrete GPU (VGA-compatible controller: GeForce RTX 2070).
Is it true that GeForce RTX 2070 is a discrete GPU? Most modern discrete GPUs
require firmware. As Talos™ II is aimed at a security-conscious audience, it does
not currently include GPU firmware in the production firmware images.

Fig. 4.3 BMC hardware (block diagram labels: Power 9, USB 2.0, x1 PCIe, LPC
33 MHz, optional TPM, analog video, DDR3 128 MB, 16 MB SPI flash, BMC, SMBus (8),
rear I/O panel, 2x USB, optional RMM4 dedicated NIC module connector, RGMII,
serial port A (DB9, external), serial port B (DH-10, internal))

Does this mean that GeForce RTX 2070 does not have firmware support in Talos™
II?
If so, how can Talos™ II support firmware for GeForce RTX 2070? The
following questions about the boot sequence are useful to resolve issues during
boot.
Boot:
1. Does Talos™ II support Trusted Boot or Secure Boot?
2. If Talos™ II has Secure Boot on, how can it be disabled?
Trusted Boot is the measurement (hashing) of system firmware boot components
and the creation of secure cryptographic artifacts that unambiguously demonstrate
that particular firmware has been executed by the system. Trusted Boot artifacts can
be used to remotely verify system integrity or to seal secrets so that they are only
available after certain firmware has been executed.
Secure Boot is the cryptographic signing and verification of firmware boot
components, failure of which is flagged for system administrator investigation and
action, including logging an error and halting the system boot. Secure Boot prevents
the system from executing either accidentally or maliciously modified firmware.
VGA ports in PCIe (is it gen 3 or 4?) bus of Talos™ II
VGA-compatible controller, NVIDIA Corporation Device ([Link].0), and
VGA-compatible controller, ASPEED Technology, Inc. ([Link].0), are
placed in PCIe slots of Talos™ II.

Workaround 1: Disable the onboard VGA output via the VGA disable jumper,
J10109. See the user’s guide for additional information.
Workaround 2: Select desired GPU at run time (yes, this option is put in use).
More information “about configuring ASPEED controllers” is discussed in [34]
and the particular file name is [Link].

4.2.2 CUDA Installation and PCI Driver for RTX 2070

Hardware configurations are listed in the following:


1. Hardware used in CUDA computing
2. Hardware from NVIDIA: GeForce RTX 2070
3. GPU Turing architecture
4. NVIDIA CUDA® Cores 2304
5. RTX-OPS 42T
6. Boost clock 1620 MHz
7. Frame buffer 8 GB GDDR6
8. Memory speed 14 Gbps
The following is used to check the driver for the RTX 2070 GPU (PCI) hardware
installed in the Talos™ II edge server.
Option 1 To enable the VGA port on the GPU, disable the onboard ASPEED VGA,
which the system defaults to.
Option 2 There is no explicit driver for the RTX 2070 on ppc64le. However,
some engineers have been successful in running with the standard driver (418.39)
that comes with CUDA 10.1, which is what users should be pointed to.
VGA-compatible controllers from NVIDIA and ASPEED are listed. It appears
that the ASPEED VGA controller is the default in Talos™ II. There is a need to
remove the ASPEED VGA controller from the VGA port and make the NVIDIA
VGA controller the default VGA port controller in Talos™ II. Remove any CUDA
PPAs that may be set up and also remove the NVIDIA CUDA toolkit if there is a
past installation.

4.2.3 Build Application Using nvcc

GeForce RTX 2070 Super comes with the following resources.
In one RTX 2070 Super, there are 40 SMs. Each SM includes one RT core, 8 Tensor
Cores, 4 texture units, and 64 CUDA cores. The same is shown in Fig. 4.4.
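The per-SM figures multiply out to the device totals. The following worked example is a minimal sketch using only the numbers quoted in the text (40 SMs; 1 RT core, 8 Tensor Cores, 4 texture units, and 64 CUDA cores per SM); the function name gpu_totals is hypothetical. It also shows that the 2304 CUDA cores listed in Sect. 4.2.2 correspond to 36 SMs at 64 cores each.

```python
PER_SM = {"rt_cores": 1, "tensor_cores": 8, "texture_units": 4, "cuda_cores": 64}


def gpu_totals(num_sms, per_sm=PER_SM):
    """Multiply the per-SM resources by the number of SMs."""
    return {name: count * num_sms for name, count in per_sm.items()}


print(gpu_totals(40))               # totals for 40 SMs
print(2304 // PER_SM["cuda_cores"])  # SM count implied by 2304 CUDA cores
```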
One RT core: RT cores specifically accelerate the key math needed to trace
virtual rays of light through a scene. The ray-tracing algorithm builds an image
by extending rays into a scene and bouncing them off surfaces and towards sources
of light to approximate the color value of pixels.

Fig. 4.4 Infrastructure in a GPU

Ray tracing is capable of simulating a variety of optical effects such as reflection,
refraction, soft shadows, scattering, depth of field, motion blur, caustics, ambient
occlusion, and dispersion phenomena (such as chromatic aberration). It can also be
applied to track the trajectory of sound waves, much like it does with light waves.
This feature makes it a suitable choice for enhancing the immersive sound design
in video games by generating lifelike reverberations and echoes. Additionally, it’s
important to note that there are 64 CUDA cores in this context. CUDA kernels also
have access to a special variable, blockDim.x, that provides the number of
threads within a block. Using this variable, in conjunction with blockIdx.x and
threadIdx.x, increased parallelization can be accomplished by organizing
parallel execution across multiple blocks of multiple threads with the idiomatic
expression

threadIdx.x + blockIdx.x × blockDim.x.

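The index expression can be tried out on the CPU. The following is a minimal sketch that emulates, with ordinary Python loops, how a one-dimensional grid of blocks covers a vector; it is not CUDA code, and the function names and the block size of 4 are illustrative assumptions. The guard i < n mirrors the bounds check that a real kernel needs when the vector length is not a multiple of the block size.

```python
def global_index(thread_idx, block_idx, block_dim):
    """The idiomatic expression threadIdx.x + blockIdx.x * blockDim.x."""
    return thread_idx + block_idx * block_dim


def vector_add(a, b, block_dim=4):
    """Emulate a 1D kernel launch in which each thread adds one element pair."""
    n = len(a)
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
    out = [None] * n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = global_index(thread_idx, block_idx, block_dim)
            if i < n:  # threads beyond the end of the vector do nothing
                out[i] = a[i] + b[i]
    return out


print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```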

The warp scheduler as shown in Fig. 4.5 looks at all warps assigned to it, to
determine which have instructions that are ready to issue. The warp scheduler then
chooses 1 or 2 instructions that are ready to execute and issues those instructions.
The process of issuing an instruction involves assigning functional units within an
SM to the execution of that instruction, warp-wide. A warp is always
32 threads; therefore, 32 functional units in one clock cycle, or a smaller number
distributed across multiple clock cycles, must be scheduled (and therefore must be
“available”) to issue the instruction.
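The relation between available functional units and issue time can be written down directly. The following is a minimal sketch under the simplifying assumption that a warp-wide instruction needs one functional unit per thread and that the available units are reused in successive clock cycles; the function name issue_cycles is hypothetical.

```python
import math

WARP_SIZE = 32  # a warp is always 32 threads


def issue_cycles(functional_units):
    """Clock cycles needed to cover all 32 threads of a warp."""
    if functional_units <= 0:
        raise ValueError("at least one functional unit is required")
    return math.ceil(WARP_SIZE / functional_units)


for units in (32, 16, 8):
    print(units, "units ->", issue_cycles(units), "cycle(s)")
```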
Let a “queue of blocks” be associated with each kernel launch. As resources on an
SM become available, the block scheduler will deposit a block from the “queue”
onto that SM. The block scheduler does not deposit blocks warp-by-warp. It is
an all-or-nothing proposition, on a block-by-block basis. Let us consider a block
that is already deposited on an SM. A warp is “eligible” when it has one or more
instructions that are ready to be executed.

Fig. 4.5 Infrastructure in a GPU

1. Block scheduling does not include warp scheduling.
2. Block scheduler is a device-wide entity.
3. Warp scheduler is a per SM entity.
Ampere GPU: The maximum number of concurrent warps per SM remains
the same as in Volta (i.e., 64). The high-priority recommendations from those
guides are as follows:
1. Find ways to parallelize sequential code.
2. Minimize data transfers between the host and the device.
3. Adjust kernel launch configuration to maximize device utilization.
4. Ensure global memory accesses are coalesced.
5. Minimize redundant accesses to global memory whenever possible.
6. Avoid long sequences of diverged execution by threads within the same warp.
Devices with the same major revision number are of the same core architecture.
The major revision number is 9 for devices based on the NVIDIA Hopper GPU
architecture, 8 for devices based on the NVIDIA Ampere GPU architecture, 7
for devices based on the Volta architecture, 6 for devices based on the Pascal
architecture, 5 for devices based on the Maxwell architecture, and 3 for devices
based on the Kepler architecture.
The minor revision number corresponds to an incremental improvement to the
core architecture, possibly including new features.
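The mapping from major revision number to architecture can be kept in a small table. The following is a minimal sketch that uses only the major numbers listed above; the function name architecture is hypothetical. Note that Turing devices such as the RTX 2070 report compute capability 7.5, so they share major revision 7 with Volta.

```python
ARCH_BY_MAJOR = {
    9: "Hopper",
    8: "Ampere",
    7: "Volta",
    6: "Pascal",
    5: "Maxwell",
    3: "Kepler",
}


def architecture(compute_capability):
    """Look up the core architecture from a 'major.minor' string such as '8.6'."""
    major = int(str(compute_capability).split(".")[0])
    return ARCH_BY_MAJOR.get(major, "unknown")


for cc in ("9.0", "8.6", "7.5", "6.1"):
    print(cc, "->", architecture(cc))
```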
The compute capability version of a particular GPU should not be confused with
the CUDA version (for example, CUDA 7.5, CUDA 8, CUDA 9), which is the
version of the CUDA software platform. The CUDA platform is used by application
developers to create applications that run on many generations of GPU architectures,
including future GPU architectures yet to be invented. While new versions of the
CUDA platform often add native support for a new GPU architecture by supporting
the compute capability version of that architecture, new versions of the CUDA
platform typically also include software features that are independent of hardware
generation.
The multiprocessor creates, manages, schedules, and executes threads in groups
of 32 parallel threads called warps. When a multiprocessor is given one or more
thread blocks to execute, it partitions them into warps and each warp gets scheduled
by a warp scheduler for execution.
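The partitioning of a thread block into warps is simple arithmetic. The following is a minimal sketch, assuming a warp size of 32 threads as stated above; the function names are hypothetical. When the block size is not a multiple of 32, the last warp is only partially filled and its remaining lanes stay idle.

```python
import math

WARP_SIZE = 32


def warps_per_block(threads_per_block):
    """Number of warps needed to hold all threads of one block."""
    return math.ceil(threads_per_block / WARP_SIZE)


def idle_lanes(threads_per_block):
    """Lanes in the last warp that have no thread assigned."""
    return warps_per_block(threads_per_block) * WARP_SIZE - threads_per_block


for size in (256, 100):
    print(size, "threads ->", warps_per_block(size), "warps,", idle_lanes(size), "idle lanes")
```

Choosing block sizes that are multiples of 32 avoids the idle lanes.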
The way a block is partitioned into warps is always the same. A warp executes
one common instruction at a time, so full efficiency is realized when all 32 threads
of a warp agree on their execution path. The SIMT architecture is akin to SIMD
(single instruction, multiple data) vector organizations in that a single instruction
controls multiple processing elements.
All functional units are pipelined. Most functional units can accept a new
instruction of the type they are designed to handle, on each clock cycle. The pipeline
depth determines when that instruction completes or retires. Each SP refers most
directly to a floating-point ALU. It handles floating-point adds and multiplies, but
not other instructions generally speaking. If there is a need for an integer add, for
example, an SP would not be scheduled to handle that instruction; instead, it would
be an integer ALU. All instructions are issued warp-wide and require 32 functional
units of the appropriate type to be scheduled. This can be 32 functional units in a
single clock cycle, or, e.g., 16 over 2 clock cycles, or 8 over 4 clock cycles, etc.
The following questions provide a hint to understand more on SM and its efficient
usage:
1. What are all the different types of functional units in an SM?
2. How many of functional unit X are in SM architecture Y?
3. What is the pipeline depth of functional unit X?
4. What is the exact algorithm by which a warp scheduler chooses instructions to
issue?

Vector addition in CUDA code (*.cu)

Get the source code of the vectorAdd example from the Samples folder . /Simulations/
Build the vectorAdd application by using make. Section 3 in URL [36] has example
code for vector addition on the GPU.

nbody example in CUDA code (*.cu)

Use URL [34] to get more details on running the nbody example on the RTX 2070 GPU.
PTX file creation
Use the following command line instructions to make the output “user2” from
“[Link]” and to run it. Then produce the PTX for the CUDA kernel. Section 4 in
URL [36] covers PTX file creation and its use.
Use Python to use the GPU at run time

Python code can be used to perform computation on CUDA cores. Matrix multiplication
is done on GPU0 by using Python code, and matrix addition is done on GPU1 by using
Python code.
The following Python code runs TensorFlow on multiple GPUs. The same code
provides the option to construct a defined model in a multitower fashion, where
each tower is assigned to a different GPU. Use the following URL to get the code and
associated workflow.
The following Python code is used to test GPU availability for computation on Talos
II. Use link [36] to get the [Link], [Link], and [Link] files. Use the same files to test
GPU availability on an OpenPOWER CPU.
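A GPU availability check of the kind described above can be sketched as follows. This is not the code from [36]; it is a minimal example that assumes TensorFlow 2.x may or may not be installed and uses its `tf.config.list_physical_devices` call. The function name `list_gpus` is hypothetical.

```python
def list_gpus():
    """Return the names of GPUs visible to TensorFlow, or [] if none."""
    try:
        import tensorflow as tf  # optional dependency in this sketch
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = list_gpus()
print("GPUs found:", gpus if gpus else "none")
```

An empty list means either that TensorFlow is not installed or that it cannot see any GPU.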

4.2.4 Edge Native AI Hardware

DLtrain is used to handle AI inference workflows such as the following:

1. Inferencing by using the NN model
2. Inferencing by using the CNN model

The above AI inference workload is also ported to Android phones by using the
Android NDK. In addition, it is ported to many edge computer boards such as the Jetson
Nano. For example, see [37].
The J7 app is a popular example application on Android phones. Deployment of a
trained AI model on Android phones is handled in the J7 app, and the details
are given above.

4.2.5 On-Prem Requirement

The training platform is different from the deployment platform. This difference is an
obstacle to deploying a trained AI model (CNN, RNN networks, etc.) on a deployment
edge with limited capability. Mostly, there is a need to cut down the AI model size
or optimize the weights of the AI model. Such reduction of the model size or change
of the weights may not be needed if deployment on the edge side uses a
DGX Station A100. Enterprise business owners of all sizes are struggling to
find the next-generation AI solutions that will unlock the hidden patterns and value
in their huge volumes of data.
Fig. 4.6 Edge native AI workload for enterprise business

Emerging AI-enabled microservices in a given enterprise are driven by the
confluence of ML/DL algorithms. Figure 4.6 provides details on layers in DGX
Station A100. Enterprise on-prem requirements appear to match the specifications
of DGX Station A100, which provides high levels of accuracy in business
solutions. However, AI-enabled enterprise service initiatives are complex and often
require specialized skills, hardware, and software that are often not readily
available. The workflow involves the following steps:
1. Training data set creation (on-prem or in IBM Cloud or in Colab or any other)
2. Building an AI model by using TensorFlow or PyTorch, or by using
the custom framework DLtrain (for NN, limited version of CNN)
3. Training the AI model by using DGX Station A100
4. Deploying the AI model on the IoT edge for inference service in real time

4.2.6 DGX Station A100 for DL Networks

Enterprise customers have the option to train large models using a fully GPU-optimized
software stack and up to 320 gigabytes (GB) of GPU memory. With DGX
Station A100, an enterprise can provide multiple users with a centralized AI resource
for all workloads: training, inference, and data analytics. DGX Station A100 brings
AI out of the data center with a server-class system that can plug in anywhere to
perform real-time inference. DGX Station A100 uses the NVIDIA DGX™ software
stack and it is an ideal platform for teams from all enterprises, large and small.
By providing multiple, simultaneous users in data science teams with a
centralized AI resource, DGX Station A100 serves as the workgroup appliance for the
age of AI. It is capable of running training, inference, and analytics workloads in
parallel, and it can provide up to 28 separate GPU devices to individual data science
teams.
DGX Station A100 is an AI workgroup server delivering 2.5 petaFLOPS. With it,
organizations around the world can provide multiple users with a centralized AI
resource for all workloads. It offers an immediate on-ramp to NVIDIA DGX™-based
infrastructure and works alongside other NVIDIA-certified systems. A DGX Station
A100 rental is a new-generation enterprise offering with multi-instance GPU (MIG)
support, including four NVIDIA A100 Tensor Core GPUs, a top-of-the-line server-grade CPU,
superfast NVMe storage, and leading-edge PCIe Gen4 buses. DGX Station A100 includes remote
management so enterprise customers can manage it like a
server. With no complicated installation processes or significant IT infrastructure
required, the DGX Station A100 can be placed anywhere an enterprise
customer’s data science team requires complex computations. Simply plug the
station into any standard wall outlet to get it up and running in minutes, and work
from anywhere.
This supercomputer was truly designed for today’s agile data science teams that
work in corporate offices, labs, research facilities, or even from home as the DGX
Station A100 can run simultaneous processes from multiple users without affecting
performance.
NVIDIA DGX Station A100 offers the world’s only
office-friendly system with four fully interconnected and MIG-capable NVIDIA
A100 GPUs, leveraging NVIDIA® NVLink® to run parallel jobs and serve multiple
users without impacting system performance.
DGX Station A100 is a server-grade AI system that does not require data center
power and cooling. It includes four NVIDIA A100 Tensor Core GPUs, a top-of-
the-line, server-grade CPU, superfast NVMe storage, and leading-edge PCIe Gen4
buses, along with remote management so you can manage it like a server. It is
suitable for use in a standard office environment without specialized power and
cooling.

4.2.7 Deployment of AI in X86 Machine

Deployment of a trained CNN model on an X86 machine has the following items as part
of its deployment:
(a) Ubuntu 18.04 OS is used on the X86 deployment machine.
(b) Python is not required on the X86 deployment machine.
(c) TensorFlow is not required on the X86 deployment machine.
(d) FPGA (via PCI add-on) is not required on the X86 deployment machine.
(e) GPU (via PCI add-on) is not required on the X86 deployment machine.
(f) Item (b) and item (c) are always true.
(g) Item (e) may be true sometimes.
(h) Others.
Problem 4.2.1 Deployment of DLtrain application to train NN or CNN model
has a well-defined workflow. What are the items required in the following list
to successfully complete deployment of DLtrain for training a deep learning
network?
(a) User machine needs to install Docker (Ubuntu 18.04 X86 machine).
(b) User pulls “dltrain:1.0.0” docker image from Docker hub.
(c) User uses “dltrain:1.0.0” to train CNN model.
(d) CNN model definition is given in a txt file which is located in the user machine
(current working folder).
(e) Images folder is also required to be in the current working folder.
(f) Images and Network._prop.txt file are downloaded from Google Drive link
which is provided during demonstration.
(g) All items are true.
(h) Item (f) may not be true always.
4.2.8 Deployment of AI in Android Phone

Problem 4.2.2 Deployment of “trained CNN model in Android Phone” has the
following items as part of its workflow for successful completion of deployment:
(a) NDK is (in Android Studio) used to build inference engine which is developed
in C and C++.
(b) Inference engine is not using GPU in phone.
(c) Inference engine is not using DSP in phone.
(d) Inference engine is capable of using updated trained CNN model from Ubuntu
machine (X86) via WiFi.
(e) Inference engine is used to collect data from the user (in this case the LKG
student).
(f) Inference engine is showing inference output in display.

4.2.9 Deployment of AI in Rich Edge

The rich edge is expected to have Python with a virtual environment installed. If the CNN
model or NN model is small enough to fit within the constraints of the rich edge, then
deployment will go through successfully. If the available memory is too small
to load the trained CNN, the trained CNN model may need to be pruned
so that it fits in the rich edge.
If the rich edge lacks FP32 support, or if FP32 computation is costly, then
there is a need to move towards INT32 computation during inference.
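Moving from floating-point to integer computation usually starts with quantizing the trained weights. The sketch below shows one common scheme, symmetric linear quantization, under stated assumptions: the function names are hypothetical, the bit width is a parameter (the text's INT32 case corresponds to `bits=32`, while 8 bits is used here so the rounding is visible), and real deployment tool chains may use a different scheme.

```python
def quantize(weights, bits=8):
    """Symmetric linear quantization of float weights to signed integers."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [int(round(w / scale)) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]               # made-up example weights
q, scale = quantize(weights, bits=8)
restored = dequantize(q, scale)
print(q, [round(r, 4) for r in restored])
```

Each restored weight differs from the original by at most half a quantization step, which is the accuracy cost paid for integer arithmetic.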
Problem 4.2.3 Deploying the TensorFlow version of the NN or CNN model on the IoT
edge requires quantizing the trained NN or CNN model. This happens because
(a) The CNN model in Python is not ready to be deployed in IoT edge.
(b) The CNN model in TensorFlow is not ready to be deployed in IoT edge.
(c) The trained CNN model might have floating point weights.
(d) The trained CNN model might have too many neurons.
(e) IoT edge technology is very different from deep learning-based inference
technology.
(f) All the above are true.
(g) Others.

To set up the Jetson Nano AI computer, see the information given in [38].
AI Edge Computer: Run CUDA program in Jetson Series Devices.
Chapter 5
Data Set Design and Data Labeling

Data is the lifeblood of deep learning, and its design and
labeling are the artisans’ work.

5.1 Insight

This chapter presents “tricky” and advanced data processing techniques
in an accessible way. More importantly, it shows how
to read data from audio, speech, image, and text in different modes, along with techniques
for data sanitization and scaled data processing systems. It also explains
statistical methods for interpreting and analyzing data for different deep learning
models; the Maxwell-Boltzmann statistics technique is described in depth
with image signal processing, which demonstrates its main use in OpenCV libraries.
MNIST data handling is presented with training, test, and deployment mechanisms.
In addition, a novel technique, pixel normalization for image processing, is
presented, including global standardization, in support of prediction,
classification, sequence generation, and sequence classification [39, 40] (see
Fig. 5.1).

5.2 Description

A data set is an essential part of training deep learning networks. The following
paragraphs describe various sources of data, which form the basis for
modern data sets used to train deep learning networks and also machine
learning networks. The leftmost side of Fig. 5.2 indicates a low volume of data and
the rightmost side indicates a huge volume of data.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 49


J. Singaram et al., Deep Learning Networks,
[Link]

Fig. 5.1 Input to data set

[Figure 5.2 labels. Data types: Text, Audio, Speech, Image. Networks: RNN, CNN. Example tasks and sources: statistical machine translation; neural machine translation (seq2seq) tutorial; sentiment classification; image captioning; language translation; predicting the next word in a sentence; machine learning for audio (published in IEEE JSTSP); developing a 1D generative adversarial network by using Keras; developing a deep CNN for Fashion MNIST clothing classification; machine translation, speech recognition, and text summarization; Moral Machine]

Fig. 5.2 Data set size growth

5.3 Source of Data: Human and Machine

Human beings generate a great deal of data, and on top of this every digital device (IoT)
creates about 100 times more data than human beings do. Imagine that the 2000-plus
electricity transformers in a given city (for example, Chennai in Tamil
Nadu) produce a large volume of data every hour. It will be almost impossible
for an engineer in a power distribution substation or at a feeder to take a call by looking
at the huge amounts of data from each transformer. AI can reduce these data sizes
and provide good inference-level data to engineers to decide on load dispatch.
Healthcare is also emerging as a data-driven business segment, and a few companies
have already emerged in it.
Generated text data has been used to perform intelligence gathering for well-defined
objectives. Image data-based intelligence report generation and notification
services are becoming very popular at the enterprise level and are also getting into
consumer industries in the form of a doorbell with computer vision, advanced driver
assistance systems (ADAS) with computer vision, and many more applications in
the healthcare segment as well. A proposed workshop provides an introduction to
cognitive computing in multimedia applications.

5.4 Data Set Creation and Statistical Methods

As shown in Fig. 5.2, the collection of data might require different kinds of experiments
with the help of domain experts. These experiments are mostly statistical in nature, and
popular statistical experiments are discussed in the following sections.
Data set creation requires a good amount of knowledge of the given
domain. For example, the following diagram illustrates the kinds of signals and
associated networks that are used in deep learning models.
Data set size growth: early-stage deep learning networks used text data as
input, but major success came after image data sets were used in training models.
Speech and audio are also coming in as part of natural language processing and
many more associated applications. Advanced driver assistance systems (ADAS)
appear to be integrating real-time sensor data as well.
Posterior probability distribution is the probability distribution of an unknown
quantity, treated as a random variable, conditional on the evidence obtained from an
experiment or survey.
In Fig. 5.3, $k = 1$ for a Bernoulli trial, $k > 1$ for a binomial trial, and
$k = \infty$ for a Poisson trial.

Fig. 5.3 Poisson trials

[Figure 5.3 labels: Bernoulli trial; binomial experiment; Poisson trials; geometric distribution $P(k) = p(1 - p)^{k-1}$, the number of trials required to get the first success]

5.5 Statistical Methods

5.5.1 Bernoulli: Binary Classification of Data

A Bernoulli trial is a random experiment with exactly two possible outcomes, “success” and
“failure,” in which the probability of success is the same every time the experiment
is conducted. It is named after Jacob Bernoulli, a seventeenth-century Swiss
mathematician.
Let p be the probability of success in a Bernoulli trial and q be the probability
of failure. By definition these are complementary events, that is, “success”
and “failure” are mutually exclusive and exhaustive, so $p = 1 - q$ and $q = 1 - p$. These two
equations can be stated as $p + q = 1$:

$$
\begin{aligned}
p &= 1 - q \\
q &= 1 - p \\
p + q &= 1 \\
S &= \{x_1, x_2, \ldots, x_r\} \\
X(x_i) &= 1
\end{aligned}
\tag{5.1}
$$

Let S be the set of observables in a given experiment, that is, the possible outcomes of
each measurement, where $x_i$ is the observable at a given measurement. There are r
measurements included in the set S.
A random variable X is a measurable function from S to E, where S is the set of
possible outcomes. E is the measurable space, and in this case it can be assumed that 0
and 1 are the two elements of E:
$X(x_i) = 1$ or $X(x_i) = 0$

As shown above, X can take the value 1 or 0 in E. A probability mass function (PMF)
is a function that gives the probability that a discrete random variable is exactly
equal to some value. The following experiment is performed to create a data set.
Measure the respiration rate of an individual, which is the number of
breaths per minute. This experiment is conducted multiple times,
denoted as r, with the same sensor device to record the respiratory rate of the same
person.
$x_i$ takes a value from 0 to 200, where 0 is the minimum breathing rate and 200
is the maximum breathing rate.
Figure 5.4 provides a histogram obtained using the recorded samples, where the
horizontal axis is the breathing rate shown by the device during each
measurement. The vertical axis is the number of times each breathing rate was
recorded. Is it defined only at its values 1 and 0?
Fig. 5.4 Data set design: recorded samples

Figure 5.4 is one element in a data set which will be used in training an NN or
CNN. Each picture is associated with a label, where the label has two values,
“normal” and “abnormal.” For normal values, $x_i$ is in the range of 60 to 80.

$$
\begin{aligned}
P(X(x_i) = 1) &= p \\
P(X(x_i) = 0) &= 1 - p = q
\end{aligned}
\tag{5.2}
$$

$P(X(x_i) = 1)$ is the probability mass function. After r recordings, get one picture
and use it as an input in the training data set for the CNN.
Inference result = normal breathing rate (1) or abnormal breathing rate (0). The
result provides a notification of a person’s breathing condition, normal or
abnormal.
One sample is recorded for each trial. Number of samples per trial = 1.
In the above example, $E = \{0, 1\}$, a discrete case in which the experiment
provides a binary outcome. In general, E can include the values in an interval, for
example $E = [a, b]$, where a can be 0 and b can be 1.

$$
f(a) = P(X(x_i) = a) \tag{5.3}
$$

is a probability mass function, where $a = 1$ or $a = 0$, and

$$
F(a) = \sum_{x \le a} f(x) \tag{5.4}
$$

where F(a) uses the probability mass function to define the probability distribution
function, which can equally be defined as

$$
F(a) = P(X(x_i) \le a) \tag{5.5}
$$

The following uses the probability distribution function to define the probability mass
function f(a):

$$
f(a) = F(a) - \lim_{h \to 0^{+}} F(a - h) \tag{5.6}
$$

The breathing rate example is designed to use a probability mass function with a
binary outcome. $P(X = x_i)$ is a probability mass function, which differs from a
probability density function (PDF) in that the latter is associated with continuous
rather than discrete random variables. A PDF must be integrated over an interval to
yield a probability. The probability distribution of a random variable is a function
that takes the sample space as input and returns probabilities; in other words, it maps
possible outcomes to their probabilities.
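The relations between the probability mass function f(a) and the distribution function F(a) for the binary case can be checked numerically. The sketch below is illustrative only: the function names and the value p = 0.7 are hypothetical, and the limit in the last relation is approximated with a small positive step h.

```python
def bernoulli_pmf(a, p):
    """f(a) = P(X = a) for a Bernoulli variable with success probability p."""
    if a == 1:
        return p
    if a == 0:
        return 1.0 - p
    return 0.0


def bernoulli_cdf(a, p):
    """F(a) = sum of f(x) over the outcomes x in {0, 1} with x <= a."""
    return sum(bernoulli_pmf(x, p) for x in (0, 1) if x <= a)


def pmf_from_cdf(a, p, h=1e-9):
    """Recover f(a) as F(a) - F(a - h) for a small positive h."""
    return bernoulli_cdf(a, p) - bernoulli_cdf(a - h, p)


p = 0.7  # hypothetical probability of a normal breathing rate
print(bernoulli_cdf(0, p), bernoulli_cdf(1, p), pmf_from_cdf(1, p))
```

The jump of F at each outcome equals the probability mass at that outcome, which is what the last relation expresses.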

5.5.2 Binomial: Binary Classification of Data

A binomial experiment consists of a fixed number n of statistically independent
Bernoulli trials, each with a probability of success p, and counts the number of
successes. The binomial distribution gives the probability of exactly k successes
in the experiment; that is, it describes the number
of successes (k) in a sequence of n Bernoulli trials.

$$
P(k) = \binom{n}{k} p^{k} q^{n-k} \tag{5.7}
$$

where $\binom{n}{k}$ is the binomial coefficient.
X is a random variable defined on a sample space which has r measurements at
a time. The function X takes n inputs and provides one integer output. For example, if n
trials lead to three successes and the remaining $(n - 3)$ are failures, then the
output of X is 3.

$$
X : S \times S \times S \times \cdots \times S \to \mathbb{Z} \tag{5.8}
$$

Perform n trials, each with r recordings. After completing n trials, create a
histogram of the recorded data, as shown in Fig. 5.5.
A breath rate of 80 to 100 is normal and the rest is abnormal. In the Bernoulli
experiment $n = 1$ and in the binomial experiment $n > 1$. The rest of the workflow is
the same as in the data set creation for the Bernoulli trial.
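The binomial probability and the counting role of the random variable X can be sketched in a few lines. The function names and the sample readings are hypothetical; the normal range of 80 to 100 is the one stated in this section.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(k) = C(n, k) * p**k * q**(n - k), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, k) * p**k * q ** (n - k)


def count_successes(trials, low=80, high=100):
    """X maps n recorded trials to the number of 'normal' readings."""
    return sum(1 for rate in trials if low <= rate <= high)


readings = [85, 120, 90, 60, 95]        # made-up breathing rates, n = 5
k = count_successes(readings)           # three readings fall in the normal range
print(k, binomial_pmf(k, len(readings), 0.5))
```

Summing `binomial_pmf(k, n, p)` over k from 0 to n gives 1, which is a quick sanity check on any implementation.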
Fig. 5.5 Data set design: binary mode

$P(X(x_i) = 1)$ is the probability mass function. After r recordings, get one picture
and use it as an input in the training data set for the CNN.
Inference result = normal breathing rate (1) or abnormal breathing rate (0).
The result provides a notification of a person’s breathing condition,
normal or abnormal.
r samples are recorded for each trial. Number of samples per trial $r > 1$.

5.5.3 Poisson: Binary Classification of Data

The Poisson distribution expresses the probability of a given number of events occurring
in a fixed interval of time. It supports tasks such as
finding the probability of a number of events in a time period or finding the
probability of waiting some time until the next event. The Poisson process is a
stochastic process that is continuous in time but discrete in space.
Poisson sampling is a process where each element of the population is subjected
to an independent Bernoulli trial. Because the Poisson distribution is discrete, it is
described by a probability mass function and not a density function. Figure 5.6 provides a sample
plot in which the x-axis is the integer count of events (there are no
fractional events). The vertical coordinate gives the probability of that number of events
happening. Each curve is associated with a specific lambda.
The probability that infinitely many events happen in a given
time period is zero. Thus, the above graph is valid for a sample space in which the
period T is fixed. For a different T and the same lambda, the curve can
be very different. This is critical for data set design.
Fig. 5.6 Events vs. associated probability

Table 5.1 Bernoulli vs. Poisson trial

Trial no    Bernoulli trial    Poisson trial
1           p, q               $p_1$, $q_1$
2           p, q               $p_2$, $q_2$
3           p, q               $p_3$, $q_3$
...         ...                ...
n           p, q               $p_n$, $q_n$

Table 5.1 provides a detailed comparison, where
p is the probability of a normal breathing rate (assigned value is 1),
q is the probability of an abnormal breathing rate (assigned value is 0),
$p_1$ is the probability of a normal breathing rate (assigned value is 1) during the first trial,
$q_1$ is the probability of an abnormal breathing rate (assigned value is 0) during the first trial,
$p_n$ is the probability of a normal breathing rate (assigned value is 1) during the nth trial,
$q_n$ is the probability of an abnormal breathing rate (assigned value is 0) during the nth trial,
$p_1$ is not equal to $p_2$ and so on, and
$q_1$ is not equal to $q_2$ and so on.

$$
\begin{aligned}
P(X(x_i) = 1) &= p_i \\
P(X(x_i) = 0) &= 1 - p_i = q_i
\end{aligned}
\tag{5.9}
$$

The above shows that the probability $p_i$ might be different from $p_{i+1}$.
Let $\lambda$ be the expected number of events in the interval. Then

$$
P(k) = \frac{\lambda^{k} e^{-\lambda}}{k!} \tag{5.10}
$$

The above is the probability for k events to happen in a given interval.
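Both ideas in this section can be sketched numerically: the Poisson probability of k events for a given lambda, and the distribution of the number of successes in a sequence of Bernoulli trials with unequal probabilities $p_1, \ldots, p_n$. The function names and the sample probabilities are hypothetical and chosen only for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(k) = lam**k * exp(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)


def poisson_trial_pmf(probs):
    """Distribution of the success count in independent Bernoulli trials
    with unequal success probabilities p_1..p_n (Poisson trials)."""
    dist = [1.0]  # before any trial: zero successes with probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        dist = nxt
    return dist


print(poisson_pmf(2, 3.0))
print(poisson_trial_pmf([0.2, 0.7]))
```

When all probabilities in `probs` are equal, `poisson_trial_pmf` reduces to the binomial distribution of the previous section.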


For example, people from different communities in a given geography may have
small probability differences in the number of heart attacks each year. Analysis of
heart attacks often involves Poisson trials, as the probabilities change with each
month or year. When the probabilities are unequal across the Bernoulli trials of
independent events, the sequence becomes a Poisson trial.
Number of samples per trial $r = \infty$.
The above constraint leads to many challenges in collecting samples. In practice,
an infinite number of samples is approximated by a sufficiently
high number of samples. A Poisson trial is a collection of Bernoulli trials with
unequal probabilities.
Let us assume $T_i$ is a period.
$r_1$ is the number of breathing beats in the given period $T_1$, and $r_2$ is the number of
breathing beats in the given period $T_2$.
What is the number of breathing beats for $T_i$,
where $T_i$ is in $[T_1, T_2]$?
The above guidelines are used to create a data set which can be used to train a
CNN. The trained CNN can then be used for inference. For example, the above
question is transformed into an inference question by using the trained CNN.
Perform n trials, each with r recordings. After completing n trials, create a
histogram of the recorded data; a breath rate of 80 to 100 is normal, and the rest is abnormal.
In the Bernoulli experiment $n = 1$ and in the binomial experiment $n > 1$. The rest of
the workflow is the same as in the data set creation for the Bernoulli trial.
Problem 5.5.1 80 beats are recorded (as an average) in 1 min. What will be the
number of beats in 5 min?
$\lambda_1 = 80$ and $t_1 = 1$
$\lambda_2 = ?$ and $t_2 = 5$

$$
\frac{\lambda_2}{\lambda_1} = \frac{t_2}{t_1} \implies \lambda_2 = \lambda_1 \frac{t_2}{t_1} \tag{5.11}
$$

The above steps are useful to create a data set for many new lambda values from
a given set of lambda values.
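The scaling relation of the problem is a one-line computation. The sketch below uses a hypothetical function name and applies the relation to the numbers given in the problem.

```python
def scale_rate(lam1, t1, t2):
    """lambda_2 = lambda_1 * t2 / t1."""
    return lam1 * t2 / t1


print(scale_rate(80, 1, 5))  # expected number of beats in 5 min: 400.0
```

Applying the same function to a list of target periods produces the new lambda values needed when extending a data set.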

5.6 Image Signal Processing

5.6.1 Image Data and Maxwell-Boltzmann Statistics

Gaussian blur, also known as Gaussian smoothing, is the result of blurring
an image by a Gaussian function. It is used to reduce image noise and reduce
detail. The visual effect of this blurring technique is similar to looking at an
image through a translucent screen. It is sometimes used in computer vision for
image enhancement at different scales or as a data augmentation technique in deep
learning. The basic Gaussian function, written here in its Laplacian of Gaussian (LoG)
form, looks like

$$
\mathrm{LoG}(x, y) = -\frac{1}{2\pi\sigma^{4}} \left[ 1 - \frac{x^{2} + y^{2}}{2\sigma^{2}} \right] e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}
$$

[Figure 5.7: five image panels numbered 1 to 5; pixel axes run from 0 to 500 (horizontal) and 0 to 400 (vertical)]

Fig. 5.7 2D filter use in image processing


In practice, it is best to take advantage of the Gaussian blur’s separable property
by dividing the process into two passes. In the first pass, a one-dimensional kernel
is used to blur the image in only the horizontal or vertical direction. In the second
pass, the same one-dimensional kernel is used to blur in the remaining direction.
The resulting effect is the same as convolving with a two-dimensional kernel in a
single pass. The following example illustrates the application of Gaussian
filters to image enhancement. When a normally distributed filter is
applied to an image, the results look like those given in Fig. 5.7.
Image 1 in Fig. 5.7 is the input image.
Image 2 in Fig. 5.7 is a gray version of the input image. Image signal processing
algorithms are used to turn a complex problem into an easily computable problem.
Image 3 in Fig. 5.7 is the 2D filtered version of the input image. It is a computable
problem, but the complexity is high.
Image 4 in Fig. 5.7 is the output after drawing a circle on each detected object
(mostly a point in the image).
Image 5 in Fig. 5.7 is a version that is filtered via object detection. It is
used for counting the number of stars in a given input image. Mostly, Image 5 is the
result of a computable problem.
The source code for the above example is given in [41].
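The separable property can be verified numerically. The sketch below is not the code from [41]; it is a minimal NumPy illustration, with hypothetical function names and zero padding at the borders as an assumption, that compares the two-pass blur against a direct blur with the two-dimensional kernel.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x**2) / (2.0 * sigma**2))
    return k / k.sum()


def blur_separable(image, kernel):
    """Blur rows first, then columns, with the same 1D kernel."""
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


def blur_2d(image, kernel):
    """Direct single-pass blur with the 2D kernel outer(kernel, kernel)."""
    k2d = np.outer(kernel, kernel)
    r = len(kernel) // 2
    padded = np.pad(image, r)  # zero padding, matching mode="same" above
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + 2 * r + 1, j:j + 2 * r + 1]
            out[i, j] = np.sum(window * k2d)
    return out


image = np.random.default_rng(0).random((8, 10))   # stand-in for a gray image
kernel = gaussian_kernel_1d(sigma=1.0, radius=2)
print(np.allclose(blur_separable(image, kernel), blur_2d(image, kernel)))
```

For a kernel of width w, the two-pass version needs about 2w multiplications per pixel instead of w squared, which is the practical reason for preferring it.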
5.6.2 Working with Image Files

1. OpenCV library
2. PIL library
3. URLLIB library
4. matplotlib library
5. pickle module
6. skimage library
The following URL provides code to read the Fashion MNIST data set. The
data set is read both from the URL and from the local PC [42].
It is an order 5 tensor, and the dimensions are BatchSize × Depth × Height ×
Width × Channels.

MNIST Data Set Handling

MNIST is a data set of 60,000 28 × 28 grayscale images of the 10 digits, along with a test set of
10,000 images. The original black and white images of MNIST were converted
to grayscale with dimensions of 28 × 28 pixels in width and height, making a total of
784 pixels per image. Pixel values range from 0 to 255, where higher numbers indicate
darker pixels and lower numbers lighter pixels.
[43] provides details on the MNIST data set file format and the necessary
code to read an MNIST file.
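The MNIST image files use the IDX format: a 16-byte big-endian header (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. The sketch below is not the code from [43]; it parses that layout from a byte buffer, and it uses a tiny synthetic buffer in place of a real file so that it runs without any download. The function name is hypothetical.

```python
import struct


def read_idx_images(buf):
    """Parse an IDX image buffer: magic, count, rows, cols, then pixels."""
    magic, count, rows, cols = struct.unpack(">IIII", buf[:16])
    if magic != 2051:  # 0x00000803 marks an image file
        raise ValueError("not an IDX image file")
    size = rows * cols
    images = []
    for n in range(count):
        start = 16 + n * size
        images.append(list(buf[start:start + size]))  # pixel values 0..255
    return rows, cols, images


# Synthetic buffer standing in for a real file: two 2 x 2 images.
fake = struct.pack(">IIII", 2051, 2, 2, 2) + bytes([0, 255, 128, 64, 1, 2, 3, 4])
rows, cols, images = read_idx_images(fake)
print(rows, cols, images)
```

For a real MNIST training file the same function would report 28 rows and 28 columns, with each image holding 784 pixel values.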
After adding and testing one file, use all the other files from your own
data set. Pixel values are often unsigned integers in the range between 0 and 255.
Although these pixel values can be presented directly to neural network models in
their raw format, this can result in challenges during modeling, such as slower than
expected training of the model. Instead, there can be a great benefit in preparing
the image pixel values prior to modeling, ranging from simply scaling pixel values
to the range 0–1 to centering and even standardizing the values. Scaling to the range 0–1 is called
normalization and can be performed directly on a loaded image, for example one
loaded with the PIL library (the standard image handling library in Python).
Problem 5.6.1 The MNIST data set is used in DLtrain to train the NN or CNN model.
The MNIST data set has a well-defined format, which governs its use in DLtrain
to train the NN or CNN model. From the following list, identify the items that are valid for
the MNIST data set:
(a) MNIST data set includes 70,000 images.
(b) 28 × 28 is the image size used in MNIST.
(c) Each pixel has a 0 or 1 value in the MNIST image file.
(d) Images for 0,1,2,3,4,5,6,7,8,9 are given in the MNIST data set.
(e) Each pixel has a gray (0 to 1) value in the MNIST image file.
(f) All of the above are true but item (e).
(g) All of the above are true but item (c).
5.6.3 Pixel Normalization

This section covers how to normalize pixel values to a range between zero and one.
Use [44] to access the source code and perform normalization on a given image file.
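As a minimal sketch of the idea (not the code from [44]), pixel normalization divides 8-bit values by 255. A small synthetic array stands in for a loaded image here, and the function name is hypothetical.

```python
import numpy as np


def normalize_pixels(pixels):
    """Scale unsigned 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return pixels.astype("float32") / 255.0


image = np.array([[0, 128], [64, 255]], dtype=np.uint8)  # stand-in for a loaded image
scaled = normalize_pixels(image)
print(scaled.dtype, scaled.min(), scaled.max())
```

Converting to a floating-point type before dividing matters, because integer division would collapse almost every pixel to zero.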

5.6.4 Global Centering

This section covers how to center pixel values both globally across channels and
locally per channel. Use URL [44] to get the source code for global centering.

5.6.5 Global Standardization

This section covers how to standardize pixel values and how to shift standardized
pixel values to the positive domain.
Use URL [44] to get the source code for global standardization.
The given example calculates the mean and standard deviation across all color
channels in the loaded image and then uses these values to standardize the pixel
values.
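Centering and standardization can be sketched with NumPy on a height × width × channels array. This is not the code from [44]; the function names are hypothetical, and the way standardized values are moved to the positive domain (clip to [-1, 1], then rescale to [0, 1]) is one common choice assumed here for illustration.

```python
import numpy as np


def center_global(pixels):
    """Subtract one mean computed across all channels."""
    return pixels - pixels.mean()


def center_per_channel(pixels):
    """Subtract a separate mean for each channel (last axis)."""
    return pixels - pixels.mean(axis=(0, 1))


def standardize_global(pixels):
    """Zero mean and unit standard deviation across all channels."""
    return (pixels - pixels.mean()) / pixels.std()


def shift_positive(standardized):
    """Assumed scheme: clip to [-1, 1], then rescale to [0, 1]."""
    clipped = np.clip(standardized, -1.0, 1.0)
    return (clipped + 1.0) / 2.0


image = np.random.default_rng(0).random((4, 5, 3)) * 255.0  # stand-in image
standardized = standardize_global(image)
print(round(float(standardized.mean()), 6), round(float(standardized.std()), 6))
```

Per-channel centering is the variant to use when the color channels have clearly different brightness levels.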

5.7 Data Set: Read and Store

5.7.1 Data Set with Label Data

Mostly, ML-based networks use data with labels during the training and testing phases.
Image classification networks (DL based) also use labeled data for training and
testing. Labeling data is mostly manual, and it is driving a very vibrant
industry. There are companies providing data labeling services. With
computer vision, a mix of manual and partly automatic labeling is also entering the
data labeling workflow.
The volume of a data set determines the storage options. For example, a local machine is
the best option to store a small data set, but a high-volume data set requires an on-prem
data center or cloud data center. Moreover, different
file systems are coming up to handle the distributed nature of data storage. In fact,
this is a vibrant segment with new inventions appearing in every financial
quarter of the business cycle.
Stored data must be read in order to train DL/ML networks. For this there are
many methods and tool sets emerging. PyTorch, TensorFlow, etc. provide methods
to handle data set reading, and for reading large volumes of data,
vendor-specific tool sets and services are emerging.
5.7.2 Working with CSV Files

Load a file directly using the NumPy function loadtxt(). In the example data set there are
eight input variables and one output variable (the last column). Once loaded, the
data set can be split into input variables (X) and the output class variable (Y).
Source code for reading a CSV file is given in [45].
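The loading and splitting steps can be sketched as follows. This is not the code from [45]; an in-memory string with made-up values stands in for the CSV file, so the example runs without any data file.

```python
import io

import numpy as np

# Made-up sample: eight input values followed by one class label per row.
csv_text = "1,2,3,4,5,6,7,8,1\n8,7,6,5,4,3,2,1,0\n"

data = np.loadtxt(io.StringIO(csv_text), delimiter=",")
X = data[:, 0:8]   # input variables
Y = data[:, 8]     # output class variable (last column)
print(X.shape, Y.shape)
```

With a real file, the `io.StringIO(...)` argument is replaced by the file path.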

5.8 Audio Signal Processing

Text to speech synthesis (TTS) uses deep learning networks to synthesize high-quality
speech for a given text:
1. Text is input.
2. Normalization.
3. Text preprocessing.
4. Phoneme (database of phoneme for a given word and also given language).
5. Acoustic model for given phoneme.
6. Speech waveform is output.
In the above, step 5 uses deep learning networks to synthesize speech, while
steps 1 to 4 provide the processed data for step 5.
Speech data set creation requires Mel spectrogram computation for a given
phoneme.
Related topics include concatenation synthesis (words, syllables, diphones, or even
individual phones), statistical parametric synthesis (HMM), speech synthesis
evaluation (MOS), and speech synthesis with deep learning.

5.8.1 Speech Synthesis by Using Deep Learning Networks

Problem 5.8.1 Let Y be the input text sequence; the target speech X can be derived by

$$
X = \arg\max_{X} P(X \mid Y, \theta)
$$

where $\theta$ is the model parameter. Create a data set by using speech signals and
train a deep learning network model such that the trained model is
used to estimate the above X.
A neural vocoder achieves the encoding/decoding using a neural network. Examples
of neural approaches include GAN-based TTS and EATS (end-to-end adversarial
text-to-speech by DeepMind). EATS
operates on pure text or raw phoneme sequences and produces raw waveform as
output [46–48].
5.9 Data Set by Using PCAP File and Stream to Tensor

IP stream data is stored in a PCAP file. Overall, combining IP stream analysis with
deep learning can lead to more accurate and effective tools for network analysis and
security.
PCAP file-based data must be transformed into a tensor for TensorFlow or
for PyTorch.
Refer to file [Link] in [49] for information on the flow capture tool
tcpdump.
Figure 5.8 shows the workflow for capturing data from the IP network.
The code in file [Link] in [50] is used to convert a PCAP file into a CSV
file, which is then used as a data set to train the deep learning network
model.

Fig. 5.8 Data stream in IP networks


Chapter 6
Model of Deep Learning Networks

Deep learning models are the architects of artificial intelligence, shaping its intelligence and character.

6.1 Insight

This chapter primarily addresses the design and development of deep learning models. Deep learning networks are emerging as another tool set for modeling a given physical process [51–53]. Observed data from a given physical process is used in the design and development of deep learning networks. A deep learning network that represents a given data set is associated with a probability distribution for that data set. A neural network can be used to model a Boltzmann machine, but training a Boltzmann machine is still an open problem. Thus, the restricted Boltzmann machine [54] is trending as a way ahead, and it is used in modeling neural networks, convolutional neural networks, etc. The restricted Boltzmann machine uses a Bayes network, and data collection is required to support the model parameters. The innovation of the CNN has provided a tool set for modeling observed data. The Brooks–Iyengar algorithm [55, 56] provides methods and apparatus to solve a special class of Boltzmann machine that is in line with the multilayer perceptron (MLP). The design of a deep learning network uses NN, CNN, RNN, etc. to model a network. The development of deep learning networks [57] requires training the NN, CNN, RNN, etc. by using a data set. Finding a probability distribution for given data is defined as a computability problem in the sense of Kolmogorov computability [58]. Back propagation is one class of algorithms, and it leads to suboptimal deep learning networks. Pre-trained deep learning networks serve as a starting point for training a network with an additional data set. Compression (quantization of biases and weights, pruning) of a trained deep learning network also appears to be critical for its successful deployment in a given IoT native device or cloud native system. These items are discussed in this chapter, but there is still scope to add much more detail on Kolmogorov complexity and on the use of Pontryagin duality [59–61] to handle Kolmogorov complexity.


6.2 Data and Model

Deep learning starts with the following set of questions:

1. How can a model of given binary data be found?
2. What is the need to model binary data?
3. What can one do with a model?
4. What can be done with a model that emerged from binary data?
5. How is probability assigned to each binary vector?
6. How is the above connected to the weights of a Boltzmann machine?

The main concern appears to be gathering existing data and utilizing deep learning networks to learn a new capability. For example, the following list includes a few problems that attract considerable research interest in deep learning network model design:

1. Sequence (set with order)
2. Sequence prediction
3. Sequence classification
4. Sequence generation
5. Sequence to sequence prediction

6.2.1 Sequence Prediction

Predicting the next value or kth value from the present value for a given input
sequence.
Weather forecasting: Given a sequence of observations about weather over time,
predict the expected weather tomorrow.
Stock market prediction: Given a sequence of movements of a security over time,
predict the next movement of the security.
Product recommendation: Given a sequence of past purchases of a customer,
predict the next purchase of a customer.
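A common way to turn a sequence into training examples for this kind of prediction is a sliding window. The sketch below is one illustrative arrangement, not a method prescribed by this chapter; the function name make_windows, the window length, and the temperature values are assumptions made for the example.

```python
def make_windows(sequence, window, k=1):
    """Return (inputs, targets): each input holds `window` consecutive values,
    and its target is the value k steps after the end of that window."""
    inputs, targets = [], []
    for start in range(len(sequence) - window - k + 1):
        end = start + window
        inputs.append(sequence[start:end])
        targets.append(sequence[end + k - 1])
    return inputs, targets

# Made-up daily temperature observations.
temperatures = [21.0, 22.5, 23.1, 22.8, 24.0, 25.2]

# Predict the next value (k = 1) from the three most recent values.
inputs, targets = make_windows(temperatures, window=3, k=1)
for x, y in zip(inputs, targets):
    print(x, "->", y)
```

Setting k to a larger value gives the "kth value" variant of the same problem.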

6.2.2 Sequence Classification

Predicting class label for a given input sequence. The input sequence may be
comprised of real values or discrete values.
DNA sequence classification: Given a DNA sequence of ACGT values, predict whether the sequence belongs to a coding or noncoding region.
Anomaly detection: Given a sequence of observations, predict whether the
sequence is anomalous or not.

Sentiment analysis: Given a sequence of text (e.g., a tweet), predict whether the sentiment of the text is positive or negative.

6.2.3 Sequence Generation

Generating a new output sequence that has the same general characteristics as other
sequences in the corpus.
Text generation: Given a corpus of text, such as the works of Shakespeare,
generate new sentences or paragraphs of text that read like Shakespeare.
Handwriting prediction: Given a corpus of handwriting examples, generate
handwriting for new phrases that have the properties of handwriting in the corpus.
Music generation: Given a corpus of examples of music, generate new musical pieces that have the properties of the corpus.
Image caption generation: Given an image as input, generate a sequence of words that describes the image.

6.2.4 Sequence to Sequence Prediction

Sequence to sequence prediction is a subtle but challenging extension of sequence prediction in which, rather than predicting a single next value in the sequence, an entire output sequence is predicted.
Multistep time series forecasting: Given a time series of observations, predict a
sequence of observations for a range of future time steps.
Text summarization: Given a document of text, predict a shorter sequence of text
that describes the salient parts of the source document.
Program execution: Given the textual description of a program or mathematical equation, predict the sequence of characters that describes the correct output.

6.3 Data and Probability Model

6.3.1 Measurement and Probability Distribution

Measurements of any kind, in any experiment, are always subject to uncertainties or errors, as they are more often called. The measurement process is, in fact, a random process described by an abstract probability distribution whose parameters contain the information desired. The results of a measurement are then samples from this distribution, which allow an estimate of the theoretical parameters. In this view, measurement errors can be seen as sampling errors. Most observable phenomena are random in nature, and each is termed a random process or random experiment.
Random processes have outcomes, and subsets of these outcomes are called events. These events are mapped to a numeric form by using random variables.

Fig. 6.1 Observed data and associated model

Stochastic models predict the output of an event by providing different choices (of values of a random variable) and the probability of those choices.
If a distribution has unknown (not yet inferred) parameters, then it leads to a family of distributions. Each value of the parameter gives a different distribution. This family is called a statistical model with parametrization. For example, the Bernoulli, binomial, and exponential families are statistical models.
The term “probability model” (probabilistic model) is usually an alias for a stochastic model [51]. Figure 6.1 provides a link between observed data and the associated model. Such a model consists of:
1. Different choices (of values of a random variable)
2. The probability of those choices
A probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value. A probability mass function differs from a probability density function (PDF) in that the latter is associated with continuous rather than discrete random variables.

$$P(Y = y, X = x) = P(X = x)\,P(Y = y \mid X = x) \tag{6.1}$$

The probability distribution of a random variable is a function that takes the sample space as input and returns probabilities: In other words, it maps possible outcomes to their probabilities. The joint probability distribution is useful in cases where we are interested in the probability that X takes a specific value x while Y takes another specific value y. For instance, what would be the probability of getting a 1 with the first die and a 2 with the second die? The probabilities corresponding to every pair of values are written P(X = x, Y = y) or P(x, y). This is what we call the joint probability. P(Y = y | X = x) describes the conditional probability: It is the probability that the random variable Y takes the specific value y given that the random variable X took the specific value x.
It is different from P(Y = y, X = x), which corresponds to the probability of getting both the outcome y for the random variable Y and x for the random variable X. In the case of conditional probability, the event associated with the random variable X has already produced its outcome (x).

The probability that the random variable Y takes the value y given that the random variable X took the value x is the ratio of the probability that both events occur (Y takes the value y and X takes the value x) to the probability that X takes the value x:

$$P(Y = y \mid X = x) = \frac{P(Y = y, X = x)}{P(X = x)} \tag{6.2}$$
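The two-dice question posed earlier can be checked by direct enumeration. The following sketch assumes two fair six-sided dice and uses exact fractions; the function names joint, marginal_x, and conditional are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes (x, y) of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
p_each = Fraction(1, len(outcomes))  # every pair is equally likely

def joint(x, y):
    """P(X = x, Y = y)."""
    return sum(p_each for (a, b) in outcomes if a == x and b == y)

def marginal_x(x):
    """P(X = x), obtained by summing the joint probability over all y."""
    return sum(joint(x, y) for y in range(1, 7))

def conditional(y, x):
    """P(Y = y | X = x) = P(Y = y, X = x) / P(X = x)."""
    return joint(x, y) / marginal_x(x)

p_joint = joint(1, 2)       # a 1 on the first die and a 2 on the second
p_cond = conditional(2, 1)  # a 2 on the second die, given a 1 on the first
print(p_joint, p_cond)
```

Multiplying marginal_x(1) by conditional(2, 1) recovers joint(1, 2), which is the product form of the same relationship.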

It may be more intuitive to look at it in another direction, as in the following:

$$S = x_1, x_2, \ldots, x_r \in [0, 1] \tag{6.3}$$

Posterior, in this context, means after taking into account the relevant evidence
related to the particular case being examined. The posterior probability distribution
is the probability distribution of an unknown quantity, treated as a random variable,
conditional on the evidence obtained from an experiment or survey.
Figure 6.2 provides details on increasing complexity in probability models on the
right side and on the left side it provides details on models for inference. “k” trials
are required to obtain the first success in geometric distribution.
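The geometric distribution mentioned above, with probability p(1 − p)^(k−1) that the first success occurs on trial k, can be tabulated in a few lines. The success probability 0.25 and the cutoff of 200 trials are arbitrary values chosen for this sketch.

```python
def geometric_pmf(k, p):
    """P(k) = p * (1 - p)**(k - 1): the first success occurs on trial k."""
    if k < 1:
        raise ValueError("k counts trials and must be at least 1")
    return p * (1.0 - p) ** (k - 1)

p_success = 0.25  # illustrative success probability per Bernoulli trial
trials = range(1, 201)
pmf = [geometric_pmf(k, p_success) for k in trials]

total = sum(pmf)                                       # close to 1
mean_trials = sum(k * q for k, q in zip(trials, pmf))  # close to 1 / p_success
print(total, mean_trials)
```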

Fig. 6.2 Probability distribution (figure labels: Bernoulli trial; binomial experiment; geometric distribution, P(k) = p(1 − p)^(k−1); Poisson trials; Bayesian network, also known as a Bayes network, belief network, or decision network; restricted Boltzmann machine; Boltzmann machine)



6.4 Boltzmann Distribution

Bayesian networks are directed acyclic graphs whose nodes represent variables in the Bayesian sense; they may be observable quantities, latent variables, unknown parameters, or hypotheses [52]. Edges represent conditional dependencies; nodes that are not connected (no path connects one node to another) represent variables that are conditionally independent of each other. Each node is associated with a probability function that takes, as input, a particular set of values for the node’s parent variables and gives (as output) a probability distribution.
A Bayesian network is thus a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph. The Boltzmann distribution is a probability distribution that gives the probability of a state as a function of the state’s energy and the temperature of a system [53].
Gibbs sampling is applicable when the joint distribution is not known explicitly or is difficult to sample from directly, but the conditional distribution of each variable is known. The Gibbs sampling algorithm [62] generates an instance from the distribution of each variable in turn, conditional on the current values of the other variables. Gibbs sampling is particularly well adapted to sampling the posterior distribution of a Bayesian network, since Bayesian networks are typically specified as a collection of conditional distributions.
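A minimal sketch of Gibbs sampling for two binary variables follows. So that the result can be checked, the conditionals here are derived from a small joint table that is made up for the example; in practice only the conditionals would be available. The names gibbs, p_a1_given_b, and p_b1_given_a are illustrative.

```python
import random

# Illustrative joint distribution over two binary variables (a, b).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def p_a1_given_b(b):
    """P(a = 1 | b), derived from the joint table."""
    return joint[(1, b)] / (joint[(0, b)] + joint[(1, b)])

def p_b1_given_a(a):
    """P(b = 1 | a), derived from the joint table."""
    return joint[(a, 1)] / (joint[(a, 0)] + joint[(a, 1)])

def gibbs(n_samples, burn_in=500, seed=0):
    """Sample each variable in turn, conditional on the current value of the other."""
    rng = random.Random(seed)
    a, b = 0, 0
    counts = {state: 0 for state in joint}
    for step in range(burn_in + n_samples):
        a = 1 if rng.random() < p_a1_given_b(b) else 0
        b = 1 if rng.random() < p_b1_given_a(a) else 0
        if step >= burn_in:
            counts[(a, b)] += 1
    return {state: c / n_samples for state, c in counts.items()}

estimate = gibbs(20000)
print(estimate)  # empirical frequencies approximate the joint table
```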
Maxwell–Boltzmann Statistics
The original derivation in 1860 by James Clerk Maxwell was an argument based on
molecular collisions of the kinetic theory of gases as well as certain symmetries in
the speed distribution function.
Maxwell also gave an early argument that these molecular collisions entail a
tendency towards equilibrium. After Maxwell, Ludwig Boltzmann in 1872 also
derived the distribution on mechanical grounds and argued that gases should over
time tend towards this distribution, due to collisions.
Boltzmann later (1877) derived the distribution again under the framework of statistical thermodynamics. The starting point here is the result known as Maxwell–Boltzmann statistics (from statistical thermodynamics). Maxwell–Boltzmann statistics gives the average number of particles found in a given single-particle microstate. Under certain assumptions, the negative logarithm of the fraction of particles in a given microstate is proportional to the ratio of the energy of that state to the temperature of the system:
 
$$-\log\left(\frac{N_i}{N}\right) \propto \frac{E_i}{T} \tag{6.4}$$

The assumptions in this equation are that the particles do not interact and that
they are classical.
Each particle’s state can be considered independently from the other particles’
states. Additionally, the particles are assumed to be in thermal equilibrium. This

relation can be written as an equation by introducing a normalizing factor:


$$\frac{N_i}{N} = \frac{e^{-E_i/kT}}{\sum_{j=1}^{N} e^{-E_j/kT}} \tag{6.5}$$

where
1. N_i is the expected number of particles in the single-particle microstate i
2. N is the total number of particles in the system
3. E_i is the energy of microstate i
4. The sum over index j takes into account all microstates
5. T is the equilibrium temperature of the system
6. k is the Boltzmann constant
The denominator in the equation is a normalizing factor so that the ratios N_i : N add up to unity; in other words, it is a kind of partition function (for the single-particle system, not the usual partition function of the entire system).
Because velocity and speed are related to energy, the equation can be used to derive relationships between temperature and the speeds of gas particles. All that is needed is to discover the density of microstates in energy, which is determined by dividing up momentum space into equal-sized regions.
In the given context, E_i represents energy, T stands for temperature, and k represents the Boltzmann constant. In a sensor network, the term T is associated with the noise generation term in a measurement process. If T is 0, then the measurement is clean (which may not be true in the real world of sensing).
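The normalized form in Eq. (6.5) can be evaluated numerically for a small set of states. In the sketch below, the three energies and the two values of kT are illustrative, and kT is passed as a single number rather than as separate k and T.

```python
import math

def boltzmann(energies, kT):
    """Return exp(-E_i / kT) / sum_j exp(-E_j / kT) for each state i."""
    e_min = min(energies)  # shifting all energies leaves the ratios unchanged
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)       # normalizing factor (partition function)
    return [w / z for w in weights]

energies = [0.0, 1.0, 2.0]              # illustrative state energies
hot = boltzmann(energies, kT=100.0)     # high temperature: nearly uniform
cold = boltzmann(energies, kT=0.05)     # low temperature: lowest energy dominates
print(hot)
print(cold)
```

As kT shrinks, almost all probability mass moves to the lowest-energy state, which matches the reading of T as a noise term: less noise leaves less spread over states.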
Figure 6.3 contrasts supervised and unsupervised deep learning models. A Bayesian network, as shown in Fig. 6.4, is a directed acyclic graph (DAG) whose nodes represent variables in the Bayesian sense; its nodes, its edges, and its sampling by the Gibbs algorithm are as described at the beginning of this section.

Fig. 6.3 Supervised learning (figure labels: supervised DL models; unsupervised DL models; artificial neural networks (ANNs), recurrent neural networks (RNNs), convolutional neural networks (CNNs); Bayesian network, also known as a Bayes network, belief network, or decision network; restricted Boltzmann machine; Boltzmann machine; self-organizing maps (SOMs), autoencoders)

Fig. 6.4 Bayesian networks

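In terms of probabilities, the Boltzmann distribution is written as

$$p_i = \frac{e^{-E_i/kT}}{\sum_{j=1}^{N} e^{-E_j/kT}} \tag{6.6}$$

where p_i is the probability of state i, E_i is energy, T is temperature, and k is the Boltzmann constant.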



In a sensor network, the term T in Eq. (6.6) is associated with the noise generation term in a measurement process. If T is 0, then the measurement is clean (which may not be true in the real world of sensing).
Gibbs sampling, illustrated in Fig. 6.5, applies when the conditional distribution of each variable is known and is easy (or at least easier) to sample from than the joint distribution. Given an input vector v, p(h|v) is used for the prediction of the hidden values h. Knowing the hidden values, p(v|h) is used for the prediction of new input values v. This process is repeated k times. After k iterations, another input vector v_k is obtained, which was recreated from the original input values v_0. In Eq. (6.6), p_i is the probability of a certain state i of the system with energy E_i, and N is the number of sensors in a given sensor network.
Particles governed by Maxwell–Boltzmann statistics must be distinguishable from each other, and one energy state can be occupied by two or more particles.
Reconstruction is different from regression or classification: It estimates the probability distribution of the original input instead of associating a continuous or discrete value with an input example.

Fig. 6.5 Gibbs sampling

Problem 6.4.1 What is the necessary condition “on a given image” such that a
computationally tractable algorithm is used to count the number of objects in a
given image?
(a) The given image needs to have Maxwell–Boltzmann statistics.
(b) The given image needs to have Bose–Einstein statistics.
(c) Only (a) is true and (b) is not true.
(d) Neither (a) nor (b) is true.
(e) Both (a) and (b) are true.

Boltzmann and Helmholtz machines are strongly related to Markov random fields and conditional random fields. This leads to the development of algorithms for inference that can be applied to both kinds of models, as, for example, fractional belief propagation.
A Boltzmann machine models a given “set of binary vectors.” A trained Boltzmann machine is deployed to find the distribution of an input stream.
Sensor classification is a special case of the above process in Boltzmann machines. In the example, a restricted BM is used as a deep learning network to classify nodes in a TCP/IP network. In the Boltzmann machine, each undirected edge represents a dependency. In Fig. 6.6, there are three hidden units and four visible units. This is not a restricted Boltzmann machine. Input is fed to all nodes (green and yellow) during the interval 0 to T.
Step 1 (input to Boltzmann machine): Provide an external input to the network, which is subjected to the system temperature for all neurons. The network is then trained using data from all systems that are exposed to the same system temperature. However, there may be variations in temperature levels among different states, which is a subject of investigation.

Fig. 6.6 Training Boltzmann machine (figure labels: input and output panels, each with visible units v1 to v4 and hidden units h1 to h3; temperature T; input time; output time; Δt)

Fig. 6.7 Inference by using Boltzmann machine (figure labels: input; output; visible units v1 to v4 with energies E1 to E4; Δt; E = E1 + E2 + E3 + E4)

Fig. 6.8 Distribution and model

Step 2 (output from Boltzmann machine): Measure each state that supports measurable conditions. The maximum possible energy of the above system provides a basis for quantizing energy levels.
The output energy E in Fig. 6.7 is a combination of all four energies, one from each visible node v. Each energy level provides a possible state of each node. There is no direct contribution to E from hidden nodes, but there is an indirect contribution, which needs to be estimated by using a model of a dynamical system [51].
Input vector v is used to find p(h|v) to predict hidden values h. Knowing the hidden values, p(v|h) is used for the prediction of new input values v. This process is repeated k times. After k iterations, input vector v_k is recreated from the original input value v_0. Target states include all possible states of the sensor.
A distribution is associated with a model, as shown in Fig. 6.8. For a given input data set, finding a distribution is equivalent to finding a model.

6.5 Multilayer Neural Network

Multilayer perceptron (MLP) [63] and Hopfield networks are deterministic networks. An MLP can be shown to estimate the conditional average on the target data. An MLP is a fully connected class of feed forward artificial neural network (ANN).
The term MLP is used ambiguously, sometimes loosely to mean any feed forward ANN, sometimes strictly to refer to networks composed of multiple layers of perceptrons (with threshold activation). MLPs are sometimes colloquially referred to as “vanilla” neural networks, especially when they have a single hidden layer. MLPs are trained with back propagation, a generalization of the least mean squares algorithm in the linear perceptron.
A Hopfield network is a deterministic recurrent neural network, deterministic because once the initial state is given, its dynamics evolve following the Lyapunov function. It has been shown that it can solve combinatorial problems and learn time series. Helmholtz and Boltzmann machines are stochastic networks, meaning that given an input, the state of the network does not converge to a unique state, but to an ensemble distribution. A BM provides a probability distribution of the state of the neural network. They are the stochastic equivalent of the Hopfield network. Learning occurs in the perceptron by changing connection weights after each piece of data is processed, based on the amount of error in the output compared to the expected result. This is an example of supervised learning.
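The perceptron learning rule described above, in which weights change after each piece of data according to the output error, can be sketched for a single threshold unit. The AND-gate data, the learning rate, and the epoch count below are illustrative choices, not values taken from this chapter.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Update the weights after every sample, in proportion to the output error."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = step(w[0] * x1 + w[1] * x2 + b)
            error = target - output  # expected result minus actual output
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

# Supervised data: inputs paired with the expected output of a logical AND.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_gate)
predictions = [step(w[0] * x1 + w[1] * x2 + b) for (x1, x2), _ in and_gate]
print(predictions)
```

A single threshold unit can only learn linearly separable targets such as AND; multilayer networks trained with back propagation remove that limitation.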

6.6 Reduction of Boltzmann Machine to the Hopfield Model

The Boltzmann machine reduces to the Hopfield model. Figure 6.9 provides details on this relationship between the Boltzmann machine and the Hopfield network. The Boltzmann network energy level is a function of temperature. If the temperature is high, then the energy will also be high in the Boltzmann network. If T = 0 (temperature), then the Boltzmann network reaches an energy level that is in equilibrium (the energy level need not be zero). In a sense, at T = 0, the Boltzmann network becomes a deterministic network. In particular, the Boltzmann network becomes a Hopfield network, because the Hopfield network has a Lyapunov function that can be considered as a constraint (as it comes from energy). In the case of MLP, there is no Lyapunov function and thus no constraint either. The BI algorithm is closer to multilayer perceptrons (MLPs), because the BI algorithm is deterministic and does not have a Lyapunov function. To train the MLP network, the back propagation algorithm is used.

Fig. 6.9 Boltzmann machine and Brooks–Iyengar algorithm

The BI algorithm is similar to back propagation in arriving at convergence in node values. The Brooks–Iyengar algorithm performs the following: Every node is given its input and also average values from other nodes (average over T). Nodes jointly provide a deterministic output. From the above, it is clear that no Lyapunov function or temperature is used in BI, and thus, BI is a special case of the Hopfield network in which there is no constraint from Lyapunov. BI is another version of MLP in which the network provides deterministic output by using conditions of other nodes.
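The T = 0 limit discussed in this section can be illustrated with a single stochastic unit. The sketch assumes the commonly used logistic rule, in which a unit switches on with probability 1 / (1 + exp(−ΔE / T)) for an energy gap ΔE; that rule and the name p_on are assumptions of this example and are not stated in the text above.

```python
import math

def p_on(delta_e, T):
    """Probability that a stochastic unit switches on, for energy gap delta_e
    and temperature T (logistic rule, assumed for this sketch)."""
    x = delta_e / T
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # evaluated this way to avoid overflow for large negative x
    return z / (1.0 + z)

for T in (10.0, 1.0, 0.001):
    print(T, p_on(1.0, T), p_on(-1.0, T))
```

At high temperature the unit is close to a coin flip, and as T approaches 0 the probability approaches 0 or 1 according to the sign of the gap, which is the deterministic threshold behavior of a Hopfield unit.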

6.7 Kolmogorov Complexity for a Given Data

Kolmogorov complexity has its roots in probability theory [58], information theory, and philosophical notions of randomness. The idea is intimately related to problems in both probability theory and information theory. The Kolmogorov complexity of an object is the length of the shortest binary program from which the object can be effectively reconstructed.
Combining concepts of computability and statistics, we can express the complexity of a finite object in terms of Kolmogorov complexity. Kolmogorov complexity represents the length of the shortest computer program (algorithm) that can produce the object as output. This complexity measure takes into account both the computational aspects (computability) and the statistical aspects (probability) of describing the object. In essence, it quantifies the minimum amount of information needed to generate the object using a universal Turing machine or a similar computational model. It may be called the algorithmic information content of a given object. For a sensor network, the question becomes: What is the shortest binary representation of a program from which a parameter can be reconstructed by using N − r sensors, where N is the number of sensors used in the sensor network and r is the number of faulty sensors? Output s is observed from Turing machine T, where p is a program in T that outputs s, and K_T(s) is used to detect regularities in given sensor data in order to find new information from a given sensor. For example, a function f : N^k → N is computable if and only if there is an effective procedure that, given any k-tuple x of natural numbers, will produce the value f(x). In agreement with this definition, computable functions take finitely many natural numbers as arguments and produce a value which is a single natural number.
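Kolmogorov complexity itself cannot be computed in general, but the length of a compressed encoding is a practical, computable stand-in for detecting regularities in data. The sketch below uses zlib for this purpose; this is a common heuristic and not the method of this chapter, and the two byte strings are synthetic stand-ins for sensor readings.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of a zlib encoding of data. This is a computable proxy
    for regularity, not the Kolmogorov complexity itself."""
    return len(zlib.compress(data, 9))

# Highly regular synthetic readings: a short pattern repeated 500 times.
regular = b"ab" * 500

# Irregular-looking synthetic readings from a seeded pseudo-random generator.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

len_regular = compressed_length(regular)
len_noisy = compressed_length(noisy)
print(len(regular), len_regular)
print(len(noisy), len_noisy)
```

Note that the pseudo-random bytes are produced by a short program and a seed, so their true Kolmogorov complexity is small even though zlib cannot shrink them; the compressor only detects the regularities it is designed to find.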
Construction of the dual space of a given sensor network G is the first step in getting the shortest binary representation of a program. In Sect. 6.2, there are illustrations that provide steps to construct the dual space. The definition of the function f is key in the construction of the dual space, but this definition needs to have physical relevance to the measurement process, which uses N − r good sensors. To measure the same, the dual space of G is first constructed, and then the Kolmogorov complexity of G is estimated. The method mentioned above for measuring the entropy of a given G indirectly relies on the Kolmogorov complexity of G. It employs a well-defined result from Pontryagin duality and utilizes the Kolmogorov complexity outcome as part of its approach.

6.8 Restricted Boltzmann Machine

A restricted Boltzmann machine (RBM) [54] is a simplification of the general Boltzmann machine approach, in the sense of imposing more restrictions on the structure of the graphical model. A bipartite graph is created, composed of hidden states and observed (or visible) states. There are no connections among hidden states and no connections among visible states. These restrictions force the system to learn parameters and converge over iterations.
Each undirected edge represents a dependency.
For example, in Fig. A.1 there are three hidden units and four visible units. This is a restricted Boltzmann machine.
Restricted Boltzmann machines are probabilistic. As opposed to assigning discrete values, the model assigns probabilities. At each point in time, the RBM is in a certain state. The state refers to the values of neurons in the visible and hidden layers v and h.
Both the visible and the hidden vectors are binary vectors. The hidden vector is obtained from the visible vector by applying the W matrix and setting the bits of the h vector based on a probability from a sigmoid activation function. Similarly, the visible vector is generated back from the h vector based on the sigmoid activation function.
The difference between the reconstructed visible vector v_r and the actual visible vector v is minimized over the data set X.
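One visible-to-hidden-to-visible pass of the kind described above can be sketched as follows, with four visible and three hidden units. The weights are random placeholders, bias terms are omitted for brevity, and the function names are illustrative; a trained RBM would use learned values of W.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, rng):
    """Set hidden bit j to 1 with probability sigmoid(sum_i v[i] * W[i][j])."""
    n_hidden = len(W[0])
    probs = [sigmoid(sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(n_hidden)]
    return [1 if rng.random() < p else 0 for p in probs]

def sample_visible(h, W, rng):
    """Set visible bit i to 1 with probability sigmoid(sum_j W[i][j] * h[j])."""
    probs = [sigmoid(sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(W))]
    return [1 if rng.random() < p else 0 for p in probs]

rng = random.Random(0)
# Placeholder 4 x 3 weight matrix (visible units by hidden units).
W = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]

v = [1, 0, 1, 1]                 # actual visible vector
h = sample_hidden(v, W, rng)     # hidden vector sampled from v
v_r = sample_visible(h, W, rng)  # reconstructed visible vector

# Squared difference between v and v_r: the quantity that training reduces.
error = sum((a - b) ** 2 for a, b in zip(v, v_r))
print(h, v_r, error)
```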
Deep neural networks that perform stochastic gradient descent with huge param-
eter counts and massive data have achieved stunning triumphs over the past decade.
Figure A.2 provides details on stochastic gradient descent algorithm to compute
model parameters. The gradients of such models are typically computed using back
propagation, a technique Hinton helped pioneer.

Problem 6.8.1 The NN or CNN model is used in deep learning networks. Optimal model design requires considering many items to arrive at parameter values. In the following list, locate the items that are used in deep learning model design:
(a) The kernel size option is given to the user.
(b) The number-of-layers option is given to the user.
(c) For each layer, the user can provide the number of neurons.
(d) The user can provide a label on each image file.
(e) The user can drop some of the connections between layers.
(f) All of the above are true except items (e) and (d).
(g) All of the above are true except item (e).

This is the point where restricted Boltzmann machines meet physics for the second time. The joint distribution is known in physics as the Boltzmann distribution, which gives the probability that a particle can be observed in the state with the energy E. As in physics, we assign a probability to observing a state of v and h, which depends on the overall energy of the model. Unfortunately, it is very difficult to calculate the joint probability due to the huge number of possible combinations of v and h in the partition function Z. Much easier is the calculation (A.3) of the conditional probabilities of state h given the state v and of state v given the state h. The essential idea here is energy-based probability.
Global energy

$$E = \sum_{i<j} w_{ij}\, s_i s_j + \sum_{i} \theta_i s_i \tag{6.7}$$

with s_1 = v_1, s_2 = v_2, s_3 = v_3, s_4 = v_4, s_5 = h_1, s_6 = h_2, s_7 = h_3.
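The global energy of Eq. (6.7) and the resulting state probabilities can be evaluated exhaustively for a very small network, which also shows why the partition function Z becomes costly: it sums over 2^n states. The three-unit network, its weights and thresholds, and the choice kT = 1 are assumptions made for this sketch.

```python
import math
from itertools import product

def global_energy(s, w, theta):
    """E = sum_{i<j} w[i][j]*s[i]*s[j] + sum_i theta[i]*s[i], as in Eq. (6.7)."""
    n = len(s)
    pair = sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    bias = sum(theta[i] * s[i] for i in range(n))
    return pair + bias

# Illustrative upper-triangular weights and thresholds for three binary units.
w = [[0.0, 0.5, -1.0],
     [0.0, 0.0, 0.3],
     [0.0, 0.0, 0.0]]
theta = [0.1, -0.2, 0.0]

# Enumerate all 2**3 joint states and weight each by exp(-E), taking kT = 1.
states = list(product((0, 1), repeat=3))
weights = [math.exp(-global_energy(s, w, theta)) for s in states]
Z = sum(weights)  # partition function: one term per joint state
probs = {s: wt / Z for s, wt in zip(states, weights)}

for s in states:
    print(s, round(global_energy(s, w, theta), 3), round(probs[s], 4))
```

For the seven-unit network of Eq. (6.7) the same enumeration needs 128 terms, and the count doubles with every added unit.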

Reconstruction is different from regression or classification in that it estimates the probability distribution of the original input instead of associating a continuous/discrete value to an input example. Figure 6.10 is used to show the above-mentioned issue in reconstruction.

Fig. 6.10 Classification vs. distribution



Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton created a “large, deep
convolutional neural network” (CNN) that was used to win the 2012 ILSVRC
(ImageNet Large-Scale Visual Recognition Challenge).
DL network developers focus on designing models with a reduced number of
parameters in the CNN model, thus reducing memory and execution latency while
aiming to preserve high accuracy.
“One of the most interesting features of machine learning is that it is on the
boundary of several different academic disciplines, principally computer science,
statistics, mathematics, and engineering. Machine learning is usually studied as part
of artificial intelligence, which puts it firmly into computer science. Understanding
why these algorithms work requires a certain amount of statistical and mathematical
sophistication that is often missing from computer science undergraduates.” The
convolutional neural network appears to be a relatively new yet proven tool for
modeling a given physical process, as long as that process can be captured
in the form of images or video.
Problem 6.8.2 The error of image classification based on a deep learning network
model is lower than that of a human being or of ML-based image classification
methods. List the items in the following that contribute to this reduction in
error:
(a) Deep learning network model training methods use CNN.
(b) Deep learning network model training methods use NN.
(c) Deep learning network model training methods use CPU + GPU for
training.
(d) Deep learning network model training methods do not require a feature vector.
(e) Deep learning network model training methods use a large number of kernel
filters to learn the feature vector.

6.9 Brooks–Iyengar Algorithm for Binary Classification

The Brooks–Iyengar (BI) algorithm [55] is very similar to MLP [56, 64], as the
flowchart in Fig. 6.11 shows.
Each sensor [65] has an energy level in a given time period. The energy levels of
the other sensors are expected to lie in a similar range; large differences in
energy level from sensor to sensor are not expected.
Sensor fusion using the BI algorithm (Fig. 6.11) uses a processing element (PE) to
compute the accuracy range and the estimate of the measured value. Let sensor j
record k samples over a duration of t sec, and let sensor j also receive measured
values from the other sensors in the network.
The PE of a given sensor j uses:
1. The k samples recorded (0 to t sec) in sensor j
2. The measured values from the other sensors, 1 to N, excluding sensor j

Fig. 6.11 Brooks–Iyengar algorithm

The above workflow is part of each PE at a given sensor from 1 to N.


Measurement is done for a duration of t sec in a given sensor, and the PE uses
these samples, together with the measured values from the other sensors, to compute
its own “measured value.” The assumption is that the measured values from the other
sensors form a proper time sequence. Keeping a timestamp on each measured value is
a separate area of research, and it is handled well by the IEEE 1588 standard. It
is assumed that samples are collected at a uniform interval in a given sensor and
that all sensors are synchronized to a common clock when starting the sample
collection process.
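
The fusion step performed by each PE can be sketched as follows. The sketch follows the commonly published form of the Brooks–Iyengar algorithm, in which every sensor reports an interval, at most tau sensors are assumed faulty, and only regions covered by at least N − tau intervals are kept. The function name, the interval representation, and the sample readings are assumptions made for illustration; this is not the authors' reference implementation.

```python
def brooks_iyengar(intervals, tau):
    """Fuse N sensor intervals (lo, hi), tolerating up to tau faulty sensors.

    Returns (estimate, (lower_bound, upper_bound)).
    """
    needed = len(intervals) - tau
    # Every interval endpoint is a place where the overlap count can change.
    points = sorted({p for lo, hi in intervals for p in (lo, hi)})
    regions = []
    for a, b in zip(points, points[1:]):
        mid = (a + b) / 2.0
        count = sum(1 for lo, hi in intervals if lo <= mid <= hi)
        if count >= needed:
            regions.append((a, b, count))
    if not regions:
        raise ValueError("no region is covered by N - tau intervals")
    # Estimate: midpoints of the kept regions, weighted by their overlap count.
    total = sum(count for _, _, count in regions)
    estimate = sum(count * (a + b) / 2.0 for a, b, count in regions) / total
    # Accuracy range: from the lowest to the highest kept region boundary.
    return estimate, (regions[0][0], regions[-1][1])

# Five hypothetical sensor intervals, at most one of them faulty.
readings = [(2.7, 6.7), (0.0, 3.2), (1.5, 4.5), (0.8, 2.8), (1.4, 4.6)]
estimate, bounds = brooks_iyengar(readings, tau=1)
print(round(estimate, 3), bounds)
```

With these readings, only the regions between 1.5 and 3.2 are covered by at least four of the five intervals, so the accuracy range shrinks to that span even though individual sensors report much wider intervals.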
The BI algorithm removes sensors with faulty conditions and uses only sensors
with no error. BI uses heuristic or variance-based algorithms to classify
sensors. If a sensor provides an image signal, a Boltzmann machine (BM) can handle
it with ease and perform the sensor classification. BI appears to have difficulty
handling an image as input, so converting the image signal into time series data
might help the BI algorithm. Alternatively, BI algorithm extensions for image data
can use a BM with the temperature set to zero. However, T = 0 in the equation
results in p_i = 1 for all i. Having p_i = 1 for all sensors at all times does not
give a good model of the sensor network. Setting T to a small value instead results
in a model that can be used with deterministic algorithms such as the BI algorithm.

There is, however, increasing interest in whether the biological brain follows
backpropagation or, as Hinton asks, whether it has some other way (instead of
backpropagation) of getting the gradients needed to adjust the weights on its
connections.

6.10 Pre-Trained Model

A pre-trained model is a deep learning model that has already been trained on a
large data set and saved. The saved model can be used as a starting point for training
new models, or it can be used directly for making predictions on new data [66–71].
Pre-trained models have become popular in deep learning due to their ability
to save time and computational resources. Instead of training a new model from
scratch, developers can use a pre-trained model as a starting point and fine-tune it
on their own data set. This can be especially useful when the data set is small or
when computational resources are limited.
Pre-trained models are often trained on large and diverse data sets, such as
ImageNet for image classification or large text corpora for natural language
processing models such as BERT. These models are usually built with deep learning
architectures such as convolutional neural networks (CNNs), recurrent neural
networks (RNNs), or transformers.
Using pre-trained models has many benefits, such as the following:
Reduced training time: Using a pre-trained model can significantly reduce the
time required to train a new model from scratch.
Improved accuracy: Pre-trained models are often trained on large and diverse
data sets, which can improve the accuracy of the model.
Transfer learning: Pre-trained models can be used for transfer learning, where
the model is fine-tuned on a smaller data set for a specific task, such as object
recognition.
Overall, pre-trained models have become an important tool in the deep learning
toolbox, allowing developers to leverage existing models and data sets to solve new
problems more efficiently.
Post-training quantization reduces computing power demand and energy con-
sumption at the expense of a slight loss in accuracy.
With sophisticated pre-training objectives and huge numbers of model parameters,
large-scale pre-trained models (PTMs) effectively capture knowledge from massive
labeled and unlabeled data. The knowledge is stored in these parameters, and
fine-tuning on specific tasks makes precise inference possible. The rich knowledge
implicitly encoded in the parameters can benefit a variety of tasks in industries
such as agriculture, healthcare, transport, food, and education. In the recent
past, this has been extensively demonstrated via experimental verification and
empirical analysis.
Pre-trained models and scripts deliver results sooner than a “do it yourself”
approach, because effort goes into improving results rather than into training from
scratch. Large-scale pre-trained models (PTMs) such as BERT and GPT are used in
cloud native deployments, but these models are not yet very popular in embedded
devices. Recently, the use of pre-trained models has achieved great success and
become an attractive milestone in the field of artificial intelligence (AI) for
enterprise business owners. It is now the consensus of the deep learning community
to adopt PTMs as the backbone for well-defined tasks rather than to develop
learning models from scratch. Deploying pre-trained models is discussed, with
examples, in the tutorial sessions.

6.11 Compression of DL Networks

Model compression allows the user to run the model on tiny devices and there are
two main ways to reduce the network:
1. Lower precision (fewer bits per weight). By default, the model weights are
float32-type variables, which leads to two problems: first, the model is very
large because 4 bytes are associated with each weight, with a considerable
memory requirement; second, execution is considerably slower than with
uint8-type variables. Reducing the weights from 32 bits to 8 bits gives a
4x reduction in the size of the NN. TensorFlow and Keras provide support
for quantization.
2. Fewer weights (pruning). Pruning removes weights that contribute little to the
output. A related approach creates a smaller DNN that imitates the behavior of
a larger DL network: the smaller network is trained on the output predictions
produced by the larger one, so that it approximates the function learned by
the larger one.
Note that post-training quantization is carried out after training the model,
but the reduction in precision could also be applied before training. DLtrain can
be used efficiently to model the number of bits required for weights in a given CNN
or NN.
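
The 4x size reduction from 32-bit to 8-bit weights can be checked with a small sketch. It assumes a simple affine (min/max) mapping of float weights onto the 256 codes of an unsigned byte. The function names and weight values are hypothetical, and this is not the quantization scheme of TensorFlow, Keras, or DLtrain.

```python
from array import array

def quantize_uint8(weights):
    """Map float weights onto 8-bit codes with an affine (min/max) scheme."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = array("B", (round((w - lo) / scale) for w in weights))
    return codes, scale, lo

def dequantize(codes, scale, lo):
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + c * scale for c in codes]

# Hypothetical float32 weights (4 bytes each).
weights = array("f", [-0.82, -0.10, 0.0, 0.33, 0.91, 1.25])
codes, scale, lo = quantize_uint8(weights)
restored = dequantize(codes, scale, lo)

size_ratio = (weights.itemsize * len(weights)) / (codes.itemsize * len(codes))
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(size_ratio, max_error <= scale / 2 + 1e-9)
```

The storage ratio is exactly 4, and the reconstruction error of each weight is bounded by half a quantization step, which is the "slight loss in accuracy" traded for the smaller model.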
As stated above, the model size can be reduced not only with quantization but
also with pruning techniques, which eliminate connections that are not useful to
the NN. This reduces both the computation required and the program memory.
Quantization and pruning approaches have been considered individually as well as
jointly.
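
Pruning can be illustrated with a short sketch. It assumes magnitude-based pruning, one common heuristic in which the weights with the smallest absolute value are treated as the connections that are "not useful"; the text does not prescribe a particular criterion, and the function name and values are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from the smallest to the largest absolute weight.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

# Hypothetical weights of one layer, flattened into a list.
weights = [0.8, -0.05, 0.02, -1.2, 0.4, 0.01, -0.3, 0.07]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

The zeroed entries can then be skipped in storage and computation, and the remaining 8-bit quantization can be applied to the surviving weights when both approaches are used jointly.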
Chapter 7
Training of Deep Learning Networks

Training deep learning networks is the forge where intelligence
is honed and refined.

Insight
This section of the book details training deep learning networks.
PyTorch is one of the popular AI frameworks used to model, train, test, and deploy
deep learning networks.
TensorFlow is another AI framework with functionality similar to PyTorch.
Both frameworks provide low-code options for developers. However, major
limitations arise from their dependency on many other open-source Python
packages.
DLtrain overcomes these issues and provides a clean AI framework that falls in
the no-code category. DLtrain is developed in C++ and is easy to port to many
platforms across silicon vendors. Moreover, DLtrain is GPU-friendly and can be
revised for large-scale CUDA Core machines such as the DGX Station A100 or higher
versions. This chapter also shows how to create, build, and configure Docker
images of DLtrain for large-scale CNN models, including the support services [72].

7.1 DLtrain Is a No-Code Deep Learning Framework

Hyperparameters play a major role in the quality of training of a deep learning
network model and in the quality of inference with the trained model.
Domain experts play a major role in setting up hyperparameters before the
training of deep learning networks starts. However, domain experts are often new
to hyperparameters and to their effect on the quality of inference, so setting up
hyperparameters is a challenge.
DLtrain addresses this challenge by providing a minimal set of hyperparameter
options, which enables domain experts to learn quickly and take good control of
training a deep learning model with a training data set.
Image classification problems are solved by using deep learning networks, in
particular convolutional neural networks. The industrial segment appears to be
using CNNs for image classification with enterprise-quality inference.
Many early-stage tool sets have fallen out of regular use and only a few remain;
PyTorch and TensorFlow are the two that stay in regular use. Notably, both rely
on many open-source packages, often with version-specific behavior. The industry
segment looks for a tool set that can be customized and is free from
dependencies on open-source packages.
DLtrain is a platform designed to train NN and CNN models by using image signals
as a training data set. DLtrain is built with the “nvcc” tool set from CUDA 10.2
(NVIDIA) and is tested on IBM Power AC922 with GeForce RTX 2070 GPU hardware.
DLtrain works well for a given training image data set and performs high-speed
classification of a given image during inference.
DLtrain, as shown in Fig. 7.1, is designed to remove most of the issues caused by
the many open-source package dependencies needed to support PyTorch and
TensorFlow. DLtrain provides developers with a solution to train, test, and deploy
given NN and CNN models. Deployment can be cloud native or edge native, on devices
such as IoT nodes and IoT edges.

Fig. 7.1 DLtrain is an autonomous deep learning framework



In Fig. 7.1, the left side shows the dependency list of the WML (Watson Machine
Learning) tool set for training deep learning networks.
The right side shows DLtrain, which has zero dependencies on open-source tool
sets. DLtrain is capable of training CNN and NN models of deep learning networks.
A successful installation of WML appears to need more than 100 dependencies,
whereas DLtrain does not have any.
The CUDA SDK is required for DLtrain to use NVIDIA CUDA Cores and Tensor Cores in
GPUs such as the V100 and A100. Enterprise application development teams can use
CUDA Core and Tensor Core computing as part of their customized tool set to train
deep learning networks.
Watson Machine Learning Accelerator gives access to Power-optimized versions
of the most popular deep learning frameworks currently available, including
TensorFlow, Caffe, and PyTorch. It runs on IBM Power accelerated HPC servers, a
platform that runs not only deep learning training workloads but also a wide
variety of HPC and high-performance data analytics workloads. It leverages unique
capabilities of accelerated Power servers, delivering performance unattainable on
commodity servers. For example, large model support facilitates the use of system
memory with little to no performance impact on the POWER9 CPU, yielding
significantly larger and more accurate deep learning models. The Watson Machine
Learning Community Edition (WML CE) is delivered as a set of software packages
that can deploy a functioning deep learning environment, potentially within a few
hours, by using a few simple commands.
The DLtrain framework is ported to the POWER9 CPU with Ubuntu 22.04 OS.
DLtrain makes it easy for enterprise and academic researchers to train their deep
learning network models, such as NN and CNN. Most importantly, they can follow
the no-code path while using DLtrain to train deep learning network models.
Moreover, DLtrain does not use any third-party library, which reduces the exposure
of enterprise and academic AI workloads to unvetted external code.
DLtrain provides an inference engine that can be deployed in IoT edges.
Currently, we are witnessing a proliferation of specialized hardware that offers
not only better performance on deep learning tasks but also increased efficiency
(performance per watt). The AI community’s demand for GPUs led to Google’s
development of TPUs and pushed the entire chip market towards more specialized
products. In the next few years, vendors including NVIDIA, Intel, SambaNova,
Mythic, Graphcore, Cerebras, and other companies are expected to bring more focus
to hardware for deep learning training and inference workloads.
“Bring Your Data on to Your Table” is the approach used to train the deep learning
model and to deploy it for your enterprise. In this process, the data set stays
within the customer premises, which provides high security for the customer data
set.

DLtrain is used to train deep learning models such as NN and CNN by using a
computing infrastructure that is available on your table. DLtrain provides a quick
solution for this by using OpenPOWER/IBM Power Systems, which form the basis for
“computing infra on your table.”
As previously mentioned, the computing infrastructure setup can be completed
in just a few hours. Subsequently, the development team can deploy the deep
learning network model training workload onto this infrastructure. The process
follows sound engineering practice and avoids external dependencies on obscure or
untraceable software components from open-source origins. DLtrain is developed in
C and C++ so that it runs well on the CPUs of various silicon vendors.
Most importantly, effort has gone into making DLtrain useful to subject matter
experts (domain knowledge holders), so that they can bring their best via their own
custom model without writing a single line of code.
DLtrain also makes it possible to move the trained model to an Android phone, so
that large-scale deployment is feasible. After the trained model is moved to an
Android phone, the application uses the phone camera or local files to get the
input image and performs inference locally on the phone. There is no need to
connect the camera to the cloud for inferencing.
DLtrain is ported to various silicon platforms; the following provides more details.

DLtrain
DLtrain is ported to many CPU and GPU combinations. For example:
1. Ported DLtrain to work on X86 with Ubuntu and also Windows 10 OS.
Tested DLtrain by training a CNN model with the MNIST data set.
2. Ported DLtrain to work on the OpenPOWER Raptor system (POWER9 CPU).
Tested DLtrain by training a CNN model with the MNIST data set.
3. Ported DLtrain to work on the OpenPOWER Raptor system (POWER9 CPU)
with an RTX 2070 GPU. Tested DLtrain by training a CNN model with the
MNIST data set.
4. Ported DLtrain to work on X86 with Windows OS. Tested DLtrain with an
inference workload by using a trained CNN model and an input image
from the local machine.
5. Ported DLtrain to work on Jetson Series SOMs (for example, Nano). Tested
DLtrain with an inference workload by using a trained CNN model and an
input image from the local machine.
6. Deployed the trained CNN model on an Android phone and successfully
performed inference on a given local image.

GPU acceleration: takes advantage of the massively parallel architecture of GPUs
to get the biggest benefit in these algorithms.
Dynamic memory management: uses the high-speed, next-generation NVIDIA NVLink
connection between the POWER9 CPU and the NVIDIA Tesla V100 GPUs, which moves
large data sets from system memory to GPU memory much faster. A dynamic data
transfer algorithm runs on the CPU to determine which data to move to the GPU
next.
Efficient cluster scaling: is feasible with a data parallel framework, which enables
developers to scale out and train with massive data sets by distributing the data
across multiple servers.

7.2 DLtrain: Training of NN and CNN Models

7.2.1 Preprocessing Data Set

Data set preprocessing is one of the most important tasks. In Fig. 7.2, the
preprocessing flow is provided for DLtrain, TensorFlow, and PyTorch.
TensorFlow and PyTorch take tensors as input, not NumPy arrays. Moreover, the
input tensor for TensorFlow is very different from the input tensor for PyTorch.
Considerable data copying and conversion effort is therefore required to turn the
data from a given file into input for TensorFlow or PyTorch, and this becomes
highly challenging for huge data sets.

Fig. 7.2 Data set to training DL network model



DLtrain takes the input file name of the data set. It reads the data from the file
(for example, an image) and copies it to an input array, which is used directly by
the next module to train the deep learning network model.
DLtrain is highly efficient in reducing memory-to-memory data movement. DLtrain is
suitable for large-scale models and for huge data sets.
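
The idea of reading a file once into an input array, with no intermediate tensor conversion, can be sketched as follows. The sketch assumes a headerless raw file of 8-bit grayscale pixels of MNIST size (28 x 28); this file layout and the function name are assumptions made for illustration, and the actual DLtrain reader is the one in the reference code.

```python
import os
import tempfile
from array import array

def load_raw_image(path, width, height):
    """Read width*height unsigned bytes and scale them to [0, 1]."""
    pixels = array("B")
    with open(path, "rb") as f:
        # fromfile raises EOFError if the file holds fewer bytes than requested.
        pixels.fromfile(f, width * height)
    return [p / 255.0 for p in pixels]

# Demonstration with a synthetic 28x28 image written to a temporary file.
width, height = 28, 28
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "img.raw")
    with open(path, "wb") as f:
        f.write(bytes(i % 256 for i in range(width * height)))
    inputs = load_raw_image(path, width, height)

print(len(inputs), inputs[0], inputs[255])
```

The returned list plays the role of the input array that is handed to the training module; only one copy from the file buffer to the array is made.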

7.2.2 Design Deep Learning Network Model

Developers are required to design their own custom model in the form of a CNN or
in the form of an NN.
For example, the model requires:
1. The number of neurons in the input layer
2. The number of hidden layers
3. The number of neurons in each hidden layer
4. The number of neurons in the output layer
5. The kernel size (for a CNN)
6. The number of kernels (for a CNN)
These parameters can be stored in a txt file with a name chosen by the developer.
Designing a CNN model or an NN model requires no coding. The reference given above
provides sample values for the listed parameters.
For example,
-c network_config.txt
passes a file name, and that file holds the information about the CNN.
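
How such a text description translates into a model size can be sketched as follows. The key names and the "key = value" layout below are hypothetical and are not the DLtrain file format, which is defined by the reference code; the sketch only shows that the listed layer sizes are enough to determine the number of trainable parameters of a fully connected NN.

```python
def parse_config(text):
    """Parse 'key = value' lines; every value becomes a list of integers."""
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        config[key] = [int(v) for v in value.split(",")]
    return config

def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical NN description: 784 inputs, two hidden layers, 10 outputs.
example = """
# hypothetical keys, for illustration only
input_neurons = 784
hidden_neurons = 128, 64
output_neurons = 10
"""
cfg = parse_config(example)
layers = cfg["input_neurons"] + cfg["hidden_neurons"] + cfg["output_neurons"]
n_params = count_parameters(layers)
print(layers, n_params)
```

Changing the number of hidden layers or the neurons per layer only requires editing the text, which is the sense in which the model design needs no coding.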

7.2.3 Training Algorithm

The algorithm used in DLtrain to train a CNN or NN is given in Fig. A.2.
The reference code for Fig. A.2 is provided in [73].
Problem 7.2.1 Compute the score function (f): During the forward pass, the score
function computes the class scores, stored in the vector f.
Problem 7.2.2 Compute the loss function: The loss function contains two
components. The data loss measures the compatibility between the scores f and the
labels y. The regularization loss is a function of the weights only.
Problem 7.2.3 Compute the gradient: During the backward pass, compute the gradient
of the loss with respect to the weights (and optionally the data) and use it to
perform a parameter update during gradient descent.
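
Problems 7.2.1 to 7.2.3 can be tried on a small linear classifier. The sketch below assumes a linear score function f = Wx + b, a softmax cross-entropy data loss, and an L2 regularization loss. None of these choices is stated in the text, the toy data are hypothetical, and this is not the reference code of [73].

```python
import math

def scores(W, b, x):
    """Score function: one class score per row of W (forward pass)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(f):
    """Convert scores to class probabilities (shifted for numerical stability)."""
    m = max(f)
    exps = [math.exp(v - m) for v in f]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grads(W, b, X, y, reg):
    """Data loss + regularization loss, and their gradients for W and b."""
    n = len(X)
    dW = [[0.0] * len(W[0]) for _ in W]
    db = [0.0] * len(b)
    data_loss = 0.0
    for x, label in zip(X, y):
        p = softmax(scores(W, b, x))
        data_loss -= math.log(p[label])
        for i in range(len(W)):
            d = (p[i] - (1.0 if i == label else 0.0)) / n
            db[i] += d
            for j, xj in enumerate(x):
                dW[i][j] += d * xj
    reg_loss = 0.5 * reg * sum(w * w for row in W for w in row)
    for i, row in enumerate(W):
        for j, w in enumerate(row):
            dW[i][j] += reg * w  # gradient of the regularization loss
    return data_loss / n + reg_loss, dW, db

def gradient_step(W, b, dW, db, lr):
    """Parameter update of plain gradient descent (in place)."""
    for i in range(len(W)):
        b[i] -= lr * db[i]
        for j in range(len(W[0])):
            W[i][j] -= lr * dW[i][j]

# Hypothetical toy data: two features, two classes.
X = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.8]]
y = [0, 0, 1, 1]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
losses = []
for _ in range(50):
    loss, dW, db = loss_and_grads(W, b, X, y, reg=1e-3)
    losses.append(loss)
    gradient_step(W, b, dW, db, lr=0.5)
print(round(losses[0], 4), losses[-1] < losses[0])
```

With all weights at zero, both classes are equally likely, so the first loss equals ln 2; the loss then falls as the updates separate the two classes.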

7.2.4 Training Deep Learning Network Model

1. The reference code provided in [74] is used to train the CNN and NN
networks.
2. The reference code provided in [75] is used to train the CNN
network.

7.2.5 Save Deep Learning Network Model

The reference code provided in [76] is used to save the trained CNN or NN
networks in a file. Section 2 of [76] handles saving for the DLtrain framework
and Section 1 of [76] handles saving for the TensorFlow framework.

7.3 DLtrain Tested in POWER9 with GPU

A deep learning accelerator is essential to meet the computing requirements of
training a deep learning network model. The Watson Machine Learning Accelerator
runs on IBM Power accelerated HPC servers, a platform that runs not only deep
learning applications but also a wide variety of HPC and high-performance data
analytics workloads. The Watson Machine Learning Accelerator leverages unique
capabilities of accelerated Power servers, delivering performance unattainable on
commodity servers, and provides hyperparameter search and optimization as well as
elastic training to allocate the resources needed to optimize performance.
Distributed deep learning provides rapid insights at massive scale. Large model
support facilitates the use of system memory with little to no performance impact,
yielding significantly larger and more accurate deep learning models.
The IBM Watson Machine Learning Accelerator for enterprise AI, a new piece
of Watson Machine Learning, makes deep learning and machine learning more
accessible to the development team and brings the benefits of AI into enterprise
business. It combines popular open-source deep learning frameworks, efficient AI
development tools, and accelerated IBM® Power Systems™ servers. Developers
can deploy a trained AI model on a supported AI platform that delivers real-time
performance during inference. The Watson Machine Learning Accelerator is a
complete environment for data science as a service, enabling enterprises to bring
AI applications into production. It includes Power-optimized versions of the most
popular deep learning frameworks currently available, including TensorFlow and
PyTorch, with all required dependencies and files, precompiled and ready to
deploy. These solutions are from IBM.

Fig. 7.3 POWER9 server with RTX 2070 GPU

DLtrain is ported to Power servers with GPU computing support. DLtrain was
tested on a Power server over a period of a few months to optimize performance
during training of a deep learning network model.
The Talos™ II version of hardware with a POWER9 CPU is used for porting DLtrain
and for testing DLtrain with the MNIST data set and a CNN model. The POWER9
server shown in Fig. 7.3 runs Ubuntu 18.04 OS.

7.3.1 Build DLtrain for POWER9 Servers

The source code for DLtrain is provided in [77]. [Link] is also given in
[77].
Developers are required to use the “cmake” tool set to generate the Makefile.
After the Makefile is created successfully, developers use “make” to create
an executable version of DLtrain for POWER9 servers. These two steps, cmake and
make, are shown in [77].

7.3.2 DLtrain to Train CNN in POWER9 Servers

The MNIST data set is used. The version of DLtrain given in [77] handles the MNIST
data set efficiently and uses it to train the CNN.
Hyperparameters are available for developers to tune. For example, the following
parameters are available for developers:

    ./DLtrain -m train -s NewNetwork.dat -c networkProf.txt -n 2000 -e 30 -d /home/jk/Images/

1. -c is input. It is the name of the file that holds the parameters of the model.
2. -d is input. It is the data set folder path.
3. -n is input. It is the number of images to use from the data set (optional; the
default is 10000).
4. -e is input. It is the number of epochs (optional; the program will request it
later on if not given).
5. -m is input. It is the mode, given as a string: train or infer.
6. -s is output. It is the name of the file in which the trained model is saved.

7.3.3 DLtrain for Inference in POWER9 Servers

Inference workload is run in the POWER9 server, as given in the following:

    ./DLtrain -m infer -s NewNetwork.dat -c networkProp.txt -f img.raw

Where
1. -c is input. It is the name of the file that holds the parameters of the model.
2. -d is input. It is the data set folder path.
3. -m is input. It is the mode, given as a string: train or infer.
4. -s is the name of the file that holds the saved trained model.
5. -f is the name of the input file that is used for inference.

Developers should refer to [77] for more information on inference.
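
When DLtrain is driven from a script, the documented options can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of DLtrain; it only builds the argument list from the flags described above (-m, -s, -c, -n, -e, -d, -f) and assumes the executable is reachable as ./DLtrain.

```python
import shlex

def dltrain_command(mode, model_file, config_file, data_dir=None,
                    n_images=None, epochs=None, image_file=None,
                    executable="./DLtrain"):
    """Build the DLtrain argument list for training or inference."""
    if mode not in ("train", "infer"):
        raise ValueError("mode must be 'train' or 'infer'")
    cmd = [executable, "-m", mode, "-s", model_file, "-c", config_file]
    if n_images is not None:
        cmd += ["-n", str(n_images)]   # number of images to use
    if epochs is not None:
        cmd += ["-e", str(epochs)]     # number of epochs
    if data_dir is not None:
        cmd += ["-d", data_dir]        # data set folder path
    if image_file is not None:
        cmd += ["-f", image_file]      # input file for inference
    return cmd

train_cmd = dltrain_command("train", "NewNetwork.dat", "networkProf.txt",
                            data_dir="/home/jk/Images/", n_images=2000, epochs=30)
infer_cmd = dltrain_command("infer", "NewNetwork.dat", "networkProp.txt",
                            image_file="img.raw")
print(shlex.join(train_cmd))
print(shlex.join(infer_cmd))
```

The resulting lists can be passed to `subprocess.run` on a machine where the DLtrain executable has been built; the sketch itself does not launch anything.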

7.4 Docker Image of DLtrain for X86 with Ubuntu

Docker is an open-source container engine and a set of tools to compose, build, ship,
and run distributed applications.
The reference code for the following is provided in [78].
A drawback with this multi-platform support is that one Docker image has to be
built for each specific target platform:
1. A specific operating system
2. Hardware architecture (x86, ppc64el, arm, CUDA Core, Tensor Core, DSP, etc.)

Developers are required to create two Docker images, one for Linux and one
for Windows, each built with a Docker engine running on the specific target
platform.
A few commands to manage Docker are provided in the above reference. They are
useful for managing the workflow that creates the DLtrain Docker image and uses
it to train the CNN network and to perform inference on a given input with the
trained CNN.

7.5 DLtrain: Train DL Models in Windows 10

DLtrain is built for Windows machines and is available in the following GitHub
reference.
The reference code is provided in [79].
There is an issue with the runtime library on Windows machines: files from the
LibGCC library are missing. The above reference gives the steps to obtain them;
the issue is resolved by downloading those two files and keeping them in the path
or project folder.
The DLtrain executable (for Windows OS with X86) is used to train NN or CNN
models by using a data set placed in the path or project folder.
Developers are required to describe the model in a file; for example,
“Network_prop.txt” is a file that can be used as input.
Output is stored as [Link] and the same file includes the parameters W
and b.
Figure 7.4 provides the necessary workflow to use DLtrain on a Windows machine.

Fig. 7.4 DLtrain on a Windows machine



7.6 DLtrain: Large Model Support

Deep learning network models are growing towards billions of neurons or more.
Trending applications in deep learning use large models and train them with the
necessary data sets. Such large-scale models might result in good accuracy in
inference.
For example, the electricity delivery network of any huge city requires a large
model to represent its power delivery network. Each house can contribute a few
hundred million neurons to model the load pattern in a given day or night. For a
citywide power delivery network, the total can be close to a trillion neurons or
more. Training such a model requires software infrastructure to distribute the
training. It also requires another huge model to simulate and understand real-time
requirements.
Another example is a model for weather forecasting for a given city. For
precision in terms of time and spatial location, a large model is required to
represent this physical process in the form of a digital twin.
The DLtrain platform is equipped to support large models. Recent research in
large model support is attracting a new generation of researchers who have good
experience in parallel computing and HPC (high-performance computing). Along
with compute capability using CPU and GPU, it is important to have ultrahigh-speed
input and output links to share training data and to share a GPU among many
CPUs.
Mathematical theory appears to be evolving to support large model-based parallel
computing by using CUDA Cores and Tensor Cores. In parallel, neural networks
have proved their worth in understanding huge data. For example, NN, CNN,
RNN, and many more models are used.
Distributed deep learning (DDL) is advancing rapidly and is creating new
technologies in processor-to-processor communication. For example, NVLink is one
such connection between CPU and GPU and also between GPU and GPU. IBM and NVIDIA
are very active in distributed computing. In DDL, there is a huge amount of
customization of the computing load, and this area of research is emerging. This
means fast computing along with fast communication between processors.
Large model support is a very important and active area of research. Figure 7.5
shows three types of models for networks.
The right side of Fig. 7.5 shows models that are fully random and mostly fully
connected networks. The probability of each node being connected to another node
is very high. The middle of the figure shows the small-world network model.
There is very little probability that each node is connected to all other nodes.
Instead, in a small-world network, there is a high probability that the nodes
within a group are connected to each other, with the groups in turn connected to
the rest of the network. In this way, the middle picture contains many small-world
networks. The leftmost side of the figure shows a regular network model. The
regular network may not show good potential for modeling a large NN or CNN.
The random network model appears to be a good choice to represent NN or CNN.
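
The three network types can be generated with a short sketch. It assumes the Watts–Strogatz construction, in which a ring lattice is rewired with probability p: p = 0 gives the regular network, an intermediate p gives a small-world network, and p = 1 gives a random network. The text does not state how the networks of Fig. 7.5 were produced, so this construction, the function names, and the parameter values are assumptions for illustration.

```python
import random

def watts_strogatz(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbours (k even, n > k).

    Every lattice edge is rewired to a random new target with probability p.
    """
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for step in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + step) % n)))
    for i in range(n):
        for step in range(1, k // 2 + 1):
            old = frozenset((i, (i + step) % n))
            if old in edges and rng.random() < p:
                # New targets must avoid self-loops and duplicate edges.
                candidates = [t for t in range(n)
                              if t != i and frozenset((i, t)) not in edges]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((i, rng.choice(candidates))))
    return edges

def average_clustering(n, edges):
    """Mean fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = {i: set() for i in range(n)}
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)
    total = 0.0
    for i in range(n):
        nb = list(neighbours[i])
        d = len(nb)
        if d < 2:
            continue
        links = sum(1 for x in range(d) for y in range(x + 1, d)
                    if nb[y] in neighbours[nb[x]])
        total += 2.0 * links / (d * (d - 1))
    return total / n

n, k = 20, 4
regular = watts_strogatz(n, k, p=0.0)
small_world = watts_strogatz(n, k, p=0.1)
random_net = watts_strogatz(n, k, p=1.0)
for name, net in (("regular", regular), ("small-world", small_world),
                  ("random", random_net)):
    print(name, len(net), round(average_clustering(n, net), 3))
```

All three networks have the same number of edges; what changes with p is how strongly the neighbours of a node are connected to each other, which is the grouping property discussed above.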

Fig. 7.5 Fully connected networks vs. small-world networks

Fully Connected
Random network model to small-world model.
Step 1. Let us assume the “given CNN is a random network” before starting
training CNN (in the place of CNN, NN can be used as well).
Step 2. After training, there is a high probability that CNN will tend towards
the small-world network.

The process from step 1 to step 2 indicates that the random network becomes a
small-world network after training. Parallel computing of step 1 to step 2 is
challenging.
Suppose instead that training starts with a small-world network (many small
networks within a given big network) and that all input is provided to each small
network during training.

Fully Connected
Random network model to small-world model.
R-Step 1. Train small-world networks with all or most of the given input.
Perform for all small-world networks.
R-Step 2. Combine all small-world models to obtain a big model which can
handle all given inputs and provide inference for defined labels.

The revised steps above appear to be well suited to parallel computing and
can also represent a large model. They can be verified or worked out
independently. However, the key challenge of “training large models” by
using distributed deep learning networks is still open.

Small-world networks have a direct mapping with the influence matrix. Issues with
the influence matrix are not known a priori. A coarse computation of the
influence matrix may be very useful for starting a small-world network and training
it by using a parallel computing infrastructure.

7.7 Train NN and CNN Models in TensorFlow

7.7.1 Setup Tool Chain for TensorFlow

Using TensorFlow requires a particular tool chain on a given computer, along with
compatible versions of open-source software. Detailed instructions are
provided on the GitHub page, and developers can use them (or vary them if necessary)
to set up a working tool chain for TensorFlow.
The reference code is provided for the following in [80].
A virtual environment is recommended for each project. In some sense it
is a lightweight counterpart of a Docker image running in a container.
Different projects can use different versions of packages if each project has its
own virtual environment.
TensorFlow 2.0 or above is recommended for new developers. If developers
are required to support an old model (from a version before TF 2.0), it is a good
idea to convert the old TF model into a TF 2.0 model. Keras is tightly integrated
with TensorFlow from TF 2.0 onwards, so it is easy to use Keras to train a NN or CNN
model with TensorFlow; Keras is the layer above TensorFlow and makes the workflow
easier for developers.
Jupyter Notebook is recommended and, along with JupyterLab, helps
developers when debugging during application development. All of these
tools work well with Python 3.6 or above.

7.7.2 MNIST Data Set to Train NN or CNN Model

The MNIST data set has handwritten images of the digits 0 to 9, with a large number
of images for each digit. It has 60,000 training images and 10,000 test images.
The reference code is provided for the following in [81].
Developers can use the MNIST data set during the early stage of the project
and then move on to the custom data set of a given project. However, there is a
need to arrive at a data set size in terms of the number of images per label, number
of pixels per image, image width, and image height. These parameters
require critical review because they contribute to the quality of inference of a
given application.
The above example uses the MNIST data set locally, but it can be downloaded
from multiple URL locations. Details are shared in the above reference link.

Developers are required to partition the data set into two parts. The first part is
used to train the NN or CNN model. The second part is used to test the trained NN or
CNN model.
For inference, developers can use a real-time image or an image from a file and
perform inference with the trained NN or CNN model.
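The partition described above can be sketched as a shuffled split. The function name `split_dataset`, the 80/20 ratio, the fixed seed, and the placeholder samples are assumptions made for illustration; the reference code in [81] may partition the data differently.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=42):
    """Shuffle (image, label) pairs and split them into training and test parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder data: 100 stand-in "images" with labels 0 to 9.
samples = [(f"image_{i}", i % 10) for i in range(100)]
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))
```

Shuffling before the cut matters when the files are stored label by label; otherwise the test part could contain only a few of the ten digits.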
Problem 7.7.1 Training the NN or CNN model by using the MNIST data set follows
a well-defined process. For example, import the data set from file and transform the
data into a Tensor, in particular a Tensor which can be used as an input to train the
model via TensorFlow training. List the items from the following that contribute to
the successful use of the MNIST data set to train an NN or CNN model by using TensorFlow:
(a) CPU is used in training workload.
(b) GPU is used in training workload.
(c) The number of epochs is set to the number which is above 100.
(d) The number of layers is taken from the “model configuration file” and used in
constructing the CNN model for training.
(e) Item (a) is always true.
(f) Item (b) may be true sometimes.
(g) Item (c) may be correct. In case yes, what will be number set to epochs?

7.7.3 Colab: Train NN and CNN Models

Colab provides an option to load the data set from various sources. Cloud
annotations focus on the data set creation aspect of the model development life
cycle and leave the training part to other tool sets. For example, use TensorFlow in
Colab to train CNN.
There are many ways to train NN and CNN models, each with their own use
cases and trade-offs. Developers can train from scratch using a framework like
TensorFlow or PyTorch.
Use the following references to get the source code and sample examples to use
Colab to train deep learning networks (Reference [82] and [83]).

7.8 DLtrain for Jetson Nano Series SOM

7.8.1 Build DLtrain for Jetson Nano Series SOM

The source code for DLtrain is provided in [84]. [Link] is also given in
[84].
Developers are required to use the “cmake” tool set to build Makefile. After the
successful creation of Makefile, developers are required to use “make” to create an

executable version of DLtrain for POWER9 servers. These two steps,
cmake and make, are shown in Section 3 of [84].

7.8.2 DLtrain to Train CNN in Jetson Nano Series SOM

The MNIST data set is used. The version of DLtrain
given in [84] handles the MNIST data set efficiently and uses it
in training CNN. Developers should refer to Section 4 of [84] for more
details on the “use of DLtrain to train CNN by using Jetson Nano.”
Hyperparameters are available for developers to choose the optimal value for a
given parameter. For example, the following parameters are available for developers:

./DLtrain -m train -s NewNetwork.dat -c networkProf.txt -n 2000 -e 30 -d /home/jk/Images/

1. -c is input. File name which has parameters of the model.
2. -d is input. Data set folder path.
3. -n is input. It is the number of images to use from the data set (optional; default
is 10,000).
4. -e is input. It is the number of epochs (optional; the program will request it later
on if not given).
5. -m is input. It is for training (this can have train or infer as a string).
6. -s is output. It is the file name in which the trained model is saved.
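When training runs are scripted, the options listed above can be assembled programmatically. The following is a minimal sketch: `build_dltrain_command` is a hypothetical helper that is not part of DLtrain, and it only builds the argument list from the flags described in the text, without running the binary.

```python
def build_dltrain_command(mode, model_file, config_file,
                          data_dir=None, num_images=None, epochs=None,
                          binary="./DLtrain"):
    """Build the DLtrain argument list from the options described in the text."""
    if mode not in ("train", "infer"):
        raise ValueError("mode must be 'train' or 'infer'")
    command = [binary, "-m", mode, "-s", model_file, "-c", config_file]
    if num_images is not None:      # -n: number of images to use (optional)
        command += ["-n", str(num_images)]
    if epochs is not None:          # -e: number of epochs (optional)
        command += ["-e", str(epochs)]
    if data_dir is not None:        # -d: data set folder path
        command += ["-d", data_dir]
    return command

command = build_dltrain_command("train", "NewNetwork.dat", "networkProf.txt",
                                data_dir="/home/jk/Images/",
                                num_images=2000, epochs=30)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` on a machine where DLtrain has been built.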

7.8.3 DLtrain for Inference in Jetson Nano Series SOM

The inference workload is run in Jetson Nano SOM, as given in the following.
Developers are required to refer to [84] for more information on inference work.

./DLtrain -m infer -s NewNetwork.dat -c networkProp.txt -f img.raw

Where
1. -c is input. File name which has parameters of the model.
2. -d is input. Data set folder path.
3. -m is input. It selects the mode (this can have train or infer as a string).
4. -s is the file name in which the trained model is saved.
5. -f is the name of the input file which is used for inference.
Chapter 8
Deployment of Deep Learning Networks

The true power of deep learning shines when it is deployed,
bringing AI to life.

8.1 Insight

In recent years, embedded systems have been gaining popularity in the AI field. As
the AI and deep learning revolution moves from software to hardware,
embedded systems are now equipped with plug-in SOMs (System-on-Modules) that
incorporate essential components such as processors, memory, power supply, and
external interfaces. Since an embedded system is dedicated to specific tasks, design
engineers can optimize it for a given workflow, reducing the size and cost of
the product and enhancing its reliability and performance. Embedded systems are
commonly found in consumer, cooking, industrial, automotive, medical, commercial,
and military applications.
The surge in Internet use and data has led to advanced deep learning systems, and
hence, the book also presents techniques for the Internet of Things (IoT) in
association with deep learning networks. This chapter discusses the computing
infrastructure that sits at the edge of a network. More importantly, it describes
how best to deploy deep learning networks on IoT edge devices and the benefits of
doing so. The core areas addressed here are how to reduce latency, enhance
security, and communicate with less bandwidth by deploying deep learning networks
at the edge. Further, the chapter details a comprehensive way to set up, install,
compile, run, test, and deploy on different IoT edge devices. Through this chapter
readers also gain a strong understanding of event data collection, flow data
collection, vulnerability assessment, network analysis, packet inspection, Android
deployment diagnosis, and neural data communication with Android services.
At a higher level, this chapter presents how to set up and run the
IBM Watson Visual Recognition service on an Android device, together with associated
visual recognition application services. Further, deep learning network model
pruning and optimization, the joint probability weight quantizer, and edge
compilers are also

discussed. The chapter enumerates case studies on agriculture connected to IoT and
deep learning networks to aid reader understanding.

8.2 Description

Deep learning networks and the Internet of Things (IoT) edge are interconnected in
various ways [85–88]. IoT edge refers to the computing infrastructure that sits at the
edge of a network, close to the devices that generate data. These devices can include
sensors, cameras, and other data sources that produce massive amounts of data.
Deep learning networks can be deployed on IoT edge devices to process and
analyze this data in real time, allowing for faster decision-making and more efficient
use of resources. This is especially important in applications such as smart cities,
autonomous vehicles, and industrial automation, where real-time processing and
decision-making are critical.
The deployment of deep learning networks on IoT edge devices has several
advantages, including the following:
1. Reduced latency: By processing data at the edge, deep learning networks can
reduce the latency associated with sending data to a remote data center or cloud.
This is important in applications where real-time processing is critical, such as
in autonomous vehicles.
2. Improved security: By processing data at the edge, deep learning networks can
help reduce the risk of data breaches and ensure that sensitive data is kept secure.
3. Reduced bandwidth: By processing data at the edge, deep learning networks can
help reduce the amount of data that needs to be transmitted to the cloud, reducing
bandwidth requirements and associated costs.
Overall, the connection between deep learning networks and IoT edge is
crucial in enabling real-time decision-making and efficient use of resources in IoT
applications.
Intelligence in IoT edge plays a critical role in services that require real-time
inferencing. Historically, there have been systems with a high degree of engineering
complexity in both deployment and operation. For example, SCADA is
one such system that has been working in the power generation industry, the oil and
gas industry, cement factories, etc. In fact, SCADA keeps humans in the loop, which
makes it supervisory control and data acquisition.
With the advent of deep learning and its success in the modern digital world, there
has been a huge amount of interest among researchers in carrying deep learning models
to the abovementioned industrial verticals to bring about intelligent control
and data acquisition. In the place of a human supervisor, it appears that an
intelligent IoT edge is emerging to perform the tasks that the supervisor used to
handle. Thus, there is immense interest in making the IoT edge an
intelligent system in these core engineering verticals, apart from consumer industry
requirements.

Fig. 8.1 Variables in IoT edge

CNN is one particular class of deep learning networks. After training a CNN, it
is necessary to deploy it on a machine such that inference work can be
performed on a given set of input data. Inference work can be image classification,
object detection, or sequence-to-sequence translation. Edge devices might have
any one of the following combinations to perform computation:
1. CPU
2. CPU+GPU
3. CPU+FPGA
4. CPU+GPU+FPGA
Variables in IoT edge are shown in Fig. 8.1. Trained DL networks might be
modified to fit in a given computing capability of IoT edge.
An emerging trend shows that “embedded devices” also have a GPU along with a
multicore CPU. Some OEM devices appear to include an FPGA as well. Thus,
the challenge is to port trained deep learning networks onto embedded devices and
run inference service applications.
Deployment of “trained CNN model in X86 machine” requires many items for
successful completion. The list given in the following has items that might be
essential to complete CNN deployment in X86 (CPU) machines.
Problem 8.2.1 Identify necessary items in the following list for successfully
porting CNN model on to X86 CPU processor. And also provide reason for selecting
a particular item as a part of essentials for deployment of CNN on X86 CPU.
(a) Ubuntu 22.04 OS-based devices.
(b) Python is not required in deployment devices.
(c) TensorFlow is not required in deployment devices.

(d) FPGA (via PCI add-on) is not required in deployment devices.


(e) GPU (via PCI add-on) is not required in deployment devices.
(f) Item (b) and item (c) are true.
(g) Item (e) may be true sometimes.
(h) Others.

Understanding the above problem will provide guidelines to deployment engineers
to handle deployment of deep learning networks in a given embedded device.
The following items are used to illustrate challenges in porting deep learning
networks in a given embedded device:
1. Porting DL networks into Android phones (this work requires NDK support
along with support from the SDK of a given Android Studio)
2. Porting DL networks onto edge computer boards such as Jetson Nano, Jetson Xavier
NX, Jetson Xavier AGX, and Jetson Orin
3. Porting DL networks onto an embedded device which has an FPGA
For example, porting a “trained CNN model to an Android phone” raises the challenge
of using the CPU or CPU+GPU to run the CNN model. Android applications use Java
as the language to code applications, but the NN or CNN model must be coded
in C or C++ so that efficient execution is possible.
Problem 8.2.2 Identify necessary items in the following list for porting the CNN
model onto an Android device. And also provide a reason for selecting a particular
item as a part of essentials for deployment of CNN on an Android device. Assume
that the inference engine source code is available in C or C++ or in both.
(a) NDK is (in an Android Studio) used to build an inference engine.
(b) Inference engine is not using GPU in Android devices.
(c) Inference engine is not using DSP in Android devices.
(d) Inference engine is capable of using an updated trained CNN or NN model from
Ubuntu machine (X86) via Wi-Fi.
(e) Inference engine is used to collect data from users.
(f) Inference engine is used to show inference output in display for a given input
data.
(g) Other.

Deployment of TensorFlow or PyTorch models in a given embedded device
requires the following modifications to a trained DL model:
1. Optimize trained deep learning network model.
2. Truncate trained deep learning network model.
3. Quantize coefficients in trained deep learning network model.
4. Request embedded device vendor to use X86 or POWER9 CPUs.
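Item 3 above, quantizing coefficients, can be sketched as follows. This is a minimal illustration that assumes symmetric 8-bit quantization with a single shared scale; it is not the joint probability weight quantizer mentioned in this chapter, nor the scheme used by any particular vendor tool. The function names are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized, scale)
```

Each weight is then stored in one byte instead of four, at the cost of a rounding error of at most half the scale per weight.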
Problem 8.2.3 Identify items in the following list for porting the CNN model on
to an embedded device. And also provide reason for selecting a particular item as
a part of essentials for deployment of CNN or NN on to an embedded device (IoT
edge).

(a) The CNN or NN model is in Python and it is not ready to be deployed in IoT
edge.
(b) The CNN or NN model is in TensorFlow and it is not ready to deploy in IoT
edge.
(c) The trained CNN model might have floating point weights and bias coefficients.
(d) The trained CNN or NN models might have too many neurons and embedded
devices might not have resources for all neurons.
(e) IoT edge technology is very different from deep learning-based inference
technology.
(f) All the above are true.
(g) Other.

DLtrain is designed to support custom models by using NN and CNN. Figure 8.2
provides a detailed workflow to deploy NN in IoT edge. DLtrain is used to train a
(CNN and NN) model with training data and to validate trained (CNN and NN) models
before they are deployed in IoT edge.
In the case of deployment, there is a huge interest in using smartphones as IoT
edge such that the same device can be used without much investment during the
learning time of each learner. However, industrial deployment is expected to happen
in devices like Jetson Series GPUs, Zynq UltraScale+ FPGA, mmWave Radar, etc.
The DLtrain inference source code is in C and C++.
The DLtrain inference source code is open to developers for further value
addition.

Fig. 8.2 DLtrain for IoT devices



Fig. 8.3 AI in IoT

Problem 8.2.4 Provide your thought process to find a method and apparatus to port
DL networks in an embedded device by using DLtrain.
REST API appears to be a method in which the client can communicate with the
inference engine which is performing inference for a given input data, where input
data comes from client applications which are written in different languages.
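A REST interface of this kind can be sketched with the Python standard library. Everything here is an assumption for illustration: `run_inference` is a placeholder that stands in for the real inference engine (it simply reports the position of the largest input value), and the JSON field names and the port are hypothetical. DLtrain itself may expose inference differently.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_inference(values):
    """Placeholder engine: report the index and value of the largest input."""
    label = max(range(len(values)), key=lambda i: values[i])
    return {"label": label, "score": values[label]}

def handle_request_body(body):
    """Decode a JSON request body, run inference, and encode a JSON response."""
    try:
        payload = json.loads(body)
        values = [float(v) for v in payload["input"]]
        if not values:
            raise ValueError("empty input")
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a non-empty 'input' list"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps(run_inference(values)).encode()

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_request_body(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)

def serve(host="127.0.0.1", port=8080):
    """Start the server; call this explicitly, since it blocks until interrupted."""
    HTTPServer((host, port), InferenceHandler).serve_forever()
```

Because the exchange is plain JSON over HTTP, a client written in Java, C++, or any other language can post input data and read the inference result without linking against the engine.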
Intelligence in IoT edge plays a critical role in services that require real-
time inferencing. Historically, there have been systems with a high degree of
engineering complexity in both deployment and operation. For example,
SCADA is one such system that has been working in the power generation industry,
the oil and gas industry, cement factories, etc. In fact, SCADA keeps humans in the
loop, which makes it supervisory control and data acquisition.
Figure 8.3 compares healthcare support workers with the use of Watson in healthcare.
During the pandemic there was a need to manage each bed, and there was a need
for supporting healthcare staff. The following appears to be a valid issue among
the healthcare team.
At the forefront of the battle are healthcare workers who have been struggling to cope both
physically and mentally

A medical doctor once said, “Bed does not treat Patients.”


Yes, it may have been a valid statement 10 years ago. But innovation in
sensing has since resulted in MEMS sensors. mmWave Radar-based vital
sign monitoring, camera-based AI inference, and robot-based delivery of food
offer a way to make a bed autonomous without much routine help from supporting
healthcare staff.
IBM Watson Discovery service and Assistant service are used in creating a digital
assistant in the healthcare segment. Fortunately, these services and sensors
provide scope for accelerated deployment in clinical-stage hospitals and also in
quarantine facility centers. Figures 8.4 and 8.5 provide details on Watson in the
loop to monitor and control a given patient.

Fig. 8.4 AI and IoT in life

Fig. 8.5 Healthcare

The advent of deep learning and its success in the modern digital world have
ignited a huge amount of interest among researchers in carrying deep learning models
to the abovementioned industrial verticals to bring about intelligent control and
data acquisition.
In the place of a human supervisor, it appears that intelligent IoT edges are
emerging to perform the tasks that the supervisor used to handle.

Thus, there is immense interest in making the IoT edge an intelligent system in
these core engineering verticals, apart from consumer industry requirements. The
Kalman filter has been in use for more than 50 years. Moreover, it provides instant
prediction with local measurement data. Figure A.4 provides the workflow in IoT edge
to perform inference by using CPU along with GPU.
1. Create an NN model.
2. Prepare training data (mostly MNIST).
3. Train an NN model.
4. Validate trained NN model.
5. Go for deployment.
Deep learning is good and it will outperform results that are obtained by
using Kalman filter
In the case of deployment, there is a huge interest in making a smartphone as IoT
edge such that the same device can be used without much investment during pilot
deployment time. However, industrial deployment is expected to happen in devices
like Jetson Nano, Ultra96-V2, etc.
Problem 8.2.5 Deployment of a trained CNN model in X86 machine has a well-
defined workflow for successful deployment. List items in the following such that
they are used in successful deployment:
(a) Ubuntu 18.04 OS is used in a deployment machine which is X86.
(b) Python is not required in a deployment machine which is X86.
(c) TensorFlow is not required in a deployment machine which is X86.
(d) FPGA (via PCI add-on) is not required in a deployment machine which is X86.
(e) GPU (via PCI add-on) is not required in a deployment machine which is X86.
(f) Item (b) and item (c) are always true.
(g) Item (e) may be true sometimes.

8.3 Silicon Vendors in IoT Edge Segment

Deep learning network algorithms are characterized by extensive linear algebra,
matrix, and vector data operations.
Traditional processor architectures are not optimized for deep learning inference
workloads, and hence, specialized processing architectures are necessary to meet the
low latency requirements of running complex deep learning algorithm operations.
Figure 8.6 provides information on the AI tool sets of a few silicon vendors.
Factors to be considered while choosing the edge device include balancing the model
architecture (accuracy, size, operation type) requirements with device
programmability, throughput, power consumption, and cost.
Problem 8.3.1 Deployment of a TensorFlow model or PyTorch-based model in
IoT edge requires optimizing and quantizing the trained CNN model. What are the

Fig. 8.6 IoT edge silicon from silicon vendors

items required from the following for successful completion of deployment of the
abovementioned model in IoT edge?
(a) The CNN model in Python is not ready to be deployed in IoT edge.
(b) The CNN model in TensorFlow is not ready to be deployed in IoT edge.
(c) The trained CNN model might have floating point weights.
(d) The trained CNN model might have too many neurons.
(e) IoT edge technology is very different from deep learning-based inference
technology.
(f) All the above are true.

Deploying deep learning networks on a given silicon requires effort to work
with the software tool chain and also the silicon architecture to successfully port
the inference engine. Most silicon vendors have released tool sets which take
a TensorFlow model as input and produce a model suited to their
silicon.
Low-cost embedded processors have limited compute capability, and thus, there
is a need to obtain a deep learning network which fits within both the computing
and the memory budget of the device.
1. Extensible inference engines
2. Platform independent libraries supporting image transformations
3. Validation tools capable of guaranteeing the algorithm functionality across
different platforms
The choice of algorithm is important when running a model on
an edge device. However, it must also be coupled with an optimal choice of
hardware. The metrics used for choosing the hardware are accuracy,
energy consumption, throughput, and cost. The accuracy of deep learning network
algorithms must be measured on a data set large enough to affirm that the
obtained result is valid. Energy efficiency, on the other hand, is closely related
to deployment feasibility and the size of deep learning networks.
A large network and a highly variable scenario both imply more
computation. In particular, a larger deep learning network has more
neurons, while programmability requires accessing the
memory to read each weight value and modify it. Both generally increase
energy consumption.
Microcontrollers can be used for AI, but implementing the algorithm on them is
challenging. They are excellent choices in IoT applications and may run networks
that are not too large for low-data fusion tasks.
A good tool to facilitate the implementation of deep learning networks on a
microcontroller is X-CUBE-AI, which is suitable only for STMicroelectronics MCUs.
It is an expansion of the STM32CubeMX environment that extends the potential
of the tool, allowing an automatic conversion of pre-trained NNs to low-resource
hardware. X-CUBE-AI also optimizes libraries by modifying layers and reducing
the number of weights to make the network more “memory-friendly.”
The Qualcomm Neural Processing SDK is designed to help developers run one
or more neural network models trained in Caffe, Caffe2, ONNX, or TensorFlow.
TIDL is a set of open-source Linux software packages and tools enabling offload
of deep learning (inference only) compute workloads from ARM cores to hardware
accelerators such as EVE and/or C66x DSP. The objective for TIDL is to hide
the complexity of a heterogeneous device for machine learning/neural network
applications and help developers focus on their specific requirements. In this way,
ARM cores are freed from the heavy compute load of deep learning tasks and can
be used for other roles in your application. This also allows the use of traditional
computer vision (via OpenCV) augmenting deep learning algorithms.

Fig. 8.7 Dialog in IP networks (figure labels: IP packets; send, sniff, dissect, forge; intents; entities capture; assistant design; dialog)

At the moment, TIDL software primarily enables convolutional neural network
inference, using offline pre-trained models stored in the device file system (no
training on the target device). Models trained using Caffe or TensorFlow-slim
frameworks can be imported and converted (with the provided import tool) for
efficient execution on TI devices.
The e-AI Translator, which supports a range of neural network structures, is from
Renesas Electronics.
The free source code shared in the following link can be used to deploy DL networks
in Jetson series SOMs such as Nano, TX2, Xavier, and Orin. To access the free source
of the DLtrain inference engine, click on the “cpuTraincmake” button in [89].

8.4 Deploying DL Networks in Kanshi

Kanshi [90] is a security audit application that inspects IP network packet streams
by using deep learning networks. Figure 8.7 provides an overview of Kanshi, which
uses the IBM Watson Assistant service.
Engineers at work can invest more effort in innovation and less time in coding
applications to deploy a deep learning network at the edge to monitor TCP/IP traffic.
IoT edge flows represent network activity by normalizing IP addresses, ports,
byte and packet counts, and other data into flow records, which are records of
network sessions between two hosts. Flows are a differentiating component in
QRadar that provide detailed visibility into your network traffic. Figure 8.8 provides
information on QRadar.
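The normalization of packets into flow records described above can be sketched as follows. The packet dictionaries, field names, and the choice of a direction-independent session key are assumptions made for illustration; they do not reproduce the QRadar flow format.

```python
from collections import defaultdict

def flow_key(packet):
    """Normalize a packet into a direction-independent session key."""
    endpoints = sorted([(packet["src_ip"], packet["src_port"]),
                        (packet["dst_ip"], packet["dst_port"])])
    return (endpoints[0], endpoints[1], packet["proto"])

def build_flow_records(packets):
    """Aggregate packet and byte counts per session between two hosts."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for packet in packets:
        record = flows[flow_key(packet)]
        record["packets"] += 1
        record["bytes"] += packet["length"]
    return dict(flows)

# Hypothetical captured packets: two directions of one TCP session plus one UDP packet.
packets = [
    {"src_ip": "10.0.0.1", "src_port": 40000, "dst_ip": "10.0.0.2",
     "dst_port": 443, "proto": "tcp", "length": 120},
    {"src_ip": "10.0.0.2", "src_port": 443, "dst_ip": "10.0.0.1",
     "dst_port": 40000, "proto": "tcp", "length": 1500},
    {"src_ip": "10.0.0.3", "src_port": 5353, "dst_ip": "10.0.0.2",
     "dst_port": 53, "proto": "udp", "length": 80},
]
flows = build_flow_records(packets)
for key, record in flows.items():
    print(key, record)
```

Sorting the two endpoints means that request and reply packets land in the same record, which matches the description of a flow as a session between two hosts.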

Fig. 8.8 Flows in IP networks (figure labels: flow sources, flow pipeline, flow aggregation, flow direction, application identification, deduplication, superflows, VLAN fields, configuring a flow collector, viewing flow data on the Network Activity tab, tuning false positive events from creating offenses)

What is the difference between events and flows in an IP network?


Application identification, flow direction, and superflows are part of analysis.
Perhaps the same can be defined as events in the IP network. To identify events in
IP networks, deep learning networks are used.

8.4.1 Event Data Collection

Event data collection requires a packet sniffing tool set to collect IP packets in real
time. Events are generated by log sources such as firewalls, routers, servers, and
intrusion detection systems (IDS) or intrusion prevention systems (IPS).

8.4.2 Flow Data Collection

Flows provide information about network traffic and can be sent to IoT edge in
various formats, including the following:
1. Flow log files
2. NetFlow
3. JFlow
4. sFlow
5. Packeteer
The data provides the flow arrival time, the common destination port, and RFC 1700
ports 0-1023.
TAP devices provide a way to access the data flowing across a computer network,
typically for the benefit of network security and performance monitoring tools. The
monitored traffic is referred to as the “pass through” traffic and the ports used for
monitoring are called “monitor ports.”

For greater visibility into the network, a TAP can be placed between the router
and the switch. Port mirroring, also known as SPAN or roving
analysis, is a method of monitoring network traffic that forwards a copy of each
incoming and/or outgoing packet from one or more ports (or VLANs) of a switch to
another port where the network traffic analyzer is connected. SPAN is often used
on simpler systems to monitor multiple stations at once by using the following data
formats:
arp, ether, fddi, icmp, ip, ip6, link, ppp, radio, rarp, slip, tcp, tr, udp, wlan
In network communication, a packet typically consists of two main parts: the
header and the payload. Here’s a breakdown of the information typically found in
each:
Packet Header:
Source and Destination Addresses: The header contains information about the
source and destination IP addresses or MAC addresses, depending on the layer of
the network protocol stack (e.g., IP addresses in the network layer, MAC addresses
in the data link layer).
Packet Length: The total length of the packet, including both the header and the
payload.
Packet Sequence Number: In some cases, there may be sequence numbers to
ensure packets arrive in the correct order.
Error Checking Information: Checksums or CRC (Cyclic Redundancy Check)
values are included in the header to verify the integrity of the packet.
Protocol Information: Indicates the type of data carried in the payload (e.g., TCP,
UDP, ICMP).
Time-to-Live (TTL) or Hop Limit: Prevents packets from circulating indefinitely
by decrementing on each hop in the network.
Packet Payload:
Data: The actual content of the packet, which can include application data,
messages, or any other information being transmitted.
To investigate the information in the header and payload, various technologies and
tools can be used:
Packet Sniffers/Analyzers: Tools like Wireshark, tcpdump, or Microsoft Network
Monitor can capture and analyze network packets, providing detailed insights into
both headers and payloads.
Protocol Analyzers: These specialized tools focus on specific network protocols,
making it easier to dissect and understand the information contained in the header
and payload of those protocols.
Deep Packet Inspection (DPI) Systems: DPI systems go beyond basic packet
analysis to inspect the content of packets for security, quality of service, or traffic
shaping purposes. They can analyze and classify payloads based on application-
layer content.
Network Monitoring and Intrusion Detection Systems (IDS/IPS): These sys-
tems can inspect packet headers and payloads for patterns that may indicate network
intrusions or malicious activities.
Custom Software: Depending on your needs, you can develop custom software
to parse and analyze packet headers and payloads, especially when working with
proprietary or custom protocols.
Remember that investigating the payload content may require additional knowl-
edge and tools specific to the application or protocol being used in the communica-
tion.
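The header and payload split described above can be illustrated with a minimal sketch. It assumes a raw IPv4 packet without link-layer framing and uses only the Python standard library; the function name parse_ipv4_packet and the sample addresses are hypothetical and not part of Kanshi or any tool named in this section.

```python
import socket
import struct

def parse_ipv4_packet(raw: bytes) -> dict:
    """Split a raw IPv4 packet into selected header fields and the payload."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # Fixed 20-byte part of the IPv4 header, in network byte order.
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    header_len = (ver_ihl & 0x0F) * 4  # IHL counts 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_length": header_len,
        "total_length": total_length,   # header plus payload, in bytes
        "ttl": ttl,
        "protocol": protocol,           # 1 = ICMP, 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "payload": raw[header_len:total_length],
    }

# Hand-built sample packet (illustrative values; the checksum is left as zero).
sample_header = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 24, 1, 0, 64, 6, 0,
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
sample_packet = sample_header + b"DATA"
fields = parse_ipv4_packet(sample_packet)
print(fields["src"], fields["dst"], fields["ttl"], fields["payload"])
```

A packet analyzer performs the same kind of decoding for every protocol layer; the sketch stops at the network layer and leaves the payload as uninterpreted bytes.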
Flow-level inspection might require network packet appliances that capture traffic
at up to 10 Gbps. Deployment engineers need to know which information is available
in the packet header and in the payload and which technologies to use to investigate
each of them.
IoT edge analyzes TCP/IP traffic flow data for applications, flow direction, and
superflows. Deployment engineers also learn how to build an IoT edge flow rule and
how to perform flow searches in IoT edge.
SSH running on a nonstandard port might indicate an issue. The header carries no
extra information about the issue, but the payload might.
IoT edge collects network activity information, or what is referred to as “flow
records.” Flows represent network activity by normalizing IP addresses, ports, byte
and packet counts, and other details into “flows,” each of which effectively
represents a session between two hosts. QRadar can collect different types of flows,
which differ greatly in the collected details. The following flow collection sources
are available in the market:
1. Cisco NetFlow
2. QRadar QFlow
3. QRadar Network Insights (QNI)
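The idea of a flow record can be sketched in a few lines. The example below is only an illustration of normalizing packets into sessions with packet and byte counts; it is not the record format of NetFlow, QFlow, or QNI, and the tuple layout and the name build_flows are assumptions made for this sketch.

```python
from collections import defaultdict

def build_flows(packets):
    """Aggregate packets into flow records keyed by endpoints and protocol.

    Each packet is (src_ip, src_port, dst_ip, dst_port, protocol, length).
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, sport, dst, dport, proto, length in packets:
        # Sort the two endpoints so both directions map to one session.
        end_a, end_b = sorted([(src, sport), (dst, dport)])
        key = (end_a, end_b, proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += length
    return dict(flows)

# Illustrative traffic: one SSH session (two directions) and one DNS query.
packets = [
    ("10.0.0.1", 40000, "10.0.0.2", 22, "tcp", 100),
    ("10.0.0.2", 22, "10.0.0.1", 40000, "tcp", 60),
    ("10.0.0.3", 5353, "10.0.0.2", 53, "udp", 80),
]
flows = build_flows(packets)
for key, record in flows.items():
    print(key, record)
```

Real flow collectors also track start and end times, TCP flags, and timeouts; those details are omitted here to keep the sketch short.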

8.4.3 Vulnerability Assessment

Packet analysis uses deep learning networks to perform real-time inference and
obtain vulnerability assessment (VA) information. An IBM Cloud account provides
access to Watson Assistant, which is used in the development of Kanshi to perform
security audits in IP networks. IoT edge can import VA information from various
third-party scanners. IoT edge network insight appliances connect to network TAPs,
SPAN, or mirror ports to access full packet data for real-time analysis.
8.4 Deploying DL Networks in Kanshi 113

The IoT edge network insight appliance provides a detailed analysis of network flows
to extend the threat detection capabilities of IBM QRadar. Both the CPU and the GPU
are used in IoT edge to obtain real-time performance. For a more secure operating
system, the appliance runs on Red Hat Enterprise Linux® version 7.9.
Berkeley Packet Filters (BPFs) provide a powerful tool for intrusion detection
analysis. Use BPF filtering to quickly narrow large packet captures down to a smaller
set of results by filtering on a specific type of traffic. Both admin and non-admin
users can create BPF filters. Build complex filter expressions by using modifiers and
operators to combine protocols with primitive BPF filters.
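Combining primitives with operators can be sketched as simple string composition. The helper names bpf_and, bpf_or, and bpf_not are hypothetical; the resulting text follows the usual pcap filter syntax (and, or, not, parentheses) and could, for example, be passed to tcpdump when reading a capture file. The host address is an illustrative assumption.

```python
def bpf_and(*terms):
    """Join filter terms with 'and', parenthesizing each term."""
    return " and ".join(f"({t})" for t in terms)

def bpf_or(*terms):
    """Join filter terms with 'or', parenthesizing each term."""
    return " or ".join(f"({t})" for t in terms)

def bpf_not(term):
    """Negate a filter term."""
    return f"not ({term})"

# TCP traffic of one host that avoids the standard SSH port: a candidate
# set for payload inspection when SSH on a nonstandard port is suspected.
candidate_filter = bpf_and("tcp", bpf_not("port 22"), "host 10.0.0.5")

# DNS or ICMP traffic, combined with 'or'.
dns_or_icmp = bpf_or("udp port 53", "icmp")

print(candidate_filter)
print(dns_or_icmp)
```

The filter only reduces the capture; deciding whether the remaining traffic really is SSH still requires looking at the payload.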
Cyber Physical Systems Increasingly Under Threat from “n00bs”
Throughout 2021, we observed low sophistication threat actors learn that they could
create big impacts in the operational technology (OT) space—perhaps even bigger than
they intended. Actors will continue to explore the OT space in 2022 and increasingly
use ransomware in their attacks. This targeting will occur because of the need to keep
OT environments fully operational, especially when the systems are part of critical
infrastructure. Attacks against critical OT environments can cause serious disruption and
even threaten human lives, thereby increasing the pressure for organizations to pay a
ransom. To compound the issue, many of these OT devices are not built with security at
the forefront of the design, and we’re currently seeing a massive uptick in the number of
vulnerabilities being identified in OT environments. Reference: Mandiant report,
14 Cyber Security Predictions for 2022 and Beyond.

Investigation of cybersecurity threats using the IoT edge Analyst Workflow
provides security analysts with a new UI to investigate offenses and search for
threats.
It enables analysts to initiate a search, define operators, customize table columns,
group and sort, and define a time range using interactive modules. The following
features highlight the new investigation workflows:
1. Critical information to help inform your decision-making is one click away.
Select objects like IP addresses, log sources, events, insights, magnitude, and
more to open a side pane that provides more context and details.
2. Narrow down results in tables with filters.
3. Search for common objects like IP, hash, URL, and more with Ariel Query
Language (AQL) smart query builder and with no need to build a query.
4. Load screens and navigate between workflows with improved performance.
Analyst Workflow provides new methods for filtering offenses and events, as well as
graphical representations of offenses by magnitude, assignee, and type. The
improved offense workflow provides a more intuitive way to investigate an offense,
determine the root cause of an issue, and work to resolve it. Use the built-in query
builder to create AQL queries by using examples and saved or shared searches, or
by typing plain text into the search field.
114 8 Deployment of Deep Learning Networks

8.4.4 IP Stream Analysis

IP stream analysis and deep learning are two fields that can be combined to create
powerful tools for analyzing network traffic and detecting anomalous behavior.
IP stream analysis involves capturing and analyzing network traffic to identify
patterns and trends, detect security threats, and troubleshoot network issues. This
can involve analyzing data at the packet level, looking at protocol headers, or
examining flow records.
Deep learning, on the other hand, is a subset of machine learning that uses arti-
ficial neural networks with multiple layers to model and solve complex problems.
It involves training the neural network on large data sets to learn patterns and make
predictions or classifications.
By combining IP stream analysis with deep learning, it is possible to create
sophisticated tools for detecting anomalies and security threats in network traffic.
For example, deep learning models can be trained on large data sets of normal
network traffic to learn patterns of behavior. These models can then be used to detect
deviations from normal behavior, which could indicate a security threat.
One example of this is using deep learning to detect distributed denial-of-service
(DDoS) attacks. By analyzing network traffic and training a deep learning model
to identify patterns of normal traffic, the model can be used to detect when traffic
patterns deviate from the norm. This can help to detect and mitigate DDoS attacks
in real time.
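The principle of learning normal traffic and flagging deviations can be shown with a deliberately simple stand-in. The sketch below uses a mean and standard deviation of packet rates instead of a deep learning model, so it is a statistical baseline and not the DDoS detector described above; the rate values and the threshold of three standard deviations are illustrative assumptions.

```python
import statistics

def fit_baseline(normal_rates):
    """Learn the mean and standard deviation of normal packet rates."""
    return statistics.mean(normal_rates), statistics.stdev(normal_rates)

def is_anomalous(rate, baseline, threshold=3.0):
    """Flag a rate that deviates more than `threshold` deviations from normal."""
    mean, std = baseline
    return abs(rate - mean) > threshold * std

# Packets per second observed during normal operation (illustrative values).
normal_rates = [100, 110, 95, 105, 90, 100]
baseline = fit_baseline(normal_rates)

print(is_anomalous(104, baseline))   # close to the learned norm
print(is_anomalous(5000, baseline))  # far from the learned norm
```

A deep learning model replaces the two summary statistics with a learned representation of many traffic features, but the decision step (compare new traffic with what was learned from normal traffic) has the same shape.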
Overall, combining IP stream analysis with deep learning can lead to more
accurate and effective tools for network analysis and security.
Scapy is a powerful Python-based interactive packet manipulation program and
library that lets a user manipulate packets on networks.
Figure 8.9 shows the workflow to obtain a data set from a PCAP file.
Write a program that uses malicious pcap files as data sets and predicts whether
other pcap files have malicious packets in them.
1. Download two pcap files and concatenate them to extract packet._timestamp and
packet._data.
2. Preprocess the packet._data, add labels to it, and create a training data set.
3. Create testing data set; if it is in a file, then zip them to pcap files.
4. A data set of (feature, label) pairs is all that is needed from the above steps.
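The steps above can be sketched with the standard library alone. The sketch assumes the classic pcap layout (a 24-byte file header followed by a 16-byte record header per packet, magic number 0xA1B2C3D4 with microsecond timestamps); the fixed feature width of 64 bytes, the labels 0 and 1, and all function names are assumptions for illustration. A library such as Scapy can read pcap files as well; this version only shows what the data set contains.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps

def read_pcap(blob: bytes):
    """Yield (timestamp, data) for each packet in a classic pcap byte string."""
    if len(blob) < 24:
        raise ValueError("too short for a pcap file header")
    if struct.unpack("<I", blob[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", blob[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    offset = 24  # skip the file header
    while offset + 16 <= len(blob):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", blob[offset:offset + 16])
        offset += 16
        data = blob[offset:offset + incl_len]
        offset += incl_len
        yield ts_sec + ts_usec / 1e6, data

def to_features(data: bytes, width: int = 64):
    """Fixed-length vector of bytes scaled to [0, 1], truncated or zero-padded."""
    padded = data[:width].ljust(width, b"\x00")
    return [b / 255.0 for b in padded]

def build_dataset(blob: bytes, label: int):
    """Turn one pcap byte string into a list of (feature, label) pairs."""
    return [(to_features(data), label) for _ts, data in read_pcap(blob)]

def make_pcap(packets):
    """Demo helper: build a little-endian pcap byte string from (sec, data)."""
    header = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    body = b"".join(
        struct.pack("<IIII", sec, 0, len(d), len(d)) + d for sec, d in packets)
    return header + body

benign = make_pcap([(1, b"\x01\x02"), (2, b"\xff")])
malicious = make_pcap([(3, b"\x10" * 100)])
dataset = build_dataset(benign, 0) + build_dataset(malicious, 1)
print(len(dataset), len(dataset[0][0]), dataset[2][1])
```

In practice the two byte strings would come from reading the downloaded pcap files in binary mode, and the list of pairs would be handed to the training code.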
Researchers working on computer networks or cybersecurity often need to
analyze network traffic. In that case, they use the Wireshark packet analyzer or
a similar traffic analysis tool to capture and analyze packets. However,
if you want to perform data analysis, cleaning, modeling, or feature analysis and
classification on the network traffic, you might want to convert the PCAP files into
a CSV file.
1. Wireshark is open-source, cross-platform software.
2. tcpdump is a command-line utility for Linux and other Unix-like systems.
3. Firesheep is a Firefox extension.
4. Packet sniffers can store captured packets in PCAP (Packet CAPture) files.
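The conversion to CSV mentioned above can be sketched as follows. The column names (timestamp, length, data_hex) are assumptions for illustration; the input is a list of (timestamp, data) pairs such as those extracted from a PCAP file.

```python
import csv
import io

def packets_to_csv(packets):
    """Render (timestamp, data) pairs as CSV text with one row per packet."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "length", "data_hex"])
    for ts, data in packets:
        writer.writerow([ts, len(data), data.hex()])
    return out.getvalue()

csv_text = packets_to_csv([(1.0, b"\x45\x00"), (2.5, b"\xff")])
print(csv_text)
```

Writing csv_text to a file gives a table that data analysis tools can load directly for cleaning and feature analysis.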

Fig. 8.9 Data set for DL networks (workflow: traffic classes such as port scan, TCP,
UDP, and ICMP network scans, and DoS; a pcap file with a file header and packets
that each carry a timestamp, other header fields, and data; the extracted timestamp
and data form the data set for a CNN model in TensorFlow/DLtrain)

Fig. 8.10 IP packet format

Figure 8.10 provides details on IP packet format.


Refer to file [Link] in [91] to get information on the flow capture tool set
Scapy.
The [Link] code trains a deep learning network model by using a PCAP
file-based data set. Refer to file [Link] in [92].

8.5 Deploying DL in Android Phone

How to port trained NN or CNN models onto Android phones?


Figure 8.11 provides details to perform porting of NN on to an Android phone.
The model must fit in an Android phone. Moreover, the compute capability of a
phone may not match that of the host machine on which the NN or CNN model is
trained. It may therefore be necessary to quantize the coefficients of the model so
that the model can run on an Android phone for inference.
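Coefficient quantization can be illustrated with a minimal sketch. It assumes symmetric linear quantization to 8-bit integers with one shared scale factor, which is one common scheme and not necessarily the one used in DLtrain or the J7 app; the function names and weight values are illustrative.

```python
def quantize_int8(weights):
    """Symmetric linear quantization of floats to the range [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale)
print(restored)
```

Each coefficient now needs one byte instead of four, at the cost of a rounding error bounded by the scale factor.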
The resources used are listed in the following:
1. POWER9 or x86 with GPU-based DL training
2. CUDA SDK 10.1 or above
3. Inference app in Android phone
4. Watson Studio for ML app
5. Android SDK handling and working knowledge (advantage)
6. Watson studio for image classification service design and deployment
7. App in android to work with Watson VR microservice

Fig. 8.11 J7 app development



8.5.1 Installing Android Studio

DLtrain is coded in C and C++. To build an inference engine on an Android phone
by using DLtrain, the Android NDK is required to build a library from the C and C++
source code of DLtrain. The following steps install Android Studio together with
the NDK. An x86 development PC running Ubuntu 18.04 or higher is used.
Installation of dependencies is critical. For example, Android Studio requires
OpenJDK version 8 or above to be installed on the development PC.
1. sudo apt update
2. sudo apt install openjdk-8-jdk
3. java -version

Then install Android Studio:
1. sudo snap install android-studio --classic
The recommended Android SDK version is 22 or above. SDK version 29 is used to
build the J7 app project.
Start Android Studio either by typing android-studio in the development PC
terminal or by clicking the Android Studio icon (Activities −> Android Studio).
The SDK Manager is required to install the NDK. Use the SDK Manager to install
the following components, which are also needed to build the JNI library for the
DLtrain inference engine.
Packages to install:
1. LLDB 3.1 (lldb;3.1)
2. CMake 3.10.2.4988404 (cmake;3.10.2.4988404)
3. NDK (side by side) 20.1.5948944 (ndk;20.1.5948944)
Fix for 3.1.2 or Newer Versions
Developers have faced an issue on Android Studio 3.1.2 that a simple sync did not
resolve. The following steps fixed it:
1. File −> Invalidate Caches −> Invalidate.
2. File −> Close Project.
3. Remove the project from the Android Studio project selector window.
4. Quit Android Studio.
5. Start Android Studio and open the project again.
The NDK issue on Android Studio 3.1.2 is discussed in [93], and Android Studio
installation is discussed in [94].

Fig. 8.12 J7 app development

8.5.2 Build Inference Engine

Android Studio is used to build the inference engine, and the file transfer
functionality is used to copy the created APK onto the Android phone. The workflow
is provided in Fig. 8.12.
Windows 10 or Ubuntu 18.04 with Android Studio is used in the J7 app project,
where the latest stable version of Android Studio is [Link] .
NDK 20.1.5948944 (ndk;20.1.5948944) is used to build the JNI library for the
inference engine. The full source code of the inference engine is given in [95].
The inference engine can be updated with a revised model. The source code of the
model update application is given in [95].
The following diagram shows the workflow to create the J722 application in the
form of an APK.

8.5.3 Send CNN or NN Model to Phone

Figure 8.13 shows the recommended IP network configuration and workflow to
transfer the trained NN or CNN model from a host machine to an Android phone.
Successful completion of these operations results in “deployment of trained model
NN or CNN” on Android smartphones.
The inference engine application on the Android phone is designed to use the latest
model from the host.

Fig. 8.13 J7 app classroom demonstration

Fig. 8.14 Model download on to Android device

The Send2Phone source code in Java runs on a POWER9 machine or on an x86
machine. Figure 8.14 shows the workflow to download the model from the server.
Question: How to build toPhone/[Link]?
javac [Link]
Question: How to use toPhone/[Link]?
java -jar toPhone/[Link]
Send a file to the J722 app, which is installed on the Android phone.

jk:~/Jan28$ java -jar toPhone/SndModel.jar
Open J722 and Load
Enter file path: j2xxxx

Developers need to provide a CNN model file name.


Enter IP: [Link]
This is the IP address of the Android device that runs the J7 app and is connected
to the same subnet as the host.
Done
The Send2Phone source code is given in [96].
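The transfer itself can be sketched in a few lines. The actual Send2Phone tool is written in Java and its wire protocol is not described here, so the following Python sketch is only an assumption-based illustration: it sends an 8-byte length prefix followed by the model bytes over plain TCP, and the receiving side stands in for the app on the phone. The function names, the framing, and the loopback demo are all hypothetical.

```python
import socket
import struct
import threading

def send_model(host, port, payload: bytes):
    """Send an 8-byte big-endian length prefix followed by the model bytes."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(struct.pack("!Q", len(payload)) + payload)

def recv_exact(conn, n):
    """Read exactly n bytes from a connected socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)

def receive_model(server_sock):
    """Accept one connection and return the model bytes it delivers."""
    conn, _addr = server_sock.accept()
    with conn:
        (length,) = struct.unpack("!Q", recv_exact(conn, 8))
        return recv_exact(conn, length)

# Loopback demo: the receiver thread plays the role of the phone.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
received = []
receiver = threading.Thread(target=lambda: received.append(receive_model(server)))
receiver.start()
send_model("127.0.0.1", port, b"fake-model-bytes")
receiver.join(timeout=5)
server.close()
print(received)
```

In a real setup the host would read the model file from disk and connect to the IP address entered at the prompt, with both devices on the same subnet.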

8.5.4 Using the J7 Application in Android Phone

The following provides a guideline to use the J7 app in an Android phone.


1. Local: The trained model from the local storage of an Android phone can be
loaded into the application.
2. Go: Use the button to perform inference after the user enters their choice of
number in a given scripting window.
3. Network: Use the network button to load a “successfully trained network.” A Wi-Fi
link is used to load trained models. The supplied host application must be used
together with the network button. More detail on this is given in the following
page.
4. Previous: Use the previous button to move to the earlier inference image sample
in a given list.
5. Next: Use the next button to move to the next inference image sample in a given
list.
6. Clear: Press clear button to clear inference details given in a display.

8.5.5 Mini Project 1: Inference Using GPU

Use the GPU of the Android device to perform the inference computation for a given
input image. Many Android phones may not have a GPU; where one is available, the
question is how to use it for the computation that is part of inference.
Problem 8.5.1 Develop CUDA core code for the given C and C++ source code of the
J7 app inference engine, where the C and C++ code already works well on the CPU
of an Android phone. The inference engine source code of NN/CNN is given in [95].
Objective Port the J7 app inference engine C and C++ code to CUDA programming
and use the CUDA cores of the GPU of a given Android phone for real-time inference.
Figure 8.15 provides details on the workflow for this application.

Fig. 8.15 Use GPU in J7 application

8.5.6 Mini Project 2: On Sharing Trained CNN

Objective Share the trained deep learning network model on the host PC with the J7
app on the Android device. Assume the host PC and the Android device are connected
via a local Wi-Fi access point. Figure 8.16 provides details on the “workflow” for
sharing a CNN with the J7 application.
Design and develop a server application for the host PC and run it on the PC.
The J7 app includes a client application on the Android phone.
Host machines (Windows or Ubuntu) use DLtrain to train the NN or CNN. Assume
that the MNIST data set is available on the host machine.
A sample application is provided for the above functional requirements. The source
code of this sample application is shared in [95]. Focus on the host processor side
and revise the given source code to find better ways to transfer the “trained deep
learning model from the host PC to the Android device.”
Problem 8.5.2 Develop application in host (which can improve the above work in
Mini Project 2) such that users can transfer trained deep learning network models
such as NN or CNN from the host computer to the Android device. Assume that
both are connected via the TCP/IP network and have the same subnet address.

Fig. 8.16 Sharing trained CNN with J7 application

8.5.7 Mini Project 3: Pull Trained CNN from Host

Objective Pull the trained deep learning network model into the Android device
via Wi-Fi by using an application on the Android device while a server application
runs on the host processor. Figure 8.17 provides a detailed workflow to pull
the trained model from the host PC.
The J7 app is a client application that is designed to receive a trained
NN or CNN from the host machine over Wi-Fi (local network).
Problem 8.5.3 Develop an application on the Android device that automatically
performs synchronization to pull revised deep learning networks from the
GitHub server or any other server.
The source code of the J7 app is given in [95].
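One possible building block for such synchronization is a check that decides whether the local model is out of date. The sketch below compares SHA-256 digests; it assumes the server publishes a digest of its current model, which is an assumption of this illustration and not a feature of the J7 app or of GitHub described in this chapter.

```python
import hashlib

def model_digest(model_bytes: bytes) -> str:
    """Return the SHA-256 digest of a model file's contents as hex text."""
    return hashlib.sha256(model_bytes).hexdigest()

def needs_update(local_model: bytes, remote_digest: str) -> bool:
    """True when the local model differs from the model the server offers."""
    return model_digest(local_model) != remote_digest

local_model = b"weights-v1"
remote_same = model_digest(b"weights-v1")
remote_new = model_digest(b"weights-v2")

print(needs_update(local_model, remote_same))
print(needs_update(local_model, remote_new))
```

With such a check, the client downloads the full model only when the digests differ, which keeps periodic synchronization cheap over Wi-Fi.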

Fig. 8.17 Pull trained CNN from host

Fig. 8.18 Visual recognition in IBM Watson



8.5.8 IBM Watson Visual Recognition Service

The IBM Watson Visual Recognition service is deployed in IBM Cloud. A client
application on the Android device collects image data by using the device
camera and sends the image data to the IBM Cloud-based VR service for inference.
The status of this visual recognition service in IBM Cloud is shown in Fig. 8.18.
IBM Watson leverages the unique capabilities of accelerated Power servers,
delivering performance unattainable on commodity servers. It provides
hyperparameter search and optimization as well as elastic training to allocate the
resources needed to optimize performance. Distributed deep learning provides rapid
insights at massive scale. Large model support facilitates the use of system memory
with little to no performance impact, yielding significantly larger and more accurate
deep learning models.
The IBM Watson Visual Recognition service uses deep learning algorithms to analyze
images of scenes, objects, and other content. The response includes keywords that
provide information about the content. The Watson Machine Learning Accelerator, a
new piece of Watson Machine Learning, makes deep learning and machine learning
more accessible to customer teams and brings the benefits of AI into the customer's
business. It combines popular open-source deep learning frameworks,
efficient AI development tools, and accelerated IBM® Power Systems™ servers.
Small and medium organizations can now deploy a fully optimized and supported
AI platform that delivers high performance, proven dependability, and resilience.
The Watson Machine Learning Accelerator is a complete environment for data
science as a service, enabling small and medium organizations to bring AI
applications into production.
It enables rapid deployment at customer locations. The deployment package
includes the most popular deep learning frameworks, with all required
dependencies and files, precompiled and ready to deploy. The entire AI suite has
been validated and optimized to run reliably on accelerated Power servers.
It incorporates the most popular deep learning frameworks. The Watson Machine
Learning Accelerator gives access to power-optimized versions of all of the
most popular deep learning frameworks currently available, including TensorFlow,
Caffe, and PyTorch. Watson Machine Learning Accelerator runs on IBM Power-
accelerated server HPC, a platform that runs not only customer deep learning
networks but also a wide variety of high-performance computing workloads.

Important: IBM Watson® Visual Recognition is deprecated and discontinued.
Existing instances are supported until December 1, 2021, but as of January
7, 2021, you cannot create instances. Any instance that exists on December 1,
2021, will be deleted.

Fig. 8.19 Visual recognition in IBM Watson

The following provides information on the visual recognition service in IBM
and a quick way to learn how to create visual recognition applications.
Figure 8.19 provides a workflow for the design and development of a deep learning
application by using IBM Watson. Most importantly, no coding is required
to create custom visual recognition. Visual recognition-based applications are
emerging across different verticals that span engineering, medical, science, and
art departments.
Visual recognition (via deep learning) is moving into the hands of:
1. Makers
2. Self-taught experts
3. Professional and embedded engineers
Coding skill is not required, but modeling skill is required.
1. One type of picture is color. What are the other types of picture?
(a) Grayscale
(b) Black and white
2. Image (or picture) file formats: provide the names of four file formats.
(a) jpeg
(b) png
(c) bmp
(d) gif
3. Image capturing (color, gray): provide the names of two types of camera.
(a) USB camera
(b) CSI camera
4. Image synthesis (drawing by using software): provide the name of software that
can edit and save a picture.
(a) Paintbrush
(b) GNU Image Manipulation Program
5. What unit is used to measure the size of an image?
(a) Number of pixels along the horizontal axis
(b) Number of pixels along the vertical axis
6. How many bits are required to store one pixel?
(a) Color [Link] bits
(b) White and black: 8 bits
(c) Binary: 1 bit
7. How do you create one file from many picture files?
(a) Use zip or gz
(b) Use a compression tool to perform the above
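The relation between image size in pixels and storage size can be worked through with a short sketch. It covers uncompressed images only. The 8-bit and 1-bit depths come from the list above; the value of 24 bits for color (8 bits for each of three channels) is an assumption of this sketch, as are the names and the sample image sizes.

```python
# Bits per pixel; the 24-bit color value is an assumption (8 bits x 3 channels).
BITS_PER_PIXEL = {"color": 24, "gray": 8, "binary": 1}

def image_bytes(width: int, height: int, kind: str) -> int:
    """Bytes needed to store an uncompressed image, rounded up to whole bytes."""
    bits = width * height * BITS_PER_PIXEL[kind]
    return (bits + 7) // 8

print(image_bytes(28, 28, "gray"))      # a 28 x 28 gray image, as in MNIST
print(image_bytes(640, 480, "color"))
print(image_bytes(28, 28, "binary"))
```

File formats such as jpeg and png compress the pixel data, so files on disk are usually smaller than these raw sizes.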
Key Items
Data set, NN model, CNN model, training of NN/CNN model, deep learning model,
testing DL model, deployment of trained model, inferencing on given hypothesis
Challenges in the rollout of deep learning-enabled services for enterprise
requirements:
1. Inferencing requires a well-trained deep learning model.
2. Deployment of a trained DL model in a camera is not easy.
3. The cost of the camera will be high if the camera performs inference on a given
click.
4. Training of a CNN model requires a huge data set.
5. Training of a large CNN model requires a service such as IBM Watson Visual
Recognition.
Infrastructure The following list provides items that are required before starting
a project:
1. IBM Cloud account (free or paid version)
2. PC (Windows or Ubuntu machine) with Internet connection
3. One or few smartphones (Android)
4. Android Studio (Windows machine or Ubuntu machine)
5. Watson Studio Project
6. Watson Visual Recognition service
7. Cloud object storage service in IBM cloud

Fig. 8.20 AI client in Android phone and use IBM Watson VR service

Figure 8.20 provides a detailed workflow for a client application in Android to
work with the IBM Watson VR service.

Step 1. For Watson Studio, create an IBM Cloud account ([Link] login).
Step 2. Use Watson Studio to create projects which can perform image classifica-
tion by using visual recognition. Refer to the associated link in [97].
Step 3. Use the visual recognition service of Watson (this is out of service). Refer
to the associated link in [97].
Step 4. Create your custom model. Refer to the associated link in [97].
Step 5. Train the custom model and deploy it in IBM Watson. Model training
might take a long time. The model is deployed after successful
training. Refer to the associated link in [97].
Step 6. Test custom model. Refer to the associated link in [97].

Step 7. The client application on the Android phone is designed to collect image
data by using the phone camera and send the image to the IBM Watson
VR service for inference. The deployed model in IBM Cloud performs
inference and sends the result to the client application running on the
Android phone. Refer to the associated link in [97].
Successful completion of the above steps raises many more questions about the
“data set” and the model trained on it. Image-based data sets are very popular
and are emerging as fast-growing “unstructured data.” Data set preparation plays
a major role in application quality during inference.
Problem 8.5.4 Identify the items in the following list that are used in an IBM
Watson Studio project for an image classification application:
(a) IBM Visual Recognition service.
(b) IBM Cloud Object Storage service.
(c) jpg files to train custom model.
(d) jpg file to test custom model.
(e) Java or python or C++ coding expert to work with Watson Studio project.
(f) All of the above are true but item (e).
(g) All of the above are true but item (b).

8.5.9 Build a Custom Model to Test Tomato Quality

The following problems can be formulated as image classification problems and
used to train the IBM Watson Visual Recognition service.
Visual data is emerging from various fields. A detailed study of each vertical
with the associated subject matter expert will result in a good-quality “data set.”
Problem 8.5.5 A custom model for image classification is very useful for a
high-quality inference service. From the following list, identify the items that
apply in the IBM Watson Studio project:
(a) Minimum 15 images per label.
(b) Minimum 200 images per label.
(c) Images to be part of the “Negative Class.”
(d) Image format can be “jpg.”
(e) Every label requires one zip file that contains jpg pictures.
(f) The zip file in item (e) can be uploaded from a local PC.
(g) All of the above are true except item (b).
(h) All of the above are true except item (a).
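Packaging the pictures of one label into one zip file can be sketched as follows. The function name zip_label, the folder layout inside the archive, and the sample file names are hypothetical; the minimum number of images is passed in as a parameter because the required value is the subject of the problem above.

```python
import io
import zipfile

def zip_label(label: str, images: dict, min_images: int) -> bytes:
    """Pack the jpg pictures of one label into a zip archive held in memory.

    `images` maps file names to file contents; non-jpg files are skipped.
    """
    jpgs = {name: data for name, data in images.items()
            if name.lower().endswith((".jpg", ".jpeg"))}
    if len(jpgs) < min_images:
        raise ValueError(f"label {label!r} has only {len(jpgs)} jpg images")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in sorted(jpgs.items()):
            archive.writestr(f"{label}/{name}", data)
    return buffer.getvalue()

# Placeholder bytes stand in for real picture files.
pictures = {"a.jpg": b"x", "b.jpg": b"y", "notes.txt": b"z"}
archive_bytes = zip_label("ripe", pictures, min_images=2)
names = zipfile.ZipFile(io.BytesIO(archive_bytes)).namelist()
print(names)
```

Writing archive_bytes to a file gives one archive per label that can be uploaded from a local PC.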

Problem 8.5.6 Build a custom model to test tomato quality. Reject a tomato if it
has a yellow patch on it.

Problem 8.5.7 Deploy an application in a mobile phone by using a “custom model
for image classification.” The following items are required to solve the mentioned
problem in “custom model-based image classification”:
(a) URL to IBM Watson Visual Recognition service
(b) API key to access IBM Watson Visual Recognition service
(c) File-based picture reading or camera-based picture collection ability in phone
(d) MQTT client in Android phone
(e) Internet connection in Android phone to reach IBM Watson Visual Recognition
service
(f) None of the above required

Problem 8.5.8 Deploy custom model for image classification. The IBM Watson
Visual Recognition service is used to create custom models. And also the IBM
Watson Studio is used to train, test, and deploy custom models in IBM Cloud. User
application in Android phone can perform the following:
(a) Take a picture by using the camera and send it to IBM Watson for inferencing.
(b) Receive inferencing result from IBM Watson and display result locally.
(c) The advanced driver assistance system (ADAS) in a car can use items (a) and
(b) such that ADAS can help the driver.
(d) Non-real-time applications can use (a) and (b) such that the image classification
result is useful in their application.
(e) Batch processing of given images can be handled by using (a) and (b).

8.5.10 Deploying DL in FPGA (Ultra96-V2)

New-generation IoT edge devices for AI-driven applications use FPGA devices to
perform real-time inference. Creating applications on an FPGA normally requires
VHDL or Verilog, which makes it a challenge to run deep learning models that are
trained in TensorFlow or PyTorch. There is therefore a need to use the C and C++
languages to deploy deep learning networks in an FPGA.
A very-early-stage tool set is provided to deploy a deep learning network that
is trained by using TensorFlow. This revised tool set may reduce the effort
required to deploy deep learning networks in an FPGA. Developers can create a
high-quality IoT edge with inference ability. Input to the inference engine can
come from a camera in the embedded device. Figure 8.21 provides details on the
tool set used to port DL networks onto a Xilinx FPGA.
The custom board Ultra96-V2 uses the “Zynq UltraScale+ MPSoC ZU3EG A484.” The
DLtrain version of the deep learning tool set does not use:
1. “AI model pruning and optimization”
2. “AI model quantizer”

Fig. 8.21 FPGA: IoT edge

These two steps are folded into model creation so that the model used in training
is also ready for deployment, without the pruning and quantization of the trained
model mentioned above.
Xilinx provides:
1. Edge compiler (DNNC is used)
2. Edge run time
The above tool set provides easy options to deploy the custom model in FPGA.
OEMs can get the trained model from vendors and deploy it in the embedded device
which has FPGA. The DLtrain AI framework has a provision to use the custom
model. DLtrain is developed by using C and C++ such that it is feasible to work
with embedded devices that are using FPGA.
A neural network is designed and coded to work with POWER9 and also with the
NVIDIA RTX 2070 GPU. Customers can focus fully on training their model instead
of worrying about the 400+ dependency packages for Python 3.6 and TensorFlow 2.0.
Moreover, training of the DL model in DLtrain is distributed across POWER9 and
the GPU (via CUDA 10.1). This shortens training time and makes it easier to
fine-tune hyperparameters.

Hyperparameters play a major role in training time and quality of inference.
Domain experts therefore play a major role in setting these parameters, but they
are often new to hyperparameters and their effect on the quality of inference,
which makes setting them a challenge.
For this reason, DLtrain provides minimal hyperparameter options so that domain
experts learn quickly and take full control of the training of a given DL model
with their training data. Real-time inference is a powerful capability in the hands
of system builders who use deep learning technology in their IoT edges and IoT
nodes.
As a prototype board, Avnet Ultra96-V2 provides a low-power IoT edge device
to perform real-time inference (image classification and object detection).
Detailed steps to bring up the Avnet Ultra96-V2 board are given in [98].
Vivado HLS accelerates IP creation by enabling C, C++, and SystemC
specifications to be directly targeted into Xilinx programmable devices without
the need to manually create RTL. Supporting both the ISE® and Vivado design
environments, Vivado HLS provides system and design architects alike with a faster
path to IP creation. Its features include:
1. Abstraction of algorithmic description, data-type specification (integer, fixed
point, or floating point), and interfaces (FIFO, AXI4, AXI4-Lite, AXI4-Stream)
2. Extensive libraries for arbitrary precision data types, video, DSP, and more
3. Directive-driven architecture-aware synthesis that delivers the best possible QoR
4. Fast time to QoR that rivals hand-coded RTL
5. Accelerated verification using C/C++ test bench simulation, automatic VHDL or
Verilog simulation, and test bench generation
6. Automatic use of Xilinx on-chip memories, DSP elements, and floating-point
library
Changes in the tool set are given in Fig. 8.22, and the workflow items are listed
in the following:
1. [Link]
2. [Link]
3. [Link]
4. [Link]
These items are used with DECENT models. In the revised tool set configuration,
the above-mentioned items are removed, and the DLtrain-based NN model is the input
to DNNC. Details on the emerging method are given in Figs. 8.23 and 8.24.
Deploying a deep learning network model in an FPGA device is illustrated in the
following diagram. The Deep Neural Network Compiler (DNNC) improves productivity
in deploying AI inference on Xilinx platforms and provides a solution for
deep neural network applications. The diagram shows where the DNNC tool fits in
the FPGA integration of deep neural networks and convolutional neural networks.

Fig. 8.22 OLD method to deploy AI in edge

The compression tool, DECENT, employs coarse-grained pruning, trained
quantization, and weight sharing to address these issues while achieving high
performance and high energy efficiency with very small accuracy degradation.
DNNC is the dedicated proprietary compiler for the DPU. DNNC maps the
neural network algorithm to the DPU instructions to achieve maximum utilization of
DPU resources by balancing computing workload and memory access.
The Deep Neural Network Assembler (DNNAS) is responsible for assembling DPU
instructions into ELF binary code.

Fig. 8.23 Emerging method to deploy AI in edge

8.5.11 Port FP32 Inference Code to INT32


 
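$$a_1^{(1)} = \sigma\left(w_{1,0}\,a_0^{(0)} + w_{1,1}\,a_1^{(0)} + \cdots + w_{1,n}\,a_n^{(0)} + b_1^{(0)}\right)$$

$$a_1^{(1)} = \sigma\left(\sum_{i=0}^{n} w_{1,i}\,a_i^{(0)} + b_1^{(0)}\right)$$

$$
\begin{pmatrix} a_1^{(1)} \\ a_2^{(1)} \\ \vdots \\ a_m^{(1)} \end{pmatrix}
= \sigma\left[
\begin{pmatrix}
w_{1,0} & w_{1,1} & \cdots & w_{1,n} \\
w_{2,0} & w_{2,1} & \cdots & w_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m,0} & w_{m,1} & \cdots & w_{m,n}
\end{pmatrix}
\begin{pmatrix} a_0^{(0)} \\ a_1^{(0)} \\ \vdots \\ a_n^{(0)} \end{pmatrix}
+
\begin{pmatrix} b_1^{(0)} \\ b_2^{(0)} \\ \vdots \\ b_m^{(0)} \end{pmatrix}
\right]
\tag{8.1}
$$

$$a^{(1)} = \sigma\left(W^{(0)} a^{(0)} + b^{(0)}\right)$$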
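As a minimal sketch of Eq. 8.1 in its compact form, the following NumPy code evaluates one fully connected layer in FP32. The function names `sigmoid` and `dense_layer` and all numeric values are illustrative assumptions; this is not DLtrain source code.

```python
import numpy as np

def sigmoid(z):
    """Logistic function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(W, a, b):
    """Return sigma(W a + b) for one fully connected layer."""
    return sigmoid(W @ a + b)

# Hypothetical layer with 3 inputs and 2 outputs, stored as FP32.
W0 = np.array([[0.5, -1.0, 0.25],
               [1.5,  0.0, -0.5]], dtype=np.float32)
a0 = np.array([1.0, 2.0, 4.0], dtype=np.float32)
b0 = np.array([0.5, 0.5], dtype=np.float32)

a1 = dense_layer(W0, a0, b0)   # activations of the next layer
```

Each row of `W0` produces one output activation, so the number of rows equals the number of neurons in the next layer.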

Fig. 8.24 Emerging method to deploy AI in edge

FP32 denotes a 32-bit floating point number. DLtrain uses the following
data format:
1. $w_{i,j}$ is a 32-bit floating point number (FP32).
2. $b_j$ is a 32-bit floating point number (FP32).
3. $a_i$ is a 32-bit floating point number (FP32).
All three quantities in Eq. 8.1, $w_{i,j}$, $b_j$, and $a_i$, therefore use FP32.
Performing FP32 multiplication and addition in an FPGA might consume a high
amount of resources.
Problem 8.5.9 Let
1. $w_{i,j}$ be a 32-bit integer (INT32).
2. $b_j$ be a 32-bit integer (INT32).
3. $a_i$ be a 32-bit integer (INT32).
Use the above INT32 values in Eq. 8.1, which connects $w_{i,j}$, $b_j$, and $a_i$.
Provide a method to match the quality of inference by using INT32 computations
instead of FP32 computations.
Problem 8.5.10 Let
1. $w_{i,j}$ be a 32-bit integer (INT32).
2. $b_j$ be a 32-bit integer (INT32).
3. $a_i$ be a 16-bit integer (INT16).
Use INT32 and INT16 to represent the above parameters in Eq. 8.1, which
connects $w_{i,j}$, $b_j$, and $a_i$. Provide a method to match the quality of
inference by using INT32 (for $w_{i,j}$, $b_j$) and INT16 (for $a_i$) computations
instead of FP32 computations in the above equation.
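One possible starting point for the two problems above is fixed-point arithmetic, where each FP32 value is scaled by a power of two and rounded to an integer. The sketch below is an assumption-laden illustration, not the book's solution: the number of fractional bits (`FRAC_BITS`), the helper names, the INT64 accumulator, and the sample values are all hypothetical, and the logistic function is still evaluated in floating point only to compare results.

```python
import numpy as np

FRAC_BITS = 12                 # assumed number of fractional bits
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Quantize FP32 values to INT32 fixed point with FRAC_BITS fractional bits."""
    return np.round(np.asarray(x, dtype=np.float64) * SCALE).astype(np.int32)

def dense_preactivation_fixed(W_q, a_q, b_q):
    """Integer-only z = W a + b, returned in fixed point.

    Each product carries 2 * FRAC_BITS fractional bits, so the sum is
    accumulated in INT64 and shifted right before the bias is added.
    """
    acc = W_q.astype(np.int64) @ a_q.astype(np.int64)
    return (acc >> FRAC_BITS) + b_q.astype(np.int64)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]], dtype=np.float32)
a = np.array([0.2, 0.7, 0.9], dtype=np.float32)
b = np.array([0.1, -0.3], dtype=np.float32)

z_fp32 = W @ a + b
z_fixed = dense_preactivation_fixed(to_fixed(W), to_fixed(a), to_fixed(b)) / SCALE

# Largest difference between FP32 and fixed-point activations.
max_err = np.max(np.abs(sigmoid(z_fp32) - sigmoid(z_fixed)))
```

With activations in [0, 1] and 12 fractional bits, the quantized activations also fit in INT16, which is the setting of Problem 8.5.10. Comparing `max_err` for different values of `FRAC_BITS` is one way to study how precision affects the quality of inference.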
Chapter 9
Tutorial: Deploying Deep Learning
Networks

The journey of deploying deep learning networks is an exploration of the digital
universe, where data and models meet reality.

The tutorial covers the workflow from data set creation through deep learning
network model design, training, and testing to deployment of the model in
Internet of Things (IoT) edges and in cloud native applications. Deploying
trained deep learning networks in IoT edges involves a list of challenges.
In particular, if the application is a real-time service, then a microservice
is introduced into the IoT edge. Figure 9.1 shows the steps in the tutorial.
1. Train and validate a neural network (NN) or convolutional neural network (CNN)
model with a user-defined data set.
2. Deploy the NN or CNN model of the deep learning network in the IoT
edge.
For example, subsystems collect real-time sensor data from their
respective sources and perform inference in the IoT edge to provide a microservice
to other applications.
Loading the trained deep learning network model onto embedded systems is a
challenging task, and many silicon vendors appear to provide custom-made
solutions that fit their own silicon devices.
The tutorial provides the necessary documents in Google Drive and source code
in GitHub. Most importantly, the tutorial connects the above-mentioned assets via
a web page that helps the user navigate the tutorial session for autonomous
learning with optimal use of the resources. The tutorial also serves as a
quick-start guide: a person can refer to a resource document online and make quick
progress in learning how to deploy “deep learning networks in edges.” The URLs of
the necessary resources are associated with a QR code or a reference link.
Data set processing is presented in item 1. It appears that domain knowledge in a
particular data set will help to create an effective data set to train the deep learning
network model.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 135
J. Singaram et al., Deep Learning Networks,
[Link]

Fig. 9.1 Tutorial work flow

Training the deep learning network model is presented in item 2, which has many
sub-items. A few of those items require CPU+GPU hardware so that accelerated
computing can be used to train deep learning networks. The training tasks are
also illustrated by examples that use cloud native servers in Colab or
on-premises Power 9 server clusters.
Deployment of the trained model onto an edge requires more care and effort
because the training platform is different from the deployment platform. In most
cases, the trained model needs to be pruned, or its weights need to be quantized
to INT8, INT16, etc., instead of FP32.
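The two steps named above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: magnitude-based pruning and symmetric per-tensor INT8 quantization are common choices picked here for brevity, and the function names `prune_by_magnitude` and `quantize_int8` are hypothetical rather than part of DLtrain or any vendor tool.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)            # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization; returns (q, scale)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)   # stand-in for trained weights

W_pruned = prune_by_magnitude(W, sparsity=0.5)
W_q, scale = quantize_int8(W_pruned)
W_restored = W_q.astype(np.float32) * scale      # dequantized approximation
```

The edge device would store `W_q` and `scale` instead of the FP32 weights; the difference between `W_restored` and `W_pruned` is the quantization error that can degrade inference quality.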

9.1 Prerequisites

The following prerequisites help in successfully completing the tutorial:
1. A computer running Ubuntu 18.04 or higher, or Windows 10
2. Internet connection to go through the workflow along with team members
3. Skill in coding, mostly in Python; JavaScript is optional
4. Exposure to deep learning network model design by using TensorFlow with
Keras (optional)
5. Working knowledge of Android SDK-based project handling (advantage)
6. Neural network theory (optional)
7. Working knowledge of back-propagation algorithm (optional)
8. Creating and editing image files
9. Working-level skill with Google Drive documents
10. Microservice design and deployment on the web (advantage)

9.2 Deploying Deep Learning Networks

9.2.1 Deploying Deep Learning Networks in Cloud and Edge

The following steps are used to deploy deep learning networks in cloud native
and edge native applications. The tutorial is designed so that
the given steps can be handled within 30 h. Most of the workflow requires careful
reading of the document given for a specific workflow and implementing the
recommended workflow, so that accelerated learning is possible in a short period
of time. Issues that come up while trying out a new workflow are discussed as
they arise.

Tutorial: 30-Hour Version


Train and deploy NN or CNN in Cloud and Edge.
1. Data set handling: 1a, 1b
2. Training deep learning networks: 2a, 2b, 2c, 2d, 2e, 3, 4
3. Inference as a microservice: 6a, 6b, 6c, 6d
4. Inference in on-premises Power 9 server cluster, IoT edge: 5a, 5b, 5c, 5d
5. Creating a user-defined custom data set: 7b
6. Early attention in the theory of the deep learning network model: 7a, 7b,
7c, 7d

9.2.2 Deploying Deep Learning Networks in Edge Native

Edge-side engineering devices magnify open source challenges and opportunities and
provide a new growth segment for embedded systems. The following steps are
recommended for deploying deep learning networks in edge native devices. The
tutorial is designed so that the given steps can be handled within 10 h. Embedded
engineers will find it very useful to learn steps involved in handling the deployment
of deep learning networks in edge devices.

Tutorial: 10-Hour Version


Deploying Deep Learning Networks in Edge.
1. DLtrain to train and deploy deep learning networks in an Ubuntu machine:
5d

2. Loading the model into an IoT edge/node device: 5a


3. Local inference in the IoT edge/node device: 5b
4. Adding custom data to a Modified National Institute of Standards and
Technology (MNIST) data set: 7b

9.2.3 Deploying Deep Learning in Cloud Native

The following steps are recommended for deploying deep learning networks in
cloud native systems. The tutorial is designed to handle the required workflow
within 6 h. Cloud application engineers will find it very useful to learn the steps
involved in handling the deployment of deep learning networks in cloud-based
servers.

Tutorial: 6-Hour Version


Deploying Deep Learning Networks in the Cloud.
1. Data set handling: 1b
2. Training deep learning model: 2b
3. Inference as a microservice: 6d
4. Create, use, and define the custom data set: 7c, 7b

9.3 Deep Learning Networks, Digital Twin, Edge

The tutorial workflow uses documents from Google Drive so that a learner can refer
to a resource document online and make quick progress in learning how to deploy
“DL networks in an edge.” The URL of a given resource is associated with a QR
code. The following is the QR code for URLs that are used in the tutorial.

9.3.1 CNN Model

Item 7a in the defined workflow is handled here.
Google Drive-based slides are provided for understanding the model part of the
deep learning network.

The error of image classification based on a deep learning network model
is lower than that of a human or of machine learning-based image classification
methods. There is a problem included in 6.8.2 on this.
Object counting is an application that is used in multiple engineering segments.
There is a problem included in 6.4.1 on this.
Items need to be located so that they can be used in the IBM Watson
Studio project for the image classification application. There is a problem
included in 8.5.4 on this.
Creating a custom model for image classification is very useful for a
high-quality inference service. There is a problem included in 8.5.5 on this.
The NN or CNN model is used in deep learning networks. Optimal model
design requires considering many items to arrive at the parameter values. There
is a problem included in 6.8.1 on this.

9.3.2 Digital Twin

Item 7c provides information on digital twin and the associated physical process.
The URL [99] offers a brief introduction to the concept of a digital twin within the
context of a deep learning network.

9.4 Data Set Used in Training Deep Learning Networks

9.4.1 Data-Set Storage in a Local Machine

Item 1b handles the workflow to store a data set in a local machine and use a locally
stored data set for training a CNN or NN model. There is no link associated with 1b
because it is trivial to handle image data from a local machine.

9.4.2 Adding Custom Image Data Along with an MNIST Data Set

Item 7b handles the workflow to “add custom image data along with MNIST data
set.”
The MNIST data set is used to train an NN or CNN model by using TensorFlow. The
MNIST data set is well defined and consists of images of handwritten digits
0, 1, 2, ..., 9. There is a problem included in 5.6.1 on this.

User-generated custom image data is incorporated into a provided MNIST
dataset, and then the deep learning training process (DLtrain) is executed using this
revised dataset.
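Merging custom images into an MNIST-style data set can be sketched as below. The sketch assumes the data is held as NumPy arrays in the usual MNIST layout (28x28 `uint8` images and one integer label per image); the function name `add_custom_images` and the stand-in arrays are hypothetical, and the on-disk format expected by DLtrain is not covered here.

```python
import numpy as np

def add_custom_images(images, labels, custom_images, custom_labels, seed=0):
    """Append custom samples to an MNIST-style data set and shuffle the result."""
    custom_images = np.asarray(custom_images, dtype=images.dtype)
    custom_labels = np.asarray(custom_labels, dtype=labels.dtype)
    if custom_images.shape[1:] != images.shape[1:]:
        raise ValueError("custom images must match the data set image shape")
    if len(custom_images) != len(custom_labels):
        raise ValueError("one label is required per custom image")

    merged_images = np.concatenate([images, custom_images], axis=0)
    merged_labels = np.concatenate([labels, custom_labels], axis=0)

    # Shuffle so that custom samples are spread across training batches.
    order = np.random.default_rng(seed).permutation(len(merged_labels))
    return merged_images[order], merged_labels[order]

# Stand-in arrays with the MNIST layout (not the real data set).
mnist_images = np.zeros((100, 28, 28), dtype=np.uint8)
mnist_labels = np.arange(100, dtype=np.uint8) % 10
my_images = np.full((5, 28, 28), 255, dtype=np.uint8)
my_labels = np.array([3, 3, 7, 7, 9], dtype=np.uint8)

images, labels = add_custom_images(mnist_images, mnist_labels, my_images, my_labels)
```

Custom images usually need the same preprocessing as the original data (size, grayscale, pixel range) before they are merged, otherwise the model sees two different input distributions.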

9.5 Training the Deep Learning Networks Model by Using a CPU and a GPU

Specialized hardware for accelerated computing, such as Graphics Processing Units
(GPUs) or Tensor Processing Units (TPUs), is employed to facilitate the training of
deep learning networks. In this case, the MNIST dataset is utilized to train a Neural
Network (NN) or Convolutional Neural Network (CNN) model with the TensorFlow
framework. This hardware acceleration significantly speeds up the training process
and allows for more efficient model development.

9.5.1 Training Deep Learning Networks in Colab

Item 2a handles the workflow for “Training Deep Learning Networks” in Colab.
Colab is used to train the TensorFlow model. The link [82] provides a detailed
workflow on this and the learner can use their Colab account in Google.

9.5.2 Training in Ubuntu 18.04 x86 CPU

Item 2b handles the workflow for “Training Deep Learning Networks” in an x86
Ubuntu machine.
The URL [24] provides more information on the above task.

9.5.3 Training in Power 9 CPU + RTX 2070 GPU

Item 2d handles the workflow for “Training Deep Learning Networks” by using
Power 9 servers along with RTX 2070 GPU.
Learners who have access to the above system can use the following link to
perform the given task on a Power 9 CPU.
The URL [77] has a CPU version.

9.5.4 Training Deep Learning Networks in a Jetson Nano GPU

Item 2e handles the defined workflow on “Training Deep Learning Networks in a
Jetson Nano GPU.”
The URL [84] has details with examples.

9.5.5 Watson VR Service: Deprecated

Item 2f handles the defined workflow.
The Watson VR service is deprecated, and thus it is not possible to use it to train
a custom model.

9.6 Saving Deep Learning Networks

Item 3 handles the workflow that is used in “Saving Deep Learning Networks” by
using the TensorFlow tool set.
The saved model is stored in local storage or in cloud storage. The URL
[100] provides a workflow for understanding the tasks involved in storing deep
learning networks.

9.7 Loading Deep Learning Networks

Item 4 handles the defined workflow for “Loading Deep Learning Networks.”
Loading a model from local storage or from cloud storage is handled at the URL
[101].

9.8 Deploying Deep Learning Networks in an IoT Device

Deployment of a trained CNN model in an x86 machine has a well-defined
workflow for successful deployment. There is a problem included in 8.2.5 on this.
Deployment of DLtrain to train an NN or CNN in a developer machine requires
a well-defined workflow for successful completion. There is a problem included
in 4.2.1 on this.
Deployment of deep learning networks in an Android device requires a
well-defined workflow. There is a problem included in 4.2.2 on this.
Deployment of deep learning networks in Rich Edge requires a well-defined
workflow. There is a problem included in 4.2.3 on this.

The URL of a given resource is associated with a QR code. The following is a
QR code for URLs that are used in the tutorial. For example, item 5a is linked to the
QR code URL18, where item 5a handles loading a trained deep learning network in
an IoT edge or an IoT node.
Deploying a TensorFlow model or a PyTorch-based model in an IoT edge
requires optimizing and quantizing the trained CNN model. There is a problem
included in 8.3.1 on this.
Item 5a handles the workflow that is required to load a deep learning network
model into an IoT edge. The URL [101] can be used to access detailed workflow
documentation with examples.
Item 5b handles the workflow that is required to load a deep learning network
model into an Android device. The URL [95] can be used to access detailed
workflow documentation with examples.
Item 5c handles the workflow using DLtrain in a Windows machine to train a
deep learning network model and also for inference. The URL [28] can be used to
access detailed workflow documentation with examples.
Item 5d handles the workflow using DLtrain in an Ubuntu machine to train a
deep learning network model and also for inference. The URL [74] can be used to
access detailed workflow documentation with examples.

9.9 Inference as a Microservice

9.9.1 Microservice Using the Flask Micro Framework

Item 6a handles the workflow that is required to deploy a deep learning network
model by using the Flask microservice. The URL [102] can be used to access
detailed workflow documentation with examples.
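The idea of inference as a microservice can be sketched without any third-party package. Note the substitution: the workflow in this section uses Flask, whereas the sketch below uses only Python's standard `http.server` module so that it stays dependency-free. The weights, the JSON field names (`inputs`, `outputs`), and the helper names are hypothetical.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical trained parameters of one dense layer (3 inputs, 2 outputs).
WEIGHTS = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]
BIASES = [0.5, 0.5]

def predict(inputs):
    """Return sigma(W a + b) for one input vector."""
    outputs = []
    for row, bias in zip(WEIGHTS, BIASES):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs

def handle_request(body):
    """Map a JSON request body to an (HTTP status, JSON response) pair."""
    try:
        inputs = json.loads(body)["inputs"]
        if len(inputs) != len(WEIGHTS[0]):
            raise ValueError("wrong input length")
        return 200, json.dumps({"outputs": predict([float(x) for x in inputs])})
    except (KeyError, TypeError, ValueError) as err:
        return 400, json.dumps({"error": str(err)})

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload.encode("utf-8"))

def make_server(port=8080):
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", port), InferenceHandler)
```

Calling `make_server().serve_forever()` starts the service, and a client can then POST a body such as `{"inputs": [1.0, 2.0, 4.0]}`. Keeping `handle_request` separate from the HTTP handler makes the inference logic testable without a network connection, and the same function could be wrapped by a Flask route.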

9.9.2 JavaScript to Run TensorFlow Models in a Browser

Item 6b handles the workflow that is required to deploy a deep learning network
model using JavaScript. The URL [103] can be used to access detailed workflow
documentation with examples.

9.9.3 Docker Image for a TensorFlow Serving Model

Item 6d handles the workflow that is required to deploy a deep learning network
model by using TensorFlow Serving. The URL [104] can be used to access detailed
workflow documentation with examples.
Glossary

DLtrain Deep learning model training platform that also performs inference
by using deep learning networks in a given IoT edge.
Kanshi Name of network security audit software that uses deep learning
networks.
MQTT Message Queuing Telemetry Transport, a messaging protocol for the
Internet of Things (IoT).
XMPP Extensible Messaging and Presence Protocol.

Appendix A
Training Restricted Boltzmann Machine

A.1 Gradient Descent Is Used to Minimize Cost Function

For a neural network, the technique of gradient descent is used to minimize the cost
function C. Here is an overview of how it works.

Cost Function

Firstly, we define a cost function:

$$C : [0,1]^k \times [0,1]^n \longrightarrow [0,1]$$

This function takes two vectors: the input to the neural network and
the predetermined correct output we want from the neural network. It then runs the
input through the entire network and checks how much the final layer of n
neurons varies from the provided correct output. In short, minimizing this function
is the goal of our optimization problem.

Neural Network

Now here is a construction of a neural network. It is important to define the parts
properly so that calculating the derivatives later on becomes trivial.
The neural network as a whole is a function

$$N : [0,1]^k \to [0,1]^n.$$

Now $N$ has multiple layers $N_1, N_2, \ldots, N_r$, where each layer is a
function; for example, Fig. A.1 has two layers.


Fig. A.1 Neural network model

Layer $i$ is a function

$$N_i : [0,1]^{k_{i-1}} \to [0,1]^{k_i},$$

where $k_r = n$ and $k_0 = k$.
Then the entire network is just a composition of these layers:

$$N = N_r \circ N_{r-1} \circ \cdots \circ N_1.$$

Next, we need a logistic function $\sigma : \mathbb{R} \to [0,1]$.
Then we can concretely define the functions $N_i$.
Let $\sigma_i : \mathbb{R}^i \to [0,1]^i$ be defined as

$$\sigma_i(x_1, x_2, \ldots, x_i) = (\sigma(x_1), \sigma(x_2), \ldots, \sigma(x_i)).$$

Using this we define

$$N_i(\vec{v}) = \sigma_{k_i}(W_i \vec{v} + \vec{b}_i),$$

where $W_i$ is the weight matrix of the edges connecting the neurons from layer
$i - 1$ to layer $i$ and the vector $\vec{b}_i$ holds the corresponding biases.
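The construction above, a network as a composition of layer functions together with a cost on its output, can be sketched in NumPy as follows. The distance used in `cost` (mean squared error) is an assumption chosen because it stays in [0, 1] for outputs in [0, 1]; the text leaves the distance unspecified. The helper names and the random weights are illustrative.

```python
import numpy as np
from functools import reduce

def sigma(z):
    """Logistic function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def make_layer(W, b):
    """Return the layer function N_i(v) = sigma(W_i v + b_i)."""
    return lambda v: sigma(W @ v + b)

def compose(layers):
    """Return N = N_r o ... o N_1; layers are applied in list order."""
    return lambda x: reduce(lambda v, layer: layer(v), layers, x)

def cost(network, x, o):
    """Assumed distance d: mean squared error between N(x) and o."""
    return float(np.mean((network(x) - o) ** 2))

rng = np.random.default_rng(0)
sizes = [4, 3, 2]                      # k_0 = k = 4, k_1 = 3, k_2 = n = 2
layers = [make_layer(rng.normal(size=(k_out, k_in)), rng.normal(size=k_out))
          for k_in, k_out in zip(sizes[:-1], sizes[1:])]
N = compose(layers)

x = rng.uniform(size=4)                # input in [0, 1]^k
o = np.array([1.0, 0.0])               # desired output in [0, 1]^n
y = N(x)                               # network output in [0, 1]^n
c = cost(N, x, o)                      # scalar cost in [0, 1]
```

Writing each layer as its own function mirrors the composition of layer functions and keeps the later derivative computation layer by layer.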

The Derivative

Let’s take a brief diversion to define the concept of derivatives for functions in
multidimensional spaces.
If $f : \mathbb{R}^m \to \mathbb{R}^n$ is a function, then the derivative is a function

$$Df : \mathbb{R}^m \to L(\mathbb{R}^m, \mathbb{R}^n),$$

where $L(\mathbb{R}^m, \mathbb{R}^n)$ stands for the space of all linear maps from
$\mathbb{R}^m$ to $\mathbb{R}^n$, in other words the space of all $n \times m$
matrices over real numbers.

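The function $Df$ assigns to every point $\vec{x} \in \mathbb{R}^m$ a linear map
$Df(\vec{x})$, which is the best linear approximation of $f$ at $\vec{x}$.
In other words, it assigns to every point a matrix that is the best
linear approximation of $f$ at that point. Concretely, if

$$f(\vec{x}) = (f_1(\vec{x}), f_2(\vec{x}), \ldots, f_n(\vec{x})) \quad \text{and} \quad \vec{x} = (x_1, x_2, \ldots, x_m),$$

the partial derivatives are defined as

$$\frac{\partial f_i(\vec{x})}{\partial \vec{x}_j} = \lim_{t \to 0} \frac{f_i(\vec{x} + t\,\vec{x}_j) - f_i(\vec{x})}{t},$$

where $\vec{x}_j = (0, 0, \ldots, 1, 0, \ldots, 0)$ with the 1 in the $j$th position.
Then the derivative is

$$
Df(\vec{x}) =
\begin{bmatrix}
\frac{\partial f_1(\vec{x})}{\partial \vec{x}_1} & \frac{\partial f_1(\vec{x})}{\partial \vec{x}_2} & \cdots & \frac{\partial f_1(\vec{x})}{\partial \vec{x}_m} \\
\frac{\partial f_2(\vec{x})}{\partial \vec{x}_1} & \frac{\partial f_2(\vec{x})}{\partial \vec{x}_2} & \cdots & \frac{\partial f_2(\vec{x})}{\partial \vec{x}_m} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_n(\vec{x})}{\partial \vec{x}_1} & \frac{\partial f_n(\vec{x})}{\partial \vec{x}_2} & \cdots & \frac{\partial f_n(\vec{x})}{\partial \vec{x}_m}
\end{bmatrix}
$$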
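The derivative matrix can be approximated numerically by replacing the limit with a small finite step, one unit vector at a time. The sketch below uses central differences instead of the one-sided quotient in the definition, a substitution made here for better accuracy; the function name `numerical_jacobian`, the step size, and the example function are assumptions.

```python
import numpy as np

def numerical_jacobian(f, x, t=1e-6):
    """Approximate Df(x) as an n x m matrix using central differences."""
    x = np.asarray(x, dtype=np.float64)
    n = np.asarray(f(x)).size
    jac = np.zeros((n, x.size))
    for j in range(x.size):
        e_j = np.zeros_like(x)
        e_j[j] = 1.0                     # unit vector with 1 in position j
        jac[:, j] = (f(x + t * e_j) - f(x - t * e_j)) / (2.0 * t)
    return jac

# Example f: R^2 -> R^3 whose derivative is easy to write by hand.
def f(x):
    return np.array([x[0] * x[1], x[0] ** 2, np.sin(x[1])])

x0 = np.array([1.0, 2.0])
J = numerical_jacobian(f, x0)
J_exact = np.array([[x0[1], x0[0]],
                    [2.0 * x0[0], 0.0],
                    [0.0, np.cos(x0[1])]])
```

Row $i$ and column $j$ of `J` holds the partial derivative of $f_i$ in the direction of the $j$th unit vector, so a function from $\mathbb{R}^2$ to $\mathbb{R}^3$ gives a 3 x 2 matrix. Such a numerical matrix is mainly useful for checking hand-derived gradients.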

This derivative also follows the beloved chain rule, which we will now exploit.
We define the Hadamard product of two matrices of the same dimension as

$$A \otimes B = [a_{ij} \cdot b_{ij}].$$

Suppose we have a function

$$f : [0,1] \to [0,1]$$

and we define

$$f_i : [0,1]^i \to [0,1]^i$$

component-wise, in the same way as before; then

$$D(f_i \circ g)(\vec{x})(\vec{h}) = (f')_i(g(\vec{x})) \otimes Dg(\vec{x})(\vec{h}).$$

This says that the best linear approximation of $f_i \circ g$ at $\vec{x}$ is the
map that takes $\vec{h}$ to the vector

$$(f')_i(g(\vec{x})) \otimes Dg(\vec{x})(\vec{h}).$$
Now we want to compute the derivatives of the cost function with respect to the
weights and biases. The cost function is

$$C(\vec{x}, \vec{o}) = d(N(\vec{x}), \vec{o}),$$

where $d$ is some function with range $[0,1]$ that tells how far apart two vectors are.
To calculate the derivative with respect to the weights in the $i$th layer, we
write the cost as

$$C(\vec{x}, \vec{o}) = d(N_r \circ N_{r-1} \circ \cdots \circ N_i(W_i \vec{a} + b_i), \vec{o}).$$

Then we apply the $D$ operator, but we differentiate with respect to $W_i$; that is, we
assume everything else is a constant, so we get the following by the repeated chain
rule:

$$d'(N(\vec{x}), \vec{o})\, DN_r(N_{r-1} \circ \cdots \circ N_i(W_i \vec{a} + b_i))\, DN_{r-1}(N_{r-2} \circ \cdots \circ N_i(W_i \vec{a} + b_i)) \cdots DN_i(W_i \vec{a} + b_i)\, \vec{a}$$

We can compute this for a value of $i$ to see what it looks like; let $i = r$; then this
map is as follows:

$$W \mapsto d'(N(\vec{x}), \vec{o}) \left( (\sigma')_{k_r}(W_r \vec{a} + b_r) \otimes W \vec{a} \right)$$

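Computationally, this can be hard to program, so an easier way to compute the
derivative with respect to the weights is as follows. Let

$$M_i = W_i \vec{a} + b_i,$$

where $\vec{a}$ is the vector of activation values of the $i - 1$ layer of nodes.
Then the derivative of $C$ with respect to $M_r$ is

$$\frac{\partial C}{\partial M_{r+1}}\, W_{r+1}\, (\sigma')_{k_r}(M_r).$$

This is then used recursively in a top-down approach to compute all
$\frac{\partial C}{\partial M_i}$; then we can compute

$$\frac{\partial C}{\partial W_i} = \frac{\partial C}{\partial M_i} \frac{\partial M_i}{\partial W_i} = \frac{\partial C}{\partial M_i}\, \vec{a}$$

$$\frac{\partial C}{\partial b_i} = \frac{\partial C}{\partial M_i} \frac{\partial M_i}{\partial b_i} = \frac{\partial C}{\partial M_i}$$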
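The recursive scheme above can be sketched in NumPy and checked against a finite-difference estimate. The sketch assumes the distance $d$ is half the squared Euclidean distance, which the text does not specify; the function names, layer sizes, and random parameters are illustrative.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, biases, x):
    """Return the activations a_0..a_r and the pre-activations M_1..M_r."""
    activations, pre = [x], []
    for W, b in zip(weights, biases):
        pre.append(W @ activations[-1] + b)
        activations.append(sigma(pre[-1]))
    return activations, pre

def cost(weights, biases, x, o):
    """Assumed d: half the squared Euclidean distance."""
    return 0.5 * np.sum((forward(weights, biases, x)[0][-1] - o) ** 2)

def backprop(weights, biases, x, o):
    """Return dC/dW_i and dC/db_i for every layer."""
    activations, pre = forward(weights, biases, x)
    grads_W, grads_b = [], []
    # dC/dM for the last layer: derivative of d times sigma'(M), element-wise.
    s = sigma(pre[-1])
    delta = (activations[-1] - o) * s * (1.0 - s)
    for i in reversed(range(len(weights))):
        grads_W.insert(0, np.outer(delta, activations[i]))   # dC/dM times a
        grads_b.insert(0, delta)                             # dC/db = dC/dM
        if i > 0:
            s = sigma(pre[i - 1])
            delta = (weights[i].T @ delta) * s * (1.0 - s)   # recursion step
    return grads_W, grads_b

rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.normal(size=(k_out, k_in))
           for k_in, k_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=k_out) for k_out in sizes[1:]]
x, o = rng.uniform(size=3), np.array([1.0, 0.0])

grads_W, grads_b = backprop(weights, biases, x, o)

# Finite-difference check of one weight in the first layer.
eps = 1e-6
bumped = [W.copy() for W in weights]
bumped[0][1, 2] += eps
numeric = (cost(bumped, biases, x, o) - cost(weights, biases, x, o)) / eps
```

The value `numeric` should agree closely with `grads_W[0][1, 2]`. The loop runs from the last layer to the first, which is the top-down recursion, and each layer reuses the `delta` computed for the layer above it.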

A.2 Score and Loss Functions

(Fig. A.2)

A.3 Data Flow in Computation of W

(Fig. A.3)

A.4 Use of GPU to Compute W

(Fig. A.4)

Fig. A.2 Score and loss functions in training CNN



Fig. A.3 Training NN or CNN



[Figure A.4 labels: Run inference load in edge; IoT edge, Android device, Jetson
Nano; C++, C, and Java; ppc64le, DSP, x86, ARM; nvcc and NVIDIA driver are used
for a given GPU; GPU CUDA core, GPU Tensor core; threads are assigned to SM in
block granularity; scheduling SPs or SIMTs till all warps run; use 64 or more SPs
(or SIMT) to run one warp; run 32 threads (in 32 SPs); write in shared memory in
a block.]

Fig. A.4 Inference in IoT edge


References

1. C. Wang, S.S. Iyengar, K. Sun, AI Embedded Assurance for Cyber System, 1st edn. (Springer
Nature, Berlin, 2023)
2. I.S. Sitharama, A. Sabharwal, F.G. Pin, C.R. Weisbin, Asynchronous production system
for control of an autonomous mobile robot in real-time environment. Applied Artificial
Intelligence an International Journal 6(4), 485–509 (1992)
3. P. Santosh, R. Buyya, K.R. Venugopal, S.S. Iyengar, L.M. Patnaik, Searching for the iot
resources: fundamentals, requirements, comprehensive review and future directions. IEEE
Commun. Surv. Tutorials 20(3), 2101–2132 (2018)
4. S. Pouyanfar, S. Sadiq, Y. Yan, H. Tian, Y. Tao, M.P. Reyes, M.-L. Shyu, S.-C. Chen, S.S.
Iyengar, A survey on deep learning: algorithms, techniques, and applications. ACM Comput.
Surv. (CSUR) 51(5), 92 (2018)
5. H. Tian, S. Pouyanfar, J. Chen, S.-C. Chen, S.S. Iyengar, Automatic convolutional neural
network selection for image classification using genetic algorithms, in 2018 IEEE
International Conference on Information Reuse and Integration (IRI) (IEEE, New York,
2018), pp. 444–451
6. S.K. Ramani, S.S. Iyengar, Evolution of sensors leading to smart objects and security issues
in iot, in International Symposium on Sensor Networks, Systems and Security (Springer,
Cham, 2017), pp. 125–136
7. I. Vasanth, S.S. Iyengar, N. Paramesh, G.R. Murthy, M.B. Srinivas, Machine learning and data
mining algorithms for predicting accidental small forest fires, in The Fifth International
Conference on Sensor Technologies and Applications (2011), pp. 116–121
8. A.U. Rajendra, P.S. Bhat, S.S. Iyengar, A. Rao, S. Dua, Classification of heart rate data using
artificial neural network and fuzzy equivalence relation. Pattern Recogn. 36(1), 61–68 (2003)
9. M.M. Htay, S.S. Iyengar, S.Q. Zheng, t-error correcting/d-error detecting (d > t) and
all unidirectional error detecting codes with neural network. ii, in Proceedings of the
International Conference on Information Technology: Coding and Computing, 2002 (IEEE,
New York, 2002), pp. 383–389
10. Y. Xia, S.S. Iyengar, N.E. Brener, An event driven integration reasoning scheme for handling
dynamic threats in an unstructured environment. Artif. Intell. 95(1), 169–186 (1997)
11. N. Krishnakumar, S.S. Iyengar, R. Holyer, M. Lybanon, An expert system for interpreting
mesoscale features in oceanographic satellite images. Int. J. Pattern Recognit. Artif. Intell.
4(03), 341–355 (1990)
12. S.I. Newsletter, What is low-code/no-code application development?. [Link]
insights/[Link]


13. S. E. I. in Carnegie Mellon University, Artificial intelligence engineering. [Link]
[Link]/our-work/artificial-intelligence-engineering/
14. J. S, Virtual environment for python applications (2018). [Link]
dltrainBook/tree/jk/Tool-Set/TF
15. J. S, Advanced vector extensions for computing: used in tensorflow (2018). [Link]
com/DLinIoTedge/dltrainBook/tree/jk/Tool-Set/AVX
16. J. S, Tensorflow in virtual environment (2018). [Link]
tree/jk/Tool-Set/TF
17. [Link]. Keras tutorial: The ultimate beginner’s guide to deep learning in
python (2022). [Link]
18. J. S, Remote notebook and its associated workflow (2018). [Link]
dltrainBook/tree/jk/Tool-Set/Near-Edge
19. J. S, Latex document handling (2018). [Link]
20. J. S, Setting up AI computer (Jetson Nano) (2018). [Link]
dltrainBook/tree/jk/Tool-Set/jetson/setup
21. J. S, IBM Watson machine learning: Community edition (2019). [Link]
DLinIoTedge/dltrainBook/blob/jk/Tool-Set/Power9cpu/[Link]
22. N. Kumar, Dltrain source code in c, c++ (2018). [Link]
dltrainBook/tree/jk/DLtrain/Ubuntu/C-ConvNov22
23. J. S, Workflow to build dltrain to perform inference in x86 with ubuntu os to train
deep learning networks (2018). [Link]
Ubuntu/C-ConvNov22/cpuInfer
24. J. S, Workflow to build dltrain for x86 with ubuntu os to train deep learning networks
(2018). [Link]
cpuTrain
25. J. S, Workflow to handle docker tool set (2018). [Link]
tree/jk/DLtrain/Docker
26. J. S, Workflow to build dltrain to perform deep learning networks model training and inference
in power 9 CPU with ubuntu OS (2018). [Link]
DLtrain/Power9
27. J. S, Workflow to build dltrain to perform deep learning networks model training and inference
in Jetson Nano with Ubuntu OS (2018). [Link]
DLtrain/Jetson-Nano
28. J. S, Workflow to build dltrain to perform deep learning networks model training and
inference in Windows 10 (2018). [Link]
Windows
29. J. S, Using docker image of dltrain (2018). [Link]
jk/DLtrain/Docker
30. J. S, Using dltrain in near edge for deep learning networks training and inferencing (2018).
[Link]
31. J. S, Dltrain in Jetson Nano and also tensorflow lite in Jetson Nano (2018). [Link]
DLinIoTedge/dltrainBook/tree/jk/Tool-Set/jetson/TFlite
32. O. Foundation, Openpower foundation for hardware (2018). [Link]
org/the-next-step-in-the-openpower-foundation-journey/
33. R.C. Systems, Raptor computing systems: Talos™ II (2017). [Link]
content/TLSDS3/[Link].
34. J. S, Power 9 CPU information (2018). [Link]
Tool-Set/Power9cpu
35. P. Kennedy, Explaining the baseboard management controller or BMC in servers
(2018). [Link]
or-bmc-in-servers/
36. J. S, Examples to handle GPU device (2018). [Link]
tree/jk/Tool-Set/gpu

37. J. S, Handling deployment of deep learning networks in edge devices (2018). [Link]
[Link]/dltrain/deploy-dl-networks/edge-native-service/j7-app
38. J. S, Setting up edge native Jetson Nano AI computer (2018). [Link]
tool-set/setup-jetson-nano
39. G.S. Thejas, Y. Hariprasad, S.S. Iyengar, N.R. Sunitha, P. Badrinath, S. Chennupati, An
extension of synthetic minority oversampling technique based on kalman filter for imbalanced
datasets. Mach. Learn. Appl. 8, 100267 (2022)
40. A.S. Nasreen, Dr, S. Iyengar, Deep learning based object recognition in video sequences.
International Journal Of Computing and Digital System, 11(1), (2022)
41. J. S, Pre-processing and 2d filter (2019). [Link]
jk/Data-Set/Pre-processing/2Dfilter
42. J. S, Data set pre-processing (2019). [Link]
Data-Set/fashionMNIST
43. J. S, Data set pre-processing (2019). [Link]
Data-Set/MNIST
44. J. S, Pre-processing and normalization (2019). [Link]
tree/jk/Data-Set/Pre-processing
45. J. S, Pre-processing and normalization (2019). [Link]
tree/jk/Data-Set/CSV
46. M. Binkowski, J. Donahue, S. Dieleman, A. Clark, E. Elsen, N. Casagrande, L.C. Cobo,
K. Simonyan, High fidelity speech synthesis with adversarial networks (2019). [Link]
org/pdf/[Link]
47. S. Karagiannakos, Speech synthesis: a review of the best text to speech architectures with
deep learning (2021). [Link]
48. A. Brown, Text to speech—lifelike speech synthesis demo (part 1) (2021).
[Link]
f991ffe9e41e
49. J. S, tcpdump for flow capture (2018). [Link]
Edge/Kanshi/FlowCapture/[Link]
50. J. S, scapy is used to convert PCAP file to CSV file (2018). [Link]
dltrainBook/blob/jk/Edge/Kanshi/FlowCapture/[Link]
51. G.E. Hinton, How a boltzmann machine models data, deep learning (2017). [Link]
[Link]/watch?v=kytxEr0KK7Q
52. W. Wolf, A thorough introduction to boltzmann machines (2018). [Link]
10/20/thorough-introduction-to-boltzmann-machines/
53. R. Salakhutdinov, G. Hinton, Deep boltzmann machines, in Proceedings of the 12th Interna-
tional Conference on Artificial Intelligence and Statistics (AISTATS) 2009, Clearwater Beach,
Florida, USA, (Department of Computer Science University of Toronto, Toronto, 2009)
54. R. Salakhutdinov, A. Mnih, G.E. Hinton, Restricted boltzmann machines for collaborative
filtering, in Appearing in Proceedings of the 24th International Conference on Machine
Learning, Corvallis (University of Toronto, Canada, 2007)
55. R.R. Brooks, S.S. Iyengar, Robust distributed computing and sensing algorithm (1996)
56. S.S. Iyengar, R.R. Brooks, J. Chen, Automatic correlation and calibration of noisy sensor
readings using elite genetic algorithms. Artif. Intell. 84(1–2), 339–354 (1996)
57. M. I. to Deep Learning 6.S191 and L. 2, Mit 6.s191: Recurrent neural networks, transformers,
and attention (2023). [Link]
58. F. Soler-Toscano, H. Zenil, J.-P. Delahaye, N. Gauvrit, Calculating kolmogorov complexity
from the output frequency distributions of small turing machines (2017). [Link]
org/plosone/article?id=10.1371/[Link].0096223
59. J.S, P.S.S. Iyengar, P.N.K. Chaudhary, Sensor fusion and pontryagin duality, in International
Conference on Information Security, Privacy and Digital Forensics (ICISPD 2022), Goa
(National Forensic Sciences University (NFSU), Goa Campus, 2022)
60. S. Gogioso, W. Zeng, Fourier transforms from strongly complementary observable (2018)
61. D. Su, The fourier transforms for locally compact abelian groups (2016)

62. G.E. Hinton, Boltzmann machines (2007). [Link]
readings/[Link]
63. S. Lonkar, Training an MLP from scratch using backpropagation for solving mathematical
equations. [Link]/vHR67
64. P. Sniatala, S.S. Iyengar, S. Ramani, Evolution of Smart Sensing and the need for Tamper
Evident Security, 1st edn. (Springer, Berlin, 2021). ISBN: 978-3-030-77764-7
65. B. Ao, Y. Wang, L. Yu, R.R. Brooks, S.S. Iyengar, On precision bound of distributed fault-
tolerant sensor fusion algorithms. ACM Comput. Surv. 49(1), Article 5 (2016)
66. J.R. Benton, S.S. Iyengar, W. Deng, N. Brener, V.S. Subrahmanian, Tactical route planning:
new algorithms for decomposing the map. Int. J. Artif. Intell. Tools 5(01n02), 199–218 (1996)
67. M. Mastriani, S.S. Iyengar, K.L. Kumar, Bidirectional teleportation for underwater quantum
communications. Quantum Inf. Process 20(1), 1–23 (2021)
68. G.S. Thejas, S. Dheeshjith, S.S. Iyengar, N.R. Sunitha, P. Badrinath, A hybrid and effective
learning approach for click fraud detection. Mach. Learn. Appl. 3, 100016 (2021)
69. M.S. Roopa, S. Pattar, R. Buyya, K.R. Venugopal, S.S. Iyengar, L.M. Patnaik, Social internet
of things (SIoT): foundations, thrust areas, systematic review and future directions. Comput.
Commun. 139, 32–57 (2019)
70. S.S. Iyengar, S.K. Ramani, B. Ao, Fusion of the Brooks–Iyengar algorithm and blockchain in
decentralization of the data-source. J. Sens. Actuator Netw. 8(1), 17 (2019)
71. G.S. Thejas, K.G. Boroojeni, K. Chandna, I. Bhatia, S.S. Iyengar, N.R. Sunitha, Deep
learning-based model to fight against ad click fraud, in Proceedings of the 2019 ACM
Southeast Conference (ACM, New York, 2019), pp. 176–181
72. Y. Hariprasad, K.J. Latesh Kumar, L. Suraj, S.S. Iyengar, Boundary-based fake face anomaly
detection in videos using recurrent neural networks, in Proceedings of SAI Intelligent
Systems Conference (Springer, Berlin, 2023), pp. 155–169
73. J. S, Algorithm used in DLtrain to train CNN or NN (2019). [Link]
dltrainBook/tree/jk/DLtrain/Algorithm
74. J. S, DLtrain used in training CNN or NN models (2019). [Link]
dltrainBook/tree/jk/DLtrain/Ubuntu/C-ConvNov22
75. J. S, DLtrain used in training NN models (2018). [Link]
dltrainBook/tree/jk/DLtrain/Ubuntu/DLtrainY19
76. J. S, DLtrain used in training CNN or NN models (2019). [Link]
dltrainBook/tree/jk/Model/Save
77. J. S, DLtrain for Power9 servers (2018). [Link]
DLtrain/Power9
78. J. S, DLtrain: Docker image for x86 with Ubuntu (2019). [Link]
dltrainBook/tree/jk/DLtrain/Docker
79. J. S, DLtrain: Train DL models in Windows 10 (2019). [Link]
dltrainBook/tree/jk/DLtrain/Windows
80. J. S, Setup tool chain for TensorFlow (2019). [Link]
tree/jk/Tool-Set/TF
81. J. S, MNIST data set used to train NN or CNN model (2019). [Link]
dltrainBook/tree/jk/Data-Set/mnistLocal
82. J. S, Colab to train TensorFlow model and also to train PyTorch model (2018). [Link]
com/DLinIoTedge/dltrainBook/tree/jk/Tool-Set/UseColab
83. J. S, Colab to train TensorFlow model (2018). [Link]
84. J. S, DLtrain used in Jetson Nano to train CNN or NN models (2019). [Link]
DLinIoTedge/dltrainBook/tree/jk/DLtrain/Jetson-Nano
85. B. Shi, S.S. Iyengar, Mathematical Theories of Machine Learning—Theory and Applications,
1st edn. (Springer Nature, Berlin, 2019)
86. M.H. Amini, K.G. Boroojeni, S. Iyengar, P.M. Pardalos, F. Blaabjerg, A.M. Madni,
Sustainable Interdependent Networks II: From Smart Power Grids to Intelligent Transportation
Networks, 1st edn. (Springer, Berlin, 2019). ISBN-13: 978-3-319-98922-8
87. A.K. Belman, T. Paul, L. Wang, S.S. Iyengar, P. Sniatała, Authentication by mapping
keystrokes to music: the melody of typing, in AISP’20-International Conference on Artificial
Intelligence and Signal Processing (2020)
88. S.S. Iyengar, S. Gulati, J. Barhen, Smelting networks for real time cooperative
planning in the presence of uncertainties, in Applications of Artificial Intelligence VI, vol.
937 (International Society for Optics and Photonics, New York, 1988), pp. 586–594
89. J. S, DLtrain model based inference service in Jetson Nano (2019). [Link]
home/jkevents/baranovichi/inference/jetsonnano-dltrain
90. J. S, Kanshi for TCP/IP network safety (2021). [Link]
edge/kanshi
91. J. S, Scapy for flow capture (2018). [Link]
Edge/Kanshi/FlowCapture/[Link]
92. J. S, PCAF file is used to train NN model (2018). [Link]
dltrainBook/blob/jk/Edge/Kanshi/DeepLearning/[Link]
93. J. S, Android NDK 3.1.2 installation issue (2018). [Link]
34353220/how-do-i-select-android-sdk-in-android-studio
94. J. S, Installation of Android Studio in Ubuntu (2018). [Link]
android-studio-on-ubuntu-18-04/
95. J. S, Deploy trained CNN in Android phone for real time inference on hand written numbers
(2018). [Link]
96. J. S, Share trained model from host PC to j7app application in Android device (2018). https://
[Link]/DLinIoTedge/dltrainBook/tree/jk/Edge/Send2Phone
97. J. S, IBM Watson visual recognition service (2018). [Link]
vendors/ibm-watson-vr
98. J. S, Deploy in Xilinx Zynq UltraScale+ MPSoC ZU3EG A484 (2018). [Link]
DLinIoTedge/dltrainBook/tree/jk/Edge/FPGA
99. J. S, Digital twin models and its association with deep learning networks model (2018).
[Link]
100. J. S, Save deep learning networks (2018). [Link]
save-dl-networks
101. J. S, Load deep learning networks (2018). [Link]
networks/load-dl-networks
102. J. S, Micro service using the Flask micro framework (2018). [Link]
deploy-dl-networks/cloud-native-service/flask-micro-service
103. J. S, JavaScript to run TensorFlow models in browser (2018). [Link]
deploy-dl-networks/cloud-native-service/inference-via-javascript
104. J. S, Deploy deep learning network model by using serving of TensorFlow (2018). [Link]
[Link]/dltrain/deploy-dl-networks/cloud-native-service/deploy-in-cloud
Index

A
AI framework, 17, 83, 130
AI hardware, vii, 44–45
AI model pruning and optimization, 129
AI model quantizer, 129
Android phone, 11, 35, 44, 48, 86, 102, 116–134
Artificial intelligence (AI), vii, viii, 1–6, 8, 11–20, 23, 25–29, 31, 35–38, 45–48, 50, 78, 81, 85, 89, 99, 105, 106, 108, 124, 127, 129, 131–134
Audio, speech image, 49
Autonomous vehicles, 10, 13, 100

B
Bernoulli experiment, 54, 57
Boltzmann distribution, 7, 15, 68–73, 77
Boltzmann machine, 63, 64, 72–78
Brooks–Iyengar algorithm, 63, 74, 75, 78–80

C
Caffe, 28, 85, 108, 124
CNN model, 17, 31–32, 44, 47, 48, 59, 77, 78, 83, 84, 86–90, 92, 95–96, 101–103, 106, 116, 118–122, 126, 135, 138–142
Compression DL networks, 81
Computer science, 1, 2, 78
Convolutional neural networks (CNNs), 1, 7–9, 15, 17, 29, 31–32, 44, 46, 53, 55, 57, 63, 78, 80, 81, 83–86, 88–94, 96, 97, 101–103, 109, 118–123, 131, 135, 137, 142, 151, 152
Convolution 3D array, 9
Critical information, 113
CUDA cores, 40, 41, 44, 83, 85, 91, 93, 121
CV libraries, 49
Cyber physical systems, 113

D
Data labelling, 49–62
Data set, 4, 6–8, 14–15, 17, 28, 46, 49–63, 73, 76, 80, 84–88, 90–93, 95–97, 108, 114, 115, 121, 126, 128, 135, 137–140
Deep AI, 131
Deep learning model, 7, 17, 28, 51, 63, 80, 84–86, 89, 92, 100, 102, 105, 114, 115, 121, 127, 129, 131, 138
Deep learning networks model, 87, 131, 135–142
Deep neural networks (DNNs), 5, 6, 8, 81, 131
Deep programming, 6–7
Deploying deep networks, 12, 109–142
Deployment, v, vii, viii, 1, 9–14, 16–21, 24, 28, 32, 33, 35, 45, 47–49, 63, 84, 86, 99–138, 141, 142
Designing and machine learning training, vii, 7, 15, 63, 78, 88
Deterministic network, 74
Development of learning networks, 63
DGX Station, 35, 45, 46
DGX Station for DL networks, 46–47
Distributed deep learning (DDL), 28, 89, 93, 94, 124
Distributed execution, 25
DL in IoT edge, 12, 17, 99–101, 138, 141–142

DLtrain, 1, 13, 15–17, 29–32, 44, 46, 59, 81, 83–97, 103, 104, 108, 117, 121, 129–131, 134, 137, 140–142

E
Edge compiler, 99, 130
Edge computing devices, 17
Edge servers, 35, 40
Embedded devices, 16, 17, 25, 33, 80, 101–104, 129, 130

G
GeForce RTX 2070, 38–40, 84
Gibbs sampling, 7, 68–71
GPU for real-time, 27
GPU in run time, 44

H
Hardware for DL network, 35–48
Hopfield model, 74–75
Human brain, 2, 5, 8, 9

I
IBM Watson, 18–21, 23, 28, 35, 89, 123, 125–129, 139
IBM Watson Virtual Recognition Services, 124–129
Image recognition, 7, 9, 10
Inferences by using DL training, 116
Installing Android Studio, 117
Intrusion detection systems (IDS), 110
IoT edge devices, 27, 100, 102, 131
IP addresses, 20, 32, 109, 111–113, 120
IP packets, 110, 115
IP stream analysis, 62, 114–115

J
JupyterLab, 27, 95

K
Keras in TensorFlow, 26
Kolmogorov complexity, 63, 75–76
Kubernetes, 20

L
Learning algorithms, 4
Log sources, 110, 113
Low-code, 11–21, 83

M
Mathematical models, 2
MATLAB, 13
Maxwell-Boltzmann techniques, 49
Message Queuing Telemetry Transport (MQTT), 19–21, 129
Mini Projects, 120–122
Multilayer neural network, 74
Multilayer perceptron (MLP), 54, 74, 75, 78

N
Natural language processing (NLP), vii, 6–10, 51, 80
NetFlow, S Flow, 110, 112
Network models, 1, 15, 16, 29, 59, 62, 64, 78, 83, 85–90, 93, 94, 99, 102, 108, 113, 120, 121, 135–137, 139, 142, 148
Neural network (NN), 5–9, 15, 17, 29, 32, 45, 47, 48, 53, 59, 61, 63, 74, 84–89, 92–96, 102, 103, 108, 109, 114, 116, 118–121, 126, 130, 132, 135–137, 140, 147–148, 152
No-code, 12, 13, 21, 83–87
Nonlinear relationships, 8

O
Open-source, vii, viii, 12, 15, 23, 25, 28, 37, 83, 84, 86, 89, 91, 95, 108, 114, 124, 137
Open-source tools, 1, 13, 18, 23, 32, 85

P
Packet sniffing tool, 110
Pixel normalization, 49, 60
Poisson experiment, 55
Porting PyTorch, 17
Python, API, 25
Python-based model, 114
Python project, 23
PyTorch, 1, 15–17, 23, 28, 46, 60, 62, 83–85, 87, 89, 96, 102, 106, 124, 129, 142

R
Recurrent neural networks (RNNs), 7, 8, 74, 80

S
SCADA, 11, 100, 104
Software tool sets, 23–33
Statistical learning algorithms, 4
T
Target machine, 29–31
TensorFlow, viii, 1, 9, 15–17, 23, 25, 26, 28, 33, 44, 46–48, 60, 62, 81, 83–85, 87, 89, 95–96, 101–102, 106–108, 124, 129, 130, 136, 139–142
TensorFlow-AI Platform, 25–26, 83
TensorFlow models, 17, 26, 106, 108, 140–142
Testing and artificial neural networks, 1, 5–9, 13, 15, 59, 60, 74, 90, 114, 126, 135
Tool set, v, vii, 11–13, 15–17, 23–32, 60–63, 84, 85, 90, 96, 106, 110, 115, 129–131, 141
Trained in CNN/NN model, 47, 48, 86, 89, 102, 103, 116, 118–120, 126, 135, 139
Training of DL networks, 15, 16, 83–97
Training testing, 1, 13, 60
Tutorial, v, vii, 15, 21, 81, 135–142

U
Ubuntu 22.04 O/S, 85, 101

V
Virtual environment, vii, 23–26, 32, 48, 95
Vulnerability assessment (VA), 112–113

W
Watson Machine Learning Accelerator, 28, 85, 89, 124
Work DL, 24–26
Workflow complexity, 19
