Deep Learning on GPUs
March 2016
AGENDA
What is Deep Learning?
GPUs and DL
DL in practice
Scaling up DL
What is Deep Learning?
DEEP LEARNING EVERYWHERE
INTERNET & CLOUD: Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation
MEDICINE & BIOLOGY: Cancer Cell Detection, Diabetic Grading, Drug Discovery
MEDIA & ENTERTAINMENT: Video Captioning, Video Search, Real Time Translation
SECURITY & DEFENSE: Face Detection, Video Surveillance, Satellite Imagery
AUTONOMOUS MACHINES: Pedestrian Detection, Lane Tracking, Traffic Sign Recognition
Traditional machine perception
Hand-crafted feature extractors

Raw data → Feature extraction → Classifier/detector → Result

Typical classifier/detector choices per task:
SVM, shallow neural net, HMM → Speaker ID, speech transcription
Clustering, HMM, LDA, LSA → Topic classification, machine translation, sentiment analysis
Deep learning approach
Train: labeled examples (dog, cat, raccoon, honey badger) are fed through the MODEL; prediction errors are propagated back to adjust the model's weights.
Deploy: the trained MODEL maps a new raw input directly to a result (e.g. "Dog").
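To make the train/deploy split concrete, here is a minimal sketch in Python/NumPy. The data, sizes, and learning rate are invented for illustration, and a toy softmax classifier stands in for the MODEL above; it is not the deck's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 classes and 64-dimensional "images" (invented sizes).
labels = ["dog", "cat", "raccoon", "honey badger"]
X = rng.normal(size=(200, 64))
y = rng.integers(0, 4, size=200)

W = np.zeros((64, 4))

def forward(X, W):
    scores = X @ W
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # softmax probabilities

# Train: push errors (predicted minus true) back through the model.
for step in range(100):
    errors = forward(X, W)
    errors[np.arange(len(y)), y] -= 1.0       # gradient of cross-entropy
    W -= 0.1 * X.T @ errors / len(y)          # gradient-descent update

# Deploy: a single forward pass maps a new input to a label.
test_image = rng.normal(size=(1, 64))
print(labels[int(forward(test_image, W).argmax())])
```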
Artificial neural network
A collection of simple, trainable mathematical units that collectively learn complex functions.

[Diagram: input layer → hidden layers → output layer]

Given sufficient training data, an artificial neural network can approximate very complex functions mapping raw data to output decisions.
Artificial neurons
[Figure: biological neuron vs. artificial neuron, from Stanford CS231n lecture notes]

An artificial neuron with inputs x1, x2, x3 and weights w1, w2, w3 computes y = F(w1*x1 + w2*x2 + w3*x3), where F is a nonlinearity such as the rectified linear unit F(x) = max(0, x).
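The formula translates directly to code. A minimal sketch (the input and weight values are arbitrary examples):

```python
import numpy as np

def relu(x):
    # F(x) = max(0, x), the rectified linear activation
    return np.maximum(0.0, x)

def neuron(x, w):
    # y = F(w1*x1 + w2*x2 + w3*x3): weighted sum, then nonlinearity
    return relu(np.dot(w, x))

x = np.array([0.5, -1.0, 2.0])   # inputs x1..x3 (example values)
w = np.array([0.8, 0.2, -0.4])   # weights w1..w3 (example values)
print(neuron(x, w))              # 0.4 - 0.2 - 0.8 = -0.6, ReLU -> 0.0
```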
Deep neural network (DNN)
Input → Raw data → Low-level features → Mid-level features → High-level features → Result

Application components:
Task objective, e.g. identify faces
Training data: 10-100M images
Network architecture: ~10 layers, 1B parameters
Learning algorithm: ~30 exaflops, ~30 GPU-days
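Stacking neurons into successive layers is what produces the low-, mid-, and high-level feature hierarchy above. A minimal forward pass in Python/NumPy (the layer widths are invented; a real network at the scale above has ~10 layers and up to a billion parameters):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, layers):
    # Each layer re-represents the previous layer's features:
    # raw data -> low-level -> mid-level -> high-level -> result
    for W, b in layers:
        x = relu(W @ x + b)
    return x

rng = np.random.default_rng(0)
sizes = [256, 128, 64, 32, 10]    # toy layer widths
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(m))
          for n, m in zip(sizes, sizes[1:])]
print(forward(rng.normal(size=256), layers).shape)   # (10,)
```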
Deep learning benefits

Robust: No need to design the features ahead of time; features are automatically learned to be optimal for the task at hand. Robustness to natural variations in the data is automatically learned.

Generalizable: The same neural net approach can be used for many different applications and data types.

Scalable: Performance improves with more data, and the method is massively parallelizable.
Baidu Deep Speech 2
End-to-end deep learning for English and Mandarin speech recognition.
The transition from English to Mandarin was made simpler by end-to-end DL: no feature engineering or Mandarin-specific components required.
More accurate than humans: 3.7% error rate vs. 4% for humans on test sets.
http://svail.github.io/mandarin/
http://arxiv.org/abs/1512.02595
AlphaGo
First computer program to beat a human Go professional.
Training DNNs: 3 weeks, 340 million training steps on 50 GPUs.
Play: asynchronous multi-threaded search, with simulations on CPUs and the policy and value DNNs in parallel on GPUs.
Single machine: 40 search threads, 48 CPUs, and 8 GPUs.
Distributed version: 40 search threads, 1202 CPUs, and 176 GPUs.
Outcome: beat both the European and World Go champions in best-of-5 matches.
http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html
http://deepmind.com/alpha-go.html
Deep Learning for Autonomous vehicles
13
Deep Learning Synthesis
Texture synthesis and transfer using CNNs. Timo Aila et al., NVIDIA Research
THE AI RACE IS ON

[Chart: ImageNet classification accuracy rate, 2009-2016; traditional CV plateaus while deep learning climbs past 90%]

Milestones along the curve: IBM Watson achieves breakthrough in natural language processing; Facebook launches Big Sur; Baidu Deep Speech 2 beats humans; Google launches TensorFlow; Toyota invests $1B in AI labs; Microsoft & U. Science & Tech, China beat humans on IQ test questions.
The Big Bang in Machine Learning
DNN + BIG DATA + GPU

"Google's AI engine also reflects how the world of computer hardware is changing. (It) depends on machines equipped with GPUs... And it depends on these chips more than the larger tech universe realizes."
GPUs and DL
USE MORE PROCESSORS TO GO FASTER
Deep learning development cycle
Three Kinds of Networks
DNN: all fully connected layers
CNN: some convolutional layers
RNN: recurrent neural network, e.g. LSTM
DNN
The key operation is a dense matrix-vector multiply (M x V).
Backpropagation uses dense matrix-matrix multiplies, starting from the softmax scores.
DNN batching
Batching is used for training and for latency-insensitive inference.
Batching turns M x V into an M x M operation, giving re-use of the weights; without batching, each element of the weight matrix would be fetched from memory for just one multiply.
Modern compute architectures want 10-50 arithmetic operations per memory fetch, as sketched below.
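A minimal sketch of why batching helps (the sizes are invented): stacking B inputs turns the matrix-vector product into a matrix-matrix product, so every weight fetched from memory is reused B times, raising arithmetic intensity toward the 10-50 ops per fetch that the hardware wants.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2048, 2048))   # weight matrix (example size)

# Unbatched: one M x V product; each weight is used once per input.
x = rng.normal(size=2048)
y = W @ x

# Batched: stack B inputs into a matrix; one M x M-style product
# reuses every fetched weight B times.
B = 32
X = rng.normal(size=(2048, B))
Y = W @ X                           # same math as 32 separate M x V calls
assert np.allclose(Y[:, 0], W @ X[:, 0])
```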
CNN
Requires convolution as well as M x V.
Filters are conserved (shared) across the image plane; see the sketch below.
Multiply-limited even without batching.
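A minimal 2D convolution sketch in Python/NumPy (sizes invented) showing what "filters conserved through the plane" means: one small filter is applied at every position, so its weights are reused constantly and the work is dominated by multiplies even for a single input.

```python
import numpy as np

def conv2d(image, filt):
    # Slide one shared filter over every position of the plane:
    # the same weights are reused at each (i, j), unlike a dense layer.
    H, W = image.shape
    k = filt.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+k, j:j+k] * filt)
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))   # one input plane (example size)
filt = rng.normal(size=(3, 3))      # one 3x3 filter, shared everywhere
print(conv2d(image, filt).shape)    # (30, 30)
```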
Other Operations
Further operations are needed to finish building a DNN (two representative examples are sketched below). These are not limiting factors with appropriate GPU use.
Complex networks have hundreds of millions of weights.
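The slide does not enumerate these operations, so the two below are my choice of representative examples: max-pooling and softmax, both cheap relative to the dense matrix products.

```python
import numpy as np

def max_pool(x, s=2):
    # Downsample by taking the max over non-overlapping s x s windows.
    H, W = x.shape
    return x[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

def softmax(scores):
    # Turn raw scores into probabilities (numerically stable form).
    e = np.exp(scores - scores.max())
    return e / e.sum()

x = np.arange(16.0).reshape(4, 4)
print(max_pool(x))                       # 2x2 grid of window maxima
print(softmax(np.array([1.0, 2.0, 3.0])))
```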
Lots of Parallelism Available in a DNN
13x Faster Training with Caffe
Tesla M40: the world's fastest accelerator for deep learning training. A GPU server with 4x Tesla M40 reduces training time from 13 days (dual-CPU server) to just 1 day.

Tesla M40 specifications:
CUDA Cores: 3072
Peak SP: 7 TFLOPS
GDDR5 Memory: 12 GB
Bandwidth: 288 GB/s
Power: 250 W (28 GFLOPS/W)

Note: Caffe benchmark with AlexNet; CPU server uses 2x E5-2680v3 12-core 2.5 GHz CPUs, 128 GB system memory, Ubuntu 14.04.
Comparing server-class CPU and GPU
Xeon E5-2698 vs. Tesla M40
See the NVIDIA whitepaper "GPU-Based Deep Learning Inference: A Performance and Power Analysis."
DL in practice
The Engine of Modern AI
Education, laboratories, and start-ups (e.g. Vitruvian, Schults) build on the NVIDIA GPU platform, alongside Big Sur, TensorFlow, Watson, CNTK, Torch, Caffe, Theano, MatConvNet, Mocha.jl, Purine, Chainer, DL4J, Keras, OpenDeep, Minerva, and MXNet*.
* MXNet: U. Washington, CMU, Stanford, TuSimple, NYU, Microsoft, U. Alberta, MIT, NYU Shanghai
CUDA for Deep Learning Development
DEEP LEARNING SDK: DIGITS, cuDNN, cuSPARSE, cuBLAS, NCCL
Hardware and systems: TITAN X, DEVBOX, GPU CLOUD
cuDNN: Deep Learning Primitives
GPU-accelerated deep learning subroutines for high-performance neural network training, accelerating artificial intelligence.
Accelerates major deep learning frameworks: Caffe, Theano, Torch, TensorFlow.
Tiled FFT up to 2x faster than FFT.
Up to 3.5x faster AlexNet training in Caffe than the baseline GPU.
[Chart: millions of images trained per day, rising steadily from cuDNN 1 through cuDNN 4]
developer.nvidia.com/cudnn
Caffe Performance
CUDA boosts deep learning performance 5x in 2 years.
[Chart: Caffe AlexNet training performance, 11/2013 to 12/2015: K40, K40 + cuDNN 1, M40 + cuDNN 3, M40 + cuDNN 4]
Note: AlexNet training throughput based on 20 iterations; CPU: 1x E5-2680v3 12-core 2.5 GHz, 128 GB system memory, Ubuntu 14.04.
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
Workflow: Process Data → Configure DNN → Monitor Progress → Visualize Layers → Test Image
developer.nvidia.com/digits
ONE ARCHITECTURE, END-TO-END AI
Tesla for cloud; Titan X for PC (gaming); DRIVE PX for auto; Jetson for embedded.
Scaling DL
Scaling Neural Networks: Data Parallelism
[Diagram: Image 1 on Machine 1, Image 2 on Machine 2; the weights W are synchronized between machines]

Notes:
Need to sync the model across machines.
The largest models do not fit on one GPU.
Requires a P-fold larger batch size for P workers.
Works across many nodes; the parameter-server approach gives linear speedup. A sketch of the pattern follows below.

Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Ng and Bryan Catanzaro
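A single-process sketch of the data-parallel pattern in Python/NumPy. All names and sizes are invented, and a least-squares gradient stands in for backprop; a real system would average gradients via a parameter server or all-reduce across machines.

```python
import numpy as np

rng = np.random.default_rng(0)
P = 4                                      # number of workers (machines/GPUs)
W = rng.normal(scale=0.1, size=(10, 64))   # model, replicated on every worker

def gradient(W, X, y):
    # Least-squares gradient on one worker's shard (stand-in for backprop).
    return 2 * (W @ X.T - y) @ X / len(X)

# A P-fold larger global batch, split into one shard per worker.
X = rng.normal(size=(P * 32, 64))
y = rng.normal(size=(10, P * 32))
shards = zip(np.split(X, P), np.split(y, P, axis=1))

# Each worker computes its local gradient; syncing = averaging the
# gradients before a shared weight update.
grads = [gradient(W, Xs, ys) for Xs, ys in shards]
W -= 0.01 * np.mean(grads, axis=0)
```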
Multiple GPUs
Near-linear scaling with data parallelism.
Ren Wu et al. (Baidu), "Deep Image: Scaling up Image Recognition," arXiv 2015.
Scaling Neural Networks: Model Parallelism
[Diagram: a single image processed jointly by Machine 1 and Machine 2, with the weights W split between them]

Notes:
Allows for larger models than fit on one GPU.
Requires much more frequent communication between GPUs.
Most commonly used within a node, via GPU peer-to-peer (P2P).
Effective for the fully connected layers; a sketch follows below.

Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Ng and Bryan Catanzaro
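A single-process sketch of model parallelism for one fully connected layer (sizes invented; the two halves stand in for two GPUs that must exchange results over P2P after every layer, which is why communication is so frequent).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2048)                     # layer input (example size)
W = rng.normal(scale=0.1, size=(4096, 2048))  # weight matrix, too big for
                                              # one device in this scenario

# Split the weight matrix by output rows across two devices.
W0, W1 = np.split(W, 2, axis=0)

# Each device computes its half of the output from the full input...
y0 = W0 @ x          # on "GPU 0"
y1 = W1 @ x          # on "GPU 1"

# ...then the halves are exchanged (P2P) so the next layer sees all of y.
y = np.concatenate([y0, y1])
assert np.allclose(y, W @ x)
```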
Scaling Neural Networks: Hyper-Parameter Parallelism
Try many alternative neural networks in parallel on different CPUs / GPUs / machines; a sketch follows below.
Probably the most obvious and effective way to scale!
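A sketch of hyper-parameter parallelism using only the Python standard library. The search space and the scoring function are invented stand-ins for real training runs; the point is that trials are independent, so no gradient synchronization is needed and they can run on separate CPUs, GPUs, or machines.

```python
import random
from concurrent.futures import ProcessPoolExecutor

def train_and_score(params):
    # Stand-in for training one network with these hyper-parameters
    # and returning its validation accuracy.
    lr, layers = params
    return 1.0 / (1.0 + abs(lr - 0.01)) + 0.01 * layers

# Sample candidate settings (learning rate, depth) at random.
rng = random.Random(0)
trials = [(10 ** rng.uniform(-4, -1), rng.randint(2, 10)) for _ in range(16)]

if __name__ == "__main__":
    # Run every trial in a separate process; the best setting wins.
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(train_and_score, trials))
    best = max(zip(scores, trials))
    print("best score %.3f with lr=%.4g, layers=%d" % (best[0], *best[1]))
```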
Deep Learning Everywhere
NVIDIA DRIVE PX
NVIDIA Tesla
NVIDIA Jetson
NVIDIA Titan X
Contact: jbarker@nvidia.com