Deep Learning
Yann LeCun
Center for Data Science & Courant Institute, NYU
& Facebook AI Research
yann@cs.nyu.edu
http://yann.lecun.com
Deep Learning = Learning Representations/Features
The traditional model of pattern recognition (since the late 50's):
Fixed/engineered features (or fixed kernel) + trainable classifier
hand-crafted Feature Extractor → Simple Trainable Classifier
End-to-end learning / Feature learning / Deep learning:
Trainable features (or kernel) + trainable classifier
Trainable Feature Extractor → Trainable Classifier
This Basic Model has not evolved much since the 50's
The first learning machine: the Perceptron
Built at Cornell in 1960
The Perceptron was a linear classifier on top of a simple feature extractor.
The vast majority of practical applications of ML today use glorified linear classifiers or glorified template matching.
Designing a feature extractor requires considerable effort by experts.
y = sign( Σ_{i=1}^{N} W_i F_i(X) + b )
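As a concrete, purely illustrative rendering of this decision rule, here is a minimal NumPy sketch; the feature extractor F is just a placeholder for whatever hand-crafted features a given system uses, and the function names are mine, not from the slides.

```python
import numpy as np

def hand_crafted_features(x):
    # Placeholder for an engineered feature extractor F(X);
    # here it simply returns the raw input as the feature vector.
    return np.asarray(x, dtype=float)

def perceptron_predict(x, W, b):
    # y = sign( sum_i W_i * F_i(X) + b )
    f = hand_crafted_features(x)
    return 1 if np.dot(W, f) + b >= 0 else -1
```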
Linear Machines and their limitations
[A sequence of figure-only slides illustrating linear machines and their limitations]
Architecture of Mainstream Pattern Recognition Systems
Modern architecture for pattern recognition
Speech recognition (early 90's to 2011): MFCC (fixed) → Mix of Gaussians (unsupervised) → Classifier (supervised)
Object Recognition (2006 - 2012): SIFT / HoG (fixed, low-level features) → K-means / Sparse Coding + Pooling (unsupervised, mid-level features) → Classifier (supervised)
Deep Learning = Learning Hierarchical Representations
It's deep if it has more than one stage of non-linear feature transformation
Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier
Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]
Trainable Feature Hierarchy
Hierarchy of representations with increasing level of abstraction
Each stage is a kind of trainable feature transform
Image recognition: Pixel → edge → texton → motif → part → object
Text: Character → word → word group → clause → sentence → story
Speech: Sample → spectral band → sound → phone → phoneme → word
Learning Representations: a challenge for ML, CV, AI, Neuroscience, Cognitive Science...
How do we learn representations of the perceptual world?
How can a perceptual system build itself by looking at the world?
How much prior structure is necessary?
ML/AI: how do we learn features or feature hierarchies? What is the fundamental principle? What is the learning algorithm? What is the architecture?
Neuroscience: how does the cortex learn perception? Does the cortex run a single, general learning algorithm? (or a small number of them)
CogSci: how does the mind learn abstract concepts on top of less abstract ones?
[Figure: a stack of Trainable Feature Transform modules]
Deep Learning addresses the problem of learning
hierarchical representations with a single algorithm
The Mammalian Visual Cortex is Hierarchical
The ventral (recognition) pathway in the visual cortex has multiple stages
Retina - LGN - V1 - V2 - V4 - PIT - AIT ....
Lots of intermediate representations
[picture from Simon Thorpe]
[Gallant & Van Essen]
Let's be inspired by nature, but not too much
It's nice to imitate Nature, but we also need to understand it.
How do we know which details are important?
Which details are merely the result of evolution, and the constraints of biochemistry?
For airplanes, we developed aerodynamics and compressible fluid dynamics.
We figured out that feathers and wing flapping weren't crucial.
QUESTION: What is the equivalent of aerodynamics for understanding intelligence?
[Photo: L'Avion III de Clément Ader, 1897 (Musée du CNAM, Paris). His Éole took off from the ground in 1890, 13 years before the Wright Brothers, but you probably never heard of it.]
Trainable Feature Hierarchies: End-to-end learning
A hierarchy of trainable feature transforms
Each module transforms its input representation into a higher-level
one.
High-level features are more global and more invariant
Low-level features are shared among categories
Trainable Feature Transform → Trainable Feature Transform → Trainable Classifier/Predictor
(Learned Internal Representations)
How can we make all the modules trainable and get them to learn
appropriate representations?
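One standard answer, sketched below as an illustration (my own minimal example, not code from the tutorial; the layer sizes, tanh non-linearities, and squared-error loss are arbitrary choices): make every module differentiable and propagate the loss gradient from the classifier back through the whole chain, so all stages learn their representations jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two trainable "feature transform" modules followed by a linear predictor.
W1 = rng.normal(scale=0.1, size=(64, 784))   # low-level transform
W2 = rng.normal(scale=0.1, size=(32, 64))    # mid-level transform
W3 = rng.normal(scale=0.1, size=(10, 32))    # classifier / predictor

def forward(x):
    h1 = np.tanh(W1 @ x)          # learned low-level representation
    h2 = np.tanh(W2 @ h1)         # learned mid-level representation
    y = W3 @ h2                   # predictor output
    return h1, h2, y

def sgd_step(x, target, lr=0.01):
    # Backpropagation: the loss gradient flows through every module,
    # so all stages learn appropriate representations jointly.
    global W1, W2, W3
    h1, h2, y = forward(x)
    dy = y - target                       # gradient of 0.5 * ||y - target||^2
    dW3 = np.outer(dy, h2)
    dh2 = (W3.T @ dy) * (1 - h2 ** 2)     # back through tanh of module 2
    dW2 = np.outer(dh2, h1)
    dh1 = (W2.T @ dh2) * (1 - h1 ** 2)    # back through tanh of module 1
    dW1 = np.outer(dh1, x)
    W1 -= lr * dW1
    W2 -= lr * dW2
    W3 -= lr * dW3
```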
Three Types of Deep Architectures
Feed-Forward: multilayer neural nets, convolutional nets
Feed-Back: Stacked Sparse Coding, Deconvolutional Nets
Bi-Directional: Deep Boltzmann Machines, Stacked Auto-Encoders
Three Types of Training Protocols
Purely Supervised
Initialize parameters randomly
Train in supervised mode
typically with SGD, using backprop to compute gradients
Used in most practical systems for speech and image
recognition
Unsupervised, layerwise + supervised classifier on top
Train each layer unsupervised, one after the other
Train a supervised classifier on top, keeping the other layers
fixed
Good when very few labeled samples are available
Unsupervised, layerwise + global supervised fine-tuning
Train each layer unsupervised, one after the other
Add a classifier layer, and retrain the whole thing supervised
Good when label set is poor (e.g. pedestrian detection)
Unsupervised pre-training often uses regularized auto-encoders
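For illustration, here is a minimal sketch of greedy layerwise unsupervised pre-training with plain auto-encoders (my own example; the tied weights, tanh units, and squared reconstruction error are assumptions, not the tutorial's exact recipe). A supervised classifier on top, and optionally global fine-tuning, would follow as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrain_autoencoder_layer(data, n_hidden, lr=0.01, epochs=10):
    # Unsupervised training of one layer as an auto-encoder
    # (tied weights, squared reconstruction error).
    n_in = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_hidden, n_in))
    for _ in range(epochs):
        for x in data:
            h = np.tanh(W @ x)            # encoder
            x_rec = W.T @ h               # decoder (tied weights)
            err = x_rec - x
            dh = (W @ err) * (1 - h ** 2) # backprop through the encoder
            W -= lr * (np.outer(h, err) + np.outer(dh, x))
    return W

def pretrain_stack(data, layer_sizes):
    # Greedy layerwise pre-training: each layer trains on the previous layer's codes.
    weights, codes = [], data
    for n_hidden in layer_sizes:
        W = pretrain_autoencoder_layer(codes, n_hidden)
        weights.append(W)
        codes = np.tanh(codes @ W.T)
    return weights
    # A supervised classifier is then trained on the top-level codes,
    # optionally followed by fine-tuning the whole stack with backprop.
```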
Do we really need deep architectures?
Theoretician's dilemma: We can approximate any function as closely as we want with a shallow architecture. Why would we need deep ones?
Kernel machines (and 2-layer neural nets) are universal.
Deep learning machines
Deep machines are more efficient for representing certain classes of
functions, particularly those involved in visual recognition
they can represent more complex functions with less hardware
We need an efficient parameterization of the class of functions that are
useful for AI tasks (vision, audition, NLP...)
Why would deep architectures be more efficient?
[Bengio & LeCun 2007, Scaling Learning Algorithms Towards AI]
A deep architecture trades space for time (or breadth for depth)
more layers (more sequential computation),
but less hardware (less parallel computation).
Example 1: N-bit parity
requires N-1 XOR gates in a tree of depth log(N).
Even easier if we use threshold gates.
requires an exponential number of gates if we restrict ourselves
to 2 layers (a DNF formula with an exponential number of minterms).
Example 2: circuit for addition of 2 N-bit binary numbers
Requires O(N) gates, and O(N) layers using N one-bit adders with
ripple carry propagation.
Requires lots of gates (some polynomial in N) if we restrict
ourselves to two layers (e.g. Disjunctive Normal Form).
Bad news: almost all boolean functions have a DNF formula with
an exponential number of minterms O(2^N).....
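A small sketch of Example 1 (my own illustration): computing N-bit parity with a balanced tree of XORs uses N-1 gates and depth about log2(N), whereas a flat 2-layer (DNF) realization would have to enumerate exponentially many minterms.

```python
def parity_tree(bits):
    # N-bit parity via a balanced tree of XORs: N-1 gates, depth ~log2(N).
    layer = list(bits)
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            nxt.append(layer[i] ^ layer[i + 1])   # one XOR gate
        if len(layer) % 2:                        # odd element passes through
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

assert parity_tree([1, 0, 1, 1]) == 1   # three ones -> odd parity
```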
Which Models are Deep?
2-layer models are not deep (even if
you train the first layer)
Because there is no feature
hierarchy
Neural nets with 1 hidden layer are not
deep
SVMs and Kernel methods are not deep
Layer1: kernels; layer2: linear
The first layer is trained with the simplest unsupervised method ever devised: using the samples as templates for the kernel functions.
Classification trees are not deep
No hierarchy of features. All
decisions are made in the input
space
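To make the "layer 1: kernels, layer 2: linear" view concrete, here is a generic RBF kernel machine sketch (my own illustration, not any particular library's API): the first "layer" compares the input against stored training samples used as templates, and the second is a linear decision on those responses.

```python
import numpy as np

def rbf_kernel(x, template, gamma=0.5):
    # Kernel response of the input against one stored training sample.
    return np.exp(-gamma * np.sum((x - template) ** 2))

def kernel_machine_predict(x, templates, alphas, b, gamma=0.5):
    # "Layer 1": kernel responses against the stored templates.
    k = np.array([rbf_kernel(x, t, gamma) for t in templates])
    # "Layer 2": a linear classifier on top of those responses.
    return np.sign(np.dot(alphas, k) + b)
```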
Are Graphical Models Deep?
There is no opposition between graphical models and deep learning.
Many deep learning models are formulated as factor graphs
Some graphical models use deep architectures inside their factors
Graphical models can be deep (but most are not).
Factor Graph: sum of energy functions
Over inputs X, outputs Y and latent variables Z. Trainable parameters: W
−log P(X, Y, Z | W) ∝ E(X, Y, Z, W) = Σ_i E_i(X, Y, Z, W_i)
[Factor graph figure: factors E1(X1, Y1), E2(X2, Z1, Z2), E3(Z2, Y1), E4(Y3, Y4) over variables X1, X2, Z1, Z2, Z3, Y1, Y2]
Each energy function can contain a deep network
The whole factor graph can be seen as a deep network
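To make the "sum of energy functions" concrete, here is a minimal sketch (my own illustration; the two factors, their variable scopes, and the small network inside one factor are arbitrary assumptions rather than the tutorial's model):

```python
import numpy as np

rng = np.random.default_rng(0)
W_a = rng.normal(scale=0.1, size=(8, 4))   # parameters of factor E1 (a small deep net)
W_b = rng.normal(scale=0.1, size=(1, 8))

def E1(x, y):
    # A factor whose energy is computed by a two-layer network.
    h = np.tanh(W_a @ np.concatenate([x, y]))
    return float(W_b @ h)

def E2(y, z):
    # A simple quadratic compatibility factor between y and a latent variable z.
    return float(np.sum((y - z) ** 2))

def total_energy(x, y, z):
    # E(X, Y, Z, W) = sum over factors of E_i(X, Y, Z, W_i)
    return E1(x, y) + E2(y, z)

# Example: x and y each have 2 components, so E1 sees a 4-dimensional input.
print(total_energy(np.ones(2), np.zeros(2), np.zeros(2)))
```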
Deep Learning: A Theoretician's Nightmare?
Deep Learning involves non-convex loss functions
With non-convex losses, all bets are off
Then again, every speech recognition system ever deployed
has used non-convex optimization (GMMs are non convex).
But to some of us, all interesting learning is non-convex.
Convex learning is invariant to the order in which samples are
presented (it only depends on asymptotic sample frequencies).
Human learning isn't like that: we learn simple concepts
before complex ones. The order in which we learn things
matters.
Deep Learning: A Theoretician's Nightmare?
No generalization bounds?
Actually, the usual VC bounds apply: most deep learning
systems have a finite VC dimension
We don't have tighter bounds than that.
But then again, how many bounds are tight enough to be
useful for model selection?
It's hard to prove anything about deep learning systems
Then again, if we only studied models for which we can prove
things, we wouldn't have speech, handwriting, and visual
object recognition systems today.
Deep Learning: A Theoretician's Paradise?
Deep Learning is about representing high-dimensional data
There have to be interesting theoretical questions there
What is the geometry of natural signals?
Is there an equivalent of statistical learning theory for
unsupervised learning?
What are good criteria on which to base unsupervised
learning?
Deep Learning Systems are a form of latent variable factor graph
Internal representations can be viewed as latent variables to
be inferred, and deep belief networks are a particular type of
latent variable models.
The most interesting deep belief nets have intractable loss
functions: how do we get around that problem?
Lots of theory at the 2012 IPAM summer school on deep learning
Wright's parallel SGD methods, Mallat's scattering transform,
Osher's split Bregman methods for sparse modeling,
Morton's algebraic geometry of DBN,....
Deep Learning and Feature Learning Today
Deep Learning has been the hottest topic in speech recognition in the last 2 years
A few long-standing performance records were broken with deep
learning methods
Microsoft and Google have both deployed DL-based speech
recognition systems in their products
Microsoft, Google, IBM, Nuance, AT&T, and all the major academic
and industrial players in speech recognition have projects on deep
learning
Deep Learning is the hottest topic in Computer Vision
Feature engineering is the bread-and-butter of a large portion of
the CV community, which creates some resistance to feature
learning
But the record holders on ImageNet and Semantic Segmentation
are convolutional nets
Deep Learning is becoming hot in Natural Language Processing
Deep Learning/Feature Learning in Applied Mathematics
The connection with Applied Math is through sparse coding,
non-convex optimization, stochastic gradient algorithms, etc...
In Many Fields, Feature Learning Has Caused a Revolution
(methods used in commercially deployed systems)
Speech Recognition I (late 1980s)
Trained mid-level features with Gaussian mixtures (2-layer classifier)
Handwriting Recognition and OCR (late 1980s to mid 1990s)
Supervised convolutional nets operating on pixels
Face & People Detection (early 1990s to mid 2000s)
Supervised convolutional nets operating on pixels (YLC 1994, 2004,
Garcia 2004)
Haar features generation/selection (Viola-Jones 2001)
Object Recognition I (mid-to-late 2000s: Ponce, Schmid, Yu, YLC....)
Trainable mid-level features (K-means or sparse coding)
Low-Res Object Recognition: road signs, house numbers (early 2010's)
Supervised convolutional net operating on pixels
Speech Recognition II (circa 2011)
Deep neural nets for acoustic modeling
Object Recognition III, Semantic Labeling (2012, Hinton, YLC,...)
Supervised convolutional nets operating on pixels
SHALLOW vs. DEEP
[Taxonomy figure, repeated over four slides with progressive annotations: models arranged along a shallow-to-deep axis, including Perceptron, SVM, Boosting, GMM, Decision Tree, Sparse Coding, BayesNP, RBM, AE, D-AE, DBN, DBM, Neural Net, Conv. Net, and RNN; the annotations group them into Neural Networks vs. Probabilistic Models and Supervised vs. Unsupervised]
In this talk, we'll focus on the simplest and typically most effective methods.
What Are Good Features?
Discovering the Hidden Structure in High-Dimensional Data
The manifold hypothesis
Learning Representations of Data:
Discovering & disentangling the independent
explanatory factors
The Manifold Hypothesis:
Natural data lives in a low-dimensional (non-linear) manifold
Because variables in natural data are mutually dependent
Discovering the Hidden Structure in High-Dimensional Data
Example: all face images of a person
1000x1000 pixels = 1,000,000 dimensions
But a face has only 3 Cartesian coordinates and 3 Euler angles,
and humans have fewer than about 50 muscles in the face.
Hence the manifold of face images for a person has fewer than about 56 dimensions.
The perfect representation of a face image:
Its coordinates on the face manifold
Its coordinates away from the manifold
We do not have good and general methods to learn functions that turn an
image into this kind of representation.
[Figure: an Ideal Feature Extractor maps a face image to a low-dimensional vector (e.g. 1.2, 3, 0.2, 2, ...) whose components encode face/not-face, pose, lighting, expression, ...]
Disentangling factors of variation
The Ideal Disentangling Feature Extractor
[Figure: pixel space (Pixel 1, Pixel 2, ..., Pixel n) is mapped by the Ideal Feature Extractor to disentangled axes such as View and Expression]
Data Manifold & Invariance:
Some variations must be eliminated
Azimuth-Elevation manifold. Ignores lighting. [Hadsell et al. CVPR 2006]
Basic Idea for Invariant Feature Learning
Embed the input non-linearly into a high(er) dimensional space
In the new space, things that were non-separable may become
separable
Pool regions of the new space together
Bringing together things that are semantically similar. Like
pooling.
[Figure: Input → Non-Linear Function → unstable/non-smooth high-dimensional features → Pooling or Aggregation → stable/invariant features]
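A minimal sketch of this idea (my own illustration; the random projection, ReLU non-linearity, and max pooling over fixed groups of units are assumed choices, not the specific method advocated here):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(512, 64))   # random projection into a higher-dimensional space

def invariant_features(x, group_size=8):
    # 1. Non-linear embedding into a higher-dimensional space.
    h = np.maximum(0.0, P @ x)                  # ReLU after the projection
    # 2. Pool (here: max) over fixed groups of dimensions, so small
    #    variations within a group are absorbed into one stable feature.
    return h.reshape(-1, group_size).max(axis=1)

x = rng.normal(size=64)
print(invariant_features(x).shape)              # (64,) pooled, more invariant features
```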
Non-Linear Expansion → Pooling
[Figure: entangled data manifolds → Non-Linear Dimension Expansion / Disentangling → Pooling / Aggregation]
Sparse Non-Linear Expansion → Pooling
Use clustering to break things apart, pool together similar things
[Figure: Clustering / Quantization / Sparse Coding → Pooling / Aggregation]
Overall Architecture:
Normalization → Filter Bank → Non-Linearity → Pooling
[Figure: Norm → Filter Bank → Non-Linear → feature Pooling → Norm → Filter Bank → Non-Linear → feature Pooling → Classifier]
Stacking multiple stages of
[Normalization → Filter Bank → Non-Linearity → Pooling].
Normalization: variations on whitening
Subtractive: average removal, high pass filtering
Divisive: local contrast normalization, variance normalization
Filter Bank: dimension expansion, projection on overcomplete basis
Non-Linearity: sparsification, saturation, lateral inhibition....
Rectification (ReLU), Component-wise shrinkage, tanh,
winner-takes-all
Pooling: aggregation over space or feature type
MAX: max_i X_i;   L_p: ( Σ_i X_i^p )^{1/p};   PROB: (1/b) log( Σ_i e^{b X_i} )
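A short sketch of the three pooling functions above, applied to a vector of feature responses (my own illustration; p and b are free parameters):

```python
import numpy as np

def max_pool(x):
    return np.max(x)

def lp_pool(x, p=2.0):
    # (sum_i x_i^p)^(1/p); with large p this approaches max pooling
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def prob_pool(x, b=1.0):
    # (1/b) * log(sum_i exp(b * x_i)), a smooth, differentiable "max"
    return np.log(np.sum(np.exp(b * x))) / b

x = np.array([0.1, 0.7, 0.3])
print(max_pool(x), lp_pool(x), prob_pool(x))
```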
SOFTWARE
Torch7: learning library that supports neural net training
http://www.torch.ch
http://code.cogbits.com/wiki/doku.php (tutorial with demos by C. Farabet)
- http://eblearn.sf.net (C++ Library with convnet support by P. Sermanet)
Python-based learning library (U. Montreal)
- http://deeplearning.net/software/theano/ (does automatic differentiation)
RNN
www.fit.vutbr.cz/~imikolov/rnnlm (language modeling)
http://sourceforge.net/apps/mediawiki/rnnl/index.php (LSTM)
CUDAMat & GNumpy
code.google.com/p/cudamat
www.cs.toronto.edu/~tijmen/gnumpy.html
Misc
www.deeplearning.net//software_links
REFERENCES
Convolutional Nets
LeCun, Bottou, Bengio and Haffner: Gradient-Based Learning Applied to Document
Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998
- Krizhevsky, Sutskever, Hinton ImageNet Classification with deep convolutional neural
networks NIPS 2012
Jarrett, Kavukcuoglu, Ranzato, LeCun: What is the Best Multi-Stage Architecture for
Object Recognition?, Proc. International Conference on Computer Vision (ICCV'09),
IEEE, 2009
- Kavukcuoglu, Sermanet, Boureau, Gregor, Mathieu, LeCun: Learning Convolutional
Feature Hierarchies for Visual Recognition, Advances in Neural Information Processing
Systems (NIPS 2010), 23, 2010
see yann.lecun.com/exdb/publis for references on many different kinds of convnets.
see http://www.cmap.polytechnique.fr/scattering/ for scattering networks (similar to
convnets but with less learning and stronger mathematical foundations)
REFERENCES
Applications of Convolutional Nets
Farabet, Couprie, Najman, LeCun, Scene Parsing with Multiscale Feature Learning,
Purity Trees, and Optimal Covers, ICML 2012
Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala and Yann LeCun: Pedestrian
Detection with Unsupervised Multi-Stage Feature Learning, CVPR 2013
- D. Ciresan, A. Giusti, L. Gambardella, J. Schmidhuber. Deep Neural Networks
Segment Neuronal Membranes in Electron Microscopy Images. NIPS 2012
- Raia Hadsell, Pierre Sermanet, Marco Scoffier, Ayse Erkan, Koray Kavukcuoglu, Urs
Muller and Yann LeCun: Learning Long-Range Vision for Autonomous Off-Road Driving,
Journal of Field Robotics, 26(2):120-144, February 2009
Burger, Schuler, Harmeling: Image Denoising: Can Plain Neural Networks Compete
with BM3D?, Computer Vision and Pattern Recognition, CVPR 2012
REFERENCES
Applications of RNNs
Mikolov Statistical language models based on neural networks PhD thesis 2012
Boden A guide to RNNs and backpropagation Tech Report 2002
Hochreiter, Schmidhuber Long short term memory Neural Computation 1997
Graves: Offline Arabic handwriting recognition with multidimensional recurrent neural networks,
Springer 2012
Graves Speech recognition with deep recurrent neural networks ICASSP 2013
REFERENCES
Deep Learning & Energy-Based Models
Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine
Learning, 2(1), pp.1-127, 2009.
LeCun, Chopra, Hadsell, Ranzato, Huang: A Tutorial on Energy-Based Learning, in
Bakir, G. and Hofmann, T. and Schölkopf, B. and Smola, A. and Taskar, B. (Eds),
Predicting Structured Data, MIT Press, 2006
M. Ranzato Ph.D. Thesis Unsupervised Learning of Feature Hierarchies NYU 2009
Practical guide
Y. LeCun et al. Efficient BackProp, Neural Networks: Tricks of the Trade, 1998
L. Bottou, Stochastic gradient descent tricks, Neural Networks, Tricks of the Trade
Reloaded, LNCS 2012.
Y. Bengio, Practical recommendations for gradient-based training of deep
architectures, ArXiv 2012