Lecture IV: MLSC - Dr. Sethu Vijayakumar (2004)
Overview
Internal models and function approximation
Cost Function & Optimization
Generalization, Overfitting & Bias-Variance Dilemma
Figure: learning an internal model, in which the mapping $y = f(x_1, x_2, x_3)$ from inputs $x_1, x_2, x_3$ to outputs is learned from sensory feedback.
Figure: data pairs $\{x_1, y_1\}, \{x_2, y_2\}, \dots, \{x_n, y_n\}$ are fed to a regression / function approximation module, which learns a mapping $f : \mathbb{R}^N \to \mathbb{R}^M$ from inputs to outputs.
1. Data: $D = \{X, \mathbf{t}\} = \{x_i, t_i\}_{i=1}^{N}$
2. Model: $y_i = g(x_i \mid \theta)$, where $\theta$ denotes the parameters
3. Cost/loss function: $L(\theta \mid D)$
4. Optimization procedure: $\theta^* = \arg\min_\theta L(\theta \mid D)$
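The four ingredients above can be put together in a few lines. The sketch below is my own toy illustration, not from the lecture: a synthetic dataset, a linear model $g(x \mid \theta)$, a squared-error loss, and a generic off-the-shelf optimizer (scipy.optimize.minimize) to obtain $\theta^*$.

```python
import numpy as np
from scipy.optimize import minimize

# 1. Data: D = {(x_i, t_i)}, here a noisy linear relationship (toy choice)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=50)
t = 2.0 * X - 0.5 + 0.1 * rng.standard_normal(50)

# 2. Model: g(x | theta), here a line with theta = (w0, w1)
def g(x, theta):
    w0, w1 = theta
    return w0 + w1 * x

# 3. Cost/loss function: L(theta | D), here the squared error on the data
def L(theta):
    return np.sum((t - g(X, theta)) ** 2)

# 4. Optimization procedure: theta* = arg min_theta L(theta | D)
result = minimize(L, x0=np.zeros(2))
print("theta* =", result.x)  # should land close to (-0.5, 2.0)
```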
Cost/Loss Functions (I)

For classification with $y = \pm 1$, prediction $f(x)$, and class prediction $\operatorname{sgn}(f(x))$:

Misclassification:   $I(\operatorname{sgn}(f(x)) \neq y)$
Exponential:         $\exp(-y f(x))$
Binomial deviance:   $\log(1 + \exp(-2 y f(x)))$
Squared error:       $(y - f(x))^2$
Support vector:      $(1 - y f(x)) \, I(y f(x) < 1)$

Here, $I(x) = 1$ if $x$ is TRUE, and $0$ otherwise.
The exponential loss concentrates much more heavily on points with large negative margins, while the binomial deviance spreads its influence over all of the data. Hence, binomial deviance is more robust in noise-prone situations.
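To see the claim about margins concretely, a small sketch (my own, with assumed function names) evaluates each loss as a function of the margin $m = y f(x)$:

```python
import numpy as np

# Classification losses as a function of the margin m = y * f(x)
def misclassification(m):
    return (m <= 0).astype(float)       # I(sgn(f) != y)

def exponential(m):
    return np.exp(-m)

def binomial_deviance(m):
    return np.log(1.0 + np.exp(-2.0 * m))

def squared_error(m):
    return (1.0 - m) ** 2               # (y - f(x))^2 = (1 - yf)^2 when y = +/-1

def support_vector(m):
    return np.maximum(0.0, 1.0 - m)     # hinge loss, zero once yf >= 1

# At a large negative margin the exponential loss dwarfs the binomial deviance,
# which is why the latter is less sensitive to badly misclassified (noisy) points.
m = -4.0
print(exponential(m), binomial_deviance(m))  # ~54.6 vs ~8.0
```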
Cost/Loss Functions (II)

For regression:

Squared error loss:   $[y - f(x)]^2$
Absolute error loss:  $|y - f(x)|$
Huber loss:           $[y - f(x)]^2$ for $|y - f(x)| \le \delta$, and $2\delta \big( |y - f(x)| - \delta/2 \big)$ otherwise
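A minimal numpy sketch of the Huber loss (the function name and the default value of the threshold parameter delta are my own choices):

```python
import numpy as np

def huber_loss(y, f, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    r = np.abs(y - f)
    quadratic = r ** 2                        # used when |y - f(x)| <= delta
    linear = 2.0 * delta * (r - delta / 2.0)  # used otherwise
    return np.where(r <= delta, quadratic, linear)

# A single large outlier is penalized far less than under the squared error loss:
print(huber_loss(10.0, 0.0), (10.0 - 0.0) ** 2)  # 19.0 vs 100.0
```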
Cost Functions

True (expected) cost:  $L(\theta) = \frac{1}{2} \int \big( f(x) - f(x, \theta) \big)^2 \, dx$

Empirical cost:        $L_{\mathrm{emp}}(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \big( y_i - f(x_i, \theta) \big)^2$
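The difference between the two costs can be made tangible with a toy sketch (mine, not from the slides): the empirical cost is computed on a handful of samples, while the true cost is approximated by averaging over a dense grid. The target function, model, and parameter value below are arbitrary choices.

```python
import numpy as np

def f_true(x):                 # the "unknown" target function (toy choice)
    return np.sin(2.0 * np.pi * x)

def f_model(x, theta):         # a crude one-parameter model f(x, theta)
    return theta * x

rng = np.random.default_rng(1)
x_train = rng.uniform(0.0, 1.0, size=10)
y_train = f_true(x_train)
theta = 0.5

# Empirical cost: L_emp(theta) = 1/(2N) * sum_i (y_i - f(x_i, theta))^2
N = x_train.size
L_emp = np.sum((y_train - f_model(x_train, theta)) ** 2) / (2.0 * N)

# True cost: L(theta) = 1/2 * integral of (f(x) - f(x, theta))^2 dx over [0, 1],
# approximated here by an average over a dense grid
x_grid = np.linspace(0.0, 1.0, 10_000)
L_true = 0.5 * np.mean((f_true(x_grid) - f_model(x_grid, theta)) ** 2)

print(L_emp, L_true)  # close, but not equal: the sample only approximates the integral
```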
Overfitting
The tendency of a learning system (typically one with too many open parameters) to concentrate on the idiosyncrasies of the training data and noise rather than capture the essential features of the data-generating mechanism.
$y = f(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_n x^n = \mathbf{w}^T \mathbf{x}$, where $\mathbf{x} = [1, x, x^2, \dots, x^n]^T$.

How many inputs (i.e., what degree of polynomial) should be used to fit the data?
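In this formulation the fit reduces to linear least squares on a design matrix whose rows are $[1, x, x^2, \dots, x^n]$. A sketch (my own; the data range, noise level, and chosen degree are arbitrary assumptions, and the data are a noisy sample of the function used later in these slides):

```python
import numpy as np

def design_matrix(x, degree):
    # rows are [1, x, x^2, ..., x^degree]
    return np.vander(x, N=degree + 1, increasing=True)

def fit_polynomial(x, y, degree):
    X = design_matrix(x, degree)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimate of w
    return w

rng = np.random.default_rng(2)
x = rng.uniform(-2.0, 2.0, size=30)
y = x + 2.0 * np.exp(-16.0 * x ** 2) + 0.1 * rng.standard_normal(30)

w = fit_polynomial(x, y, degree=5)
y_hat = design_matrix(x, 5) @ w            # predictions w^T x
```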
Overfitting (contd)
A popular error criterion is the mean squared error (MSE),

$J = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2,$

or the normalized mean squared error (nMSE),

$J = \frac{1}{N \sigma_y^2} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2.$
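As a sketch, both criteria are one-liners in numpy (the function names are mine):

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def nmse(y, y_hat):
    # Normalizing by the variance of the targets makes the criterion scale-free:
    # nmse = 1 means the model is no better than predicting mean(y).
    return mse(y, y_hat) / np.var(y)
```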
Figure: training data generated from the target function $y = x + 2 \exp(-16 x^2)$.
Figure: polynomial fits of increasing degree to the data (left) and the resulting MSE as a function of the degree of the polynomial (right).
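The kind of experiment behind these plots can be sketched as follows (my reconstruction; the noise level, sample size, and test grid are assumptions): fit polynomials of increasing degree to noisy samples of $y = x + 2 \exp(-16 x^2)$ and track the test MSE.

```python
import numpy as np

def target(x):
    return x + 2.0 * np.exp(-16.0 * x ** 2)

rng = np.random.default_rng(3)
x_train = rng.uniform(-2.0, 2.0, size=30)
y_train = target(x_train) + 0.2 * rng.standard_normal(x_train.size)
x_test = np.linspace(-2.0, 2.0, 200)     # clean grid for evaluating the fit
y_test = target(x_test)

for degree in range(1, 16):
    X_train = np.vander(x_train, degree + 1, increasing=True)
    X_test = np.vander(x_test, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    test_mse = np.mean((y_test - X_test @ w) ** 2)
    print(degree, round(test_mse, 4))
# The test MSE typically drops at first and then rises again as high-degree
# polynomials begin to fit the noise rather than the underlying function.
```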
Bias-Variance Dilemma

Too few features are bad, and too many features are bad; thus, there should be an optimum.

A closer look at the MSE criterion, $J = \frac{1}{N} \sum_{i=1}^{N} (t_i - y_i)^2$, where $y_i$ is the model's prediction at $x_i$. The goal is to achieve $\min \big( E\{J\} \big)$.
$E\{J\} = E\Big\{ \frac{1}{N} \sum_{i=1}^{N} (t_i - y_i)^2 \Big\} = \frac{1}{N} \sum_{i=1}^{N} E\big\{ (t_i - y_i)^2 \big\} = \frac{1}{N} \sum_{i=1}^{N} E\{J_i\}$

$E\{J_i\} = \sigma^2 + \big( E\{y_i\} - f(x_i) \big)^2 + E\big\{ (y_i - E\{y_i\})^2 \big\} = \mathrm{var}(\text{noise}) + \mathrm{bias}^2 + \mathrm{var}(\text{estimate})$
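A Monte Carlo sketch of this decomposition (my construction; the true function, noise level, and estimator are arbitrary choices): repeatedly draw noisy training sets, refit the estimator, and compare the average squared error at one query point with $\sigma^2 + \mathrm{bias}^2 + \mathrm{var}(\text{estimate})$.

```python
import numpy as np

def f(x):                         # true underlying function
    return np.sin(x)

rng = np.random.default_rng(4)
sigma = 0.3                       # standard deviation of the target noise
x_query = 1.0                     # input at which E{J_i} is examined
degree, n_datasets, n_samples = 3, 2000, 20

preds, sq_errors = [], []
for _ in range(n_datasets):
    x = rng.uniform(0.0, 2.0, size=n_samples)
    t = f(x) + sigma * rng.standard_normal(n_samples)
    X = np.vander(x, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, t, rcond=None)             # polynomial estimator
    y_q = (np.vander(np.array([x_query]), degree + 1, increasing=True) @ w)[0]
    t_q = f(x_query) + sigma * rng.standard_normal()       # fresh noisy target
    preds.append(y_q)
    sq_errors.append((t_q - y_q) ** 2)

preds = np.array(preds)
bias_sq = (preds.mean() - f(x_query)) ** 2
var_estimate = preds.var()
# Left and right sides of E{J_i} = sigma^2 + bias^2 + var(estimate) should agree.
print(np.mean(sq_errors), sigma ** 2 + bias_sq + var_estimate)
```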
Usually, if we try to reduce the bias of a model, its variance increases, and vice versa, which creates the dilemma of making an optimal choice.