Lecture 06 - Perceptron

Vietnam National University of HCMC

International University
School of Computer Science and Engineering

INTRODUCTION TO ARTIFICIAL INTELLIGENCE (IT097IU)

LECTURE 06: SUPERVISED LEARNING – NEURAL NETWORKS

Instructor: Nguyen Trung Ky


Machine Learning
 Up until now: model-based classification with Naive Bayes

 Machine learning: how to acquire a model from data / experience
  Learning parameters (e.g. probabilities)
  Learning structure (e.g. neural networks)
  Learning hidden concepts (e.g. clustering)
Perceptrons
Error-Driven Classification
Errors, and What to Do

 Examples of errors
Dear GlobalSCAPE Customer,
GlobalSCAPE has partnered with ScanSoft to offer you the
latest version of OmniPage Pro, for just $99.99* - the regular
list price is $499! The most common question we've received
about this offer is - Is this genuine? We would like to assure
you that this offer is authorized by ScanSoft, is genuine and
valid. You can get the . . .

. . . To receive your $30 Amazon.com promotional certificate,


click through to
http://www.amazon.com/apparel
and see the prominent link for the $30 offer. All details are
there. We hope you enjoyed receiving this message. However, if
you'd rather not receive future e-mails announcing new store
launches, please click . . .
What to Do About Errors
 Problem: there’s still spam in your inbox

 Need more features – words aren't enough!
  Have you emailed the sender before?
  Have 1M other people just gotten the same email?
  Is the sending information consistent?
  Is the email in ALL CAPS?
  Do inline URLs point where they say they point?
  Does the email address you by (your) name?

 Naïve Bayes models can incorporate a variety of features, but tend to do
best in homogeneous cases (e.g. all features are word occurrences)
Linear Classifiers
Feature Vectors

Example 1 (spam filtering): the email

    "Hello, Do you want free printr cartriges?
     Why pay more when you can get them
     ABSOLUTELY FREE! Just ..."

maps to the feature vector

    # free      : 2
    YOUR_NAME   : 0
    MISSPELLED  : 2
    FROM_FRIEND : 0
    ...

and is labeled SPAM (+).

Example 2 (digit recognition): an image of the digit "2" maps to

    PIXEL-7,12 : 1
    PIXEL-7,13 : 0
    ...
    NUM_LOOPS  : 1
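To make the mapping concrete, here is a minimal sketch of such a feature extractor in Python. It is illustrative only: the function name, the toy misspelling set, and the stubbed-out YOUR_NAME / FROM_FRIEND checks are assumptions, not part of the lecture.

    import re

    MISSPELLINGS = {"printr", "cartriges"}  # toy stand-in for a real spell-checker

    def extract_features(email_text):
        """Map raw email text to a sparse feature vector (a dict of counts)."""
        words = re.findall(r"[a-z']+", email_text.lower())
        return {
            "# free":      words.count("free"),
            "YOUR_NAME":   0,  # would be 1 if the mail addressed you by name
            "MISSPELLED":  sum(w in MISSPELLINGS for w in words),
            "FROM_FRIEND": 0,  # would be 1 if the sender were a known contact
        }

Run on the email above, this yields {'# free': 2, 'YOUR_NAME': 0, 'MISSPELLED': 2, 'FROM_FRIEND': 0}, matching the slide.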
Some (Simplified) Biology
 Very loose inspiration: human neurons
Linear Classifiers

 Inputs are feature values
 Each feature has a weight
 Sum is the activation

 If the activation is:
  Positive, output +1
  Negative, output -1

(Diagram: inputs f1, f2, f3 multiplied by weights w1, w2, w3; their
weighted sum feeds a ">0?" threshold unit that produces the output.)
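In symbols (the slide's formula images did not survive extraction; this is the standard linear-classifier notation):

    \text{activation}_w(x) = \sum_i w_i \, f_i(x) = w \cdot f(x)

    \text{output} = \begin{cases} +1 & \text{if } w \cdot f(x) > 0 \\ -1 & \text{if } w \cdot f(x) < 0 \end{cases}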
Weights
 Binary case: compare features to a weight vector
 Learning: figure out the weight vector from examples

    weight vector w        example f(x1)         example f(x2)
    # free      : 4        # free      : 2       # free      : 0
    YOUR_NAME   : -1       YOUR_NAME   : 0       YOUR_NAME   : 1
    MISSPELLED  : 1        MISSPELLED  : 2       MISSPELLED  : 1
    FROM_FRIEND : -3       FROM_FRIEND : 0       FROM_FRIEND : 1
    ...                    ...                   ...

 A positive dot product w · f(x) means the positive class
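As a sketch, the comparison is just a sparse dot product. The helper function below is mine; the weight and feature values are the ones on the slide:

    def score(weights, features):
        """Dot product of two sparse vectors stored as dicts."""
        return sum(weights.get(k, 0.0) * v for k, v in features.items())

    w  = {"# free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}
    f1 = {"# free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
    f2 = {"# free": 0, "YOUR_NAME": 1, "MISSPELLED": 1, "FROM_FRIEND": 1}

    print(score(w, f1))  # 4*2 + 1*2 = 10 > 0  -> positive class (spam)
    print(score(w, f2))  # -1 + 1 - 3 = -3 < 0 -> negative class (ham)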
Decision Rules
Binary Decision Rule
 In the space of feature vectors
  Examples are points
  Any weight vector is a hyperplane
  One side corresponds to Y = +1
  Other corresponds to Y = -1

Example weight vector:

    BIAS  : -3
    free  : 4
    money : 2
    ...

(Figure: the feature plane with axes "free" and "money"; the weight
vector's hyperplane separates the +1 = SPAM region from the
-1 = HAM region.)
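With the slide's weights, the decision boundary is the set of points where the activation is zero:

    w \cdot f(x) = -3 + 4\,(\texttt{free}) + 2\,(\texttt{money}) = 0

Points with positive activation fall on the SPAM side of this line; points with negative activation fall on the HAM side.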
Weight Updates
Learning: Binary Perceptron
 Start with weights = 0
 For each training instance:
  Classify with current weights
  If correct (i.e., y = y*), no change!
  If wrong: adjust the weight vector by adding or subtracting the
feature vector. Subtract if y* is -1.
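A minimal sketch of this loop in Python, with features and weights stored as sparse dicts (the function name and the fixed epoch cap are mine, not the slides'):

    def perceptron_train(data, epochs=10):
        """Binary perceptron: data is a list of (features, y_star) pairs,
        with y_star in {+1, -1} and features a dict of feature values."""
        w = {}  # start with weights = 0
        for _ in range(epochs):
            for features, y_star in data:
                # classify with current weights
                activation = sum(w.get(k, 0.0) * v for k, v in features.items())
                y = 1 if activation > 0 else -1
                if y != y_star:  # if wrong: add f when y* = +1, subtract when y* = -1
                    for k, v in features.items():
                        w[k] = w.get(k, 0.0) + y_star * v
        return w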
Examples: Perceptron
 Separable Case
Multiclass Decision Rule

 If we have multiple classes:
  A weight vector for each class: w_y
  Score (activation) of a class y: w_y · f(x)
  Prediction: the class with the highest score wins

 Binary = multiclass where the negative class has weight zero
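In symbols (reconstructed in standard notation, since the slide's formulas were images):

    \text{score}(x, y) = w_y \cdot f(x), \qquad \hat{y} = \arg\max_y \; w_y \cdot f(x)

Setting w_{-1} = 0 recovers the binary rule: predict +1 exactly when w_{+1} · f(x) > 0.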


Learning: Multiclass Perceptron

 Start with all weights = 0
 Pick up training examples one by one
  Predict with current weights
  If correct, no change!
  If wrong: lower the score of the wrong answer, raise the score of
the right answer
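A sketch of the multiclass update in the same style as the binary version (again, the function name is mine):

    def multiclass_perceptron_train(data, classes, epochs=10):
        """data is a list of (features, y_star) pairs; one weight dict per class."""
        w = {c: {} for c in classes}  # start with all weights = 0
        for _ in range(epochs):
            for features, y_star in data:
                # predict with current weights: highest score wins
                y = max(classes, key=lambda c: sum(
                    w[c].get(k, 0.0) * v for k, v in features.items()))
                if y != y_star:
                    for k, v in features.items():
                        w[y][k] = w[y].get(k, 0.0) - v            # lower wrong answer
                        w[y_star][k] = w[y_star].get(k, 0.0) + v  # raise right answer
        return w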
Example: Multiclass Perceptron

“win the vote”


“win the election”
“win the game”

    Initial weights, one vector per class:

                 w_1     w_2     w_3
    BIAS   :      1       0       0
    win    :      0       0       0
    game   :      0       0       0
    vote   :      0       0       0
    the    :      0       0       0
    ...          ...     ...     ...
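Tracing the first sentence: "win the vote" has features f = {BIAS: 1, win: 1, the: 1, vote: 1}, so the scores are w_1 · f = 1, w_2 · f = 0, and w_3 · f = 0, and the first class is predicted. If the true class is a different one, f is subtracted from w_1 and added to the correct class's weight vector.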
Properties of Perceptrons
 Separability: true if some parameters get the training set
perfectly correct

 Convergence: if the training set is separable, the perceptron will
eventually converge (binary case)

 Mistake Bound: the maximum number of mistakes (binary case) is
related to the margin or degree of separability

(Figures: a separable point set and a non-separable point set.)
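One standard form of this bound (Novikoff's theorem; not spelled out on the slide): if every example satisfies ||f(x)|| <= R and some weight vector separates the data with margin \gamma > 0, then

    \#\text{mistakes} \le \left( \frac{R}{\gamma} \right)^2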
Examples: Perceptron
 Non-Separable Case
Improving the Perceptron
Problems with the Perceptron

 Noise: if the data isn't separable, weights might thrash
  Averaging weight vectors over time can help (averaged perceptron)

 Trivial generalization: finds a "barely" separating solution

 Overtraining: test / validation accuracy usually rises, then falls
  Overtraining is a kind of overfitting

(Plot: validation accuracy over training iterations, rising and then
falling.)
Fixing the Perceptron
 Idea: adjust the weight update to mitigate these effects

 MIRA*: choose an update size that fixes the current mistake…
  … but minimizes the change to w

 The +1 (the margin in the constraint below) helps to generalize

* Margin Infused Relaxed Algorithm


Minimum Correcting Update

min not =0, or would not have


made an error, so min will be
where equality holds
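Written out (reconstructed from the standard MIRA formulation, since the slide's equations were images): choose the smallest change to the weights that makes the right answer y* beat the guessed answer y by at least 1,

    \min_{w'} \; \| w' - w \|^2 \quad \text{s.t.} \quad w'_{y^*} \cdot f \ge w'_{y} \cdot f + 1

With the update w'_{y^*} = w_{y^*} + \tau f and w'_{y} = w_{y} - \tau f, equality in the constraint gives the step size

    \tau = \frac{(w_{y} - w_{y^*}) \cdot f + 1}{2 \, f \cdot f}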
Maximum Step Size
 In practice, it’s also bad to make updates that are too large
 Example may be labeled incorrectly
 You may not have enough features
 Solution: cap the maximum possible value of τ with some
constant C (see the capped formula after this list)

 Corresponds to an optimization that assumes non-separable data
 Usually converges faster than perceptron
 Usually better, especially on noisy data
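The capped step size referred to above is then simply

    \tau = \min\!\left( C, \; \frac{(w_{y} - w_{y^*}) \cdot f + 1}{2 \, f \cdot f} \right)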
Linear Separators

 Which of these linear separators is optimal?
