
Course Overview and Introduction

CE-717 : Machine Learning


Sharif University of Technology

M. Soleymani
Fall 2016
Course Info
 Instructor: Mahdieh Soleymani
 Email: soleymani@sharif.edu

 Lectures: Sun-Tue (13:30-15:00)

 Website: http://ce.sharif.edu/cources/95-96/1/ce717-2

Text Books
 Pattern Recognition and Machine Learning, C. Bishop, Springer, 2006.
 Machine Learning, T. Mitchell, McGraw-Hill, 1997.
 Additional readings will be made available when appropriate.

 Other books:
  The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman, Second Edition, 2008.
  Machine Learning: A Probabilistic Perspective, K. Murphy, MIT Press, 2012.
Marking Scheme
 Midterm Exam: 25%
 Final Exam: 30%
 Project: 5-10%
 Homeworks (written & programming): 20-25%
 Mini-exams: 15%

Machine Learning (ML) and Artificial Intelligence (AI)
 ML first appeared as a branch of AI

 ML is now also a preferred approach to other subareas of AI:
  Computer Vision, Speech Recognition, …
  Robotics
  Natural Language Processing

 ML is a strong driver in Computer Vision and NLP
A Definition of ML

 Tom Mitchell (1997): Well-posed learning problem

 “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”

 Using the observed data to make better decisions

 Generalizing from the observed data
ML Definition: Example
 Consider an email program that learns how to filter spam
according to emails you do or do not mark as spam.

 T: Classifying emails as spam or not spam.


 E: Watching you label emails as spam or not spam.
 P: The number (or fraction) of emails correctly classified as
spam/not spam.

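As a minimal sketch (with made-up labels, not course data), P can be computed as the fraction of correctly classified emails:

# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0]   # the user's labels (experience E)
y_pred = [1, 0, 0, 1, 0]   # the filter's predictions (task T)

# Performance P: fraction of emails classified correctly.
correct = sum(t == p for t, p in zip(y_true, y_pred))
P = correct / len(y_true)
print(P)   # 0.8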
The essence of machine learning

 A pattern exists

 We do not know it mathematically

 We have data on it
Example: Home Price
 Housing price prediction

[Figure: scatter plot of housing price ($ in 1000s) vs. size in square feet. Figure adopted from slides of Andrew Ng, Machine Learning course, Stanford.]
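To make the prediction task concrete, here is a minimal least-squares sketch in NumPy; the sizes and prices below are hypothetical values chosen only to mimic the figure's axes:

import numpy as np

# Hypothetical (size in ft^2, price in $1000s) pairs.
sizes = np.array([500.0, 1000.0, 1500.0, 2000.0, 2500.0])
prices = np.array([105.0, 160.0, 225.0, 290.0, 350.0])

# Fit price ~ w0 + w1 * size by least squares.
A = np.stack([np.ones_like(sizes), sizes], axis=1)   # column of ones for the intercept
w, *_ = np.linalg.lstsq(A, prices, rcond=None)

print(w)                        # [intercept, slope]
print(w[0] + w[1] * 1250.0)     # predicted price for a 1250 ft^2 house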
Example: Bank loan
 Applicant form as the input

 Output: approving or denying the request
Components of (Supervised) Learning
 Unknown target function: 𝑓: 𝒳 → 𝒴
  Input space: 𝒳
  Output space: 𝒴

 Training data: (𝒙1, 𝑦1), (𝒙2, 𝑦2), …, (𝒙N, 𝑦N)

 Pick a formula 𝑔: 𝒳 → 𝒴 that approximates the target function 𝑓
  selected from a set of hypotheses ℋ
Training data: Example
Training data

 x1    x2    y
 0.9   2.3   1
 3.5   2.6   1
 2.6   3.3   1
 2.7   4.1   1
 1.8   3.9   1
 6.5   6.8  -1
 7.2   7.5  -1
 7.9   8.3  -1
 6.9   8.3  -1
 8.8   7.9  -1
 9.1   6.2  -1

[Figure: the same points plotted in the (x1, x2) plane]
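The same training set written as NumPy arrays (one row of X per sample, labels in y):

import numpy as np

X = np.array([[0.9, 2.3], [3.5, 2.6], [2.6, 3.3], [2.7, 4.1], [1.8, 3.9],
              [6.5, 6.8], [7.2, 7.5], [7.9, 8.3], [6.9, 8.3], [8.8, 7.9],
              [9.1, 6.2]])
y = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1])

print(X.shape, y.shape)   # (11, 2) (11,)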
Components of (Supervised) Learning

[Diagram: the learning model]
Solution Components
 Learning model composed of:
  Learning algorithm
  Hypothesis set

 Perceptron example
Perceptron classifier
 Input: 𝒙 = (𝑥1, …, 𝑥𝑑)

 Classifier:
  If ∑_{i=1}^{d} 𝑤𝑖𝑥𝑖 > threshold, then output 1
  else output −1

 The linear formula 𝑔 ∈ ℋ can be written:
  𝑔(𝒙) = sign((∑_{i=1}^{d} 𝑤𝑖𝑥𝑖) + 𝑤0),  where 𝑤0 = −threshold

 If we add a coordinate 𝑥0 = 1 to the input:
  𝑔(𝒙) = sign(∑_{i=0}^{d} 𝑤𝑖𝑥𝑖)
  Vector form: 𝑔(𝒙) = sign(𝒘ᵀ𝒙)

[Figure: a linear boundary separating two classes of points in the (x1, x2) plane]
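A minimal sketch of this hypothesis in NumPy; the weights below are placeholder values, with w[0] playing the role of −threshold:

import numpy as np

def predict(w, x):
    """Perceptron hypothesis g(x) = sign(w^T x); here sign(0) is mapped to -1."""
    return 1 if w @ x > 0 else -1

w = np.array([-0.5, 1.0, -1.0])   # placeholder weights [w0, w1, w2]
x = np.array([1.0, 0.9, 2.3])     # input with the added coordinate x0 = 1
print(predict(w, x))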
Perceptron learning algorithm: linearly separable data
 Given the training data (𝒙^(1), 𝑦^(1)), …, (𝒙^(N), 𝑦^(N))

 A point (𝒙^(n), 𝑦^(n)) is misclassified when:
  sign(𝒘ᵀ𝒙^(n)) ≠ 𝑦^(n)

Repeat
  Pick a misclassified point (𝒙^(n), 𝑦^(n)) from the training data and update 𝒘:
   𝒘 = 𝒘 + 𝑦^(n) 𝒙^(n)
Until all training data points are correctly classified by 𝑔
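A sketch of the whole loop under the slide's assumption of linearly separable data (otherwise it would cycle, hence the iteration cap):

import numpy as np

def pla(X, y, max_iters=10000):
    """Perceptron learning algorithm. X is N x (d+1) with first column all ones."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        preds = np.where(X @ w > 0, 1, -1)    # current predictions sign(w^T x)
        wrong = np.nonzero(preds != y)[0]     # indices of misclassified points
        if len(wrong) == 0:
            return w                          # all points correctly classified
        n = wrong[0]                          # pick a misclassified point
        w = w + y[n] * X[n]                   # the perceptron update
    return w

# Usage with the earlier example table:
# X1 = np.hstack([np.ones((len(X), 1)), X])
# w = pla(X1, y)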
Perceptron learning algorithm:
Example of weight update

[Figure: two (x1, x2) plots showing the decision boundary before and after one weight update]
Experience (E) in ML
 Basic premise of learning:
  “Using a set of observations to uncover an underlying process”

 Different paradigms of ML obtain (different types of) observations in different ways
Paradigms of ML
 Supervised learning (regression, classification)
  predicting a target variable for which we get to see examples
 Unsupervised learning
  revealing structure in the observed data
 Reinforcement learning
  partial (indirect) feedback, no explicit guidance
  given rewards for a sequence of moves, learn a policy and utility functions

 Other paradigms: semi-supervised learning, active learning, online learning, etc.
Supervised Learning:
Regression vs. Classification
 Supervised Learning
  Regression: predict a continuous target variable
   E.g., 𝑦 ∈ [0, 1]
  Classification: predict a discrete target variable
   E.g., 𝑦 ∈ {1, 2, …, 𝐶}
Data in Supervised Learning
 Data are usually considered as vectors in a 𝑑-dimensional space
  For now, we make this assumption for illustrative purposes
  We will see later that it is not necessary

 Data form an 𝑛 × (𝑑 + 1) table with columns 𝑥1, 𝑥2, …, 𝑥𝑑, 𝑦 and one row per sample:
  Rows: data points/instances/examples/samples
  Columns: features/attributes/dimensions
  𝑦 column: target/outcome/response/label
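In code terms, the table is simply a matrix whose last column is the target; a small illustrative slice:

import numpy as np

# An illustrative table: each row is [x1, x2, x3, y].
data = np.array([[0.9, 2.3, 1.1,  1.0],
                 [6.5, 6.8, 0.2, -1.0]])

X = data[:, :-1]   # feature columns x1..xd (rows = samples)
y = data[:, -1]    # the target/outcome/response/label column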
Regression: Example
 Housing price prediction

[Figure: scatter plot of housing price ($ in 1000s) vs. size in square feet, as shown earlier. Figure adopted from slides of Andrew Ng.]
Classification: Example
 Input: weight; output: class (cat or dog)

[Figure: samples plotted against weight, labeled 0 (cat) or 1 (dog)]
Supervised Learning vs. Unsupervised
Learning
 Supervised learning
  Given: training set
   a labeled set of 𝑁 input-output pairs 𝐷 = {(𝒙^(i), 𝑦^(i))}, i = 1, …, 𝑁
  Goal: learn a mapping from 𝒙 to 𝑦

 Unsupervised learning
  Given: training set {𝒙^(i)}, i = 1, …, 𝑁 (inputs only)
  Goal: find groups or structures in the data
   discover the intrinsic structure in the data
Supervised Learning: Samples

[Figure: labeled samples of two classes in the (x1, x2) plane with a classification boundary]
Unsupervised Learning: Samples

[Figure: unlabeled samples in the (x1, x2) plane grouped by clustering into Type I, Type II, and Type III]
Sample Data in Unsupervised Learning
 Data form an 𝑛 × 𝑑 table with columns 𝑥1, 𝑥2, …, 𝑥𝑑 and one row per sample (no target column):
  Rows: data points/instances/examples/samples
  Columns: features/attributes/dimensions
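A minimal clustering sketch using scikit-learn's KMeans; the points are randomly generated placeholders around three centers, mirroring the three "types" in the figure above:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 4.0], [8.0, 0.0]])

# 20 placeholder points around each center; note there are no labels.
X = np.vstack([rng.normal(loc=c, scale=0.8, size=(20, 2)) for c in centers])

labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)   # discovered groups
print(labels[:10])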
Unsupervised Learning: Example Applications
 Clustering docs based on their similarities
  e.g., grouping news stories on the Google News site

 Market segmentation: group customers into different market segments given a database of customer data

 Social network analysis
Reinforcement
 Provides only an indication as to whether an action is correct or not

 Data in supervised learning: (input, correct output)
 Data in reinforcement learning: (input, some output, a grade of reward for this output)
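A sketch of the contrast in code, with hypothetical values:

# Supervised learning: (input, correct output)
supervised_example = ({"size_ft2": 1200}, 230.0)

# Reinforcement learning: (input, some output, a grade/reward for that output)
rl_example = ("board_state_17", "move_left", 0.5)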
Reinforcement Learning
 Typically, we need to make a sequence of decisions
  it is usually assumed that the reward signal refers to the entire sequence
Is learning feasible?
 Learning an unknown function is impossible.
 The function can assume any value outside the data we have.

 However, it is feasible in a probabilistic sense.

Generalization

 We do not intend to memorize the data; we need to figure out the pattern.

 A core objective of learning is to generalize from experience.
  Generalization: the ability of a learning algorithm to perform accurately on new, unseen examples after having experienced the training data.
Components of (Supervised) Learning

[Diagram: the learning model, repeated from earlier]
Main Steps of Learning Tasks
 Selection of hypothesis set (or model specification)
  Which class of models (mappings) should we use for our data?

 Learning: find a mapping 𝑓 (from the hypothesis set) based on the training data
  Which notion of error should we use? (loss functions)
  Optimization of the loss function to find the mapping 𝑓

 Evaluation: how well 𝑓 generalizes to yet unseen examples
  How do we ensure that the error on future data is minimized? (generalization)
  (a code sketch of these steps follows below)
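A minimal sketch of the three steps for a linear hypothesis set with squared-error loss; all data here are synthetic placeholders:

import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 2 + 1.5x + noise, split into training and held-out parts.
x = rng.uniform(0.0, 10.0, size=30)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=30)
A = np.stack([np.ones_like(x), x], axis=1)
A_tr, y_tr, A_te, y_te = A[:20], y[:20], A[20:], y[20:]

# Step 1: hypothesis set = linear maps. Step 2: fit by minimizing squared loss.
w, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

# Step 3: evaluate how well the fitted mapping generalizes to unseen examples.
test_loss = np.mean((A_te @ w - y_te) ** 2)
print(w, test_loss)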
Some Learning Applications
 Face, speech, and handwritten character recognition
 Document classification and ranking in web search engines
 Photo tagging
 Self-customizing programs (recommender systems)
 Database mining (e.g., medical records)
 Market prediction (e.g., stock/house prices)
 Computational biology (e.g., annotation of biological sequences)
 Autonomous vehicles
ML in Computer Science
 Why are ML applications growing?
  Improved machine learning algorithms
  Availability of data (increased data capture, networking, etc.)
  Demand for self-customization to the user or environment
  Software too complex to write by hand
Handwritten Digit Recognition Example
 Data: labeled samples of the handwritten digits 0-9

[Figure: example images for each digit class]
Example: Input representation

[figure]
Example: Illustration of features

[figure]
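As a sketch of reducing each digit image to a couple of summary features, using scikit-learn's bundled 8x8 digits; the intensity/symmetry definitions below are my assumptions, not necessarily the features on the slide:

import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()                     # 8x8 grayscale digit images with labels
images, labels = digits.images, digits.target

def features(img):
    """Two assumed summary features: mean intensity and left-right symmetry."""
    intensity = img.mean()
    symmetry = -np.abs(img - img[:, ::-1]).mean()   # 0 = perfectly symmetric
    return intensity, symmetry

print(features(images[0]), labels[0])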
Example: Classification boundary

[figure]
Main Topics of the Course
 Supervised learning (most of the lectures are on this topic)
  Regression
  Classification (our main focus)

 Learning theory
 Unsupervised learning
 Reinforcement learning
 Some advanced topics & applications
Resource

 Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, “Learning from Data”, AMLBook, 2012.
