Selected Topics in CS
Chapter 10: Introduction to Machine Learning
Introduction
Machine learning is a subfield of computer science that
evolved from the study of pattern recognition and
computational learning theory in artificial intelligence.
Machine learning explores the construction and study of
algorithms that can learn from and make predictions on
data.
Such algorithms operate by building a model from
example inputs in order to make data-driven
predictions or decisions, rather than following strictly
static program instructions.
Machine learning is closely related to, and often overlaps
with, computational statistics, a discipline that also
specializes in prediction-making. It has strong ties to
mathematical optimization, which supplies methods,
theory and application domains to the field.
Applications
Machine learning is employed in a range of
computing tasks where designing and programming
explicit algorithms is infeasible.
Example applications include spam filtering, optical
character recognition (OCR), search engines and
computer vision.
Machine learning is sometimes conflated with data
mining, although data mining focuses more on
exploratory data analysis. Machine learning and pattern
recognition “can be viewed as two facets of the
same field”.
Learning
Tom M. Mitchell provided a widely
quoted, more formal definition: “A
computer program is said to learn from
experience E with respect to some class
of tasks T and performance measure P, if
its performance at tasks in T, as measured
by P, improves with experience E.”
Here E is typically a dataset, T a task
such as classification or clustering, and
P a measure such as accuracy.
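The definition above can be made concrete with a toy learner. The following is a minimal sketch, not a method from these slides: the task, the threshold rule, and every name in it (`fit_threshold`, `accuracy`, `EXPERIENCE`, `TEST_SET`) are assumptions chosen for illustration. It measures P (accuracy) as the learner sees more of E (labelled examples).

```python
# Task T: label an integer 0..99 as 1 ("large", >= 50) or 0 ("small").
# Experience E: labelled examples. Performance P: accuracy on a fixed test set.
# All data and names here are hypothetical, chosen only to illustrate the definition.

def fit_threshold(examples):
    """Put the threshold midway between the largest 0-example and the smallest 1-example."""
    largest_small = max(x for x, label in examples if label == 0)
    smallest_large = min(x for x, label in examples if label == 1)
    return (largest_small + smallest_large) / 2


def accuracy(threshold, test_set):
    """Fraction of test inputs whose predicted label matches the true label."""
    correct = sum(1 for x, label in test_set if (1 if x >= threshold else 0) == label)
    return correct / len(test_set)


EXPERIENCE = [(5, 0), (80, 1), (30, 0), (60, 1), (48, 0), (52, 1)]
TEST_SET = [(x, 1 if x >= 50 else 0) for x in range(100)]


def learning_curve():
    """Accuracy after seeing the first 2, 4 and 6 examples."""
    return [(n, accuracy(fit_threshold(EXPERIENCE[:n]), TEST_SET)) for n in (2, 4, 6)]


if __name__ == "__main__":
    for n, acc in learning_curve():
        print(f"{n} examples -> accuracy {acc:.2f}")
```

With this particular data the accuracy rises from 0.93 to 0.95 to 1.00, so the program "learns" in Mitchell's sense. Real learners on noisy data need not improve at every step.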
ML Tasks
Machine learning tasks are typically classified into three broad
categories, depending on the nature of the learning “signal” or
“feedback” available to a learning system. These are:
Supervised learning: The computer is presented with example
inputs and their desired outputs, given by a “teacher”, and the
goal is to learn a general rule that maps inputs to outputs.
Unsupervised learning: No labels are given to the learning
algorithm, leaving it on its own to find structure in its input.
Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end.
Reinforcement learning: A computer program interacts with a
dynamic environment in which it must achieve a certain goal (such
as driving a vehicle), without a teacher explicitly telling it whether
it has come close to its goal or not. Another example is learning to
play a game by playing against an opponent. The only feedback is
a reward or penalty.
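To illustrate learning from reward and penalty alone, here is a deliberately tiny, bandit-style sketch rather than a full reinforcement learning algorithm. The environment (`reward`), the action names, and the explore-then-exploit policy are all assumptions made for this example.

```python
def reward(action):
    """Hypothetical environment: only 'right' moves toward the goal."""
    return 1.0 if action == "right" else -1.0


def run_agent(actions, steps):
    """Try every action once, then keep choosing the best average reward."""
    totals = {a: 0.0 for a in actions}  # summed reward per action
    counts = {a: 0 for a in actions}    # how often each action was tried
    history = []
    for step in range(steps):
        if step < len(actions):
            action = actions[step]      # explore: no teacher, so try everything once
        else:
            # exploit: choose the action with the highest average reward so far
            action = max(actions, key=lambda a: totals[a] / counts[a])
        r = reward(action)
        totals[action] += r
        counts[action] += 1
        history.append((action, r))
    return history


if __name__ == "__main__":
    for action, r in run_agent(["left", "right"], steps=6):
        print(action, r)
```

Nobody tells the agent that "right" is correct. It infers this from the penalty it receives for "left" and the reward it receives for "right".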
ML Tasks
Another categorization of machine learning tasks
arises when one considers the desired output of a
machine-learned system:
In classification, inputs are divided into two or
more classes, and the learner must produce a
model that assigns unseen inputs to one or more
(multi-label classification) of these classes. This is
typically tackled in a supervised way. Spam filtering
is an example of classification, where the inputs are
email (or other) messages and the classes are
“spam” and “not spam”.
In regression, also a supervised problem, the
outputs are continuous rather than discrete.
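As a small illustration of a continuous output, the sketch below fits a straight line by ordinary least squares. The technique is a common choice for regression but is not named in these slides, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """The prediction is a real number, not a class label."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical training data lying exactly on y = 2x + 1.
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(slope, intercept, predict(5, slope, intercept))
```

A classifier trained on similar inputs would instead return one of a fixed set of labels, such as “spam” or “not spam”.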
ML Tasks
In clustering, a set of inputs is to be divided into
groups. Unlike in classification, the groups are not
known beforehand, making this typically an
unsupervised task.
Density estimation finds the distribution of inputs
in some space.
Dimensionality reduction simplifies inputs by
mapping them into a lower-dimensional space. Topic
modeling is a related problem, where a program is
given a list of human-language documents and is
tasked with finding which documents cover similar
topics.
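Clustering can be sketched with k-means, one common algorithm for it (the slides do not name a specific method). The one-dimensional data, the starting centers, and the function name `kmeans_1d` are assumptions for illustration.

```python
def kmeans_1d(points, centers, max_iters=100):
    """Alternate between assigning points to centers and moving the centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break  # nothing moved, so the grouping is stable
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(kmeans_1d(points, centers=[0.0, 5.0]))
```

No labels are supplied: the two groups emerge from the data. On this input the centers settle at 2.0 and 11.0.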
ML vs DM
Machine learning and data mining often employ the
same methods and overlap significantly. They can
be roughly distinguished as follows:
Machine learning focuses on prediction, based on
known properties learned from the training data.
Data mining focuses on the discovery of
(previously) unknown properties in the data. It is
the analysis step of Knowledge Discovery in
Databases (KDD).
Computational Science
Computational science (also scientific
computing or scientific computation) is
concerned with constructing mathematical models
and quantitative analysis techniques and using
computers to analyze and solve scientific problems.
In practical use, it is typically the application of
computer simulation and other forms of
computation from numerical analysis and
theoretical computer science to problems in various
scientific disciplines.
Learning Systems / Algorithms
Support Vector Machines
Logistic Regression
Kernel Methods
Bayesian Networks
Decision Trees
Gradient Descent
Newton's Method
Naïve Bayes
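Two entries in the list above, gradient descent and Newton's method, are optimization procedures that many of the other models use for training. The sketch below applies both to a hypothetical one-variable objective, f(x) = (x - 3)^2. The objective, learning rate, and function names are assumptions for illustration only.

```python
def gradient(x):
    """Derivative of the hypothetical objective f(x) = (x - 3)**2."""
    return 2.0 * (x - 3.0)


SECOND_DERIVATIVE = 2.0  # f''(x) is constant for this quadratic


def gradient_descent(x, learning_rate=0.1, steps=100):
    """Repeatedly take a small step against the gradient."""
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


def newton_step(x):
    """Scale the step by the curvature instead of a fixed learning rate."""
    return x - gradient(x) / SECOND_DERIVATIVE


if __name__ == "__main__":
    print(gradient_descent(0.0))  # approaches the minimum at x = 3
    print(newton_step(0.0))       # reaches x = 3 in one step on a quadratic
```

Newton's method lands on the minimum in a single step here only because the objective is exactly quadratic. In general both methods iterate.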
Deep Learning Models
Multilayer Perceptron
Convolutional Neural Networks
Recurrent Neural Networks
Restricted Boltzmann Machines
Long Short-Term Memory