Machine Learning
Block 1 - Unit 1
Introduction to Machine Learning
S.SAMPATH
October 8, 2023
Contents
1 Introduction
2 Artificial Intelligence
3 Machine Learning
4 Branches of Machine Learning
5 Conclusion
6 Test Yourself
1 Introduction
This unit explains the motivation behind the development of machine learning
methodologies. It highlights the limitations of artificial intelligence and
thereby helps us understand the need for machine learning techniques.
2 Artificial Intelligence
• Artificial Intelligence: Artificial intelligence is the capacity of a machine
to imitate intelligent human behaviour. A concise definition of the field
would be as follows: “the effort to automate intellectual tasks
normally performed by humans”.
• Accomplishing AI: AI is pursued by studying how the human brain
thinks, and how it learns, decides, and works while trying to solve a
problem. The outcomes of this study are used as a basis for developing
intelligent software and systems.
• Major goal of AI: The major goal is to develop systems that can imitate
human behaviour.
• Applications of AI: Speech recognition, natural language understanding,
image recognition, self-driving cars, and recommendation systems
(for example, YouTube recommendations).
3 Machine Learning
Symbolic AI is well suited to solving well-defined logical problems, such as
playing chess. However, it turns out to be intractable to work out explicit
rules for solving more complex, fuzzy problems. Examples of such problems are
image classification, speech recognition and language translation. A new
approach, namely machine learning, has been developed to take symbolic AI’s
place. Machine learning can be treated as a subset of AI.
Machine Learning: Machine learning is a type of artificial intelligence that
provides computers with the ability to learn without being explicitly programmed.
For example, consider the iris data set, which consists of three different
classes (species). A classifier (an ML algorithm) learns from the data set and
acquires the ability to identify the species without any human intervention.
To develop such a machine, we give it the data as input, allow it to be trained,
and then use the trained machine to classify each new input as it arrives.
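The train-then-classify idea described above can be sketched in a few lines of
Python. This is only a minimal illustration, not a recommended classifier: it
assumes a nearest-centroid rule (chosen here for brevity), and the measurements
are made-up toy values, not the real iris data. The names train and classify
are hypothetical.

```python
from math import dist

# Hypothetical toy measurements (petal length, petal width) in cm.
# These values are made up for illustration; they are NOT the real iris data.
training_data = [
    ((1.4, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((4.5, 1.5), "versicolor"),
    ((4.7, 1.4), "versicolor"),
    ((5.8, 2.2), "virginica"),
    ((6.0, 2.4), "virginica"),
]


def train(examples):
    """Learn one centroid (mean point) per class from labelled examples."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(points) for column in zip(*points))
        for label, points in grouped.items()
    }


def classify(centroids, features):
    """Assign a new input to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Training happens once; the trained "machine" is then reused for new inputs.
centroids = train(training_data)
print(classify(centroids, (1.3, 0.2)))
print(classify(centroids, (5.9, 2.1)))
```

Nothing in the code states a rule such as "short petals mean setosa"; the
centroids are computed from the labelled examples, which is the sense in which
the machine learns without being explicitly programmed.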
While training the machines, humans supply the data as well as the expected
answers. The machines in turn produce rules that can be used when the need
arises. That is, machine learning discovers rules to execute a data-processing
task. These rules can then be applied to new data in order to get the answers.
So, to do machine learning, we need three things, as shown in Figure 1 and
explained below.
Figure 1: Machine Learning Flow
• Input data points: For instance, if the task is speech recognition, these
data points could be sound files of people speaking. If the task is image
tagging, they could be pictures.
• Examples of the expected output: In a speech-recognition task, these
could be human-generated transcripts of sound files. In an image task,
expected outputs could be tags such as “dog,” “cat,” and so on.
• A way to measure whether the algorithm is doing a good job: This
is necessary in order to determine the difference between the algorithm’s
current output and its expected output. The measurement is used as a
feedback signal to adjust the way in which the algorithm works. This
adjustment step is referred to as learning.
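The three ingredients listed above can be seen together in a small Python
sketch. It assumes a deliberately simple setting that is not part of this unit:
a model with a single adjustable weight, mean squared error as the measure, and
gradient descent as the adjustment rule. The toy data and the names
mean_squared_error and learn are hypothetical.

```python
# Input data points (x) and examples of the expected output (y).
# Hypothetical toy data generated from the rule y = 2 * x.
inputs = [1.0, 2.0, 3.0, 4.0]
expected = [2.0, 4.0, 6.0, 8.0]


def mean_squared_error(weight):
    """Measure how far the current outputs are from the expected outputs."""
    errors = [(weight * x - y) ** 2 for x, y in zip(inputs, expected)]
    return sum(errors) / len(errors)


def learn(steps=200, learning_rate=0.01):
    """Repeatedly adjust the weight using the error as a feedback signal."""
    weight = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to the weight.
        terms = [2 * (weight * x - y) * x for x, y in zip(inputs, expected)]
        gradient = sum(terms) / len(terms)
        # The adjustment step: move the weight so that the error decreases.
        weight -= learning_rate * gradient
    return weight


learned_weight = learn()
print(round(learned_weight, 3), round(mean_squared_error(learned_weight), 6))
```

The rule y = 2 * x is never given to the program; the weight drifts towards 2
only because each adjustment reduces the measured error.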
Different Types of Machine Learning
The main types of machine learning are:
• Supervised Learning: In supervised learning (SL) we have input variables (x)
and an output variable (y), and we use an algorithm to learn the mapping
function from the input to the output. The workflow is to build the model,
test it and validate it.
• Unsupervised Learning: Unsupervised learning (USL) is the training of a model
using data that is neither classified nor labeled. For example, clustering
the input data on the basis of statistical properties is an unsupervised
learning methodology.
• Reinforcement Learning: Reinforcement learning (RL) is learning by
interacting with an environment. An RL agent learns from the consequences of
its actions, rather than from being taught explicitly. It selects its actions
on the basis of its past experiences (exploitation) and also by trying new
choices (exploration).
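The balance between exploitation and exploration in reinforcement learning can
be illustrated with a short Python sketch. It assumes an epsilon-greedy agent,
which is one common strategy and is not prescribed by this unit, acting in a
toy environment where each action always pays the same reward. The function
run_agent, the reward values and the parameter settings are all hypothetical.

```python
import random


def run_agent(rewards, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent in a toy environment with one fixed reward per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # learned value of each action
    counts = [0] * len(rewards)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a randomly chosen action.
            action = rng.randrange(len(rewards))
        else:
            # Exploitation: pick the action that looks best so far.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # the consequence of the action
        counts[action] += 1
        # Running average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_agent(rewards=[0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

The agent is never told which action is best. It discovers this from the
rewards it receives, and the occasional random choice is what allows it to find
an action better than the one it first settled on.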
Machine Learning vs Statistics
Machine learning is closely related to mathematical statistics, but it differs
from statistics in several important ways. Unlike statistics, machine learning
tends to deal with large, complex data sets (such as a dataset of millions of
images, each consisting of tens of thousands of pixels) for which classical
statistical analysis, such as Bayesian analysis, would be impractical. As a
result, machine learning, and especially deep learning, exhibits comparatively
little mathematical theory (maybe too little) and is engineering oriented. It is
a hands-on discipline in which ideas are proven empirically more often than
theoretically.
Limitations of ML
Classical ML methods are of limited use when working with high-dimensional
data, that is, data with a large number of inputs and outputs. On their own
they struggle with important AI problems such as NLP and image recognition.
One of the big challenges with ML is feature extraction. This poses a huge
problem in tasks such as object recognition or handwriting recognition.
4 Branches of Machine Learning
Three important branches of machine learning are given below.
1. Clustering: Clustering is the process of identifying natural subgroups
present in a data set. A cluster is basically a group of data objects.
While performing clustering, we partition the given data set into groups
based on data similarity and then assign labels to the groups.
2. Classification: Classification is the process of assigning an object in a
database to one of a set of predefined classes. A classifier learns this
assignment from input instances whose class labels are known.
3. Association Analysis: Association analysis is a machine learning task
that discovers the co-occurrence of items in a collection. The relationships
between co-occurring items are expressed as association rules. Association
rules are often used to analyze sales transactions.
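To make the idea of an association rule concrete, the Python sketch below
scores a rule such as "bread implies butter" on a handful of sales
transactions. It assumes the two standard measures, support and confidence,
which this unit does not define. The transactions and the function names are
made up for illustration.

```python
# Hypothetical sales transactions; the items are made up for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the share that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


# How often bread and butter occur together, and how reliably
# a purchase of bread is accompanied by a purchase of butter.
print(support({"bread", "butter"}))
print(confidence({"bread"}, {"butter"}))
```

A rule with high support and high confidence describes a co-occurrence that is
both frequent and reliable, which is what makes it useful when analyzing sales
transactions.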
Applications of Cluster Analysis
The following are some applications of clustering.
1. Marketing: Cluster analysis is primarily used in marketing for market
segmentation. With the help of a customer database containing primary
information about customers and their purchasing behavior, one can identify
groups of customers possessing similar characteristics. This information can
be used while launching a new product, so that the risk of failing to capture
potential customers is minimized.
2. Image Processing: Segmentation of images using clustering finds
applications in various fields such as medicine and remote sensing. For
example, segmentation of a mammogram can be used to identify healthy and
cancerous regions.
3. Biology: Genes play a crucial role in the functioning of the human body.
Grouping genes that have similar functions finds applications in various
fields of the health domain.
4. Web Mining: Clustering can be used effectively in the retrieval of
information from data available on the web. Clustering documents into
homogeneous groups based on their contents facilitates faster information
retrieval. For example, when a query is submitted to a search engine,
clustered groups of documents help in getting results quickly.
Applications of Classification
Classifiers find applications in several domains.
1. Medicine: Classifiers can be used in the medical diagnostic process. With
the help of a database containing data on various clinical symptoms and
pathological results, one can identify the nature of an illness.
2. Document Classification: Based on data about the occurrence of keywords,
one can identify the domain to which a document belongs. For example, the
domain of a document can be sports, international politics, religion, etc.
3. Bankruptcy Prediction: A classifier can be designed to predict whether
a company will become bankrupt, based on data about some of its financial
ratios.
4. Archaeology: Based on the features of an item recovered from an
archaeological excavation, one can assign the time period to which the item
belongs.
Applications of Association Analysis
Association analysis aims to discover rules that describe the association
between items. It is mainly meant for dealing with transaction data sets. The
rules obtained using association analysis methodologies find applications in
shop floor planning, text analysis, medical diagnostics, etc.
Clustering vs Classification
Clustering is the process of identifying natural subgroups in a data set,
whereas classification deals with the assignment of an object to one of a set
of predefined classes. For example, in the banking industry clustering
identifies various groups of customers, such as “regular”, “defaulters” and
“partially regular”. Using historical data, a classifier then identifies the
group into which a new customer is likely to fit. In the same way, in a medical
database, clustering applied to clinical and laboratory observations can
identify groups of patients who are homogeneous with respect to the type of
disease they have. A well-trained classifier can then identify, from a new
patient's data, the type of ailment that patient is likely to suffer from.
Similarly, satellite images can be analyzed with clustering algorithms to
identify the nature of the regions they show, such as forest area, hill terrain
and water bodies. Such information can then be used by classifiers to identify
the nature of a new region.
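The banking example can be sketched in Python as two separate steps: first
discover the groups, then place a new customer in one of them. The sketch
assumes one-dimensional k-means for the clustering step and a nearest-centroid
rule for the classification step. Both are simple stand-ins chosen for brevity.
The data (missed payments per year), the starting centroids and the function
names are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data with given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group.
        centroids = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centroids)
        ]
    return centroids


def assign(value, centroids):
    """Classification step: index of the existing group nearest to a new value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


# Hypothetical data: missed payments per year for nine existing customers.
missed_payments = [0, 0, 1, 4, 5, 5, 10, 11, 12]

# Clustering: no labels are given; the three groups emerge from the data.
centroids = k_means_1d(missed_payments, centroids=[0, 6, 12])
print(centroids)

# Classification: a new customer is assigned to one of the groups found above.
print(assign(3, centroids))
```

The clustering step needs only the unlabelled data, while the classification
step needs groups that already exist. This is the distinction drawn in the
paragraph above.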
5 Conclusion
In this unit we have covered the following:
• The need for machine learning methods
• The different components of ML
• The different branches of ML
• Applications of clustering, classification and association analysis
6 Test Yourself
Short Questions
1. What is meant by artificial intelligence?
2. What are the subsets of AI?
3. What is meant by supervised learning?
4. Mention the difference between statistics and machine learning.
5. What is meant by unsupervised learning? Give an example.
6. Mention any two limitations of machine learning.
Long Questions
7. Explain the difference between machine learning and statistics with
examples.
8. Explain the need for machine learning with examples.
9. Explain the limitations of AI and ML.
10. Explain various approaches of machine learning with examples.
11. Explain the various components of an ML solution.
12. Compare classification and clustering with examples.
13. Give any four applications of cluster analysis.
14. Explain any four applications of classification with examples.
MCQ
1. Choose the correct statement
Different types of machine learning are
(1) supervised learning
(2) unsupervised learning
(3) reinforcement learning
(4) all the above are correct
2. Type of machine learning that expects both input and output variables is
(1) unsupervised learning
(2) supervised learning
(3) reinforcement learning
(4) all the above three
3. Choose the correct statement
(1) artificial intelligence is a branch of machine learning
(2) statistics is a branch of machine learning
(3) machine learning is a branch of statistical learning
(4) machine learning is a branch of statistics
4. To do machine learning we need
(1) input data points
(2) examples of the expected output
(3) measure for assessing the algorithm
(4) all the above three are correct
5. Choose the correct statement
(1) machine learning requires feature extraction
(2) machine learning does not require feature extraction
(3) feature extraction selects important data points
(4) first and third statements are correct